Summary: | [CI][DRMTIP] igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy - incomplete - RIP: 0010:skl_check_pipe_max_pixel_rate+0x8b/0x2d0 | ||
---|---|---|---|
Product: | DRI | Reporter: | Lakshmi <lakshminarayana.vudum> |
Component: | DRM/Intel | Assignee: | Clinton Taylor <clinton.a.taylor> |
Status: | CLOSED WORKSFORME | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | high | CC: | intel-gfx-bugs, james.ausmus |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | ICL | i915 features: | display/atomic |
Description
Lakshmi
2018-12-18 11:16:02 UTC
A use-after-free, pipe state from drm_atomic_crtc_state_for_each_plane_state()?

Still happening:

<4> [791.308567] general protection fault: 0000 [#1] PREEMPT SMP NOPTI
<4> [791.308571] CPU: 3 PID: 170 Comm: kworker/3:1 Tainted: G U 5.0.0-g2b6425f8c26c-drmtip_243+ #1
<4> [791.308573] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [791.308576] Workqueue: events drm_mode_rmfb_work_fn
<4> [791.308603] RIP: 0010:skl_check_pipe_max_pixel_rate+0x8b/0x2d0 [i915]
<4> [791.308605] Code: 00 0f b6 85 80 00 00 00 84 c0 74 36 48 83 7d 10 00 0f 84 e3 01 00 00 48 89 ee 4c 89 ef e8 ad 75 ff ff 48 8b 55 10 48 8b 52 48 <80> 7a 06 08 0f 84 a5 01 00 00 49 8b 95 f8 01 00 00 41 39 c7 44 0f
<4> [791.308607] RSP: 0018:ffffb56f80327b48 EFLAGS: 00010293
<4> [791.308609] RAX: 0000000000010000 RBX: ffff946f017d0000 RCX: 0000000000010000
<4> [791.308610] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000438 RDI: 0000000004380000
<4> [791.308611] RBP: ffff946f17108358 R08: 0000000000000780 R09: 0000000000000000
<4> [791.308613] R10: 0000000000000000 R11: 0000000000000000 R12: ffff946f0bf75fa8
<4> [791.308614] R13: ffff946f1311e7e8 R14: ffff946f1ad78958 R15: 0000000000010000
<4> [791.308615] FS: 0000000000000000(0000) GS:ffff946f1fec0000(0000) knlGS:0000000000000000
<4> [791.308617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [791.308618] CR2: 00007f7984a8f900 CR3: 00000003e7f62002 CR4: 0000000000760ee0
<4> [791.308619] PKRU: 55555554
<4> [791.308620] Call Trace:
<4> [791.308657] intel_crtc_atomic_check+0x374/0x540 [i915]
<4> [791.308691] ? intel_plane_atomic_check_with_state+0x88/0x190 [i915]
<4> [791.308694] drm_atomic_helper_check_planes+0x14d/0x1f0
<4> [791.308726] intel_atomic_check+0x5f6/0x1300 [i915]
<4> [791.308731] drm_atomic_check_only+0x55a/0x7f0
<4> [791.308734] drm_atomic_commit+0xe/0x50
<4> [791.308736] atomic_remove_fb+0x295/0x2c0
<4> [791.308741] drm_framebuffer_remove+0x67/0x140
<4> [791.308743] drm_mode_rmfb_work_fn+0x4a/0x60
<4> [791.308747] process_one_work+0x245/0x610
<4> [791.308750] worker_thread+0x1d0/0x380
<4> [791.308753] ? process_one_work+0x610/0x610
<4> [791.308755] kthread+0x119/0x130
<4> [791.308757] ? kthread_park+0x80/0x80
<4> [791.308760] ret_from_fork+0x24/0x50
<4> [791.308764] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp i915 x86_pkg_temp_thermal snd_hda_intel coretemp snd_hda_codec snd_hwdep snd_hda_core btusb crct10dif_pclmul btrtl crc32_pclmul btbcm ghash_clmulni_intel cdc_ether btintel usbnet snd_pcm mii bluetooth e1000e i2c_i801 ptp pps_core ecdh_generic mei_me mei prime_numbers
<0> [791.308776] Dumping ftrace buffer:
[...]
<0> [791.337932] ---------------------------------
<4> [791.337958] ---[ end trace 5730792f02f87405 ]---

Bumping the priority to highest because a use-after-free is a security bug, and being able to crash machines from userspace is unacceptable. User impact is maximal even without any known userspace generating these scenarios.

Seen every 2.7 drmtip runs, only on fi-icl-u2/u3, which might indicate that multiple screens are needed to get to this situation (like every 2x test).

Martin - how are you getting the "every 2.7 runs" data? From cibuglog, I'm seeing that this hasn't been seen in 4 weeks. Last seen was drmtip_243, and latest idle is drmtip_251, so 8 runs without seeing it

ICL systems were updated (BIOS/FW's) during ww10 so might be reason why issues are not seen anymore?

(In reply to James Ausmus from comment #4)
> Martin - how are you getting the "every 2.7 runs" data? From cibuglog, I'm
> seeing that this hasn't been seen in 4 weeks. Last seen was drmtip_243, and
> latest idle is drmtip_251, so 8 runs without seeing it

This is an average throughout the lifetime of the bug. With the above reproduction rate, we can only say the problem is fixed after drmtip_270.

However, I would prefer we stop looking at the reproduction rate and instead look at what the bug is: a general protection fault in our driver!

(In reply to Jani Saarinen from comment #5)
> ICL systems were updated (BIOS/FW's) during ww10 so might be reason why
> issues are not seen anymore?

No matter what the HW / BIOS is doing, we should not hit a general protection fault. So, please investigate.

First step here is most likely to improve instrumentation, at least if we can't easily reproduce. This means we need some idea/theory of what could go wrong.

Testing so far:
Ran 2x-long-flip-vs-cursor-legacy for 5 hours today without reproducing the issue. During the 5 hours the DUT had 3 CRTCs enabled and I hot-plugged DP and USB_C cables in and out at random intervals to attempt to cause an invalid CRTC to occur.

Possible fixes to NULL de-reference:
There appear to be 2 ways to get a GP fault in skl_check_pipe_max_pixel_rate().
1. intel_crtc passed in is NULL.
2. pstate is resolving as NULL via drm_atomic_crtc_state_for_each_plane_state().

Assuming intel_crtc_state (cstate) is valid since it's already de-referenced several times in intel_crtc_atomic_check().

Based on the offset (0x8b) in the Oops message the issue is probably not intel_crtc, which is de-referenced to get dev_priv in the first line of code.

GP message: skl_check_pipe_max_pixel_rate+0x8b/0x2d0 [i915]

Submit patch to protect intel_crtc in intel_display.c and pstate in intel_pm.c

Not seen in 1 month on CI (since drmtip_243, currently at drmtip_256), or in multiple days of intensive local testing. Dropping priority to High, while Clint continues to pursue a patch to guard against a use-after-free

*** Bug 109546 has been marked as a duplicate of this bug. ***

Still no reproduction in CI since drmtip_243, and we're now at drmtip_273, so we've passed the magic milestone of drmtip_270 that Martin mentioned! Resolving as WORKSFORME.

The reproduction rate of this issue is once in 3.6 runs, not seen after drmtip_243 (4 months, 2 weeks old). Closing this issue as WORKSFORME.

The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.