Summary: [IVB regression] Screen corruption due to DMAR + stolen
Product: DRI
Component: DRM/Intel
Status: CLOSED FIXED
Severity: normal
Priority: medium
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Steven <sourcepower>
Assignee: Damien Lespiau <damien.lespiau>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: sourcepower
Description
Steven
2013-08-25 15:28:05 UTC
Created attachment 84598 [details]
photo of screen corruption
Created attachment 84599 [details]
Kernel 3.8 Config
Created attachment 84600 [details]
Kernel 3.10 Config
Created attachment 84601 [details]
dmesg from Kernel 3.8
Our HDMI code did receive a number of tweaks in that timeframe. Would it be possible for you to bisect?

Besides the bisect (which I think is really what we need to get going with this one here), can you please also attach a dmesg with drm.debug=0xe added to your kernel cmdline? A dmesg from a recent kernel is preferred.

Created attachment 84607 [details]
Kernel 3.8 dmesg with drm.debug=0xe enabled
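The drm.debug value requested above is a bitmask of debug categories. As a rough illustration, here is a minimal POSIX shell sketch that decodes it, assuming the DRM_UT_* bit values from include/drm/drmP.h in 3.x kernels (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08); the function name decode_drm_debug is hypothetical. Under that assumption, 0xe enables driver, KMS and PRIME messages but not core messages.

```shell
#!/bin/sh
# Sketch: decode a drm.debug bitmask into category names.
# Assumption: DRM_UT_* values as in 3.x include/drm/drmP.h
# (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08). Newer kernels
# define additional bits, which this sketch ignores.
decode_drm_debug() {
    mask=$(( $1 ))              # accepts hex (0xe) or decimal (14)
    names=""
    for entry in 1:CORE 2:DRIVER 4:KMS 8:PRIME; do
        bit=${entry%%:*}        # numeric value before the colon
        name=${entry#*:}        # category name after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            # Append with a separating space only if names is non-empty.
            names="${names:+$names }$name"
        fi
    done
    printf '%s\n' "$names"
}

decode_drm_debug 0xe            # prints: DRIVER KMS PRIME
```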
OK, I tried kernel 3.11-rc6 but the issue still persists. I only have a dmesg with drm.debug=0xe enabled from kernel 3.8.12, so I attached that. I will try to bisect the issue, but I haven't done this before, so I think it will take some days. I will use https://wiki.gentoo.org/wiki/Kernel_git-bisect as a HOWTO because I'm running Gentoo. Thanks for your quick response.

Short question. I'm running the bisecting process currently. I'm down to roughly 4 turns to do. Now I have a build which "solves" the issue of this bug but adds a new issue while starting the X server. So my question is: should I proceed with "git bisect good" because KMS is working with this bisect build, or should I proceed with "git bisect bad" because the X server doesn't start (= system freeze) with this bisect build?

(In reply to comment #9)
> Short question. I'm running the bisecting process currently. I'm down to
> roughly 4 turns to do. Now I have a build which "solves" the issue of this
> bug but adds a new issue while starting the X server.
>
> So my question is: should I proceed with "git bisect good" because KMS is
> working with this bisect build, or should I proceed with "git bisect bad"
> because the X server doesn't start (= system freeze) with this bisect build?

If you can't properly test a commit (which is a bit the case here, since it's unclear), you can run

$ git bisect skip

If we're lucky, the real issue is somewhere else in the history. If not, we can analyze things precisely later on. But just skipping avoids misleading the bisect process into a dead end if we make the wrong call.

Created attachment 84736 [details]
bisect log 1
Created attachment 84737 [details]
bisect log 2
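The good/bad/skip decision discussed in the reply above can be written down as a small rule. This is a hypothetical helper (bisect_verdict is not a git command) sketched in POSIX shell: visible corruption means bad, a fully working boot means good, and a clean KMS console with a broken X server, where the outcome is unclear, means skip as suggested in that reply. Note that in practice Steven chose good in that situation.

```shell
#!/bin/sh
# Hypothetical helper: map what was observed on a test boot to the
# git bisect subcommand, following the skip advice in this thread.
#   $1 = "yes" if the KMS console came up without corruption, else "no"
#   $2 = "yes" if X started normally, else "no"
bisect_verdict() {
    if [ "$1" != "yes" ]; then
        echo bad        # the regression under investigation is visible
    elif [ "$2" = "yes" ]; then
        echo good       # everything works on this build
    else
        echo skip       # unrelated breakage makes the result unclear
    fi
}

# Usage after booting a candidate build (clean KMS, X failed):
#   git bisect "$(bisect_verdict yes no)"    # i.e. git bisect skip
bisect_verdict yes no
```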
So I completed the bisect run. I ignored the "new" issue where starting X results in a black screen (only X is affected, the system doesn't freeze! I was able to power off the system with a short press on the power button) and continued with "bisect good" whenever KMS worked (= no screen corruption when KMS gets active) and "bisect bad" whenever KMS didn't work (= heavy screen corruption when KMS gets active). According to the bisect, this is the "root cause" commit:

0ffb0ff283cca16f72caf29c44496d83b0c291fb is the first bad commit
commit 0ffb0ff283cca16f72caf29c44496d83b0c291fb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 15 11:32:27 2012 +0000

    drm/i915: Allocate fbcon from stolen memory

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Acked-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 89a06752fcaf1931912f666da05789b81b0b00ae 305aff01b7c56dde5b7db3c2b30c5de5bfb6cc45 M drivers

Only during KMS takeover, or is the corruption permanent? If it is just the takeover, it is the issue of leaving the outputs running with the BIOS setup as we clobber the state.
- This is a permanent corruption; it seems the system froze or something like that. I can only "recover" the system by pressing the power button for >4 seconds, pressing the reset button, or pulling the power cable.
- The system doesn't recognize any keyboard input (keyboard connected via USB).
- Unplugging and re-plugging the monitor cable doesn't help; the corruption is still there.
- I can reproduce the issue when I connect the system via an HDMI-to-HDMI cable to my Samsung LCD TV.
- This is a hardened Gentoo system with software disk encryption (dm-crypt), and the corruption happens "just" before I need to enter the password to decrypt/unlock the "/" filesystem and continue the boot process.
- Unfortunately I don't have a serial console, and the board doesn't offer anything like one, so I don't know how to get any information from the console.
- I don't know if the issue persists with a DVI-to-DVI cable to the monitor because I don't have such a cable currently, but I can buy one if needed for troubleshooting.
- Not sure if this is important, but I'm using the BIOS mode, not the UEFI mode, of the board.

Can you please test a broken kernel with the IOMMU disabled? Just add intel_iommu=off to the kernel cmdline (and please attach a new dmesg for that case).

Created attachment 84784 [details]
dmesg from broken kernel with IOMMU off/disabled
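When testing with and without intel_iommu=off, it helps to confirm what the running kernel was actually booted with. A minimal POSIX shell sketch, assuming a Linux system where /proc/cmdline holds the boot command line; has_cmdline_param is a hypothetical name. It matches whole space-separated tokens only, so a similar-looking token does not count as a hit.

```shell
#!/bin/sh
# Sketch: succeed if an exact kernel parameter token is present.
#   $1 = token to look for, e.g. "intel_iommu=off"
#   $2 = command line to search (defaults to /proc/cmdline)
has_cmdline_param() {
    want=$1
    # /proc/cmdline is only read when no second argument is given.
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only whole tokens match
    # ("xintel_iommu=off" does not count as "intel_iommu=off").
    case " $cmdline " in
        *" $want "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on the running system:
#   has_cmdline_param intel_iommu=off && echo "IOMMU workaround is active"
```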
- Broken kernel (the last kernel from the bisect process, with only the one offending commit) with "intel_iommu=off" as a kernel cmdline option; dmesg attached.
- KMS works now, X starts and the keyboard (USB) works, but the mouse (USB) doesn't work in X.

Somewhat unsurprisingly, DMAR/IOMMU is the culprit again... No idea what exactly goes wrong, but if the IOMMU is supposed to set up an identity map for the stolen range for the gfx, then that doesn't work out:

[ 0.464905] IOMMU: Setting identity map for device 0000:00:02.0 [0xdf800000 - 0xdf9fffff]

Probably relevant: https://patchwork.kernel.org/patch/3457401/

Steven, please test the patch Chris linked to.

The patch didn't help. I still get the heavy screen corruption. Tested on kernel 3.12.6.

Created attachment 92041 [details] [review]
don't use stolen for fbcon

Just to double-check your bisect again, can you please test this commit to make sure the breakage is fixed? Of course you need to remove any hacks like intel_iommu=off first.

/me had high hopes for the sg->offset patch :(

I cannot find the intel_fbdev.c file.

box i915 # pwd
/usr/src/linux-3.12.6-hardened-r4/drivers/gpu/drm/i915
box i915 # ll
total 2160
-rw-r--r-- 1 root root 1197 4. Nov 00:41 Makefile
-rw-r--r-- 1 root root 4666 4. Nov 00:41 dvo.h
-rw-r--r-- 1 root root 12684 4. Nov 00:41 dvo_ch7017.c
-rw-r--r-- 1 root root 8361 4. Nov 00:41 dvo_ch7xxx.c
-rw-r--r-- 1 root root 10242 4. Nov 00:41 dvo_ivch.c
-rw-r--r-- 1 root root 15992 4. Nov 00:41 dvo_ns2501.c
-rw-r--r-- 1 root root 6726 4. Nov 00:41 dvo_sil164.c
-rw-r--r-- 1 root root 8160 4. Nov 00:41 dvo_tfp410.c
-rw-r--r-- 1 root root 62935 12. Jan 02:58 i915_debugfs.c
-rw-r--r-- 1 root root 53781 12. Jan 02:58 i915_dma.c
-rw-r--r-- 1 root root 29443 4. Nov 00:41 i915_drv.c
-rw-r--r-- 1 root root 73508 12. Jan 02:58 i915_drv.h
-rw-r--r-- 1 root root 126809 4. Nov 00:41 i915_gem.c
-rw-r--r-- 1 root root 18100 4. Nov 00:41 i915_gem_context.c
-rw-r--r-- 1 root root 3630 4. Nov 00:41 i915_gem_debug.c
-rw-r--r-- 1 root root 7890 4. Nov 00:41 i915_gem_dmabuf.c
-rw-r--r-- 1 root root 5910 4. Nov 00:41 i915_gem_evict.c
-rw-r--r-- 1 root root 35095 12. Jan 02:58 i915_gem_execbuffer.c
-rw-r--r-- 1 root root 28044 4. Nov 00:41 i915_gem_gtt.c
-rw-r--r-- 1 root root 12138 4. Nov 00:41 i915_gem_stolen.c
-rw-r--r-- 1 root root 16595 4. Nov 00:41 i915_gem_tiling.c
-rw-r--r-- 1 root root 27403 4. Nov 00:41 i915_gpu_error.c
-rw-r--r-- 1 root root 7199 12. Jan 02:58 i915_ioc32.c
-rw-r--r-- 1 root root 96219 12. Jan 02:58 i915_irq.c
-rw-r--r-- 1 root root 195935 4. Nov 00:41 i915_reg.h
-rw-r--r-- 1 root root 14826 4. Nov 00:41 i915_suspend.c
-rw-r--r-- 1 root root 15686 4. Nov 00:41 i915_sysfs.c
-rw-r--r-- 1 root root 11508 4. Nov 00:41 i915_trace.h
-rw-r--r-- 1 root root 210 4. Nov 00:41 i915_trace_points.c
-rw-r--r-- 1 root root 18994 4. Nov 00:41 i915_ums.c
-rw-r--r-- 1 root root 5936 4. Nov 00:41 intel_acpi.c
-rw-r--r-- 1 root root 21937 4. Nov 00:41 intel_bios.c
-rw-r--r-- 1 root root 17395 4. Nov 00:41 intel_bios.h
-rw-r--r-- 1 root root 23203 4. Nov 00:41 intel_crt.c
-rw-r--r-- 1 root root 38920 12. Jan 02:58 intel_ddi.c
-rw-r--r-- 1 root root 301853 12. Jan 02:58 intel_display.c
-rw-r--r-- 1 root root 103135 4. Nov 00:41 intel_dp.c
-rw-r--r-- 1 root root 28485 4. Nov 00:41 intel_drv.h
-rw-r--r-- 1 root root 16275 12. Jan 02:58 intel_dvo.c
-rw-r--r-- 1 root root 8287 4. Nov 00:41 intel_fb.c
-rw-r--r-- 1 root root 37885 4. Nov 00:41 intel_hdmi.c
-rw-r--r-- 1 root root 16399 4. Nov 00:41 intel_i2c.c
-rw-r--r-- 1 root root 32949 4. Nov 00:41 intel_lvds.c
-rw-r--r-- 1 root root 3776 4. Nov 00:41 intel_modes.c
-rw-r--r-- 1 root root 14393 4. Nov 00:41 intel_opregion.c
-rw-r--r-- 1 root root 40169 4. Nov 00:41 intel_overlay.c
-rw-r--r-- 1 root root 20945 4. Nov 00:41 intel_panel.c
-rw-r--r-- 1 root root 160868 4. Nov 00:41 intel_pm.c
-rw-r--r-- 1 root root 54362 4. Nov 00:41 intel_ringbuffer.c
-rw-r--r-- 1 root root 8031 4. Nov 00:41 intel_ringbuffer.h
-rw-r--r-- 1 root root 91750 4. Nov 00:41 intel_sdvo.c
-rw-r--r-- 1 root root 23907 4. Nov 00:41 intel_sdvo_regs.h
-rw-r--r-- 1 root root 5100 4. Nov 00:41 intel_sideband.c
-rw-r--r-- 1 root root 30453 4. Nov 00:41 intel_sprite.c
-rw-r--r-- 1 root root 49115 4. Nov 00:41 intel_tv.c
-rw-r--r-- 1 root root 18913 12. Jan 02:58 intel_uncore.c

Your kernel sources are a bit too old; we've renamed intel_fb.c to intel_fbdev.c. That is why I didn't just ask you to test with the commit reverted (that would have conflicted) but provided a patch instead. I guess you need to move back to the drm-intel-nightly branch after the bisect you've done?

I'm sorry to disappoint you, but I'm no longer able to assist with the troubleshooting. A family member needs a "new" PC very soon and will get my system with a different processor (Pentium G620). The Core i7 will be sold soon. I have already switched to a Haswell-based low-end system (Celeron G1820 + B85 board), which has no VT-d support and doesn't have the issue described in this bug (= no intel_iommu=off needed). You can close the bug if you want.

Thanks for the report, presumed fixed by

commit 0f4706d2740f2a221cd502922b22e522009041d9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 18 14:50:50 2014 +0200

    drm/i915: Disable stolen memory when DMAR is active
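The IOMMU identity map logged for the graphics device (0000:00:02.0) in this thread covers an inclusive address range, and a quick calculation shows it is 2 MiB in size. Whether that matches the full stolen region depends on the stolen size reserved by the BIOS, which is not recorded in this thread. A minimal POSIX shell sketch, assuming hex bounds and an inclusive end address; range_size is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: size of the inclusive address range in an
# "IOMMU: Setting identity map for device ..." dmesg line.
# Assumes hex bounds and an inclusive end address.

# Print the size in bytes of the inclusive range [$1, $2].
range_size() {
    start=$(( $1 ))
    end=$(( $2 ))
    echo $(( end - start + 1 ))
}

line='[ 0.464905] IOMMU: Setting identity map for device 0000:00:02.0 [0xdf800000 - 0xdf9fffff]'

# Extract the two hex bounds from the trailing "[start - end]".
bounds=$(printf '%s\n' "$line" |
    sed -n 's/.*\[\(0x[0-9a-f]*\) - \(0x[0-9a-f]*\)\]$/\1 \2/p')

size=$(range_size $bounds)   # unquoted on purpose: passes two arguments
printf 'size: %d bytes (0x%x) = %d MiB\n' "$size" "$size" $(( size / 1048576 ))
```

Running it prints `size: 2097152 bytes (0x200000) = 2 MiB`.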