Bug 68535

Summary: [IVB regression] Screen corruption due to DMAR + stolen
Product: DRI
Reporter: Steven <sourcepower>
Component: DRM/Intel
Assignee: Damien Lespiau <damien.lespiau>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: sourcepower
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
photo of screen corruption | none
Kernel 3.8 Config | none
Kernel 3.10 Config | none
dmesg from Kernel 3.8 | none
Kernel 3.8 dmesg with drm.debug=0xe enabled | none
bisect log 1 | none
bisect log 2 | none
dmesg from broken kernel with IOMMU off/disabled | none
don't use stolen for fbcon | none

Description Steven 2013-08-25 15:28:05 UTC
- Since kernel 3.9 (and continuing with 3.10.x) I get heavy screen corruption during boot when KMS becomes active.
- Intel Core i7 3770T on an Asrock H77M ITX, connected via an HDMI (board) to DVI (monitor, Eizo S1931) cable.
- Everything is fine with kernel 3.8.12 and below, but kernels 3.9.x and 3.10.x are affected.
- Unfortunately I cannot get any error message.
- Running Gentoo hardened stable; the kernel config is the same between 3.8, 3.9 and 3.10.

box ~ # emerge --info
Portage 2.1.12.2 (hardened/linux/amd64, gcc-4.6.3, glibc-2.15-r3, 3.8.12-hardened-v1 x86_64)
=================================================================
System uname: Linux-3.8.12-hardened-v1-x86_64-Intel-R-_Core-TM-_i7-3770T_CPU_@_2.50GHz-with-gentoo-2.2
KiB Mem:     7089592 total,   5367524 free
KiB Swap:    1048572 total,   1048572 free
Timestamp of tree: Sun, 25 Aug 2013 00:45:01 +0000
ld GNU ld (GNU Binutils) 2.23.1
distcc 3.1 x86_64-pc-linux-gnu [disabled]
app-shells/bash:          4.2_p45
dev-java/java-config:     2.1.12-r1
dev-lang/python:          2.7.5-r2, 3.2.5-r2
dev-util/cmake:           2.8.10.2-r2
dev-util/pkgconfig:       0.28
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.11.6, 1.12.6
sys-devel/binutils:       2.23.1
sys-devel/gcc:            4.6.3
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4-r1
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.7 (virtual/os-headers)
sys-libs/glibc:           2.15-r3
Repositories: gentoo poncho x-portage
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=corei7-avx -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm -mavx -msse4.2 -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=8192 -mtune=generic -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=corei7-avx -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm -mavx -msse4.2 -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=8192 -mtune=generic -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps=y"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch webrsync-gpg xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="de_DE.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/poncho /usr/local/portage"
SYNC=""
USE="X aac acl acpi alsa amd64 apng avx avx256 bash-completion berkdb branding bzip2 cairo cdr clamav clamdtop cli consolekit cracklib crypt css curl cxx dbus device-mapper dhcpcd dri dts dvd dvdr encode exif expat extras fftw flac fontconfig ftp gdbm gdu gif git gles2 gnutls gphoto2 gsm gtk gtkhtml gudev gzip hardened hwdb iconv icu idn ieee1394 imagemagick imap imlib ipc ipv6 iscsi java javascript jpeg jpeg2k justify kvm lame libkms libnotify llvm lm_sensors lock lvm lzma lzo mad mem-scramble memlimit mime minizip mmap mmx mng modules mozilla mp3 mp4 mpeg mplayer mudflap multilib natspec ncurses networkmanager nfs nls nptl nsplugin ogg opengl openmp opus pam pax_kernel pcre pdf pdfimport perl pkcs11 png policykit posix postscript ppp python qemu quicktime rar raw rdesktop readline rpc sdl session sharedmem smartcard smp sna sockets socks5 sound speex sqlite sse sse2 sse3 sse4 sse4_1 ssl ssse3 startup-notification subversion svg syslog-ng szip tcpd theora threads thunar tiff tls truetype udev unicode urandom usb vaapi virt-network virtfs vnc vorbis webm wmf x264 xattr xft xml xorg xv xvid zlib" ABI_X86="32 64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" 
GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="evdev roccat_koneplus roccat_konextd" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="de en" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" QEMU_SOFTMMU_TARGETS="i386 x86_64" QEMU_USER_TARGETS="i386 x86_64" RUBY_TARGETS="ruby19 ruby18" USERLAND="GNU" VIDEO_CARDS="intel" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Comment 1 Steven 2013-08-25 15:29:24 UTC
Created attachment 84598 [details]
photo of screen corruption
Comment 2 Steven 2013-08-25 15:30:15 UTC
Created attachment 84599 [details]
Kernel 3.8 Config
Comment 3 Steven 2013-08-25 15:30:43 UTC
Created attachment 84600 [details]
Kernel 3.10 Config
Comment 4 Steven 2013-08-25 15:31:54 UTC
Created attachment 84601 [details]
dmesg from Kernel 3.8
Comment 5 Chris Wilson 2013-08-25 15:49:47 UTC
Our HDMI code did receive a number of tweaks in that timeframe. Would it be possible for you to bisect?
Comment 6 Daniel Vetter 2013-08-25 15:57:25 UTC
Besides the bisect (which I think is really what we need to get going with this one here) can you please also attach a dmesg with drm.debug=0xe added to your kernel cmdline? dmesg from a recent kernel preferred.
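For readers unfamiliar with the drm.debug bitmask, here is a minimal sketch of how 0xe decodes. It assumes the debug category bits used by the DRM core in kernels of this era (core = 0x1, driver = 0x2, kms = 0x4, prime = 0x8); the helper name decode_drm_debug is made up for illustration.

```python
# Assumed bit values of the DRM debug categories in kernels of this era
# (DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS, DRM_UT_PRIME).
DRM_DEBUG_BITS = {
    0x1: "core",
    0x2: "driver",
    0x4: "kms",
    0x8: "prime",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by the mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


# 0xe = 0x2 | 0x4 | 0x8, i.e. everything except the noisy core messages.
print(decode_drm_debug(0xE))  # ['driver', 'kms', 'prime']
```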
Comment 7 Steven 2013-08-25 17:24:14 UTC
Created attachment 84607 [details]
Kernel 3.8 dmesg with drm.debug=0xe enabled
Comment 8 Steven 2013-08-25 17:29:54 UTC
OK, I tried kernel 3.11-rc6 but the issue still persists. Only the kernel 3.8.12 dmesg with drm.debug=0xe enabled is available, so I attached that.

I will try to bisect the issue, but I haven't done this before so I think it will take some days. I will use https://wiki.gentoo.org/wiki/Kernel_git-bisect as a HOWTO because I'm running Gentoo.

Thanks for your quick response.
Comment 9 Steven 2013-08-26 19:50:15 UTC
Short question. I'm currently running the bisect and have roughly 4 steps left. I now have a build which "solves" the issue of this bug but adds a new issue when starting the X server.

So my question is: should I proceed with "git bisect good" because KMS works with this build, or with "git bisect bad" because the X server doesn't start (= system freeze) with this build?
Comment 10 Daniel Vetter 2013-08-27 07:45:12 UTC
(In reply to comment #9)
> Short question. I'm running the bisecting process currently. I'm down to
> roughly 4 turns to do. Now i have a build which "solves" the issue of this
> bug but adds a new issue while starting the X server.
> 
> So my question is should i proceed with "git bisect good" because the KMS is
> working with this bisect build or should  i proceed with "git bisect bad"
> because the X server doesn't start (= system freeze) with this bisect build?

If you can't properly test a commit (which is a bit the case here since it's unclear) you can

$ git bisect skip

If we're lucky, the real issue is somewhere else in the history. If not, we can analyze things precisely later on. Skipping simply keeps us from steering the bisect into a dead end if we make the wrong call.
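The rule of thumb above can be written down as a tiny decision helper. This is only an illustrative sketch: the function name bisect_verdict and its two boolean inputs are hypothetical, not part of git.

```python
def bisect_verdict(kms_ok, x_ok):
    """Map what was observed on a test boot to a `git bisect` subcommand.

    kms_ok: no screen corruption when KMS takes over (the bug being bisected).
    x_ok:   the X server starts (an unrelated, second symptom).
    """
    if not kms_ok:
        # The bug under investigation is clearly present.
        return "bad"
    if not x_ok:
        # KMS looks fine but the build is broken in another way, so the
        # result is ambiguous; skipping avoids misleading the bisect.
        return "skip"
    return "good"


for kms_ok, x_ok in [(False, False), (True, False), (True, True)]:
    print(kms_ok, x_ok, "-> git bisect", bisect_verdict(kms_ok, x_ok))
```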
Comment 11 Steven 2013-08-27 18:17:44 UTC
Created attachment 84736 [details]
bisect log 1
Comment 12 Steven 2013-08-27 18:18:27 UTC
Created attachment 84737 [details]
bisect log 2
Comment 13 Steven 2013-08-27 18:23:49 UTC
So I completed the bisect run. I ignored the "new" issue where starting X results in a black screen (only X is affected, the system doesn't freeze; I was able to power off the system via a short press on the power button). I continued with "bisect good" whenever KMS worked (= no screen corruption when KMS becomes active) and "bisect bad" whenever KMS didn't work (= heavy screen corruption when KMS becomes active). According to the bisect, this is the "root cause" commit.

0ffb0ff283cca16f72caf29c44496d83b0c291fb is the first bad commit
commit 0ffb0ff283cca16f72caf29c44496d83b0c291fb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 15 11:32:27 2012 +0000

    drm/i915: Allocate fbcon from stolen memory
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Acked-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 89a06752fcaf1931912f666da05789b81b0b00ae 305aff01b7c56dde5b7db3c2b30c5de5bfb6cc45 M	 drivers
Comment 14 Chris Wilson 2013-08-27 18:28:38 UTC
Only during KMS takeover, or is the corruption permanent?

If it is just the takeover, it is the issue of leaving the outputs running with the BIOS setup as we clobber the state.
Comment 15 Steven 2013-08-27 19:16:38 UTC
- This is a permanent corruption. It seems the system froze or something like that; I can only "recover" the system by pressing the power button for >4 seconds, pressing the reset button, or pulling the power cable.
- The system doesn't recognize any keyboard input (keyboard connected via USB).
- Unplugging and replugging the monitor cable doesn't help; the corruption is still there.
- I can reproduce the issue if I connect the system via an HDMI to HDMI cable to my Samsung LCD TV.
- This is a hardened Gentoo system with software hard disk encryption (dm-crypt), and the corruption happens "just" before I need to enter the password to decrypt/unlock the "/" filesystem and continue the boot process.
- Unfortunately I don't have a serial console, and the board doesn't offer anything like one, so I don't know how to get any information from the console.
- I don't know if the issue persists with a DVI to DVI cable between the board and the monitor, because I don't have such a cable at the moment, but I can buy one if needed for troubleshooting.
Comment 16 Steven 2013-08-27 19:24:45 UTC
- Not sure if this is important, but I'm using the BIOS mode of the board, not the UEFI mode.
Comment 17 Daniel Vetter 2013-08-28 09:13:09 UTC
Can you please test a broken kernel with disabled IOMMU? Just add intel_iommu=off to the kernel cmdline (and please add a new dmesg for that case).
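To confirm that the option actually reached the kernel, the booted command line can be inspected (on Linux it is exposed in /proc/cmdline). Below is a minimal sketch of such a check; the function name and the sample command line are made up for illustration, and treating a later on/off as overriding an earlier one is an assumption about how the kernel handles repeated options.

```python
def intel_iommu_disabled(cmdline):
    """Return True if intel_iommu=off is set, False if intel_iommu=on is set,
    and None if the command line does not say either way (the default then
    depends on the kernel configuration)."""
    disabled = None
    for token in cmdline.split():
        if not token.startswith("intel_iommu="):
            continue
        # The option takes a comma-separated list of values.
        for opt in token.split("=", 1)[1].split(","):
            if opt == "off":
                disabled = True
            elif opt == "on":
                disabled = False
    return disabled


# Hypothetical command line; on a real system read it from /proc/cmdline.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 drm.debug=0xe intel_iommu=off"
print(intel_iommu_disabled(sample))  # True
```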
Comment 18 Steven 2013-08-28 10:59:20 UTC
Created attachment 84784 [details]
dmesg from broken kernel with IOMMU off/disabled
Comment 19 Steven 2013-08-28 11:04:53 UTC
- Tested the broken kernel (the last kernel from the bisect process, with only the one commit) with "intel_iommu=off" on the kernel command line; dmesg attached.
- KMS works now, X starts and the keyboard (USB) works, but the mouse (USB) doesn't work in X.
Comment 20 Daniel Vetter 2013-08-28 11:17:12 UTC
Somewhat unsurprisingly DMAR/IOMMU is the culprit again ...

No idea what exactly goes wrong, but if the IOMMU is supposed to set up an identity map for the stolen range for the gfx device, then that doesn't work out:

[    0.464905] IOMMU: Setting identity map for device 0000:00:02.0 [0xdf800000 - 0xdf9fffff]
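As a small worked example, the size of the identity-mapped window can be computed from the dmesg line above. This sketch only does the arithmetic on the logged range; whether that window matches the stolen region the driver actually uses is not something this line alone can tell. The helper name identity_map_range is hypothetical.

```python
import re

LINE = ("[    0.464905] IOMMU: Setting identity map for device "
        "0000:00:02.0 [0xdf800000 - 0xdf9fffff]")


def identity_map_range(line):
    """Extract (start, end, size_in_bytes) from an identity-map dmesg line,
    or return None if the line contains no '[0x... - 0x...]' range."""
    m = re.search(r"\[(0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+)\]", line)
    if m is None:
        return None
    start, end = int(m.group(1), 16), int(m.group(2), 16)
    # The logged range is inclusive on both ends.
    return start, end, end - start + 1


start, end, size = identity_map_range(LINE)
print(hex(start), hex(end), size // (1024 * 1024), "MiB")
# 0xdf800000 0xdf9fffff 2 MiB
```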
Comment 21 Chris Wilson 2014-01-09 13:38:09 UTC
Probably relevant: https://patchwork.kernel.org/patch/3457401/
Comment 22 Daniel Vetter 2014-01-09 15:28:53 UTC
Steven, please test the patch Chris linked to.
Comment 23 Steven 2014-01-10 11:45:35 UTC
The patch didn't help. I still get the heavy screen corruption. Tested on kernel 3.12.6
Comment 24 Daniel Vetter 2014-01-14 14:03:04 UTC
Created attachment 92041 [details] [review]
don't use stolen for fbcon

Just to double-check your bisect again can you pls test this commit to make sure the breakage is fixed. Ofc you need to remove any hacks like intel_iommu=off first.

/me had high hopes for the sg->offset patch :(
Comment 25 Steven 2014-01-14 16:57:06 UTC
I cannot find the intel_fbdev.c file.

box i915 # pwd
/usr/src/linux-3.12.6-hardened-r4/drivers/gpu/drm/i915

box i915 # ll
total 2160
-rw-r--r-- 1 root root   1197  4. Nov 00:41 Makefile
-rw-r--r-- 1 root root   4666  4. Nov 00:41 dvo.h
-rw-r--r-- 1 root root  12684  4. Nov 00:41 dvo_ch7017.c
-rw-r--r-- 1 root root   8361  4. Nov 00:41 dvo_ch7xxx.c
-rw-r--r-- 1 root root  10242  4. Nov 00:41 dvo_ivch.c
-rw-r--r-- 1 root root  15992  4. Nov 00:41 dvo_ns2501.c
-rw-r--r-- 1 root root   6726  4. Nov 00:41 dvo_sil164.c
-rw-r--r-- 1 root root   8160  4. Nov 00:41 dvo_tfp410.c
-rw-r--r-- 1 root root  62935 12. Jan 02:58 i915_debugfs.c
-rw-r--r-- 1 root root  53781 12. Jan 02:58 i915_dma.c
-rw-r--r-- 1 root root  29443  4. Nov 00:41 i915_drv.c
-rw-r--r-- 1 root root  73508 12. Jan 02:58 i915_drv.h
-rw-r--r-- 1 root root 126809  4. Nov 00:41 i915_gem.c
-rw-r--r-- 1 root root  18100  4. Nov 00:41 i915_gem_context.c
-rw-r--r-- 1 root root   3630  4. Nov 00:41 i915_gem_debug.c
-rw-r--r-- 1 root root   7890  4. Nov 00:41 i915_gem_dmabuf.c
-rw-r--r-- 1 root root   5910  4. Nov 00:41 i915_gem_evict.c
-rw-r--r-- 1 root root  35095 12. Jan 02:58 i915_gem_execbuffer.c
-rw-r--r-- 1 root root  28044  4. Nov 00:41 i915_gem_gtt.c
-rw-r--r-- 1 root root  12138  4. Nov 00:41 i915_gem_stolen.c
-rw-r--r-- 1 root root  16595  4. Nov 00:41 i915_gem_tiling.c
-rw-r--r-- 1 root root  27403  4. Nov 00:41 i915_gpu_error.c
-rw-r--r-- 1 root root   7199 12. Jan 02:58 i915_ioc32.c
-rw-r--r-- 1 root root  96219 12. Jan 02:58 i915_irq.c
-rw-r--r-- 1 root root 195935  4. Nov 00:41 i915_reg.h
-rw-r--r-- 1 root root  14826  4. Nov 00:41 i915_suspend.c
-rw-r--r-- 1 root root  15686  4. Nov 00:41 i915_sysfs.c
-rw-r--r-- 1 root root  11508  4. Nov 00:41 i915_trace.h
-rw-r--r-- 1 root root    210  4. Nov 00:41 i915_trace_points.c
-rw-r--r-- 1 root root  18994  4. Nov 00:41 i915_ums.c
-rw-r--r-- 1 root root   5936  4. Nov 00:41 intel_acpi.c
-rw-r--r-- 1 root root  21937  4. Nov 00:41 intel_bios.c
-rw-r--r-- 1 root root  17395  4. Nov 00:41 intel_bios.h
-rw-r--r-- 1 root root  23203  4. Nov 00:41 intel_crt.c
-rw-r--r-- 1 root root  38920 12. Jan 02:58 intel_ddi.c
-rw-r--r-- 1 root root 301853 12. Jan 02:58 intel_display.c
-rw-r--r-- 1 root root 103135  4. Nov 00:41 intel_dp.c
-rw-r--r-- 1 root root  28485  4. Nov 00:41 intel_drv.h
-rw-r--r-- 1 root root  16275 12. Jan 02:58 intel_dvo.c
-rw-r--r-- 1 root root   8287  4. Nov 00:41 intel_fb.c
-rw-r--r-- 1 root root  37885  4. Nov 00:41 intel_hdmi.c
-rw-r--r-- 1 root root  16399  4. Nov 00:41 intel_i2c.c
-rw-r--r-- 1 root root  32949  4. Nov 00:41 intel_lvds.c
-rw-r--r-- 1 root root   3776  4. Nov 00:41 intel_modes.c
-rw-r--r-- 1 root root  14393  4. Nov 00:41 intel_opregion.c
-rw-r--r-- 1 root root  40169  4. Nov 00:41 intel_overlay.c
-rw-r--r-- 1 root root  20945  4. Nov 00:41 intel_panel.c
-rw-r--r-- 1 root root 160868  4. Nov 00:41 intel_pm.c
-rw-r--r-- 1 root root  54362  4. Nov 00:41 intel_ringbuffer.c
-rw-r--r-- 1 root root   8031  4. Nov 00:41 intel_ringbuffer.h
-rw-r--r-- 1 root root  91750  4. Nov 00:41 intel_sdvo.c
-rw-r--r-- 1 root root  23907  4. Nov 00:41 intel_sdvo_regs.h
-rw-r--r-- 1 root root   5100  4. Nov 00:41 intel_sideband.c
-rw-r--r-- 1 root root  30453  4. Nov 00:41 intel_sprite.c
-rw-r--r-- 1 root root  49115  4. Nov 00:41 intel_tv.c
-rw-r--r-- 1 root root  18913 12. Jan 02:58 intel_uncore.c
Comment 26 Daniel Vetter 2014-01-14 20:28:04 UTC
Your kernel sources are a bit too old, we've renamed it from intel_fb.c to intel_fbdev.c. Which is why I didn't just ask you to test the reverted patch (since that would have conflicted) but instead provided the patch. I guess you need to move back to the drm-intel-nightly branch after the bisect you've done?
Comment 27 Steven 2014-01-17 20:24:52 UTC
I'm sorry to disappoint you, but I'm no longer able to assist with the troubleshooting process.
A family member needs a "new" PC very soon and he will get my system with another processor (Pentium G620). The Core i7 will be sold soon. I have already switched to a Haswell-based low-end system (Celeron G1820 + B85 board), which has no VT-d support and doesn't have the issue described in this bug (= no intel_iommu=off needed).

You can close the bug if you want.
Comment 28 Jani Nikula 2014-03-19 09:02:30 UTC
Thanks for the report, presumed fixed by 
commit 0f4706d2740f2a221cd502922b22e522009041d9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 18 14:50:50 2014 +0200

    drm/i915: Disable stolen memory when DMAR is active
