Summary: | [SNB GT2] enabling RC6 causes sudden shutdowns (semaphores=0, intel_iommu=off) | ||
---|---|---|---|
Product: | DRI | Reporter: | Jan Urbański <wulczer> |
Component: | DRM/Intel | Assignee: | Eugeni Dodonov <eugeni> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | abalmos, ben, chris, daniel, fenio, florian, geoff.oxholm, jbarnes, klemmster, leann.ogasawara, robert, techfreak, thcourbon |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
See Also: | https://launchpad.net/bugs/937378 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Jan Urbański
2012-01-17 08:38:43 UTC
Created attachment 55683 [details]
kernel log from the session that ended in a shutdown
Eugeni, I suspect an elephant in the room. Can you add this to the list of issues you are tracking and see if anything pops up internally?

I got another shutdown, this time without any visible glitches. The logs look the same, but if you want I can attach them. I'll stop running with RC6 for now; I'd be happy to try setting options or trying patches if you have them.

Could someone confirm if those issues affect UX21e as well?

It should be almost the same except for a different SSD and an i5 processor instead of i7.

@Jan - is it the kernel log from the machine which died, or gathered with netconsole? If it is without netconsole, could you set it up and see if the kernel says something before dying?

(If it is with netconsole, I am out of clues at the moment. Looks like the machine just.. well.. dies :)).

(In reply to comment #5)
> @Jan - is it the kernel log from the machine which died, or gathered with
> netconsole? If it is without netconsole, could you setup it and see if kernel
> says something before dying?

It's the log from the machine that died. I'll set up netconsole logging and wait for the shutdown.

Oops, just realised there's a problem with using netconsole on this laptop :( It doesn't have an ethernet port, and neither the wireless module (ath9k) nor the USB-Ethernet dongle that comes with the machine (asix) supports netpoll... And of course there's no serial interface. Does anyone have an idea of how to get the kernel logs to another machine?

I get the same shutdowns on my UX31E with the Intel Core i5-2557M using:
* Ubuntu 11.10
* kernel 3.2.1 X86_64 http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.1-precise/
* VTd disabled in BIOS ASUS ver.210
* quiet i915.i915_enable_rc6=1 i915.semaphores=1 intel_iommu=off drm.debug=0x06 snd_hda_intel.power_save=10
When on battery it appears to shut down more often than when on AC. I can attach logs to this bug report should you want them, or I can open a separate ticket.
(In reply to comment #4)
> Could someone confirm if those issues affect UX21e as well?
>
> it should be almost the same except for different SSD and i5 processor instead
> of i7.

I don't know how related it is, but when I tried to use JDownloader (a highly multithreaded app), shutdowns occurred almost every 10 minutes. Maybe it'll help someone debug it faster ;) I've got a UX31e with an i5 processor.

(In reply to comment #4)
> Could someone confirm if those issues affect UX21e as well?
> .
> it should be almost the same except for different SSD and i5 processor instead
> of i7.

I have a UX21e71 - to be clear, this model also has an i7. I can confirm this bug for the UX21. I would be happy to provide whatever debug information is useful.
Ubuntu 12.04
kernel: 3.2.0-11-generic

Could you please install intel-gpu-tools from git (http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/) and attach the results of 'intel_reg_dumper | grep RC6' on the machines affected by this issue?

Without RC6 enabled on the kernel command line (this is how I use it on a day-to-day basis because of the sudden shutdowns):
[root@zenbook ~]# intel_reg_dumper | grep RC6
RC6_RESIDENCY_TIME: 0x007aa287
RC6p_RESIDENCY_TIME: 0x00000000
RC6pp_RESIDENCY_TIME: 0x00000000
[root@zenbook ~]# cat /proc/cmdline
root=/dev/sda5 ro logo.nologo quiet elevator=noop initcall_debug printk.time=y init=/bin/systemd
[root@zenbook ~]#

And after enabling RC6:
[root@zenbook ~]# intel_reg_dumper | grep RC6
RC6_RESIDENCY_TIME: 0x00f6dc4b
RC6p_RESIDENCY_TIME: 0x05ab68a1
RC6pp_RESIDENCY_TIME: 0x00000000
[root@zenbook ~]# cat /proc/cmdline
root=/dev/sda5 ro logo.nologo quiet elevator=noop initcall_debug printk.time=y init=/bin/systemd i915.i915_enable_rc6=1 i915.semaphores=1 i915.i915_enable_fbc=1 i915.lvds_downclock=1 i915.modeset=1
[root@zenbook ~]#

Created attachment 56336 [details]
intel_reg_dumper Ubuntu 12.04 Kernel 3.2.0.12 Generic
I get no results from intel_reg_dumper | grep RC6
root@blade:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.2.0-12-generic root=UUID=b808e6bf-c9dc-4ef3-9534-cdd31e982778 ro i915.i915_enable_rc6=1 i915.semaphores=1 snd_hda_intel.power_save=10 intel_iommu=off quiet splash vt.handoff=7
System: Host: blade Kernel: 3.2.0-12-generic x86_64 (64 bit) Desktop: Gnome Distro: Ubuntu 12.04 precise
Machine: System: ASUSTeK (portable) product: UX31E version: 1.0 Mobo: ASUSTeK model: UX31E version: 1.0 Bios: American Megatrends version: UX31E.210 date: 12/26/2011
CPU: Dual core Intel Core i5-2557M CPU (-HT-MCP-) cache: 3072 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) Clock Speeds: 1: 800.00 MHz 2: 800.00 MHz 3: 800.00 MHz 4: 800.00 MHz
Graphics: Card: Intel 2nd Generation Core Processor Family Integrated Graphics Controller X.Org: 1.11.3 drivers: intel (unloaded: vesa,fbdev) Resolution: 1600x900@60.0hz GLX Renderer: Mesa DRI Intel Sandybridge Mobile GLX Version: 2.1 Mesa 7.11
Audio: Card: Intel 6 Series/C200 Series Chipset Family High Definition Audio Controller driver: snd_hda_intel Sound: Advanced Linux Sound Architecture ver: 1.0.24
Network: Card: Atheros AR9485 Wireless Network Adapter driver: ath9k IF: wlan0 state:
Drives: HDD Total Size: 128.0GB (6.7% used) 1: /dev/sda SanDisk_SSD_U100 128.0GB
Partition: ID: / size: 112G used: 8.0G (8%) fs: ext4
Sensors: System Temperatures: cpu: 49.0C mobo: N/A Fan Speeds (in rpm): cpu: N/A
Info: Processes: 189 Uptime: 0 min Memory: 627.3/3856.6MB Client: Shell inxi: 1.7.28

Without RC6:
RC6_RESIDENCY_TIME: 0x00eb5b28
RC6p_RESIDENCY_TIME: 0x00000000
RC6pp_RESIDENCY_TIME: 0x00000000
BOOT_IMAGE=/boot/vmlinuz-3.3.0-rc1-wulczer root=UUID=9c9dfe9d-32b3-4bac-aff6-d0ca77337a47 ro pcie_aspm=force acpi_osi=Linux acpi_os_name=Linux intel_iommu=off quiet snd_hda_intel.power_save=10 ipv6.disable=1

With RC6 (freshly after rebooting):
RC6_RESIDENCY_TIME: 0x00643a8f
RC6p_RESIDENCY_TIME: 0x012a78bf
RC6pp_RESIDENCY_TIME: 0x00000000
BOOT_IMAGE=/boot/vmlinuz-3.3.0-rc1-wulczer root=UUID=9c9dfe9d-32b3-4bac-aff6-d0ca77337a47 ro quiet pcie_aspm=force acpi_osi=Linux acpi_os_name=Linux i915.i915_enable_rc6=1 i915.semaphores=1 intel_iommu=off snd_hda_intel.power_save=10 ipv6.disable=1

A possible fix for the shutdowns:
http://files.benesovi.eu/ux31e/

I've now been running for a full day with the DSDT from http://files.benesovi.eu/ux31e/, RC6 and semaphores, and no shutdowns. For the record, the values from the register dumper:
RC6_RESIDENCY_TIME: 0x07087233
RC6p_RESIDENCY_TIME: 0x0ce7cf41
RC6pp_RESIDENCY_TIME: 0x00000000

Thanks! Does this mean that this bug should actually be open in the Linux-ACPI project tracker?

Haha, I think I spoke too soon! 10 minutes after writing that comment I had a shutdown :) Rebooted with intel_iommu=off, let's see if it helps...

Just for the record, with the latest DSDT and intel_iommu=off, does the issue still happen?

Call it fate... I was just writing a comment about how it did happen to me... when it happened again. To summarize: using a 3.2.1 kernel compiled with the .config from the linked page (with some modifications, but still leaving IOMMU entirely out), using the fixed DSDT, booting with intel_iommu=off, rc6=1 and semaphores=1, with VT-d disabled in the BIOS... the laptop died on me :(

I can confirm Jan's experiences on a 3.3-rc2 kernel.

Created attachment 56661 [details]
a log of 1s snapshots of RC6 registers
I wrote a little script that every second writes the values from intel_reg_dumper into a file and fsyncs it, in the hope of getting a view of the situation just before the shutdown. Attached is this log (it includes the register values and the difference from the previous measurement, as I wanted to see if they changed more quickly depending on what I'm doing). This was with a kernel.org 3.2.1 kernel, CONFIG_INTEL_IOMMU disabled in .config.
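The script itself was not posted in this report. As a rough illustration only, a once-per-second logger of the kind described could look like the Python sketch below. It assumes that `intel_reg_dumper` prints lines in the `NAME: 0xVALUE` form pasted earlier in this thread; the function names and the log file name are hypothetical.

```python
import os
import re
import subprocess
import time

# Matches fields such as "RC6p_RESIDENCY_TIME: 0x05ab68a1", the form pasted in
# this thread. The exact intel_reg_dumper output layout is an assumption.
RC6_FIELD = re.compile(r"(RC6p{0,2}_RESIDENCY_TIME):\s*(0x[0-9a-fA-F]+)")


def parse_rc6(dump_text):
    """Return {register name: integer value} for every RC6 residency field."""
    return {name: int(value, 16) for name, value in RC6_FIELD.findall(dump_text)}


def format_snapshot(current, previous):
    """Render one log line: each value plus its change since the last sample."""
    parts = []
    for name in sorted(current):
        # A register seen for the first time is reported with a delta of 0.
        delta = current[name] - previous.get(name, current[name])
        parts.append(f"{name}=0x{current[name]:08x} ({delta:+d})")
    return " ".join(parts)


def log_forever(path, interval=1.0):
    """Append one snapshot per interval and force it to disk each time."""
    previous = {}
    with open(path, "a") as log:
        while True:
            result = subprocess.run(
                ["intel_reg_dumper"], capture_output=True, text=True, check=False
            )
            current = parse_rc6(result.stdout)
            log.write(f"{time.time():.0f} {format_snapshot(current, previous)}\n")
            # flush() empties Python's buffer; fsync() asks the kernel to write
            # the data out, so the last sample survives a sudden power-off.
            log.flush()
            os.fsync(log.fileno())
            previous = current
            time.sleep(interval)
```

Running `log_forever("rc6-residency.log")` as root would produce a log similar in spirit to the attachment. The fsync after every line is the important part, since the machine gives no warning before it powers off.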
Kernel command line: pcie_aspm=force i915.powersave=1 i915.i915_enable_rc6=1 i915.semaphores=1 snd_hda_intel.power_save=10 ipv6.disable=1

By the way, the guy that fixed the DSDT said he uses the following versions of X libraries (it's Gentoo though, so even things like the compiler version used could influence the way it works):
* x11-drivers/xf86-video-intel-2.17.0-r3 with sna enabled (sandybridge new acceleration)
* x11-base/xorg-server-1.11.2-r2
* media-libs/mesa-7.11.2 with egl, gallium, llvm, nptl and shared-glapi USE flags
* x11-libs/libdrm-2.4.27
and a self-compiled, monolithic kernel (no initrd or loadable modules, the .config file is http://files.benesovi.eu/ux31e/.config). According to him, he hasn't had a shutdown since fixing the DSDT, but several Debian and Ubuntu users did. It leaves me wondering what the difference could be.

Possible solutions to the hard shutdowns.
http://dodonov.net/blog/2012/02/09/time-for-some-news-2/
http://lists.freedesktop.org/archives/intel-gfx/2012-February/015004.html

A datapoint: I haven't had a shutdown in five days. The configuration that made the shutdowns go away is:
* UX31E
* BIOS 210
* VT-d disabled in BIOS
* self-compiled 3.2.5 kernel from a kernel.org tarball
* a custom kernel configuration, which disables initrd, ACPI S4 state and swap support and compiles every module I use into the kernel (lsmod output is empty)
* IOMMU is disabled entirely in the kernel config
* rts5139 (the SD card driver) is compiled as a module, but blacklisted in modprobe.d (I verified that the card slot works if I load the module)
* fixed DSDT
* the following boot params: elevator=noop rootfstype=ext4 i915.powersave=1 i915.i915_enable_rc6=1 i915.semaphores=1 snd_hda_intel.power_save=10 ipv6.disable=1
That's *not* using the new RC6 patches. There have been reports from people trying a very similar configuration, but enabling initrd, devtmpfs, LVM and KVM, and they said they were still getting shutdowns.
So there's some serious voodoo going on around here... I have no idea why I'm not getting shutdowns at all with this kernel, or even if it will persist after a cold boot (until now I've only been suspending, never turned the PC off). Anyway, just wanted to mention that there is some magical scenario in which the laptop doesn't shut down anymore.

Could you try with different values of i915_enable_rc6 which the patch mentioned above accepts, and see if they improve (or worsen?) something please? More specifically, if you could try with the even numbers as well (e.g., 2, 4 and 6), it would be interesting, as it would isolate the states which give issues (if any).

Trying the pre-rolled kernel on 12.04 with:
Vt-d enabled
grub entries i915.powersave=1 i915.i915_enable_rc6=1 i915.semaphores=1 elevator=noop quiet splash
dsdt.aml loaded via grub
Several hours of moderate to heavy use on and off AC and NO shutdowns or any other issues. I will try rc6=3 today and report on my experience.

i915_enable_rc6=2 had a shutdown after about 15 min of heavy use

Created attachment 57047 [details]
dmesg i915_rc6=2
power shut down after 15 min of heavy use
Created attachment 57048 [details]
kern.log i915_rc6=2
kernel.log with i915_rc6=2 power shut down
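The numbers being tried in the surrounding comments are easier to follow when read as a bitmask. The sketch below assumes the patched i915_enable_rc6 parameter uses bit 1 for plain RC6, bit 2 for RC6p (deep) and bit 4 for RC6pp (deepest). That mapping is consistent with the request to test the even values, but it is not spelled out anywhere in this report, so treat it as an assumption; the helper name is hypothetical.

```python
# Assumed bit assignments for the patched i915_enable_rc6 parameter.
# They are not quoted in this bug report; check the patch before relying on them.
RC6_STATES = ((1, "RC6"), (2, "RC6p"), (4, "RC6pp"))


def decode_rc6_mode(value):
    """List the RC6 states that a given i915_enable_rc6 value would request."""
    return [name for bit, name in RC6_STATES if value & bit]


# Print the full table for the values discussed in the thread.
for mode in range(8):
    print(mode, decode_rc6_mode(mode) or ["disabled"])
```

Under that assumption, `decode_rc6_mode(5)` gives `['RC6', 'RC6pp']`, and every even value leaves plain RC6 out, which is why testing 2, 4 and 6 helps separate the deeper states from the shallow one.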
rc6=1 (after the patch which split rc6 into 3 stages): 14 hours uptime.

Power shutdown with i915_enable_rc6=4 after one hour of heavy use then back to moderate use. Plugged into AC

Power shutdowns on settings 2, 4 & 6 with the rc6 patched kernel. Currently using i915_enable_rc6=5, no issues after 5 hours of use. Not using custom dsdt.aml. Vt-d enabled in bios 211 (on prior kernels this setting would cause the system to lock up (not shutdown)).

Hello there,
I experienced shutdowns with i915_enable_rc6 set at 3, 4. I may not have tested 2 long enough, and I currently run a custom compiled kernel with the attached patch that only activates the shallowest RC6 (equivalent to i915_enable_rc6=1 with Eugeni's patch set). I haven't experienced a shutdown with rc6=1 yet. I made this patch after reading Keith Packard's mail (http://lists.freedesktop.org/archives/intel-gfx/2012-February/015125.html). I have run with this patch on an ArchLinux kernel for about 15 hours without a shutdown. My Git-fu is quite bad so it's possible I've messed up something somewhere, so do not blindly apply this patch. It should apply cleanly on 3.2.6. No custom dsdt.aml, Vt-d enabled in BIOS (ver. 210).

Created attachment 57134 [details] [review]
Patch to only allow the activation of RC6 and not RC6p

So far the only successful setting with a patched rc6 kernel has been i915_enable_rc6=1. This is without a custom dsdt.aml and with Vt-d enabled by default in bios 211.
setting 2 shutdown after 20 min of use
setting 3 shutdown approx 1 hr into use
4,5,6,7 shutdown after a few hours into each trial.

Excellent, great to know that we found one working solution. The patch which enables only RC6 on Sandy Bridge (and RC6p as well on Ivy Bridge) was included in drm-intel-next; the patch is at http://lists.freedesktop.org/archives/intel-gfx/2012-February/015131.html.
So hopefully it will reach Linus's kernel tree as well, and then we'll backport it to the 3.2 kernel so it would be released as a stable update. Thank you all a *lot* for all this testing and feedback - this was a tricky one and it is great to know that we managed to find a solution which worked for you!

I finally got a shutdown with my old configuration (where i915_enable_rc6=1 enabled all types of RC6 sleep). So it seems that all the hocus-pocus with a custom kernel and a custom DSDT just makes the shutdown less probable.

I've been running with the patch to only enable the shallow RC6 sleep state for a few days and haven't had any shutdowns, but I recently got a GPU hang. Here are the relevant lines from dmesg:
[41910.437774] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[41910.437797] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
debugfs was not mounted at the time, and I have suspended and resumed the laptop since. Today after seeing the error in dmesg I mounted debugfs and found the i915_error_state file - attached. I'm reporting it here even though I don't know if that's related to using RC6 - if not I'll be happy to open a separate bug report.

Created attachment 57305 [details]
contents of i915_error_state after a GPU hang
Ok, that error-state matches another known bug, #45492, so not rc6 related.

Using Kubuntu 12.04 with canonical kernel 3.2.0-17.26 (rc6 patched) and have had 3 hard shutdowns using this kernel. VT-d enabled, BIOS 211, left i915_enable_rc6=1 in the grub cmd line. I have a vanilla 3.3-rc4 kernel with the original rc6 patch and custom dsdt to experiment with. The latest forum posts are starting to show similar shutdowns as well http://ubuntuforums.org/showthread.php?t=1865577&page=104

Created attachment 57369 [details]
hard shutdown with rc6 patched in 3.2.0-17.26
Kernel log with rc6 patched kernel 3.2.0-17
For the ones of you still affected by those issues, even with deep RC6 disabled, could you *please* describe your machine on the https://wiki.ubuntu.com/Kernel/PowerManagementRC6 page? So far, it looks like it affects Samsung machines only (Samsung NP350U2A and Samsung NP700Z5B-W01UB). If you have something different, and you experience any issues, please fill in that page. We need to find a machine that can reproduce the problems consistently to be able to understand what is going wrong there...

I got two hard shutdowns last night with vanilla 3.3-rc4 and the rc6 patch from here: http://lists.freedesktop.org/archives/intel-gfx/2012-February/015131.html
I was casually browsing some forum over wifi while listening to music in VLC in both cases. It happened while the laptop was on battery.
Config:
- Asus UX31e (128 GB, Elantech touchpad)
- Bios 210
- Vt-d disabled (I forgot to re-enable it after my previous test).
Would it help to report on shutdowns with a vanilla kernel compiled without iommu?

The ones of you having issues with the original SNB patch, could you also test with http://lists.freedesktop.org/archives/intel-gfx/2012-February/015319.html on top of it please? E.g., apply http://lists.freedesktop.org/archives/intel-gfx/2012-February/015131.html and then http://lists.freedesktop.org/archives/intel-gfx/2012-February/015319.html and then check if RC6 still causes any issues. I am *almost* certain that it should work in all cases now.

I agree that using this patch or the pre-rolled kernel appeared to be working very well. I unfortunately had my system "brick" due to a hardware failure not related to any of these issues. I will endeavour to test when I get it back from repairs, assuming a solution isn't found by then. http://forum.notebookreview.com/asus/637772-ux31-completely-dead-help-7.html

I ran a 3.3-rc4 kernel with the 2 patches from Eugeni's message for about 20 hours without any crash (previously my machine crashed every hour or so).
As far as I'm concerned this is fixed and I can move on to fixing slow wifi.

It's funny that the patch which corrects operator precedence didn't make it into rc5 :)

As of 3.3-rc6 this issue no longer occurs, even after several days on battery. I allow myself to mark this bug as fixed.

Likely fixed by

commit fa37d39e4c6622d80bd8061d600701bcea1d6870
Author: Sean Paul <seanpaul@chromium.org>
Date: Fri Mar 2 12:53:39 2012 -0500

    drm/i915: Retry reading the PCH FDI receiver ISR

and

commit c0e2ee1bc0cf82eec89e26b7afe7e4db0561b7d9
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date: Thu Feb 23 23:57:06 2012 -0200

    drm/i915: fix operator precedence when enabling RC6p

A patch referencing this bug report has been merged in Linux v3.4-rc2:

commit aa46419186992e6b8b8010319f0ca7f40a0d13f5
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date: Fri Mar 23 11:57:19 2012 -0300

    drm/i915: enable plain RC6 on Sandy Bridge by default |
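For readers wondering how an operator-precedence slip can affect which RC6 states get enabled: the kernel code is C and is not reproduced in this report, so the Python snippet below only illustrates the general class of mistake, where a conditional expression binds more loosely than bitwise OR. The flag values and function names are hypothetical and are not taken from the driver.

```python
# Hypothetical flag values, for illustration only (not the driver's constants).
RC6_ENABLE = 0x1
RC6P_ENABLE = 0x2


def mask_without_parentheses(want_rc6p):
    # Parses as (RC6_ENABLE | RC6P_ENABLE) if want_rc6p else 0,
    # so plain RC6 is silently dropped whenever RC6p is not requested.
    return RC6_ENABLE | RC6P_ENABLE if want_rc6p else 0


def mask_with_parentheses(want_rc6p):
    # The conditional now only decides whether the RC6p bit is added.
    return RC6_ENABLE | (RC6P_ENABLE if want_rc6p else 0)
```

With RC6p not requested, the first function returns 0 while the second returns 0x1. Both return 0x3 when it is requested, which is why this kind of bug is easy to miss in testing.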