Summary: [SNB] Intel driver freezes wheezy system on Sandybridge HD3000 chipset [semaphores=0, garbage in batch]
Product: DRI
Reporter: Oscar <lancelot1981>
Component: DRM/Intel
Assignee: Daniel Vetter <daniel>
Status: CLOSED DUPLICATE
QA Contact:
Severity: normal
Priority: high
CC: 384toregzteez, ac, ben, chris, daniel, eugeni, graeme.russ, jbarnes, q3aiml, tworaz666, xorgzilla
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:
Bug Blocks: 42991, 44622

Description
Oscar
2011-09-01 12:31:41 UTC
The first bit of information I need is whether this is a GPU hang. Is the system accessible after the freeze and is /sys/kernel/debug/dri/0/i915_error_state populated?

The system is totally inaccessible after freezing, even with the standard console key combinations (Ctrl-Alt-F*); that's the reason for the brutal shutdown. The file /sys/kernel/debug/dri/0/i915_error_state reports "no error state collected" for the time being, but I've rebooted and uninstalled the intel driver, so I don't know whether it was overwritten by the new system bootup.
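
For anyone wanting to script the check asked for here, the following is a minimal sketch, assuming the debugfs path quoted in this thread and the "no error state collected" placeholder reported above. The function name is made up for illustration, and reading debugfs normally needs root.

```python
from pathlib import Path

# Placeholder text this thread reports when nothing has been captured.
EMPTY_MARKER = "no error state collected"

# Path quoted in the thread; needs debugfs mounted and usually root access.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def error_state_collected(path: Path = ERROR_STATE) -> bool:
    """Return True if the file holds more than the empty placeholder."""
    try:
        text = path.read_text(errors="replace")
    except OSError:
        # Missing file, debugfs not mounted, or insufficient permissions.
        return False
    return bool(text.strip()) and EMPTY_MARKER not in text.lower()
```

Calling `error_state_collected()` with no argument checks the real path; passing another `Path` makes it easy to try against a saved copy.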
> From: bugzilla-daemon@freedesktop.org
> To: lancelot1981@hotmail.it
> Subject: [Bug 40564] Intel driver freezes wheezy system on Sandybridge HD3000 chipset
> Date: Thu, 1 Sep 2011 12:39:09 -0700
>
> https://bugs.freedesktop.org/show_bug.cgi?id=40564
>
> --- Comment #1 from Chris Wilson <chris@chris-wilson.co.uk> 2011-09-01 12:39:08 PDT ---
> The first bit of information I need is whether this is a GPU hang. Is the
> system accessible after the freeze and is
> /sys/kernel/debug/dri/0/i915_error_state populated?
Ok, my guess is that this is https://bugzilla.kernel.org/show_bug.cgi?id=27892, in which case adding an Option "DebugFlushCaches" "True" to the Device section in your xorg.conf (or adding a snippet to xorg.conf.d if you prefer) will work around the bug.

But I get no error messages in the kernel after the freeze. Anyway, if I follow your suggestion, should I also apply the patch from the bug you linked to the driver?
> From: bugzilla-daemon@freedesktop.org
> To: lancelot1981@hotmail.it
> Subject: [Bug 40564] Intel driver freezes wheezy system on Sandybridge HD3000 chipset
> Date: Thu, 1 Sep 2011 15:15:06 -0700
>
> https://bugs.freedesktop.org/show_bug.cgi?id=40564
>
> --- Comment #3 from Chris Wilson <chris@chris-wilson.co.uk> 2011-09-01 15:15:04 PDT ---
> Ok, my guess is that this is https://bugzilla.kernel.org/show_bug.cgi?id=27892
> in which case adding an Option "DebugFlushCaches" "True" to the Device section
> in your xorg.conf (or add a snippet to xorg.conf.d if you prefer) will
> workaround the bug.
No, the xorg.conf option is sufficient. The patch was simply to hardcode the workarounds for a distro pkg.

No luck. I followed your suggestion but the system still freezes, this time even in normal screen mode (no xscreensaver running). Here's my /etc/xorg.conf.d/10-intel.conf:

Section "Device"
    Identifier "PCI:00:02.0"
    Driver "intel"
    Option "DebugFlushCaches" "True"
    Option "DebugFlushBatches" "True"
EndSection

The driver version running is 2.15 from the Debian wheezy package xserver-xorg-video-intel, kernel 2.6.39.
> From: bugzilla-daemon@freedesktop.org
> To: lancelot1981@hotmail.it
> Subject: [Bug 40564] Intel driver freezes wheezy system on Sandybridge HD3000 chipset
> Date: Fri, 2 Sep 2011 02:10:45 -0700
>
> https://bugs.freedesktop.org/show_bug.cgi?id=40564
>
> --- Comment #5 from Chris Wilson <chris@chris-wilson.co.uk> 2011-09-02 02:10:45 PDT ---
> No, the xorg.conf option is sufficient. The patch was simply to hardcode the
> workarounds for a distro pkg.
Hi there,
any progress on this bug?
Thanks,
Oscar
I am getting the same behavior on a Thinkpad like Oscar, so Chris, I'll turn on the extra settings and hopefully get some information on where this is happening.

The mouse pointer freezes, and not even the raising elephants key combos work. I did not try to log in over the network however.

(In reply to comment #8)
> I am getting the same behavior on a Thinkpad like Oscar, so Chris, I'll turn on
> the extra settings and hopefully get some information on where this is
> happening.
>
> The mouse pointer freezes, and not even the raising elephants key combos work.
> I did not try to log in over the network however.

Setting i915.semaphores=1 on the kernel command line seems to fix this issue on my T420. Apparently Debian and Ubuntu also have released, or are in the process of releasing, an update to their distros that fixes some lockup issues.

Please attach the full dmesg after boot and grab all the i915 kernel module parameters (i.e. all of /sys/module/i915/parameters). Also please boot with i915.reset=0 (this sometimes prevents a GPU hang from taking down the complete system), rehang your machine and try to grab as much as possible. Also check /var/log/messages in case the machine still hard-hangs; maybe something hit the disk.

Please describe your desktop environment, like what window manager you're using and whether you have enabled/disabled compositing. And add the version of your mesa package, please.

Thanks, Daniel

Ping?

Ping?

Could you confirm if it was fixed, or the issue is still out there, and provide the files Daniel asked for, please?

Created attachment 53270 [details]
dmesg right after boot
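
A small sketch of how the module parameters requested above could be gathered in one go. It assumes only the /sys/module/i915/parameters directory named in the request; the function name is hypothetical, and some parameters may not be readable without root.

```python
from pathlib import Path

# Directory named in the thread; each file holds one parameter value.
PARAM_DIR = Path("/sys/module/i915/parameters")


def read_module_parameters(param_dir: Path = PARAM_DIR) -> dict:
    """Map each parameter name to its current value (whitespace stripped)."""
    values = {}
    if not param_dir.is_dir():
        # Module not loaded, or not running on Linux.
        return values
    for entry in sorted(param_dir.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Unreadable without sufficient privileges; record that fact.
            values[entry.name] = "<unreadable>"
    return values
```

Printing each pair as `f"{PARAM_DIR / name}: {value}"` gives one "path: value" line per parameter, which is easy to paste into a comment.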
(In reply to comment #12)
> Ping?
>
> Could you confirm if it was fixed, or the issue is still out there, and provide
> the files Daniel asked please?

Hi,

I am not the original reporter but I believe I am experiencing the same issue (lenovo x121e, 00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])).

The environment is gnome 2.30 as shipped with debian squeeze (metacity window manager), using the xserver and intel driver backported from wheezy (xserver-xorg-video-intel: 2.15.0-3~bpo60+1).

In most cases nothing is found in the logs, but four times (out of maybe 10 or 15 lockups) the last lines of syslog were:

...
Nov 6 11:01:35 mezzo kernel: [ 136.772566] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 6 11:01:35 mezzo kernel: [ 136.781982] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 15434 at 15431, next 15436)
Nov 6 11:01:35 mezzo acpid: client 2465[0:0] has disconnected
Nov 6 11:01:35 mezzo acpid: client connected from 2266[0:0]
Nov 6 11:01:35 mezzo acpid: 1 client rule loaded
Nov 6 11:01:41 mezzo acpid: client 2266[0:0] has disconnected
Nov 6 11:01:42 mezzo acpid: client connected from 2465[0:0]
Nov 6 11:01:42 mezzo acpid: 1 client rule loaded

followed by syslog restarted after reboot. The acpid messages were not always present though.
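
Since the hang only leaves a trace in syslog some of the time, a quick scan of old logs can tell how many lockups were actually recorded. The sketch below is modelled only on the two kernel lines quoted above; the patterns and function name are illustrative and other kernel versions may word these messages differently. On Linux, -11 corresponds to -EAGAIN.

```python
import re

# Patterns modelled on the two kernel lines quoted in this comment.
HANGCHECK = re.compile(r"\[drm:i915_hangcheck_elapsed\].*GPU hung")
WAIT_REQUEST = re.compile(
    r"i915_do_wait_request returns (-?\d+) "
    r"\(awaiting (\d+) at (\d+), next (\d+)\)"
)


def scan_for_gpu_hang(lines):
    """Return (hang_count, wait_failures) found in an iterable of log lines."""
    hangs = 0
    failures = []
    for line in lines:
        if HANGCHECK.search(line):
            hangs += 1
        match = WAIT_REQUEST.search(line)
        if match:
            code, awaiting, at, nxt = (int(group) for group in match.groups())
            failures.append(
                {"code": code, "awaiting": awaiting, "at": at, "next": nxt}
            )
    return hangs, failures
```

Passing an open log file (for example `open("/var/log/syslog", errors="replace")`) works because a file object iterates line by line.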
Module parameters:

/sys/module/i915/parameters/fbpercrtc: 0
/sys/module/i915/parameters/i915_enable_rc6: 0
/sys/module/i915/parameters/lvds_downclock: 0
/sys/module/i915/parameters/lvds_use_ssc: 1
/sys/module/i915/parameters/modeset: 1
/sys/module/i915/parameters/powersave: 1
/sys/module/i915/parameters/reset: Y
/sys/module/i915/parameters/semaphores: 0

Mesa packages:

root@mezzo:/home/tester# dpkg -l *mesa* | grep ^ii
ii libgl1-mesa-dri 7.10.3-4~bpo60+1 free implementation of the OpenGL API -- DRI modules
ii libgl1-mesa-glx 7.7.1-5 A free implementation of the OpenGL API -- GLX runtime
ii libglu1-mesa 7.7.1-5 The OpenGL utility library (GLU)
ii mesa-utils 7.7.1-5 Miscellaneous Mesa GL utilities

Regards,

Thanks! Could you attach the /sys/kernel/debug/dri/0/i915_error_state as well, when the issue happens? It should help to narrow down where it comes from.

Unfortunately, even when booting with i915.reset=0 I can't access the system without rebooting, and nothing is found in /sys/kernel/debug right after reboot. This option did not bring any additional information to syslog either (yet).

In addition to bug 43587 I have this bug too. I have disabled the new sna-code and disabled dri in my xorg.conf for testing, but my system freezes from time to time. My system is currently useless because the intel driver freezes my system and the vesa driver doesn't work (black screen).

I tried the following too:
- DebugFlushCaches and DebugFlushBatches in xorg.conf
- i915.reset=0 or i915.semaphores=1 on the kernel command line
- login over network and ssh - no response

I can't upload the i915_error_state because the only way out is to reboot my system. :(
How can I help to supply you with more information?
In an attempt to get more information I added this line to my crontab:

*/1 * * * * cp /sys/kernel/debug/dri/0/i915_error_state /home/tester/tmp/debug/i915_error_state"$(date)"

but when the machine hangs this line is not executed anymore and the last version of i915_error_state just contains "no error state collected".

The only additional information I can provide for now is that although neither the kernel nor xorg has been updated since my last communication about this bug, the machine now hangs *much* less often. It used to be several times a day at that time, sometimes within minutes of booting and several times in a row, whereas now it has not happened for weeks.

Is there anything else I can try to grab this i915 error state?

Thanks,

Created attachment 54542 [details]
Xorg.0.log
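
A variation on the crontab approach described above, offered only as a sketch: poll more often than once a minute, skip the "no error state collected" placeholder, and fsync the copy so it has a better chance of reaching the disk before a lockup. The function names and the destination directory are hypothetical, and none of this helps if the machine freezes before the kernel records an error state at all.

```python
import os
import time
from pathlib import Path
from typing import Optional

SOURCE = Path("/sys/kernel/debug/dri/0/i915_error_state")  # path from the thread
EMPTY_MARKER = b"no error state collected"


def snapshot_once(source: Path, dest_dir: Path) -> Optional[Path]:
    """Copy source into dest_dir if it holds a real error state; fsync the copy."""
    try:
        data = source.read_bytes()
    except OSError:
        return None
    if not data.strip() or EMPTY_MARKER in data:
        return None
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp without spaces or colons so the file name is shell-friendly.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"i915_error_state-{stamp}"
    with open(target, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to disk before a possible lockup
    return target


def watch(source: Path, dest_dir: Path, interval: float = 5.0) -> Path:
    """Poll until one real error state has been saved, then return its path."""
    while True:
        saved = snapshot_once(source, dest_dir)
        if saved is not None:
            return saved
        time.sleep(interval)
```

Run as root, something like `watch(SOURCE, Path("/home/tester/tmp/debug"))` would replace the cron entry; the directory here is just the one used in the comment above.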
(sorry for my English)

Hi, same problem here. The system freezes very randomly, several times a day or only after weeks, exactly as maurizio wrote. No idea how to get debug information like i915_error_state.

-- chipset: Sandybridge Mobile (GT2)
-- system architecture: x86_64
-- xf86-video-intel: git version (17.12.2011)
-- xserver: 1.10.4
-- mesa: 7.11
-- libdrm: 2.4.29
-- kernel version: 3.0.6
-- Linux distribution: gentoo
-- Machine or mobo model: lenovo L520

Xorg.0.log is attached.

I removed the NEEDINFO flag. Please tell us how to support you with debug information.

(In reply to comment #20)
> No idea to get debug informations like i915_error_state.

http://intellinuxgraphics.org/how_to_report_bug.html has some instructions on how to get them.

But for this specific issue, could you try with kernel 3.2-rc6? It should enable semaphores on the system by default (unless you have I/O virtualization).

Also, as there is no easy way to get the i915_error_state and the machine locks hard, it would be very helpful if you could set up netconsole to get kernel messages over the network (for example, as explained here: https://wiki.ubuntu.com/Kernel/Netconsole).

Created attachment 55391 [details]
Xorg.0.log with crash, dmesg with black screen
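
On the netconsole suggestion: netconsole sends kernel messages as plain UDP datagrams, so the receiving machine only needs something listening on the target port (6666 is netconsole's default target port). The following is a minimal receiver sketch; the function names and log file name are made up for illustration, and the sending side still has to be configured as described in the linked wiki page.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket; 6666 is netconsole's default target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock: socket.socket, log_path: str, count: int = 0) -> int:
    """Append datagrams to log_path; stop after count messages (0 means forever)."""
    received = 0
    # Unbuffered binary append, so each message is written out immediately.
    with open(log_path, "ab", buffering=0) as log:
        while count == 0 or received < count:
            data, _sender = sock.recvfrom(65535)
            log.write(data)
            received += 1
    return received
```

On the second machine, `receive_messages(open_listener(), "netconsole.log")` would then collect whatever the hanging box manages to send before it locks up.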
If I boot kernel 3.2.0, I get a black screen when the graphics driver
changes the resolution (modeset?).
The system itself keeps running normally in the background, which means I
can type "blind" or log in with ssh (see dmesg output in the attachment).
The system also boots up properly with the parameter "nomodeset".
There were no errors with kernel 3.1.5.
(So I think the bug lies between version 3.1.5 and 3.2.0.)
Maybe somebody with a Sandybridge Mobile (GT2) graphics card could
test it, to exclude errors in my kernel config?
I've made the following changes:
mesa 7.11 -> 7.11.2
kernel 3.0.6 -> 3.1.5
xf86-video-intel git version (19.12.2011)
But there is something positive to report:
Instead of freezing the complete system, it's "only" the xserver that
is crashing.
This makes it possible to login per ssh and kill the xserver.
If the server crashes always the same error message appears in the logfile:
kernel: [drm:ironlake_update_pch_refclk] *ERROR* enabling SSC on PCH
Now I'm using xf86-video-intel git version (09.01.2012)...
That's a different bug in libdrm, please update to libdrm-[intel]-2.4.30

A patch referencing this bug report has been merged in Linux v3.2-rc6:

commit f45b55575cedb7efa782e43f1ea74338456d0381
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date: Fri Dec 9 17:16:37 2011 -0800

    drm/i915: enable semaphores on per-device defaults

Created attachment 55576 [details]
Xorg crash
Now I use libdrm version 2.4.30.
The system freezes again. I have attached the error log.
I cannot use kernel 3.2.0 due to a black screen after boot (see message above). Do you have any idea?
Created attachment 55577 [details]
i915_error_state
I think I have a similar issue, so I won't create a new bug report just yet.

My configuration:
- ASRock Z68 Pro3 Gen3 Motherboard
- 8GB RAM
- Intel i5 2500K
- Vanilla 3.2.0+ kernel (Commit ccb19d263fd1c9e34948e2158c53eacbff369344)

I've tried logging via Netconsole but no additional messages appear there which aren't already in /var/messages.

i915.semaphores=1 does not help.

Using an nVidia PCIe card and unloading the i915 modules at least allows my system to run stable.

Am willing to do whatever compiling, logging, testing etc. is necessary to help figure this out.

Oh, and I'm running Fedora 16...

It looks like my problem is Motherboard and/or CPU related - reproducible in Windows 7, and I also had one hang when using an nVidia G210 PCIe card.

YAGBB (Yet another garbage batch buffer.)

I think I'm affected by exactly the same problem as described in this bug report. In my case the easiest way to reproduce this issue is to run Oblivion under wine. The problem happens a few seconds after loading any save game. What is interesting is that my system somehow manages to recover, and after the initial freeze it continues to run the game for another few seconds. After that another freeze occurs. During the freezes I'm usually able to switch back to a virtual terminal and kill the Oblivion.exe process to collect some debugging data.

What I've been able to deduce so far is that this is a regression in either mesa or libdrm. When using mesa 7.11 from git (906f670f1a1f33d69139f520ee931b268049eac6) with libdrm 2.4.27 the game runs without any problems. I tried this combo on both kernel 3.2.11 and 3.3.0-rc7 and the results are the same: no hangs, dmesg is error free.

When moving to mesa-8.0 (b9f8cb9e0b1bd640b9b362c9ad56791e4c8cabcd) + libdrm-2.4.31 (tar.gz version) the problem with GPU hangs manifests itself. I even tried the latest mesa and libdrm from the master branches and the result is the same: GPU hang. Kernel version and i915 module options don't have any effect; the hang still happens.
I'll try to bisect mesa from 7.11 to 8.0; maybe I'll find the specific commit which introduces the problem.

Created attachment 58524 [details]
dmesg showing GPU hangs
Created attachment 58525 [details]
i915_error_state log
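
For readers unfamiliar with bisection as mentioned above: given one known-good and one known-bad revision, testing the midpoint each time halves the range of suspects, so even a few hundred commits need only a handful of builds. Below is a generic sketch of that search, not tied to mesa or to any tool; `git bisect` automates the same idea. The function name and the `is_bad` callback are illustrative.

```python
def bisect_first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() is True.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    every revision after the first bad one is also bad.
    """
    low, high = 0, len(revisions) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(revisions[mid]):
            high = mid  # the regression is at mid or earlier
        else:
            low = mid   # the regression is after mid
    return revisions[high]
```

Here `is_bad` would stand for "build this revision, run the game, see whether the GPU hangs", which is the slow, manual part of the process.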
Peter, you have a distinct bug. Please do open a separate bug report to track it (and attach your error state there as well). Best of luck with the bisection, thanks!

I believe this is related to:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish

    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.

    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.

    Such as:

    https://bugs.freedesktop.org/show_bug.cgi?id=29046
    https://bugs.freedesktop.org/show_bug.cgi?id=35863
    https://bugs.freedesktop.org/show_bug.cgi?id=38952
    https://bugs.freedesktop.org/show_bug.cgi?id=40282
    https://bugs.freedesktop.org/show_bug.cgi?id=41098
    https://bugs.freedesktop.org/show_bug.cgi?id=41102
    https://bugs.freedesktop.org/show_bug.cgi?id=41284
    https://bugs.freedesktop.org/show_bug.cgi?id=42141

    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

to mark dup to show relationship

*** This bug has been marked as a duplicate of bug 29046 ***

Closing resolved+duplicate as duplicate of closed+fixed.
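
The ordering problem described in the commit message quoted above can be illustrated with a toy model. This is not the kernel code: every class and function name below is invented for illustration, and the "GPU" is just a flag. It only shows why clearing the read domains before an interruptible wait lets the retry skip the wait.

```python
class Buffer:
    """Toy stand-in for a GPU buffer object."""

    def __init__(self):
        self.gpu_read_domains = 1  # non-zero: the GPU may still be reading
        self.gpu_busy = True       # ground truth the driver cannot see directly


class Interrupted(Exception):
    """Stands in for a wait that was interrupted by a signal."""


def wait_for_gpu(buf, interrupt):
    if interrupt:
        raise Interrupted
    buf.gpu_busy = False  # the wait ran to completion


def finish_buggy(buf, interrupt=False):
    if buf.gpu_read_domains == 0:
        return                    # believed idle: skip the wait
    buf.gpu_read_domains = 0      # cleared BEFORE the wait has succeeded
    wait_for_gpu(buf, interrupt)


def finish_fixed(buf, interrupt=False):
    if buf.gpu_read_domains == 0:
        return
    wait_for_gpu(buf, interrupt)
    buf.gpu_read_domains = 0      # cleared only after a successful wait


def still_busy_after_retry(finish, buf):
    """First attempt is interrupted; the caller then retries the call."""
    try:
        finish(buf, interrupt=True)
    except Interrupted:
        pass
    finish(buf, interrupt=False)
    return buf.gpu_busy  # True means the buffer would be reused while active
```

In this model `still_busy_after_retry(finish_buggy, Buffer())` reports the buffer as still busy, because the retry sees cleared domains and returns early, while the `finish_fixed` ordering waits properly on the second attempt.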