Created attachment 33551 [details] Batch buffer dump from drm-intel.git kernel

Forwarding bug report from Ubuntu user Bill Farrow:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/511001

[Problem]
GPU hang with 855GM with Ubuntu Lucid, also with newest intel-drm-next and kernel.org kernels and with the xorg-edgers PPA. GPU error state from Chris Wilson's Record batch buffer following GPU hang patch captured.

[Original report]
Testing with Lucid Lynx Alpha 2 Netbook Remix on USB stick. The laptop is an Asus M5200N with Intel i855GM graphics chip.

I have the same graphics freezing bug when running 9.10 Karmic. There is already an open bug for Karmic https://bugs.launchpad.net/bugs/447892 but since this bug has not been fixed in the Lucid yet, I am raising a separate bug report.

00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

---
Architecture: i386
DistroRelease: Ubuntu 10.04
DkmsStatus: Error: [Errno 2] No such file or directory
InstallationMedia: Error: [Errno 13] Permission denied: '/var/log/installer/media-info'
Lsusb:
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: ASUSTeK Computer Inc. M5N
Package: xserver-xorg-video-intel 2:2.10.0+git20100220.c2c670ef-0ubuntu0sarvatt
PackageArchitecture: i386
PccardctlIdent:
 Socket 0:
   no product info available
 Socket 1:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
 Socket 1:
   no card
ProcCmdLine: auto BOOT_IMAGE=Linux ro root=/dev/sda1
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-14.20-generic
RelatedPackageVersions:
 xserver-xorg 1:7.5+1ubuntu6
 libgl1-mesa-glx 7.8.0~git20100219.496724b8-0ubuntu0sarvatt
 libdrm2 2.4.18+git20100217.2d9990c7-0ubuntu0sarvatt
 xserver-xorg-video-intel 2:2.10.0+git20100220.c2c670ef-0ubuntu0sarvatt
Tags: lucid
Uname: Linux 2.6.32-14-generic i686
UnreportableReason: This is not a genuine Ubuntu package
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 12/08/2004
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0212
dmi.board.name: M5N
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.asset.tag: ATN12345678901234567
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0212:bd12/08/2004:svnASUSTeKComputerInc.:pnM5N:pvr1.0:rvnASUSTeKComputerInc.:rnM5N:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr1.0:
dmi.product.name: M5N
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.
system:
 distro: Ubuntu
 architecture: i686
 kernel: 2.6.32-14-generic
Created attachment 33552 [details] Batch buffer dump with v8 patch on top of Linus' kernel as of 2010-02-21
Created attachment 33553 [details] Xorg.0.log
Created attachment 33554 [details] lspci -vvnn
Assigning to Chris Wilson, since I assume he may be interested in looking at the captured error state from his patch "drm/i915: Record batch buffer following GPU error". Hope this is okay.
Created attachment 33561 [details] crash2: dmesg log
Created attachment 33562 [details] crash2: Xorg log
Created attachment 33563 [details] crash2: batch buffer dump
At Geir's suggestion, I have collected dmesg [1], Xorg.0.log [2], and the batch buffer dump [3] from a single boot up and crash/freeze instance. This is much better than the previous log files, which came from differing runs and maybe even different kernel builds.

The kernel was built from the drm-intel.git repository [4], which includes Chris Wilson's GPU debug code.

kernel = 2.6.33-rc8-v2.6.29-rc1-51333-g9df3079
git describe = v2.6.29-rc1-51333-g9df3079

[1]: attachment 33561 [details] dmesg
[2]: attachment 33562 [details] /var/log/Xorg.0.log
[3]: attachment 33563 [details] cat /sys/kernel/debug/dri/0/i915_error_state
[4]: http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git
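For anyone else gathering logs, capturing all three from the same boot can be scripted. The following is only a minimal sketch, not a tool used in this report: the `collect_logs` helper, the `crash-<timestamp>` directory naming, and the `dmesg.log` file name are hypothetical. The two default paths are the ones named in the comment above, and reading the debugfs file is assumed to require root and a mounted debugfs.

```python
import subprocess
import time
from pathlib import Path

# Paths taken from the comment above. Reading the debugfs file is assumed
# to need root privileges and a mounted debugfs.
DEFAULT_SOURCES = {
    "Xorg.0.log": "/var/log/Xorg.0.log",
    "i915_error_state": "/sys/kernel/debug/dri/0/i915_error_state",
}


def collect_logs(dest_root, sources=DEFAULT_SOURCES, run_dmesg=True):
    """Copy each source file (and optionally dmesg output) into one new directory.

    Returns the directory and the list of file names that were saved, so the
    caller can see which logs really came from this boot.
    """
    # Hypothetical naming scheme: one directory per crash, named by local time.
    dest = Path(dest_root) / time.strftime("crash-%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, src in sources.items():
        try:
            # Read explicitly: debugfs files report size 0 but still return data.
            data = Path(src).read_bytes()
        except OSError as exc:
            print(f"skipping {src}: {exc}")
            continue
        (dest / name).write_bytes(data)
        saved.append(name)
    if run_dmesg:
        try:
            result = subprocess.run(["dmesg"], capture_output=True, check=False)
        except OSError as exc:
            print(f"skipping dmesg: {exc}")
        else:
            (dest / "dmesg.log").write_bytes(result.stdout)
            saved.append("dmesg.log")
    return dest, saved
```

Running `collect_logs("/tmp/gpu-hang")` as root right after a hang, before rebooting, would leave one directory whose contents all belong to the same crash.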
Thanks, this is another cache flushing bug. The telltale here is:

IPEHR: 0x40c00000
...
0x02618194: 0x7c09c0cc: 3DSTATE_MAP_COORD_SET_I830
0x02618198: 0x7d020000: 3DSTATE_MAP_COORD_SETBIND_I830
0x0261819c: HEAD 0x00000098: dword 1
0x026181a0: 0x7c291099: 3DSTATE_MAP_TEX_STREAM_I830

i.e. the last instruction header does not match the previous dword of the command stream -- the GPU is seeing a different state of memory wrt the CPU.
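The check described above can be approximated mechanically. This is a rough sketch only, assuming the decoded dump lines look exactly like the four quoted above (`address: [HEAD] value: text`); the real i915_error_state layout may differ, and the helper names `parse_decoded` and `ipehr_matches_stream` are hypothetical. It simply asks whether the IPEHR value appears anywhere in the CPU-visible dwords up to and including the one marked HEAD.

```python
import re

# Matches decoded lines of the form quoted above, e.g.
#   "0x02618198: 0x7d020000: 3DSTATE_MAP_COORD_SETBIND_I830"
#   "0x0261819c: HEAD 0x00000098: dword 1"
LINE_RE = re.compile(
    r"^(0x[0-9a-fA-F]+):\s+(HEAD\s+)?(0x[0-9a-fA-F]+):\s*(.*)$"
)


def parse_decoded(lines):
    """Return (address, value, is_head, text) tuples for lines that match."""
    parsed = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            parsed.append(
                (int(m.group(1), 16), int(m.group(3), 16),
                 m.group(2) is not None, m.group(4))
            )
    return parsed


def ipehr_matches_stream(ipehr, lines):
    """True if IPEHR equals some dword at or before the HEAD marker.

    False suggests the GPU executed a header that the CPU-side dump does
    not contain, which is the symptom described in the comment above.
    """
    for _addr, value, is_head, _text in parse_decoded(lines):
        if value == ipehr:
            return True
        if is_head:
            break  # stop scanning once the HEAD dword has been checked
    return False


excerpt = [
    "0x02618194: 0x7c09c0cc: 3DSTATE_MAP_COORD_SET_I830",
    "0x02618198: 0x7d020000: 3DSTATE_MAP_COORD_SETBIND_I830",
    "0x0261819c: HEAD 0x00000098: dword 1",
    "0x026181a0: 0x7c291099: 3DSTATE_MAP_TEX_STREAM_I830",
]
print(ipehr_matches_stream(0x40C00000, excerpt))  # prints False
```

A `False` result on a short excerpt is only a hint, since the matching header could lie earlier in the batch than the lines supplied.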
Created attachment 33616 [details] [review] msleep(magic_delay)

This patch has proven vital to work-around more obvious cache-flushing bugs. I'd appreciate much wider testing...
(In reply to comment #10)
> Created an attachment (id=33616) [details]
> msleep(magic_delay)
>
> This patch has proven vital to work-around more obvious cache-flushing bugs.
> I'd appreciate much wider testing...
>

I've tested this patch for over an hour and my GPU is still up and running. I'm running the latest intel-drm-next kernel from git, libdrm-2.4.18, Xorg 1.7.5, and the latest xf86-video-intel from git.

Furthermore, this patch also fixes the render errors that are reported in bug #26346.

That said, rendering (both 2D and 3D) is now quite slow, as expected.
Chris, the msleep patch [1] works; I can log in with gdm and get to the desktop now. Moving and redrawing windows is slow, as expected.

I had one weird freeze when closing Firefox: the mouse pointer still moved, and clicking on panel icons changed the pointer to the spinning circle as if it were launching the application, but then the pointer returned to an arrow and no application was displayed. Unfortunately I did not grab the logs, and I have been unable to reproduce it since.

So how do we clean this up and fix this cache flush problem properly? I'm happy to code if you give me some pointers.

[1]: attachment 33616 [details] [review] msleep(magic_delay)
As it is clearly the CPU/GPU coherency issue, I'm duping this so as to consolidate the reports...

As to how to fix it, I've yet to find a suitable solution. The key is to ensure that the ICH has finished its writes prior to the GPU starting to DMA from memory. Sounds like it should be a fairly trivial, well-documented problem... But I've yet to find this precise scenario mentioned.

*** This bug has been marked as a duplicate of bug 26345 ***
Created attachment 33742 [details] Batch buffer dump from Crash 3
Crash with msleep() patch applied
Created attachment 33743 [details] Xorg log from crash 3
Crash with msleep() patch applied
Created attachment 33744 [details] Xorg log after X restarted but with black screen
Crash with msleep() patch applied
Created attachment 33745 [details] Batch buffer dump from Crash 4
Crash with msleep() patch applied
Created attachment 33746 [details] Xorg.0.old log from before the freeze
Crash with msleep() patch applied
Created attachment 33747 [details] Xorg log from freeze
Crash with msleep() patch applied
Tonight I updated my Ubuntu packages, including xserver-xorg-*, while keeping the kernel with the msleep() patch. I then had an Xorg crash and restart, and on the next boot an Xorg freeze. I have captured the batch buffer and Xorg log files in case they help.