Bug 69375

Summary: [NV4E] GPU lockup when using chrome/flash
Product: xorg Reporter: Peter Taylor <taylor_tails>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: medium CC: masao-takahashi, travneff
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments:
Description                              Flags
dmesg output                             none
/sys/kernel/debug/dri/0/vbios.rom        none
xorg log (Nouveau)                       none
xorg log (Nvidia proprietary)            none
dmesg for a debug kernel (Nouveau)       none
dmesg for Nvidia (proprietary)           none

Description Peter Taylor 2013-09-15 12:36:09 UTC
Created attachment 85858 [details]
dmesg output

A recent upgrade using apt-get dist-upgrade on Debian testing about 2 weeks ago has caused Chrome to start locking up the GPU.

dmesg highlights:
Linux version 3.9-1-amd64

[  805.747928] nouveau E[  PGRAPH][0000:00:05.0]  ERROR nsource: DATA_ERROR nstatus: BAD_ARGUMENT
[  805.747945] nouveau E[  PGRAPH][0000:00:05.0] ch 3 [0x00056000 chrome[3853]] subc 2 class 0x0039 mthd 0x0314 data 0x01197000
[  836.696327] nouveau E[   PFIFO][0000:00:05.0] DMA_PUSHER - ch 3 [chrome[3853]] get 0xbeef0200 put 0x0065a6f4 state 0xc0028188 (err: MEM_FAULT) push 0x00001000
[  850.504026] nouveau E[chrome[3853]] failed to idle channel 0xcccc0000 [chrome[3853]]
[  853.508018] nouveau E[chrome[3853]] failed to idle channel 0xcccc0000 [chrome[3853]]
[  863.804030] nouveau E[Xorg[2162]] reloc wait_idle failed: -16
[  863.804036] nouveau E[Xorg[2162]] reloc apply: -16
[  865.743035] nouveau E[     DRM] GPU lockup - switching to software fbcon
[  866.804032] nouveau E[Xorg[2162]] reloc wait_idle failed: -16
[  866.804079] nouveau E[Xorg[2162]] reloc apply: -16
[  869.804037] nouveau E[Xorg[2162]] reloc wait_idle failed: -16
[  869.804103] nouveau E[Xorg[2162]] reloc apply: -16
[  887.836034] nouveau E[Xorg[2162]] failed to idle channel 0xcccc0000 [Xorg[2162]]
[  890.836025] nouveau E[Xorg[2162]] failed to idle channel 0xcccc0000 [Xorg[2162]]
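
For triage, the nouveau error lines above can be pulled apart mechanically. The following is a minimal Python sketch and not part of the original report: the regular expression is inferred only from the lines quoted here, so other kernel versions may format these messages differently, and the names `NouveauError`, `parse_nouveau_errors`, and `count_by_source` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter
from typing import List, NamedTuple, Optional

# Assumption: pattern derived from the dmesg lines quoted in this report.
# Matches e.g. "[  805.747928] nouveau E[  PGRAPH][0000:00:05.0]  ERROR ..."
# and          "[  863.804030] nouveau E[Xorg[2162]] reloc wait_idle failed: -16"
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+E"
    r"\[\s*(?P<source>[^\[\]]+(?:\[\d+\])?)\]"   # engine name or process[pid]
    r"(?:\[(?P<pci>[0-9a-fA-F:.]+)\])?"          # optional PCI address
    r"\s*(?P<message>.*)$"
)


class NouveauError(NamedTuple):
    timestamp: float      # seconds since boot, as printed by dmesg
    source: str           # e.g. "PGRAPH", "PFIFO", "DRM", "chrome[3853]"
    pci: Optional[str]    # e.g. "0000:00:05.0", or None if absent
    message: str          # remainder of the line


def parse_nouveau_errors(text: str) -> List[NouveauError]:
    """Return every nouveau error line found in dmesg-style text."""
    errors = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a nouveau error line in the assumed format
        errors.append(
            NouveauError(
                timestamp=float(match.group("ts")),
                source=match.group("source").strip(),
                pci=match.group("pci"),
                message=match.group("message"),
            )
        )
    return errors


def count_by_source(errors: List[NouveauError]) -> Counter:
    """Count how many errors each engine or process reported."""
    return Counter(error.source for error in errors)


# A few lines copied from the dmesg highlights in this report.
SAMPLE = """\
[  805.747928] nouveau E[  PGRAPH][0000:00:05.0]  ERROR nsource: DATA_ERROR nstatus: BAD_ARGUMENT
[  836.696327] nouveau E[   PFIFO][0000:00:05.0] DMA_PUSHER - ch 3 [chrome[3853]] get 0xbeef0200 put 0x0065a6f4 state 0xc0028188 (err: MEM_FAULT) push 0x00001000
[  850.504026] nouveau E[chrome[3853]] failed to idle channel 0xcccc0000 [chrome[3853]]
[  863.804030] nouveau E[Xorg[2162]] reloc wait_idle failed: -16
[  865.743035] nouveau E[     DRM] GPU lockup - switching to software fbcon
"""

if __name__ == "__main__":
    parsed = parse_nouveau_errors(SAMPLE)
    for error in parsed:
        print(f"{error.timestamp:>12.6f}  {error.source:<14} {error.message}")
    print(count_by_source(parsed))
```

In practice the input text would come from the attached dmesg output rather than the embedded sample. Sorting by timestamp makes it easier to see that the PGRAPH and PFIFO errors precede the "GPU lockup" message.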
Comment 1 Andrew Travneff 2013-09-16 15:25:13 UTC
I seem to have had the same problem for a long time on an NV43. It now happens 1-2 times a day, which is very annoying. The VT still works afterwards, but a reboot is necessary to launch X again.

Affected kernels are at least 3.6 through 3.11. However, it was much better before I updated from Fedora 18 to 19. 3.11 seems to be the worst case, since it adds many more freezes and error messages like "BUG: soft lockup - CPU#0 stuck for a ...".

The proprietary driver also periodically prints errors after a desktop freeze. Could this be dying hardware?

A debug kernel also prints a note about a "possible circular locking dependency" in drm / nouveau.

Reproducible by launching the game "Savage: Rebirth". I haven't tried other 3D games, but I do use KDE desktop effects.

Rendering artefacts are also shown sometimes, mostly in Firefox.

Besides that, is there a GPU recovery feature in Nouveau? What happened to these patches: [3]?

More info: [1][2].
SW: Fedora 19 x64
  libdrm-2.4.46-1.fc19.x86_64
  xorg-x11-server-Xorg-1.14.2-9.fc19.x86_64
  xorg-x11-drv-nouveau-1.0.9-1.fc19.x86_64

HW: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation NV43 [GeForce 6600] [10de:0141] (rev a2)

1: https://bugzilla.redhat.com/show_bug.cgi?id=979537

2: https://bugzilla.redhat.com/show_bug.cgi?id=868482

3: http://lists.freedesktop.org/archives/nouveau/2012-April/010199.html
Comment 2 Andrew Travneff 2013-09-16 15:26:40 UTC
Created attachment 85919 [details]
/sys/kernel/debug/dri/0/vbios.rom

Obtained with: "cat /sys/kernel/debug/dri/0/vbios.rom > /tmp/vbios.rom"
Comment 3 Andrew Travneff 2013-09-16 15:28:29 UTC
Created attachment 85920 [details]
xorg log (Nouveau)
Comment 4 Andrew Travneff 2013-09-16 15:30:34 UTC
Created attachment 85921 [details]
xorg log (Nvidia proprietary)

Uploaded to compare the errors.
Comment 5 Andrew Travneff 2013-09-16 15:33:59 UTC
Created attachment 85922 [details]
dmesg for a debug kernel (Nouveau)
Comment 6 Andrew Travneff 2013-09-16 15:35:38 UTC
Created attachment 85923 [details]
dmesg for Nvidia (proprietary)

Uploaded to compare the errors.
Comment 7 Ilia Mirkin 2013-11-09 20:14:02 UTC
Some recent relocation failures have been traced down to undefined behaviour in libdrm which gcc-4.8 interprets differently than gcc-4.7 and earlier. A fix has been submitted: http://cgit.freedesktop.org/mesa/drm/commit/?id=482abbfafb56cbceaf5355c026434e638cddd0f1 and I believe this deb contains the fix: http://ftp.de.debian.org/debian/pool/main/libd/libdrm/libdrm-nouveau2_2.4.46-4_amd64.deb. Please try it either with a libdrm that contains that patch, or with libdrm compiled with gcc-4.7 or clang.
Comment 8 Peter Taylor 2013-11-11 21:08:48 UTC
Installed the suggested package, and the problem was SOLVED.

Thank you very much.
Peter T
