Bug 71662 - [nvd9] hang: nouveau ![ PFIFO][0000:01:00.0] unhandled status 0x00800000
Summary: [nvd9] hang: nouveau ![ PFIFO][0000:01:00.0] unhandled status 0x00800000
Status: RESOLVED DUPLICATE of bug 71659
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/nouveau
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-15 22:32 UTC by Jan Vesely
Modified: 2013-11-27 05:06 UTC

Description Jan Vesely 2013-11-15 22:32:02 UTC
This is a regression from 3.12 to 3.13 (still present on nouveau master).
I started seeing these in dmesg:
[   64.358170] hda-codec: out of range cmd 0:6:707:fffffffc
[ 2310.511292] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2310.778618] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2310.837097] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2310.895963] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2311.231493] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2311.326618] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2311.466523] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2312.081457] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2312.118070] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2312.484497] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2312.690892] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2312.854135] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2313.160414] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2313.421500] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2313.878647] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2314.296446] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2314.440653] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2315.068705] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2315.089576] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2315.414808] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2315.777496] nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
[ 2315.900813] nouveau ![   PFIFO][0000:01:00.0] unhandled status 0x00800000

I'm using Mesa git master. Sometimes OpenArena (exec anholt) finishes OK with just the warnings (tons of them); occasionally it hangs with the last message. The hang appears to happen more often on nouveau master than on vanilla 3.12.
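To tell a run that only produced the warnings apart from one that hit the hang, the kernel log can be tallied after each OpenArena run. Below is a minimal sketch that assumes the message formats shown in the dmesg output above. `summarize_pfifo` is a hypothetical helper name and is not part of nouveau or any existing tool.

```shell
# Hypothetical helper (not part of nouveau or any tool): reads a kernel log
# on stdin, e.g. `dmesg | summarize_pfifo`, and prints a one-line summary.
# The body runs in a subshell ( ... ) so its variables do not leak.
summarize_pfifo() (
    # Buffer stdin so the log can be scanned twice.
    log=$(cat)
    # Count non-fatal PFIFO interrupt warnings, e.g.
    # "nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005".
    warnings=$(printf '%s\n' "$log" | grep -c 'nouveau W\[ *PFIFO].*INTR')
    # The hang coincides with the fatal "unhandled status" line.
    if printf '%s\n' "$log" | grep -q 'PFIFO].*unhandled status'; then
        hang=yes
    else
        hang=no
    fi
    echo "warnings=$warnings hang=$hang"
)
```

Clearing the ring buffer between runs (for example with `dmesg -C` as root) keeps each summary specific to one run.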
The card is:
[   10.238807] nouveau  [  DEVICE][0000:01:00.0] Chipset: GF119 (NVD9)
[   10.238809] nouveau  [  DEVICE][0000:01:00.0] Family : NVD0
coupled with an SNB (Sandy Bridge) Intel GPU.

The bug is perfectly reproducible; let me know what other info I can provide.
Comment 1 Jan Vesely 2013-11-15 22:35:41 UTC
(In reply to comment #0)
> This is a regression from 3.12 to 3.13 (still present on nouveau master).
should have been 3.11 to 3.12.
Comment 2 Ilia Mirkin 2013-11-15 22:38:00 UTC
A git bisect would be useful. You should be able to just run the bisect under drivers/gpu/drm/nouveau, i.e.

git bisect start v3.12 v3.11 -- drivers/gpu/drm/nouveau
Comment 3 Jan Vesely 2013-11-16 03:23:17 UTC
(In reply to comment #2)
> A git bisect would be useful. You should be able to just run the bisect
> under drivers/gpu/drm/nouveau, i.e.
> 
> git bisect start v3.12 v3.11 -- drivers/gpu/drm/nouveau

$ git bisect log
# bad: [5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52] Linux 3.12
# good: [6e4664525b1db28f8c4e1130957f70a94c19213e] Linux 3.11
git bisect start 'v3.12' 'v3.11' 'drivers/gpu/drm/nouveau/'
# bad: [bd9c5a2016307164c419c5e24a46921c10e620a0] drm/nouveau: require contiguous bo for framebuffer
git bisect bad bd9c5a2016307164c419c5e24a46921c10e620a0
# bad: [0d69704ae348c03bc216b01e32a0e9a2372be419] gpu/vga_switcheroo: add driver control power feature. (v3)
git bisect bad 0d69704ae348c03bc216b01e32a0e9a2372be419
# bad: [baa7094355a10b432bbccacb925da4bdac861c8d] drm: const'ify ioctls table (v2)
git bisect bad baa7094355a10b432bbccacb925da4bdac861c8d
# good: [72525b3f333de54fa0c42ef87f27861e41478f1e] drm/ttm: convert to unified vma offset manager
git bisect good 72525b3f333de54fa0c42ef87f27861e41478f1e
# bad: [43387b37fa2d0f368142b8fa8c9440da92e5381b] drm/gem: create drm_gem_dumb_destroy
git bisect bad 43387b37fa2d0f368142b8fa8c9440da92e5381b
# first bad commit: [43387b37fa2d0f368142b8fa8c9440da92e5381b] drm/gem: create drm_gem_dumb_destroy


Note that the only good commit was a bit weird: plymouth won't run, gnome-shell segfaults immediately, and DRI_PRIME=1 3D apps (both glxgears and oa) showed no output. However, both were running (displaying fps, ...), and the hang was always caused by running oa (never glxgears), so I figured the black screen does not matter.
Comment 4 Ilia Mirkin 2013-11-16 03:32:38 UTC
Hmm... that commit is outside of nouveau. I'm not 100% sure that the bisect results are accurate in that case. (But all the work you did is still useful, future bisects can use the same good/bad information.)

Can you verify that 43387b37fa2d0^ is good? (i.e. commit 86e81f0e62)

If that is not the case, you can restart the bisect with

git bisect start v3.12 v3.11 72525b3f333de54 -- drivers/gpu/drm

which will include everything in drm.
Comment 5 Jan Vesely 2013-11-16 03:33:52 UTC
If oa does not hang, I also see a lot of:

[  683.090527] dmar: DRHD: handling fault status reg 3
[  683.090531] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr fe000000 
DMAR:[fault reason 05] PTE Write access is not set

There are 1400+ entries, in generally ascending address order (sometimes the address goes back or stays the same). The last one is:

[  683.093190] dmar: DRHD: handling fault status reg 3
[  683.093192] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr fe579000 
DMAR:[fault reason 05] PTE Write access is not set

I'm not sure whether there is a direct correlation (hang xor DMA errors).
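The first and last fault addresses above give a rough idea of how much memory the faulting writes cover. This quick sketch assumes 4 KiB IOMMU pages, which the report does not state. It counts the pages from the first reported fault address (fe000000) to the last one (fe579000), including both ends.

```shell
# Span between the first and last reported DMAR fault addresses.
span=$(( 0xfe579000 - 0xfe000000 ))
page=4096                      # assumed 4 KiB IOMMU page size
pages=$(( span / page + 1 ))   # +1 so both endpoint pages are counted
echo "span=$span bytes pages=$pages"
```

The result is 5738496 bytes, or 1402 pages. That is close to the "1400+ entries" reported, which is consistent with roughly one fault per page. This is an inference from the two addresses only and has not been verified against the full log.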
Comment 6 Ilia Mirkin 2013-11-16 03:36:03 UTC
As for the DMAR errors, see if disabling the IOMMU (aka VT-d for intel) helps. (It'll definitely get rid of the messages, but might not get rid of the underlying issues.)
Comment 7 Jan Vesely 2013-11-16 05:02:00 UTC
nvm, I just reproduced it on 3.11.6. It was a bit harder to hit, but after about 20 runs I got "unhandled status 0x00800000". Not a regression.
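Because the hang is intermittent, a commit marked "good" during bisection may simply not have hung yet. A rough way to quantify this is to assume that runs are independent and that each run hangs with some fixed probability p. Both are assumptions, not measurements from this report. Under them, the chance of seeing no hang in n runs, and the number of runs needed to push that chance below a threshold α, are:

```latex
P(\text{no hang in } n \text{ runs}) = (1-p)^{n},
\qquad
(1-p)^{n} \le \alpha
\;\Longleftrightarrow\;
n \ge \frac{\ln \alpha}{\ln(1-p)} \quad (\text{since } \ln(1-p) < 0).
```

As a purely illustrative figure, take p = 1/20, loosely suggested by "about 20 runs" above. A bad commit then survives 20 runs with probability 0.95^20 ≈ 0.36, and reaching α = 0.05 needs n ≥ ln 0.05 / ln 0.95 ≈ 58.4, i.e. 59 runs.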
Comment 8 Ilia Mirkin 2013-11-27 05:06:57 UTC

*** This bug has been marked as a duplicate of bug 71659 ***


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.