Bug 41585

Summary:

X freeze and PGRAPH errors in dmesg

Product:

xorg

Reporter:

Marco Albarelli <motosauro>

Component:

Driver/nouveau

Assignee:

Nouveau Project <nouveau>

Status:

RESOLVED INVALID

QA Contact:

Xorg Project Team <xorg-team>

Severity:

major

Priority:

medium

CC:

jeremyhu

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

2011BRB_Reviewed

Attachments:

Description	Flags
Complete dmesg	none
Complete dmesg after the freeze	none
Xorg log after the freeze	none
Complete dmesg after the freeze	none
Xorg log after the freeze	none
Log: Piglit sanity.tests failing	none

Description Marco Albarelli 2011-10-08 03:15:22 UTC

Created attachment 52113 [details]
Complete dmesg

Hi,
The system hung while I was watching a movie in VLC on one screen.
I have a dual-monitor setup: one 1920x1080 and one 768x1366 (rotated 90° left).
DM is KDE 4.6.

The kernel kept working: I could log in through ssh, and audio kept playing.

I don't know how to reproduce the bug, but it happens occasionally. Also, the system sometimes fails to disable the video output when suspending, and I have to reboot manually, but this might be an unrelated bug.

If you need more info, please go ahead and ask.
Thanks in advance

athlonno ~ # uname -a
Linux athlonno 2.6.39-gentoo-r3 #6 SMP Sun Aug 21 22:54:31 CEST 2011 x86_64 AMD Athlon(tm) II X4 645 Processor AuthenticAMD GNU/Linux

athlonno ~ # emerge -s nouveau
Searching...    
[ Results for search key : nouveau ]
[ Applications found : 2 ]

*  x11-base/nouveau-drm [ Masked ]
      Latest version available: 20110820
      Latest version installed: [ Not Installed ]
      Size of files: 1,778 kB
      Homepage:      http://nouveau.freedesktop.org/
      Description:   Nouveau DRM Kernel Modules for X11
      License:       MIT

*  x11-drivers/xf86-video-nouveau
      Latest version available: 0.0.16_pre20110801
      Latest version installed: 0.0.16_pre20110801
      Size of files: 131 kB
      Homepage:      http://nouveau.freedesktop.org/
      Description:   Accelerated Open Source driver for nVidia cards
      License:       MIT

athlonno ~ # lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge Alternate
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:07.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 3)
00:09.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 4)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
01:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. Device 3403 (rev 01)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
03:00.0 USB Controller: Device 1b73:1000 (rev 01)
04:00.0 VGA compatible controller: nVidia Corporation GT200 [GeForce 210] (rev a2)
04:00.1 Audio device: nVidia Corporation High Definition Audio Controller (rev a1)

Comment 1 Marco Albarelli 2011-10-09 10:44:02 UTC

Created attachment 52150 [details]
Complete dmesg after the freeze

Comment 2 Marco Albarelli 2011-10-09 10:44:43 UTC

Created attachment 52151 [details]
Xorg log after the freeze

Comment 3 Jeremy Huddleston Sequoia 2011-10-09 17:42:50 UTC

How reproducible is this?

It's not clear based just on the backtrace if we're spinning in kernel land in 
the ioctl or if that's just where we happened to be at the time.  Could you use 
strace to see what syscalls are being generated.  That will give an idea if 
it's spinning in the kernel or somewhere in userland:

sudo strace -p <pid of server process>

Comment 4 Marco Albarelli 2011-10-10 01:25:13 UTC

(In reply to comment #3)
> How reproducible is this?
> 
> It's not clear based just on the backtrace if we're spinning in kernel land in 
> the ioctl or if that's just where we happened to be at the time.  Could you use 
> strace to see what syscalls are being generated.  That will give an idea if 
> it's spinning in the kernel or somewhere in userland:
> 
> sudo strace -p <pid of server process>

Thanks for the suggestion, but strace dumps a huge amount of data and the bug is very random.
It happened once while watching a movie in VLC, and another time while using the video chat function of Google Plus in Firefox 7.
Both cases happened after a couple of days of uptime (with suspends: I never turn the PC off).

I've been able to reproduce the freeze.
I have a full strace for it, but it's ~500 MB uncompressed, so posting it isn't practical. Is there a specific part you need?

This is the point where the freeze happened:

setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
read(43, "\211\10\10\0\3\0\200\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096) = 32
ioctl(8, 0x40086482, 0x7fff7b42b790)    = 0
ioctl(8, 0x40046483, 0x7fff7b42b7c0)    = 0
ioctl(8, 0xc0406481, 0x7fff7b42b2f0)    = 0
ioctl(8, 0xc0406481, 0x7fff7b42b920)    = 0
writev(26, [{"f\0d\0\371\231p\1\10\232p\1\266(.\0\355\1Z\1\330\0\207\0\0\0@\1\262\4\0\4", 32}], 1) = 32
writev(43, [{"J%\220\3\2\0\0\0\3\0\200\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 96}], 1) = 96
read(43, 0x1887520, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 6 8 9 10 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 33 34 35 36 37 38 39 40 41 42 43 44 45 47 48], NULL, NULL, {0, 0}) = 0 (Timeout)
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
ioctl(8, 0xc0406481, 0x7fff7b42b6a0)    = 0
ioctl(8, 0x40086482, 0x7fff7b42b730)    = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086482, 0x7fff7b42b730)    = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086482, 0x7fff7b42b730)    = ? ERESTARTSYS (To be restarted)
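
A trace this large does not have to be read by hand. As a minimal sketch (not part of the original report), the following Python scans strace output for the first ioctl that is restarted several times in a row with the same fd and request number, which is the pattern in the excerpt above. The function name `find_restart_loop`, the `min_repeats` threshold, and the exact line format it matches are assumptions based on that excerpt.

```python
import re

# Matches strace lines such as:
#   ioctl(8, 0x40086482, 0x7fff7b42b730)    = ? ERESTARTSYS (To be restarted)
RESTARTED_IOCTL = re.compile(
    r"^ioctl\((\d+), (0x[0-9a-fA-F]+), .*\)\s+= \? ERESTARTSYS"
)


def find_restart_loop(lines, min_repeats=3):
    """Return (line_index, fd, request) for the first run of at least
    min_repeats restarted ioctls with the same fd and request, or None."""
    run_key, run_start, run_len = None, None, 0
    for i, line in enumerate(lines):
        m = RESTARTED_IOCTL.match(line)
        if m is None:
            # Signal delivery, rt_sigreturn and blank lines sit between the
            # retries in the excerpt, so they do not break a run.
            if (not line.strip()
                    or line.startswith("--- SIG")
                    or line.startswith("rt_sigreturn(")):
                continue
            run_key, run_len = None, 0
            continue
        key = (int(m.group(1)), int(m.group(2), 16))
        if key == run_key:
            run_len += 1
        else:
            run_key, run_start, run_len = key, i, 1
        if run_len >= min_repeats:
            return run_start, key[0], key[1]
    return None
```

Passing an open file object instead of a list lets the function walk a very large log line by line without loading it all into memory.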


Attaching dmesg and Xorg.log.

Comment 5 Marco Albarelli 2011-10-10 01:26:01 UTC

Created attachment 52164 [details]
Complete dmesg after the freeze

Comment 6 Marco Albarelli 2011-10-10 01:26:36 UTC

Created attachment 52165 [details]
Xorg log after the freeze

Comment 7 Marcin Slusarz 2011-10-10 09:42:46 UTC

It's looping on the DRM_NOUVEAU_GEM_CPU_PREP ioctl, which is periodically interrupted by the SIGALRM signal. On the kernel side, it's hanging on a fence in __nouveau_fence_wait.

It's a typical symptom of a GPU lockup.

To fix this, someone needs to figure out why the GPU locked up and/or how to reset the GPU.
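
For readers wondering how the raw request number 0x40086482 in the strace output maps to DRM_NOUVEAU_GEM_CPU_PREP, here is a rough Python sketch of the decoding. It assumes the generic Linux ioctl bit layout used on x86 (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31), the DRM type byte 'd' (0x64), and a driver-private command base of 0x40. The small command table is written from memory of nouveau_drm.h and should be checked against the header for the kernel in use; `decode_ioctl` is a hypothetical helper name.

```python
DRM_IOCTL_TYPE = 0x64     # 'd', the ioctl type byte used by DRM
DRM_COMMAND_BASE = 0x40   # driver-private DRM ioctls start at this nr

# Partial table of nouveau driver-private commands (assumed, see lead-in).
NOUVEAU_COMMANDS = {
    0x40: "DRM_NOUVEAU_GEM_NEW",
    0x41: "DRM_NOUVEAU_GEM_PUSHBUF",
    0x42: "DRM_NOUVEAU_GEM_CPU_PREP",
    0x43: "DRM_NOUVEAU_GEM_CPU_FINI",
    0x44: "DRM_NOUVEAU_GEM_INFO",
}

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioctl(request):
    """Split an ioctl request number into its fields and, for DRM
    driver-private requests, look up the nouveau command name."""
    nr = request & 0xFF
    ioc_type = (request >> 8) & 0xFF
    size = (request >> 16) & 0x3FFF
    direction = DIRECTIONS[(request >> 30) & 0x3]
    name = None
    if ioc_type == DRM_IOCTL_TYPE and nr >= DRM_COMMAND_BASE:
        name = NOUVEAU_COMMANDS.get(nr - DRM_COMMAND_BASE)
    return {"nr": nr, "type": ioc_type, "size": size,
            "dir": direction, "name": name}


if __name__ == "__main__":
    # The three request numbers that appear in the strace excerpt.
    for req in (0x40086482, 0x40046483, 0xC0406481):
        print(hex(req), decode_ioctl(req))
```

Under these assumptions, 0x40086482 is a write-direction DRM request with an 8-byte argument and nr 0x82, i.e. command 0x42 above the driver-private base.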

Comment 8 Marco Albarelli 2011-10-11 01:09:19 UTC

(In reply to comment #7)
cut...
> To fix this, someone needs to figure out why GPU locked up and/or how to reset
> the GPU.

If needed, I have the strace dump, which is ~19 MB of compressed text. I can attach it here or make it available through one of my external servers.

Comment 9 Marcin Slusarz 2011-10-11 09:32:27 UTC

It's not enough. Sorry.

You need to find a quick and reliable way to reproduce it, like running a rendercheck or piglit test.

Comment 10 dirkneukirchen 2011-10-29 05:45:21 UTC

I think I see the same bug.

uname -a
Linux tenchi-htpc 3.1.0-1-generic #1-Ubuntu SMP Tue Oct 18 21:46:02 UTC 2011 i686 athlon i386 GNU/Linux

System:
- Ubuntu oneiric with current xorg-edgers
- Dual Monitor Setup (1680x1050,1280x1024)
- videocard is onboard 8200
- the video player (Totem) freezes repeatedly, sooner if I skip forwards/backwards; updating to xorg-edgers did not help
- filed a Launchpad bug: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/877626

(In reply to comment #9)
> It's not enough. Sorry.
> 
> You need to find quick and reliable way to reproduce it, like running some 
> rendercheck or piglit test.

I installed the libs and got piglit from the git repo.

env PIGLIT_BUILD_DIR=`pwd` ./piglit-run.py tests/sanity.tests results/sanity.results
[Sat Oct 29 14:26:02 2011] ::  running :: glean/basic
[Sat Oct 29 14:26:15 2011] ::     pass :: glean/basic
[Sat Oct 29 14:26:15 2011] ::  running :: glean/readPixSanity
[Sat Oct 29 14:29:07 2011] ::     fail :: glean/readPixSanity

Thank you for running Piglit!
Results have been written to results/sanity.results/main

Dmesg output:
[ 1079.152099] [drm] nouveau 0000:02:00.0: PGRAPH - TRAP_TPDMA - TP0: Unhandled ustatus 0x00020000
[ 1079.152105] [drm] nouveau 0000:02:00.0: PGRAPH - TRAP
[ 1079.152112] [drm] nouveau 0000:02:00.0: PGRAPH - ch 4 (0x0004d6c000) subc 5 class 0x8397 mthd 0x19d0 data 0x00000001
[ 1079.466136] [drm] nouveau 0000:02:00.0: PGRAPH - TRAP_TPDMA - TP0: Unhandled ustatus 0x00020000
[ 1079.466146] [drm] nouveau 0000:02:00.0: PGRAPH - TRAP
[ 1079.466158] [drm] nouveau 0000:02:00.0: PGRAPH - ch 4 (0x0004d6c000) subc 5 class 0x8397 mthd 0x19d0 data 0x00000001
[ 1079.764620] [drm] nouveau 0000:02:00.0: PGRAPH - TRAP_TPDMA - TP0: Unhandled ustatus 0x00020000
[ 1079.764631] [drm] nouveau 0000:02:00.0: PGRAPH - TRAP
[ 1079.764643] [drm] nouveau 0000:02:00.0: PGRAPH - ch 4 (0x0004d6c000) subc 5 class 0x8397 mthd 0x19d0 data 0x00000001
[ 1080.064785] [drm] nouveau 0000:02:00.0: PGRAPH - TRAP_TPDMA - TP0: Unhandled ustatus 0x00020000

and more identical lines.
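
Since the trap lines repeat, it can help to collapse them into counts per channel, class, method and data value. The following is a small Python sketch, assuming the "PGRAPH - ch ... subc ... class ... mthd ... data ..." line format shown above; `summarize_pgraph` is a hypothetical helper name, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Matches the method-trap lines from the dmesg excerpt, e.g.
#   PGRAPH - ch 4 (0x0004d6c000) subc 5 class 0x8397 mthd 0x19d0 data 0x00000001
PGRAPH_METHOD = re.compile(
    r"PGRAPH - ch (?P<ch>\d+) \((?P<inst>0x[0-9a-f]+)\) subc (?P<subc>\d+) "
    r"class (?P<cls>0x[0-9a-f]+) mthd (?P<mthd>0x[0-9a-f]+) "
    r"data (?P<data>0x[0-9a-f]+)"
)


def summarize_pgraph(lines):
    """Count trap lines per (channel, class, method, data) tuple."""
    counts = Counter()
    for line in lines:
        m = PGRAPH_METHOD.search(line)
        if m:
            counts[(int(m["ch"]), m["cls"], m["mthd"], m["data"])] += 1
    return counts
```

Feeding it the output of dmesg gives one entry per distinct failing method, which is shorter to paste into a report than the raw repeated lines.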
Attaching sanity.results/main

Are any further logs needed?

Comment 11 dirkneukirchen 2011-10-29 05:47:16 UTC

Created attachment 52884 [details]
Log: Piglit sanity.tests failing

Comment 12 Ilia Mirkin 2013-08-18 18:09:07 UTC

It appears that this bug report has lain dormant for quite a while. Sorry we haven't gotten to it. Since we fix bugs all the time, chances are pretty good that your issue has been fixed with the latest software. Please give it a shot. (Linux kernel 3.10.7, xf86-video-nouveau 1.0.9, mesa 9.1.6, or their git versions.) If upgrading to the latest isn't an option for you, your distro's bugzilla is probably the right destination for your bug report.

In an effort to clean up our bug list, we're pre-emptively closing all bugs that haven't seen updates since 2011. If the original issue remains, please make sure to provide fresh info, see http://nouveau.freedesktop.org/wiki/Bugs/ for what we need to see, and re-open this one.

Thanks,

The Nouveau Team
