Bug 80865

Summary: [NVE7] Hard hang (GPC0/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS)
Product: xorg
Reporter: tethys
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: NEW
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: mbarrera
Version: unspecified
Hardware: Other
OS: All
Attachments:
  Screenshot of log output (flags: none)
  Relevant parts of /var/log/messages (flags: none)
  text/plain snippet of /var/log/messages (flags: none)

Description tethys 2014-07-03 18:31:49 UTC
Created attachment 102212 [details]
Screenshot of log output

I came in to work this morning to find two of my four monitors blank
and the other two showing the attached log output (screenshot only,
because I can't find it anywhere on the filesystem).

/var/log/messages contains errors of the form:

nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 7 [0x007f730000 cubicgrid[18458]]
nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS

cubicgrid is my screensaver (part of xscreensaver)
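
For anyone triaging similar logs, here is a minimal Python sketch that pulls the channel number and the offending process out of a TRAP line. The regular expression is inferred only from the sample lines quoted above, not from the nouveau source, and the names TRAP_RE and parse_trap are hypothetical. What the hex value inside the brackets means is not stated in this report, so the sketch just stores it as a number.

```python
import re

# Pattern guessed from the TRAP line quoted in this report; it is an
# assumption, not an official description of nouveau's log format.
TRAP_RE = re.compile(
    r"TRAP ch (?P<channel>\d+) "
    r"\[0x(?P<addr>[0-9a-f]+) (?P<process>[^\[\]]+)\[(?P<pid>\d+)\]\]"
)


def parse_trap(line):
    """Return channel, hex value, process name and PID, or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    return {
        "channel": int(m.group("channel")),
        "addr": int(m.group("addr"), 16),  # meaning not given in the report
        "process": m.group("process"),
        "pid": int(m.group("pid")),
    }


sample = "nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 7 [0x007f730000 cubicgrid[18458]]"
print(parse_trap(sample))
```

Running this over every line of /var/log/messages and collecting the "process" field would show which clients trigger the traps.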

Here's some more information that may or may not be useful. If you need
any more, just ask and I'll see what I can do:

localhost:~# uname -a
Linux localhost.localdomain 3.16.0-rc2 #1 SMP Tue Jun 24 16:08:14 BST 2014 x86_64 x86_64 x86_64 GNU/Linux
localhost:~# lspci | fgrep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [NVS 510] (rev a1)
localhost:~# xrandr
Screen 0: minimum 320 x 200, current 7920 x 1920, maximum 8192 x 8192
DP-1 connected 1600x1200+2560+0 (normal left inverted right x axis y axis) 408mm x 306mm
   1600x1200     60.00*+
   1280x1024     60.02  
   1280x960      60.00  
   1024x768      60.00  
   800x600       60.32  
   640x480       60.00  
   720x400       70.08  
DP-2 connected primary 2560x1440+4160+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440     59.95*+
   1920x1200     59.88  
   1920x1080     60.00    60.00    50.00    59.94    24.00    23.98  
   1920x1080i    60.00    50.00    59.94  
   1600x1200     60.00  
   1680x1050     59.95  
   1280x1024     75.02    60.02  
   1280x800      59.81  
   1152x864      75.00  
   1280x720      60.00    50.00    59.94  
   1440x576i     50.00  
   1024x768      75.08    60.00  
   1440x480i     60.00    59.94  
   800x600       75.00    60.32  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       75.00    60.00    59.94  
   720x400       70.08  
DP-3 connected 1200x1920+6720+0 left (normal left inverted right x axis y axis) 518mm x 324mm
   1920x1200     59.95*+
   1600x1200     60.00  
   1680x1050     59.88  
   1280x1024     60.02  
   1280x960      60.00  
   1024x768      60.00  
   800x600       60.32  
   640x480       60.00  
   720x400       70.08  
DP-4 connected 2560x1440+0+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440     59.95*+
   1920x1200     59.88  
   1920x1080     60.00    60.00    50.00    59.94    24.00    23.98  
   1920x1080i    60.00    50.00    59.94  
   1600x1200     60.00  
   1680x1050     59.95  
   1280x1024     75.02    60.02  
   1280x800      59.81  
   1152x864      75.00  
   1280x720      60.00    50.00    59.94  
   1440x576i     50.00  
   1024x768      75.08    60.00  
   1440x480i     60.00    59.94  
   800x600       75.00    60.32  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       75.00    60.00    59.94  
   720x400       70.08  
localhost:~# fgrep Monitor /var/log/Xorg.0.log
[    16.578] (**) |   |-->Monitor "<default monitor>"
[    16.731] (II) NOUVEAU(0): Monitor name: SDM-S205F/K
[    16.737] (II) NOUVEAU(0): Monitor name: DELL U2711
[    16.771] (II) NOUVEAU(0): Monitor name: DELL U2412M
[    16.777] (II) NOUVEAU(0): Monitor name: DELL U2713HM
Comment 1 tethys 2014-07-03 18:34:49 UTC
Created attachment 102214 [details]
Relevant parts of /var/log/messages
Comment 2 tethys 2014-07-03 18:36:54 UTC
Created attachment 102215 [details]
text/plain snippet of /var/log/messages

The relevant parts of /var/log/messages were too big to attach uncompressed,
but they appear to consist of repeated sections like this.
Comment 3 tethys 2014-07-03 18:37:37 UTC
The machine was completely hung after this point, but did respond to
Magic SysRq requests.
Comment 4 tethys 2014-07-03 18:39:07 UTC
The two blank monitors were DP-1 and DP-2. The log output was visible
on DP-3 and DP-4.
Comment 5 Ilia Mirkin 2014-07-03 23:06:22 UTC
Those errors are most likely to happen when e.g. you're accessing a UBO out of its bounds. Or if the driver messed up binding said UBO... or some code generation error. However it should be ~harmless, except for the fact that the render in question will be messed up.

BTW, what mesa version are you using?

The DISP errors at the end are the reason (or at least a demonstration) that the GPU hung. I suspect the rest of the system was totally fine.

This also looks worrying:

[160330.118473] nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00004d6000 [PTE] from GR/GPC0/PROP_0 on channel 0x007f971000 [Xorg[981]]
[160330.118474] nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 2, recovering...

I'm guessing the errors stressed the ctxsw logic and it couldn't handle some condition.
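
To see at a glance whether a log holds only the MP traps or also the PFIFO faults, the lines can be sorted into those two groups. What follows is a minimal Python sketch, assuming the line shapes quoted in this report. MP_TRAP_RE, PFIFO_RE and summarize are hypothetical names, and the patterns are guesses from the samples rather than a nouveau log specification.

```python
import re
from collections import Counter

# Both patterns are assumptions based on the log lines quoted in this bug.
MP_TRAP_RE = re.compile(r"(GPC\d+/TPC\d+)/MP trap: (.+)$")
PFIFO_RE = re.compile(r"E\[\s*PFIFO\]")


def summarize(lines):
    """Count MP traps per (unit, error) pair and collect PFIFO error lines."""
    mp_traps = Counter()
    pfifo_lines = []
    for line in lines:
        line = line.rstrip()
        m = MP_TRAP_RE.search(line)
        if m:
            mp_traps[(m.group(1), m.group(2))] += 1
        elif PFIFO_RE.search(line):
            pfifo_lines.append(line)
    return mp_traps, pfifo_lines


sample_lines = [
    "nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 7 [0x007f730000 cubicgrid[18458]]",
    "nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS",
    "[160330.118473] nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00004d6000 [PTE] from GR/GPC0/PROP_0 on channel 0x007f971000 [Xorg[981]]",
    "[160330.118474] nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 2, recovering...",
]

traps, faults = summarize(sample_lines)
for (unit, error), count in traps.items():
    print(unit, error, count)
for line in faults:
    print("PFIFO:", line)
```

A log with MP traps but no PFIFO lines would match the "messed up render" case, while PFIFO lines point at the fault and recovery path described above.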
Comment 6 tethys 2014-07-07 12:05:34 UTC
localhost:~# rpm -qa | fgrep mesa
mesa-libGL-10.1.5-1.20140607.fc20.i686
mesa-libglapi-10.1.5-1.20140607.fc20.x86_64
mesa-filesystem-10.1.5-1.20140607.fc20.x86_64
mesa-libglapi-10.1.5-1.20140607.fc20.i686
mesa-libGLU-9.0.0-5.fc20.x86_64
mesa-libgbm-10.1.5-1.20140607.fc20.x86_64
mesa-libwayland-egl-10.1.5-1.20140607.fc20.x86_64
mesa-dri-drivers-10.1.5-1.20140607.fc20.x86_64
mesa-libEGL-10.1.5-1.20140607.fc20.x86_64
mesa-libxatracker-10.1.5-1.20140607.fc20.x86_64
mesa-libGL-10.1.5-1.20140607.fc20.x86_64
Comment 7 Mario Barrera 2015-06-22 19:07:19 UTC
This has been happening to me for a while. It is now more frequent, sometimes as often as twice a day.

Relevant parts of the log from my latest crash.

Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] TRAP ch 6 [0x00bf556000 chromium[6894]]
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC0/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC0/TPC1/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC1/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC2/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC3/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] TRAP ch 6 [0x00bf556000 chromium[6894]]
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC0/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC0/TPC1/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC1/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:40 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC2/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:44 localhost kernel: nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
Jun 22 15:12:44 localhost kernel: nouveau E[   PFIFO][0000:01:00.0] PGR engine fault on channel 5, recovering...
Jun 22 15:12:44 localhost kernel: nouveau E[     PGR][0000:01:00.0] TRAP ch 6 [0x00bf556000 chromium[6894]]
Jun 22 15:12:44 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC0/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:44 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC0/TPC1/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:44 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC1/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:44 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC2/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:12:44 localhost kernel: nouveau E[     PGR][0000:01:00.0] GPC3/TPC0/MP trap: MULTIPLE_WARP_ERRORS MEM_OUT_OF_BOUNDS
Jun 22 15:13:19 localhost libvirtd[583]: No response from client 0x7faa4465af70 after 5 keepalive messages in 30 seconds
Jun 22 15:13:19 localhost libvirtd[1206]: No response from client 0x7f37e7faea50 after 5 keepalive messages in 30 seconds
-- Reboot --
