Bug 93101 - Radeon hang on VLC DVD playback (GL deinterlace filter?)
Summary: Radeon hang on VLC DVD playback (GL deinterlace filter?)
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-25 10:00 UTC by Thomas DEBESSE
Modified: 2015-12-10 02:52 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
syslog (short) (27.30 KB, text/plain)
2015-11-25 10:00 UTC, Thomas DEBESSE
Details
syslog (long) (214.51 KB, text/plain)
2015-11-25 10:00 UTC, Thomas DEBESSE
Details
graphical glitches (screen photo) (931.26 KB, image/jpeg)
2015-11-25 10:04 UTC, Thomas DEBESSE
Details
lspci (verbose) (3.22 KB, text/plain)
2015-11-25 10:15 UTC, Thomas DEBESSE
Details
mpv vdpau output (1.54 KB, text/plain)
2015-12-10 02:47 UTC, Thomas DEBESSE
Details
vlc vdpau output (1.32 KB, text/plain)
2015-12-10 02:48 UTC, Thomas DEBESSE
Details
dmesg (106.29 KB, text/plain)
2015-12-10 02:48 UTC, Thomas DEBESSE
Details
syslog (736.38 KB, text/plain)
2015-12-10 02:50 UTC, Thomas DEBESSE
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas DEBESSE 2015-11-25 10:00:11 UTC
Created attachment 120103 [details]
syslog (short)

Hi, this is an issue about the fact that some GPU lockup can lead to some CPU burn (for real).

Some hours ago I get a GPU lockup while I was trying to read a DVD with VLC.
The video rendering wasn't functionnal (no picture), then the GPU started to display weird things (see attached photo) then locked up.

I've joined some log, one very long syslog, and some abstract for this one (more easy to read, but I gave you the original one in case of I missed something).

To summarize, you can read lines like that in the syslog:

```
Nov 24 22:58:18 gollum gnome-session[3720]: [00007f134c173c20] avcodec decoder: Using G3DVL VDPAU Driver Shared Library version 1.0 for hardware decoding
Nov 24 22:58:18 gollum kernel: [97035.599456] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00002126
Nov 24 22:58:18 gollum kernel: [97035.599460] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0408800C
Nov 24 22:58:18 gollum kernel: [97035.599465] VM fault (0x0c, vmid 2) at page 8486, read from 'TC4' (0x54433400) (136)
Nov 24 22:58:55 gollum kernel: [97072.747472] radeon 0000:01:00.0: ring 0 stalled for more than 10088msec
Nov 24 22:58:55 gollum kernel: [97072.747483] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000059fcff last fence id 0x000000000059fd12 on ring 0)
Nov 24 22:59:04 gollum kernel: [97081.259933] WARNING: CPU: 4 PID: 23502 at /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_object.c:83 radeon_ttm_bo_destroy+0xe7/0xf0 [radeon]()
```

My system is running:

vlc 3.0.0~~git20151123+r62463+34~ubuntu15.10.1
linux-image-4.3.0-040300-generic 4.3.0-040300.201511020949
libdrm-radeon1 2.4.65+git1511161830.8913cd~gd~w
xserver-xorg-video-radeon 7.6.99+git1511170732.10b7c3~gd~w
libgl1-mesa-dri 11.2~git1511231930.e4c122~gd~w
mesa-vdpau-drivers 11.2~git1511231930.e4c122~gd~w

That is a real issue but it's not the topic of this ticket.

The really big problem is this bug almost burned my CPU. I explain.

When the bug occurred, I tried to track it. Instead of rebooting my computer I started a laptop in order to connect to my computer using ssh, and to diagnose some stuff on the living system. While the laptop were booting, I took some photo of my screen.

But suddenly, my computer shutdown itself. The CPU critical temperature was reached.

Normal operation temperature is normally between 30°C and 40°C on my system. In case of emergency, I have two regulators running on my computer. The first one raises fan speed from 128 tr/min to 1400 tr/min when temperature reaches 50°C, and the second one downclocks all the 8 core from 4.7 GHz to 1.4GHz when the temperature reaches 70°C.

Both regulators are userspace regulators. The first is the well-known fancontrol, and the other one is mine. Both works well (if I use cpuburn for example).

The fact is, when the GPU lockup occurred, something from the driver goes wrong on the CPU side. It looks like some infinite loop started on my cores, doing some extensive tasks, probably without having to deal with external components (like central memory unit) in order to never slow done the CPU.

In fact, the computer acted exactly like if I was running one cpuburn process per core using performance cpu governor during a summer noon. But there was an exception, the fan never accelerated (so it was still running at 128 tr/min when the CPU reached 90°, and the cpu was never downclocked too.

That's why I wrote this issue. When this bug occured, the system goes so wrong the CPU was on knees and no regulator was able to control the CPU fan so the CPU endlessly heating.

Hopefully, the internal CPU temperature protection shutdown automatically my computer to save itself. But if someone use a CPU with a faulty temperature safety mechanisme, this GPU lockup can lead to a CPU burn for real !
Comment 1 Thomas DEBESSE 2015-11-25 10:00:58 UTC
Created attachment 120104 [details]
syslog (long)
Comment 2 Thomas DEBESSE 2015-11-25 10:04:02 UTC
Created attachment 120105 [details]
graphical glitches (screen photo)
Comment 3 Thomas DEBESSE 2015-11-25 10:15:01 UTC
The GPU is an AMD radeon R9 390X, lspci says:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon R9 290X] [1002:67b0] (rev 80) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:2015]
	Flags: bus master, fast devsel, latency 0, IRQ 55
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Memory at d0000000 (64-bit, prefetchable) [size=8M]
	I/O ports at e000 [size=256]
	Memory at fea00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at fea40000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150] Advanced Error Reporting
	Capabilities: [200] #15
	Capabilities: [270] #19
	Capabilities: [2b0] Address Translation Service (ATS)
	Capabilities: [2c0] #13
	Capabilities: [2d0] #1b
	Kernel driver in use: radeon

See attached lspci log file for a more complete log.

The CPU is an AMD FX-9590, but it's not very significant.
Comment 4 Thomas DEBESSE 2015-11-25 10:15:26 UTC
Created attachment 120107 [details]
lspci (verbose)
Comment 5 Christian König 2015-11-25 10:42:06 UTC
Well to be honest this is more or less the expected behavior.

The GPU crashed in a not recoverable way and so we ended in an endless loop trying to reset the hardware over and over again when userspace did a command submission.

This is unfortunate, but crashing the kernel is always possible and exactly this is the reason why there are usually dedicated micro-controllers for temperature and fan control.
Comment 6 Thomas DEBESSE 2015-12-08 02:49:16 UTC
Hi, thank you for your answer, does this mean there is no fix possible to prevent the driver to burn the cpu in case of hang?
Comment 7 Christian König 2015-12-08 09:03:29 UTC
(In reply to Thomas DEBESSE from comment #6)
> Hi, thank you for your answer, does this mean there is no fix possible to
> prevent the driver to burn the cpu in case of hang?

No, not really.

We should of course try to fix the hang and that's why I think we should keep this bug open, but driver bugs like this can always happen and not only in radeon but other kernel drivers as well.

As I said that's why dedicated hardware independent of the main CPU is usually used for things like this.

IIRC there is also a rather big warning in the fancontrol manual about this.
Comment 8 Thomas DEBESSE 2015-12-09 05:45:03 UTC
OK, so, about the hang itself, it looks like related to VDPAU if I read fine these lines (see after), may I create a special thread for this one or do we talk about that in the current thread (and have I to edit the title) ?

---8<-------------
Nov 24 22:58:06 gollum gnome-session[3720]: [0000000001ae7140] core libvlc: Lancement de vlc avec l'interface par defaut. Utilisez «cvlc» pour demarrer VLC sans interface.
Nov 24 22:58:15 gollum gnome-session[3720]: libdvdnav: Using dvdnav version 5.0.1
Nov 24 22:58:15 gollum gnome-session[3720]: libdvdnav: DVD disk reports itself with Region mask 0x00c00000. Regions: 1 2 3 4 5 6
Nov 24 22:58:17 gollum gnome-session[3720]: libdvdread: Found 2 VTS's
Nov 24 22:58:17 gollum gnome-session[3720]: [00007f135c0011c0] core input error: ES_OUT_RESET_PCR called
Nov 24 22:58:18 gollum gnome-session[3720]: [00007f134c173c20] avcodec decoder: Using G3DVL VDPAU Driver Shared Library version 1.0 for hardware decoding
Nov 24 22:58:18 gollum gnome-session[3720]: [00007f13300cd7c0] deinterlace filter error: unsupported chroma VDV0
Nov 24 22:58:18 gollum kernel: [97035.599447] radeon 0000:01:00.0: GPU fault detected: 146 0x04c4880c
Nov 24 22:58:18 gollum kernel: [97035.599456] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00002126
Nov 24 22:58:18 gollum kernel: [97035.599460] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0408800C
Nov 24 22:58:18 gollum kernel: [97035.599465] VM fault (0x0c, vmid 2) at page 8486, read from 'TC4' (0x54433400) (136)
------------->8---
Comment 9 Christian König 2015-12-09 18:32:33 UTC
(In reply to Thomas DEBESSE from comment #8)
> OK, so, about the hang itself, it looks like related to VDPAU if I read fine
> these lines (see after), may I create a special thread for this one or do we
> talk about that in the current thread (and have I to edit the title) ?

Changing the topic of this one should be fine.

And yes, that sounds like some kind of video playback problem, but it's not necessary VDPAU related.

> Nov 24 22:58:18 gollum gnome-session[3720]: [00007f13300cd7c0] deinterlace
> filter error: unsupported chroma VDV0

Where is this error message coming from? Is that reproducible? Does pure VDPAU playback using mpv works.

The problem seems to be that something tries to use a deinterlace filter, most likely using OpenGL instead of VDPAU and that filter is what is causing the crash.
Comment 10 Thomas DEBESSE 2015-12-09 21:01:53 UTC
(In reply to Christian König from comment #9)
> > Nov 24 22:58:18 gollum gnome-session[3720]: [00007f13300cd7c0] deinterlace
> > filter error: unsupported chroma VDV0
> 
> Where is this error message coming from?

I guess it's from VLC, all info from VLC where marked as 'gnome-session', see for example the « core libvlc: Lancement de vlc avec l'interface par defaut. Utilisez «cvlc» pour demarrer VLC sans interface. » message which is marked from 'gnome-session' too and I'm 100% sure it comes from VLC.

> Is that reproducible?

I will try to reproduce.

> Does pure VDPAU playback using mpv works.

I will try.

> The problem seems to be that something tries to use a deinterlace filter,
> most likely using OpenGL instead of VDPAU and that filter is what is causing
> the crash.

Perhaps !
Comment 11 Thomas DEBESSE 2015-12-10 02:47:04 UTC
> Is that reproducible?

Yes it is.

> Does pure VDPAU playback using mpv works.

Yes without crashing, if I did it right (see log).

MPV said that:

VO: [vdpau] 720x576 => 768x576 yuv420p
[vo/vdpau] Compositing window manager detected. Assuming timing info is inaccurate.

VLC said that:

[00007f11a40009d8] core input error: ES_OUT_RESET_PCR called
[00007f11a40009d8] core input error: ES_OUT_RESET_PCR called
[00007f119c175288] avcodec decoder: Using G3DVL VDPAU Driver Shared Library version 1.0 for hardware decoding
[00007f11880a7398] deinterlace filter error: unsupported chroma VDV0
[00007f11a40009d8] core input error: ES_OUT_SET_(GROUP_)PCR  is called too late (pts_delay increased to 300 ms)
[00007f11a40009d8] core input error: ES_OUT_RESET_PCR called

And the system hang. The driver tried to reset itself, but let the system in an unusable state (but it was enough to log stuff).

This time I caught that stuff on syslog:

Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) Backtrace:
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 0: /usr/bin/X (xorg_backtrace+0x4e) [0x55f48bc7a68e]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x55f48bb36152]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 2: /usr/lib/xorg/modules/input/evdev_drv.so (0x7fc39ebf4000+0x60a7) [0x7fc39ebfa0a7]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7fc39ebf4000+0x687d) [0x7fc39ebfa87d]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 4: /usr/bin/X (0x55f48bac6000+0x96ac8) [0x55f48bb5cac8]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 5: /usr/bin/X (0x55f48bac6000+0xbfc92) [0x55f48bb85c92]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 6: /lib/x86_64-linux-gnu/libc.so.6 (0x7fc3a950c000+0x352f0) [0x7fc3a95412f0]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 7: /lib/x86_64-linux-gnu/libpthread.so.0 (pthread_cond_wait+0xc0) [0x7fc3a90f0da0]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 8: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (0x7fc3a302c000+0x738e4b) [0x7fc3a3764e4b]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 9: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (0x7fc3a302c000+0x739e8e) [0x7fc3a3765e8e]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 10: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (0x7fc3a302c000+0x69b2b7) [0x7fc3a36c72b7]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 11: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (0x7fc3a302c000+0x75693f) [0x7fc3a378293f]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 12: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (0x7fc3a302c000+0x20861c) [0x7fc3a323461c]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 13: /usr/lib/xorg/modules/libglamoregl.so (glamor_block_handler+0x4b) [0x7fc3a48f484b]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 14: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7fc3a5794000+0x46bf2) [0x7fc3a57dabf2]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 15: /usr/bin/X (_CallCallbacks+0x34) [0x55f48bb23654]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 16: /usr/bin/X (WriteToClient+0x244) [0x55f48bc7e2d4]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 17: /usr/bin/X (0x55f48bac6000+0x188758) [0x55f48bc4e758]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 18: /usr/bin/X (0x55f48bac6000+0x5818f) [0x55f48bb1e18f]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 19: /usr/bin/X (0x55f48bac6000+0x5c34b) [0x55f48bb2234b]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 20: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf0) [0x7fc3a952ca40]
Dec 10 03:30:17 gollum /usr/lib/gdm/gdm-x-session[3819]: (EE) 21: /usr/bin/X (_start+0x29) [0x55f48bb0c6c9]
Comment 12 Thomas DEBESSE 2015-12-10 02:47:42 UTC
Created attachment 120444 [details]
mpv vdpau output
Comment 13 Thomas DEBESSE 2015-12-10 02:48:01 UTC
Created attachment 120445 [details]
vlc vdpau output
Comment 14 Thomas DEBESSE 2015-12-10 02:48:49 UTC
Created attachment 120446 [details]
dmesg
Comment 15 Thomas DEBESSE 2015-12-10 02:50:48 UTC
Created attachment 120447 [details]
syslog


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.