Bug 103185 - Intel GPU hang using kodi when VAAPI is enabled
Summary: Intel GPU hang using kodi when VAAPI is enabled
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-10 09:39 UTC by hasu0bs
Modified: 2018-04-25 08:15 UTC

See Also:
i915 platform: SNB
i915 features: GPU hang


Attachments

Description hasu0bs 2017-10-10 09:39:31 UTC
Sorry if this has been reported before, I have no clue how to check if this is a duplicate :P

Here it goes: Kodi freezes completely when I watch SD content with VAAPI enabled. The operating system is LibreELEC. Here are the corresponding bug reports:

Kodi: https://trac.kodi.tv/ticket/17138
Invalid, because it's a LibreELEC issue?

LibreELEC: https://forum.libreelec.tv/thread/5349-libreelec-8-9-kernel-regression-vaapi/
Invalid, because it's a kernel/Intel driver issue?

Here is all the info requested:
LibreELEC:~ # uname -m
x86_64
LibreELEC:~ # uname -r
4.13.4
LibreELEC:~ # xrandr --verbose | paste
http://sprunge.us/effM
LibreELEC:~ # dmesg | paste
http://sprunge.us/eYSc
LibreELEC:~ # cat /sys/class/drm/card0/error | gzip > error.gz
LibreELEC:~ # echo 1 > /sys/devices/pci0000:00/0000:00:02.0/rom
LibreELEC:~ # cat /sys/devices/pci0000:00/0000:00:02.0/rom > vbios.dump
LibreELEC:~ # echo 0 > /sys/devices/pci0000:00/0000:00:02.0/rom

Here are the two compressed files:
https://www.dropbox.com/s/tmfqnl609olgq9l/error.gz?dl=1
https://www.dropbox.com/s/tgjasvmom9w685v/vbios.dump?dl=1

Unfortunately, the 2M log buffer wasn't enough to capture everything after restarting the video file ~20 times. If necessary, I can try again with an even larger buffer.

Additional Kodi log, if it helps anyone: http://sprunge.us/RfDH
Comment 1 Elizabeth 2017-10-10 17:06:49 UTC
IOMMU enabled?: -1
Could you please try booting with intel_iommu=igfx_off on the kernel command line (e.g. via grub)?
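
To confirm that the parameter actually reached the kernel after a reboot, the running command line can be checked. This is a minimal sketch in POSIX sh; the helper name has_param is made up for illustration, and it only assumes that /proc/cmdline is readable, as it is in the transcripts in this report.

```sh
# has_param CMDLINE PARAM (hypothetical helper): succeed if PARAM appears
# as a whole space-separated word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel.
if has_param "$(cat /proc/cmdline 2>/dev/null)" "intel_iommu=igfx_off"; then
    echo "intel_iommu=igfx_off is set"
else
    echo "intel_iommu=igfx_off is not set"
fi
```

Matching on whole words avoids a false positive from a longer parameter that merely contains the same text.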

From error state:
...
Active (rcs0) [17]:
    00000000_0084d000     4096 3e 02 [ a7a7 00 00 00 00 ] 00 dirty LLC
    00000000_01821000     4096 3f 00 [ a7a7 00 00 00 00 ] 00 dirty LLC
    00000000_01038000  8294400 7e 00 [ a7a7 00 00 00 00 ] 00 X dirty uncached (name: 2) (fence: 1)
...
Pinned (global) [12]:
    00000000_01038000  8294400 7e 00 [ a7a7 00 00 00 00 ] 00 X dirty uncached (name: 2) (fence: 1)
    00000000_7fdfe000     8192 41 00 [ 00 00 00 00 00 ] 00 dirty LLC
...
    00000000_0084e000  8294400 7e 00 [ 00 00 00 00 00 ] 00 X dirty uncached (name: 3) (fence: 0)
... ...

[0ff0] 00000000 00000000 00000000 00000000
  pid 657, ban score 0, seqno        3:0000a7a7, emitted 5498856ms ago, head 00005128, tail 000051d8
 seqno 0x0000a7a7 for i915/signal:0 [81]
ring (rcs0) at 0x00000000_00002000; HEAD points to: 0x00000000_00007128
0x00002000:      0x7a000003: PIPE_CONTROL
0x00002004:      0x00004000:    qword write, 
0x00002008:      0x7fdfd084:    destination address
0x0000200c:      0x00000000:    immediate dword low
0x00002010:      0x00000000:    immediate dword high
0x00002014:      0x00000000: MI_NOOP
0x00002018:      0x7a000002: PIPE_CONTROL
0x0000201c:      0x00144c1c:    qword write, cs stall, tlb invalidate, instruction cache invalidate, texture cache invalidate, vf fetch invalidate, constant cache invalidate, state cache invalidate, 
0x00002020:      0x7fdfd084:    
0x00002024:      0x00000000:    
0x00002028:      0x18800100: MI_BATCH_BUFFER_START
0x0000202c:      0x051da000:    dword 1
0x00002030:      0x7a000003: PIPE_CONTROL
0x00002034:      0x00100002:    no write, cs stall, stall at scoreboard, 
0x00002038:      0x7fdfd084:    destination address
0x0000203c:      0x00000000:    immediate dword low
0x00002040:      0x00000000:    immediate dword high
0x00002044:      0x00000000: MI_NOOP
0x00002048:      0x7a000003: PIPE_CONTROL
0x0000204c:      0x00004000:    qword write, 
0x00002050:      0x7fdfd084:    destination address
0x00002054:      0x00000000:    immediate dword low
0x00002058:      0x00000000:    immediate dword high
0x0000205c:      0x00000000: MI_NOOP
0x00002060:      0x7a000002: PIPE_CONTROL
0x00002064:      0x00101001:    no write, cs stall, render target cache flush, depth cache flush, 
...
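
For anyone reading the ring dump above, two pieces of arithmetic help when walking it by hand. The sketch below assumes that the low byte of a PIPE_CONTROL header holds the total command length in dwords minus two, which is consistent with the spacing of the commands in the dump, and that the "head" value in the request line is an offset from the ring start. The function names cmd_dwords and ring_addr are made up for illustration.

```sh
# cmd_dwords HEADER (hypothetical helper): total length in dwords of a
# PIPE_CONTROL, assuming bits 7:0 of the header store (length - 2).
cmd_dwords() {
    echo $(( ($1 & 0xff) + 2 ))
}

# ring_addr BASE OFFSET (hypothetical helper): absolute address of a ring
# offset, printed as 8 hex digits.
ring_addr() {
    printf '0x%08x\n' $(( $1 + $2 ))
}

cmd_dwords 0x7a000003     # PIPE_CONTROL at 0x00002000 -> 5
cmd_dwords 0x7a000002     # PIPE_CONTROL at 0x00002018 -> 4
ring_addr 0x2000 0x5128   # ring start + "head 00005128" -> 0x00007128
```

The last result agrees with the "HEAD points to" address printed in the error state.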
Comment 2 hasu0bs 2017-10-11 10:48:52 UTC
OK, I finally got it to crash again. Here are the logs:

Head of dmesg, in case the 20M buffer wasn't enough: http://sprunge.us/NBST


LibreELEC:~/.kodi/temp # cat /proc/cmdline
root=/dev/ram0 rdinit=/init usbcore.autosuspend=-1 BOOT_IMAGE=/KERNEL boot=UUID=3108-2154 live quiet tty vga=current drm.debug=0x1e intel_iommu=igfx_off log_buf_len=20M
LibreELEC:~/.kodi/temp # xrandr --verbose | paste
http://sprunge.us/SLdV
LibreELEC:~/.kodi/temp # dmesg | gzip > dmesg_iommu.gz
LibreELEC:~/.kodi/temp # ls
archive_cache   dmesg_iommu.gz  kodi.log        scrapers        temp
LibreELEC:~/.kodi/temp # paste kodi.log
http://sprunge.us/CcfN
LibreELEC:~/.kodi/temp # cat /sys/class/drm/card0/error | gzip > error_iommu.gz
LibreELEC:~/.kodi/temp # echo 1 > /sys/devices/pci0000:00/0000:00:02.0/rom
LibreELEC:~/.kodi/temp # cat /sys/devices/pci0000:00/0000:00:02.0/rom > vbios_iommu.dump
LibreELEC:~/.kodi/temp # echo 0 > /sys/devices/pci0000:00/0000:00:02.0/rom

https://www.dropbox.com/s/opnidg41i1mes93/dmesg_iommu.gz?dl=1
https://www.dropbox.com/s/mj923ehn5losjrn/error_iommu.gz?dl=1
https://www.dropbox.com/s/hpsdu69ic3u3t24/vbios_iommu.dump?dl=1
Comment 3 hasu0bs 2017-10-24 09:25:07 UTC
I just noticed that I didn't mention the file to reproduce the issue. It is right here:
https://www.dropbox.com/s/t5gp1dbwbuh5ssb/Die%20Ruhrpottwache-SAT.12017-02-1712-59.ts?dl=1

I have to restart it several times before it crashes.
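
Since the hang only shows up after several restarts, it may help to poll the error state file and save it as soon as a hang is recorded. This is only a sketch: the helper names are made up, and it assumes that the file is empty or begins with a "no error state collected" style message while no hang has been captured, which may differ between kernel versions.

```sh
# hang_recorded FILE (hypothetical helper): succeed if FILE looks like it
# holds a captured error state. Assumption: without a hang the file is
# empty or starts with "no error state collected".
hang_recorded() {
    [ -r "$1" ] || return 1
    first=$(head -n 1 "$1")
    case "$first" in
        ""|*[Nn]"o error state collected"*) return 1 ;;
        *) return 0 ;;
    esac
}

# wait_for_hang FILE DEST [INTERVAL] (hypothetical helper): poll FILE every
# INTERVAL seconds (default 5), then gzip its contents into DEST.
wait_for_hang() {
    until hang_recorded "$1"; do
        sleep "${3:-5}"
    done
    gzip -c < "$1" > "$2"
}

# Example, left commented out because it blocks until a hang occurs:
# wait_for_hang /sys/class/drm/card0/error error.gz
```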
Comment 4 Jani Saarinen 2018-03-29 07:10:31 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on it so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 5 Jani Saarinen 2018-04-25 08:15:10 UTC
Closing; please re-open if the issue still exists.

