Bug 50719 - [SNB] Intermittent GPU hang/kernel oops when decoding h264 video using VAAPI
Summary: [SNB] Intermittent GPU hang/kernel oops when decoding h264 video using VAAPI
Status: RESOLVED WORKSFORME
Alias: None
Product: libva
Classification: Unclassified
Component: intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: haihao
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-05 04:40 UTC by jackjones11
Modified: 2015-11-18 08:16 UTC
CC List: 6 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg (239.30 KB, text/plain), 2012-06-05 04:40 UTC, jackjones11
syslog (757.82 KB, application/x-gzip), 2012-06-05 04:50 UTC, jackjones11
intel_reg_dumper (11.62 KB, text/plain), 2012-06-05 04:57 UTC, jackjones11
glxinfo (12.27 KB, text/plain), 2012-06-05 04:57 UTC, jackjones11
vainfo (910 bytes, text/plain), 2012-06-05 04:58 UTC, jackjones11
xorg.log (52.56 KB, text/plain), 2012-06-05 04:59 UTC, jackjones11
syslog2 (598.72 KB, application/x-gzip), 2012-06-05 05:00 UTC, jackjones11
dmesg-2 (239.18 KB, text/plain), 2012-06-05 15:55 UTC, jackjones11
syslog-2 (37.44 KB, text/plain), 2012-06-05 15:56 UTC, jackjones11
intel_reg_dumper-2 (11.62 KB, text/plain), 2012-06-05 15:57 UTC, jackjones11
glxinfo-2 (12.54 KB, text/plain), 2012-06-05 15:58 UTC, jackjones11
vainfo-2 (910 bytes, text/plain), 2012-06-05 15:59 UTC, jackjones11
Xorg.0.log-2 (44.75 KB, text/plain), 2012-06-05 16:00 UTC, jackjones11
i915_error_state-2 (2.10 MB, text/plain), 2012-06-05 16:04 UTC, jackjones11
xbmc.log-2 (7.82 KB, text/plain), 2012-06-05 16:05 UTC, jackjones11
i915_error_state-3 (2.06 MB, text/plain), 2012-08-23 22:17 UTC, jackjones11

Description jackjones11 2012-06-05 04:40:14 UTC
Created attachment 62566 [details]
dmesg

I'll try to outline the problem as best I can, but it is intermittent and can produce different symptoms when it occurs. There may in fact be more than one bug here, so let me know if you think this is the case and want them split up into separate bug reports.

Problem occurs when watching live DVB TV via tvheadend and the xbmc PVR branch, mostly within the first few seconds of a channel change to an h264/aac stream with VAAPI decoding enabled.

The attachments dmesg, syslog, intel_reg_dumper are all from a channel change incident.  A GPU hang will occur followed by a kernel oops with "kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3364!".  System is usually still running and SSH accessible, at least for a few minutes, then freezes and requires a reboot. 

Syslog2 is from my system left overnight tuned into an h264 channel and shows frequent GPU hung messages, which appear to be recoverable hangs as the system is still up and running and SSH accessible.  Oddly it was outputting 1280x720@60Hz to my TV, when xdpyinfo, gnome and xbmc all claimed to be still running at the configured 1920x1080@50Hz, so maybe not 100% recoverable.  Not all GPU hung incidents cause this resolution change; often the GPU seems to reset itself and livetv resumes normally.  Note the seemingly random occurrences of the GPU hangs, and another one following (I'm guessing) a temporary drop in the quality of the TV signal:
Jun  5 08:51:39 htpc tvheadend[1668]: TS: TurboSight TBS 62x0 DVBT/T2 frontend 1/Central Scotland: 474,200 kHz/BBC One HD: AAC @ #6606: Continuity counter error
Jun  5 08:51:46 htpc kernel: [49719.508177] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
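
One way to test the guess that some hangs follow a signal glitch is to pair every "GPU hung" line in syslog with the most recent tvheadend error before it. The following is a minimal Python sketch under stated assumptions: the function names, the 30-second window and the assumed year (classic syslog timestamps carry none) are illustrative and are not part of the original report.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix "Mon DD HH:MM:SS host message" (no year field).
SYSLOG_RE = re.compile(r"^\s*(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+(.*)$")


def parse_line(line, year=2012):
    """Return (timestamp, message), or None for lines that do not look like syslog."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    month, day, clock, message = m.groups()
    try:
        # %b expects English month abbreviations (the default C locale).
        stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None
    return stamp, message


def hangs_after(lines, trigger, hang="GPU hung", window_s=30):
    """Return a list of (hang_time, trigger_time or None), one entry per hang line."""
    window = timedelta(seconds=window_s)  # assumed window, tune to taste
    last_trigger = None
    pairs = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if trigger in message:
            last_trigger = stamp
        elif hang in message:
            near = last_trigger is not None and stamp - last_trigger <= window
            pairs.append((stamp, last_trigger if near else None))
    return pairs


# The two syslog lines quoted above (whitespace normalised).
SAMPLE = [
    "Jun  5 08:51:39 htpc tvheadend[1668]: TS: TurboSight TBS 62x0 DVBT/T2 frontend 1/Central Scotland: 474,200 kHz/BBC One HD: AAC @ #6606: Continuity counter error",
    "Jun  5 08:51:46 htpc kernel: [49719.508177] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung",
]

for hang_time, trigger_time in hangs_after(SAMPLE, "Continuity counter error"):
    gap = (hang_time - trigger_time).total_seconds() if trigger_time else None
    print(hang_time, "gap to last continuity error:", gap)
```

Run over a whole syslog, hangs that come back with `None` are the "seemingly random" ones; the rest are candidates for being triggered by a damaged stream. The same helper works with a different `trigger` string, for example an EDID error message.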

I don't run with an xorg.conf, as recent kernel improvements to EDID/ELD querying mean I don't need one.

Gnome is in 2D mode with no compiz stuff running.

xbmc's vblank setting is at "let driver decide"

i915_error_state always has the same message after any type of incident: "unable to allocate memory".
  
System specs:
H67MA-USB3-B3/H67MA-USB3-B3
Intel i3-2100T
Ubuntu Oneiric 11.10
Linux htpc 3.4.0.20120531 #2 SMP Fri Jun 1 06:40:33 BST 2012 x86_64 x86_64 x86_64 GNU/Linux

From xorg-edgers ppa:
X.Org X Server 1.11.2.902 (1.11.3 RC 2)
libdrm 2.4.33+git20120403.43704256-0ubuntu0ricotz~oneiric
xserver-xorg-video-intel 2:2.18.0+git20120416.a1661620-0ubuntu0sarvatt~oneiric
Mesa 8.02

tvheadend from git 2.99.15g78213.dirty
xbmc-pvr 12.0-ALPHA3 Git:20120601-cdbea16(compiled Jun 4 2012)

This isn't a new bug related to the alpha status of xbmc - I've been experiencing this type of bug for around 7 months, since I bought the HW, and have been trying the latest builds of all software components since then to see if the problem was fixed.

For what it's worth there's a small clip of a recorded h264/aac stream from livetv available here http://www.mediafire.com/?jik0xt5r475b4ik - watching it doesn't cause the problem for me but I've included it to give an idea of what video does cause the problem when viewed live.
Comment 1 Chris Wilson 2012-06-05 04:47:04 UTC
You need to reproduce this on an uptodate kernel and gfx stack and attach the /sys/kernel/debug/dri/0/i915_error_state.
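
Because the error state describes the most recent hang, it helps to save a copy right after each incident. The sketch below is one hedged way to do that in Python; the function name and the timestamped file naming are assumptions for illustration, and reading the debugfs path normally requires root and a mounted debugfs.

```python
from datetime import datetime, timezone
from pathlib import Path

# Path quoted in comment 1; reading it usually needs root and a mounted debugfs.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def snapshot_error_state(dest_dir, source=ERROR_STATE):
    """Copy the error state into dest_dir under a timestamped name; return the new path."""
    data = Path(source).read_bytes()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = Path(dest_dir) / f"i915_error_state-{stamp}.txt"  # hypothetical naming scheme
    dest.write_bytes(data)
    return dest
```

Calling `snapshot_error_state("/tmp")` as root after a hang leaves a file that can be attached to the bug as-is.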
Comment 2 jackjones11 2012-06-05 04:50:24 UTC
Created attachment 62569 [details]
syslog
Comment 3 jackjones11 2012-06-05 04:57:03 UTC
Created attachment 62571 [details]
intel_reg_dumper
Comment 4 jackjones11 2012-06-05 04:57:46 UTC
Created attachment 62572 [details]
glxinfo
Comment 5 jackjones11 2012-06-05 04:58:21 UTC
Created attachment 62573 [details]
vainfo
Comment 6 jackjones11 2012-06-05 04:59:21 UTC
Created attachment 62574 [details]
xorg.log
Comment 7 jackjones11 2012-06-05 05:00:53 UTC
Created attachment 62575 [details]
syslog2
Comment 8 jackjones11 2012-06-05 15:55:30 UTC
Created attachment 62607 [details]
dmesg-2
Comment 9 jackjones11 2012-06-05 15:56:26 UTC
Created attachment 62608 [details]
syslog-2
Comment 10 jackjones11 2012-06-05 15:57:32 UTC
Created attachment 62609 [details]
intel_reg_dumper-2
Comment 11 jackjones11 2012-06-05 15:58:22 UTC
Created attachment 62610 [details]
glxinfo-2
Comment 12 jackjones11 2012-06-05 15:59:18 UTC
Created attachment 62611 [details]
vainfo-2
Comment 13 jackjones11 2012-06-05 16:00:12 UTC
Created attachment 62612 [details]
Xorg.0.log-2
Comment 14 jackjones11 2012-06-05 16:04:23 UTC
Created attachment 62613 [details]
i915_error_state-2
Comment 15 jackjones11 2012-06-05 16:05:25 UTC
Created attachment 62616 [details]
xbmc.log-2
Comment 16 jackjones11 2012-06-05 16:24:34 UTC
OK I've upgraded the gfx stack and kernel to:

X.Org X Server 1.12.2 Release Date: 2012-05-29
libdrm 2.4.34+git20120520.481234f2-0ubuntu0ricotz~precise
xserver-xorg-video-intel 2.19.0+git20120604.81f09347-0ubuntu0sarvatt~precise
Mesa 8.1-devel
Linux htpc 3.5.0-rc1.20120605 #3 SMP Tue Jun 5 16:28:27 BST 2012 x86_64 x86_64 x86_64 GNU/Linux

I hope that software stack is recent enough as it's as new as I can find using the ubuntu PPAs.  Really don't fancy the daunting prospect of compiling the gfx stack from source myself.

The -2 attachments are from a switch to an h264/aac HD channel with VAAPI and deinterlacing turned on.  No kernel oops this time, only a hung GPU.  Screen is frozen showing a still frame of the tv picture and system is still SSH accessible. USB seems to be a bit borked in 3.5.0-rc1 as neither my wired nor wireless USB keyboards work and my usual usb wireless network stick causes a kernel panic.  Luckily the USB remote control IR receiver and alternative wireless stick worked so I could test things.

Let me know if you need anything else.
Comment 17 Chris Wilson 2012-06-06 00:42:23 UTC
Thanks, the error states are spectacularly gruesome. I'm inclined to think that this is a wild write by libva-intel.
Comment 18 Gwenole Beauchesne 2012-08-20 13:58:02 UTC
Hi, what version of the libva-intel-driver do you use? Is your h.264 stream interlaced? Could the issue be reproduced with something lighter like mplayer-vaapi or gstreamer-vaapi? Thanks.

Does this only occur during MPEG-2 to H.264 switch?
Comment 19 Sean V Kelley 2012-08-22 05:08:02 UTC
(In reply to comment #16)
> OK I've upgraded the gfx stack and kernel to:
> 
> X.Org X Server 1.12.2 Release Date: 2012-05-29
> libdrm 2.4.34+git20120520.481234f2-0ubuntu0ricotz~precise
> xserver-xorg-video-intel 2.19.0+git20120604.81f09347-0ubuntu0sarvatt~precise
> Mesa 8.1-devel
> Linux htpc 3.5.0-rc1.20120605 #3 SMP Tue Jun 5 16:28:27 BST 2012 x86_64 x86_64
> x86_64 GNU/Linux
> 

Is this only seen on SNB?  I am not seeing it with my IVB with older X.Org, kernel, et al.  On a call but will post details later.
Comment 20 Gwenole Beauchesne 2012-08-22 05:25:10 UTC
@jackjones11: please update both xbmc and libva-driver-intel. The former now has a check against VA Intel driver >= 1.0.17 to enable deinterlacing. And for the latter, the final 1.0.17 has the required fixes for vaPutSurface() [used for VA/GLX] + VA_TOP_FIELD|VA_BOTTOM_FIELD. Otherwise, this indeed used to generate GPU hangs in the past.

However, please note that the actual decoding of H.264 interlaced content is not correctly supported in FFmpeg and VA driver.
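
The version gate described above can be sketched as follows. This is not xbmc's actual check, which is not shown in this thread; the vendor string layout ("Intel i965 driver - 1.0.17"), the pre-release suffix handling and all function names are assumptions made for illustration. The only value taken from the comment is the 1.0.17 threshold, with pre-releases of 1.0.17 treated as too old because the fixes are said to be in the final release.

```python
import re

# Threshold from comment 20; the trailing 1 marks a final (non pre-release) build.
MIN_DEINTERLACE_VERSION = (1, 0, 17, 1)

# Matches "1.0.17", "1.0.17-pre1" and "1.0.17.pre1" (the suffix formats are assumed).
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:[.-]?(pre\w*))?")


def parse_driver_version(vendor):
    """Return (major, minor, micro, is_final) from a vendor string, or None."""
    m = VERSION_RE.search(vendor)
    if m is None:
        return None
    major, minor, micro = (int(part) for part in m.group(1, 2, 3))
    is_final = 0 if m.group(4) else 1
    return (major, minor, micro, is_final)


def deinterlacing_allowed(vendor):
    """True when the reported driver version is at least the final 1.0.17."""
    version = parse_driver_version(vendor)
    return version is not None and version >= MIN_DEINTERLACE_VERSION
```

Tuple comparison keeps the ordering simple: `1.0.17-pre1` sorts below the final `1.0.17`, while any `1.0.18` build, pre-release or not, sorts above it.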
Comment 21 jackjones11 2012-08-23 22:13:18 UTC
Hi Gwenole,

I'm now running:

xbmc-pvr git master branch from 20120721 - tried compiling latest a few days ago but it wasn't playing ball.

libva and libva-driver-intel git master branch from 20120823 - was previously running 1.0.17-pre1 - This was from vaapi-ext branch but this doesn't seem to have had any commits for a long time so I assume deinterlacing is now merged into master?

Dunno what the version number is now, as vainfo isn't working and I can't reinstall the package: apt seems to have gotten itself into a broken dependency hell I can't fix, probably because I'm running Ubuntu Oneiric with the package lists changed to point to the Precise repositories in order to get the latest edgers graphics stack installed for the original bug report :)

kernel 3.5 stable.

h264 streams are 1080 interlaced, and MPEG2 streams also interlaced.  xbmc seems to be using vaapi to deinterlace correctly as decoder info during playback shows ff-h264-vaapi and ff-mpeg2video-vaapi, and fps show correctly at 50fps as opposed to 25fps with deinterlacing off.

Had another GPU hang tonight when switching from an MPEG2 to an h264 stream. I hadn't really noticed until you asked whether it only happens during this transition, but since turning VAAPI back on in xbmc a couple of days ago, every GPU hang I have experienced has been during this transition, so you may be onto something. That said, it is very intermittent, so I may just not have been "lucky" enough to hit the problem when switching between the h264 channels during my tests.

I don't have mplayer-vaapi or gstreamer-vaapi installed but can pursue this if you really want me to, although depending on how needy they are in terms of dependencies might have a few problems with this as I'll need to use source due to my apt problems.  Also if problem does happen with instant codec transition when changing tv channels in xbmc, not sure how easy this would be to simulate with these players and live streams without player stopping in between if you know what I mean - anyway let me know if you'd like me to try and I'll see what I can come up with.

i915_error_state-3 attachment is from tonight's GPU hang with the latest libva-driver-intel.
Comment 22 jackjones11 2012-08-23 22:17:07 UTC
Created attachment 66036 [details]
i915_error_state-3
Comment 23 jackjones11 2012-08-23 23:45:17 UTC
Wild stab in the dark/clutching at straws moment:

Although EDID parsing seems to work ok at boot to allow me to run without an xorg.conf, syslog is filled with msgs such as:

Aug 23 17:40:58 htpc kernel: [22763.565205] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 15

and lately (may have been after an edgers package update), when I switch inputs on my receiver and then back to the htpc input, my tv claims it isn't receiving any signal.  xrandr and xdpyinfo claim everything is ok but I need to do:

xrandr -display :0 --output HDMI2 --mode 0x47
xrandr -display :0 --output HDMI2 --mode 0x46

in order to get the signal back.  This turned into a hard fault so I had to revert to running with an xorg.conf file and turning hotplugging off with:

Option "HotPlug" "false"

Which has vastly improved things.

Am I hitting the deinterlacing problem just as I change channels at the same time as EDID parsing is failing which is affecting the GPU in some strange way?
Comment 24 jackjones11 2012-08-24 02:24:46 UTC
No it can't be this because the two events don't follow each other in syslog - so much for wild theories :)
Comment 25 Chris Wilson 2012-10-18 14:07:50 UTC
vaapi-intel wtf:

    BEGIN_BATCH(batch, 2);
    OUT_BATCH(batch, MI_BATCH_BUFFER_START | (2 << 6));
    OUT_RELOC(batch, i965_h264_context->avc_it_command_mb_info.bo, 
              I915_GEM_DOMAIN_COMMAND, 0, 
              0);
    ADVANCE_BATCH(batch);
Comment 26 ykzhao 2013-11-29 02:35:54 UTC
Will you please try the latest intel-vaapi driver and see whether the issue still exists?

Thanks.
Comment 27 haihao 2015-11-18 08:16:36 UTC
No response from the user for a long time, so closing as WORKSFORME. Feel free to reopen this bug if you still experience the issue.

