Summary: [3.15 regression] relocation value wraparound due to bios-fb preserved hole at 0
Product: DRI
Component: DRM/Intel
Status: CLOSED FIXED
Priority: high
Reporter: Kenny MacDermid <kenny.macdermid>
Assignee: Jani Nikula <jani.nikula>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: andreas.pokorny, aurimas, intel-gfx-bugs, jinxianx.guo, kenny.macdermid
i915 platform:
i915 features:
Description Kenny MacDermid 2014-05-10 16:42:30 UTC
Created attachment 98823 [details] /sys/class/drm/card0/error compressed

[11359.443122] [drm] stuck on render ring
[11359.444476] [drm] GPU HANG: ecode 0:0x86dffffd, in X , reason: Ring hung, action: reset
[11359.444490] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[11359.444494] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[11359.444498] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[11359.444501] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[11359.444505] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[11361.444736] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[11365.447937] [drm] stuck on render ring
[11365.449339] [drm] GPU HANG: ecode 0:0x86dffffd, in X , reason: Ring hung, action: reset
[11365.449462] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
[11367.449564] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Using the linux-mainline kernel on Arch from the AUR:

Linux orange 3.15.0-1-mainline #1 SMP PREEMPT Tue May 6 15:54:05 CEST 2014 x86_64 GNU/Linux
Comment 1 Chris Wilson 2014-05-10 17:21:01 UTC
Hmm, could you tell if this was a recent regression?
Comment 2 Kenny MacDermid 2014-05-10 21:07:27 UTC
I hadn't seen this error before upgrading to a 3.15 kernel. I've only run it for a couple of days and it has occurred twice. The actual kernel version is 3.15-rc4. It looks like the Arch AUR package maintainer has updated the package to rc5, so I can try that and let you know if it continues happening.

Possibly unrelated, but for completeness: I've also noticed a decreased frame rate in TagPro.

For my /etc/X11/xorg.conf.d/20-intel.conf I'm using:

Section "Device"
    Identifier "Intel Graphics"
    Option "SwapbuffersWait" "true"
    Option "AccelMethod" "sna"
    Option "TearFree" "true"
EndSection

The laptop is a Lenovo Yoga 2 Pro, so it has a high-DPI screen (3200x1800, iirc).
Comment 3 Chris Wilson 2014-05-11 07:06:44 UTC
Please do check whether the current packages work with a 3.14 kernel. That will narrow down the error to being in the kernel.
Comment 4 Kenny MacDermid 2014-05-14 13:21:43 UTC
Switched back to 3.14.2 for the last 3 days and the hangs do not happen. The framerate seems the same though, so perhaps that was another package.
Comment 5 Chris Wilson 2014-05-14 18:01:10 UTC
Hmm, I didn't notice this first time around:

0x0000a044: 0x61010008: STATE_BASE_ADDRESS
0x0000a048: 0x00000000: general state base not updated
0x0000a04c: 0xffffd001: surface state base address 0xffffd000
0x0000a050: 0x044a1501: dynamic state base address 0x044a1500
0x0000a054: 0x00000000: indirect state base not updated
0x0000a058: 0x044a1501: instruction state base address 0x044a1500
0x0000a05c: 0x00000000: general state upper bound not updated
0x0000a060: 0x00000001: dynamic state upper bound disabled
0x0000a064: 0x00000000: indirect state upper bound not updated
0x0000a068: 0x00000001: instruction state upper bound disabled

Oh boy. This is going to be fun.
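For readers unfamiliar with the dump format, here is a minimal sketch of how the base-address lines above appear to be decoded. It assumes, purely from the printed values, that bit 0 of each dword is an "update this base" flag and the remaining bits are the address. The function name `decode_state_base` is hypothetical and is not taken from any real decoder.

```python
def decode_state_base(dword: int, name: str) -> str:
    """Render one base-address dword in the style of the dump above.

    Assumption (inferred from the dump, not from hardware docs):
    bit 0 means "update this base"; the other bits hold the address.
    """
    modify_enable = 0x1
    if dword & modify_enable:
        address = dword & ~modify_enable & 0xFFFFFFFF
        return f"{name} state base address 0x{address:08x}"
    return f"{name} state base not updated"


# Values copied from the dump in this comment.
print(decode_state_base(0x00000000, "general"))
print(decode_state_base(0xFFFFD001, "surface"))
print(decode_state_base(0x044A1501, "dynamic"))
```

The surface state base stands out because 0xffffd000 sits just below the top of the 32-bit range, while the dynamic and instruction bases are ordinary low addresses.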
Comment 6 Chris Wilson 2014-05-14 18:36:33 UTC
Created attachment 99041 [details] [review] Please consume whiskey first. Urgh. Something like this.
Comment 7 Chris Wilson 2014-05-15 06:31:42 UTC
Created attachment 99059 [details] [review] Prevent negative relocation deltas from causing wraparound
Comment 8 Chris Wilson 2014-05-15 07:47:59 UTC
Testcase: igt/gem_bad_reloc

commit daa9e3d80a6c25667b259e864376ac929d5a11bd
Author: Chris Wilson <firstname.lastname@example.org>
Date:   Thu May 15 08:43:11 2014 +0100

    Add gem_bad_reloc

    This test feeds a batch containing self-references into the kernel and
    checks that the relocation offsets remain as valid GTT addresses. This
    is to exercise SNA passing in negative relocation deltas which can hang
    the GPU if they wrap around.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=78533
    Signed-off-by: Chris Wilson <email@example.com>
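The wraparound that the test guards against can be illustrated with plain 32-bit arithmetic. This is only a sketch, not the kernel code or the igt test: the names `apply_relocation` and `is_valid_gtt_address`, the 2 GiB GTT size, and the delta of -0x3000 are all hypothetical. The thread does not state the real delta; this one is chosen because, applied to a buffer at offset 0, it reproduces the 0xffffd000 value seen in comment 5.

```python
U32_MASK = 0xFFFFFFFF


def apply_relocation(target_offset: int, delta: int) -> int:
    """Value stored in the batch: target offset plus delta, truncated to 32 bits."""
    return (target_offset + delta) & U32_MASK


def is_valid_gtt_address(value: int, gtt_size: int) -> bool:
    """True if the relocated value still points inside the GTT."""
    return 0 <= value < gtt_size


HYPOTHETICAL_GTT_SIZE = 2 << 30  # 2 GiB, illustrative only
HYPOTHETICAL_DELTA = -0x3000     # illustrative negative delta

# Buffer bound at offset 0: the negative delta wraps past zero.
wrapped = apply_relocation(0x0, HYPOTHETICAL_DELTA)
# Same delta with the buffer bound higher up: no wrap.
safe = apply_relocation(0x10000, HYPOTHETICAL_DELTA)

print(hex(wrapped), is_valid_gtt_address(wrapped, HYPOTHETICAL_GTT_SIZE))
print(hex(safe), is_valid_gtt_address(safe, HYPOTHETICAL_GTT_SIZE))
```

A negative delta is harmless as long as the target sits at an offset at least as large as the delta's magnitude; it only becomes a problem once a buffer can land at, or very near, offset 0.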
Comment 9 Chris Wilson 2014-05-15 07:48:39 UTC
Created attachment 99064 [details] [review] Prevent negative relocation deltas from causing wraparound
Comment 10 Chris Wilson 2014-05-15 12:32:52 UTC
Created attachment 99083 [details] [review] Prevent negative relocation deltas from causing wraparound

Now actually handles SNA's batchbuffers.
Comment 11 Daniel Vetter 2014-05-15 14:07:14 UTC
I'd still like to know which patch introduced this regression ... Kenny, can you please try to do a bisect?
Comment 12 Kenny MacDermid 2014-05-15 14:25:09 UTC
I can try, but it was only occurring around once a day so it'll take a bit. Is there a start commit I should use other than 3.14?
Comment 13 Chris Wilson 2014-05-15 14:40:52 UTC
It's BIOS fb preservation leaving a hole at 0.
Comment 14 Daniel Vetter 2014-05-15 14:42:11 UTC
Hey, at least that works. But yeah, makes tons of sense ... So no bisect result needed.
Comment 15 Chris Wilson 2014-05-15 15:58:37 UTC
Created attachment 99103 [details] [review] Offset batch buffers to prevent delta wrapping

An alternative, Daniel's suggestion. The problem, imo, is that this bakes in assumptions about userspace, pessimising every case (and being fragile) rather than fixing the pathological ones.
Comment 16 Chris Wilson 2014-05-19 06:29:02 UTC
*** Bug 78876 has been marked as a duplicate of this bug. ***
Comment 17 Daniel Vetter 2014-05-19 08:44:13 UTC
We need tested-bys on Chris' latest patch ... That goes for all the people whose reports have been de-duped to this one, too.
Comment 18 Chris Wilson 2014-05-21 12:47:44 UTC
*** Bug 79013 has been marked as a duplicate of this bug. ***
Comment 19 Chris Wilson 2014-05-28 18:30:31 UTC
commit d23db88c3ab233daed18709e3a24d6c95344117f
Author: Chris Wilson <firstname.lastname@example.org>
Date:   Fri May 23 08:48:08 2014 +0200

    drm/i915: Prevent negative relocation deltas from wrapping