Summary: | [3.15 regression] relocation value wraparound due to bios-fb preserved hole at 0 | ||
---|---|---|---|
Product: | DRI | Reporter: | Kenny MacDermid <kenny.macdermid> |
Component: | DRM/Intel | Assignee: | Jani Nikula <jani.nikula> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | high | CC: | andreas.pokorny, aurimas, intel-gfx-bugs, jinxianx.guo, kenny.macdermid |
Version: | XOrg git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Hmm, could you tell if this was a recent regression?

I haven't seen this error before upgrading to a 3.15 kernel. I've only run it a couple of days and this has occurred twice. The actual kernel version is 3.15rc4. It looks like the Arch AUR package maintainer has updated the package to rc5, so I can try that and let you know if it continues happening.

Possibly unrelated, but just for completeness: I've also noticed a decreased frame rate in TagPro.

For my /etc/X11/xorg.conf.d/20-intel.conf I'm using:

    Section "Device"
        Identifier "Intel Graphics"
        Option "SwapbuffersWait" "true"
        Option "AccelMethod" "sna"
        Option "TearFree" "true"
    EndSection

The laptop is a Lenovo Yoga 2 Pro so it has a high dpi screen. 3200x1800 iirc.

Please do check whether the current packages work with a 3.14 kernel. That will narrow down the error to being in the kernel.

Switched back to 3.14.2 for the last 3 days and the hangs do not happen. The framerate seems the same though, so perhaps that was another package.

Hmm, I didn't notice this first time around:

    0x0000a044: 0x61010008: STATE_BASE_ADDRESS
    0x0000a048: 0x00000000: general state base not updated
    0x0000a04c: 0xffffd001: surface state base address 0xffffd000
    0x0000a050: 0x044a1501: dynamic state base address 0x044a1500
    0x0000a054: 0x00000000: indirect state base not updated
    0x0000a058: 0x044a1501: instruction state base address 0x044a1500
    0x0000a05c: 0x00000000: general state upper bound not updated
    0x0000a060: 0x00000001: dynamic state upper bound disabled
    0x0000a064: 0x00000000: indirect state upper bound not updated
    0x0000a068: 0x00000001: instruction state upper bound disabled

Oh boy. This is going to be fun.

Created attachment 99041
Please consume whiskey first.

Urgh. Something like this.
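For readers following along, the suspicious surface state base address of 0xffffd000 in the dump above is what a negative relocation delta looks like after 32-bit wraparound. The following is only a minimal sketch of that arithmetic, not kernel code. The function names and the delta of -0x2fff are hypothetical, chosen so that an object at offset 0 reproduces the dword seen in the dump. The 32-bit width and the "clear bit 0" decoding are assumptions based on how the decoded dump presents the values.

```python
MASK32 = 0xFFFFFFFF  # assumption: relocation values are stored as 32-bit dwords


def relocate(target_offset: int, delta: int) -> int:
    """Value written into the batch: target offset plus delta, truncated to 32 bits."""
    return (target_offset + delta) & MASK32


def decode_base_address(dword: int) -> tuple[int, bool]:
    """Split a dword into (address with bit 0 cleared, 'updated' flag in bit 0).

    This mirrors how the decoded dump prints the values; it is not a full
    decode of the hardware command.
    """
    return dword & ~1 & MASK32, bool(dword & 1)


if __name__ == "__main__":
    hypothetical_delta = -0x2FFF  # illustrative value only, not taken from the bug

    # Object placed at offset 0: the negative delta wraps to the top of the range.
    wrapped = relocate(0x0, hypothetical_delta)
    print(hex(wrapped), [hex(decode_base_address(wrapped)[0])])

    # Same delta with the object placed higher up: no wraparound.
    safe = relocate(0x40000, hypothetical_delta)
    print(hex(safe))
```

With the object at offset 0 the result is 0xffffd001, matching the dword in the dump; with the same delta and a non-zero offset the result stays a small, plausible address.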
Created attachment 99059
Prevent negative relocation deltas from causing wraparound

Testcase: igt/gem_bad_reloc

commit daa9e3d80a6c25667b259e864376ac929d5a11bd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 15 08:43:11 2014 +0100

    Add gem_bad_reloc

    This test feeds a batch containing self-references into the kernel and
    checks that the relocation offsets remain as valid GTT addresses. This
    is to exercise SNA passing in negative relocation deltas which can hang
    the GPU if they wrap around.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=78533
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Created attachment 99064
Prevent negative relocation deltas from causing wraparound

Created attachment 99083
Prevent negative relocation deltas from causing wraparound

Now actually handles SNA's batchbuffers.

I'd still like to know which patch introduced this regression ... Kenny, can you please try to do a bisect?

I can try, but it was only occurring around once a day so it'll take a bit. Is there a start commit I should use other than 3.14?

It's BIOS fb preservation leaving a hole at 0.

Hey, at least that works. But yeah, makes tons of sense ... So no bisect result needed.

Created attachment 99103
Offset batch buffers to prevent delta wrapping

An alternative, Daniel's suggestion. The problem, imo, is that this bakes in assumptions about userspace, pessimising all (and fragile) rather than fixing the pathological cases.

*** Bug 78876 has been marked as a duplicate of this bug. ***

We need tested-bys on Chris' latest patch ... That goes to all the people whose reports have been de-duped to this one here, too.

*** Bug 79013 has been marked as a duplicate of this bug. ***

commit d23db88c3ab233daed18709e3a24d6c95344117f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 23 08:48:08 2014 +0200

    drm/i915: Prevent negative relocation deltas from wrapping

*** Bug 79539 has been marked as a duplicate of this bug. ***
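The trade-off discussed in the thread (offsetting every batch buffer versus handling only the pathological cases) can be illustrated with a small model. This is a sketch under stated assumptions, not the code from any of the attached patches: every name here is hypothetical, the bias value is arbitrary, and placement is reduced to picking a single integer offset.

```python
HYPOTHETICAL_BIAS = 0x40000  # arbitrary illustrative value, not taken from the patches
PAGE = 4096


def would_wrap(target_offset: int, delta: int) -> bool:
    """True if target_offset + delta would go below zero and wrap as an unsigned value."""
    return target_offset + delta < 0


def place_fixed_bias(requested: int, bias: int = HYPOTHETICAL_BIAS) -> int:
    """Policy A: never place any buffer below a fixed bias, whatever its deltas are."""
    return max(requested, bias)


def place_if_needed(requested: int, deltas: list[int], align: int = PAGE) -> int:
    """Policy B: move a buffer only when one of its deltas would wrap at `requested`."""
    needed = max(0, -min(deltas, default=0))  # smallest offset where no delta wraps
    if requested >= needed:
        return requested
    return -(-needed // align) * align  # round up to the alignment


if __name__ == "__main__":
    benign = [0, 64]              # only non-negative deltas
    pathological = [-0x2FFF, 16]  # one negative delta, as SNA may pass in

    print(hex(place_fixed_bias(0)))               # every buffer is moved
    print(hex(place_if_needed(0, benign)))        # left at offset 0
    print(hex(place_if_needed(0, pathological)))  # moved just far enough
```

In this model the fixed bias moves every buffer away from offset 0, while the second policy only relocates buffers that actually carry a wrapping delta. That is the sense in which the fixed offset "pessimises all" in the comment above.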
Created attachment 98823
/sys/class/drm/card0/error compressed

    [11359.443122] [drm] stuck on render ring
    [11359.444476] [drm] GPU HANG: ecode 0:0x86dffffd, in X [813], reason: Ring hung, action: reset
    [11359.444490] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
    [11359.444494] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
    [11359.444498] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
    [11359.444501] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
    [11359.444505] [drm] GPU crash dump saved to /sys/class/drm/card0/error
    [11361.444736] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
    [11365.447937] [drm] stuck on render ring
    [11365.449339] [drm] GPU HANG: ecode 0:0x86dffffd, in X [813], reason: Ring hung, action: reset
    [11365.449462] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
    [11367.449564] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Using the linux-mainline kernel on Arch from the AUR:

    Linux orange 3.15.0-1-mainline #1 SMP PREEMPT Tue May 6 15:54:05 CEST 2014 x86_64 GNU/Linux