Created attachment 84297 [details]
dmesg

System Environment:
--------------------------
Platform: Pineview
Kernel: (drm-intel-next-nightly)e0fd293e471707a301fb9d8f83295265b9315161

Bug detailed description:
-----------------------------
Running glxgears causes a call trace and a system hang. It happens on the -nightly and -queued kernels. It works well on the -fixes kernel.

The latest known good commit: 030c22d0c4a7a98968db820d3a9759197d580217
The latest known bad commit: 815ca9817afc1c1a45799fb9d4a4f413b7b1517c

Call trace:
[ 337.005193] Call Trace:
[ 337.005209] [<c0bf78e0>] ? start_kernel+0x318/0x31c
[ 337.005216] Code: e8 4c e8 fa ff 83 3d e0 47 bf c0 00 74 10 e8 d6 4b 02 00 fb 89 e2 81 e2 00 e0 ff ff eb 0d e8 55 74 00 00 85 c0 75 e7 eb 10 f3 90 <8b> 42 08 a8 08 74 f7 e8 4a 4b 02 00 eb 55 89 e0 25 00 e0 ff ff
[ 337.005009] NMI backtrace for cpu 1
[ 337.005009] CPU: 1 PID: 3892 Comm: X Not tainted 3.11.0-rc2_drm-intel-next-queued_815ca9_20130818_+ #7128
[ 337.005009] Hardware name: MICRO-STAR INTERNATIONAL CO., LTD MS-N014/MS-N014, BIOS EN014IMS.10B 11/30/2009
[ 337.005009] task: f6146e10 ti: c37de000 task.ti: c37de000
[ 337.005009] EIP: 0060:[<c04796ac>] EFLAGS: 00203006 CPU: 1
[ 337.005009] EIP is at __const_udelay+0x0/0x1a
[ 337.005009] EAX: 00418958 EBX: 00002710 ECX: c0a93665 EDX: fffff000
[ 337.005009] ESI: f67e8d5c EDI: 000006e0 EBP: c0ba94a0 ESP: c37dfc8c
[ 337.005009] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 337.005009] CR0: 80050033 CR2: b75a2e40 CR3: 363e6000 CR4: 000007d0
[ 337.005009] Stack:
[ 337.005009] c021febf c0ba94a0 c027d79c c0a9c5c0 0000ea60 00000f8d 00000f8c 000006e0
[ 337.005009] 00000000 00000000 00000001 f6146e10 f6146e10 00000000 00000001 c37dfd60
[ 337.005009] c02347a3 f67e8bf4 771044e8 0000004e c025fe2d c0260003 f67e8bf4 f67e899c
[ 337.005009] Call Trace:
[ 337.005009] [<c021febf>] ? arch_trigger_all_cpu_backtrace+0x57/0x60
[ 337.005009] [<c027d79c>] ? rcu_check_callbacks+0x141/0x3d5
[ 337.005009] [<c02347a3>] ? update_process_times+0x2a/0x4e
[ 337.005009] [<c025fe2d>] ? tick_sched_handle+0x2d/0x37
[ 337.005009] [<c0260003>] ? tick_sched_timer+0x28/0x4b
[ 337.005009] [<c024227e>] ? __run_hrtimer.isra.23+0x3b/0x88
[ 337.005009] [<c02429d2>] ? hrtimer_interrupt+0xf0/0x1e7
[ 337.005009] [<c021e76a>] ? local_apic_timer_interrupt+0x3d/0x3f
[ 337.005009] [<c021ea78>] ? smp_apic_timer_interrupt+0x2b/0x39
[ 337.005009] [<c087535d>] ? apic_timer_interrupt+0x2d/0x34
[ 337.005009] [<f82af6a4>] ? i915_gem_execbuffer_reserve+0x1d5/0x2d6 [i915]
[ 337.005009] [<f82affee>] ? i915_gem_do_execbuffer.isra.17+0x48c/0xd69 [i915]
[ 337.005009] [<f82ab508>] ? i915_gem_obj_bound_any+0x28/0x43 [i915]
[ 337.005009] [<f82ab508>] ? i915_gem_obj_bound_any+0x28/0x43 [i915]
[ 337.005009] [<f82ab500>] ? i915_gem_obj_bound_any+0x20/0x43 [i915]
[ 337.005009] [<c02b1754>] ? __kmalloc+0xbe/0xe0
[ 337.005009] [<f82b0e3c>] ? i915_gem_execbuffer2+0x12e/0x1c2 [i915]
[ 337.005009] [<f82b0d0e>] ? i915_gem_execbuffer+0x443/0x443 [i915]
[ 337.005009] [<f8103c1e>] ? drm_ioctl+0x23d/0x323 [drm]
[ 337.005009] [<f82b0d0e>] ? i915_gem_execbuffer+0x443/0x443 [i915]
[ 337.005009] [<c02a0328>] ? handle_pte_fault+0x274/0x5e3
[ 337.005009] [<f81039e1>] ? drm_copy_field+0x47/0x47 [drm]
[ 337.005009] [<c02c04fc>] ? vfs_ioctl+0x18/0x21
[ 337.005009] [<c02c0ec8>] ? do_vfs_ioctl+0x3ec/0x42c
[ 337.005009] [<c08778c9>] ? __do_page_fault+0x400/0x43b
[ 337.005009] [<c087787d>] ? __do_page_fault+0x3b4/0x43b
[ 337.005009] [<c02b505a>] ? vfs_read+0xd4/0x11e
[ 337.005009] [<c02c0f51>] ? SyS_ioctl+0x49/0x74
[ 337.005009] [<c087945a>] ? sysenter_do_call+0x12/0x22
[ 337.005009] Code: c0 74 1f eb 0a 8d 76 00 8d bc 27 00 00 00 00 eb 0e 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 48 75 fd 48 c3 ff 15 50 43 bb c0 c3 <64> 8b 15 fc 5b a3 c3 69 d2 fa 00 00 00 c1 e0 02 f7 e2 8d 42 01

Reproduce steps:
----------------------------
1. xinit
2. glxgears
Hm, the backtrace is a bit unhelpful ... Can you please attempt to bisect this regression?
Bisect shows:

04038a515d6eda6dd0857c0ade0b3950d372f4c0 is the first bad commit
commit 04038a515d6eda6dd0857c0ade0b3950d372f4c0
Author: Ben Widawsky <ben@bwidawsk.net>
AuthorDate: Wed Aug 14 11:38:36 2013 +0200
Commit: Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Thu Aug 15 15:45:45 2013 +0200

drm/i915: Convert execbuf code to use vmas

In order to transition more of our code over to using a VMA instead of an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound. This patch will help us catch leaks, and other issues before we actually shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since the execbuf code is very self contained it was a nice isolated conversion. The meat of the code is about turning eb_objects into eb_vma, and then wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj structure. This list is reused in the eviction code, so we must also modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569d366beb4be72cacfde41adee2c25e1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Aug 7 18:30:54 2013 +0100
drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ben, here's the gen3 failure case we expected to see last night after Dan's for_each_safe email.
Please test this patch from Chris: https://patchwork.kernel.org/patch/2847056/
(In reply to comment #4)
> Please test this patch from Chris:
>
> https://patchwork.kernel.org/patch/2847056/

I tested it on commit 26b32d77b2469 (which includes this patch). The issue still exists.
Created attachment 84370 [details] dmesg(26b32d77b2469)
Hm 0x00100104, so LIST_POISON1 + 0x4, i.e. we've chased a ->next pointer somewhere and then dereffed ->next->prev. Doesn't smell good ...
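The arithmetic behind that reading can be sketched in C. This is only an illustration, assuming a 32-bit kernel (as the register dump in the report suggests) where `struct list_head` holds two 4-byte pointers and `LIST_POISON1` is `0x00100100` with no extra pointer delta. The names `list_head32`, `LIST_POISON1_32`, and `poisoned_next_prev_addr` are made up for this sketch and are not kernel identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the 32-bit layout of the kernel's struct list_head:
 * two pointers, each 4 bytes wide. uint32_t stands in for a 32-bit pointer
 * so the offsets come out the same on a 64-bit build host.
 */
struct list_head32 {
    uint32_t next; /* offset 0 */
    uint32_t prev; /* offset 4 */
};

/* Assumed value that list_del() writes into ->next of a removed entry. */
#define LIST_POISON1_32 0x00100100u

/*
 * Address touched when code follows the poisoned ->next pointer of a
 * deleted entry and then accesses ->prev through it (entry->next->prev).
 */
uint32_t poisoned_next_prev_addr(void)
{
    return LIST_POISON1_32 + (uint32_t)offsetof(struct list_head32, prev);
}
```

Under these assumptions `poisoned_next_prev_addr()` evaluates to `0x00100104`, matching the faulting address quoted above. That fits an entry that had already been removed with `list_del()` still being walked or unlinked again.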
Created attachment 84432 [details] [review]
Patch to rework the exec_list trick

Please test this patch on top of latest -nightly.
Created attachment 84445 [details] [review]
More vma fixups

Updated patch to address a now bogus WARN.
Created attachment 84499 [details]
dmesg

The issue still exists with this patch.
Can you please test this patch? https://patchwork.kernel.org/patch/2848475/
(In reply to comment #11)
> Can you please test this patch?
>
> https://patchwork.kernel.org/patch/2848475/

Fixed by this patch.
Fixed with

commit f833c65abf79c2456fe8e8c487e3d78b9c329daa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Aug 26 11:23:47 2013 +0200

drm/i915: More vma fixups around unbind/destroy
Verified. Fixed.
Closing old verified.