Created attachment 60145 [details]
Kernel: (drm-intel-next-queued) fc6826d1dcd65f3d1e9a5377678882e4e08f02be
Bug detailed description:
It happens on Sandy Bridge with the drm-intel-next-queued kernel. The result is unstable: it happens once in 5 runs. It doesn't happen on the fixes kernel.
This case is also affected by bug 47488; since bug 47488 occurred, the result has become unstable: FAIL or XHANG.
[ 1553.194706] [<c02369f6>] ? wq_worker_sleeping+0xc/0x71
[ 1553.195842] [<c053f376>] __schedule+0x13c/0x766
[ 1553.197216] [<c02c1d0f>] ? kmem_cache_free+0x95/0xc6
[ 1553.198566] [<c0221bf5>] ? __cleanup_sighand+0x23/0x26
[ 1553.200060] [<c0237d53>] ? free_pid+0x8c/0x93
[ 1553.201791] [<c027a844>] ? call_rcu_sched+0xf/0x12
[ 1553.203618] [<c0225eb0>] ? release_task+0x368/0x378
[ 1553.205668] [<c023dd5c>] ? switch_task_namespaces+0xf/0x3a
[ 1553.207680] [<c053fc03>] schedule+0x51/0x53
[ 1553.209421] [<c022745a>] do_exit+0x690/0x694
[ 1553.210698] [<c0541305>] oops_end+0x93/0x9b
[ 1553.211957] [<c021d1f3>] no_context+0x158/0x162
[ 1553.213206] [<c021d2e8>] __bad_area_nosemaphore+0xeb/0xf5
[ 1553.214457] [<c0542aef>] ? spurious_fault+0xad/0xad
[ 1553.215703] [<c021d2ff>] bad_area_nosemaphore+0xd/0x10
[ 1553.216951] [<c0542cae>] do_page_fault+0x1bf/0x3a7
[ 1553.218193] [<c02473b7>] ? default_wake_function+0xb/0xd
[ 1553.219436] [<c0240153>] ? __wake_up_common+0x34/0x5c
[ 1553.220644] [<c0542aef>] ? spurious_fault+0xad/0xad
[ 1553.221815] [<c0540cb2>] error_code+0x5a/0x60
[ 1553.222968] [<c0542aef>] ? spurious_fault+0xad/0xad
[ 1553.224114] [<c0237033>] ? process_one_work+0x2f/0x2d3
[ 1553.225269] [<f8401edf>] ? i915_driver_irq_postinstall+0x156/0x156 [i915]
[ 1553.226413] [<c02375eb>] worker_thread+0x17f/0x298
[ 1553.227558] [<c023746c>] ? rescuer_thread+0x195/0x195
[ 1553.228704] [<c023a02d>] kthread+0x67/0x6c
[ 1553.229820] [<c0239fc6>] ? kthread_freezable_should_stop+0x4e/0x4e
[ 1553.230922] [<c0545c76>] kernel_thread_helper+0x6/0xd
[ 1553.231995] Code: e8 ff f6 ff ff 31 c0 59 5b 5e 5f 5d c3 55 64 a1 8c 45 76 c0 8b 80 64 02 00 00 89 e5 5d 8b 40 f8 c3 55 8b 80 64 02 00 00 89 e5 5d <8b> 40 fc c3 55 31 c0 89 e5 5d c3 55 8d 50 04 89 e5 66 c7 00 00
[ 1553.234391] EIP: [<c0239cda>] kthread_data+0xa/0xe SS:ESP 0068:f5715d58
[ 1553.235482] CR2: 00000000fffffffc
[ 1553.236532] ---[ end trace 88094ceb151ece1b ]---
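The CR2 value in this trace is consistent with a load at offset -4 from a NULL pointer: the marked bytes `<8b> 40 fc` in the Code line decode to `mov -0x4(%eax),%eax`. A minimal userspace sketch of the 32-bit wraparound arithmetic follows; the helper name `fault_address` is made up for illustration and is not kernel code.

```c
#include <stdint.h>

/* Hypothetical helper (not from the kernel): the address a 32-bit CPU
 * would access for "base + 8-bit signed displacement", with the usual
 * wraparound modulo 2^32. */
static uint32_t fault_address(uint32_t base, int8_t disp8)
{
    /* Sign-extend the displacement, then add with unsigned wraparound. */
    return base + (uint32_t)(int32_t)disp8;
}
```

Calling `fault_address(0, -4)` yields 0xfffffffc, which matches the CR2 line above, so the register holding the pointer was most likely zero when kthread_data() read from it.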
1. start X
2. ./oglconform -z -suite all -v 2 -D 123 -test conditional_render advanced.fbo.queryAndRenderOnFBO
What happened to the original dmesg? Can you please attach the full unmolested output?
In future, note that it is the *first* OOPS that is important, from the initial BUG line to the end trace: the first few lines give the reason for the oops, and the call stack shows where it happened.
- the dmesg is cut to a width of 80 chars, i.e. a lot of the long lines are not complete.
- dmesg talks about a gpu hang, can you try to grab the i915_error_state?
And like Chris said, for issues with dmesg output, the first error/backtrace is the important one, not the last. We have an oops in i915_driver_irq_postinstall, but unfortunately that one's cut off, too.
The NULL dereference is from:
static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
        unsigned long data = atomic_long_read(&work->data);
        if (data & WORK_STRUCT_CWQ)
                return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);

static void process_one_work(struct worker *worker, struct work_struct *work)
        struct cpu_workqueue_struct *cwq = get_work_cwq(work);
        struct global_cwq *gcwq = cwq->gcwq; <-- OOPS
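To make the failure mode concrete, here is a small userspace model of the lookup quoted above. It is only a sketch: the `model_*` names, the flag value and the mask are invented for illustration and do not match the real kernel constants, and it assumes the lookup returns NULL when the flag bit is clear, which is what the NULL dereference implies.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real kernel constants differ. */
#define MODEL_FLAG_CWQ  ((uintptr_t)0x2)
#define MODEL_DATA_MASK (~(uintptr_t)0xff)

struct model_cwq { int id; };

/* Same shape as get_work_cwq(): the data word carries a pointer in its
 * high bits, but only when the CWQ flag is set. Otherwise (assumed
 * here) the result is NULL. */
static struct model_cwq *model_get_cwq(uintptr_t data)
{
    if (data & MODEL_FLAG_CWQ)
        return (struct model_cwq *)(data & MODEL_DATA_MASK);
    return NULL;
}

/* Guarded lookup: checks for NULL before dereferencing, which the
 * excerpt from process_one_work() does not do. Returns -1 when no
 * cwq is attached to the data word. */
static int model_cwq_id(uintptr_t data)
{
    struct model_cwq *cwq = model_get_cwq(data);
    return cwq ? cwq->id : -1;
}
```

With a data word of 0, `model_get_cwq()` returns NULL and `model_cwq_id()` returns -1 instead of faulting. The kernel has no such guard at that point because a work item that reaches process_one_work() is expected to have the flag set, so the interesting question remains how the data word lost it.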
Author: Tejun Heo <email@example.com>
Date: Tue Jun 29 10:07:13 2010 +0200
workqueue: use shared worklist and pool all workers per cpu
So the question is: why are we hitting it, and why now?
(That dissection was based on the assumption that our compiled objects are similar enough.)
It would help if you could also run "gdb vmlinux; list *process_one_work+0x2f; list *worker_thread+0x17f;" and paste the output.
Created attachment 60227 [details]
Created attachment 60228 [details]
The error state is just another mesa fail. That is annoying, but the real bug here lies in the inability to recover from the error.
Does the bug trigger if you "echo 1 > /sys/kernel/debug/dri/0/i915_wedged" immediately upon booting?
I ran 'echo 1 > /sys/kernel/debug/dri/0/i915_wedged'. It has the same result.
Now you have a quick test to use for bisecting. Good luck! :)
... at least smells like one.
I will bisect it.
Ping on the bisect result of this regression ...
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
We cannot bisect more!
4a1e8ebc5e5 is a bad commit. The others skip because of build fail.
Build fail is really bad. Can you please paste the compiler error you're getting? I can prep a quick git branch with the compiler error fixed so that you can bisect the complete range.
Can you confirm that the bug is NOT in 66cfb32772495068fbb5627b2dc88649ad66c3e5, but is in 4a1e8ebc5e5918079109cc1cd1c44c2f0fd0e11b?
Created attachment 60598 [details]
SNB build fail error message
Commit 66cfb32772495068fbb5627b2dc88649ad66c3e5 is a good commit; commit 4a1e8ebc5e5918079109cc1cd1c44c2f0fd0e11b is a bad commit.
(In reply to comment #15)
> Can you confirm that the bug is NOT in
> 66cfb32772495068fbb5627b2dc88649ad66c3e5, but is in
> --- Comment #16 from lu hua <firstname.lastname@example.org> 2012-04-26 00:43:34 PDT ---
> Created attachment 60598 [details]
> --> https://bugs.freedesktop.org/attachment.cgi?id=60598
> SNB build fail error message
Looks like autofs4 fails to compile. You can just disable that in the
configuration with the CONFIG_AUTOFS4_FS option. Then you could bisect the
remaining kernel revisions.
Created attachment 60635 [details] [review]
dont clobber rps work when reinstalling the irq
I've managed to reproduce this bug by accident, and I think this patch should fix the problem. Note that it's also included in the latest -queued. Please test.
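One plausible reading of the patch title is that a work item was re-initialised while it was still queued, wiping the state the workqueue relies on. The userspace model below sketches that hazard under this assumption; every `model_*` name is hypothetical and none of it is taken from the i915 driver or the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PENDING ((uintptr_t)0x1)

struct model_work {
    uintptr_t data;                     /* owner cookie plus flag bits, 0 when idle */
    void (*func)(struct model_work *);  /* callback to run */
};

/* Models initialising a work item: unconditionally wipes all state. */
static void model_init_work(struct model_work *w,
                            void (*func)(struct model_work *))
{
    w->data = 0;
    w->func = func;
}

/* Models queueing: records the owning queue and marks the item
 * pending. Returns false if the item was already pending. */
static bool model_queue_work(struct model_work *w, uintptr_t queue_cookie)
{
    if (w->data & MODEL_PENDING)
        return false;
    w->data = queue_cookie | MODEL_PENDING;
    return true;
}

static bool model_work_is_pending(const struct model_work *w)
{
    return (w->data & MODEL_PENDING) != 0;
}
```

In this model, calling `model_init_work()` a second time on an item that has been queued but not yet run resets `data` to 0, so whoever later picks the item up finds no owner recorded. Initialising the item once at load time, and never again on the reinstall path, avoids that.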
Ok, I've tested this on my ivb with a patch series of mine. Before this patch, it crashed after 1-3 gpu hangs; now it has survived 250+. I call this fixed; please reopen if this is not the case.
Author: Chris Wilson <email@example.com>
Date: Tue Apr 24 22:59:41 2012 +0100
drm/i915: Unconditionally initialise the interrupt workers
This issue doesn't happen on -queued kernel commit b57aa4007a558be50955f9b58f5da98fcb78aa85.
Closing old verified.