Created attachment 98446 [details]
dmesg

System environment:
--------------------------
Platform: Broadwell
Kernel: (drm-intel-nightly) 08ce6614d07dd1e426109672a5e323317c8d6ec7

Bug detailed description:
-----------------------------
Opening the google-chrome browser reports "BUG: unable to handle kernel paging request at ffffc900110a0000". It happens on Broadwell with the -nightly and -queued branches.
The latest known good commit: c79057922ed6c2c6df1214e6ab4414fea1b23db2
The latest known bad commit: e5c03ca362819ba8ffbe5674340b61b9cd75de8f

dmesg:
[ 38.784886] BUG: unable to handle kernel paging request at ffffc900110a0000
[ 38.784938] IP: [<ffffffff812e6723>] iowrite32+0xe/0x28
[ 38.784976] PGD 14a80e067 PUD 14a80f067 PMD 149b68067 PTE 0
[ 38.785017] Oops: 0002 [#1] SMP
[ 38.785041] Modules linked in: ip6table_filter ip6_tables ipv6 iptable_filter ip_tables ebtable_nat ebtables x_tables dm_mod snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support ppdev pcspkr i2c_i801 snd_hda_intel snd_hda_controller snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_timer snd soundcore battery parport_pc parport ac acpi_cpufreq i915 video button drm_kms_helper drm
[ 38.785298] CPU: 0 PID: 4171 Comm: X Not tainted 3.15.0-rc2_drm-intel-nightly_08ce66_20140505+ #2318
[ 38.785353] task: ffff880002221f80 ti: ffff8800a633c000 task.ti: ffff8800a633c000
[ 38.785397] RIP: 0010:[<ffffffff812e6723>] [<ffffffff812e6723>] iowrite32+0xe/0x28
[ 38.785446] RSP: 0018:ffff8800a633dc00 EFLAGS: 00010292
[ 38.785478] RAX: 00000000fffff40c RBX: ffff880002da5780 RCX: 000000000001ffd8
[ 38.785520] RDX: 000000000001ffe8 RSI: ffffc900110a0000 RDI: ffffc900110a0000
[ 38.785561] RBP: 0000000000000002 R08: 00000000000144c0 R09: ffff8800020e7d78
[ 38.785603] R10: ffff8800020e7d78 R11: 00007f597a566106 R12: ffff880002da4000
[ 38.785644] R13: 0000000000000000 R14: 0000000000022040 R15: ffff8800a6b62d80
[ 38.785691] FS: 00007f597d6558c0(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
[ 38.785738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 38.785771] CR2: ffffc900110a0000 CR3: 00000000a6301000 CR4: 00000000003407f0
[ 38.785812] Stack:
[ 38.785825] ffffffffa008b646 000000000081c000 0000000000000004 ffff880002da5780
[ 38.785874] ffff880002da4000 ffff880002315e00 0000000000000000 ffff880002da5780
[ 38.785923] ffffffffa008a9a8 ffff880002315e00 ffffffffa008d3a1 0000000000000000
[ 38.785973] Call Trace:
[ 38.786006] [<ffffffffa008b646>] ? gen6_signal+0xd7/0x10c [i915]
[ 38.786053] [<ffffffffa008a9a8>] ? gen6_add_request+0x13/0x88 [i915]
[ 38.786101] [<ffffffffa008d3a1>] ? intel_ring_flush_all_caches+0x1c/0x29 [i915]
[ 38.786155] [<ffffffffa007a17f>] ? __i915_add_request+0x1c4/0x1e3 [i915]
[ 38.786204] [<ffffffffa0075eb1>] ? i915_gem_do_execbuffer.isra.16+0xff1/0x1142 [i915]
[ 38.786253] [<ffffffff812e3c23>] ? __sg_page_iter_next+0x2b/0x58
[ 38.786298] [<ffffffffa00764d0>] ? i915_gem_execbuffer2+0x177/0x1fb [i915]
[ 38.786345] [<ffffffffa0002f22>] ? drm_ioctl+0x25c/0x3ad [drm]
[ 38.786388] [<ffffffffa0076359>] ? i915_gem_execbuffer+0x357/0x357 [i915]
[ 38.786431] [<ffffffff8104d453>] ? enqueue_hrtimer+0x15/0x37
[ 38.786468] [<ffffffff8104da02>] ? __hrtimer_start_range_ns+0x21a/0x238
[ 38.786512] [<ffffffff810ee9b3>] ? do_vfs_ioctl+0x3ec/0x435
[ 38.786548] [<ffffffff810eea45>] ? SyS_ioctl+0x49/0x78
[ 38.786582] [<ffffffff81092d69>] ? __audit_syscall_exit+0x209/0x225
[ 38.786623] [<ffffffff8172d622>] ? system_call_fastpath+0x16/0x1b
[ 38.786659] Code: 48 89 f7 76 05 e9 21 5c d4 ff 48 81 fe 00 00 01 00 77 09 48 c7 c6 15 9f 9b 81 eb aa c3 48 81 fe ff ff 03 00 89 f8 48 89 f7 76 03 <89> 06 c3 48 81 fe 00 00 01 00 76 05 0f b7 d6 ef c3 48 c7 c6 49
[ 38.786873] RIP [<ffffffff812e6723>] iowrite32+0xe/0x28
[ 38.786909] RSP <ffff8800a633dc00>
[ 38.786930] CR2: ffffc900110a0000
[ 38.801814] ---[ end trace b37fc0904a9ee0eb ]---

Reproduce steps:
----------------------------
1. xinit
2. open google-chrome browser
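A note for anyone reading the oops above: the number after "Oops:" is the x86 page-fault error code, and its low bits say what kind of access faulted. The sketch below decodes those bits under the standard x86 layout. The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an x86 page-fault error code (illustrative names). */
struct pf_error {
    bool protection;   /* bit 0: 0 = page not present, 1 = protection violation */
    bool write;        /* bit 1: 0 = read access,      1 = write access */
    bool user;         /* bit 2: 0 = kernel mode,      1 = user mode */
    bool reserved_bit; /* bit 3: reserved bit set in a paging structure */
    bool instr_fetch;  /* bit 4: fault was an instruction fetch */
};

/* Split the raw error code into its individual flag bits. */
static struct pf_error decode_pf_error(uint32_t code)
{
    struct pf_error e = {
        .protection   = (code & 0x01) != 0,
        .write        = (code & 0x02) != 0,
        .user         = (code & 0x04) != 0,
        .reserved_bit = (code & 0x08) != 0,
        .instr_fetch  = (code & 0x10) != 0,
    };
    return e;
}
```

For the value 0002 in this log, that reads as a kernel-mode write to a page that is not present, which is consistent with "PTE 0" and with CR2, RSI and RDI all holding ffffc900110a0000 inside iowrite32.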
Running a Piglit case also hits this issue, and X stops responding.
Run ./bin/glean -o -v -v -v -t +blendFunc
output
----------------------------------------------------------------------
This test checks all combinations of source and destination blend factors for the GL_FUNC_ADD blend equation. It operates on all RGB or RGBA drawing surface configurations that support the creation of windows.

Note that a common cause of failures for this test is small errors introduced when an implementation scales color values incorrectly; for example, converting an 8-bit color value to float by dividing by 256 rather than 255, or computing a blending result by shifting a double-width intermediate value rather than scaling it. Also, please note that the OpenGL spec requires that when converting from floating-point colors to integer form, the result must be rounded to the nearest integer, not truncated. [1.2.1, 2.13.9]

The test reports two error measurements. The first (readback) is the error detected when reading back raw values that were written to the framebuffer. The error in this case should be very close to zero, since the values are carefully constructed so that they can be represented accurately in the framebuffer. The second (blending) is the error detected in the result of the blending computation. For the test to pass, these errors must both be no greater than one least-significant bit in the framebuffer representation of a color.
Bisect shows: 78325f2d270897c9ee0887125b7abb963eb8efea is the first bad commit
commit 78325f2d270897c9ee0887125b7abb963eb8efea
Author:     Ben Widawsky <benjamin.widawsky@intel.com>
AuthorDate: Tue Apr 29 14:52:29 2014 -0700
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Mon May 5 10:56:53 2014 +0200

    drm/i915: Virtualize the ringbuffer signal func

    This abstraction again is in preparation for gen8. Gen8 will bring new
    semantics for doing this operation.

    While here, make the writes of MI_NOOPs explicit for non-existent rings.
    This should have been implicit before.

    NOTE: This is going to be removed in a few patches.

    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This issue blocks Piglit, Webglc, and Media testing. Increasing priority.
Many igt cases also have this issue.
Running ./gem_exec_nop --run-subtest render never exits.
output:
IGT-Version: 1.6-g4bd9fe6 (x86_64) (Linux: 3.15.0-rc3_drm-intel-nightly_5e83a7_20140506+ x86_64)
Time to exec x 1: 128.000µs (ring=render)
Time to exec x 2: 74.000µs (ring=render)
Time to exec x 4: 29.000µs (ring=render)
Time to exec x 8: 19.125µs (ring=render)
Time to exec x 16: 11.875µs (ring=render)
Time to exec x 32: 6.656µs (ring=render)
Time to exec x 64: 6.016µs (ring=render)
Time to exec x 128: 5.078µs (ring=render)
Time to exec x 256: 4.672µs (ring=render)

dmesg
[ 128.955986] BUG: unable to handle kernel paging request at ffffc900110a0000
[ 128.956037] IP: [<ffffffff812e6803>] iowrite32+0xe/0x28
[ 128.956075] PGD 14a80e067 PUD 14a80f067 PMD 145448067 PTE 0
[ 128.956117] Oops: 0002 [#1] SMP
[ 128.956142] Modules linked in: dm_mod snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support ppdev snd_hda_intel snd_hda_controller pcspkr i2c_i801 snd_hda_codec snd_hwdep snd_pcm lpc_ich mfd_core snd_timer snd soundcore battery parport_pc parport ac acpi_cpufreq i915 video button drm_kms_helper drm
[ 128.956348] CPU: 0 PID: 4569 Comm: gem_exec_nop Not tainted 3.15.0-rc3_drm-intel-nightly_5e83a7_20140506+ #2350
[ 128.956407] task: ffff88014651de80 ti: ffff8801448a8000 task.ti: ffff8801448a8000
[ 128.956452] RIP: 0010:[<ffffffff812e6803>] [<ffffffff812e6803>] iowrite32+0xe/0x28
[ 128.956501] RSP: 0018:ffff8801448a9c00 EFLAGS: 00010292
[ 128.956534] RAX: 00000000fffff38f RBX: ffff880002d41780 RCX: 000000000001ffd8
[ 128.956573] RDX: 000000000001ffe8 RSI: ffffc900110a0000 RDI: ffffc900110a0000
[ 128.956613] RBP: 0000000000000002 R08: 00000000000144c0 R09: ffff8800a8bbae78
[ 128.956655] R10: ffff8800a8bbae78 R11: 0000000000000000 R12: ffff880002d40000
[ 128.956695] R13: 0000000000000000 R14: 0000000000022040 R15: ffff8800a3831120
[ 128.956738] FS: 00007ff717b848c0(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
[ 128.956784] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 128.956815] CR2: ffffc900110a0000 CR3: 000000014488d000 CR4: 00000000003407f0
[ 128.956859] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 128.956902] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 128.956945] Stack:
[ 128.956959] ffffffffa008a5fe 000000000081c000 0000000000000004 ffff880002d41780
[ 128.957008] ffff880002d40000 ffff8800a88c7200 0000000000000000 ffff880002d41780
[ 128.957056] ffffffffa0089960 ffff8800a88c7200 ffffffffa008c364 0000000000000000
[ 128.957104] Call Trace:
[ 128.957135] [<ffffffffa008a5fe>] ? gen6_signal+0xd7/0x10c [i915]
[ 128.957182] [<ffffffffa0089960>] ? gen6_add_request+0x13/0x88 [i915]
[ 128.957229] [<ffffffffa008c364>] ? intel_ring_flush_all_caches+0x1c/0x29 [i915]
[ 128.957283] [<ffffffffa007916c>] ? __i915_add_request+0x1c4/0x1e3 [i915]
[ 128.957333] [<ffffffffa0074eb1>] ? i915_gem_do_execbuffer.isra.16+0x1005/0x1147 [i915]
[ 128.957383] [<ffffffff81051aae>] ? ttwu_do_wakeup+0xe/0x79
[ 128.957426] [<ffffffffa00754c1>] ? i915_gem_execbuffer2+0x177/0x1fa [i915]
[ 128.957474] [<ffffffffa0002f52>] ? drm_ioctl+0x25c/0x3ad [drm]
[ 128.957515] [<ffffffffa007534a>] ? i915_gem_execbuffer+0x357/0x357 [i915]
[ 128.957560] [<ffffffff81726343>] ? __schedule+0x638/0x77e
[ 128.957595] [<ffffffff810eea63>] ? do_vfs_ioctl+0x3ec/0x435
[ 128.957633] [<ffffffff810eeaf5>] ? SyS_ioctl+0x49/0x78
[ 128.957668] [<ffffffff8172dba2>] ? system_call_fastpath+0x16/0x1b
[ 128.957706] Code: 48 89 f7 76 05 e9 45 5b d4 ff 48 81 fe 00 00 01 00 77 09 48 c7 c6 2d a1 9b 81 eb aa c3 48 81 fe ff ff 03 00 89 f8 48 89 f7 76 03 <89> 06 c3 48 81 fe 00 00 01 00 76 05 0f b7 d6 ef c3 48 c7 c6 86
[ 128.957942] RIP [<ffffffff812e6803>] iowrite32+0xe/0x28
[ 128.957978] RSP <ffff8801448a9c00>
[ 128.958000] CR2: ffffc900110a0000
[ 128.958021] ---[ end trace 13d907d4a9e6b3e4 ]---
Something is not right. Semaphores should be disabled by default for Broadwell. Can you double-check the bisect? I am having problems with my platform at the moment and cannot reproduce.
Please test this while I think more:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 40a7aa4..1371bf6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -717,6 +717,9 @@ gen6_add_request(struct intel_ring_buffer *ring)
 {
 	int ret;
 
+	if (!i915_semaphore_is_enabled(dev))
+		return 0;
+
 	ret = ring->semaphore.signal(ring, 4);
 	if (ret)
 		return ret;
(In reply to comment #6)
> Please test this while I think more:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 40a7aa4..1371bf6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -717,6 +717,9 @@ gen6_add_request(struct intel_ring_buffer *ring)
>  {
>  	int ret;
> 
> +	if (!i915_semaphore_is_enabled(dev))
> +		return 0;
> +
>  	ret = ring->semaphore.signal(ring, 4);
>  	if (ret)
>  		return ret;

With this patch applied, make fails:
CC [M] drivers/gpu/drm/i915/intel_ringbuffer.o
drivers/gpu/drm/i915/intel_ringbuffer.c: In function ‘gen6_add_request’:
drivers/gpu/drm/i915/intel_ringbuffer.c:720:33: error: ‘dev’ undeclared (first use in this function)
drivers/gpu/drm/i915/intel_ringbuffer.c:720:33: note: each undeclared identifier is reported only once for each function it appears in
make[4]: *** [drivers/gpu/drm/i915/intel_ringbuffer.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make: *** [drivers] Error 2

Running ./gem_exec_nop --run-subtest render reproduces the bisect result.
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 40a7aa4..62fd8a6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -717,6 +717,9 @@ gen6_add_request(struct intel_ring_buffer *ring)
 {
 	int ret;
 
+	if (!i915_semaphore_is_enabled(ring->dev))
+		return 0;
+
 	ret = ring->semaphore.signal(ring, 4);
 	if (ret)
 		return ret;
(In reply to comment #8)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 40a7aa4..62fd8a6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -717,6 +717,9 @@ gen6_add_request(struct intel_ring_buffer *ring)
>  {
>  	int ret;
> 
> +	if (!i915_semaphore_is_enabled(ring->dev))
> +		return 0;
> +
>  	ret = ring->semaphore.signal(ring, 4);
>  	if (ret)
>  		return ret;

make fails:
drivers/gpu/drm/i915/intel_ringbuffer.c: In function ‘gen6_add_request’:
drivers/gpu/drm/i915/intel_ringbuffer.c:725:33: error: ‘dev’ undeclared (first use in this function)
drivers/gpu/drm/i915/intel_ringbuffer.c:725:33: note: each undeclared identifier is reported only once for each function it appears in
make[4]: *** [drivers/gpu/drm/i915/intel_ringbuffer.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make: *** [drivers] Error 2
You sure you used the right patch? drivers/gpu/drm/i915/intel_ringbuffer.c:725:33: error: ‘dev’ undeclared It should have been ring->dev
(In reply to comment #10)
> You sure you used the right patch?
>
> drivers/gpu/drm/i915/intel_ringbuffer.c:725:33: error: ‘dev’ undeclared
>
> It should have been ring->dev

Sorry. The build error is fixed with the 2nd patch.
Run: ./gem_exec_nop --run-subtest render
output:
IGT-Version: 1.6-g7935bbd (x86_64) (Linux: 3.15.0-rc3_prts_2557e2_20140508 x86_64)
Test assertion failure function gem_quiescent_gpu, file drmtest.c:158:
Last errno: 5, Input/output error
Failed assertion: drmIoctl((fd), ((((1U) << (((0+8)+8)+14)) | ((('d')) << (0+8)) | (((0x40 + 0x29)) << 0) | ((((sizeof(struct drm_i915_gem_execbuffer2)))) << ((0+8)+8)))), (&execbuf)) == 0
Subtest render: FAIL
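The long expression in the failed assertion is a preprocessor-expanded ioctl request number: write direction, type 'd', command number 0x40 + 0x29, and the size of struct drm_i915_gem_execbuffer2, so the failing call is the execbuffer2 ioctl returning EIO (errno 5). The sketch below recomputes the number with the same shifts that appear in the assertion. The helper name is made up, and the 64-byte struct size used in the usage note is my assumption, not something the log states.

```c
#include <stdint.h>

/* Bit positions copied from the expanded expression in the assertion. */
#define REQ_NR_SHIFT    0              /* command number        */
#define REQ_TYPE_SHIFT  (0 + 8)        /* driver type character */
#define REQ_SIZE_SHIFT  ((0 + 8) + 8)  /* argument size (bytes) */
#define REQ_DIR_SHIFT   (((0 + 8) + 8) + 14)
#define REQ_DIR_WRITE   1U             /* userspace passes data to the kernel */

/* Pack direction, type, number and size into one request value. */
static uint32_t ioctl_request(uint32_t dir, uint32_t type,
                              uint32_t nr, uint32_t size)
{
    return (dir  << REQ_DIR_SHIFT)  |
           (type << REQ_TYPE_SHIFT) |
           (nr   << REQ_NR_SHIFT)   |
           (size << REQ_SIZE_SHIFT);
}
```

Assuming the argument struct is 64 bytes, ioctl_request(REQ_DIR_WRITE, 'd', 0x40 + 0x29, 64) evaluates to 0x40406469, which can be handy when matching the call in strace output.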
Created attachment 98661 [details] dmesg(patch)
*** Bug 78579 has been marked as a duplicate of this bug. ***
It seems that this bug can be fixed by the following patch from Oscar Mateo: >http://lists.freedesktop.org/archives/intel-gfx/2014-May/044900.html Thanks. Yakui
It still happens on the latest -nightly kernel (2be456541ea41).
Run ./gem_exec_nop --run-subtest render
output:
Time to exec x 1: 99.000µs (ring=render)
Time to exec x 2: 79.500µs (ring=render)
Time to exec x 4: 42.750µs (ring=render)
Time to exec x 8: 24.125µs (ring=render)
Time to exec x 16: 13.625µs (ring=render)
Time to exec x 32: 9.594µs (ring=render)
Time to exec x 64: 6.469µs (ring=render)
Time to exec x 128: 5.188µs (ring=render)
Time to exec x 256: 4.824µs (ring=render)
Created attachment 99004 [details] dmesg(2be456)
Yes, it still occurs on -nightly. Ben's semaphores series fixes that, but please leave this bug open until the proper solution lands on -nightly.
(In reply to comment #14)
> It seems that this bug can be fixed by the following patch from Oscar Mateo:
> >http://lists.freedesktop.org/archives/intel-gfx/2014-May/044900.html
>
> Thanks.
> Yakui

Tested it; this issue still exists.

commit d1533379584f8edcfcabb024dffc1b334db8da0f
Author: Oscar Mateo <oscar.mateo@intel.com>
Date:   Fri May 9 13:44:59 2014 +0100

    drm/i915: Ringbuffer signal func for the second BSD ring

    This is missing in:

    commit 78325f2d270897c9ee0887125b7abb963eb8efea
    Author: Ben Widawsky <benjamin.widawsky@intel.com>
    Date:   Tue Apr 29 14:52:29 2014 -0700

        drm/i915: Virtualize the ringbuffer signal func

    Looks to me like a rebase side-effect...

    Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 09b6d04..3974e82 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2186,6 +2186,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	ring->dispatch_execbuffer =
 			gen8_ring_dispatch_execbuffer;
 	ring->semaphore.sync_to = gen6_ring_sync;
+	ring->semaphore.signal = gen6_signal;
 	/*
 	 * The current semaphore is only applied on the pre-gen8. And there
 	 * is no bsd2 ring on the pre-gen8. So now the semaphore_register

output:
IGT-Version: 1.6-g351e7d3 (x86_64) (Linux: 3.15.0-rc3_prts_d15333_20140514 x86_64)
Time to exec x 1: 129.000µs (ring=render)
Time to exec x 2: 74.000µs (ring=render)
Time to exec x 4: 28.750µs (ring=render)
Time to exec x 8: 18.875µs (ring=render)
Time to exec x 16: 10.625µs (ring=render)
Time to exec x 32: 9.000µs (ring=render)
Time to exec x 64: 6.188µs (ring=render)
Time to exec x 128: 5.047µs (ring=render)
Time to exec x 256: 4.582µs (ring=render)
Just applied a patch from Mika yesterday which also fixes some Oops at load. Please retest.
commit 6e450ab24dc645d776e65bbb91fc5f6788087c32
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu May 15 20:58:07 2014 +0300

    drm/i915: Bail out early on gen6_signal if no semaphores

    If we dont have semaphores enabled, we allocate 4 dwords for
    signalling. But end up emitting more regardless. Fix this by
    bailing out early if semaphores are not enabled.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78274
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78283
    Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
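To make the commit message above concrete, here is a small userspace toy model of "reserve N dwords, then emit more than N". It is only a sketch: every name, the ring size, the count of other rings, and the dwords per signal are hypothetical values chosen for illustration, and none of this is the actual i915 code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_DWORDS       32  /* hypothetical ring size */
#define TOY_OTHER_RINGS        3  /* hypothetical number of rings to signal */
#define TOY_DWORDS_PER_SIGNAL  4  /* hypothetical dwords emitted per ring */

struct toy_ring {
    uint32_t buf[TOY_RING_DWORDS];
    size_t tail;      /* next dword to write */
    size_t reserved;  /* dwords left from the last toy_begin() */
    bool overrun;     /* set when an emit exceeds the reservation */
};

/* Reserve space for ndwords; fails if the toy ring is too small. */
static int toy_begin(struct toy_ring *ring, size_t ndwords)
{
    if (ring->tail + ndwords > TOY_RING_DWORDS)
        return -1;
    ring->reserved = ndwords;
    return 0;
}

/* Write one dword, recording an overrun instead of writing past the reservation. */
static void toy_emit(struct toy_ring *ring, uint32_t value)
{
    if (ring->reserved == 0) {
        ring->overrun = true;
        return;
    }
    ring->buf[ring->tail++] = value;
    ring->reserved--;
}

/* Reserve the caller's dwords plus signalling space, then emit the signals. */
static int toy_signal(struct toy_ring *ring, size_t caller_dwords,
                      bool semaphores_enabled, bool bail_early)
{
    size_t ndwords = caller_dwords;

    if (semaphores_enabled)
        ndwords += TOY_OTHER_RINGS * TOY_DWORDS_PER_SIGNAL;
    else if (bail_early)
        return toy_begin(ring, ndwords);  /* the fix: emit nothing extra */

    if (toy_begin(ring, ndwords) != 0)
        return -1;

    /* Without the early return, these dwords are emitted unconditionally. */
    for (size_t i = 0; i < TOY_OTHER_RINGS * TOY_DWORDS_PER_SIGNAL; i++)
        toy_emit(ring, 0);  /* placeholder payload */
    return 0;
}

/* Returns -1 if the reservation fails, 1 on overrun, 0 otherwise. */
static int toy_add_request(bool semaphores_enabled, bool bail_early)
{
    struct toy_ring ring = {0};

    if (toy_signal(&ring, 4, semaphores_enabled, bail_early) != 0)
        return -1;
    for (int i = 0; i < 4; i++)
        toy_emit(&ring, 0);  /* the caller's own 4 dwords */
    return ring.overrun ? 1 : 0;
}
```

In this model, toy_add_request(false, false) reports an overrun because only 4 dwords were reserved while 16 were emitted, and toy_add_request(false, true) does not. In the real driver the extra writes presumably went through iowrite32 past the mapped ring, which would match the oops in the earlier comments, but that link is my reading of the thread rather than something the commit states.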
Verified.Fixed.
Closing verified+fixed.