Bug 111530

Summary: [CI][Piglit] spec@*texture* - crash
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: minor
Priority: low
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform: HSW
i915 features: Perf/OA

Description Martin Peres 2019-09-02 06:39:26 UTC
A lot of HSW regressed in piglit: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6816/pig-hsw-4770r/spec@ext_framebuffer_object@fbo-generatemipmap-formats.html

This looks like it was caused by:
dffa8feb3084 drm/i915/perf: Assert locking for i915_init_oa_perf_state()
Comment 1 CI Bug Log 2019-09-02 06:42:07 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* pig-hsw-4770r: spec@*texture* - crash
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6816/pig-hsw-4770r/spec@ext_framebuffer_object@fbo-generatemipmap-formats.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6816/pig-hsw-4770r/spec@ext_texture_norm16@render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6816/pig-hsw-4770r/spec@ext_texture_array@maxlayers.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6816/pig-hsw-4770r/spec@ext_texture_array@gen-mipmap.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6817/pig-hsw-4770r/spec@ext_texture_array@maxlayers.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6817/pig-hsw-4770r/spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6817/pig-hsw-4770r/spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14252/pig-hsw-4770r/spec@ext_framebuffer_object@fbo-generatemipmap-formats.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14252/pig-hsw-4770r/spec@ext_texture_norm16@render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14252/pig-hsw-4770r/spec@ext_texture_array@maxlayers.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14252/pig-hsw-4770r/spec@ext_texture_array@gen-mipmap.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6818/pig-hsw-4770r/spec@ext_texture_norm16@render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6818/pig-hsw-4770r/spec@ext_texture_array@maxlayers.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6818/pig-hsw-4770r/spec@ext_texture_array@gen-mipmap.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6818/pig-hsw-4770r/spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6818/pig-hsw-4770r/spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed.html
Comment 2 Chris Wilson 2019-09-03 04:32:45 UTC
(In reply to Martin Peres from comment #0)
> A lot of HSW regressed in piglit:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6816/pig-hsw-4770r/
> spec@ext_framebuffer_object@fbo-generatemipmap-formats.html
> 
> This looks like it was caused by:
> dffa8feb3084 drm/i915/perf: Assert locking for i915_init_oa_perf_state()

We can rule that out simply by virtue of the fact that HSW is not gen8.
Comment 3 Chris Wilson 2019-09-03 05:36:09 UTC
-7 should be SIGBUS, so I'd bet on an -ENOSPC from ggtt pinning. That too should be impossible...
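
For readers unfamiliar with the "-7" notation, here is a minimal sketch of how such a value can be decoded. It assumes the -7 is a subprocess-style return code, where a negative value means the process was killed by that signal number; `describe_exit` is a hypothetical helper, not part of piglit or the CI tooling. Signal numbering is platform-specific, so the name printed for 7 depends on the machine running the snippet.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Describe a subprocess-style return code.

    Assumption: a negative value means "killed by signal -returncode",
    which is the convention Python's subprocess module uses.
    """
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


# Signal numbers differ between platforms; run this on the test machine
# to see which signal number 7 maps to there.
print(describe_exit(-7))
```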
Comment 6 Chris Wilson 2019-09-03 20:00:38 UTC
commit 8f9fb61caed13e282e1e3387e64905b90cc65abd (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 2 05:02:46 2019 +0100

    drm/i915: Refresh the errno to vmf_fault translations
    
    It's been a long time since we accidentally reported -EIO upon wedging,
    it can now only be generated by failure to swap in a page.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190902040303.14195-4-chris@chris-wilson.co.uk

includes a test for the -ENOSPC on pin hypothesis.
Comment 7 Chris Wilson 2019-09-04 05:20:29 UTC
Would you look at that!

<3>[ 1377.293444] i915_gem_fault:283 GEM_BUG_ON(vma == ERR_PTR(-28))
<4>[ 1377.293555] ------------[ cut here ]------------
<2>[ 1377.293558] kernel BUG at drivers/gpu/drm/i915/gem/i915_gem_mman.c:283!
<4>[ 1377.293600] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4>[ 1377.293610] CPU: 1 PID: 11839 Comm: fbo-clear-forma Not tainted 5.3.0-rc6-CI-Patchwork_14255+ #1
<4>[ 1377.293623] Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F6 12/10/2014
<4>[ 1377.293696] RIP: 0010:i915_gem_fault+0x6c2/0x910 [i915]
<4>[ 1377.293706] Code: fd 79 e4 e0 48 8b 35 15 0e 20 00 49 c7 c0 18 02 45 a0 b9 1b 01 00 00 48 c7 c2 58 34 3f a0 48 c7 c7 bd 5f 2e a0 e8 7e 5d eb e0 <0f> 0b 48 c7 c1 61 01 45 a0 ba 70 01 00 00 48 c7 c6 c0 33 3f a0 48
<4>[ 1377.293727] RSP: 0000:ffffc90019b73d50 EFLAGS: 00010282
<4>[ 1377.293736] RAX: 000000000000000a RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 1377.293746] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 0000000000000664
<4>[ 1377.293755] RBP: ffffc90019b73de0 R08: 0000000000000000 R09: 0000000000000664
<4>[ 1377.293765] R10: 0000000000000000 R11: ffff888214d28aa8 R12: ffff888208aa3880
<4>[ 1377.293774] R13: ffff88820eb70000 R14: ffff88820eb7bb10 R15: 0000000000000000
<4>[ 1377.293784] FS:  00007f2b2a5f4780(0000) GS:ffff888216680000(0000) knlGS:0000000000000000
<4>[ 1377.293794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 1377.293802] CR2: 00007f2b20fd9000 CR3: 00000002078a8004 CR4: 00000000001606e0
<4>[ 1377.293811] Call Trace:
<4>[ 1377.293824]  __do_fault+0x2c/0xb0
<4>[ 1377.293832]  __handle_mm_fault+0x9a6/0xfc0
<4>[ 1377.293844]  handle_mm_fault+0x155/0x360
<4>[ 1377.293853]  __do_page_fault+0x249/0x4f0
<4>[ 1377.293863]  page_fault+0x34/0x40
<4>[ 1377.293869] RIP: 0033:0x7f2b24a78fa6
<4>[ 1377.293876] Code: 01 fb 44 8b 4c 24 20 48 8b 4c 24 38 48 8b 7c 24 40 c1 ea 02 44 0f af f2 44 03 74 24 70 83 7c 24 08 73 0f b6 04 01 43 8d 14 0e <8b> 14 97 0f 85 69 ff ff ff 48 8b 7c 24 48 01 db 89 14 9f 41 8b 55
<4>[ 1377.293898] RSP: 002b:00007fff9444e1f0 EFLAGS: 00010287
<4>[ 1377.293906] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f2b2a476000
<4>[ 1377.293916] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f2b20fd9000
<4>[ 1377.293925] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
<4>[ 1377.293934] R10: 0000000000000000 R11: 0000000000000000 R12: 000055966141cd70
<4>[ 1377.293944] R13: 000055966143b120 R14: 0000000000000000 R15: 0000000000000000
<4>[ 1377.293957] Modules linked in: snd_hda_codec_hdmi i915 x86_pkg_temp_thermal coretemp mei_hdcp btusb btrtl btbcm btintel crct10dif_pclmul bluetooth crc32_pclmul ghash_clmulni_intel ecdh_generic ecc snd_hda_codec_realtek snd_hda_codec_generic r8169 realtek snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me lpc_ich mei prime_numbers
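
The `ERR_PTR(-28)` in the first line of the log above carries a negative errno value. A small sketch of pulling that value out of a dmesg line and naming it follows; `decode_err_ptr` and the regular expression are illustrative assumptions, not an existing CI tool, and the errno names come from the platform the script runs on.

```python
import errno
import re

# Matches the "ERR_PTR(-NN)" form seen in the GEM_BUG_ON message.
ERR_PTR_RE = re.compile(r"ERR_PTR\(-(\d+)\)")


def decode_err_ptr(log_line: str):
    """Return the symbolic errno name embedded in an ERR_PTR(-NN) string.

    Returns None when the line contains no ERR_PTR(-NN) pattern.
    """
    match = ERR_PTR_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric values to names for the local platform.
    return errno.errorcode.get(code, f"unknown errno {code}")


line = "<3>[ 1377.293444] i915_gem_fault:283 GEM_BUG_ON(vma == ERR_PTR(-28))"
print(decode_err_ptr(line))
```

On Linux, errno 28 is ENOSPC, which matches the hypothesis from comment 3.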
Comment 9 Vanshidhar Konda 2019-11-06 19:23:48 UTC
This test was failing with this signature within 10-15 runs of the previous failure up until 1.5 weeks ago. The failure has not occurred in the past 78 runs. We should monitor it and close the bug if it doesn't reproduce in the next month.

Setting the priority to low as the bug has not been observed for a while.
Comment 10 Martin Peres 2019-11-29 19:25:38 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/389.
