Bug 101597

Summary: [BAT][PNV] pipe [AB] vblank wait timed out when running igt@kms_pipe_crc_basic@hang-read-crc-pipe-a/b
Product: DRI
Component: DRM/Intel
Version: DRI git
Hardware: Other
OS: All
Status: CLOSED FIXED
Severity: critical
Priority: high
Reporter: Martin Peres <martin.peres>
Assignee: Maarten Lankhorst <bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs
Whiteboard: ReadyForDev
i915 platform: PNV
i915 features: display/Other

Description Martin Peres 2017-06-26 12:33:09 UTC
Since CI_DRM_2765, the igt@kms_pipe_crc_basic@hang-read-crc-pipe-a/b tests have been run on fi-pnv-d510.

Pipe B always triggers the following warning:
[  490.402413] Setting dangerous option reset - tainting kernel
[  491.525783] drm/i915: Resetting chip after gpu hang
[  492.097964] pipe B vblank wait timed out
[  492.098046] ------------[ cut here ]------------
[  492.098203] WARNING: CPU: 0 PID: 3536 at drivers/gpu/drm/i915/intel_display.c:12851 intel_atomic_commit_tail+0xf4e/0xf70 [i915]
[  492.098210] Modules linked in: vgem i915 coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec lpc_ich snd_hwdep r8169 snd_hda_core mii snd_pcm prime_numbers
[  492.098299] CPU: 0 PID: 3536 Comm: kms_pipe_crc_ba Tainted: G     U  W       4.12.0-rc7-CI-CI_DRM_2770+ #1
[  492.098307] Hardware name:                  /D510MO, BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010
[  492.098314] task: ffff880015b8cbc0 task.stack: ffffc90000cdc000
[  492.098452] RIP: 0010:intel_atomic_commit_tail+0xf4e/0xf70 [i915]
[  492.098462] RSP: 0018:ffffc90000cdfac8 EFLAGS: 00010292
[  492.098475] RAX: 000000000000001c RBX: 0000000000000001 RCX: 0000000000000006
[  492.098482] RDX: 0000000000000006 RSI: ffffffff81cba139 RDI: ffffffff81c9984f
[  492.098488] RBP: ffffc90000cdfb70 R08: 0000000000000000 R09: 0000000000000001
[  492.098494] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001a3b8000
[  492.098501] R13: 0000000000000218 R14: 0000000000000004 R15: 0000000000000002
[  492.098508] FS:  00007f75d4c39a40(0000) GS:ffff88001ec00000(0000) knlGS:0000000000000000
[  492.098515] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  492.098522] CR2: 00007f75cfd88000 CR3: 000000000dfc8000 CR4: 00000000000006f0
[  492.098529] Call Trace:
[  492.098562]  ? wake_atomic_t_function+0x30/0x30
[  492.098712]  intel_atomic_commit+0x3fb/0x500 [i915]
[  492.098732]  ? drm_atomic_check_only+0x3d0/0x560
[  492.098745]  ? drm_connector_list_iter_end+0x2f/0x40
[  492.098759]  ? handle_conflicting_encoders+0x270/0x290
[  492.098776]  drm_atomic_commit+0x46/0x50
[  492.098791]  drm_atomic_helper_set_config+0x68/0x90
[  492.098807]  __drm_mode_set_config_internal+0x60/0x110
[  492.098823]  drm_mode_setcrtc+0x3e9/0x5f0
[  492.098956]  drm_ioctl+0x202/0x490
[  492.098968]  ? drm_mode_getcrtc+0x180/0x180
[  492.099007]  ? __this_cpu_preempt_check+0x13/0x20
[  492.099026]  do_vfs_ioctl+0x90/0x6d0
[  492.099040]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  492.099054]  ? __this_cpu_preempt_check+0x13/0x20
[  492.099069]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[  492.099087]  SyS_ioctl+0x3c/0x70
[  492.099107]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  492.099117] RIP: 0033:0x7f75d313d357
[  492.099125] RSP: 002b:00007ffc465d6bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  492.099137] RAX: ffffffffffffffda RBX: ffffffff81470333 RCX: 00007f75d313d357
[  492.099144] RDX: 00007ffc465d6c10 RSI: 00000000c06864a2 RDI: 0000000000000003
[  492.099150] RBP: ffffc90000cdff88 R08: 0000000000000000 R09: 00000000020372b8
[  492.099157] R10: 00000000020372d8 R11: 0000000000000246 R12: 0000000002036d18
[  492.099163] R13: 0000000000000003 R14: 00000000c06864a2 R15: 0000000000000003
[  492.099181]  ? __this_cpu_preempt_check+0x13/0x20
[  492.099207] Code: ff ff ff 48 83 c7 08 e8 01 c6 f5 e0 4c 8b 85 78 ff ff ff 4d 85 c0 0f 85 d4 fd ff ff 8d 73 41 48 c7 c7 50 02 24 a0 e8 6b 99 00 e1 <0f> ff e9 be fd ff ff 8d 70 41 48 c7 c7 20 02 24 a0 e8 55 99 00 
[  492.099547] ---[ end trace ffde64c737a3f6b0 ]---

Pipe A sometimes triggers the following warning:
[  487.226361] drm/i915: Resetting chip after gpu hang
[  487.783046] pipe A vblank wait timed out
[  487.783119] ------------[ cut here ]------------
[  487.783253] WARNING: CPU: 3 PID: 3530 at drivers/gpu/drm/i915/intel_display.c:12851 intel_atomic_commit_tail+0xf4e/0xf70 [i915]
[  487.783259] Modules linked in: vgem i915 coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec lpc_ich snd_hwdep r8169 snd_hda_core mii snd_pcm prime_numbers
[  487.783329] CPU: 3 PID: 3530 Comm: kms_pipe_crc_ba Tainted: G     U          4.12.0-rc7-CI-CI_DRM_2770+ #1
[  487.783334] Hardware name:                  /D510MO, BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010
[  487.783340] task: ffff88001e6ea600 task.stack: ffffc90000cd4000
[  487.783457] RIP: 0010:intel_atomic_commit_tail+0xf4e/0xf70 [i915]
[  487.783463] RSP: 0018:ffffc90000cd7ac8 EFLAGS: 00010292
[  487.783472] RAX: 000000000000001c RBX: 0000000000000000 RCX: 0000000000000006
[  487.783478] RDX: 0000000000000006 RSI: ffffffff81cba139 RDI: ffffffff81c9984f
[  487.783483] RBP: ffffc90000cd7b70 R08: 0000000000000000 R09: 0000000000000001
[  487.783488] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001a3b8000
[  487.783492] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[  487.783499] FS:  00007fb889686a40(0000) GS:ffff88001ed80000(0000) knlGS:0000000000000000
[  487.783504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  487.783509] CR2: 00007fb8847d5000 CR3: 0000000007997000 CR4: 00000000000006e0
[  487.783514] Call Trace:
[  487.783542]  ? wake_atomic_t_function+0x30/0x30
[  487.783666]  intel_atomic_commit+0x3fb/0x500 [i915]
[  487.783681]  ? drm_atomic_check_only+0x3d0/0x560
[  487.783691]  ? drm_connector_list_iter_end+0x2f/0x40
[  487.783701]  ? handle_conflicting_encoders+0x270/0x290
[  487.783714]  drm_atomic_commit+0x46/0x50
[  487.783724]  drm_atomic_helper_set_config+0x68/0x90
[  487.783736]  __drm_mode_set_config_internal+0x60/0x110
[  487.783747]  drm_mode_setcrtc+0x3e9/0x5f0
[  487.783789]  drm_ioctl+0x202/0x490
[  487.783796]  ? drm_mode_getcrtc+0x180/0x180
[  487.783824]  ? __this_cpu_preempt_check+0x13/0x20
[  487.783837]  do_vfs_ioctl+0x90/0x6d0
[  487.783848]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  487.783858]  ? __this_cpu_preempt_check+0x13/0x20
[  487.783867]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[  487.783880]  SyS_ioctl+0x3c/0x70
[  487.783894]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  487.783981] RIP: 0033:0x7fb887b8a357
[  487.783987] RSP: 002b:00007ffd201881d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  487.783997] RAX: ffffffffffffffda RBX: ffffffff81470333 RCX: 00007fb887b8a357
[  487.784003] RDX: 00007ffd20188210 RSI: 00000000c06864a2 RDI: 0000000000000003
[  487.784008] RBP: ffffc90000cd7f88 R08: 0000000000000000 R09: 00000000011fe2b8
[  487.784013] R10: 00000000011fe2d8 R11: 0000000000000246 R12: 00000000011fdc70
[  487.784018] R13: 0000000000000003 R14: 00000000c06864a2 R15: 0000000000000003
[  487.784032]  ? __this_cpu_preempt_check+0x13/0x20
[  487.784050] Code: ff ff ff 48 83 c7 08 e8 01 c6 f5 e0 4c 8b 85 78 ff ff ff 4d 85 c0 0f 85 d4 fd ff ff 8d 73 41 48 c7 c7 50 02 24 a0 e8 6b 99 00 e1 <0f> ff e9 be fd ff ff 8d 70 41 48 c7 c7 20 02 24 a0 e8 55 99 00 
[  487.784319] ---[ end trace ffde64c737a3f6af ]---

Full logs:
 - Pipe A: https://intel-gfx-ci.01.org/CI/CI_DRM_2770/fi-pnv-d510/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
 - Pipe B: https://intel-gfx-ci.01.org/CI/CI_DRM_2770/fi-pnv-d510/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
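
For triage, the timeout lines above can be picked out of a dmesg capture with a small script. This is a minimal sketch in Python, assuming the log lines follow the format shown in this report; the function name `find_vblank_timeouts` is hypothetical and not part of any CI tooling.

```python
import re

# Matches lines such as "[  492.097964] pipe B vblank wait timed out".
TIMEOUT_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+pipe ([A-Z]) vblank wait timed out")


def find_vblank_timeouts(dmesg_text):
    """Return a (timestamp_seconds, pipe) tuple for each vblank-timeout line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = TIMEOUT_RE.match(line.strip())
        if m:
            hits.append((float(m.group(1)), m.group(2)))
    return hits


# Sample lines copied from the two traces in this report.
sample = """\
[  491.525783] drm/i915: Resetting chip after gpu hang
[  492.097964] pipe B vblank wait timed out
[  487.783046] pipe A vblank wait timed out
"""
print(find_vblank_timeouts(sample))
```

Only the "vblank wait timed out" line is matched; the WARNING backtrace that follows it is ignored.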
Comment 1 Elizabeth 2017-06-27 15:24:04 UTC
Adding tag into "Whiteboard" field - ReadyForDev
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 2 Maarten Lankhorst 2017-07-24 10:50:20 UTC
This bug only affects fi-pnv-d510; the one in farm 2 is unaffected.

I'm not sure when this broke or whether it was always broken, but the code needs to disable CXSR during atomic modeset enable. When CXSR is active you get no vblanks, so the vblank wait times out as in the traces above.

It will be fixed when the gen4- watermarks are converted to atomic; the code properly disables CXSR there. I've tested this bug: on nightly I hit it all the time, while with atomic watermarks it's gone.
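
The ordering problem described in this comment can be illustrated with a toy model. This is a sketch only, not i915 code: the `Pipe` class, its `cxsr_enabled` flag, `wait_for_vblank`, and `modeset_enable` are hypothetical names, and the model simply assumes, as stated above, that no vblanks arrive while CXSR is active.

```python
class Pipe:
    """Toy display pipe: vblanks arrive only while CXSR is off (assumption)."""

    def __init__(self, name, cxsr_enabled=True):
        self.name = name
        self.cxsr_enabled = cxsr_enabled

    def wait_for_vblank(self, max_polls=50):
        """Poll for a vblank; return True if one arrives, False on timeout."""
        for _ in range(max_polls):
            if not self.cxsr_enabled:
                return True
        return False


def modeset_enable(pipe, disable_cxsr_first):
    """Enable the pipe, optionally turning CXSR off before waiting for vblank."""
    if disable_cxsr_first:
        pipe.cxsr_enabled = False
    if not pipe.wait_for_vblank():
        return f"pipe {pipe.name} vblank wait timed out"
    return "ok"


print(modeset_enable(Pipe("B"), disable_cxsr_first=False))
print(modeset_enable(Pipe("B"), disable_cxsr_first=True))
```

In this model the wait succeeds only when CXSR is turned off before the wait begins, which mirrors the behaviour reported with the atomic watermark conversion.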
Comment 3 Martin Peres 2017-09-11 10:58:50 UTC
(In reply to Maarten Lankhorst from comment #2)
> This bug only affects fi-pnv-d510, the one in farm 2 is unaffected.
> 
> I'm not sure when this was broken or if it was always broken, but the code
> needs to disable CXSR during atomic modeset enable. When cxsr is active you
> get no vblanks, which means you end up with this.
> 
> It will be fixed when converting the gen4- watermarks to atomic, the code
> properly disables cxsr there. I've tested this bug and on nightly I hit it
> all the time, while with atomic watermarks it's gone.

121 runs without failures; this seems to be fixed.
