Bug 110331 - [CI][SHARDS] igt@* - incomplete - NMI backtrace for cpu \d skipped: idling at acpi_processor_ffh_cstate_enter
Summary: [CI][SHARDS] igt@* - incomplete - NMI backtrace for cpu \d skipped: idling at...
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: XOrg git
Hardware: Other All
: low normal
Assignee: Andi
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-05 06:41 UTC by Martin Peres
Modified: 2019-11-29 18:11 UTC (History)
1 user (show)

See Also:
i915 platform: G45, ICL
i915 features:


Attachments

Description Martin Peres 2019-04-05 06:41:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4926/shard-iclb5/igt@kms_pipe_crc_basic@hang-read-crc-pipe-d.html

<3> [938.216144] INFO: task kcompactd0:55 blocked for more than 61 seconds.
<3> [938.216421]       Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5866+ #1
<3> [938.216466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [938.216521] kcompactd0      D14040    55      2 0x80000000
<4> [938.216567] Call Trace:
<4> [938.216593]  ? __schedule+0x36d/0xb40
<4> [938.216624]  ? wait_for_common+0x116/0x1f0
<4> [938.216654]  schedule+0x2d/0x80
<4> [938.216682]  schedule_timeout+0x236/0x4f0
<4> [938.216714]  ? wait_for_common+0x116/0x1f0
<4> [938.216746]  ? mark_held_locks+0x49/0x70
<4> [938.216775]  ? _raw_spin_unlock_irq+0x24/0x50
<4> [938.216809]  ? wait_for_common+0x116/0x1f0
<4> [938.216840]  wait_for_common+0x13a/0x1f0
<4> [938.216872]  ? wake_up_q+0x80/0x80
<4> [938.216904]  __flush_work+0x22f/0x320
<4> [938.216934]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
<4> [938.216977]  ? drain_local_pages+0x30/0x30
<4> [938.217011]  drain_all_pages+0x172/0x1d0
<4> [938.217044]  kcompactd_do_work+0x28c/0x2d0
<4> [938.217084]  ? kcompactd_do_work+0x2d0/0x2d0
<4> [938.217114]  ? kcompactd+0x96/0x280
<4> [938.217174]  kcompactd+0x96/0x280
<4> [938.217201]  ? wait_woken+0xa0/0xa0
<4> [938.217236]  ? kcompactd_do_work+0x2d0/0x2d0
<4> [938.217266]  kthread+0x119/0x130
<4> [938.217292]  ? kthread_park+0x80/0x80
<4> [938.217323]  ret_from_fork+0x24/0x50
<4> [938.217375] 
Showing all locks held in the system:
<4> [938.217431] 1 lock held by khungtaskd/52:
<4> [938.217460]  #0: 0000000024b6c7d5 (rcu_read_lock){....}, at: debug_show_all_locks+0xe/0x1a0
<4> [938.217518] 1 lock held by kcompactd0/55:
<4> [938.217547]  #0: 00000000ce6f3070 (pcpu_drain_mutex){+.+.}, at: drain_all_pages+0x27/0x1d0
<4> [938.217605] 1 lock held by in:imklog/565:
<4> [938.217633]  #0: 00000000b002c24a (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x3a/0x50
<4> [938.217689] 2 locks held by kworker/3:11/1689:
<4> [938.217722] 4 locks held by dmesg/2126:
<4> [938.217751] 
<4> [938.217765] =============================================

<4> [938.217814] NMI backtrace for cpu 2
<4> [938.217841] CPU: 2 PID: 52 Comm: khungtaskd Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5866+ #1
<4> [938.217900] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [938.217980] Call Trace:
<4> [938.218001]  dump_stack+0x67/0x9b
<4> [938.218027]  nmi_cpu_backtrace+0x8a/0x90
<4> [938.218058]  ? lapic_can_unplug_cpu+0x90/0x90
<4> [938.218088]  nmi_trigger_cpumask_backtrace+0xb7/0xf0
<4> [938.218123]  watchdog+0x376/0x640
<4> [938.218154]  ? hungtask_pm_notify+0x40/0x40
<4> [938.218186]  kthread+0x119/0x130
<4> [938.218213]  ? kthread_park+0x80/0x80
<4> [938.218244]  ret_from_fork+0x24/0x50
<6> [938.218299] Sending NMI from CPU 2 to CPUs 0-1,3-7:
<4> [938.218356] NMI backtrace for cpu 0 skipped: idling at acpi_processor_ffh_cstate_enter+0x64/0xb0
<4> [938.218381] NMI backtrace for cpu 5 skipped: idling at acpi_processor_ffh_cstate_enter+0x64/0xb0
<4> [938.218383] NMI backtrace for cpu 1 skipped: idling at acpi_processor_ffh_cstate_enter+0x64/0xb0
<4> [938.218388] NMI backtrace for cpu 6
<4> [938.218389] CPU: 6 PID: 2126 Comm: dmesg Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5866+ #1
<4> [938.218390] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [938.218391] RIP: 0010:get_partial_node.isra.30+0x2c5/0x460
<4> [938.218392] Code: 00 00 00 4c 89 95 40 ff ff ff 48 89 8d 48 ff ff ff 48 89 95 58 ff ff ff 4c 89 8d 78 ff ff ff e8 01 f5 ea ff f0 49 0f ba 2e 00 <4c> 8b 8d 78 ff ff ff 48 8b 95 58 ff ff ff 48 8b 8d 48 ff ff ff 4c
<4> [938.218393] RSP: 0018:ffffc90000c63a70 EFLAGS: 00000096
<4> [938.218394] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000000
<4> [938.218394] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000001
<4> [938.218395] RBP: ffffc90000c63b30 R08: 0000000094d43629 R09: ffff888455a9a788
<4> [938.218396] R10: ffffea001156a688 R11: 0000000000000000 R12: 0000000000000015
<4> [938.218396] R13: ffffea00048e1c80 R14: ffffea001156a680 R15: ffff88849a337bc0
<4> [938.218397] FS:  00007efe80d36240(0000) GS:ffff88849ff80000(0000) knlGS:0000000000000000
<4> [938.218397] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [938.218398] CR2: 00007f7089d22db0 CR3: 00000004843f8006 CR4: 0000000000760ee0
<4> [938.218398] PKRU: 55555554
<4> [938.218399] Call Trace:
<4> [938.218399]  ? ___slab_alloc.constprop.35+0x1aa/0x380
<4> [938.218400]  ___slab_alloc.constprop.35+0x1aa/0x380
<4> [938.218400]  ? jbd2__journal_start+0x79/0x270
<4> [938.218401]  ? find_get_entry+0x198/0x2f0
<4> [938.218401]  ? jbd2__journal_start+0x79/0x270
<4> [938.218402]  ? __slab_alloc.isra.28.constprop.34+0x3d/0x70
<4> [938.218402]  __slab_alloc.isra.28.constprop.34+0x3d/0x70
<4> [938.218403]  ? jbd2__journal_start+0x79/0x270
<4> [938.218403]  kmem_cache_alloc+0x21c/0x280
<4> [938.218404]  jbd2__journal_start+0x79/0x270
<4> [938.218404]  ext4_da_write_begin+0x122/0x430
<4> [938.218405]  generic_perform_write+0xb4/0x1c0
<4> [938.218405]  __generic_file_write_iter+0xfd/0x190
<4> [938.218406]  ext4_file_write_iter+0x149/0x3d0
<4> [938.218406]  new_sync_write+0x104/0x160
<4> [938.218407]  vfs_write+0xbd/0x1b0
<4> [938.218407]  ksys_write+0x50/0xc0
<4> [938.218408]  do_syscall_64+0x55/0x190
<4> [938.218408]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [938.218409] RIP: 0033:0x7efe80215154
<4> [938.218409] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
<4> [938.218410] RSP: 002b:00007ffee59912b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [938.218411] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007efe80215154
<4> [938.218412] RDX: 0000000000000001 RSI: 000055ed54133118 RDI: 0000000000000001
<4> [938.218412] RBP: 000055ed54133118 R08: 0000000000000001 R09: 0000000000000000
<4> [938.218413] R10: 0000000000000005 R11: 0000000000000246 R12: 00007efe804f1760
<4> [938.218413] R13: 0000000000000001 R14: 00007efe804ec760 R15: 0000000000000001
<4> [938.218415] NMI backtrace for cpu 7
<4> [938.218417] CPU: 7 PID: 565 Comm: in:imklog Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5866+ #1
<4> [938.218418] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [938.218418] RIP: 0010:__lock_acquire+0x43a/0x1590
<4> [938.218419] Code: c5 06 41 31 ef 44 89 f8 45 29 fd c1 c0 08 41 31 c5 8b 44 24 18 41 01 c7 44 29 e8 41 89 c6 44 89 e8 45 01 fd c1 c0 10 41 31 c6 <44> 89 e8 44 89 f5 45 29 f7 45 01 f5 c1 cd 0d 44 31 fd 41 89 ef 29
<4> [938.218420] RSP: 0018:ffffc900005c7d20 EFLAGS: 00000006
<4> [938.218422] RAX: 000000000f47069e RBX: ffff88847c4208e0 RCX: 0000000000000000
<4> [938.218423] RDX: 000000009db434f6 RSI: ffff88847c4208b8 RDI: ffff88847c420040
<4> [938.218423] RBP: 00000000e42210c2 R08: 0000000000000001 R09: 0000000000000000
<4> [938.218424] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88847c420040
<4> [938.218425] R13: 0000000092768246 R14: 000000007e9c4a9a R15: 000000008bd872ff
<4> [938.218426] FS:  00007f3067000700(0000) GS:ffff88849ffc0000(0000) knlGS:0000000000000000
<4> [938.218426] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [938.218427] CR2: 00007f4c99495628 CR3: 000000047c4c0003 CR4: 0000000000760ee0
<4> [938.218427] PKRU: 55555554
<4> [938.218428] Call Trace:
<4> [938.218429]  lock_acquire+0xa6/0x1c0
<4> [938.218429]  ? do_syslog+0x41d/0x870
<4> [938.218430]  _raw_spin_lock+0x2a/0x40
<4> [938.218431]  ? do_syslog+0x41d/0x870
<4> [938.218431]  do_syslog+0x41d/0x870
<4> [938.218432]  ? wait_woken+0xa0/0xa0
<4> [938.218432]  kmsg_read+0x39/0x50
<4> [938.218433]  proc_reg_read+0x34/0x60
<4> [938.218434]  vfs_read+0x9e/0x150
<4> [938.218434]  ksys_read+0x50/0xc0
<4> [938.218435]  do_syscall_64+0x55/0x190
<4> [938.218436]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [938.218436] RIP: 0033:0x7f3069eb2384
<4> [938.218437] Code: 84 00 00 00 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 8b fc ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 c7 fc ff ff 48
<4> [938.218438] RSP: 002b:00007f3066fdf4c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
<4> [938.218440] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f3069eb2384
<4> [938.218441] RDX: 0000000000001fa0 RSI: 00007f3066fdfd00 RDI: 0000000000000005
<4> [938.218442] RBP: 00007f3066fdfd00 R08: 0000000000000000 R09: a3d70a3d70a3d70b
<4> [938.218443] R10: 2ce33e6c02ce33e7 R11: 0000000000000246 R12: 0000000000001fa0
<4> [938.218443] R13: 0000000000001fa0 R14: 0000000000001f9f R15: 00007f3066fdfd2b
<4> [938.218445] NMI backtrace for cpu 4
<4> [938.218447] CPU: 4 PID: 2282 Comm: kms_pipe_crc_ba Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5866+ #1
<4> [938.218449] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [938.218450] RIP: 0010:dw_readl+0x36/0x50
<4> [938.218452] Code: f6 f6 c2 02 48 8d 04 31 74 1e 66 8b 38 48 8d 44 31 02 66 8b 00 0f b7 cf c1 e0 10 09 c8 83 e2 01 89 c1 0f c9 0f 45 c1 c3 8b 00 <89> c1 83 e2 01 0f c9 0f 45 c1 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00
<4> [938.218453] RSP: 0000:ffff88849ff03e88 EFLAGS: 00000046
<4> [938.218456] RAX: 00000000ffffffff RBX: 0000000000000040 RCX: ffffc9000028d000
<4> [938.218458] RDX: 0000000000000000 RSI: 0000000000000054 RDI: ffff8884999622c0
<4> [938.218459] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
<4> [938.218461] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffeff
<4> [938.218462] R13: ffff88849ff03f2c R14: 0000000000000000 R15: ffff8884999622c0
<4> [938.218464] FS:  00007fa03ac3f980(0000) GS:ffff88849ff00000(0000) knlGS:0000000000000000
<4> [938.218465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [938.218466] CR2: 00007fa03321c83c CR3: 000000048cb62002 CR4: 0000000000760ee0
<4> [938.218468] PKRU: 55555554
<4> [938.218469] Call Trace:
<4> [938.218470]  <IRQ>
<4> [938.218471]  i2c_dw_isr+0x215/0x6d0
<4> [938.218473]  __handle_irq_event_percpu+0x41/0x2d0
<4> [938.218474]  ? handle_irq_event+0x27/0x50
<4> [938.218475]  handle_irq_event_percpu+0x2b/0x70
<4> [938.218476]  handle_irq_event+0x2f/0x50
<4> [938.218478]  handle_fasteoi_irq+0x8e/0x150
<4> [938.218479]  handle_irq+0x67/0x160
<4> [938.218480]  do_IRQ+0x5e/0x130
<4> [938.218481]  common_interrupt+0xf/0xf
<4> [938.218483]  </IRQ>
<4> [938.218484] RIP: 0033:0x7fa03aa61713
<4> [938.218486] Code: 38 4c 8d 3c 87 4c 89 64 24 70 eb 11 0f 1f 00 49 83 c7 04 41 83 e5 01 0f 85 63 07 00 00 41 8b 07 49 89 c5 48 33 04 24 48 d1 e8 <75> e3 4c 89 fb 48 2b 5c 24 38 48 c1 fb 02 41 89 dc 89 db 48 8d 04
<4> [938.218487] RSP: 002b:00007ffc3d1c88a0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
<4> [938.218490] RAX: 0000000000000000 RBX: 00007fa03ac55400 RCX: 00000000881ad35e
<4> [938.218492] RDX: 00000000000007a8 RSI: 0000000000000805 RDI: 00007fa03320eff4
<4> [938.218493] RBP: 00007fa033213130 R08: 00007fa033213130 R09: 00007fa03ac80428
<4> [938.218495] R10: 00007fa03ac55400 R11: 0000000000000000 R12: 00007fa03ac495a8
<4> [938.218496] R13: 00000000881ad35e R14: 0000000000000000 R15: 00007fa033212e8c
<4> [938.218498] NMI backtrace for cpu 3
<4> [938.218499] CPU: 3 PID: 1689 Comm: kworker/3:11 Tainted: G     U            5.1.0-rc3-CI-CI_DRM_5866+ #1
<4> [938.218501] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [938.218502] Workqueue: events delayed_fput
<4> [938.218505] RIP: 0010:mark_lock+0x3c/0x650
<4> [938.218507] Code: 00 00 d3 e3 48 89 f5 41 89 d4 48 83 ec 08 0f b7 46 20 66 25 ff 1f 75 2a 8b 05 98 ca 5b 02 85 c0 74 7a 31 c0 48 85 58 50 74 35 <41> be 01 00 00 00 48 83 c4 08 44 89 f0 5b 5d 41 5c 41 5d 41 5e 41
<4> [938.218508] RSP: 0018:ffffc900009fb9c0 EFLAGS: 00000002
<4> [938.218511] RAX: ffffffff82c34e00 RBX: 0000000000000040 RCX: 0000000000000006
<4> [938.218512] RDX: 0000000000000006 RSI: ffff8884897488b8 RDI: ffff888489748040
<4> [938.218514] RBP: ffff8884897488b8 R08: 00000000c30f7ceb R09: 0000000000000000
<4> [938.218515] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
<4> [938.218516] R13: ffff8884897488b8 R14: 0000000000000000 R15: ffff888489748040
<4> [938.218518] FS:  0000000000000000(0000) GS:ffff88849fec0000(0000) knlGS:0000000000000000
<4> [938.218519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [938.218521] CR2: 00007f708b1243c0 CR3: 0000000497aae006 CR4: 0000000000760ee0
<4> [938.218522] PKRU: 55555554
<4> [938.218523] Call Trace:
<4> [938.218524]  mark_held_locks+0x49/0x70
<4> [938.218526]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [938.218527]  lockdep_hardirqs_on+0x10d/0x1b0
<4> [938.218528]  _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [938.218530]  debug_check_no_obj_freed+0x132/0x210
<4> [938.218531]  free_pcp_prepare+0x77/0x160
<4> [938.218532]  free_unref_page_list+0x69/0x250
<4> [938.218534]  release_pages+0x2d3/0x390
<4> [938.218535]  __pagevec_release+0x1f/0x30
<4> [938.218536]  shmem_undo_range+0x3ad/0x890
<4> [938.218537]  shmem_truncate_range+0x11/0x30
<4> [938.218539]  shmem_evict_inode+0xb8/0x1a0
<4> [938.218540]  ? dput+0x20/0x2c0
<4> [938.218541]  evict+0xcb/0x190
<4> [938.218542]  __dentry_kill+0xca/0x190
<4> [938.218544]  dentry_kill+0x4b/0x1b0
Comment 1 Martin Peres 2019-04-05 06:42:29 UTC
This seems to be related to the known BIOS issue. Let's monitor that!
Comment 2 CI Bug Log 2019-04-05 06:43:06 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: all tests - incomplete - NMI backtrace for cpu \d skipped: idling at acpi_processor_ffh_cstate_enter
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4926/shard-iclb5/igt@kms_pipe_crc_basic@hang-read-crc-pipe-d.html
Comment 3 Martin Peres 2019-04-16 12:55:36 UTC
Let's monitor the failure rate now that the following commit has landed:

https://cgit.freedesktop.org/drm/drm-intel/commit/?h=topic/core-for-CI&id=b573fba52f339dc4fadef7282af4a9413fd6173d

CL HACK: Disable ACPI idle drivertopic/core-for-CI
There were few system hung observed while running i915_pm_rpm igt test.
FDO https://bugs.freedesktop.org/show_bug.cgi?id=108840
Root cause is believed to due to page fault in ACPI idle driver.
(FDO comment 18).
It has been suggested by Daniel Vetter to disable ACPI idle
driver for Core-for-CI, only for ICL.

This hacky patch is only for ICL processor and for Core-for-CI branch.

v2: Fixed compilation errors raised by lkp.
    commit message improvement.

Cc: martin.peres@intel.com
Cc: daniel.vetter@intel.com

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1554791361-27684-1-git-send-email-anshuman.gupta@intel.com
Comment 4 Jani Saarinen 2019-04-22 12:17:16 UTC
Not seen after change in topic for CI. Monitor still..
Comment 5 Ville Syrjala 2019-04-26 11:35:19 UTC
(In reply to Martin Peres from comment #3)
> Let's monitor the failure rate now that the following commit has landed:
> 
> https://cgit.freedesktop.org/drm/drm-intel/commit/?h=topic/core-for-
> CI&id=b573fba52f339dc4fadef7282af4a9413fd6173d
> 
> CL HACK: Disable ACPI idle drivertopic/core-for-CI
> There were few system hung observed while running i915_pm_rpm igt test.
> FDO https://bugs.freedesktop.org/show_bug.cgi?id=108840
> Root cause is believed to due to page fault in ACPI idle driver.
> (FDO comment 18).
> It has been suggested by Daniel Vetter to disable ACPI idle
> driver for Core-for-CI, only for ICL.
> 
> This hacky patch is only for ICL processor and for Core-for-CI branch.
> 
> v2: Fixed compilation errors raised by lkp.
>     commit message improvement.
> 
> Cc: martin.peres@intel.com
> Cc: daniel.vetter@intel.com
> 
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Link:
> https://patchwork.freedesktop.org/patch/msgid/1554791361-27684-1-git-send-
> email-anshuman.gupta@intel.com

Ugh. I think this means we've effectively disabled all C-states in CI. Last I looked intel_idle icl support wasn't upstream yet.
Comment 6 Ville Syrjala 2019-04-26 11:43:10 UTC
(In reply to Ville Syrjala from comment #5)
> (In reply to Martin Peres from comment #3)
> > Let's monitor the failure rate now that the following commit has landed:
> > 
> > https://cgit.freedesktop.org/drm/drm-intel/commit/?h=topic/core-for-
> > CI&id=b573fba52f339dc4fadef7282af4a9413fd6173d
> > 
> > CL HACK: Disable ACPI idle drivertopic/core-for-CI
> > There were few system hung observed while running i915_pm_rpm igt test.
> > FDO https://bugs.freedesktop.org/show_bug.cgi?id=108840
> > Root cause is believed to due to page fault in ACPI idle driver.
> > (FDO comment 18).
> > It has been suggested by Daniel Vetter to disable ACPI idle
> > driver for Core-for-CI, only for ICL.
> > 
> > This hacky patch is only for ICL processor and for Core-for-CI branch.
> > 
> > v2: Fixed compilation errors raised by lkp.
> >     commit message improvement.
> > 
> > Cc: martin.peres@intel.com
> > Cc: daniel.vetter@intel.com
> > 
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Link:
> > https://patchwork.freedesktop.org/patch/msgid/1554791361-27684-1-git-send-
> > email-anshuman.gupta@intel.com
> 
> Ugh. I think this means we've effectively disabled all C-states in CI. Last
> I looked intel_idle icl support wasn't upstream yet.

Yeah, no C-states visible on icl-u2. So low power watermarks and whatnot not getting tested at all currently.
Comment 7 Martin Peres 2019-04-26 11:44:44 UTC
(In reply to Ville Syrjala from comment #6)
> (In reply to Ville Syrjala from comment #5)
> > (In reply to Martin Peres from comment #3)
> > > Let's monitor the failure rate now that the following commit has landed:
> > > 
> > > https://cgit.freedesktop.org/drm/drm-intel/commit/?h=topic/core-for-
> > > CI&id=b573fba52f339dc4fadef7282af4a9413fd6173d
> > > 
> > > CL HACK: Disable ACPI idle drivertopic/core-for-CI
> > > There were few system hung observed while running i915_pm_rpm igt test.
> > > FDO https://bugs.freedesktop.org/show_bug.cgi?id=108840
> > > Root cause is believed to due to page fault in ACPI idle driver.
> > > (FDO comment 18).
> > > It has been suggested by Daniel Vetter to disable ACPI idle
> > > driver for Core-for-CI, only for ICL.
> > > 
> > > This hacky patch is only for ICL processor and for Core-for-CI branch.
> > > 
> > > v2: Fixed compilation errors raised by lkp.
> > >     commit message improvement.
> > > 
> > > Cc: martin.peres@intel.com
> > > Cc: daniel.vetter@intel.com
> > > 
> > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Link:
> > > https://patchwork.freedesktop.org/patch/msgid/1554791361-27684-1-git-send-
> > > email-anshuman.gupta@intel.com
> > 
> > Ugh. I think this means we've effectively disabled all C-states in CI. Last
> > I looked intel_idle icl support wasn't upstream yet.
> 
> Yeah, no C-states visible on icl-u2. So low power watermarks and whatnot not
> getting tested at all currently.

Changing the priority to highest because of the testing gap that will potentially lead to flickering screens.
Comment 8 Jani Saarinen 2019-05-06 07:04:30 UTC
Should we actually consider revert of ICL HACK: Disable ACPI idle drivertopic/core-for-CI?
Comment 9 Jani Saarinen 2019-05-13 06:06:42 UTC
According to Martin no. So now discussing with Core team to look at fixing ACPI idle.
Comment 10 Jani Saarinen 2019-05-23 06:31:02 UTC
Patch has been taken away, waiting devs to confirm next steps here.
Comment 11 Jani Saarinen 2019-05-23 06:33:17 UTC
This issue has been seen only once also. I propose to lower priority on this.
Comment 12 Anshuman Gupta 2019-05-23 11:39:51 UTC
This issue seems due to kernel task kcompactd0   is blocked for more then 61 seconds.
There is cicular lock dpepedency 

<4>[  938.494641] ======================================================
<4>[  938.494641] WARNING: possible circular locking dependency detected
<4>[  938.494641] 5.1.0-rc3-CI-CI_DRM_5866+ #1 Tainted: G     U           
<4>[  938.494642] ------------------------------------------------------
<4>[  938.494642] kworker/3:11/1689 is trying to acquire lock:
<4>[  938.494642] 000000008b3d0788 ((console_sem).lock){-.-.}, at: down_trylock+0xa/0x30
<4>[  938.494643] 
<4>[  938.494644] but task is already holding lock:
<4>[  938.494644] 000000002522a786 (&rq->lock){-.-.}, at: try_to_wake_up+0x1e2/0x5f0
<4>[  938.494645] 
<4>[  938.494645] which lock already depends on the new lock.
<4>[  938.494646] 
<4>[  938.494646] 
<4>[  938.494646] the existing dependency chain (in reverse order) is:

And this test hang-read-crc-pipe checks crc after doing a hang and gpu reset (not sure what is it).

Can some gem expert provide their comments whether the hang and gpu reset done by this test is causing to block kcompactd0 for more then 61 seconds.
Comment 13 Lakshmi 2019-06-05 07:16:15 UTC
Update: For now, this issue has occurred twice
CI_DRM_6145_full (1 week, 1 day old), current CI_DRM run is 6189.	
IGT_4926_full (2 months old), current IGT run is 5037.
Comment 14 Jani Saarinen 2019-06-07 12:49:56 UTC
Francesco and team have not been able to reproduce issue. Dropping priority.
Comment 15 Jani Saarinen 2019-06-07 12:50:43 UTC
And not seen now 1,5 weeks either. Has seen only 2 times altogether.
Comment 16 Francesco Balestrieri 2019-06-17 10:28:06 UTC
Same status, seen twice and not for the last 3 weeks.
Comment 17 Andi 2019-07-24 14:25:22 UTC
I did not manage to reproduce the issue.

This bug has a reproduction rate of once every (a bit less than) 2 months... shall we lower the priority?
Comment 18 Francesco Balestrieri 2019-07-24 14:31:19 UTC
Agreed, setting to medium and let's see when it happens again.
Comment 19 CI Bug Log 2019-09-24 12:35:47 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ELK: igt@perf_pmu@cpu-hotplug - dmesg-warn - INFO: task kworker/u8:20:1058 blocked for more than 61 seconds
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_376/fi-elk-e7500/igt@perf_pmu@cpu-hotplug.html
Comment 20 Francesco Balestrieri 2019-11-25 04:42:45 UTC
Last happened two months ago. Given the reproduction rate of once every couple of months, we can't say this is gone, but at least setting priority to low.
Comment 21 Martin Peres 2019-11-29 18:11:28 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/256.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.