Bug 100434 - [BSW][EXT] gem_pwrite@big-cpu-backwards hung task
Summary: [BSW][EXT] gem_pwrite@big-cpu-backwards hung task
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-28 15:13 UTC by Tomi Sarvela
Modified: 2017-04-12 08:10 UTC (History)
1 user (show)

See Also:
i915 platform: BSW/CHT
i915 features: GEM/Other


Attachments

Description Tomi Sarvela 2017-03-28 15:13:35 UTC
Todays DRM-Tip, kconfig close to
http://benchsrv.fi.intel.com/archive/results/CI_IGT_test/CI_DRM_2422/kernel.config.bz2

Hardware: Intel BSW-nuc with n3050, Ubuntu base

Running IGT extended-list WIP, triggered following problem on igt@gem_pwrite@big-cpu-backwards:

<14>[17257.582836] [IGT] gem_pwrite: executing
<14>[17257.606719] [IGT] gem_pwrite: starting subtest big-cpu-backwards
<3>[17314.864204] INFO: task kworker/0:0:23627 blocked for more than 30 seconds.
<3>[17314.864621]       Tainted: G     U  W       4.11.0-rc4-CI-CI_DRM_320+ #1
<3>[17314.864830] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[17314.865075] kworker/0:0     D    0 23627      2 0x00000000
<6>[17314.865349] Workqueue: events __i915_gem_free_work [i915]
<4>[17314.865626] Call Trace:
<4>[17314.865686]  __schedule+0x3fd/0xbc0
<4>[17314.865743]  schedule+0x3b/0x90
<4>[17314.865778]  schedule_preempt_disabled+0x13/0x20
<4>[17314.865812]  __mutex_lock+0x293/0x990
<4>[17314.865840]  ? __mutex_lock+0xb0/0x990
<4>[17314.865878]  ? __lock_acquire+0x45b/0x1bb0
<4>[17314.866094]  ? __i915_gem_free_objects+0x29/0x4e0 [i915]
<4>[17314.866172]  mutex_lock_nested+0x16/0x20
<4>[17314.866387]  __i915_gem_free_objects+0x29/0x4e0 [i915]
<4>[17314.867921]  __i915_gem_free_work+0x2e/0x50 [i915]
<4>[17314.867972]  process_one_work+0x1f4/0x6d0
<4>[17314.868007]  ? process_one_work+0x16e/0x6d0
<4>[17314.868064]  worker_thread+0x49/0x4a0
<4>[17314.868117]  kthread+0x107/0x140
<4>[17314.868147]  ? process_one_work+0x6d0/0x6d0
<4>[17314.868182]  ? kthread_create_on_node+0x40/0x40
<4>[17314.868772]  ret_from_fork+0x2e/0x40
<4>[17314.868849] 
<4>[17314.868849] Showing all locks held in the system:
<4>[17314.868891] 2 locks held by khungtaskd/24:
<4>[17314.868966]  #0:  (rcu_read_lock){......}, at: [<ffffffff8113d00c>] watchdog+0x9c/0x600
<4>[17314.869096]  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff810dc92d>] debug_show_all_locks+0x3d/0x1a0
<4>[17314.869449] 3 locks held by kworker/0:0/23627:
<4>[17314.869474]  #0:  ("events"){++++.+}, at: [<ffffffff8109d0ae>] process_one_work+0x16e/0x6d0
<4>[17314.869604]  #1:  ((&dev_priv->mm.free_work)){+.+...}, at: [<ffffffff8109d0ae>] process_one_work+0x16e/0x6d0
<4>[17314.869718]  #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa011e559>] __i915_gem_free_objects+0x29/0x4e0 [i915]
<4>[17314.870062] 4 locks held by gem_pwrite/23954:
<4>[17314.870093]  #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff8120950d>] vfs_write+0x17d/0x1f0
<4>[17314.870378]  #1:  (debugfs_srcu){......}, at: [<ffffffff8131c1b1>] debugfs_use_file_start+0x41/0xa0
<4>[17314.870497]  #2:  (&attr->mutex){+.+.+.}, at: [<ffffffff812360b7>] simple_attr_write+0x37/0xe0
<4>[17314.870608]  #3:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0100d1a>] i915_drop_caches_set+0x3a/0x150 [i915]
<4>[17314.870898] 
<4>[17314.870919] =============================================
<4>[17314.870919] 
<4>[17314.871845] NMI backtrace for cpu 0
<4>[17314.871881] CPU: 0 PID: 24 Comm: khungtaskd Tainted: G     U  W       4.11.0-rc4-CI-CI_DRM_320+ #1
<4>[17314.871907] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4>[17314.871926] Call Trace:
<4>[17314.871966]  dump_stack+0x67/0x92
<4>[17314.872000]  ? irq_force_complete_move+0xf0/0xf0
<4>[17314.872030]  nmi_cpu_backtrace+0x92/0xa0
<4>[17314.872059]  ? irq_force_complete_move+0xf0/0xf0
<4>[17314.872091]  nmi_trigger_cpumask_backtrace+0xc0/0x100
<4>[17314.872129]  arch_trigger_cpumask_backtrace+0x14/0x20
<4>[17314.872158]  watchdog+0x433/0x600
<4>[17314.872182]  ? watchdog+0x9c/0x600
<4>[17314.872225]  kthread+0x107/0x140
<4>[17314.872250]  ? reset_hung_task_detector+0x10/0x10
<4>[17314.872277]  ? kthread_create_on_node+0x40/0x40
<4>[17314.872310]  ret_from_fork+0x2e/0x40
<6>[17314.872569] Sending NMI from CPU 0 to CPUs 1:
<4>[17314.873007] NMI backtrace for cpu 1
<4>[17314.873021] CPU: 1 PID: 512 Comm: rs:main Q:Reg Tainted: G     U  W       4.11.0-rc4-CI-CI_DRM_320+ #1
<4>[17314.873029] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4>[17314.873035] task: ffff88016f3f8040 task.stack: ffffc90000988000
<4>[17314.873041] RIP: 0010:ext4_da_write_end+0x267/0x3f0
<4>[17314.873049] RSP: 0018:ffffc9000098bc98 EFLAGS: 00000282
<4>[17314.873064] RAX: 0000000000000000 RBX: 0000000000000757 RCX: 0000000000000757
<4>[17314.873070] RDX: 00000000064c977f RSI: ffff880171d21570 RDI: ffff88017388a740
<4>[17314.873077] RBP: ffffc9000098bcf8 R08: 0000000000000757 R09: ffffea0005fd8e40
<4>[17314.873083] R10: ffffea0005fd8e40 R11: ffff8801698af818 R12: 00000000064c977f
<4>[17314.873090] R13: 0000000000000757 R14: 000000000000077f R15: ffff880171d21378
<4>[17314.873096] FS:  00007f43f7db3700(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[17314.873103] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[17314.873109] CR2: 00007f8f994b6000 CR3: 0000000171b94000 CR4: 00000000001006e0
<4>[17314.873114] Call Trace:
<4>[17314.873120]  ? iov_iter_copy_from_user_atomic+0xf8/0x360
<4>[17314.873126]  generic_perform_write+0x102/0x1b0
<4>[17314.873132]  __generic_file_write_iter+0x18e/0x1e0
<4>[17314.873138]  ext4_file_write_iter+0xef/0x380
<4>[17314.873143]  __vfs_write+0xbf/0x120
<4>[17314.873148]  vfs_write+0xc6/0x1f0
<4>[17314.873154]  SyS_write+0x44/0xb0
<4>[17314.873160]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[17314.873165] RIP: 0033:0x7f43faa024bd
<4>[17314.873171] RSP: 002b:00007f43f7db2600 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
<4>[17314.873184] RAX: ffffffffffffffda RBX: ffffffff8147edf3 RCX: 00007f43faa024bd
<4>[17314.873190] RDX: 0000000000000757 RSI: 00007f43e8019ec0 RDI: 0000000000000006
<4>[17314.873196] RBP: ffffc9000098bf88 R08: 00000000010d2930 R09: 0000000000000000
<4>[17314.873203] R10: 002eca2640000000 R11: 0000000000000293 R12: 000000000046a8b4
<4>[17314.873209] R13: 00007f43f7db2620 R14: 00007f43e8019c40 R15: 00007f43e8019ec0
<4>[17314.873215]  ? __this_cpu_preempt_check+0x13/0x20
<4>[17314.873220] Code: 8b 55 d0 be 0e 0c 00 00 48 c7 c7 10 50 a2 81 e8 d0 c9 02 00 e9 0f ff ff ff 4c 89 4d b8 65 ff 05 50 95 d6 7e 48 8b 05 01 9a c8 00 <e8> 34 a2 e5 ff 85 c0 4c 8b 4d b8 74 0d 80 3d 22 d6 bf 00 00 0f 
<0>[17314.875280] Kernel panic - not syncing: hung_task: blocked tasks
<4>[17314.875327] CPU: 0 PID: 24 Comm: khungtaskd Tainted: G     U  W       4.11.0-rc4-CI-CI_DRM_320+ #1
<4>[17314.875384] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4>[17314.875442] Call Trace:
<4>[17314.875467]  dump_stack+0x67/0x92
<4>[17314.875496]  panic+0xcf/0x205
<4>[17314.875527]  watchdog+0x43f/0x600
<4>[17314.875553]  ? watchdog+0x9c/0x600
<4>[17314.875584]  kthread+0x107/0x140
<4>[17314.875610]  ? reset_hung_task_detector+0x10/0x10
<4>[17314.875644]  ? kthread_create_on_node+0x40/0x40
<4>[17314.875679]  ret_from_fork+0x2e/0x40
<0>[17314.875724] Kernel Offset: disabled
Comment 1 Tomi Sarvela 2017-03-28 15:14:16 UTC
More reachable kconfig at
https://intel-gfx-ci.01.org/CI/CI_DRM_2422/kernel.config.bz2
Comment 2 Chris Wilson 2017-03-28 15:23:22 UTC
Could be i915_gem_wait_for_idle() blocking, or the shrinker. Unclear, the hung task looks to be a symptom and not a deadlock.
Comment 3 yann 2017-03-29 07:19:15 UTC
Reference to Chris' patch: https://patchwork.freedesktop.org/series/22020/
Comment 4 Chris Wilson 2017-04-08 14:29:17 UTC
commit 8f612d055183545070ca1009ac2eb1f2e044cc20
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Fri Apr 7 13:49:34 2017 +0300

    drm/i915: Don't call synchronize_rcu_expedited under struct_mutex


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.