Bug 103413 - R9285 Xonotic gpu lock since radeonsi: split si_emit_shader_pointer
Summary: R9285 Xonotic gpu lock since radeonsi: split si_emit_shader_pointer
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-23 10:17 UTC by Andy Furniss
Modified: 2017-10-31 10:06 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Andy Furniss 2017-10-23 10:17:49 UTC
R9285 Tonga, probably since

36626ff radeonsi: split si_emit_shader_pointer

Xonotic big-key-bench timedemo may provoke a display/gpu lockup.

On head it seemed easy to provoke: one, two, or three runs.
Bisecting was harder, as it took more runs to trigger, hence the possibility that I landed on a false good.
I'm currently on the commit before the one above and haven't locked so far; I don't know yet whether that's just luck.

The lock seems to happen at the same(ish) place in the demo, frame 6512, though once it was 6514.

If I wait before doing SysRq, I get a hung-task timeout trace like the one below. I tried older kernels with the same result; the -dirty here is just a CPU fix I need that's in rc5.

 INFO: task gallium_drv:0:985 blocked for more than 120 seconds.
       Not tainted 4.14.0-rc3-g96687ec-dirty #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gallium_drv:0   D    0   985    962 0x00000000
 Call Trace:
  __schedule+0x2ce/0x890
  ? _raw_write_unlock+0x11/0x30
  schedule+0x3b/0x90
  amd_sched_entity_push_job+0x9f/0xf0 [amdgpu]
  ? remove_wait_queue+0x80/0x80
  amdgpu_job_submit+0x9a/0xc0 [amdgpu]
  amdgpu_vm_bo_update_mapping+0x2de/0x3a0 [amdgpu]
  ? amdgpu_vm_free_mapping.isra.20+0x30/0x30 [amdgpu]
  amdgpu_vm_bo_update+0x2e8/0x6a0 [amdgpu]
  amdgpu_gem_va_ioctl+0x476/0x480 [amdgpu]
  ? amdgpu_gem_metadata_ioctl+0x1d0/0x1d0 [amdgpu]
  drm_ioctl_kernel+0x6f/0xc0 [drm]
  drm_ioctl+0x2f9/0x3c0 [drm]
  ? futex_wake+0x7c/0x140
  ? amdgpu_gem_metadata_ioctl+0x1d0/0x1d0 [amdgpu]
  ? do_futex+0x289/0xb20
  ? put_prev_entity+0xf8/0x5a0
  ? preempt_count_add+0x99/0xb0
  ? _raw_write_unlock_irqrestore+0x13/0x30
  ? _raw_spin_unlock_irqrestore+0x9/0x10
  amdgpu_drm_ioctl+0x54/0x90 [amdgpu]
  do_vfs_ioctl+0x98/0x5b0
  ? __fget+0x6e/0xa0
  SyS_ioctl+0x47/0x80
  entry_SYSCALL_64_fastpath+0x17/0x98
 RIP: 0033:0x7fee63cb8717
 RSP: 002b:00007fee567415a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007fee2c00bbe0 RCX: 00007fee63cb8717
 RDX: 00007fee567415f0 RSI: 00000000c0286448 RDI: 000000000000000e
 RBP: 0000000000000000 R08: 0000000150400000 R09: 000000000000000e
 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 000000000370e4a0 R15: 0000000008881590
Comment 1 Andy Furniss 2017-10-26 09:48:58 UTC
Lots of "normal" runs indicate the commit before the "bad" one is OK; I haven't locked while doing many runs of

vblank_mode=0 ./xonotic-linux64-glx -benchmarkruns 20 -benchmark demos/the-big-keybench.dem

with CPU and GPU clocks set high for testing.
Neither vblank_mode=0 nor the performance settings are required to provoke the lock; they just make it much faster to get the multiple runs in.

Xonotic settings are ultra, with aniso and AA at highest, 1920x1080 fullscreen.

Unfortunately, with a more abnormal test, a 2160p framebuffer with panning, I can still lock. The locks are again in a consistent place, but that place is different: frame 9411.

Over time I'll try to go back further.
Comment 2 Andy Furniss 2017-10-31 10:06:27 UTC
I have no clue about any fixing commit; this turned out to be very random as to whether I could provoke it or not, and currently I can't, so I'm closing, as the bisect was clearly wrong. I guess the LLVM version, or getting the GPU into some "state", was involved, but whatever it was, this bug report is not correct.

