Summary: | ring 0 stalled after bytes_moved_threshold reached - CAPVERDE/HD7770 - TAHITI/W9000 | ||
---|---|---|---|
Product: | DRI | Reporter: | Julien Isorce <julien.isorce> |
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | dodgyville+freedesktop |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Julien Isorce
2017-04-18 15:14:55 UTC
Created attachment 130902 [details]
dmesg_HD7770_kernel_amd-staging-4.9_ring_stalled
Created attachment 130903 [details]
dmesg_HD7770_kernel_amd-staging-4.9_ring_stalled
Created attachment 130904 [details]
ddebug_dumps_HD7770_kernel_amd-staging-4.9_ring_stalled
(In reply to Julien Isorce from comment #0) > In kernel radeon_object.c::radeon_bo_list_validate, once "bytes_moved > > bytes_moved_threshold" is reached (this is the case for 850 bo in the same > list_for_each_entry loop), I can see that radeon_ib_schedule emits a fence > that it takes more than the radeon.lockup_timeout to be signaled. radeon_ib_schedule is called for submitting the command stream from userspace, not for any BO moves directly, right? How did you determine that this hang is directly related to bytes_moved / bytes_moved_threshold? Maybe it's only indirectly related, e.g. due to the threshold preventing a BO from being moved to VRAM despite userspace's preference. > Also it seems the fence is signaled by swapper after more than 10 seconds > but it is too late. I requires to reduce the "15" param above to 4 to see > that. How does "swapper" (what is that exactly?) signal the fence? > Is it normal that radeon_bo_list_validate still tries to move the bo if > bytes_moved_threshold is reached ? There are circumstances where a BO has to be moved even though the threshold is reached. > Indeed ttm_bo_validate is always called ttm_bo_validate must be called for every BO referenced by the command stream from userspace for correct lifetime management of its memory. > (it blits from vram to vram). It might be worth looking into why this happens, though. If domain == current_domain == RADEON_GEM_DOMAIN_VRAM, I wouldn't expect ttm_bo_validate to trigger a blit. (In reply to Michel Dänzer from comment #4) > (In reply to Julien Isorce from comment #0) > > In kernel radeon_object.c::radeon_bo_list_validate, once "bytes_moved > > > bytes_moved_threshold" is reached (this is the case for 850 bo in the same > > list_for_each_entry loop), I can see that radeon_ib_schedule emits a fence > > that it takes more than the radeon.lockup_timeout to be signaled. > > radeon_ib_schedule is called for submitting the command stream from > userspace, not for any BO moves directly, right? > > How did you determine that this hang is directly related to bytes_moved / > bytes_moved_threshold? Maybe it's only indirectly related, e.g. due to the > threshold preventing a BO from being moved to VRAM despite userspace's > preference. > I added a trace and the fence that is not signaled on time is always the one emited by radeon_ib_schedule after that the bytes_moved_threshold is reached. But you are right it could be only indirectly related. Here is the sequence I have: ioctl_radeon_cs radeon_bo_list_validate bytes_moved > bytes_moved_threshold(=1024*1024ull) 800 bo are not moved from gtt to vram because of that. radeon_cs_ib_vm_chunk radeon_ib_schedule(rdev, &parser->ib, NULL, true); radeon_fence_emit on ring 0 r600_mmio_hdp_flush /ioctl_radeon_cs Then anything calling ttm_bo_wait will block more than the radeon.lockup_timeout because the above fence is not signaled on time. Could it be that something is not flushed properly ? (ref: https://patchwork.kernel.org/patch/5807141/ ? tlb_flush ?) Are you saying that some bos are required to be moved from gtt to vram in order for this fence to be signaled ? As you can see above it happens when vram_usage >= half_vram so radeon_bo_get_threshold_for_moves returns 1024*1024, which explains why only 1 or 2 bos can be moved from gtt to vram in that case and why all others are forced to stay in gtt. In the same run of radeon_bo_list_validate there are many calls to ttm_bo_validate with both domain and current_domain as VRAM, this is the case for around 400 bo. Maybe this cause delay for this fence to be signaled, providing vram usage is high too. > > > Also it seems the fence is signaled by swapper after more than 10 seconds > > but it is too late. I requires to reduce the "15" param above to 4 to see > > that. > > How does "swapper" (what is that exactly?) signal the fence? My wording was wrong sorry, I should have said "the first entity noticing that the fence is signaled" by calling radeon_fence_activity. swapper is the name for process 0 (idle). I change drm logging to print process name and id: (current->comm, current->pid) > > It might be worth looking into why this happens, though. If domain == > current_domain == RADEON_GEM_DOMAIN_VRAM, I wouldn't expect ttm_bo_validate > to trigger a blit. I will check though I think I get just confused by a previous trace. Created attachment 130947 [details]
dmesg_W9000_with_custom_fence_debug.log
I made 2 apitrace using Zach's test mentioned here https://bugs.freedesktop.org/show_bug.cgi?id=100465#c24 . This test is also good to reproduce this ring 0 stalled issue. 1: apitrace ideal for vram size 2048 (ex: HD7770) https://drive.google.com/file/d/0Bzat_iFKrgCWYzBlZFFLQjgyRU0/view?usp=sharing 2: apitrace ideal for vram size 6144 (ex: W9000) https://drive.google.com/file/d/0Bzat_iFKrgCWczgzM2FzaVFTUXc/view?usp=sharing DISPLAY=:0 apitrace replay thrash.trace Also I have attached the log (dmesg_W9000_with_custom_fence_debug.log) I get with my dev branch here https://github.com/CapOM/linux/commits/amd-staging-4.9_add_debug_fences where I added traces to debug ring N stalled issues. It prints the backtrace from where it waits for the fence and it also prints the backtrace from where it has emited that fence. Also note that setting R600_DEBUG=nowc avoids this ring N stalled (so the endless fence is signaled). Hack submitted here https://patchwork.kernel.org/patch/9695945/. It contains some info in the commit message and in the replies. For those who want to try, it is easier to just set R600_DEBUG=nowc as said in #7. -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/793. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.