Bug 108898 - (Recoverable) GPU hangs with GfxBench Manhattan GL tests
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL: https://gfxbench.com/linux-download/
 
Reported: 2018-11-29 11:36 UTC by Eero Tamminen
Modified: 2019-05-17 11:45 UTC

Description Eero Tamminen 2018-11-29 11:36:19 UTC
Setup:
- FullHD monitor (through HDMI KVM)
- HadesCanyon KBL i7-8809G ([AMD/ATI] Vega [Radeon RX Vega M] (rev c0))
- Ubuntu 18.04
- drm-tip git kernel v4.20-rc4 (i.e. kernel.org v4.20-rc4 kernel + latest drm code from yesterday)
- Mesa git (c120dbfe4d) with AMD VEGAM renderer
- X server git version
- Proprietary GfxBench v5, but the public GfxBench v4 should have the same tests:
  http://gfxbench.com

Test-cases:
* Manhattan 3.0 offscreen:
  bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan_off
* Manhattan 3.1 onscreen:
  bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan31

Expected outcome:
* No GPU timeouts

Actual outcome:
* About 1 out of 3 runs produces this error in dmesg:
[ 2817.689624] [drm:drm_sched_job_timedout [gpu_sched]] *ERROR* ring gfx timeout, but soft recovered

NOTE: These were already happening when we started testing this machine in mid-October, with Mesa 18cc65edf8480 and drm-tip kernel v4.19-rc8.
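
The check described above (run a test a few times, then look for the timeout in dmesg) could be automated roughly as follows. This is only a sketch: the helper names are made up for illustration, it assumes `dmesg` is readable by the calling user (it may need root when the kernel log is restricted), and it assumes the kernel log buffer does not wrap between runs.

```python
import re
import subprocess

# Matches both message variants quoted in this report
# (drm_sched_job_timedout and amdgpu_job_timedout).
TIMEOUT_RE = re.compile(r"\*ERROR\* ring gfx timeout")


def count_gfx_timeouts(dmesg_text):
    """Count 'ring gfx timeout' error lines in a dmesg dump."""
    return sum(1 for line in dmesg_text.splitlines() if TIMEOUT_RE.search(line))


def read_dmesg():
    """Return the current kernel log as text (assumption: dmesg is permitted)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def runs_with_timeouts(cmd, runs=3):
    """Run cmd `runs` times; return how many runs added a gfx timeout to dmesg."""
    bad = 0
    for _ in range(runs):
        before = count_gfx_timeouts(read_dmesg())
        subprocess.run(cmd, check=False)  # the benchmark's exit status is ignored
        if count_gfx_timeouts(read_dmesg()) > before:
            bad += 1
    return bad
```

For the offscreen Manhattan 3.0 case, `cmd` would be the command line from "Test-cases" split into a list, e.g. `["bin/testfw_app", "--gfx", "glfw", "--gl_api", "desktop_core", "--width", "1920", "--height", "1080", "--fullscreen", "1", "--test_id", "gl_manhattan_off"]`, run from the GfxBench install directory.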
Comment 1 Eero Tamminen 2019-02-07 09:33:56 UTC
Hangs are still happening with the latest Mesa (a203eaa4f4fb) and drm-tip kernel (v5.0-rc4) git versions:
[ 2776.782754] Iteration 3/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan_off
[ 2845.656793] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 2845.836983] Iteration 1/3: testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan31
[ 2915.288863] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 2915.383696] Iteration 2/3: testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan31
[ 2980.104777] Iteration 3/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan31
[ 3044.823739] Iteration 1/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan31_off
[ 3113.432727] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 3113.528273] Iteration 2/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan31_off
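
To see which test each timeout belongs to, a log like the one above can be scanned for the "Iteration" markers. The sketch below is not part of any existing tooling; the function name is hypothetical, and it assumes each timeout belongs to the most recent "Iteration N/M" marker logged above it.

```python
import re

# "Iteration 3/3: bin/testfw_app ... --test_id gl_manhattan_off"
ITER_RE = re.compile(r"Iteration (\d+)/(\d+): .*--test_id (\S+)")
TIMEOUT_RE = re.compile(r"\*ERROR\* ring gfx timeout")


def attribute_timeouts(log_text):
    """Return a (test_id, iteration) pair for every gfx timeout in the log.

    Assumption: a timeout is attributed to the last iteration marker seen
    before it; the entry is None if no marker preceded the timeout.
    """
    current = None
    hits = []
    for line in log_text.splitlines():
        marker = ITER_RE.search(line)
        if marker:
            current = (marker.group(3), int(marker.group(1)))
        elif TIMEOUT_RE.search(line):
            hits.append(current)
    return hits
```

Under that assumption, the log above attributes the three timeouts to gl_manhattan_off (iteration 3), gl_manhattan31 (iteration 1) and gl_manhattan31_off (iteration 1).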
Comment 2 Eero Tamminen 2019-03-06 12:09:08 UTC
Hangs are still happening with the latest Mesa (43f40dc7cb234e) and drm-tip kernel (v5.0) git versions in Manhattan test offscreen versions.
Comment 3 Eero Tamminen 2019-03-22 11:18:36 UTC
(In reply to Eero Tamminen from comment #2)
> Hangs are still happening with the latest Mesa (43f40dc7cb234e) and drm-tip
> kernel (v5.0) git versions in Manhattan test offscreen versions.

The hangs still occur with the latest Mesa and drm-tip kernel.

Public version of GfxBench v4 has these same tests:
  https://gfxbench.com/result.jsp?benchmark=gfx40
  https://gfxbench.com/linux-download/

(It just doesn't support running them automatically from the command line.)
Comment 4 Eero Tamminen 2019-05-09 14:23:44 UTC
Any updates on this (VegaM) bug? These recoverable hangs are still happening with the git versions of the kernel, Mesa and linux-firmware. Unlike with the hard hang in bug 108900, this test-case is freely available.
Comment 5 Eero Tamminen 2019-05-17 11:45:46 UTC
Sometimes there's also another error message, about fences:
[ 5813.444709] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[ 5818.564819] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

