Bug 97371 - AMDGPU/Iceland amdgpu: failed testing IB on ring 9/10
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2016-08-16 18:40 UTC by Armin K
Modified: 2016-08-23 14:10 UTC

Attachments
Possible fix (1.00 KB, patch)
2016-08-17 12:12 UTC, Christian König

Description Armin K 2016-08-16 18:40:22 UTC
Ever since I upgraded to a 4.8-based kernel, I keep seeing error messages like the ones at the end of the log below. They weren't present in 4.7. FYI, the same messages appear with drm-next-4.9-wip.

[   15.091381] [drm] amdgpu kernel modesetting enabled.
[   15.091394] vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
[   15.091496] ATPX version 1, functions 0x00000003
[   15.091567] ATPX Hybrid Graphics
[   15.146747] CRAT table not found
[   15.146748] Finished initializing topology ret=0
[   15.146762] kfd kfd: Initialized module
[   15.146945] amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
[   15.147246] [drm] initializing kernel modesetting (TOPAZ 0x1002:0x6900 0x103C:0x811C 0x83).
[   15.147256] [drm] register mmio base: 0xE2000000
[   15.147257] [drm] register mmio size: 262144
[   15.147262] [drm] doorbell mmio base: 0xE0000000
[   15.147263] [drm] doorbell mmio size: 2097152
[   15.147270] [drm] probing gen 2 caps for device 8086:9d10 = 1724843/e
[   15.147272] [drm] probing mlw for device 8086:9d10 = 1724843
[   15.147276] vga_switcheroo: enabled
[   15.150833] ATOM BIOS: HP/Quanta
[   15.150847] [drm] GPU not posted. posting now...
[   15.154089] [drm] Changing default dispclk from 0Mhz to 600Mhz
[   15.217613] amdgpu 0000:01:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
[   15.217615] amdgpu 0000:01:00.0: GTT: 2048M 0x0000000080000000 - 0x00000000FFFFFFFF
[   15.217616] [drm] Detected VRAM RAM=2048M, BAR=256M
[   15.217617] [drm] RAM width 64bits DDR3
[   15.217688] [TTM] Zone  kernel: Available graphics memory: 4027936 kiB
[   15.217689] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[   15.217690] [TTM] Initializing pool allocator
[   15.217709] [TTM] Initializing DMA pool allocator
[   15.217724] [drm] amdgpu: 2048M of VRAM memory ready
[   15.217724] [drm] amdgpu: 2048M of GTT memory ready.
[   15.217735] [drm] GART: num cpu pages 524288, num gpu pages 524288
[   15.218660] [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
[   15.218689] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   15.218689] [drm] Driver supports precise vblank timestamp query.
[   15.218720] amdgpu 0000:01:00.0: amdgpu: using MSI.
[   15.218744] [drm] amdgpu: irq initialized.
[   15.492088] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[   15.510118] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000010, cpu addr 0xffff880231649010
[   15.510142] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000020, cpu addr 0xffff880231649020
[   15.510159] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000030, cpu addr 0xffff880231649030
[   15.510175] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000040, cpu addr 0xffff880231649040
[   15.510191] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000050, cpu addr 0xffff880231649050
[   15.510220] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000080000060, cpu addr 0xffff880231649060
[   15.510238] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000080000070, cpu addr 0xffff880231649070
[   15.510253] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000080000080, cpu addr 0xffff880231649080
[   15.510272] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000080000090, cpu addr 0xffff880231649090
[   15.557848] ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
[   15.753033] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x00000000800000a0, cpu addr 0xffff8802316490a0
[   15.753116] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000000800000b0, cpu addr 0xffff8802316490b0
[   15.996625] [drm] ring test on 0 succeeded in 15 usecs
[   15.996840] [drm] ring test on 1 succeeded in 19 usecs
[   15.996872] [drm] ring test on 2 succeeded in 15 usecs
[   15.996883] [drm] ring test on 3 succeeded in 3 usecs
[   15.996889] [drm] ring test on 4 succeeded in 2 usecs
[   15.996913] [drm] ring test on 5 succeeded in 2 usecs
[   15.996919] [drm] ring test on 6 succeeded in 2 usecs
[   15.996925] [drm] ring test on 7 succeeded in 2 usecs
[   15.996946] [drm] ring test on 8 succeeded in 2 usecs
[   15.996989] [drm] ring test on 9 succeeded in 6 usecs
[   15.996995] [drm] ring test on 10 succeeded in 4 usecs
[   15.997193] [drm] ib test on ring 0 succeeded
[   15.997326] [drm] ib test on ring 1 succeeded
[   15.997374] [drm] ib test on ring 2 succeeded
[   15.997405] [drm] ib test on ring 3 succeeded
[   15.997435] [drm] ib test on ring 4 succeeded
[   15.997466] [drm] ib test on ring 5 succeeded
[   15.997495] [drm] ib test on ring 6 succeeded
[   15.997526] [drm] ib test on ring 7 succeeded
[   15.997556] [drm] ib test on ring 8 succeeded
[   15.997612] [drm:sdma_v2_4_ring_test_ib [amdgpu]] *ERROR* amdgpu: fence wait failed (1000).
[   15.997654] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (1000).
[   15.997713] [drm:sdma_v2_4_ring_test_ib [amdgpu]] *ERROR* amdgpu: fence wait failed (1000).
[   15.997749] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 10 (1000).
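
For anyone comparing logs across kernels, the failing rings can be pulled out of a dmesg capture with a short script. This is only a sketch: the function name `failed_ib_rings` and the regular expression are illustrative assumptions based on the message format shown above, not part of the driver or any tool.

```python
import re

# Matches lines such as:
#   ... *ERROR* amdgpu: failed testing IB on ring 9 (1000).
# The pattern is an assumption derived from the log above.
FAILED_IB = re.compile(r"failed testing IB on ring (\d+) \((-?\d+)\)")


def failed_ib_rings(log_text):
    """Return (ring, code) tuples for every failed IB test found in the log."""
    return [(int(ring), int(code)) for ring, code in FAILED_IB.findall(log_text)]


sample = """\
[   15.997556] [drm] ib test on ring 8 succeeded
[   15.997612] [drm:sdma_v2_4_ring_test_ib [amdgpu]] *ERROR* amdgpu: fence wait failed (1000).
[   15.997654] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (1000).
[   15.997713] [drm:sdma_v2_4_ring_test_ib [amdgpu]] *ERROR* amdgpu: fence wait failed (1000).
[   15.997749] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 10 (1000).
"""

print(failed_ib_rings(sample))  # [(9, 1000), (10, 1000)]
```

Note that the reported code is a positive 1000 on both rings rather than a negative errno.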
Comment 1 Alex Deucher 2016-08-16 18:43:37 UTC
Can you bisect?
Comment 2 Armin K 2016-08-16 20:46:05 UTC
bbec97aae660adafa5208c5defc54e3cbbe6b129 is the first bad commit
commit bbec97aae660adafa5208c5defc54e3cbbe6b129
Author: Christian König <christian.koenig@amd.com>
Date:   Tue Jul 5 21:07:17 2016 +0200

    drm/amdgpu: add a fence timeout for the IB tests v2
    
    10ms should be enough for now.
    
    v2: fix some typos in CIK code
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
    Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Comment 3 Christian König 2016-08-17 07:01:35 UTC
Yeah that is a known issue. David already raised the timeout to 1s because of this.

On the other hand I would really like to know why 10ms isn't enough for VCE to come up.
Comment 4 Armin K 2016-08-17 11:12:35 UTC
(In reply to Christian König from comment #3)
> Yeah that is a known issue. David already raised the timeout to 1s because
> of this.
> 
> On the other hand I would really like to know why 10ms isn't enough for VCE
> to come up.

Could you please point out the commit which raises the timeout? Thanks.
Comment 5 Christian König 2016-08-17 11:49:20 UTC
That's commit e0d079679705b02407cccea1f0e48bff39befce5, "increase timeout of IB test".

It should be available in Alex's repository.
Comment 6 Armin K 2016-08-17 11:53:36 UTC
(In reply to Christian König from comment #5)
> That's commit e0d079679705b02407cccea1f0e48bff39befce5, "increase timeout of
> IB test".
> 
> It should be available in Alex's repository.

patching file drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
Reversed (or previously applied) patch detected!  Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c.rej

That patch is already applied in 4.8-rc2, as the output above shows. However, I still see the problem.
Comment 7 Christian König 2016-08-17 12:12:45 UTC
Created attachment 125847
Possible fix

Indeed it isn't the timeout value. Instead there is a stupid typo in the return check.

Please try the attached patch.
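
The attachment is not reproduced in this thread, so the exact typo is not shown here. As a hedged illustration only: kernel fence waits such as `fence_wait_timeout()` return a negative errno on error, 0 on timeout, and the remaining timeout in jiffies (a positive number) on success. The logged value 1000 would be consistent with a positive "success" return being reported as a failure, for example a 1 s timeout with HZ=1000; that reading is an assumption. The Python model below uses hypothetical helper names and contrasts a faulty "any nonzero value is an error" check with the three-way check.

```python
def classify_wait(ret):
    """Interpret a fence_wait_timeout()-style return value."""
    if ret < 0:
        return "error"      # negative errno
    if ret == 0:
        return "timeout"    # the wait expired before the fence signaled
    return "signaled"       # positive: remaining jiffies, i.e. success


def faulty_check_fails(ret):
    # Hypothetical faulty check: treats every nonzero value as a failure,
    # so a successful wait (positive remainder) is reported as an error.
    return ret != 0


def three_way_check_fails(ret):
    # Fails only on a real error or a timeout.
    return classify_wait(ret) != "signaled"


logged = 1000  # value printed in "fence wait failed (1000)"
print(classify_wait(logged))          # signaled
print(faulty_check_fails(logged))     # True
print(three_way_check_fails(logged))  # False
```

Under this reading, raising the timeout could not help, because the wait was not actually timing out.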
Comment 8 Armin K 2016-08-17 12:51:21 UTC
(In reply to Christian König from comment #7)
> Created attachment 125847
> Possible fix
> 
> Indeed it isn't the timeout value. Instead there is a stupid typo in the
> return check.
> 
> Please try the attached patch.

The attached patch makes the error message go away.
Comment 9 Christian König 2016-08-18 09:47:33 UTC
Good, the patch should show up in the next rc.

Thanks for testing,
Christian.
Comment 10 Dea1993 2016-08-22 12:20:21 UTC
Same problem on my laptop.
I've tested:
Linux 4.8 rc1, rc2, and now also rc3.

@Christian König
"Good, the patch should show up in the next rc."
Do you mean rc4?
Comment 11 Alex Deucher 2016-08-23 14:10:38 UTC
The patch will go upstream in the -fixes pull this week.

