Bug 100663

Summary: commit 61e47d92c5196 breaks RS780
Product: Mesa
Component: Drivers/Gallium/r600
Version: git
Hardware: Other
OS: All
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Reporter: octoploid <octoploid>
Assignee: Default DRI bug account <dri-devel>
QA Contact: Default DRI bug account <dri-devel>
CC: Hi-Angel, octoploid

Description octoploid 2017-04-12 12:33:25 UTC
I get frequent ring 0 stalls:
[    2.311046] [drm] radeon kernel modesetting enabled.
[    2.311245] [drm] initializing kernel modesetting (RS780 0x1002:0x9614 0x1043:0x834D 0x00).
[    2.311287] [drm] register mmio base: 0xFBEE0000
[    2.311345] [drm] register mmio size: 65536
[    2.311908] ATOM BIOS: 113
[    2.311960] radeon 0000:01:05.0: VRAM: 128M 0x00000000C0000000 - 0x00000000C7FFFFFF (128M used)
[    2.312013] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
[    2.312054] [drm] Detected VRAM RAM=128M, BAR=128M
[    2.312093] [drm] RAM width 32bits DDR
[    2.312188] [TTM] Zone  kernel: Available graphics memory: 4079280 kiB
[    2.312228] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    2.312266] [TTM] Initializing pool allocator
[    2.312308] [TTM] Initializing DMA pool allocator
[    2.312360] [drm] radeon: 128M of VRAM memory ready
[    2.312398] [drm] radeon: 512M of GTT memory ready.
[    2.312439] [drm] Loading RS780 Microcode
[    2.312481] [drm] radeon: power management initialized
[    2.312521] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    2.318502] [drm] PCIE GART of 512M enabled (table at 0x00000000C0040000).
[    2.318587] radeon 0000:01:05.0: WB enabled
[    2.318626] radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0x00000000a0000c00 and cpu addr 0xffff880215c61c00
[    2.318677] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.318715] [drm] Driver supports precise vblank timestamp query.
[    2.318753] radeon 0000:01:05.0: radeon: MSI limited to 32-bit
[    2.318804] [drm] radeon: irq initialized.
[    2.351053] [drm] ring test on 0 succeeded in 1 usecs
[    2.351374] [drm] ib test on ring 0 succeeded in 0 usecs
[    2.351560] [drm] Radeon Display Connectors
[    2.351597] [drm] Connector 0:
[    2.351635] [drm]   VGA-1
[    2.351672] [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[    2.351710] [drm]   Encoders:
[    2.351747] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    2.351784] [drm] Connector 1:
[    2.351821] [drm]   DVI-D-1
[    2.351858] [drm]   HPD3
[    2.351896] [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[    2.351934] [drm]   Encoders:
[    2.351971] [drm]     DFP3: INTERNAL_KLDSCP_LVTMA
[    2.397754] [drm] fb mappable at 0xF0141000
[    2.397793] [drm] vram apper at 0xF0000000
[    2.397830] [drm] size 8294400
[    2.397867] [drm] fb depth is 24
[    2.397904] [drm]    pitch is 7680
[    2.398000] fbcon: radeondrmfb (fb0) is primary device
[    2.443745] Console: switching to colour frame buffer device 135x120
[    2.451569] radeon 0000:01:05.0: fb0: radeondrmfb frame buffer device
[    2.451662] [drm] Initialized radeon 2.49.0 20080528 for 0000:01:05.0 on minor 0
...
[   15.518878] random: crng init done
[  108.365295] radeon 0000:01:05.0: ring 0 stalled for more than 10483msec
[  108.365307] radeon 0000:01:05.0: GPU lockup (current fence id 0x0000000000000209 last fence id 0x000000000000020b on ring 0)
[  108.366380] radeon 0000:01:05.0: Saved 57 dwords of commands on ring 0.
[  108.366393] radeon 0000:01:05.0: GPU softreset: 0x00000009
[  108.366398] radeon 0000:01:05.0:   R_008010_GRBM_STATUS      = 0xA2533030
[  108.366402] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2     = 0x00000103
[  108.366407] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS      = 0x20000040
[  108.366410] radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  108.366414] radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x00000002
[  108.366418] radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT     = 0x00008084
[  108.366421] radeon 0000:01:05.0:   R_008680_CP_STAT          = 0x80018645
[  108.366425] radeon 0000:01:05.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  108.418745] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[  108.418800] radeon 0000:01:05.0: SRBM_SOFT_RESET=0x00000100
[  108.420907] radeon 0000:01:05.0:   R_008010_GRBM_STATUS      = 0xA0003030
[  108.420911] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2     = 0x00000003
[  108.420915] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS      = 0x20008040
[  108.420918] radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  108.420922] radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  108.420925] radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  108.420928] radeon 0000:01:05.0:   R_008680_CP_STAT          = 0x80100000
[  108.420932] radeon 0000:01:05.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  108.420939] radeon 0000:01:05.0: GPU reset succeeded, trying to resume
[  108.437117] [drm] PCIE GART of 512M enabled (table at 0x00000000C0040000).
[  108.437153] radeon 0000:01:05.0: WB enabled
[  108.437155] radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0x00000000a0000c00 and cpu addr 0xffff880215c61c00
[  108.469093] [drm] ring test on 0 succeeded in 1 usecs
[  108.478693] [drm] ib test on ring 0 succeeded in 0 usecs
[  119.005100] radeon 0000:01:05.0: ring 0 stalled for more than 10503msec
[  119.005112] radeon 0000:01:05.0: GPU lockup (current fence id 0x000000000000020d last fence id 0x000000000000020f on ring 0)
[  119.006160] radeon 0000:01:05.0: Saved 57 dwords of commands on ring 0.
[  119.006166] radeon 0000:01:05.0: GPU softreset: 0x00000009
[  119.006168] radeon 0000:01:05.0:   R_008010_GRBM_STATUS      = 0xA2533030
[  119.006169] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2     = 0x00000103
[  119.006170] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS      = 0x20001040
[  119.006171] radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  119.006172] radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x00000002
[  119.006173] radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT     = 0x00008080
[  119.006175] radeon 0000:01:05.0:   R_008680_CP_STAT          = 0x80038645
[  119.006176] radeon 0000:01:05.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  119.071075] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[  119.071127] radeon 0000:01:05.0: SRBM_SOFT_RESET=0x00000100
[  119.073231] radeon 0000:01:05.0:   R_008010_GRBM_STATUS      = 0xA0003030
[  119.073232] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2     = 0x00000003
[  119.073233] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS      = 0x20008040
[  119.073234] radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  119.073235] radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  119.073237] radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  119.073238] radeon 0000:01:05.0:   R_008680_CP_STAT          = 0x80100000
[  119.073239] radeon 0000:01:05.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  119.073243] radeon 0000:01:05.0: GPU reset succeeded, trying to resume
[  119.089455] [drm] PCIE GART of 512M enabled (table at 0x00000000C0040000).
[  119.089491] radeon 0000:01:05.0: WB enabled
[  119.089493] radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0x00000000a0000c00 and cpu addr 0xffff880215c61c00
[  119.121425] [drm] ring test on 0 succeeded in 1 usecs
[  119.139238] [drm] ib test on ring 0 succeeded in 0 usecs

It worked fine with trunk from April 9th.
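
For anyone comparing logs, the lockup lines above can be summarized with a short script. This is only a sketch: the regular expressions are written against the four lines copied from this report, and reading the gap between "current" and "last" fence id as the number of fences still outstanding is an interpretation, not something the log states.

```python
import re

# Four lines copied verbatim from the log in this report.
LOG = """\
[  108.365295] radeon 0000:01:05.0: ring 0 stalled for more than 10483msec
[  108.365307] radeon 0000:01:05.0: GPU lockup (current fence id 0x0000000000000209 last fence id 0x000000000000020b on ring 0)
[  119.005100] radeon 0000:01:05.0: ring 0 stalled for more than 10503msec
[  119.005112] radeon 0000:01:05.0: GPU lockup (current fence id 0x000000000000020d last fence id 0x000000000000020f on ring 0)
"""

# Timestamp, ring number and stall duration of each "stalled" line.
STALL = re.compile(r"\[\s*(\d+\.\d+)\] .*ring (\d+) stalled for more than (\d+)msec")
# The two fence ids printed on each "GPU lockup" line.
LOCKUP = re.compile(r"current fence id (0x[0-9a-f]+) last fence id (0x[0-9a-f]+)")


def summarize(log):
    """Return (stalls, pending) extracted from a kernel log string."""
    stalls = [(float(t), int(ring), int(ms)) for t, ring, ms in STALL.findall(log)]
    # Assumption: last id minus current id = fences not yet signalled.
    pending = [int(last, 16) - int(cur, 16) for cur, last in LOCKUP.findall(log)]
    return stalls, pending


stalls, pending = summarize(LOG)
for (t, ring, ms), gap in zip(stalls, pending):
    print(f"t={t:.3f}s ring {ring} stalled {ms} ms, fence gap {gap}")
```

Both lockups are on ring 0, roughly 10.6 seconds apart, each with the same fence gap.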
Comment 1 Emil Velikov 2017-04-12 13:17:14 UTC
octoploid, I don't think we have many people working on r600, so if you can track down the commit that caused the regression, that would be appreciated.

git bisect should be done in ~7 steps.
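
The estimate follows from bisect being a binary search: each step halves the remaining range, so the step count is about log2 of the number of commits. A rough sketch, where the commit counts are assumed figures for illustration and not taken from this report:

```python
import math


def bisect_steps(num_commits):
    """Approximate number of test builds a binary search over num_commits needs."""
    if num_commits <= 1:
        return 0
    return math.ceil(math.log2(num_commits))


# Hypothetical range sizes; the real number of commits since April 9th
# is not given in the report.
for n in (100, 128):
    print(n, "commits ->", bisect_steps(n), "steps")
```

git's own "roughly N steps" figure can differ by one, since it depends on the shape of the history.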
Comment 2 octoploid 2017-04-12 13:53:17 UTC
(In reply to Emil Velikov from comment #1)
> octoploid, I don't think we have many people working on r600, so if you can
> track down the commit that caused the regression, that would be appreciated.
> 
> git bisect should be done in ~7 steps.

No, sorry, this would be too annoying: these GPU lockups sometimes hang the whole machine, so I would have to reboot all the time.
So take it as an FYI.
Comment 3 octoploid 2017-04-12 14:42:44 UTC
Well, I figured out a way to bisect without rebooting.

The issue started with:

61e47d92c5196bf0240e322bb1b9d305836559e3 is the first bad commit
commit 61e47d92c5196bf0240e322bb1b9d305836559e3
Author: Constantine Kharlamov <Hi-Angel@yandex.ru>
Date:   Mon Apr 10 23:04:37 2017 +0300

    r600g: get rid of dummy pixel shader
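
For readers unfamiliar with the "first bad commit" output, the search git bisect performs can be sketched as below. This is an illustration only, not the procedure used here: the history list, the position of the bad commit in it, and the is_bad check are all hypothetical stand-ins for building Mesa and watching for a lockup.

```python
def first_bad(commits, is_bad):
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] known good, commits[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


BAD = "61e47d92c5196bf0240e322bb1b9d305836559e3"  # hash from the bisect output above

# Hypothetical history: placeholder names, with the bad commit at an arbitrary spot.
history = [f"placeholder-{i}" for i in range(100)]
history[73] = BAD


def is_bad(commit):
    # Stand-in for "build this commit and check whether ring 0 stalls".
    return history.index(commit) >= 73


print(first_bad(history, is_bad))
```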
Comment 4 Marek Olšák 2017-04-12 15:50:26 UTC
I reverted the bad commit. Thanks for bisecting.
