Bug 91881 - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: 11.0
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2015-09-04 15:41 UTC by markus gapp
Modified: 2015-09-11 02:53 UTC (History)

Attachments
possible fix (3.93 KB, patch)
2015-09-06 14:41 UTC, Marek Olšák

Description markus gapp 2015-09-04 15:41:27 UTC
Hi

On a Gentoo system I have been experiencing repeated random segfaults, mostly of KDE X11 applications, and Xorg freezes since upgrading mesa from 10.x.y to 11.0.0_rc[12].

mesa was compiled with (gentoo use flags): classic dri3 egl gallium gbm gles1 gles2 nptl osmesa udev xa xvmc

llvm was compiled with (gentoo use flags): clang libffi ncurses python static-analyzer xml

llvm 3.7.0 & 3.6.2 (no difference)
kernel 4.2.0 & 4.1.6 (no difference)
libdrm 2.6.64

xorg log is silent.

dmesg says:
[    4.478020] [drm] ring test on 5 succeeded in 1 usecs
[    4.478028] [drm] UVD initialized successfully.
[    4.478234] [drm] ib test on ring 0 succeeded in 0 usecs
[    4.619151] BTRFS: device fsid ac3f8e32-b09b-4b1f-957e-2c6aeaeb3378 devid 1 transid 64035 /dev/sde7
[    4.728599] BTRFS: device fsid 5675a9c3-7faf-4dc3-b39a-f0c8a62c3110 devid 1 transid 4068 /dev/sde6
[    4.783491] firewire_core 0000:01:04.0: created device fw0: GUID 0030bd051503fa03, S400
[    4.797151] BTRFS: device fsid 375af032-c3e6-48a7-8d41-fc8cbb2d0d93 devid 1 transid 8280 /dev/sde8
[    4.870681] BTRFS: device fsid ad4d8109-d62f-46ab-b06b-2f214b7ab074 devid 1 transid 1050393 /dev/sde2
[    4.888326] BTRFS: device fsid b3287a1f-2ae1-4840-8d92-6d0a601be91d devid 1 transid 395285 /dev/sde5
[    4.889676] BTRFS: device fsid c15ef5ed-b111-4859-9d7f-9f224b3b8071 devid 1 transid 3666 /dev/sde9
[    4.915208] BTRFS: device fsid bc774d82-73a2-4658-a01f-565c1e08475e devid 1 transid 324575 /dev/sde3
[    5.126696] [drm] ib test on ring 5 succeeded
[    5.127201] [drm] Radeon Display Connectors
[    5.127203] [drm] Connector 0:
[    5.127204] [drm]   DIN-1
[    5.127205] [drm]   Encoders:
[    5.127206] [drm]     TV1: INTERNAL_KLDSCP_DAC2
[    5.127207] [drm] Connector 1:
[    5.127208] [drm]   VGA-1
[    5.127210] [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[    5.127211] [drm]   Encoders:
[    5.127212] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    5.127213] [drm] Connector 2:
[    5.127213] [drm]   DVI-I-1
[    5.127214] [drm]   HPD1
[    5.127216] [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[    5.127217] [drm]   Encoders:
[    5.127218] [drm]     CRT2: INTERNAL_KLDSCP_DAC2
[    5.127219] [drm]     DFP1: INTERNAL_KLDSCP_LVTMA
[    5.178330] [drm] fb mappable at 0xD0355000
[    5.178332] [drm] vram apper at 0xD0000000
[    5.178333] [drm] size 5242880
[    5.178334] [drm] fb depth is 24
[    5.178335] [drm]    pitch is 5120
[    5.178436] fbcon: radeondrmfb (fb0) is primary device
[    5.191187] Console: switching to colour frame buffer device 160x64
[    5.195585] radeon 0000:80:00.0: fb0: radeondrmfb frame buffer device
[    5.195587] radeon 0000:80:00.0: registered panic notifier
[    5.206687] [drm] Initialized radeon 2.43.0 20080528 for 0000:80:00.0 on minor 0

[..snip..]

[   64.896681] radeon 0000:80:00.0: ring 0 stalled for more than 10380msec
[   64.896686] radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000381 last fence id 0x00000000000003b6 on ring 0)
[   64.907310] radeon 0000:80:00.0: Saved 1689 dwords of commands on ring 0.
[   64.907322] radeon 0000:80:00.0: GPU softreset: 0x00000009
[   64.907324] radeon 0000:80:00.0:   R_008010_GRBM_STATUS      = 0xE5700030
[   64.907327] radeon 0000:80:00.0:   R_008014_GRBM_STATUS2     = 0x00110103
[   64.907329] radeon 0000:80:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
[   64.907331] radeon 0000:80:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   64.907333] radeon 0000:80:00.0:   R_008678_CP_STALLED_STAT2 = 0x00008002
[   64.907336] radeon 0000:80:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008086
[   64.907338] radeon 0000:80:00.0:   R_008680_CP_STAT          = 0x80018645
[   64.907340] radeon 0000:80:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   64.958304] radeon 0000:80:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[   64.958358] radeon 0000:80:00.0: SRBM_SOFT_RESET=0x00000100
[   64.960463] radeon 0000:80:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[   64.960465] radeon 0000:80:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[   64.960468] radeon 0000:80:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
[   64.960470] radeon 0000:80:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   64.960472] radeon 0000:80:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   64.960474] radeon 0000:80:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   64.960476] radeon 0000:80:00.0:   R_008680_CP_STAT          = 0x80100000
[   64.960479] radeon 0000:80:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   64.960485] radeon 0000:80:00.0: GPU reset succeeded, trying to resume
[   64.976284] [drm] PCIE gen 2 link speeds already enabled
[   64.978076] [drm] PCIE GART of 512M enabled (table at 0x0000000000254000).
[   64.978101] radeon 0000:80:00.0: WB enabled
[   64.978105] radeon 0000:80:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800c9485c00
[   64.978783] radeon 0000:80:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0xffffc900010121d0
[   65.010310] [drm] ring test on 0 succeeded in 1 usecs
[   65.186359] [drm] ring test on 5 succeeded in 1 usecs
[   65.186365] [drm] UVD initialized successfully.
[   65.206723] [drm] ib test on ring 0 succeeded in 0 usecs
[   65.856693] [drm] ib test on ring 5 succeeded
[   76.496682] radeon 0000:80:00.0: ring 0 stalled for more than 10496msec
[   76.496687] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003f5 on ring 0)
[   76.996678] radeon 0000:80:00.0: ring 0 stalled for more than 10996msec
[   76.996681] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003f6 on ring 0)
[   77.496676] radeon 0000:80:00.0: ring 0 stalled for more than 11496msec
[   77.496679] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003f7 on ring 0)
[   77.996674] radeon 0000:80:00.0: ring 0 stalled for more than 11996msec
[   77.996677] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003f8 on ring 0)
[   78.496673] radeon 0000:80:00.0: ring 0 stalled for more than 12496msec
[   78.496676] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003f9 on ring 0)
[   78.996677] radeon 0000:80:00.0: ring 0 stalled for more than 12996msec
[   78.996680] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003fa on ring 0)
[   79.496674] radeon 0000:80:00.0: ring 0 stalled for more than 13496msec
[   79.496677] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003fb on ring 0)
[   79.996675] radeon 0000:80:00.0: ring 0 stalled for more than 13996msec
[   79.996678] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003fc on ring 0)
[   80.496691] radeon 0000:80:00.0: ring 0 stalled for more than 14496msec
[   80.496697] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003fd on ring 0)
[   80.996679] radeon 0000:80:00.0: ring 0 stalled for more than 14996msec
[   80.996682] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003fe on ring 0)
[   81.496673] radeon 0000:80:00.0: ring 0 stalled for more than 15496msec
[   81.496677] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x00000000000003ff on ring 0)
[   81.996685] radeon 0000:80:00.0: ring 0 stalled for more than 15996msec
[   81.996690] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x0000000000000400 on ring 0)
[   82.496692] radeon 0000:80:00.0: ring 0 stalled for more than 16496msec
[   82.496697] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x0000000000000401 on ring 0)
[   82.996682] radeon 0000:80:00.0: ring 0 stalled for more than 16996msec
[   82.996686] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x0000000000000402 on ring 0)
[   83.496678] radeon 0000:80:00.0: ring 0 stalled for more than 17496msec
[   83.496681] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x0000000000000403 on ring 0)
[   83.996673] radeon 0000:80:00.0: ring 0 stalled for more than 17996msec
[   83.996675] radeon 0000:80:00.0: GPU lockup (current fence id 0x00000000000003d9 last fence id 0x0000000000000404 on ring 0)
[   84.380875] radeon 0000:80:00.0: Saved 1433 dwords of commands on ring 0.
[   84.380886] radeon 0000:80:00.0: GPU softreset: 0x00000009
[   84.380889] radeon 0000:80:00.0:   R_008010_GRBM_STATUS      = 0xA1703030
[   84.380891] radeon 0000:80:00.0:   R_008014_GRBM_STATUS2     = 0x00000103
[   84.380893] radeon 0000:80:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
[   84.380896] radeon 0000:80:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   84.380898] radeon 0000:80:00.0:   R_008678_CP_STALLED_STAT2 = 0x00008002
[   84.380900] radeon 0000:80:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008086
[   84.380902] radeon 0000:80:00.0:   R_008680_CP_STAT          = 0x80018645
[   84.380905] radeon 0000:80:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   84.446444] radeon 0000:80:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[   84.446498] radeon 0000:80:00.0: SRBM_SOFT_RESET=0x00000100
[   84.448603] radeon 0000:80:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[   84.448605] radeon 0000:80:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[   84.448607] radeon 0000:80:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
[   84.448610] radeon 0000:80:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   84.448612] radeon 0000:80:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   84.448614] radeon 0000:80:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   84.448616] radeon 0000:80:00.0:   R_008680_CP_STAT          = 0x80100000
[   84.448618] radeon 0000:80:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   84.448625] radeon 0000:80:00.0: GPU reset succeeded, trying to resume
[   84.464433] [drm] PCIE gen 2 link speeds already enabled
[   84.466240] [drm] PCIE GART of 512M enabled (table at 0x0000000000254000).
[   84.466265] radeon 0000:80:00.0: WB enabled
[   84.466268] radeon 0000:80:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800c9485c00
[   84.466978] radeon 0000:80:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0xffffc900010121d0
[   84.498510] [drm] ring test on 0 succeeded in 1 usecs
[   84.674557] [drm] ring test on 5 succeeded in 1 usecs
[   84.674563] [drm] UVD initialized successfully.
[   84.696700] [drm] ib test on ring 0 succeeded in 0 usecs
[   85.346679] [drm] ib test on ring 5 succeeded


thank you very much

markus
Comment 1 markus gapp 2015-09-04 15:42:49 UTC
video card is:

lspci -v:

80:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV620 LE [Radeon HD 3450] (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. RV620 LE [Radeon HD 3450]
        Flags: bus master, fast devsel, latency 0, IRQ 36
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, non-prefetchable) [size=64K]
        I/O ports at 1000 [size=256]
        [virtual] Expansion ROM at f0020000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: radeon
        Kernel modules: radeon
Comment 2 Alex Deucher 2015-09-04 15:48:59 UTC
r600 does not use llvm for graphics so that doesn't matter.  What version of 10.x was working?  Can you bisect?
Comment 3 markus gapp 2015-09-05 20:44:09 UTC
git bisect found commit 29aaab2b5f55cc6d9a84f58ce2bb8607e76a9dde as the culprit. Reverting it makes 11.0.0-rc2 work on my hardware. For some reason my hardware does not like this change:
commit 29aaab2b5f55cc6d9a84f58ce2bb8607e76a9dde
Author: Grigori Goronzy <greg@chown.ath.cx>
Date:   Wed Jun 24 03:38:02 2015 +0200

    winsys/radeon: align BO size to page size
    
    This is the basic granularity for BO allocations. The alignment also
    helps with BO reuse by the cached bufmgr.
    
    This results in a huge 45% speedup in Metro 2033 Redux on my test
    system. The game relies on buffer orphaning with very small buffers
    (hundreds of bytes in size) and that did not work efficiently
    before. This change may also affect other applications and games.
    
    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
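
The alignment this commit message describes can be sketched as follows. This is a minimal illustration, not the code from the commit: `align_up`, `bo_alloc_size` and `ASSUMED_PAGE_SIZE` are hypothetical names, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration; real code would query it at runtime. */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical helper, not a Mesa function: round `size` up to the next
 * multiple of `alignment`, which must be a nonzero power of two. */
static uint64_t align_up(uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Size that would actually be allocated for a requested buffer size
 * once every allocation is padded to a whole number of pages. */
static uint64_t bo_alloc_size(uint64_t requested)
{
    return align_up(requested, ASSUMED_PAGE_SIZE);
}
```

Under these assumptions a 200-byte request and a 4096-byte request both end up as one-page allocations, which is what would let a buffer cache reuse one for the other. The side effect is that the allocated size no longer equals the requested size for small buffers.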


thank you!! markus
Comment 4 Marek Olšák 2015-09-06 14:41:29 UTC
Created attachment 118101
possible fix

Would you please try this patch?
Comment 5 markus gapp 2015-09-06 19:00:54 UTC
Thank you, Marek, your patch applied on 11.0.0-rc2 fixes my problem. Feel free to close this one as resolved.
Great work!!

markus
Comment 6 Michel Dänzer 2015-09-11 02:53:33 UTC
Module: Mesa
Branch: master
Commit: 5c6c5b524649997805d0128d4df9dda5e8567cbb
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=5c6c5b524649997805d0128d4df9dda5e8567cbb

Author: Marek Olšák <marek.olsak@amd.com>
Date:   Sun Sep  6 16:40:21 2015 +0200

r600g: use pipe_resource::width0 instead pb_buffer::size
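
The commit subject suggests the driver was reading the allocated buffer size where it needed the size the application requested. Once allocations are padded to a page, those two values can differ. The sketch below only illustrates that difference; it is not the actual patch, and the `sketch_*` types and functions are hypothetical stand-ins for `pb_buffer::size` and `pipe_resource::width0`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the winsys buffer: `size` is what was
 * really allocated, possibly padded to a page boundary. */
struct sketch_buffer {
    uint64_t size;
};

/* Hypothetical stand-in for the resource: `width0` is the size in
 * bytes that the application asked for. */
struct sketch_resource {
    uint32_t width0;
    struct sketch_buffer *buf;
};

/* Size taken from the allocation (may include padding). */
static uint64_t allocated_size(const struct sketch_resource *res)
{
    return res->buf->size;
}

/* Size taken from the resource description (no padding). */
static uint64_t requested_size(const struct sketch_resource *res)
{
    return res->width0;
}

/* Bytes past the end of the requested data that a consumer would
 * wrongly treat as valid if it used the allocated size. */
static uint64_t padding_bytes(const struct sketch_resource *res)
{
    assert(allocated_size(res) >= requested_size(res));
    return allocated_size(res) - requested_size(res);
}
```

For a 200-byte resource backed by a 4096-byte allocation, `padding_bytes` is 3896: any hardware state programmed from the allocated size would describe a range far larger than the data the application supplied.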

