Bug 85204

Summary: [Radeon HD 5650] return from sleep state failed
Product: DRI Reporter: Rafael Pereira <rnp>
Component: DRM/RadeonAssignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: florent.bondoux, johannes.hirte
Version: XOrg 6.7.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Kernel log
none
Kernel Config
none
lspci graphic card infos
none
Keep GART table mapped across suspend/resume
none
kernel log with patch
none
Split off separate radeon_gart_get_page_entry ASIC hook
none
Reinstate radeon_gart_restore for when the GART table is in VRAM none

Description Rafael Pereira 2014-10-19 19:21:48 UTC
Created attachment 108069 [details]
Kernel log

The system is put from sleep from GUI (X.org + KDE). After "waking up" the from sleep state the GUI is not accessible anymore and the screen keeps switching from conlose with kernel log output and a black screen.

Tested on Arch Linux, Kernel version 3.17.0, 3.17.1
Graphic card infos and kenrel log: see files attached.


Okt 19 19:18:56 fourier kernel: [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
Okt 19 19:18:56 fourier kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000000025D000).
Okt 19 19:18:56 fourier kernel: [drm] UVD initialized successfully.
Okt 19 19:18:56 fourier kernel: [drm] ring test on 0 succeeded in 1 usecs
Okt 19 19:18:56 fourier kernel: [drm] ring test on 3 succeeded in 1 usecs
Okt 19 19:18:56 fourier kernel: [drm] ring test on 5 succeeded in 1 usecs
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020182
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000140e last fence id 0x00000000000013f7 on ring 0)
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: GPU softreset: 0x00000008
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: Saved 1202 dwords of commands on ring 0.
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: WB enabled
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff88020cfc2c00
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff88020cfc2c0c
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc9000731c418
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: ffff8801fa15d000 pin failed
Okt 19 19:18:56 fourier kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10203msec
Okt 19 19:19:06 fourier kernel: [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Okt 19 19:19:06 fourier kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Okt 19 19:19:06 fourier kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000000025D000).
Okt 19 19:19:06 fourier kernel: [drm] ring test on 0 succeeded in 1 usecs
Okt 19 19:19:06 fourier kernel: [drm] ring test on 3 succeeded in 1 usecs
Comment 1 Rafael Pereira 2014-10-19 19:22:36 UTC
Created attachment 108070 [details]
Kernel Config

Kernel config - 3.17.1
Comment 2 Rafael Pereira 2014-10-19 19:23:35 UTC
Created attachment 108071 [details]
lspci graphic card infos

lspci output for video card
Comment 3 Alex Deucher 2014-10-20 16:01:40 UTC
What do you mean by sleep state?  DPMS (turing the monitors off) or suspend?  Is this a regression?  If so, when was it last working and can you bisect?
Comment 4 Rafael Pereira 2014-10-21 05:59:46 UTC
By sleep state I mean the S3 sleeping state (sleep, standby Suspend to RAM and DPMS), according the ACPI standard. 

Yes, it is a regression. The last working version is 3.16.6 (3.15.x, 3.14.x and so on, didn't have this problem). I noticed the bug was first introduced on version 3.17.0. I am working on the bisect.
Comment 5 Rafael Pereira 2014-10-22 22:36:57 UTC
After bisecting, I found the first occurence (of the bug) on the following commit:

git describe --contains 43c40df2c7fedce640a6c39fcdf58764f6bbac5c
v3.17-rc1~83
Comment 6 Michel Dänzer 2014-10-23 01:25:10 UTC
(In reply to Rafael Pereira from comment #5)
> After bisecting, I found the first occurence (of the bug) on the following
> commit:
> 
> git describe --contains 43c40df2c7fedce640a6c39fcdf58764f6bbac5c


That's a merge of a linux-leds tree, so I'm afraid something went wrong during the bisection. Most likely, the problem doesn't happen every time even with affected commits, so you need to test several times before declaring a commit good.
Comment 7 Rafael Pereira 2014-10-23 07:22:58 UTC
Terrebly sorry, I did the oposite. This is the bisect result:


git bisect log
# bad: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
# good: [acfaf47549491e5f804d30855c9055ebeb6ecc7b] Linux 3.16.6
git bisect start 'v3.17' 'v3.16.6'
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# good: [f2d7e4d4398092d14fb039cb4d38e502d3f019ee] checkpatch: add fix_insert_line and fix_delete_line helpers
git bisect good f2d7e4d4398092d14fb039cb4d38e502d3f019ee
# bad: [c309bfa9b481e7dbd3e1ab819271bf3009f44859] Merge tag 'for-linus-20140808' of git://git.infradead.org/linux-mtd
git bisect bad c309bfa9b481e7dbd3e1ab819271bf3009f44859

So the last GOOD commit was this one: 

git describe --contains f2d7e4d4398092d14fb039cb4d38e502d3f019ee
v3.17-rc1~94^2~11
Comment 8 Michel Dänzer 2014-10-23 08:40:31 UTC
I don't think you've actually finished the bisection yet. When it's finished, running bisect good/bad will say 'xyz is the first bad commit' followed by the log of commit xyz. When that happens, please provide that output.
Comment 9 Rafael Pereira 2014-10-27 19:10:47 UTC
@Michel Dänzer: thanks for helping.

I performed the bisection and that is result:


%git bisect log 
# bad: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
# good: [acfaf47549491e5f804d30855c9055ebeb6ecc7b] Linux 3.16.6
git bisect start 'v3.17' 'v3.16.6'
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# good: [f2d7e4d4398092d14fb039cb4d38e502d3f019ee] checkpatch: add fix_insert_line and fix_delete_line helpers
git bisect good f2d7e4d4398092d14fb039cb4d38e502d3f019ee
# bad: [c309bfa9b481e7dbd3e1ab819271bf3009f44859] Merge tag 'for-linus-20140808' of git://git.infradead.org/linux-mtd
git bisect bad c309bfa9b481e7dbd3e1ab819271bf3009f44859
# good: [43c40df2c7fedce640a6c39fcdf58764f6bbac5c] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
git bisect good 43c40df2c7fedce640a6c39fcdf58764f6bbac5c
# bad: [b3345d7c57d70e6cb6749af25cdbe80515582e99] Merge tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect bad b3345d7c57d70e6cb6749af25cdbe80515582e99
# bad: [03f62abd112d5150b6ce8957fa85d4f6e85e357f] drm/radeon: split PT setup in more functions
git bisect bad 03f62abd112d5150b6ce8957fa85d4f6e85e357f
# good: [2f295791312937d077cecec4cd8f453ce2ce93df] drm/i915: Reorder ctx unref on ppgtt cleanup
git bisect good 2f295791312937d077cecec4cd8f453ce2ce93df
# good: [60b61c2f9e13cb1cfa995984d3a3dca1fdb6490d] drm/exynos/ipp: remove type casting
git bisect good 60b61c2f9e13cb1cfa995984d3a3dca1fdb6490d
# bad: [920f946428b70494eb1c64e0de260da0d8bde040] Merge branch 'tda998x-devel' of git://ftp.arm.linux.org.uk/~rmk/linux-cubox into drm-next
git bisect bad 920f946428b70494eb1c64e0de260da0d8bde040
# good: [2ee39452fa2fff1e8edb954ccb7e0daee9646557] drm: close race in connector registration (v2)
git bisect good 2ee39452fa2fff1e8edb954ccb7e0daee9646557
# good: [b544021f18931258e4e1ef8d88d43dedb73a7ecf] drm/msm: fix double struct_mutex acquire
git bisect good b544021f18931258e4e1ef8d88d43dedb73a7ecf
# good: [944fc36c31ed685cf8d3d125eb681ae7198f06fc] drm/msm: use upstream iommu
git bisect good 944fc36c31ed685cf8d3d125eb681ae7198f06fc
# good: [a2fe6cdc03d7a9b0d048a7f32f9d8827e06c67fa] drm/msm/hdmi: fix HDMI_MUX_EN gpio request typo
git bisect good a2fe6cdc03d7a9b0d048a7f32f9d8827e06c67fa
# good: [c707c3619ca81f499a5ce032021405e989a96ff0] drm/i2c: tda998x: add component support
git bisect good c707c3619ca81f499a5ce032021405e989a96ff0
# good: [eceb55a0ecd584ac5bebf667ffd8e859f261d0c0] Merge branch 'msm-next' of git://people.freedesktop.org/~robclark/linux into drm-next
git bisect good eceb55a0ecd584ac5bebf667ffd8e859f261d0c0
# first bad commit: [920f946428b70494eb1c64e0de260da0d8bde040] Merge branch 'tda998x-devel' of git://ftp.arm.linux.org.uk/~rmk/linux-cubox into drm-next
Comment 10 Michel Dänzer 2014-10-28 01:21:27 UTC
(In reply to Rafael Pereira from comment #9)
> I performed the bisection and that is result:

[...]

> # first bad commit: [920f946428b70494eb1c64e0de260da0d8bde040] Merge branch
> 'tda998x-devel' of git://ftp.arm.linux.org.uk/~rmk/linux-cubox into drm-next

Still same answer as in comment #6, I'm afraid. That commit doesn't directly affect the radeon driver AFAICT, so it can't be the cause of your problem. Please try bisecting again, but make sure you test several (at least three, but the more the better) times before running git bisect good.
Comment 11 Florent Bondoux 2014-12-25 16:16:26 UTC
I have the same bug with another GPU (Mobility HD 5430, code name is CEDAR).

Log after suspend/resume :

[  139.610475] [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
[  139.610628] radeon 0000:01:00.0: WB enabled
[  139.610633] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800b88d7c00
[  139.610635] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8800b88d7c0c
[  139.611445] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc90005c9c418
[  139.628149] [drm] ring test on 0 succeeded in 0 usecs
[  139.628174] [drm] ring test on 3 succeeded in 3 usecs
[  139.805679] [drm] ring test on 5 succeeded in 1 usecs
[  139.805683] [drm] UVD initialized successfully.
[  139.805727] [drm] ib test on ring 0 succeeded in 0 usecs
[  140.457604] [drm] ib test on ring 5 succeeded
[  152.692727] radeon 0000:01:00.0: ring 0 stalled for more than 10463msec
[  152.692737] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000015b8 last fence id 0x00000000000016b8 on ring 0)
[  152.692786] radeon 0000:01:00.0: failed to get a new IB (-35)
[  152.692795] [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
[  152.750259] radeon 0000:01:00.0: Saved 8183 dwords of commands on ring 0.
[  152.750281] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  152.750284] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
[  152.750287] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  152.750290] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  152.750293] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  152.750296] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  152.750299] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  152.750302] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010100
[  152.750304] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020182
[  152.750307] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038243
[  152.750311] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  152.759991] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
[  152.760049] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  152.761216] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[  152.761219] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  152.761222] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  152.761225] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  152.761228] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  152.761231] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  152.761233] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  152.761236] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  152.761239] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  152.761242] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  152.761267] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  152.784402] [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
[  152.784520] radeon 0000:01:00.0: WB enabled
[  152.784523] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800b88d7c00
[  152.784526] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8800b88d7c0c
[  152.785283] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc90005c9c418
[  152.801936] [drm] ring test on 0 succeeded in 1 usecs
[  152.801945] [drm] ring test on 3 succeeded in 3 usecs
[  152.979461] [drm] ring test on 5 succeeded in 1 usecs
[  152.979468] [drm] UVD initialized successfully.
[  163.207936] radeon 0000:01:00.0: ring 0 stalled for more than 10234msec
[  163.207946] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000015b9 last fence id 0x00000000000016b9 on ring 0)
[  163.208075] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[  163.208088] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).


Bisecting gave commit [a3eb06dbca08e3fdad7039021ae03b46b215f22a] drm/radeon: Remove radeon_gart_restore()

commit a3eb06dbca08e3fdad7039021ae03b46b215f22a
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Wed Jul 9 20:15:42 2014 +0200

    drm/radeon: Remove radeon_gart_restore()

    Doesn't seem necessary, the GART table memory should be persistent.

    Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

That's the same commit as in bug #86267. The attached patches didn't help either.
Comment 12 Michel Dänzer 2014-12-26 10:01:09 UTC
Created attachment 111361 [details] [review]
Keep GART table mapped across suspend/resume

Does this patch help?

Note that it's just an incomplete proof of concept, but it should at least help narrow down the problem.
Comment 13 Florent Bondoux 2014-12-26 19:24:18 UTC
Created attachment 111376 [details]
kernel log with patch

Hello Michel,

The patch didn't work (tested on 3.18.1).
Both warnings triggered, see attached suspend/resume log.
Comment 14 Johannes Hirte 2015-01-05 18:41:28 UTC
Same problem with a Mobility Radeon HD 5470 Cedar here. But I first noticed with 3.18 kernel. 3.18-rc1 seems to be fine. I've never seen a lockup with this version. Bisect points at least to 3.18.0-rc1-00005-g031d705. I've stopped bisecting here, cause this doesn't make sense. All commits here are from v4l tree and don't look like affecting graphics. Maybe 3.18-rc1 is bad too and I only didn't hit the bug with this version.

the log shows me this with linux-3.18:

Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10020msec
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000450cf last fence id 0x00000000000450ea on ring 0)
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
Jan  5 18:16:13 acer kernel: [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10020msec
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000450cf last fence id 0x00000000000450ea on ring 0)
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
Jan  5 18:52:54 acer kernel: [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: Saved 855 dwords of commands on ring 0.
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: GPU softreset: 0x00000008
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: Saved 855 dwords of commands on ring 0.
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: GPU softreset: 0x00000008
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan  5 18:16:13 acer kernel: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jan  5 18:52:54 acer kernel: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jan  5 18:52:54 acer kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: WB enabled
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800de25bc00
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8800de25bc0c
Jan  5 18:16:13 acer kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: WB enabled
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800de25bc00
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8800de25bc0c
Jan  5 18:52:54 acer kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc9000189c418
Jan  5 18:16:13 acer kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc9000189c418
Jan  5 18:52:54 acer kernel: [drm] ring test on 0 succeeded in 1 usecs
Jan  5 18:52:54 acer kernel: [drm] ring test on 3 succeeded in 2 usecs
Jan  5 18:16:13 acer kernel: [drm] ring test on 0 succeeded in 1 usecs
Jan  5 18:16:13 acer kernel: [drm] ring test on 3 succeeded in 2 usecs
Jan  5 18:52:55 acer kernel: [drm] ring test on 5 succeeded in 1 usecs
Jan  5 18:52:55 acer kernel: [drm] UVD initialized successfully.
Jan  5 18:16:13 acer kernel: [drm] ring test on 5 succeeded in 1 usecs
Jan  5 18:16:13 acer kernel: [drm] UVD initialized successfully.
Jan  5 18:16:24 acer kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10222msec
Jan  5 18:16:24 acer kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000450d3 last fence id 0x00000000000450ea on ring 0)
Jan  5 18:53:05 acer kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10222msec
Jan  5 18:53:05 acer kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000450d3 last fence id 0x00000000000450ea on ring 0)
Jan  5 18:53:05 acer kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10723msec
Jan  5 18:53:05 acer kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000450d3 last fence id 0x00000000000450ea on ring 0)
Jan  5 18:16:24 acer kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10723msec
Jan  5 18:16:24 acer kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000450d3 last fence id 0x00000000000450ea on ring 0)
Jan  5 18:16:25 acer kernel: [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jan  5 18:16:25 acer kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jan  5 18:53:06 acer kernel: [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jan  5 18:53:06 acer kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: GPU softreset: 0x00000008
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: GPU softreset: 0x00000008
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan  5 18:53:07 acer kernel: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jan  5 18:16:26 acer kernel: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jan  5 18:53:07 acer kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: WB enabled
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800de25bc00
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8800de25bc0c
Jan  5 18:16:26 acer kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: WB enabled
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800de25bc00
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8800de25bc0c
Jan  5 18:53:07 acer kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc9000189c418
Jan  5 18:16:26 acer kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc9000189c418
Jan  5 18:16:26 acer kernel: [drm] ring test on 0 succeeded in 1 usecs
Jan  5 18:16:26 acer kernel: [drm] ring test on 3 succeeded in 2 usecs
Jan  5 18:53:07 acer kernel: [drm] ring test on 0 succeeded in 1 usecs
Jan  5 18:53:07 acer kernel: [drm] ring test on 3 succeeded in 2 usecs
Jan  5 18:16:26 acer kernel: [drm] ring test on 5 succeeded in 1 usecs
Jan  5 18:16:26 acer kernel: [drm] UVD initialized successfully.
Jan  5 18:53:07 acer kernel: [drm] ring test on 5 succeeded in 1 usecs
Jan  5 18:53:07 acer kernel: [drm] UVD initialized successfully.
Jan  5 18:53:08 acer kernel: [drm] ib test on ring 0 succeeded in 0 usecs
Jan  5 18:53:08 acer kernel: [drm] ib test on ring 3 succeeded in 0 usecs
Jan  5 18:16:27 acer kernel: [drm] ib test on ring 0 succeeded in 0 usecs
Jan  5 18:16:27 acer kernel: [drm] ib test on ring 3 succeeded in 0 usecs
Comment 15 Alex Deucher 2015-01-05 20:56:32 UTC
Please take the bisect to completion.
Comment 16 Johannes Hirte 2015-01-05 21:38:43 UTC
(In reply to Alex Deucher from comment #15)
> Please take the bisect to completion.

Will do so. This will take some time. From observation it needs more suspend/resume-cycles for every step closer to the bad commit. 3.18 fails in the first place every time, for 3.18.0-rc1-00005-g031d705 it took more than 10 trials to trigger the bug.
Comment 17 Johannes Hirte 2015-01-07 16:44:54 UTC
Ok, 3.18-rc1 is affected too. Searching for a good version to restart bisecting.
Comment 18 Michel Dänzer 2015-01-20 10:15:24 UTC
Created attachment 112529 [details] [review]
Split off separate radeon_gart_get_page_entry ASIC hook
Comment 19 Michel Dänzer 2015-01-20 10:16:02 UTC
Created attachment 112530 [details] [review]
Reinstate radeon_gart_restore for when the GART table is in VRAM

Do these two patches help?
Comment 20 Florent Bondoux 2015-01-20 19:33:59 UTC
Yes they do help!
The patches solve the bug for me on 3.19.0-rc5.

Thanks
Comment 21 Johannes Hirte 2015-01-20 21:30:51 UTC
The last two patches fix the problem for me too.
Comment 22 Rafael Pereira 2015-01-22 21:32:41 UTC
Tested here on v3.19-rc5 + patches, and (after 15 trials) no sign of the bug.
Comment 23 Michel Dänzer 2015-01-23 03:31:06 UTC
The fix is queued up in Alex Deucher's -fixes tree.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.