Bug 93895 - GPU lockup on AMD A4-3400 APU when starting X server on opensource drivers. (works fine with fglrx)
Summary: GPU lockup on AMD A4-3400 APU when starting X server on opensource drivers. (...
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
Reported: 2016-01-28 01:21 UTC by Azari
Modified: 2019-11-19 09:12 UTC

Attachments
dmesg log from the machine in question. (53.24 KB, text/plain)
2016-01-28 03:52 UTC, Azari
Xorg log file from the machine in question. (39.20 KB, text/plain)
2016-01-28 03:53 UTC, Azari
lspci output with pci IDs for everything, from the machine in question. (2.84 KB, text/plain)
2016-01-28 03:56 UTC, Azari
possible fix (1.32 KB, patch)
2016-01-28 04:45 UTC, Alex Deucher
startx with 'exec twm' in .xinitrc right after bootup; causes lockup. (39.64 KB, text/plain)
2016-02-02 01:15 UTC, Azari
second attempt after lockup; startx with 'exec twm' in .xinitrc suddenly works fine. (38.05 KB, text/plain)
2016-02-02 01:18 UTC, Azari

Description Azari 2016-01-28 01:21:28 UTC
I've had this lockup on this machine for the past few years, across several different kernel versions, different distributions, etc. Booting with KMS works fine, but the second a graphical environment starts (whether X or wayland-based), it locks up.

Booting Ubuntu with user-space mode-setting works, and then I can install FGLRX from there and everything works fine. When I spoke with airlied on IRC, they suggested that FGLRX might contain a workaround that never made it into the open-source drivers, and that AMD might have to look into it.

CPU/GPU     : A4-3400
Motherboard : GA-A75M-D2H  ( http://www.gigabyte.com/products/product-page.aspx?pid=3930#ov )

journalctl log of the lockup:

------------------------------------------------------------

    Jan 27 18:28:32 miku dbus-daemon[374]: Successfully activated service 'org.freedesktop.systemd1'
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: ring 0 stalled for more than 10000msec
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000003 on ring 0)
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: Saved 55 dwords of commands on ring 0.
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: GPU softreset: 0x00000009
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   GRBM_STATUS               = 0xB1403828
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   GRBM_STATUS_SE0           = 0x28000007
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   GRBM_STATUS_SE1           = 0x00000007
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   SRBM_STATUS               = 0x20000840
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   SRBM_STATUS2              = 0x00000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_008678_CP_STALLED_STAT2 = 0x40000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_00867C_CP_BUSY_STAT     = 0x00008000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_008680_CP_STAT          = 0x80228643
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: GRBM_SOFT_RESET=0x00007F6B
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: SRBM_SOFT_RESET=0x00000100
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   GRBM_STATUS               = 0x00003828
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   GRBM_STATUS_SE0           = 0x00000007
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   GRBM_STATUS_SE1           = 0x00000007
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   SRBM_STATUS               = 0x20000040
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   SRBM_STATUS2              = 0x00000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_008680_CP_STAT          = 0x00000000
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: GPU reset succeeded, trying to resume
    Jan 27 18:28:42 miku kernel: [drm] Found smc ucode version: 0x00011100
    Jan 27 18:28:42 miku kernel: [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: WB enabled
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8800c613fc00
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8800c613fc0c
    Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90002432118
    Jan 27 18:28:42 miku kernel: [drm] ring test on 0 succeeded in 1 usecs
    Jan 27 18:28:42 miku kernel: [drm] ring test on 3 succeeded in 3 usecs
    Jan 27 18:28:42 miku kernel: [drm] ring test on 5 succeeded in 1 usecs
    Jan 27 18:28:42 miku kernel: [drm] UVD initialized successfully.
    Jan 27 18:28:52 miku kernel: radeon 0000:00:01.0: ring 0 stalled for more than 10370msec
    Jan 27 18:28:52 miku kernel: radeon 0000:00:01.0: GPU lockup (current fence id 0x0000000000000002 last fence id 0x0000000000000004 on ring 0)
    Jan 27 18:28:52 miku kernel: [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
    Jan 27 18:28:52 miku kernel: [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
    Jan 27 18:29:22 miku systemd[1]: Started Getty on tty2.

------------------------------------------------------------
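
For anyone comparing logs across kernels, the lockup lines above can be pulled out mechanically. The following is only a rough Python sketch, not anything the driver or the reporter provides: the function name `find_lockups` and the `pending` field are made up for illustration, and reading "last fence id minus current fence id" as the number of fences still outstanding is an assumption about what the message means.

```python
import re

# Matches the radeon "GPU lockup" message in the form seen in the journal excerpt.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def find_lockups(log_text):
    """Return one dict per lockup line found in log_text (hypothetical helper)."""
    events = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match is None:
            continue
        current = int(match.group(1), 16)
        last = int(match.group(2), 16)
        events.append({
            "ring": int(match.group(3)),
            "current_fence": current,
            "last_fence": last,
            # Assumption: the gap is the number of fences not yet signalled.
            "pending": last - current,
        })
    return events


# The two lockup lines copied from the journal excerpt in this report.
SAMPLE = """\
Jan 27 18:28:42 miku kernel: radeon 0000:00:01.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000003 on ring 0)
Jan 27 18:28:52 miku kernel: radeon 0000:00:01.0: GPU lockup (current fence id 0x0000000000000002 last fence id 0x0000000000000004 on ring 0)
"""

if __name__ == "__main__":
    for event in find_lockups(SAMPLE):
        print(event)
```

Feeding it the output of `journalctl -k` saved to a file would list every lockup with its ring number, which may make it easier to see whether the stall is always on ring 0.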
Comment 1 Azari 2016-01-28 03:52:33 UTC
Created attachment 121337 [details]
dmesg log from the machine in question.
Comment 2 Azari 2016-01-28 03:53:35 UTC
Created attachment 121338 [details]
Xorg log file from the machine in question.
Comment 3 Azari 2016-01-28 03:56:34 UTC
Created attachment 121339 [details]
lspci output with pci IDs for everything, from the machine in question.

I forgot to add the PCI ID for the GPU in the initial report; it's 1002:9644.

I have also added an attachment with lspci output for all the other devices, as well as attachments of the dmesg output and the Xorg log.
Comment 4 Alex Deucher 2016-01-28 04:45:22 UTC
Created attachment 121340 [details] [review]
possible fix

Does this kernel patch help?
Comment 5 Azari 2016-01-29 03:22:03 UTC
(In reply to Alex Deucher from comment #4)
> Created attachment 121340 [details] [review]
> possible fix
> 
> Does this kernel patch help?

I just finished compiling and testing a kernel with that patch; it didn't help, it still has the same issues. =(

Thanks for the prompt reply by the way.
Comment 6 Azari 2016-01-31 22:53:21 UTC
I have something new to report after doing more testing.

It seems that launching weston with the pixman backend (weston --use-pixman) works. Meanwhile, startx with 'exec twm' in .xinitrc doesn't work on the first attempt; it causes the lockup.

However, after the lockup, if I try startx again (still with twm), it suddenly works and I can start applications in twm and use the desktop. I managed to reproduce this with Xfce as well: the first 'startxfce4' after bootup fails and locks up the GPU, and after the GPU resets, I try again and Xfce works.

One thing of note is that when I finally do manage to get a DE started (after the GPU has locked up and reset once), glxinfo shows only "gallium on llvmpipe"; no hardware acceleration available.

So whatever is causing the lockup is something that X does at startup (even when a minimal X window manager like twm is used), but weston-pixman doesn't do.
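
The glxinfo observation above can be turned into a quick check. This is a minimal Python sketch under stated assumptions: the helper name `is_software_renderer` is invented here, the list of software renderer names is a guess at the common Mesa ones, and the sample line is a hypothetical reconstruction built from the "gallium on llvmpipe" wording in this comment, not real captured output.

```python
# Assumed names of Mesa software rasterizers; not an exhaustive list.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")


def is_software_renderer(glxinfo_output):
    """Return True/False based on the renderer string, or None if it is absent."""
    for line in glxinfo_output.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("opengl renderer string:"):
            renderer = stripped.split(":", 1)[1].strip().lower()
            return any(name in renderer for name in SOFTWARE_RENDERERS)
    return None


# Hypothetical sample line, modelled on what this comment describes.
SAMPLE = "OpenGL renderer string: gallium on llvmpipe"

if __name__ == "__main__":
    print(is_software_renderer(SAMPLE))
```

Running it on saved glxinfo output after the first and second startx attempts would confirm whether the session that works is the one that fell back to software rendering.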
Comment 7 Alex Deucher 2016-01-31 23:17:24 UTC
Check your xorg log and make sure acceleration is enabled.
Comment 8 Azari 2016-02-02 01:15:26 UTC
Created attachment 121447 [details]
startx with 'exec twm' in .xinitrc right after bootup; causes lockup.
Comment 9 Azari 2016-02-02 01:18:20 UTC
Created attachment 121448 [details]
second attempt after lockup; startx with 'exec twm' in .xinitrc suddenly works fine.
Comment 10 Azari 2016-02-02 01:24:35 UTC
(In reply to Alex Deucher from comment #7)
> Check your xorg log and make sure acceleration is enabled.

It seems that acceleration disables itself after the first startx attempt locks up the GPU:

[   326.027] (--) RADEON(0): Chipset: "SUMO2" (ChipID = 0x9644)
[   326.027] (II) RADEON(0): GPU accel disabled or not working, using shadowfb for KMS
[   326.027] (II) Loading sub module "shadow"
[   326.027] (II) LoadModule: "shadow"
[   326.027] (II) Loading /usr/lib/xorg/modules/libshadow.so
[   326.047] (II) Module shadow: vendor="X.Org Foundation"
[   326.047] 	compiled for 1.18.0, module version = 1.1.0
[   326.047] 	ABI class: X.Org ANSI C Emulation, version 0.4
[   326.047] (II) RADEON(0): KMS Color Tiling: disabled
[   326.047] (II) RADEON(0): KMS Color Tiling 2D: disabled
[...]
[   326.247] (WW) RADEON(0): Direct rendering disabled
[   326.247] (II) RADEON(0): Acceleration disabled
[...]
[   326.252] (II) AIGLX: Screen 0 is not DRI2 capable
[   326.252] (EE) AIGLX: reverting to software rendering
[   326.329] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[   326.331] (II) AIGLX: Loaded and initialized swrast
[   326.331] (II) GLX: Initialized DRISWRAST GL provider for screen 0


The full log is in this attachment: https://bugs.freedesktop.org/attachment.cgi?id=121448

I also uploaded another xorg log from the first attempt that causes the lockup, in case you want to see the difference between the two or something.
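
To compare the two Xorg logs without reading them by eye, something like the following could list the lines that indicate the fallback. It is only a sketch: the marker strings are copied from the excerpt above and may differ between driver versions, and the function name `accel_disabled_lines` is hypothetical.

```python
# Marker strings taken from the Xorg log excerpt in this comment.
MARKERS = (
    "GPU accel disabled or not working",
    "Direct rendering disabled",
    "Acceleration disabled",
    "AIGLX: reverting to software rendering",
)


def accel_disabled_lines(xorg_log_text):
    """Return the log lines that contain any of the fallback markers."""
    hits = []
    for line in xorg_log_text.splitlines():
        if any(marker in line for marker in MARKERS):
            hits.append(line.strip())
    return hits


# A few lines copied from the excerpt above.
SAMPLE = """\
[   326.027] (--) RADEON(0): Chipset: "SUMO2" (ChipID = 0x9644)
[   326.027] (II) RADEON(0): GPU accel disabled or not working, using shadowfb for KMS
[   326.247] (WW) RADEON(0): Direct rendering disabled
[   326.247] (II) RADEON(0): Acceleration disabled
[   326.252] (EE) AIGLX: reverting to software rendering
[   326.331] (II) GLX: Initialized DRISWRAST GL provider for screen 0
"""

if __name__ == "__main__":
    for hit in accel_disabled_lines(SAMPLE):
        print(hit)
```

An empty result for the first-attempt log and a non-empty one for the second would match the behaviour described here: acceleration is attempted on the first start and given up after the reset.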
Comment 11 Martin Peres 2019-11-19 09:12:58 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/694.

