Bug 73420 - [HAWAII] atombios stuck executing errors
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assigned To: Default DRI bug account
Reported: 2014-01-09 01:46 UTC by Luzipher
Modified: 2014-01-30 23:21 UTC

Attachments
dmesg (68.96 KB, text/plain), 2014-01-09 01:46 UTC, Luzipher
Xorg.0.log (53.69 KB, text/plain), 2014-01-09 01:47 UTC, Luzipher
vbios.rom (64.00 KB, application/octet-stream), 2014-01-10 21:52 UTC, Luzipher
rom acquired via GPU-Z (128.00 KB, application/octet-stream), 2014-01-10 21:57 UTC, Luzipher
possible fix (1.22 KB, patch), 2014-01-27 23:34 UTC, Alex Deucher
possible fix (1.24 KB, patch), 2014-01-27 23:37 UTC, Alex Deucher
possible fix (2.18 KB, patch), 2014-01-29 05:21 UTC, Alex Deucher
possible fix (980 bytes, patch), 2014-01-29 21:39 UTC, Alex Deucher

Description Luzipher 2014-01-09 01:46:43 UTC
Created attachment 91727 [details]
dmesg

Observation:
============

System starts up fine (80x25 text mode), radeon module gets loaded and switches the console to high resolution. My system doesn't use a graphical splash screen or a graphical login manager and instead boots to a text-mode console.
When I try to start X (startx), the screen blanks out (it stays active and there is a non-blinking cursor visible in the upper left corner) and pressing Num or CapsLock doesn't trigger the keyboard light for a few seconds. Then the cursor vanishes and Num/CapsLock work again. When pressing Ctrl-Alt-F2 it takes another few seconds before the switch to the text-mode console happens - there everything works normally again.

dmesg shows a few of these at 5-second intervals (see attachment):
[  635.088465] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  635.088467] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
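
The second message carries the useful detail. As a rough sketch (not part of the original report), the Python helper below pulls the fields out of such a line. It assumes that the first hex number is the start offset of the AtomBIOS table inside the ROM and that the value after "@" is the position where execution was aborted; the function name and the returned dictionary keys are made up for illustration.

```python
import re

# Matches the "stuck executing" line exactly as it appears in the excerpt above.
STUCK_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:atom_execute_table_locked\] \*ERROR\* "
    r"atombios stuck executing (?P<table>[0-9A-Fa-f]+) "
    r"\(len (?P<len>\d+), WS (?P<ws>\d+), PS (?P<ps>\d+)\) @ 0x(?P<addr>[0-9A-Fa-f]+)"
)


def parse_stuck_line(line):
    """Return the fields of one 'atombios stuck executing' line, or None."""
    m = STUCK_RE.search(line)
    if m is None:
        return None
    table = int(m.group("table"), 16)  # assumed: table start offset (hex)
    addr = int(m.group("addr"), 16)    # assumed: position where execution stopped
    return {
        "time": float(m.group("time")),
        "table": table,
        "len": int(m.group("len")),
        "ws": int(m.group("ws")),
        "ps": int(m.group("ps")),
        "addr": addr,
        "offset_in_table": addr - table,
    }
```

Under those assumptions, the line above yields an offset of 0x69 (105), which lies inside the reported table length of 116 bytes.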


Driver Stack Details:
=====================

1)    Kernel: Vanilla 3.13.0-rc7
2)    libdrm: git (commit e8cbc579651ef55274763c67acb366dd4155e0ce radeon: fix sumo2 pci id)
3)    mesa: git (commit e8ff08edd823ddf6b0e07ef84d2ba8afc3abbc34 mesa: Namespace qualify fma to override ambiguity with fma from math.h)
4)    Xorg-server: 1.15.0
5)    xf86-video-ati: git (commit bcc454ea2fb239e13942270faec7801270615b9c radeon/exa: Always use a scratch surface for UTS to vram)
6)    glamor: 0.5.1-r1
7)    llvm: 3.5svn (updated on 09.01.2014)

Hardware Configuration:
=====================

Graphics Card: Hawaii XT, Sapphire Radeon R9 290X Tri-X OC Battlefield 4 Edition, 4GB GDDR5, 2x DVI, HDMI, DisplayPort, full retail (11226-00-50G)
Processor: Core i7-965 (LGA 1366)
Mainboard: Asus P6T Deluxe
RAM: 6GB
Comment 1 Luzipher 2014-01-09 01:47:37 UTC
Created attachment 91728 [details]
Xorg.0.log
Comment 2 Alex Deucher 2014-01-09 14:22:05 UTC
It looks like X is picking 640x480 since it's the only common mode across all the connected monitors.  Does it work properly if you start X with 1 monitor and then attach the others after X has started?
Comment 3 Luzipher 2014-01-09 18:19:33 UTC
I disconnected all but one monitor from the card, but the issue is still there. Note that I don't even get a 640x480 screen: the screen is just black, and for the first 5 seconds the non-blinking text-mode cursor sits in the upper left corner while the computer does not respond at all.
Comment 4 Alex Deucher 2014-01-10 21:33:37 UTC
Please attach a copy of your vbios.  To get a copy of your vbios:

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom
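
The same three steps can be scripted. This is only a sketch that mirrors the shell commands above in Python: the function name and its parameters are hypothetical, it has to run as root, and the device directory still has to be filled in from lspci. The trailing newlines are deliberate so that the writes look like what echo sends.

```python
from pathlib import Path


def dump_vbios(device_dir, out_path):
    """Copy the ROM exposed in a PCI device's sysfs directory to out_path.

    device_dir is the directory /sys/bus/pci/devices/<pci bus id>.
    Returns the number of bytes written.
    """
    rom = Path(device_dir) / "rom"
    rom.write_bytes(b"1\n")              # echo 1 > rom  (enable reading)
    try:
        data = rom.read_bytes()          # cat rom
        Path(out_path).write_bytes(data)
    finally:
        rom.write_bytes(b"0\n")          # echo 0 > rom  (disable again)
    return len(data)
```

A call would look like dump_vbios("/sys/bus/pci/devices/<pci bus id>", "/tmp/vbios.rom"), with the placeholder replaced by the real bus id.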
Comment 5 Luzipher 2014-01-10 21:52:28 UTC
Created attachment 91844 [details]
vbios.rom

The vbios.rom acquired as instructed via /sys
Comment 6 Luzipher 2014-01-10 21:57:41 UTC
Created attachment 91845 [details]
rom acquired via GPU-Z

Note that the ROM-file acquired by GPU-Z is twice the size of the ROM acquired via /sys.
GPU-Z also reports:
Device ID: 1002 - 67B0
Subvendor: Sapphire/PCPartner (174B)
BIOS Version: 015.042.000.000.000000 (113-C6710100-O05)
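
To see how the two dumps relate, a quick comparison can help. The sketch below is an illustration only, with hypothetical names. It relies on two facts about the legacy PCI option ROM header: an image starts with the bytes 0x55 0xAA, and byte 2 gives the image size in 512-byte units. It says nothing about why the sizes differ.

```python
ROM_SIGNATURE = b"\x55\xaa"  # first two bytes of a PCI option ROM image


def compare_rom_dumps(small, large):
    """Compare two ROM dumps given as bytes objects."""
    return {
        "small_size": len(small),
        "large_size": len(large),
        "small_has_signature": small[:2] == ROM_SIGNATURE,
        "large_has_signature": large[:2] == ROM_SIGNATURE,
        # True if the larger dump simply continues after the smaller one.
        "small_is_prefix_of_large": large[:len(small)] == small,
        # Image size announced by the header of the smaller dump, in bytes.
        "header_image_size": small[2] * 512 if len(small) > 2 else None,
    }
```

The two attachments would be read with open(path, "rb").read() and passed in as small and large.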
Comment 7 Luzipher 2014-01-21 23:12:04 UTC
I retested this on current drm-next (commit 8c9b2e322d5c9b4e77fd308984ea303de4b63e1c), which includes the drm-next-3.14 pull for radeon from agd5f. The problem persists.
Comment 8 Luzipher 2014-01-26 05:10:27 UTC
With the newest code from git (xf86-video-ati, mesa, llvm, glamor) I now get X up and running, in llvmpipe software mode. It still stalls the system for at least one occurrence of the 5s loop on X startup. dmesg currently shows 9 of the "atombios stuck for more than 5s" messages.


Snippet from Xorg.0.log (startx)
[  7148.888] (==) RADEON(0): DPMS enabled
[  7148.888] (==) RADEON(0): Silken mouse enabled
[  7148.888] (II) RADEON(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[  7163.924] (--) RandR disabled
[  7163.928] (II) AIGLX: Screen 0 is not DRI2 capable
[  7163.928] (EE) AIGLX: reverting to software rendering
[  7163.995] (II) AIGLX: Loaded and initialized swrast
[  7163.995] (II) GLX: Initialized DRISWRAST GL provider for screen 0
[  7179.061] (II) RADEON(0): Setting screen physical size to 169 x 127
[  7179.096] (II) config/udev: Adding input device Power Button (/dev/input/event1)
[  7179.096] (**) Power Button: Applying InputClass "evdev keyboard catchall"


Snippet from dmesg (same time):
[ 7140.354544] type=1006 audit(1390711184.643:8): pid=4169 uid=0 old auid=4294967295 new auid=1000 old ses=4294967295 new ses=7 res=1
[ 7147.830641] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7147.830643] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7152.840265] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7152.840267] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7157.849874] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7157.849876] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7162.932450] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7162.932453] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7167.950130] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7167.950132] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7172.965693] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7172.965696] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7178.257075] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7178.257078] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7307.075602] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7307.075604] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7329.660921] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7329.660924] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
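
For longer logs, the occurrences and the spacing between them can be tallied with a few lines of Python. This is a sketch with made-up names, run here on the first three pairs of messages from the snippet above; it only looks at the atom_op_jump lines, since each abort logs one of them.

```python
import re

# One match per aborted loop; the capture group is the dmesg timestamp.
JUMP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:atom_op_jump\] \*ERROR\* atombios stuck in loop"
)

SAMPLE = """\
[ 7147.830641] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7147.830643] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7152.840265] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7152.840267] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
[ 7157.849874] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7157.849876] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D368 (len 116, WS 0, PS 0) @ 0xD3D1
"""


def stuck_loop_gaps(dmesg_text):
    """Return (timestamps, gaps in seconds between consecutive aborts)."""
    times = [float(m.group(1)) for m in JUMP_RE.finditer(dmesg_text)]
    gaps = [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
    return times, gaps


if __name__ == "__main__":
    times, gaps = stuck_loop_gaps(SAMPLE)
    print(len(times), "aborts; gaps in seconds:", gaps)
```

Gaps of about 5 seconds mean the aborts follow each other back to back, while larger gaps (such as the last two entries in the snippet above) are separate episodes.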
Comment 9 Alex Deucher 2014-01-27 23:34:45 UTC
Created attachment 92893 [details] [review]
possible fix

Does the attached patch help?
Comment 10 Alex Deucher 2014-01-27 23:37:40 UTC
Created attachment 92894 [details] [review]
possible fix

corrected patch.
Comment 11 Luzipher 2014-01-28 23:19:42 UTC
Unfortunately the patch (corrected version) doesn't help. I applied the patch to the drm-next kernel I cloned on 2014-01-21 (see comment above), resulting version was 3.13.0-rc8-15136-g8c9b2e3-dirty. The dmesg messages are still the same, so I won't repost them.

And thanks for taking a look, Alex ! :-)
Comment 12 Alex Deucher 2014-01-29 05:21:27 UTC
Created attachment 92966 [details] [review]
possible fix

The atombios errors are harmless, but the attached patch will prevent them from happening.
Comment 13 Luzipher 2014-01-29 21:16:51 UTC
The last patch helps, no more messages and the computer doesn't hang completely for several seconds (I wouldn't exactly say that's harmless ;-) ). Thanks for the fix !

I detected a few different error messages when checking dmesg:
[drm:radeon_atom_get_leakage_vddc_based_on_leakage_params] *ERROR* Unknown table version 3, 1

Do those warrant a new bug report ?
Comment 14 Alex Deucher 2014-01-29 21:19:58 UTC
(In reply to comment #13)
> I detected a few different error messages when checking dmesg:
> [drm:radeon_atom_get_leakage_vddc_based_on_leakage_params] *ERROR* Unknown
> table version 3, 1
> 
> Do those warrant a new bug report ?

Yes, please open a new bug report for those.
Comment 15 Alex Deucher 2014-01-29 21:39:37 UTC
Created attachment 93015 [details] [review]
possible fix

(In reply to comment #13)
> I detected a few different error messages when checking dmesg:
> [drm:radeon_atom_get_leakage_vddc_based_on_leakage_params] *ERROR* Unknown
> table version 3, 1

This patch should fix this problem.
Comment 16 Luzipher 2014-01-30 00:19:15 UTC
Yes, that patch worked, no more messages.
I still get no accel though ("RADEON(0): GPU accel disabled or not working, using shadowfb for KMS"), but I guess I need to investigate that further first - unless you have a quick idea ?
Comment 17 Alex Deucher 2014-01-30 14:54:55 UTC
(In reply to comment #16)
> Yes, that patch worked, no more messages.
> I still get no accel though ("RADEON(0): GPU accel disabled or not working,
> using shadowfb for KMS"), but I guess I need to investigate that further
> first - unless you have a quick idea ?

Acceleration is currently disabled by default on hawaii as it's unstable for a lot of apps.  You can force it on with
Option "NoAccel" "false"
in the Device section of your xorg.conf, but you'll see GPU hangs.
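
For illustration, a Device section with that option might look like the fragment below. This is a sketch, not taken from the reporter's configuration: the Identifier string is a placeholder, and the Driver line assumes the radeon DDX (xf86-video-ati) listed in the driver stack above.

```
Section "Device"
    # Identifier is an arbitrary label (placeholder).
    Identifier "Hawaii"
    Driver     "radeon"
    # Force acceleration on despite the default for Hawaii.
    Option     "NoAccel" "false"
EndSection
```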
Comment 18 Luzipher 2014-01-30 15:57:06 UTC
Ah ok, I didn't know that. Thanks for the info - I'll try the option and see how it goes ;-)

Unrelated to the acceleration, I discovered another error message in my dmesg:
[drm:ci_dpm_set_power_state] *ERROR* ci_upload_dpm_level_enable_mask failed
Should I report that in a new bug ? (I should really stop hijacking this bug, sorry)
Comment 19 Alex Deucher 2014-01-30 16:17:15 UTC
(In reply to comment #18)
> Ah ok, I didn't know that. Thanks for the info - I'll try the option and see
> how it goes ;-)
> 
> Unrelated to the acceleration I discovered another error message in my dmesg:
> [drm:ci_dpm_set_power_state] *ERROR* ci_upload_dpm_level_enable_mask failed
> Should I report that in a new bug ? (I should really stop hijacking this
> bug, sorry)

Yes, open a new bug for the dpm stuff.  In the new bug please note whether this is new after applying attachment 93015 [details] [review] or if this was always there.
Comment 20 Luzipher 2014-01-30 23:21:19 UTC
New bug for the dpm issues is: https://bugs.freedesktop.org/show_bug.cgi?id=74250

The patch from attachment 93015 causes the message about ci_upload_dpm_level_enable_mask failing (the message does not appear without the patch).

I'm changing status to resolved fixed, as the patch for the original issue is already committed and in drm-next.