Bug 27744

Summary: atombios stuck in loop - during suspend
Product: DRI
Reporter: Parag <parag.warudkar>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: normal
Priority: medium
CC: aneaspam, jeffm, pauk.denis, pjsanon, willjcroz
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
Radeon 3650 BIOS | none
bump atom loop timeout | none
dmesg log | none
possible fix | none
dmesg ouput after boot, suspend, resuspsend, suspend and resuspend | none
dmesg after resuspend with 2.6.36-rc3 | none
dmesg after suspend/hibernate | none
dmesg after a fast resuspend, without stuck, ttyswitch is slow after this | none
resuspend after "fast resuspend without stuck" | none

Description Parag 2010-04-19 16:19:30 UTC
Created attachment 35172 [details]
Radeon 3650 BIOS

I regularly hit the errors below during resume from RAM:
[   80.484213] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 1sec aborting
[   80.484218] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing FA30 (len 493, WS 0, PS 4) @ 0xFA71
[   81.497547] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 1sec aborting
[   81.497552] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing FA30 (len 493, WS 0, PS 4) @ 0xFA71
[   81.548053] PM: resume of devices complete after 2129.312 msecs
[   81.746649] PM: Finishing wakeup.
[   81.746651] Restarting tasks ... done.

Bumping the 1sec timeout to 2sec cures it and does not seem to impact anything else.
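
As a rough check on how much time each abort costs, the gap between consecutive `atom_op_jump` errors can be read off the kernel timestamps. This is only a minimal sketch, assuming each log line keeps its `[ seconds ]` prefix on one line; the function name `stuck_gaps` is made up for illustration and is not part of any existing tool.

```shell
# stuck_gaps: read dmesg-style lines on stdin and print the time in seconds
# between consecutive "atom_op_jump ... stuck in loop" errors.
# (Hypothetical helper name; assumes the "[ seconds ]" timestamp prefix.)
stuck_gaps() {
    # Keep only the loop-abort errors.
    grep 'atom_op_jump.*stuck in loop' |
    # Reduce each line to its timestamp.
    sed 's/^\[ *\([0-9.]*\)\].*/\1/' |
    # Print the difference between each timestamp and the previous one.
    awk 'NR > 1 { printf "%.3f\n", $1 - prev } { prev = $1 }'
}

# Usage on a live system:
#   dmesg | stuck_gaps
```

For the two aborts quoted in this report (80.484213 and 81.497547) the gap is about 1.013 seconds, so the second abort fires almost exactly one timeout period after the first.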

lspci -v
01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3650
        Subsystem: Hewlett-Packard Company Device 30e7
        Flags: bus master, fast devsel, latency 0, IRQ 31
        Memory at c0000000 (32-bit, prefetchable) [size=256M]
        I/O ports at 7000 [size=256]
        Memory at d8300000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at d8320000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Capabilities: [100] Vendor Specific Information <?>
        Kernel driver in use: radeon
        Kernel modules: radeon
Comment 1 wessam 2010-05-04 18:19:42 UTC
I seem to be getting the same at startup (x86-64, Ubuntu 10.04, latest mainline kernel 2.6.34-020634rc6-generic, plus the xorg-edgers PPA):

[   14.911283] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 1sec aborting
[   14.911289] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing FA2E (len 455, WS 0, PS 4) @ 0xFA6F
[   15.951289] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 1sec aborting
[   15.951295] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing FA2E (len 455, WS 0, PS 4) @ 0xFA6F
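
The second message of each pair carries enough information to locate the stuck position inside the table. The sketch below assumes (the message format suggests it, but nothing in this thread confirms it) that the first hex value is the table's start offset and the value after `@` is the position being executed, both within the same ROM image. `stuck_offset` is a hypothetical helper name.

```shell
# stuck_offset: read "atombios stuck executing BASE (len N, ...) @ 0xPTR"
# lines on stdin and print how far into the table execution had got.
# Assumption: BASE and PTR are offsets into the same ROM image.
stuck_offset() {
    # Extract table base, table length and stuck position from each line.
    sed -n 's/.*stuck executing \([0-9A-Fa-f]*\) (len \([0-9]*\),.*@ 0x\([0-9A-Fa-f]*\).*/\1 \2 \3/p' |
    while read -r base len ptr; do
        # Hex subtraction via shell arithmetic.
        echo "table 0x$base: offset $((0x$ptr - 0x$base)) of $len bytes"
    done
}

# Usage on a live system:
#   dmesg | stuck_offset
```

For the lines in this comment it reports offset 65 of 455 bytes. The FA30 / 0xFA71 pair from the original report also gives offset 65, which would be consistent with both Mobility Radeon HD 3650 machines looping at the same point of the same table, though that is only an inference.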

lspci -v

01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3650
	Subsystem: Toshiba America Info Systems Device ff1e
	Flags: bus master, fast devsel, latency 0, IRQ 31
	Memory at c0000000 (32-bit, prefetchable) [size=256M]
	I/O ports at 5000 [size=256]
	Memory at d6400000 (32-bit, non-prefetchable) [size=64K]
	Expansion ROM at d6420000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: radeon
	Kernel modules: radeon
Comment 2 Alex Deucher 2010-06-30 11:53:54 UTC
*** Bug 27796 has been marked as a duplicate of this bug. ***
Comment 3 Alex Deucher 2010-06-30 11:54:03 UTC
*** Bug 28857 has been marked as a duplicate of this bug. ***
Comment 4 Alex Deucher 2010-06-30 11:54:10 UTC
*** Bug 28856 has been marked as a duplicate of this bug. ***
Comment 5 Alex Deucher 2010-06-30 12:03:17 UTC
Created attachment 36644 [details] [review]
bump atom loop timeout

Can you and all the duplicate bugs test this patch?
Comment 6 Jeff Mahoney 2010-06-30 12:55:05 UTC
Review of attachment 36644 [details] [review]:

On 2.6.34, it just changes the timeout messages to reflect the 5 second timeout.

On 2.6.35-rc3, the error messages are gone but X still doesn't work. Rather than two black screens in low power mode, one of them is in low power mode. The other is powered but is black and the LVDS display built into the notebook is displaying garbage. I saved dmesg and Xorg.0.log for this kernel.
Comment 7 Alex B 2010-06-30 16:44:15 UTC
Review of attachment 36644 [details] [review]:

In my case (Bug 28856) the patch just changed the error message from 1 to 5 secs, and maybe the time it took (I didn't have a stopwatch), but it changed neither the number of messages nor the error itself.
Comment 8 willjcroz 2010-07-22 11:32:54 UTC
I get the same errors as listed above during resume from suspend (RAM) on Ubuntu PPA kernel 2.6.35-rc5 (x86_64 Ubuntu 10.04 with ATI X1400 mobile chip in Thinkpad T60). On my system the screen displays the errors and hangs on the VT/framebuffer and requires a hard reset.

Additionally, they are always preceded by pciehp messages complaining that a device already exists (maybe related here?). This is what I see:

[   70.630060] pciehp 0000:00:1c.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot add
[   70.630063] pciehp 0000:00:1c.0:pcie04: Cannot add device at 0000:02:00
[   71.336223] usb 3-1: sierra_reset_resume
[   72.160012] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 1sec aborting
[   72.160016] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing EB96 (len 86, WS 4, PS 0) @ 0xEBC9

(there may be a typo or two, the errors above were written down by hand)

I have been having the pciehp errors (and subsequent hang) during resume since I installed 2.6.34 stable; the atombios errors appeared in the 2.6.35 RCs, I think.
Comment 9 Alex Deucher 2010-07-22 12:27:05 UTC
(In reply to comment #8)
> I have been having the pciehp errors (and subsequent hang) during resume since
> I installed 2.6.34 stable, the atombios errors appeared in the 2.6.35 RCs I
> think.

Can you bisect to when the atombios errors started happening?
Comment 10 willjcroz 2010-07-22 16:29:21 UTC
(In reply to comment #9)
> Can you bisect to when the atombios errors started happening?

Sure, it might take a couple of days to find the time though.
Comment 11 willjcroz 2010-07-23 03:29:41 UTC
As a rough indicator for now: 2.6.35 rc3 resumes (from ram suspend)  OK and 2.6.35 rc4 hangs on resume with the atombios 'stuck in loop' messages. This was lucky since 2.6.35 rc1 and rc2 will not boot at all on this Thinkpad T60. 

There was one message of concern in the dmesg from the (successful) resume of 2.6.35 rc3:

[drm:drm_mode_getfb] *ERROR* invalid framebuffer id

I will do a git bisect between rc3 and rc4 over the next day or two.
Comment 12 willjcroz 2010-07-30 10:23:57 UTC
(In reply to comment #11)
> As a rough indicator for now: 2.6.35 rc3 resumes (from ram suspend)  OK

Scrub that, rc3 does not resume (either some weird fluke or my mistake).

I'm afraid I didn't get very far bisecting. Basically, none of the 2.6.34 (release) and 2.6.35-rc kernels resume.

I'm assuming you are only interested in when the atombios errors appear. Sometime between 2.6.34 and 2.6.35-rc1 a bug was introduced preventing my kernel from booting until just before 2.6.35-rc3.

From my bisections all I can confirm is that the atombios errors started appearing in Linus' tree *sometime after* this commit:

df16dd53c575d0cb9dbee20a3149927c862a9ff6  hwmon: (ltc4245) Read only one GPIO pin

and *sometime* before this commit:

09bdf591f4724c7d0328d4d7b8808492addb5a28  drm/radeon/kms: fix dpms state on resume

Alex, I am pretty much a 'newb' when it comes to git and the kernel in general and currently git is making my brain hurt severely. When I have the time I will read up more on git and try and work out an easy way to exclude unrelated commits that break my kernel and include bug fix commits that allow me to bisect this issue successfully. Any pointers to articles/howtos on dealing with this specific kind of issue with git?
Comment 13 Peter Weber 2010-08-03 07:35:43 UTC
Hello!
I use a Radeon 5650 (Acer TimelineX 3820TG with the Intel graphics switched off!). Everything runs fine with kernel "2.6.34.1", including KMS, suspend to RAM, the TTY1-6 framebuffer and X11, as well as switching between them.

With 2.6.35 (and rc6) I have the same "Stuck-Message" as the other people here, but I am afraid this is just the tip of the iceberg!

* wakeup from suspend is slow, matching the "Stuck-Message"
* switching between TTY1-6 works normally as long as nothing uses the /dev/fb0 device (interesting), such as fbida or mplayer

FBIDA:
Loading an image or switching between images is normal, but switching away from the TTY that is running FBIDA is extremely slow. Switching back to the TTY is as fast as normal.
MPLAYER:
Same as with FBIDA, but here it becomes very difficult to get the system back "under control". Please test this only with short videos, because normally everything runs fine after mplayer has closed itself. I advise being patient and rebooting "blind", or using Magic SysRq, if you don't get control back.
* switching between TTY1-6 and TTY7 (where X11 is launched) is also extremely slow, similar to FBIDA


I think the underlying problem is much bigger. Someone should check the changes between 2.6.34 and 2.6.35 and, if possible, fix it fast: the window for kernel 2.6.36 closes in two weeks.
Comment 14 Peter Weber 2010-08-03 07:37:05 UTC
Created attachment 37542 [details]
dmesg log
Comment 15 Alex Deucher 2010-08-03 11:56:16 UTC
(In reply to comment #12)

> Alex, I am pretty much a 'newb' when it comes to git and the kernel in general
> and currently git is making my brain hurt severely. When I have the time I will
> read up more on git and try and work out an easy way to exclude unrelated
> commits that break my kernel and include bug fix commits that allow me to
> bisect this issue successfully. Any pointers to articles/howtos on dealing with
> this specific kind of issue with git?

If there is a commit that doesn't boot or build, you can skip it with:
git bisect skip
and continue the bisection.

See this page for more info:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
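
For a long bisection, skipping can also be automated: `git bisect run` treats exit code 125 from the test command as "skip this commit". Below is a self-contained sketch on a throwaway repository. The commit numbering and the `version` file are invented for the demonstration and merely stand in for a real build-and-boot check.

```shell
# Throwaway demo repo: 8 commits, where commit 3 "does not boot" and
# commit 6 introduces the bug. All names and numbers here are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.invalid
git config user.name "bisect demo"
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > version
    git add version
    git commit -q -m "commit $i"
done

# Mark HEAD (commit 8) as bad and the first commit as good.
git bisect start HEAD HEAD~7 >/dev/null

# Test command: exit 125 = skip (untestable), 0 = good, 1 = bad.
git bisect run sh -c '
    v=$(cat version)
    [ "$v" -eq 3 ] && exit 125
    [ "$v" -lt 6 ]
' >/dev/null

# refs/bisect/bad now points at the first bad commit.
git log -1 --format=%s refs/bisect/bad    # prints: commit 6
```

In a real kernel tree the test usually involves building, booting and checking dmesg by hand, so it often cannot be scripted like this; the manual `git bisect skip` described above does the same job interactively.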
Comment 16 Peter Weber 2010-08-08 06:51:22 UTC
Here we go!

git bisect start '--' 'drivers/gpu/drm/radeon'
# good: [e40152ee1e1c7a63f4777791863215e3faa37a86] Linus 2.6.34
git bisect good e40152ee1e1c7a63f4777791863215e3faa37a86
# bad: [e40152ee1e1c7a63f4777791863215e3faa37a86] Linus 2.6.34
git bisect bad e40152ee1e1c7a63f4777791863215e3faa37a86
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
# bad: [ce8a3eb20c4cb7d9e0c33e7560070688cd9066fc] drm/radeon/kms/pm: make pm spam debug only
git bisect bad ce8a3eb20c4cb7d9e0c33e7560070688cd9066fc
# bad: [f405a1ab2bf316b1969fc5355891e5dff4e1a54c] drivers/gpu/drm: Use kmemdup
git bisect bad f405a1ab2bf316b1969fc5355891e5dff4e1a54c
# good: [7fff400be6fbf64f10abca9939718aaf1d61c255] Merge branch 'drm-fbdev-cleanup' into drm-core-next
git bisect good 7fff400be6fbf64f10abca9939718aaf1d61c255
# bad: [fd632aa34c8592fb1d37fc83cbffa827bc7dd42c] drm: free core gem object from driver callbacks
git bisect bad fd632aa34c8592fb1d37fc83cbffa827bc7dd42c
# good: [32fcdbf4084544c3d8fa413004d57e5dc6f2eefe] drm/radeon/kms/evergreen: implement gfx init
git bisect good 32fcdbf4084544c3d8fa413004d57e5dc6f2eefe
# bad: [0ca2ab52d451c25764e53d3d289e1be357c977d7] drm/radeon/kms/evergreen: add hpd support
git bisect bad 0ca2ab52d451c25764e53d3d289e1be357c977d7
# bad: [45f9a39bedc3afab3fc85567792efc0103f34a55] drm/radeon/kms/evergreen: implement irq support
git bisect bad 45f9a39bedc3afab3fc85567792efc0103f34a55
# bad: [fe251e2fffa1ebc17c8e6e895b0374ae4e732fa5] drm/radeon/kms/evergreen: setup and enable the CP
git bisect bad fe251e2fffa1ebc17c8e6e895b0374ae4e732fa5


[peter@timeline linux-2.6]$ git bisect bad
fe251e2fffa1ebc17c8e6e895b0374ae4e732fa5 is the first bad commit
commit fe251e2fffa1ebc17c8e6e895b0374ae4e732fa5
Author: Alex Deucher <alexdeucher@gmail.com>
Date:   Wed Mar 24 13:36:43 2010 -0400

    drm/radeon/kms/evergreen: setup and enable the CP
    
    The command processor (CP) fetches command buffers and
    feeds the GPU.  This patch requires the evergreen
    family me and pfp ucode files.
    
    Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

:040000 040000 552f6ad9603d8daece1bdd9a7ac9a4058ea6d6ed 00e4e5aaea403d3c3130250a477e999b80d48ffb M	drivers

Bug 27744 doesn't exist before this commit (and neither does Bug 29384, although that one is caused by another commit, from May 7).
Comment 17 Peter Weber 2010-08-08 06:55:16 UTC
PM: early resume of devices complete after 0.826 msecs
ehci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ehci_hcd 0000:00:1a.0: setting latency timer to 64
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
HDA Intel 0000:00:1b.0: setting latency timer to 64
ehci_hcd 0000:00:1d.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
HDA Intel 0000:00:1b.0: irq 28 for MSI/MSI-X
ehci_hcd 0000:00:1d.0: setting latency timer to 64
pci 0000:00:1e.0: setting latency timer to 64
ahci 0000:00:1f.2: setting latency timer to 64
pci 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
radeon 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
radeon 0000:02:00.0: setting latency timer to 64
HDA Intel 0000:02:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
HDA Intel 0000:02:00.1: setting latency timer to 64
ath9k 0000:05:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
HDA Intel 0000:02:00.1: irq 29 for MSI/MSI-X
sd 0:0:0:0: [sda] Starting disk
[drm] Clocks initialized !
[drm] ring test succeeded in 1 usecs
radeon 0000:02:00.0: no free indirect buffer !
[drm:r600_ib_test] *ERROR* radeon: failed to get ib (-16).
[drm:evergreen_resume] *ERROR* radeon: failled testing IB (-16).
usb 2-1.5: reset high speed USB device using ehci_hcd and address 3
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
PM: resume of devices complete after 360.320 msecs
Restarting tasks ... done.
	[ pm_notifier_block : 171 ] event :4

atl1c 0000:03:00.0: irq 30 for MSI/MSI-X
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): wlan0: link is not ready
ath9k: Two wiphys trying to scan at the same time
ath9k: Two wiphys trying to scan at the same time
wlan0: deauthenticating from 00:1c:10:36:48:42 by local choice (reason=3)
wlan0: authenticate with 00:1c:10:36:48:42 (try 1)
wlan0: authenticated
wlan0: associate with 00:1c:10:36:48:42 (try 1)
wlan0: RX AssocResp from 00:1c:10:36:48:42 (capab=0x431 status=0 aid=1)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present
Comment 18 Alex Deucher 2010-08-09 09:39:30 UTC
Created attachment 37733 [details] [review]
possible fix

Does this patch help?
Comment 19 Peter Weber 2010-08-14 09:18:20 UTC
No other ideas?

Both bugs still exist with 2.6.35.2
Comment 20 Peter Weber 2010-08-14 11:48:29 UTC
The patch doesn't help.
Comment 21 Alex Deucher 2010-08-16 07:13:18 UTC
Does the patch attached to bug 29384 help?
Comment 22 Peter Weber 2010-08-16 08:44:33 UTC
Yes and no. I'm confused.
At first I thought the patch didn't help.

Bootup works normally.
After the first suspend/resume I got the "stuck atombios" message.
After the second suspend/resume I got no "stuck atombios" message.

See the dmesg output I will attach:
line 828: first suspend
line 940: resume with atombios stuck
line 954: second suspend, but no atombios stuck during resume?
Comment 23 Peter Weber 2010-08-16 09:05:22 UTC
Created attachment 37901 [details]
dmesg ouput after boot, suspend, resuspsend, suspend and resuspend
Comment 24 Peter Weber 2010-08-16 09:33:12 UTC
Hmmm.
I rebooted the laptop and suspended/resumed twice, but this time I always got the "stuck atombios" message. It's confusing.
Comment 25 Peter Weber 2010-08-30 07:21:23 UTC
I upgraded to 2.6.36-rc3; very confusing!
Instead of "stuck atombios" I now get a massive number of messages from pm-suspend. Also, after resuming it feels like the terminals are "refreshing the content from top to bottom" (text with less, or images with fbi). After clearing the display with Ctrl+L it works normally again.
Also, my Ethernet cards are all down after resume (okay, maybe not a radeon problem ^^) and NetworkManager refuses to work.

See attached file.
Comment 26 Peter Weber 2010-08-30 07:21:54 UTC
Created attachment 38300 [details] [review]
dmesg after resuspend with 2.6.36-rc3
Comment 27 Alex Deucher 2010-08-30 08:30:11 UTC
These look like acpi problems.
Comment 28 Tobias Kaminsky 2010-09-02 23:45:57 UTC
I have the same message in dmesg:
[ 1099.645933] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 1099.645936] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD14 (len 67, WS 0, PS 0) @ 0xCD43

It is a Sony VAIO VPCEB1Z1E.
Comment 29 Tobias Kaminsky 2010-09-02 23:46:32 UTC
Created attachment 38394 [details]
dmesg after suspend/hibernate
Comment 30 Peter Weber 2010-09-27 10:15:41 UTC
Yes. The ACPI problems are gone with 2.6.36-rc4.
The rest is still the same.
Comment 31 Peter Weber 2010-10-02 03:44:45 UTC
Yesterday I installed the new 2.6.36-rc6:
In most cases I got "stuck atombios" as always after S3 (suspend to RAM) and resume.
But once again the system resumed immediately, without "stuck atombios", and operations on TTY1-6 seemed to draw text or graphics slower than normal (normal ~ realtime). After a second suspend/resume cycle the wakeup needed 10 seconds and I got "stuck atombios" twice in the dmesg output; it feels like the system "caught up on the delay it had skipped during wakeup" the first time.

After this second suspend the reaction time while drawing text or images on the TTYs was normal again.

It doesn't seem to be reproducible, and I don't know why.
Comment 32 Pauk Denis 2010-10-14 04:59:38 UTC
Hi!

I have very similar messages on my laptop, but during normal work and often at poweroff (without suspend or anything similar).

Sometimes Xorg also freezes, in a scenario like this:
1 - the screen freezes while the music loops
2 - after several seconds the screen goes black
3 - the laptop no longer responds to any key press or click.

Additional info on https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/647665
Comment 33 Tobias Kaminsky 2010-10-21 03:28:29 UTC
Kernel: 2.6.36

$dmesg |grep atom
[    3.186893] ATOM BIOS: Sony
[ 5155.577112] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 5155.577115] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD14 (len 67, WS 0, PS 0) @ 0xCD43

Right after booting the scrolling is smooth, but after hibernate to ram it is not.
Sometimes it even hangs while scrolling in Firefox...

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Redwood [Radeon HD 5600 Series] [1002:68c1]

xorg-server: 1.7.7

Should I try a newer one?

Thank you
Tobias
Comment 34 Peter Weber 2010-10-21 06:04:22 UTC
You can try Xorg-Server 1.9, but I'm afraid it won't fix that.

1. It is only a problem of the Radeon-Cards
2. It happens also on the TTYs
Comment 35 Peter Weber 2010-10-24 10:23:53 UTC
Created attachment 39741 [details]
dmesg after a fast resuspend, without stuck, ttyswitch is slow after this
Comment 36 Peter Weber 2010-10-24 10:25:28 UTC
Created attachment 39742 [details]
resuspend after "fast resuspend without stuck"

As you can see, I got two "stuck" messages there and the resume takes 10 seconds; after this resume, switching between TTYs is normal (fast) again.
Comment 37 Tobias Kaminsky 2010-10-28 00:54:30 UTC
(In reply to comment #33)
> Kernel: 2.6.36
> 
> $dmesg |grep atom
> [    3.186893] ATOM BIOS: Sony
> [ 5155.577112] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than
> 5secs aborting
> [ 5155.577115] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing
> CD14 (len 67, WS 0, PS 0) @ 0xCD43
> 
> Right after booting the scrolling is smooth, but after hibernate to ram it is
> not.
> Sometimes it evens hangs while scrolling in Firefox...
> 
> 01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Redwood [Radeon
> HD 5600 Series] [1002:68c1]
> 
> xorg-server: 1.7.7
> 
> Should I try a newer one?
> 
> Thank you
> Tobias

Trying xorg-server 1.9.1 and xf86-video-ati from SVN lets me scroll smoothly in X.
But I get several Segmentation faults while using it.
Comment 38 Alex Deucher 2010-12-17 08:45:22 UTC

*** This bug has been marked as a duplicate of bug 32066 ***
