Bug 72710 - rv635: resume from hibernation the second time fails
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2013-12-14 16:11 UTC by Harald Judt
Modified: 2019-11-19 08:41 UTC

Attachments
vbios.rom (62.50 KB, text/plain)
2013-12-14 16:11 UTC, Harald Judt

Description Harald Judt 2013-12-14 16:11:51 UTC
Created attachment 90773
vbios.rom

With kernel-3.12.0, the system fails to resume from suspend/hibernate.
  - It always fails with radeon.dpm=1.
  - It fails most times with radeon.dpm=0. Usually it works on the first
    try but fails on the second or third, but sometimes even the first try
    is unsuccessful.
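
When comparing runs with radeon.dpm=0 and radeon.dpm=1, it can help to confirm which value the running kernel was actually booted with. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, and treating the last occurrence of a parameter as the effective one is an assumption.

```python
from pathlib import Path


def kernel_param(cmdline: str, name: str):
    """Look up a parameter in a kernel command line string.

    Returns the value of the last `name=value` token, "" for a bare
    `name` flag, or None if the parameter does not appear at all.
    Quoted values containing spaces are not handled in this sketch.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # Assumption: a later occurrence overrides an earlier one.
            value = val if sep else ""
    return value


def booted_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text().strip()
```

For example, `kernel_param(booted_cmdline(), "radeon.dpm")` would show whether a given failure happened with dpm on, off, or left at the driver default.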

The screen goes black before it starts reading the pageset2 data, and the Num Lock/Scroll Lock LEDs start blinking, indicating a kernel panic. A hard reset is required. There are no unusual messages in dmesg (booted with no_console_suspend=1).

As for the kernel version, I think the problems started around 3.7.0 or maybe one of its release candidates. Since there have been so many changes and quite a few problems with those versions, I'm not sure how to bisect this. The last reliable version that I used was 3.6.2. I have also pulled the drm-next-3.13 patches into 3.12, but it didn't help.

Apart from the failure to resume, the driver works fine (even with dpm enabled); I have had no lockups or crashes.

dmesg:

[drm] radeon kernel modesetting enabled.
[drm] initializing kernel modesetting (RV635 0x1002:0x9598 0x1462:0x1260).
[drm] register mmio base: 0xF0100000
[drm] register mmio size: 65536
ATOM BIOS: 113
radeon 0000:01:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
radeon 0000:01:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
[drm] Detected VRAM RAM=512M, BAR=256M
[drm] RAM width 128bits DDR
[TTM] Zone  kernel: Available graphics memory: 1952880 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
[drm] radeon: 512M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[drm] Loading RV635 Microcode
[drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880112f67c00
radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880112f67c0c
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
radeon 0000:01:00.0: irq 41 for MSI/MSI-X
radeon 0000:01:00.0: radeon: using MSI.
[drm] radeon: irq initialized.
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 3 succeeded in 1 usecs
[drm] Enabling audio 0 support
[drm] ib test on ring 0 succeeded in 0 usecs
[drm] ib test on ring 3 succeeded in 0 usecs
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   DVI-I-1
[drm]   HPD1
[drm]   DDC: 0x7e60 0x7e60 0x7e64 0x7e64 0x7e68 0x7e68 0x7e6c 0x7e6c
[drm]   Encoders:
[drm]     DFP1: INTERNAL_UNIPHY
[drm]     CRT2: INTERNAL_KLDSCP_DAC2
[drm] Connector 1:
[drm]   DIN-1
[drm]   Encoders:
[drm]     TV1: INTERNAL_KLDSCP_DAC2
[drm] Connector 2:
[drm]   DVI-I-2
[drm]   HPD2
[drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[drm]   Encoders:
[drm]     CRT1: INTERNAL_KLDSCP_DAC1
[drm]     DFP2: INTERNAL_KLDSCP_LVTMA
== power state 0 ==
 ui class: none
 internal class: boot 
 caps: video 
 uvd    vclk: 0 dclk: 0
         power level 0    sclk: 72500 mclk: 40000 vddc: 1250
         power level 1    sclk: 72500 mclk: 40000 vddc: 1250
         power level 2    sclk: 72500 mclk: 40000 vddc: 1250
 status: c r b 
== power state 1 ==
 ui class: performance
 internal class: none
 caps: single_disp video 
 uvd    vclk: 0 dclk: 0
         power level 0    sclk: 11000 mclk: 25200 vddc: 900
         power level 1    sclk: 30000 mclk: 35000 vddc: 1000
         power level 2    sclk: 72500 mclk: 40000 vddc: 1250
 status: 
== power state 2 ==
 ui class: none
 internal class: uvd 
 caps: video 
 uvd    vclk: 40000 dclk: 30000
         power level 0    sclk: 60000 mclk: 40000 vddc: 1150
         power level 1    sclk: 60000 mclk: 40000 vddc: 1150
         power level 2    sclk: 60000 mclk: 40000 vddc: 1150
 status: 
== power state 3 ==
 ui class: performance
 internal class: none
 caps: video 
 uvd    vclk: 0 dclk: 0
         power level 0    sclk: 30000 mclk: 40000 vddc: 1250
         power level 1    sclk: 30000 mclk: 40000 vddc: 1250
         power level 2    sclk: 72500 mclk: 40000 vddc: 1250
 status: 
switching from power state:
 ui class: none
 internal class: boot 
 caps: video 
 uvd    vclk: 0 dclk: 0
         power level 0    sclk: 72500 mclk: 40000 vddc: 1250
         power level 1    sclk: 72500 mclk: 40000 vddc: 1250
         power level 2    sclk: 72500 mclk: 40000 vddc: 1250
 status: c b 
switching to power state:
 ui class: performance
 internal class: none
 caps: single_disp video 
 uvd    vclk: 0 dclk: 0
         power level 0    sclk: 11000 mclk: 25200 vddc: 900
         power level 1    sclk: 30000 mclk: 35000 vddc: 1000
         power level 2    sclk: 72500 mclk: 40000 vddc: 1250
 status: r 
[drm] radeon: dpm initialized
[drm] fb mappable at 0xE0141000
[drm] vram apper at 0xE0000000
[drm] size 7299072
[drm] fb depth is 24
[drm]    pitch is 6912
fbcon: radeondrmfb (fb0) is primary device
Console: switching to colour frame buffer device 210x65
radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
radeon 0000:01:00.0: registered panic notifier
[drm] Initialized radeon 2.35.0 20080528 for 0000:01:00.0 on minor 0
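
The power state dump in the log above can be turned into structured data for comparison between boots. This is a rough sketch only: the function name is hypothetical, and the unit conversions are assumptions (clocks taken to be in units of 10 kHz, so 72500 would be 725 MHz, and vddc taken to be in millivolts).

```python
import re

# Matches lines such as:
#   power level 0    sclk: 11000 mclk: 25200 vddc: 900
LEVEL_RE = re.compile(
    r"power level (\d+)\s+sclk: (\d+) mclk: (\d+) vddc: (\d+)"
)


def parse_power_levels(log_text: str):
    """Extract every 'power level' line from a radeon dpm log dump."""
    levels = []
    for match in LEVEL_RE.finditer(log_text):
        index, sclk, mclk, vddc = (int(group) for group in match.groups())
        levels.append({
            "level": index,
            "sclk_mhz": sclk / 100,  # assumption: logged in 10 kHz units
            "mclk_mhz": mclk / 100,  # assumption: logged in 10 kHz units
            "vddc_mv": vddc,         # assumption: logged in millivolts
        })
    return levels
```

The result is a flat list in log order; grouping the levels under their "== power state N ==" headers is left out to keep the sketch short.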


lspci -vvv:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV635 [Radeon HD 3650/3750/4570/4580] (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. Device 1260
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 41
        Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at f0100000 (64-bit, non-prefetchable) [size=64K]
        Region 4: I/O ports at 2100 [size=256]
        [virtual] Expansion ROM at f0120000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0f00c  Data: 4181
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: radeon

cat /sys/class/drm/card0/device/power_dpm_force_performance_level 
auto

cat /sys/class/drm/card0/device/power_dpm_state
balanced


I have attached the video BIOS ROM; maybe it is of use?
Comment 1 Harald Judt 2013-12-14 16:27:36 UTC
Ok, more tries: I've forced the power state to high. Now the machine survived two hibernate/resume cycles in a row but failed on the third. Not sure how much this piece of information is worth.
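
For reference, forcing the level as described in the comment above can be done by writing to the same sysfs file shown in the description. The sketch below is an assumption about how one might script it, not the reporter's actual method: the function name is hypothetical, the accepted values (auto, low, high) are the ones the radeon dpm interface is understood to take, and writing to the real sysfs node requires root.

```python
from pathlib import Path

# Values the radeon dpm sysfs interface is assumed to accept.
VALID_LEVELS = ("auto", "low", "high")


def force_performance_level(
    level: str,
    device_dir: str = "/sys/class/drm/card0/device",
) -> str:
    """Write `level` to power_dpm_force_performance_level and read it back."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    node = Path(device_dir) / "power_dpm_force_performance_level"
    node.write_text(level + "\n")
    # Reading back confirms what the driver (or file) now reports.
    return node.read_text().strip()
```

Calling `force_performance_level("high")` as root corresponds to the forced-high test; `force_performance_level("auto")` restores the setting shown in the description.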
Comment 2 Alex Deucher 2013-12-16 23:55:50 UTC
Bisecting would be the best bet.
Comment 3 Harald Judt 2014-12-30 10:28:01 UTC
Status update: After updating to 3.18.1 vanilla and booting with radeon.dpm=0, suspend/resume now works reliably.

Hibernating/resuming still fails, but only and always on the second cycle; the first cycle seems to work fine now:
1) Boot with radeon.dpm=0 (radeon.dpm=1 seems to have its own stability troubles)
2) hibernate
3) resume
4) hibernate
5) resume => kernel panic after loading pages (counting from 0% to 100%).
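
The steps above could be scripted so that a marker survives the hard reset and records which cycle failed. This is a minimal sketch under stated assumptions: the function names and log format are hypothetical, hibernation is triggered by writing "disk" to /sys/power/state (which needs root, and the write only returns once the machine has resumed), and powering the machine back on between cycles is still manual.

```python
import os
from pathlib import Path


def log_marker(log_path: str, message: str) -> None:
    """Append one line and force it to disk before continuing."""
    with open(log_path, "a") as log:
        log.write(message + "\n")
        log.flush()
        os.fsync(log.fileno())


def hibernate_once(
    cycle: int,
    log_path: str,
    state_path: str = "/sys/power/state",
) -> None:
    """Record a marker, hibernate, and record a second marker on resume."""
    log_marker(log_path, f"cycle {cycle}: hibernating")
    # On a real system this write blocks until the resume has completed.
    Path(state_path).write_text("disk")
    log_marker(log_path, f"cycle {cycle}: resumed")
```

After a failed resume and hard reset, a log ending in "cycle 2: hibernating" with no matching "resumed" line would confirm that the second cycle is the one that panicked.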

It seems the kernel panics when trying to switch back to the X screen.

Unfortunately, for some reason I am no longer able to boot a 3.6/3.7 kernel (maybe because of a udev problem), so I cannot bisect. I am also not sure that any 3.6 kernel worked reliably before, because I remember it had issues too.

I've tried to suspend/resume between the hibernation cycles, but that does not change anything; hibernate/resume will still fail on the second resume attempt.
Comment 4 Michel Dänzer 2015-01-07 07:23:58 UTC
Is there any way you can get more information about the panic on resume, e.g. via a serial console or netconsole or some suspend/hibernate specific debugging mechanism?
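
If netconsole is tried, a second machine has to receive the UDP packets. The sketch below is one hypothetical way to do that, not something from this thread: the function names are made up, port 6666 is assumed to be the netconsole default target port, and the sender is assumed to be configured with a boot parameter of the documented form netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. Whether the network is still usable at the point of the panic during resume is not known.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket for incoming kernel log packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock: socket.socket, count: int, timeout: float = 5.0):
    """Collect `count` datagrams as text; raises socket.timeout if none arrive."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        data, _sender = sock.recvfrom(65535)
        # Kernel messages are normally ASCII; replace anything undecodable.
        messages.append(data.decode("utf-8", errors="replace"))
    return messages
```

On the receiving machine, a loop that calls `receive_messages(open_listener(), 1)` and appends each message to a file would keep whatever the panicking kernel managed to send.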
Comment 5 Martin Peres 2019-11-19 08:41:15 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/419.

