Bug 72921 - DPM Power Cycle with AMD A8-6600K & MSI FM2-A55M-E33
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assigned To: Default DRI bug account
Reported: 2013-12-20 15:52 UTC by mmstickman
Modified: 2014-09-09 07:53 UTC

Attachments
attachment 91047: testing patch (985 bytes, patch), 2013-12-20 15:59 UTC, Alex Deucher
attachment 100359: dmesg (72.09 KB, text/plain), 2014-06-03 13:44 UTC, Arno Schuring
attachment 101179: possible fix (464 bytes, patch), 2014-06-16 14:51 UTC, Alex Deucher
attachment 101191: testing patch (517 bytes, patch), 2014-06-16 19:40 UTC, Alex Deucher
attachment 101268: dmesg output (74.99 KB, text/plain), 2014-06-17 20:17 UTC, Collin
attachment 101420: dmesg with working bapm patch (79.71 KB, text/plain), 2014-06-20 09:11 UTC, Arno Schuring

Description mmstickman 2013-12-20 15:52:42 UTC
Linux kernel 3.13 enables DPM by default, and with this set of hardware that makes the system unbootable. Soon after the display is initialized, such as when it reaches the login screen, the system simply powers off and reboots if DPM is enabled.
Comment 1 Alex Deucher 2013-12-20 15:58:25 UTC
Please attach your xorg log and dmesg output with radeon.dpm=1 on the kernel command line in grub.  While dpm is enabled by default, forcing dpm=1 will enable extra debugging output.
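Before collecting the logs, it can be worth confirming that the parameter actually reached the running kernel. The following is a minimal Python sketch, assuming a Linux system where /proc/cmdline is readable; the helper name radeon_dpm_setting is made up for illustration and is not part of any tool mentioned here.

```python
def radeon_dpm_setting(cmdline: str):
    """Return the value of radeon.dpm from a kernel command line, or None.

    `cmdline` is the text of /proc/cmdline (or of the grub "linux" line).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "radeon.dpm" and sep:
            # Keep the last occurrence; this sketch assumes a later setting
            # overrides an earlier one.
            value = val
    return value


# On the affected machine one would call it like this:
#   with open("/proc/cmdline") as f:
#       print(radeon_dpm_setting(f.read()))
```

A result of "1" means DPM was forced on (with the extra debugging output), "0" means it was disabled, and None means the kernel default applies.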
Comment 2 Alex Deucher 2013-12-20 15:59:32 UTC
Created attachment 91047
testing patch

Does the attached patch help?  If so, can you narrow down which options are causing the problem?
Comment 3 Arno Schuring 2014-06-03 13:44:11 UTC
Created attachment 100359
dmesg

Looks like I'm seeing this same issue with an A8-6500 and linux 3.14: the system reboots within a few seconds after the radeon module gets loaded. The motherboard is an MSI A78M-E45. I'm attaching my (netconsole-captured) dmesg, but no obvious errors are logged. Maybe the probed power states are off, I can't tell. It does look like netconsole mangled some of the output, hope it's still useful.

If I boot with radeon.dpm=0, everything seems fine (though I haven't run anything 3D yet, glxinfo output appears correct). Blacklisting the radeon module also prevents the system from rebooting, but X won't start.
Comment 4 Alex Deucher 2014-06-03 13:47:59 UTC
Does attachment 91047 help?
Comment 5 Arno Schuring 2014-06-03 13:57:20 UTC
Hi Alex, just rebooted to test that patch. It applied to 3.15-rc8 with offset 6. Sadly, no change in behaviour. I've compared the power states in the dmesg logs, but they appear similar as well.
Comment 6 Kertesz Laszlo 2014-06-03 18:44:29 UTC
Cannot reproduce on an A8-6500 with a Gigabyte F2A88X-D3H. I had an F2A85X-D3H before with an A8-5500, and I used kernels below and above 3.13 on that too, but never had this issue.

I compile the kernel and mesa regularly from git (base system is Debian Testing 64 bit); right now I have 3.15.0-rc7-00118-ga4bf79e installed and it works without issues.
Comment 7 Collin 2014-06-05 11:08:08 UTC
I think this may apply to MSI hardware in general, because I have an FM2-A75IA-E53 motherboard with an AMD A10-6700T and I can't boot unless I pass radeon.dpm=0 to disable DPM.
Comment 8 Alex Deucher 2014-06-07 16:21:38 UTC
You might try a bios update if one is available for your board.
Comment 9 Collin 2014-06-08 06:51:58 UTC
(In reply to comment #8)
> You might try a bios update if one is available for your board.

Just updated to the latest; no change. This bug NEEDS to get fixed sometime soon, hopefully? Maybe? How long does it normally take for bugs this bad to get fixed?
Comment 10 Kertesz Laszlo 2014-06-08 20:18:17 UTC
Since this bug seems to be restricted to specific boards, it will be hard to solve without your help.

You could try adjusting BIOS settings. Maybe you have some proprietary MSI-specific "performance booster" settings or OC activated?
Did you try resetting the BIOS to defaults?
Do you have some special brand of memory?

Also, looking at that dmesg output I see some lines beginning with "#" such as

Jun  3 15:24:01 172.22.15.56 [   51.381984] #011ui class: 
Jun  3 15:24:01 none 
etc.

I don't have any of those in my dmesg and never had (A8-5500 and A8-6500 APUs on Gigabyte boards).
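The "#011" prefix most likely does not come from the kernel itself. That log was captured over netconsole by a syslog daemon, and rsyslog by default escapes control characters as "#" followed by three octal digits, so "#011" would be a tab character from the indented DPM debug output. A small Python sketch that undoes this escaping, assuming that default escape format; unescape_syslog is an illustrative name:

```python
import re

# "#" followed by exactly three octal digits, e.g. "#011" for TAB (octal 011 == 9).
_ESCAPE = re.compile(r"#([0-3][0-7]{2})")


def unescape_syslog(line: str) -> str:
    """Turn "#ooo" octal escapes back into the original control characters."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


sample = "[   51.381984] #011ui class: "
print(repr(unescape_syslog(sample)))
```

Note that this cannot tell an escape apart from a message that genuinely contains "#" followed by three octal digits, so it is only suitable for eyeballing a captured log.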
Comment 11 mmstickman 2014-06-08 23:39:45 UTC
If I recall correctly, I used AMD Radeon RAM when I built the system. I can't say whether or not the bug still occurs on the machine I posted this bug with because I built it as an office PC for someone else. Being a production machine, I didn't have any kind of boosters or overclocking features enabled. I think it's more than likely something to do with MSI's implementation of FM2/+
Comment 12 Arno Schuring 2014-06-09 09:08:29 UTC
I have upgraded my BIOS from 25.1 to 25.3; the issue remains. I will browse the BIOS for overclocking settings, but I didn't change any, so if they do affect this, then it's the factory defaults causing this.

This is a wild guess, but the symptoms to me look like a spurious watchdog kicking in, given that the system appears fully functional right up until the reboot, no errors are logged and there is a delay between the modprobe and the reboot.

@Kertesz: according to Alex in comment #1, "forcing dpm=1 will enable extra debugging output". Possibly that's why you're not seeing those lines.
Comment 13 Kertesz Laszlo 2014-06-09 10:03:35 UTC
I DO have radeon.dpm=1 set on the kernel command line.
It's been there since this option was introduced (I didn't remove it when it became the default).
Comment 14 Collin 2014-06-14 06:12:45 UTC
So what should I do with this? How long do I wait for a fix for this? I would really not want to replace my motherboard just to use my APU.
Comment 15 Alex Deucher 2014-06-16 14:34:27 UTC
(In reply to comment #14)
> So what should I do with this? How long do I wait for a fix for this? I
> would really not want to replace my motherboard just to use my APU.

You can either disable dpm (I can add a quirk in the meantime to disable it by default for your board), or wait for the fix.  I'm not sure how long it will take to track down exactly what is going wrong.
Comment 16 Alex Deucher 2014-06-16 14:51:18 UTC
Created attachment 101179
possible fix

Maybe there is some problem with the clocks or voltage in the highest power level.  Does the attached patch help?
Comment 17 Collin 2014-06-16 19:24:20 UTC
(In reply to comment #16)
> Created attachment 101179
> possible fix
> 
> Maybe there is some problem with the clocks or voltage in the highest power
> level.  Does the attached patch help?

Where and how do I apply this patch to... wherever it's supposed to go? And will it work on a Richland APU?
Comment 18 Alex Deucher 2014-06-16 19:38:12 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Created attachment 101179
> > possible fix
> > 
> > Maybe there is some problem with the clocks or voltage in the highest power
> > level.  Does the attached patch help?
> 
> Where and how do I apply this patch to... wherever it's supposed to go? And
> will it work on a Richland APU?

It applies to the kernel.  You need to apply the patch and build and install the kernel.
Comment 19 Alex Deucher 2014-06-16 19:40:55 UTC
Created attachment 101191
testing patch

Here's another kernel patch to try.  Please try this independent of any other patches on this bug.
Comment 20 Collin 2014-06-17 15:27:50 UTC
(In reply to comment #19)
> Created attachment 101191
> testing patch
> 
> Here's another kernel patch to try.  Please try this independent of any
> other patches on this bug.

First patch failed, gonna try out the second as soon as I start compiling a new kernel.
Comment 21 Collin 2014-06-17 19:43:56 UTC
Second patch from Alex Deucher worked like a charm! 60FPS on TF2 and currently testing out L4D2 :D
Comment 22 Alex Deucher 2014-06-17 20:06:32 UTC
(In reply to comment #21)
> Second patch from Alex Deucher worked like a charm! 60FPS on TF2 and
> currently testing out L4D2 :D

Please attach your dmesg output.
Comment 23 Collin 2014-06-17 20:17:51 UTC
Created attachment 101268
dmesg output

101191: testing patch fixes the DPM loop
Comment 24 Arno Schuring 2014-06-20 09:11:30 UTC
Created attachment 101420
dmesg with working bapm patch

I can confirm that the latter patch (attachment 101191) solves the issue for me. I haven't tried the earlier patch because it didn't work for Collin; I can re-test that if that's still useful.

Attaching dmesg; I compared the probed power states and they appear similar. What's bapm?
Comment 25 Alex Deucher 2014-06-20 13:30:02 UTC
(In reply to comment #24)
> Created attachment 101420
> dmesg with working bapm patch
> 
> I can confirm that the latter patch (attachment 101191)
> solves the issue for me, I haven't tried the earlier patch because it didn't
> work for Collin. I can re-test that if that's still useful.
> 

Probably not necessary.

> Attaching dmesg; I compared the probed power states and they appear similar.
> What's bapm?

It allows the CPU and GPU to share TDP headroom.  E.g., if the CPU isn't busy, the GPU can use higher performance states longer and vice versa.
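As a rough illustration of the idea only: the real arbitration is done by the APU's power management hardware and firmware, and nothing below reflects its actual algorithm. This toy Python model shows two consumers sharing one power budget; the function name and all wattages are hypothetical.

```python
def gpu_power_limit(tdp_w: float, cpu_draw_w: float, gpu_share_w: float, bapm: bool) -> float:
    """Toy model: power the GPU may draw under a package-wide TDP."""
    if not bapm:
        # No sharing: the GPU is held to its fixed share of the budget.
        return gpu_share_w
    # Sharing: whatever part of the CPU's share is currently unused
    # is lent to the GPU on top of its own share.
    cpu_share_w = tdp_w - gpu_share_w
    unused_cpu_w = max(0.0, cpu_share_w - cpu_draw_w)
    return gpu_share_w + unused_cpu_w


# Hypothetical 100 W package with 40 W reserved for the GPU.
print(gpu_power_limit(100.0, 20.0, 40.0, bapm=False))  # CPU mostly idle, no sharing
print(gpu_power_limit(100.0, 20.0, 40.0, bapm=True))   # CPU mostly idle, sharing
print(gpu_power_limit(100.0, 60.0, 40.0, bapm=True))   # CPU busy, nothing to lend
```

In this model the GPU's limit only rises above its fixed share while the CPU leaves part of its own share unused, which is the "share headroom" behaviour described above; the same logic applies in the other direction for the CPU.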
Comment 26 Collin 2014-06-23 10:05:45 UTC
Should this be marked as fixed or are there any other types of testing needed?
Comment 27 Alex Deucher 2014-06-23 14:32:41 UTC
(In reply to comment #26)
> Should this be marked as fixed or are there any other types of testing needed?

I need to push the patches upstream, but I'll be doing that this week.
Comment 28 Collin 2014-06-30 19:04:44 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > Should this be marked as fixed or are there any other types of testing needed?
> 
> I need to push the patches upstream, but I'll be doing that this week.

Was the patch pushed upstream yet? If so how would I know, because I'd like to upgrade to kernel 3.14 or 3.15 for better performance with my APU.
Comment 29 Alex Deucher 2014-07-01 13:45:54 UTC
(In reply to comment #28)
> Was the patch pushed upstream yet? If so how would I know, because I'd like
> to upgrade to kernel 3.14 or 3.15 for better performance with my APU.

Not yet.  It'll be in my -fixes pull request this week.
Comment 30 Kertesz Laszlo 2014-07-03 12:24:19 UTC
I activated bapm on my A8-6500 (on a GA-F2A88X-D3H mobo).
It is working as advertised: the CPU reached its 4GHz turbo core speed and ran between 3.5GHz (nominal top speed) and 4GHz for fairly long periods.
Actually, from a performance point of view, this is the best so far (Catalyst had some nasty CPU slowdowns, not seen in /proc/cpu).

Now, the problems: about once a day the computer reboots suddenly. Every time, my wife was playing some Flash-based Facebook game, BTW. Using other stuff, even playing 3D games or using VDPAU acceleration, didn't seem to create problems.

Also, I observed this in the system log:

[  299.812285] mce: [Hardware Error]: Machine check events logged

/var/log/mcelog has these:

mcelog: failed to prefill DIMM database from DMI data
mcelog: Unknown CPU type vendor 2 family 15 model 3
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 4
ADDR feb0c114
TIME 1404369155 Thu Jul  3 09:32:35 2014
STATUS b600000000070f0f MCGSTATUS 0
MCGCAP 107 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 21 Model 3
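The STATUS value in that record can be unpacked by hand. Below is a minimal Python sketch, assuming the architectural x86 MCi_STATUS layout for the top flag bits (63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC) and the low 16 bits as the MCA error code. The vendor-specific fields in between are not decoded, and decode_mci_status is an illustrative name.

```python
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split an MCi_STATUS value into its flag names and MCA error code."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "error_code": status & 0xFFFF,
    }


info = decode_mci_status(0xB600000000070F0F)
print(info["flags"], hex(info["error_code"]))
```

For the value logged above this yields VAL, UC, EN, ADDRV and PCC with error code 0xf0f: a valid, uncorrected error with a recorded address, which matches the presence of the ADDR line in the record.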
Comment 31 Kertesz Laszlo 2014-07-10 13:05:48 UTC
I had some more reboots, some while the computer was idling. I have now reverted the patch and rebuilt the kernel; let's see if that helps.
Comment 32 Rpnpif 2014-07-10 17:06:28 UTC
This patch (bapm=true) fixed all the issues that I have had since 3.13 with an MSI A78M-E35 motherboard and an AMD A4-5300 APU. My tests are with kernel 3.14.12 and radeon.dpm=1.

What's more, "vblank_mode=1 glxgears" gives twice the fps it does with radeon.dpm=0.
The APU temperature is better too: it runs cooler at idle and cools down faster.

Successful so far.

Thank you very much for your work.
Comment 33 Kertesz Laszlo 2014-07-19 21:51:25 UTC
So after more testing, I can confirm that activating bapm on my hardware (A8-6500 APU, Gigabyte GA-F2A88X-D3H mobo) leads to periodic sudden reboots (nothing in the logs).

Deactivating bapm makes the system stable again.
Comment 34 Collin 2014-07-21 04:08:44 UTC
(In reply to comment #33)
> So after more testing, I can confirm that activating bapm on my hardware
> (A8-6500 APU, Gigabyte GA-F2A88X-D3H mobo) leads to periodic sudden
> reboots (nothing in the logs).
> 
> Deactivating bapm makes the system stable again.

That's another big problem, so there needs to be more testing for APUs with other hardware that's not MSI. Time to open up a new bug for that.
Comment 35 Matt 2014-08-20 18:09:07 UTC
Will this filter back to 3.15 or earlier? If so, how quickly? It'd be awesome if it filtered back before the Ubuntu 14.04.2 release. This issue is mentioned in the Linux Mint release notes below, which point to the Ubuntu bug listed.

http://www.linuxmint.com/rel_qiana_xfce.php
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1309578
Comment 36 Alex Deucher 2014-08-20 18:54:34 UTC
I just sent a patch to the stable trees.
Comment 37 Matt 2014-08-24 18:59:06 UTC
(In reply to comment #36)
> I just sent a patch to the stable trees.

Awesome
Comment 38 Matt 2014-09-04 03:33:17 UTC
Where do I look to check if the patch has landed yet? I have been scanning git repos and mailing lists since you stated that but am apparently looking in all the wrong places. Or does it just take longer than 2 weeks?

This is the only commit I can find anywhere that seems to be related but it's in 3.16:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=730a336c33a3398d65896e8ee3ef9f5679fe30a9
Comment 39 Alex Deucher 2014-09-08 04:25:22 UTC
You can check the stable branches here:
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/
Note that some stable trees are closed so they will not be getting any more updates.
Comment 40 Matt 2014-09-08 04:45:26 UTC
Okay. Thanks.
Comment 41 VF 2014-09-09 07:53:05 UTC
Just my 2 cents:
I have this bug too, on an MSI A88XM-E35 desktop mobo with the latest BIOS 30.3 and no OC settings. I posted my bisect at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1355044/ which leads to ...
I will now have to digest what I read here and decide on a workaround.
Thank you to everyone here for the work.