Summary: | DPM Power Cycle with AMD A8-6600K & MSI FM2-A55M-E33
---|---
Product: | DRI
Component: | DRM/Radeon
Status: | RESOLVED FIXED
Severity: | blocker
Priority: | medium
Version: | unspecified
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Reporter: | mmstickman
Assignee: | Default DRI bug account <dri-devel>
CC: | mdinger.bugzilla, rpnpif

Description
mmstickman
2013-12-20 15:52:42 UTC
Please attach your xorg log and dmesg output with radeon.dpm=1 on the kernel command line in grub. While dpm is enabled by default, forcing dpm=1 will enable extra debugging output.

Created attachment 91047
testing patch

Does the attached patch help? If so, can you narrow down which options are causing the problem?

Created attachment 100359
dmesg

Looks like I'm seeing this same issue with an A8-6500 and linux 3.14: the system reboots within a few seconds after the radeon module gets loaded. The motherboard is an MSI A78M-E45. I'm attaching my (netconsole-captured) dmesg, but no obvious errors are logged. Maybe the probed power states are off, I can't tell. It does look like netconsole mangled some of the output, hope it's still useful.
If I boot with radeon.dpm=0, everything seems fine (though I haven't run anything 3D yet, glxinfo output appears correct). Blacklisting the radeon module also prevents the system from rebooting, but X won't start.
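
For anyone comparing the radeon.dpm=0 and radeon.dpm=1 cases above, it can help to confirm which value the running kernel was actually booted with. The following is a minimal Python sketch, not part of the original report: the function name `radeon_dpm_setting` is made up for illustration, and taking the last occurrence of the option when it appears more than once is an assumption of this sketch.

```python
from typing import Optional


def radeon_dpm_setting(cmdline: str) -> Optional[int]:
    """Return the integer value of radeon.dpm found in a kernel command line.

    Returns None when the option is absent, which means the driver default
    applies. If the option appears several times, the last one is returned
    (an assumption of this sketch).
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "radeon.dpm" and sep:
            try:
                value = int(raw)
            except ValueError:
                # Ignore malformed values such as "radeon.dpm=on".
                continue
    return value


# Typical use on a running system:
#     with open("/proc/cmdline") as f:
#         print(radeon_dpm_setting(f.read()))
```

A result of `0` corresponds to the workaround described above, `1` to the forced-on case with extra debugging output, and `None` to no explicit setting.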

Does attachment 91047 help?

Hi Alex, just rebooted to test that patch. It applied to 3.15-rc8 with offset 6. Sadly, no change in behaviour. I've compared the power states in the dmesg logs, but they appear similar as well.

Cannot reproduce on A8-6500 with a Gigabyte F2A88X-D3H. I had a F2A85X-D3H before with an A8-5500, and I used kernels below and above 3.13 on that too, but never had this issue. I compile the kernel and mesa regularly from git (base system is Debian Testing 64 bit); right now I have 3.15.0-rc7-00118-ga4bf79e installed and it works without issues.

I think this may apply to MSI hardware in general, because I have an FM2-A75IA-E53 motherboard with an AMD A10-6700T and I can't boot without radeon.dpm=0.

You might try a bios update if one is available for your board.

(In reply to comment #8)
> You might try a bios update if one is available for your board.

Just updated to the latest, no change. This bug NEEDS to get fixed sometime soon, hopefully? Maybe? How long does it normally take for bugs this bad to get fixed?

Since this bug seems to be restricted to specific boards, it will be hard to solve without your help. You could try adjusting BIOS settings. Maybe you have some proprietary MSI-specific "performance booster" settings or OC activated? Did you try resetting the BIOS to defaults? Do you have some special brand of memory? Also, looking at that dmesg output I see some lines beginning with "#" such as Jun 3 15:24:01 172.22.15.56 [ 51.381984] #011ui class: Jun 3 15:24:01 none etc. I don't have any of those in my dmesg and never had (A8-5500 and A8-6500 APUs on Gigabyte boards).

If I recall correctly, I used AMD Radeon RAM when I built the system. I can't say whether or not the bug still occurs on the machine I posted this bug with because I built it as an office PC for someone else. Being a production machine, I didn't have any kind of boosters or overclocking features enabled.
I think it's more than likely something to do with MSI's implementation of FM2/+.

I have upgraded my bios from 25.1 to 25.3; the issue remains. I will browse the BIOS for overclocking settings, but I didn't change any, so if they do affect this, then it's the factory defaults causing this. This is a wild guess, but the symptoms to me look like a spurious watchdog kicking in, given that the system appears fully functional right up until the reboot, no errors are logged and there is a delay between the modprobe and the reboot. @Kertesz: according to Alex in comment #1, "forcing dpm=1 will enable extra debugging output". Possibly that's why you're not seeing those lines.

I DO have radeon.dpm=1 set in the kernel command line. It's been there since this option was introduced (I didn't remove it when it became the default).

So what should I do with this? How long do I wait for a fix for this? I would really not want to replace my motherboard just to use my APU.

(In reply to comment #14)
> So what should I do with this? How long do I wait for a fix for this? I
> would really not want to replace my motherboard just to use my APU.

You can either disable dpm (I can add a quirk in the meantime to disable it by default for your board), or wait for the fix. I'm not sure how long it will take to track down exactly what is going wrong.

Created attachment 101179
possible fix

Maybe there is some problem with the clocks or voltage in the highest power level. Does the attached patch help?

(In reply to comment #16)
> Created attachment 101179
> possible fix
>
> Maybe there is some problem with the clocks or voltage in the highest power
> level. Does the attached patch help?

Where and how do I apply this patch to... wherever it's supposed to go? And will it work on a Richland APU?

(In reply to comment #17)
> (In reply to comment #16)
> > Created attachment 101179
> > possible fix
> >
> > Maybe there is some problem with the clocks or voltage in the highest power
> > level. Does the attached patch help?
>
> Where and how do I apply this patch to... wherever it's supposed to go? And
> will it work on a Richland APU?

It applies to the kernel. You need to apply the patch and build and install the kernel.

Created attachment 101191
testing patch

Here's another kernel patch to try. Please try this independent of any other patches on this bug.

(In reply to comment #19)
> Created attachment 101191
> testing patch
>
> Here's another kernel patch to try. Please try this independent of any
> other patches on this bug.

First patch failed, gonna try out the second as soon as I start compiling a new kernel.

Second patch from Alex Deucher worked like a charm! 60FPS on TF2 and currently testing out L4D2 :D

(In reply to comment #21)
> Second patch from Alex Deucher worked like a charm! 60FPS on TF2 and
> currently testing out L4D2 :D

Please attach your dmesg output.

Created attachment 101268
dmesg output

101191: testing patch fixes the DPM loop

Created attachment 101420
dmesg with working bapm patch

I can confirm that the latter patch (attachment 101191) solves the issue for me. I haven't tried the earlier patch because it didn't work for Collin. I can re-test that if that's still useful. Attaching dmesg; I compared the probed power states and they appear similar. What's bapm?

(In reply to comment #24)
> Created attachment 101420
> dmesg with working bapm patch
>
> I can confirm that the latter patch (attachment 101191)
> solves the issue for me, I haven't tried the earlier patch because it didn't
> work for Collin. I can re-test that if that's still useful.
>

Probably not necessary.

> Attaching dmesg; I compared the probed power states and they appear similar.
> What's bapm?

It allows the CPU and GPU to share TDP headroom. E.g., if the CPU isn't busy, the GPU can use higher performance states longer and vice versa.

Should this be marked as fixed, or is any other type of testing needed?

(In reply to comment #26)
> Should this be marked as fixed or is there any other types of testing needed?

I need to push the patches upstream, but I'll be doing that this week.

(In reply to comment #27)
> (In reply to comment #26)
> > Should this be marked as fixed or is there any other types of testing needed?
>
> I need to push the patches upstream, but I'll be doing that this week.

Was the patch pushed upstream yet? If so, how would I know? I'd like to upgrade to kernel 3.14 or 3.15 for better performance with my APU.

(In reply to comment #28)
> Was the patch pushed upstream yet? If so how would I know, because I'd like
> to upgrade to kernel 3.14 or 3.15 for better performance with my APU.

Not yet. It'll be in my -fixes pull request this week.

I activated bapm on my A8-6500 (on a GA-F2A88X-D3H mobo).
It is working as advertised: the CPU reached its 4GHz turbo core speed and was working between 3.5GHz (nominal top speed) and 4GHz for quite long time periods. Actually, from a performance point of view, this is the best so far (Catalyst had some nasty CPU slowdowns, not seen in /proc/cpu). Now, the problems: about once a day the computer reboots suddenly. Every time, my wife was playing some Flash-based Facebook game, BTW. Using other stuff, even playing 3D games or using VDPAU acceleration, didn't seem to create problems. Also, I observed this in the system log:

[ 299.812285] mce: [Hardware Error]: Machine check events logged

/var/log/mcelog has these:

mcelog: failed to prefill DIMM database from DMI data
mcelog: Unknown CPU type vendor 2 family 15 model 3
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 4
ADDR feb0c114
TIME 1404369155 Thu Jul 3 09:32:35 2014
STATUS b600000000070f0f MCGSTATUS 0
MCGCAP 107 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 21 Model 3

I had some more reboots, some when the computer was idling. Now I reverted the patch and rebuilt the kernel; let's see if it will help.

This patch (bapm=true) fixed all the issues that I have had since 3.13 with an MSI A78M-E35 motherboard and an AMD APU A4-5300. My tests are with the 3.14.12 kernel and radeon.dpm=1. What's more, "vblank_mode=1 glxgears" gives twice the fps it gives with radeon.dpm=0. The APU temperature is better too: cooler when idle, and it cools down faster. It has worked so far. Thank you very much for your work.

So after more testing, I can confirm that activating bapm on my hardware (A8-6500 APU, Gigabyte GA-F2A88X-D3H mobo) leads to periodic sudden reboots (nothing in the logs). Deactivating bapm makes the system stable again.

(In reply to comment #33)
> So after more testing, i can confirm that activating bapm on my hardware
> (A8-6500 APU, Gigabyte GA-F2A88X-D3H mobo) leads to periodical sudden
> reboots (nothing in the logs).
>
> Deactivating bapm makes the system stable again.

That's another big problem, so there needs to be more testing for APUs with other hardware that's not MSI. Time to open up a new bug for that.

Will this filter back to 3.15 or earlier? If so, how quickly? It'd be awesome if it filtered back before the Ubuntu 14.04.2 release. This is in the Linux Mint release notes below, which point to the Ubuntu bug listed.
http://www.linuxmint.com/rel_qiana_xfce.php
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1309578

I just sent a patch to the stable trees.

(In reply to comment #36)
> I just sent a patch to the stable trees.

Awesome

Where do I look to check if the patch has landed yet? I have been scanning git repos and mailing lists since you stated that but am apparently looking in all the wrong places. Or does it just take longer than 2 weeks? This is the only commit I can find anywhere that seems to be related, but it's in 3.16:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=730a336c33a3398d65896e8ee3ef9f5679fe30a9

You can check the stable branches here:
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/
Note that some stable trees are closed, so they will not be getting any more updates.

Okay. Thanks.

Just my 2 cents: I have this bug too, on an MSI A88XM-E35 desktop mobo with the latest bios 30.3 and no OC settings. I posted my bisect at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1355044/ which drives to ... I will now have to understand what I read here and see what workaround I can decide on. Thank you for the work, people here.
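
The machine-check record quoted in the bapm stability report above carries a raw STATUS value (b600000000070f0f). As a rough aid for reading it, here is a minimal Python sketch, not from the thread, that splits out the architectural flag bits of an x86 machine-check status register. The helper name `decode_mci_status` is hypothetical, and the sketch deliberately does not interpret the model-specific error code fields, which would need the processor vendor's documentation.

```python
# Architectural flag bits in the upper part of an x86 MCi_STATUS register.
MCI_STATUS_FLAGS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds extra information
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check status value into flags and error code."""
    decoded = {name: bool((status >> bit) & 1)
               for name, bit in MCI_STATUS_FLAGS.items()}
    # The low 16 bits hold the error code; its meaning is not decoded here.
    decoded["error_code"] = status & 0xFFFF
    return decoded


decoded = decode_mci_status(0xB600000000070F0F)
for name, value in decoded.items():
    print(name, hex(value) if name == "error_code" else value)
```

For the value in the report, the flags would read as a valid, uncorrected error with a valid address and possibly corrupt processor context. That fits the report's own "This is not a software error" line, though it does not by itself identify the failing component.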