Yet another bug I've encountered on my Radeon HD 7950 with kernel 3.16.
Please attach the dmesg output.
Created attachment 105003 [details] dmesg
Is this a regression? If so, can you bisect?
I'm not sure if it is a regression or simply a new feature that doesn't work. I don't recall seeing the message in kernel 3.14 or prior. I don't have any experience in bisecting.
Created attachment 105455 [details] Radeon dpm hang
Created attachment 105456 [details] Radeon dpm success
I see this happening on a Radeon HD 7950 with kernel 3.13, which has radeon dpm enabled by default. It is intermittent, happens on ~30% of boots, and causes a hang followed by a reboot (no panic or oops messages). Unfortunately, I could only capture the dmesg output over serial: when my machine hangs, the kernel logs are not even saved to disk. Now, I see "[drm:si_dpm_set_power_state] *ERROR* si_disable_ulv failed" on serial only when the ignore_loglevel kernel parameter is unset. The machine hangs and reboots each time I spot it. I have attached the dmesg output captured with the ignore_loglevel and drm.debug=1 parameters, for both the failure and OK cases, for comparison. When failing, the machine just hangs briefly and reboots right after the "[drm] pitch is 7680" message. With the radeon.dpm=0 parameter, this problem NEVER happens!
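For anyone comparing a failing capture against a good one, a minimal Python sketch along these lines could flag the affected boots. The message text is copied from the comment above; the helper name `find_ulv_errors` is hypothetical, and it assumes the log has already been saved as plain text (for example from the serial console).

```python
# Message text as quoted in the report; everything else here is illustrative.
ULV_ERROR = "[drm:si_dpm_set_power_state] *ERROR* si_disable_ulv failed"


def find_ulv_errors(log_text):
    """Return the 1-based line numbers of log lines containing the ULV error."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ULV_ERROR in line
    ]
```

An empty result for a given capture would suggest that boot did not hit the error, at least as far as the saved log goes.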
Same warning here with 3.17-rc5, built from this repository: http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.17
Created attachment 106873 [details] dmesg | grep drm
Just moved from a 6950 (r600g) to a 7950 (radeonsi) and hit the same error on kernel 3.17-rc6.
Is ULV standing for Ultra-low voltage? If so, isn't this option something meant to be applied on APU only?
Created attachment 106884 [details] journalctl log
I commented out every "return ret;" of si_dpm_set_power_state() in si_dpm.c. After booting this modified kernel, I can confirm this is the only error reported in si_dpm_set_power_state(): every other verification passes OK and it goes down to the very end.
(In reply to comment #11)
> Is ULV standing for Ultra-low voltage? If so, isn't this option something
> meant to be applied on APU only?
Mmm, don't know. Haven't dug any further, but my GPU (R7-265) is running hotter than before, always around 39°-40° (even without any running application). With the kernel in Debian sid, 3.16.X, I can see low temperatures at idle. I'm unable to trigger this warning when booting with radeon.dpm=0, so I think it is something related to power management.
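When comparing boots with and without dpm, it can help to confirm which setting the running kernel was actually given. The following is only a sketch: the function name `radeon_dpm_setting` is hypothetical, and treating the last occurrence as the effective one is an assumption of this example.

```python
def radeon_dpm_setting(cmdline):
    """Return the value passed as radeon.dpm=... on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("radeon.dpm="):
            # Assumption for this sketch: a later occurrence overrides an earlier one.
            value = token.split("=", 1)[1]
    return value
```

On a running system the command line can be read from /proc/cmdline and passed to the function. A result of None only means the parameter was not set on the command line, so the driver default applies.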
(In reply to comment #14)
> (In reply to comment #11)
> > Is ULV standing for Ultra-low voltage? If so, isn't this option something
> > meant to be applied on APU only?
>
> Mmm, don't know.
> Haven't dug any further, but my GPU (R7-265) is running hotter than before,
> always around 39°-40° (even without any running application).
> With the kernel in Debian sid, 3.16.X, I can see low temperatures at idle.
>
> I'm unable to trigger this warning when booting with radeon.dpm=0, so I think
> it is something related to power management.
Well, according to my journalctl log, it always seems to be triggered after a power state switch. It doesn't happen on the switch from boot (power state 0) to performance (power state 1), but later. Strangely, I see:
Sep 25 20:28:28 Xander kernel: switching from power state:
Sep 25 20:28:28 Xander kernel: ui class: performance
...
Sep 25 20:28:28 Xander kernel: switching to power state:
Sep 25 20:28:28 Xander kernel: ui class: performance
Why would it try to switch from power state 1 to power state 1 (the same power state)? And why does the problem arise at that moment? I'll have to run more tests to see whether this behaviour happens each time.
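To check whether these same-state switches happen every time, the journal could be scanned with something like the sketch below. It assumes the layout shown in the excerpt above (a "switching from power state:" or "switching to power state:" header followed later by a "ui class:" line) and ignores every other line. The function name is hypothetical, and the elided part of the log may contain details this sketch does not account for.

```python
def same_state_switches(log_text):
    """Return (from_class, to_class) pairs whose ui classes are identical."""
    pending = None   # which header was seen last: "from", "to", or None
    classes = {}     # ui classes collected for the current from/to pair
    repeated = []
    for line in log_text.splitlines():
        if "switching from power state:" in line:
            pending, classes = "from", {}
        elif "switching to power state:" in line:
            pending = "to"
        elif "ui class:" in line and pending is not None:
            # The first ui class line after a header belongs to that header.
            classes[pending] = line.split("ui class:", 1)[1].strip()
            if pending == "to":
                if classes.get("from") == classes["to"]:
                    repeated.append((classes["from"], classes["to"]))
                classes = {}
            pending = None
    return repeated
```

Correlating the timestamps of the returned pairs with the si_disable_ulv error would show whether the error really follows such a switch each time.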
Alex, I think this "ERROR" should be at most a warning: I've been commenting out the "return ret" when we hit the error, and everything else goes as smooth as possible. Also, do you have any clue on the way we should dig to understand why we are hitting this error? As said by Samir, this appeared with dpm.
Created attachment 107784 [details] [review] disable ulv state on SI
(In reply to Alexandre Demers from comment #16)
> Alex, I think this "ERROR" should be at most a warning: I've been commenting
> out the "return ret" when we hit the error, and everything else goes as
> smooth as possible.
>
> Also, do you have any clue on the way we should dig to understand why we are
> hitting this error? As said by Samir, this appeared with dpm.
It's part of dpm so it only happens when dpm is enabled. ulv is a special low power state the card can go to in certain idle cases.
Does the attached patch help?
(In reply to Alex Deucher from comment #17)
> Created attachment 107784 [details] [review]
> disable ulv state on SI
>
> (In reply to Alexandre Demers from comment #16)
> > Alex, I think this "ERROR" should be at most a warning: I've been commenting
> > out the "return ret" when we hit the error, and everything else goes as
> > smooth as possible.
> >
> > Also, do you have any clue on the way we should dig to understand why we are
> > hitting this error? As said by Samir, this appeared with dpm.
>
> It's part of dpm so it only happens when dpm is enabled. ulv is a special
> low power state the card can go to in certain idle cases.
>
> Does the attached patch help?
I have changed my card yet again and am now running an R9 280X. I'll put the old card back in tomorrow to have a look at it. So ulv is a feature available on both APUs and the 7950 (and some other GPUs). Nice to know.
But is ulv support truly supposed to be available on Tahiti? In fact, prior to your patch, why is there already a comment "/* XXX disable for A0 tahiti */" in drivers/gpu/drm/radeon/si_dpm.c but ulv.supported is set to true anyway just on the next line (the one you propose to change in your patch)? To me, it's like saying a thing and doing exactly the opposite at the same time, isn't it? Or is it because there is a special case (Tahiti) that we should be addressing identified by the comment that we are not?
(In reply to Alexandre Demers from comment #18)
> But is ulv support truly supposed to be available on Tahiti? In fact, prior
> to your patch, why is there already a comment "/* XXX disable for A0 tahiti
> */" in drivers/gpu/drm/radeon/si_dpm.c but ulv.supported is set to true
> anyway just on the next line (the one you propose to change in your patch)?
> To me, it's like saying a thing and doing exactly the opposite at the same
> time, isn't it? Or is it because there is a special case (Tahiti) that we
> should be addressing identified by the comment that we are not?
A0 is first silicon (basically the initial silicon samples we get back from the fab during bring up). The issue was fixed in later silicon revisions. There usually aren't any A0 boards in the wild.
Sadly, I won't be able to test this patch: I had an opportunity to sell my HD 7950. We can keep the bug open in case someone else can test it.
I rebuilt the whole stack today (mesa, ddx, drm, xorg, and kernel) from the latest commits. It looks like the problem is now solved. Dmesg attached.
Created attachment 108125 [details] dmesg | grep drm
I've a kaveri a8-7100. After this patch, is there a way to reenable ulv without rebuilding drm?
(In reply to sean darcy from comment #23)
> I've a kaveri a8-7100. After this patch, is there a way to reenable ulv
> without rebuilding drm?
This patch and bug have nothing to do with Kaveri. They are specifically related to Southern Islands GPUs.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/518.