Description
Kunal
2011-10-29 09:03:58 UTC
Created attachment 52892 [details]
lspci -vvnn output
Created attachment 52893 [details]
Xorg.log when it booted up fine
Should be fixed with this patch:
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=12d5180bd7e683a4ae80830b82ba67e7b7fac7b2

*** This bug has been marked as a duplicate of bug 40103 ***

(In reply to comment #3)
> Should be fixed with this patch:
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=12d5180bd7e683a4ae80830b82ba67e7b7fac7b2
>
> *** This bug has been marked as a duplicate of bug 40103 ***

No, it doesn't solve the screen corruption, either at boot time or while starting Xorg.

I installed a new kernel from Ubuntu's builds, so now I'm running 3.1.0-2.3, the current HEAD of
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=summary
This is rebased to v3.1 from Linus' tree. The ubuntu-precise tree already has the commit you mentioned:
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=commit;h=12d5180bd7e683a4ae80830b82ba67e7b7fac7b2

The severity of the lockups seems to have reduced; however, I haven't tested it thoroughly yet. It still locked up while starting Xorg/KDM, but I could at least switch to tty1 and stop KDM. I started KDM after a while and it started properly. Also, dmesg didn't show any more "GPU lockup" messages after KDM started properly.

Is it possible to load radeon.ko with some debug setting enabled, so that it can dump more verbose messages, if that helps you? "modinfo radeon" doesn't show any "debug..." parameter that can be used.

Not uploading new screenshots since there are no changes: (b) and (d) from the earlier screenshots are what I got when booting into the new kernel.

Attaching the new dmesg after this comment.

Created attachment 53209 [details]
new dmesg.log with kernel 3.1.0-2.3
Noticed these new messages in dmesg output:

[ 15.828387] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff000 flags=0x0010]
[ 15.828394] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff080 flags=0x0010]
[ 15.828397] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff040 flags=0x0010]
[ 15.828399] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff0c0 flags=0x0010]
[ 15.828401] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff100 flags=0x0010]
[ 15.828404] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff140 flags=0x0010]
[ 15.828406] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff180 flags=0x0010]
[ 15.828408] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff1c0 flags=0x0010]
[ 15.828410] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff200 flags=0x0010]
[ 15.828412] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff240 flags=0x0010]
[ 15.828414] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff280 flags=0x0010]
[ 15.828416] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff2c0 flags=0x0010]
[ 15.828418] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff300 flags=0x0010]
[ 15.828421] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff480 flags=0x0010]
[ 15.828423] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff340 flags=0x0010]
[ 15.828425] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff4c0 flags=0x0010]
[ 15.828427] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff380 flags=0x0010]
[ 15.828429] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff500 flags=0x0010]
[ 15.828431] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff3c0 flags=0x0010]
[ 15.828433] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff540 flags=0x0010]
[ 15.828435] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff400 flags=0x0010]
[ 15.828438] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff580 flags=0x0010]
[ 15.828440] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff440 flags=0x0010]
[ 15.828442] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff5c0 flags=0x0010]

Matching the "device=..." from above messages with output from lspci shows that these are from radeon driver.

"lspci -vvnn | grep 01:00.0" output:
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc NI Caicos [AMD RADEON HD 6450] [1002:6779] (prog-if 00 [VGA controller])

I don't know much about how IOMMU works - hence thought I'd just post it in the bug report if it helps you in any way.

Attaching the full dmesg log after this comment.

Created attachment 53225 [details]
dmesg output from 3.1.0-2.3 taken on 2011-11-07
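
For anyone wanting to repeat the "device=..." matching above on a full dmesg log, here is a minimal Python sketch. It is not part of the driver or any existing tool; the function names (parse_faults, summarize) are made up for illustration, and the regular expression only assumes the AMD-Vi line format exactly as quoted in the comment above. The sample text is the first three lines from that comment.

```python
import re
from collections import Counter

# Matches the AMD-Vi lines as they appear in the dmesg excerpt above.
# Assumption: other kernels may format these events differently.
FAULT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_faults(dmesg_text):
    """Return one (pci_device, address) tuple per IO_PAGE_FAULT event."""
    return [
        (m.group("device"), int(m.group("address"), 16))
        for m in FAULT_RE.finditer(dmesg_text)
    ]


def summarize(faults):
    """Count faults per PCI device and return the (min, max) faulting address."""
    per_device = Counter(dev for dev, _ in faults)
    addresses = [addr for _, addr in faults]
    span = (min(addresses), max(addresses)) if addresses else None
    return per_device, span


# First three events copied from the dmesg excerpt in this report.
SAMPLE = """\
[ 15.828387] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff000 flags=0x0010]
[ 15.828394] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff080 flags=0x0010]
[ 15.828397] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000e7ffff040 flags=0x0010]
"""

if __name__ == "__main__":
    counts, span = summarize(parse_faults(SAMPLE))
    for dev, n in counts.items():
        print(f"{dev}: {n} faults")
    if span:
        print(f"address range: {span[0]:#x} .. {span[1]:#x}")
```

The device string it reports (01:00.0 here) is the same bus address that "lspci -vvnn" prints at the start of each entry, so the two outputs can be compared directly.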
Are you doing this under VirtualBox? If so, it's not supported.

(In reply to comment #8)
> Are you doing this under a virtual box ? If so it's not supported

No, this is my bare-metal machine booting. I use VirtualBox to boot into a server version of Ubuntu where I do my development work.

Besides, I thought VirtualBox emulated its own VGA card. Why would you see the radeon card inside a virtual machine?

Created attachment 53253 [details]
dmesg log after pm-resume attempt
Trying to wake up from suspend-to-ram state also exhibits the same issue.
Moreover, lots of additional messages are dumped in dmesg.
Attached the dmesg output here.
(In reply to comment #9)
> (In reply to comment #8)
> > Are you doing this under a virtual box ? If so it's not supported
>
> No - this is my bare-metal machine booting.
> I use VirtualBox to boot into a server version of Ubuntu where I do my
> development work.
>
> Besides, I thought VirtualBox emulated its own vga card - why would you see the
> radeon card inside a virtual machine?

BTW, the main issue (screen corruption) has been happening since before I installed VirtualBox.

Does booting with the following kernel options help?
amd_iommu=off iommu=off

Created attachment 53375 [details]
dmesg log with "amd_iommu=off iommu=off" options added to cmdline

(In reply to comment #12)
> Does booting with following kernel options help
> amd_iommu=off iommu=off

No, it doesn't help in any way. Attaching dmesg log.

Created attachment 53428 [details] [review]
Verbose debug to help pin point issue

Can you build a kernel with the attached patch, boot with iommu=off, and attach dmesg?

Regarding:
AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000f002e9000 flags=0x0010]
(RH BZ: 827123)

I do get those from time to time with CAICOS, with no apparent or yet traceable consequences.

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc NI Caicos [AMD RADEON HD 6450] [1002:6779]

And it happens on the hosts, just to be sure. I do not use VirtualBox on this machine, just KVM.

Created attachment 64759 [details] [review]
Fixup mc programing

This patch should fix your issue.

*** Bug 43655 has been marked as a duplicate of this bug. ***

(In reply to comment #16)
> Created attachment 64759 [details] [review]
> Fixup mc programing
>
> This patch should fix your issue.

Thanks for the patch. Will test it over this weekend (4th - 5th Aug.) and post back.

For info from bug 43655, it fixes the bug on Cayman (XFX 6950).
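
On the "amd_iommu=off iommu=off" test requested earlier in this thread: before comparing dmesg logs it can be worth confirming that both options really reached the running kernel. Below is a small Python sketch for that check. The helper names (parse_cmdline, iommu_disabled) and the example command line are hypothetical and only for illustration; on a live system the string would come from /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_disabled(cmdline):
    """True only if both options requested in this thread are set to 'off'."""
    params = parse_cmdline(cmdline)
    return params.get("amd_iommu") == "off" and params.get("iommu") == "off"


# Hypothetical command line, not taken from the reporter's machine.
EXAMPLE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet amd_iommu=off iommu=off"

if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            live = f.read()
    except OSError:
        # Fall back to the illustrative string on systems without /proc.
        live = EXAMPLE
    print("IOMMU options off:", iommu_disabled(live))
```

This only checks what was passed on the command line; whether AMD-Vi messages still appear afterwards has to be read from dmesg itself.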
(In reply to comment #18)
> (In reply to comment #16)
> > Created attachment 64759 [details] [review]
> > Fixup mc programing
> >
> > This patch should fix your issue.
>
> Thanks for the patch. Will test it over this weekend (4th - 5th Aug.) and post
> back.

Damn, my motherboard died :( Waiting for a replacement.

Unfortunately, this patch on its own doesn't fix the problem for me. :(

I also noticed a new commit referencing this bug in Linus' tree when I synced my git tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=81ee8fb6b52ec69eeed37fe7943446af1dccecc5

So I cherry-picked this patch on top of commit 0937d042b97a9540b5488ab172aa14b53e80014b of Ubuntu's quantal tree from:
git://kernel.ubuntu.com/ubuntu/ubuntu-quantal.git
and built the new kernel from it on 15th Aug., 2012.

And it still doesn't solve the problem for me :( The behaviour remains the same. Anything more that I can try?

I tried to apply this patch on top of commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5, but it fails to apply at all.

I'm now starting to wonder whether it's really a driver issue or whether I have a bad card. But the fact that I can switch to a tty (after hitting ctrl+alt+f1 about 30-40 times in quick succession) and then restart kdm without any artifacts suggests that the card is fine. Could some race condition be getting triggered?

Attaching the dmesg output.

Created attachment 65604 [details]
dmesg output as of 15th Aug., 2012.
dmesg output with new kernel with commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5.
Lots of GPU softresets at the bottom of the log.
If it doesn't fix the bug for Kunal, does it mean bug 43655 is not a duplicate of this one? If so, both bugs should be unlinked, and attachment 64759 [details] [review] should be assigned as a fix for bug 43655 and committed as is, since it does fix a bug on my side.

I am also experiencing complete display corruption with KMS and my Radeon 6450 HD. Building an updated kernel with the following commit did not help:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=81ee8fb6b52ec69eeed37fe7943446af1dccecc5

You might try the 5 patches starting with this one:
http://lists.freedesktop.org/archives/dri-devel/2012-August/026498.html

(In reply to comment #25)
> You might try the 5 patches starting with this one:
> http://lists.freedesktop.org/archives/dri-devel/2012-August/026498.html

On top of the previous patch(es) (by Jerome), or as a separate set?

(In reply to comment #26)
> (In reply to comment #25)
> > You might try the 5 patches starting with this one:
> > http://lists.freedesktop.org/archives/dri-devel/2012-August/026498.html
>
> On top of previous patche(s) (by Jerome)? or as separate set?

They apply on top of his patches.

(In reply to comment #27)
> (In reply to comment #26)
> > (In reply to comment #25)
> > > You might try the 5 patches starting with this one:
> > > http://lists.freedesktop.org/archives/dri-devel/2012-August/026498.html
> >
> > On top of previous patche(s) (by Jerome)? or as separate set?
>
> They apply on top of his patches.

No luck :( It's still the same after applying those patches and rebuilding the kernel.

Attaching the xorg log and dmesg log after this.

Created attachment 65703 [details]
dmesg output as of 17th Aug., 2012.
dmesg output with the new kernel.
Created attachment 65704 [details]
xorg log as of 17th Aug, 2012
xorg log for the same.
(In reply to comment #16)
> Created attachment 64759 [details] [review]
> Fixup mc programing
>
> This patch should fix your issue.

This patch doesn't apply correctly on kernel 3.6-rc2. Would it be possible to rebase it or, even better, push it to the kernel, since it fixes the problem with CAYMAN (bug 43655, which should not be considered a duplicate: similar symptoms, different fix)?

A similar patch is already upstream:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=81ee8fb6b52ec69eeed37fe7943446af1dccecc5

(In reply to comment #28)
> (In reply to comment #27)
> > (In reply to comment #26)
> > > (In reply to comment #25)
> > > > You might try the 5 patches starting with this one:
> > > > http://lists.freedesktop.org/archives/dri-devel/2012-August/026498.html
> > >
> > > On top of previous patche(s) (by Jerome)? or as separate set?
> >
> > They apply on top of his patches.
>
> No luck :(
> It's still the same after applying those patches and rebuilding the kernel.
>
> Attaching the xorg log and dmesg log after this.

Anything more that I can try? Or did I do something wrong?

Do these patches need any additional bits from 3.6 kernels? Asking since the Ubuntu Quantal series is based on 3.5 while these patches are all fairly new.

Any update on this bug? Anything more that I can try? I haven't seen any patches related to this bug going into Linus' tree either.

Thanks,
Kunal

Did you test the 3.7 kernel? A bunch of patches went in; some might help your case.

(In reply to comment #35)
> Did you tested 3.7 kernel ? Bunch of patch went in some might help your case.

No, not yet. Will build and test a 3.7-rc1 based kernel and report back.

Thanks,
Kunal

(In reply to comment #36)
> (In reply to comment #35)
> > Did you tested 3.7 kernel ? Bunch of patch went in some might help your case.
>
> No, not yet.
> Will build and test a 3.7-rc1 based kernel and report back.
>
> Thanks,
> Kunal

OK, didn't get time to try 3.7-rc1. So, instead, I installed the 3.7-rc5 based Ubuntu kernel from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc5-raring/

(specifically, the following files:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc5-raring/linux-headers-3.7.0-030700rc5-generic_3.7.0-030700rc5.201211110835_amd64.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc5-raring/linux-headers-3.7.0-030700rc5_3.7.0-030700rc5.201211110835_all.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc5-raring/linux-image-3.7.0-030700rc5-generic_3.7.0-030700rc5.201211110835_amd64.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc5-raring/linux-image-extra-3.7.0-030700rc5-generic_3.7.0-030700rc5.201211110835_amd64.deb
)

With this, the situation is no better. In fact, with this kernel, I cannot even switch to a tty to restart X, which works happily on my current 3.2.0-32-generic (package: "linux-image-3.2.0-32-generic" version: "3.2.0-32.51").

There is no significant message logged in the syslog with 3.7-rc5 that points to a problem. Attaching the dmesg output and Xorg log anyway.

Thanks,
Kunal

Created attachment 69926 [details]
dmesg output as of 11th Nov., 2012.
dmesg output from 3.7-rc5.
Created attachment 69927 [details]
xorg log as of 11th Nov, 2012
xorg log for 3.7-rc5.
(In reply to comment #37)

One more point to note: whenever I reboot from 3.7-rc5 into 3.2.0-32, the switch-to-tty + restart-X routine doesn't work even with the previously working setup. It exhibits the same issue as 3.7-rc5. Any clue why? Something to do with VRAM registers? Just guessing.

Thanks,
Kunal

Hi,

Any update on this bug? As I mentioned in my previous comment c#37, v3.7-rc5 has been no better. Anything more that I can try?

Any clue if and where I can find the documentation for the chip (RV730, IIRC)? Just so that I can learn about the code.

Thanks,
Kunal

Was attachment 64759 [details] [review] ever applied to the kernel's git? While I don't know whether the current bug still happens on Radeon HD 6450 cards, the attachment fixed bug 43655 (which was marked as a duplicate but is not one, as explained in the last comment of that bug, because the patch works for the HD 6950).

If not, please let me know, because I'm still experiencing bug 43655 with kernel 3.9.0-rc4, and attachment 64759 [details] [review] can't be applied to the kernel's current git version (it applied on 3.7, not sure about 3.8, definitely not on 3.9).

I do not know if this is relevant, but I recently purchased a Radeon 6450 and had similar issues: corruption when KMS is started and lockups when X starts. I did not get these issues with fglrx or the Windows drivers.

I then happened to notice that the memory clock settings for the 'boot' power profile were incorrect, and were significantly higher than the card should support. The high and low power states appeared to be correct. Changing the boot values using the Radeon Bios editor (http://www.techpowerup.com/rbe/) fixed these problems for me.

(In reply to comment #43)
> I do not know if this is relevant, but I recently purchased a radeon 6450
> and had similar issues - corruption when KMS is started and lockups when X
> starts. I did not get these issues in fglrx or the windows drivers.
>
> I then happened to notice that the memory clock settings for the 'boot'
> power profile were incorrect, and were significantly higher than the card
> should support. The high & low power states appeared to be correct. Changing
> the boot values using the Radeon Bios editor
> (http://www.techpowerup.com/rbe/) fixed these problems for me.

While this is not related to the proposed attachment, it is quite interesting to know you were able to fix your problem by tweaking your card's BIOS. Maybe Kunal could tell us whether he has made similar observations.
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/228.