Description
Da Fox
2010-06-05 11:00:07 UTC
Please which GPU (lspci -v)

I'm sorry, I should have provided that information immediately.
Output of 'lspci -v' for the video controller:
---8<---------
01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] (prog-if 00 [VGA controller])
Subsystem: IBM Device 0550
Flags: bus master, fast Back2Back, 66MHz, medium devsel, latency 66, IRQ 11
Memory at e0000000 (32-bit, prefetchable) [size=128M]
I/O ports at 3000 [size=256]
Memory at c0100000 (32-bit, non-prefetchable) [size=64K]
[virtual] Expansion ROM at c0120000 [disabled] [size=128K]
Capabilities: [58] AGP version 2.0
Capabilities: [50] Power Management version 2
Kernel driver in use: radeon
--->8---------
It seems to say memory size is 128M, but this is a 64M board...
The command was run under kernel 2.6.33.

Please attach full dmesg thanks

Created attachment 36221 [details]
Kernel log for one day
Also seeing random freezes here on my Thinkpad T60 on 2.6.34 with X1300 mobility radeon.

I've recompiled the kernel (this time revision 8e36113082821980c60ce89a6c5d45fc9492fc26) with netconsole enabled, and have 'triggered' the freeze. At this point no errors, warnings or other messages were being printed to netconsole. Remotely logging in via ssh was not possible. The kernel did not respond even to SysRQ-b.
Netconsole output did stop when the system finished booting, but was re-enabled using the command 'dmesg -n 8', issued as root. Netconsole functionality was then tested by disabling and then re-enabling swap space. This caused the swap enabled message to be printed on the netconsole. The kernel was booted with 'debug' specified on the command line.
Is there any way to enable a more verbose output after booting? I've searched for a kernel config option to enable more verbose logging, but I could not find anything which seemed relevant.

(In reply to comment #5)
> Also seeing random freezes here on my Thinkpad T60 on 2.6.34 with X1300
> mobility radeon.
Could you please also specify the exact GPU that you have (lspci -v), and your kernel logs? (Perhaps there is something being logged in yours?)

> (In reply to comment #5)
> > Also seeing random freezes here on my Thinkpad T60 on 2.6.34 with X1300
> > mobility radeon.
>
> Could you please also specify the exact GPU that you have ? (lspci -v), and
> your kernel logs? (perhaps there is something being logged in yours?)
the easy one to start with:
# uname -a
Linux voyager 2.6.34-gentoo #1 SMP PREEMPT Mon May 24 10:53:46 EST 2010 i686 Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz GenuineIntel GNU/Linux
# lspci -vv -s 01:0
01:00.0 VGA compatible controller: ATI Technologies Inc M52 [Mobility Radeon X1300] (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 2005
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at d8000000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at 2000 [size=256]
Region 2: Memory at ee100000 (32-bit, non-prefetchable) [size=64K]
[virtual] Expansion ROM at ee120000 [disabled] [size=128K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Kernel driver in use: radeon
Kernel modules: radeon
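A side note on reading these dumps: the `[size=128M]` on the prefetchable memory region is the size of the PCI aperture, which need not equal the amount of video memory on the board (the first report above shows 128M on what is said to be a 64M board). A minimal sketch for pulling base and size out of pasted lspci text follows. The function name `parse_memory_regions` is made up for this illustration, and the regular expression is only an assumption based on the line format seen in this thread.

```python
import re

# Matches lspci memory BAR lines such as
#   "Region 0: Memory at d8000000 (32-bit, prefetchable) [size=128M]"
# I/O port and Expansion ROM lines are deliberately not matched.
REGION_RE = re.compile(
    r"Memory at (?P<base>[0-9a-f]+) \((?P<attrs>[^)]*)\) "
    r"\[size=(?P<size>\d+)(?P<unit>[KMG]?)\]"
)
UNIT_BYTES = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_memory_regions(lspci_text):
    """Return a list of dicts describing each memory BAR found in the text."""
    regions = []
    for m in REGION_RE.finditer(lspci_text):
        attrs = [a.strip() for a in m.group("attrs").split(",")]
        regions.append({
            "base": int(m.group("base"), 16),
            "size": int(m.group("size")) * UNIT_BYTES[m.group("unit")],
            "prefetchable": "prefetchable" in attrs,
        })
    return regions


# Four lines copied from the X1300 output above.
sample = """\
Region 0: Memory at d8000000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at 2000 [size=256]
Region 2: Memory at ee100000 (32-bit, non-prefetchable) [size=64K]
[virtual] Expansion ROM at ee120000 [disabled] [size=128K]
"""
for region in parse_memory_regions(sample):
    print(hex(region["base"]), region["size"], region["prefetchable"])
```

Only the two "Memory at" lines are reported; the I/O port and ROM lines are skipped on purpose.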
(In reply to comment #7) > the easy one to start with: So how is it coming with the difficult one? Was there nothing in your logs or did you not have a chance to take a look yet? I seem to have the same problem, see https://bugs.freedesktop.org/show_bug.cgi?id=23660#c23 GPU is the same, I also had the problem with the non-KMS driver earlier, but that magically disappeared with kernel 2.6.33 and KMS. With 2.6.34 I get these freezes again. I tried with kernel 2.6.35.2 over the weekend, and this issue is still present. However there is a slight urge to get this fixed, because the rest of the system will start to depend on more recent kernels in a while. For example the newer udev-151 warns that: ---8<--------- ERROR: setup CONFIG_IDE: should not be set. But it is. WARN: setup Please check to make sure these options are set correctly. Failure to do so may cause unexpected problems. --->8--------- This is due to the fact that I am using the old (and pretty soon deprecated) ATA drivers in the kernel, and not the newer (and nowadays stable) libata drivers. So if I want to use the newer udev I would prefer to be able to do so on a newer kernel, given that the libata drivers are still relatively new. This is of course but one example, but there can be a variety of reasons which necessitate a kernel upgrade. Another good example would be the inter-dependencies of Xorg, video drivers and the kernel. So please: Devs: take a look at the commits I pointed out in the initial description, and answer my question from comment #6. Aidan: upload your kernel logs, I think you should have collected enough data by now. (In reply to comment #10) I can't offer much help anymore on this defect, sorry. Soon after my comment, my T60 laptop was returned as part of the company lease cycle and my new thinkpad has an intel gfx chip. This is still an issue with 2.6.36-rc2. This may be related to connector polling. Does the patch to disable polling in bug 29389 help? 
(In reply to comment #13)
> This may be related to connector polling. Does the patch to disable polling
> in bug 29389 help?
I'm sorry to report that it does not. I applied the patch, but I had to make a small change to it because the last chunk did not apply. It seems delayed_slow_work_enqueue() was renamed to queue_delayed_work() and the patch only seems to add an additional if() before the call to queue_delayed_work() in drm_helper_hpd_irq_event(), so I modified the patch accordingly.
I also modified my kernel command line to include drm_kms_helper.poll=0. I had read that the patch might cause X to refuse to start, but I did not experience this. X started normally on each attempt (I have tried 3x). Each time the system froze in the same way as before, within a few minutes of booting.

I just wanted to point out a thread I just found on the ArchLinux forums which describes the same issue, with many people reportedly experiencing this problem: https://bbs.archlinux.org/viewtopic.php?id=100843&p=1
One user ('vootey') who has attempted to bisect the kernel appears to have isolated the same region of commits that I have identified: "I'm trying to git bisect the kernel and until now I was able to narrow the bug (for me) down to the higher 2.6.33 area."
So at least now I know I'm not alone :(

Thank you, Da Fox, for pointing me to this thread. I'm "vootey".
I can confirm that issue. The freezes mostly occur while using firefox.
As Da Fox said, I'm on my way to finishing the bisection (4 steps to go). I marked a version as "good" if it reached at least 10 hours of uptime without freezing. I tried to stress my system more than usual. (If a kernel was bad, the freeze always came within 2 hours of uptime.)
I will report when I'm finished.
I will attach the usual sys-info files. Please ask if more is needed.

Created attachment 38307 [details]
cat /proc/version
kernel is from a bisecting git-repo (so ignore the version)
Created attachment 38308 [details]
cat /proc/cpuinfo
Created attachment 38309 [details]
cat /proc/modules
Created attachment 38310 [details]
cat /proc/ioports
Created attachment 38311 [details]
cat /proc/iomem
Created attachment 38312 [details]
cat /proc/scsi/scsi
Created attachment 38313 [details]
lspci -vvv
Created attachment 38314 [details]
kernel .config
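Because the freeze is intermittent, a bisection like the one described above is only as reliable as each "good" verdict. The sketch below is a plain binary search over an ordered list of commits, roughly what git bisect does on a linear history. The commit numbers and the `truthful` and `flaky` oracles are invented for illustration; they are not taken from the kernel tree.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good
    and commits[-1] is known bad. Returns (first bad commit, kernels tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


commits = list(range(1000))  # hypothetical linear history
TRUE_FIRST_BAD = 700         # hypothetical culprit


def truthful(commit):
    """Every bad kernel freezes within the observation window."""
    return commit >= TRUE_FIRST_BAD


def flaky(commit):
    """Same, except commit 749 happens not to freeze while being tested."""
    return commit >= TRUE_FIRST_BAD and commit != 749


print(first_bad(commits, truthful))  # finds the real culprit
print(first_bad(commits, flaky))     # lands on a later, innocent commit
```

A single bad kernel wrongly marked "good" pushes the search into the later half of the history, so the reported "first bad commit" ends up after the real one. That is one possible reason for two bisections of the same bug to disagree.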
I got it.
Since the last kernels were all bad ones, progress was made a bit faster.

#############
32b3c2abaf8c61c80a8b02071c73f05252122ffe is the first bad commit
commit 32b3c2abaf8c61c80a8b02071c73f05252122ffe
Author: Jerome Glisse <jglisse@redhat.com>
Date: Fri Feb 26 19:14:12 2010 +0000

drm/radeon/kms: initialize set_surface_reg reg for rs600 asic

rs600 asic was missing set_surface_reg callback leading to
oops.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

:040000 040000 f46b151d49ec9023ce01cded50fda4c52db311cb 4e640582f7f3b07ed9994422432580070565692e M drivers
############

(In reply to comment #25)
> I got it.
> Since the last kernels were all bad ones, progress was made a bit faster.
>
> #############
> 32b3c2abaf8c61c80a8b02071c73f05252122ffe is the first bad commit
> commit 32b3c2abaf8c61c80a8b02071c73f05252122ffe
> Author: Jerome Glisse <jglisse@redhat.com>
> Date: Fri Feb 26 19:14:12 2010 +0000
>
> drm/radeon/kms: initialize set_surface_reg reg for rs600 asic
>
> rs600 asic was missing set_surface_reg callback leading to
> oops.
>
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
>
> :040000 040000 f46b151d49ec9023ce01cded50fda4c52db311cb
> 4e640582f7f3b07ed9994422432580070565692e M drivers
> ############
Interesting, that is almost the same as what I found, but not exactly. Which repository did you use for bisecting? I used Dave Airlie's (airlied) drm tree, with what was then the drm-next branch. I don't know if that would make a difference though. I don't know how/if git preserves history across merges in different branches, i.e. if you used Linus's tree, would you see the whole history or only the merge points? The reason I am wondering is that to me it seems 32b3c2abaf8c61c80a8b02071c73f05252122ffe is just after 2 merge points, one of which includes the commits I pointed out in my initial comment (if I understand the gitk listing correctly at least).
Also to me 32b3c2abaf8c61c80a8b02071c73f05252122ffe seems unlikely as the culprit, since it modifies something in the rs600 code, and hence should not have any effect on our r3xx cards.
Could you possibly see if the following three (consecutive) commits are in your tree too, and test them?
7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 - drm: Add generic multipart buffer.
d594e46ace22afa1621254f6f669e65430048153 - drm/radeon/kms: simplify memory controller setup V2
44ca7478d46aaad488d916f7262253e000ee60f9 - drm/radeon: Add asic hook for dma copy to r200 cards.
For me 44ca7478d46aaad488d916f7262253e000ee60f9 seems to be the last stable commit (I've been testing it again today, and it hasn't frozen so far), whereas I've noted 7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 down as freezing. My notes also say I could not test d594e46ace22afa1621254f6f669e65430048153 for some reason.

(In reply to comment #26)
> Which repository did you use for bi-secting?
Linus' tree.
> I don't know how/if git preserves history across merges in
> different branches, i.e. if you used linuz's tree, would you see the whole
> history or only the merge points?
I don't know. I'm not very familiar with git.
> Also to me
> 32b3c2abaf8c61c80a8b02071c73f05252122ffe seems unlikely as the culprit, since
> it modifies something in the rs600 code, and hence should not have any effect
> on our r3xx cards.
I completely agree. I will roll back a few versions and try the "good" versions again.
> Could you possibly see if the following three (consecutive) commits are in
> your tree too, and test them?
I'd like to, as soon as I find out how to check out these specific versions. ;)
What we need is a damn trigger for this bug. Without reproducibility, trying to catch the right commit is a pure matter of luck and very frustrating.

(In reply to comment #27)
> (In reply to comment #26)
> > Could you possibly see if the following three (consecutive) commits are in
> > your tree too, and test them?
> I'd like to, as soon as I find out how to check out these specific versions. ;)
Try something like 'git checkout <sha1>' to check out a specific revision. It will complain if the commit does not exist or if there is something else which prevents the checkout (locally modified files, for example).
In case they have a different sha1 id in your tree you can try looking for them using 'gitk' (the git repository browser). Since the kernel is so big you may want to limit how far back gitk should show history; try gitk --since=01-01-2010. You can either try to jump directly to the commit by entering a sha1 id into the 'SHA1 ID' box, or search by typing a commit message into the 'Find' box. Beware that by default the search is case-sensitive.
> What we need is a damn trigger to this bug. Without reproducibility trying to
> catch the right commit is a pure matter of luck and very frustrating.
I feel your pain. I've also mis-identified an 'unstable' commit as 'stable', which means the whole rest of the bisecting process (which takes a lot of time) is wasted. What triggers the freeze for me most of the time though is opening firefox (with a lot of windows and tabs from my previous browsing session). I do this as soon as I've logged in and my desktop environment has finished loading.
On a side note, for those earlier commits (such as 44ca7478d46aaad488d916f7262253e000ee60f9, which I've been testing again during the day), do you also find that the system becomes very slow? As in there is a high CPU usage, but not caused by any running program? I've seen this again today, where top reports all programs using only a small amount of cpu (a grand total of 8% or so), and yet my CPU usage was almost 50%.

Thanks for the help.
I managed to compile and boot the kernel with 7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 as head.
('uname -r' reports "2.6.33-00035-gaa71fa3")
3,5 hours up and so far no freeze.

(In reply to comment #29)
> Thanks for the help.
>
> I managed to compile and boot the kernel with
> 7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 as head.
> ('uname -r' reports "2.6.33-00035-gaa71fa3")
>
> 3,5 hours up and so far no freeze.
That uname report does not match the version you've compiled, so something must have gone wrong. It should be "2.6.33-00519-g7a9f0dd": the first part is the current 'base' version of the kernel (so 2.6.33, since it's before the release of 2.6.34), the second part (00035 in your report) I do not know the meaning of (so it could be different for you), and the final part is composed of the letter 'g' (for git?) followed by the first part of the sha1 id. Revision aa71fa3... is "Merge remote branch 'nouveau/for-airlied' into drm-next-stage", which is still a bit before those three commits. So no freeze there is good!

For me 7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 resets the computer when starting X. This is probably due to a bug in that version, which has been fixed a few commits later, in 8e36113082821980c60ce89a6c5d45fc9492fc26 - drm/radeon/kms: fix R3XX/R4XX memory controller intialization.

I've compiled and tested a kernel based on d594e46ace22afa1621254f6f669e65430048153 with one additional commit, 8e36113082821980c60ce89a6c5d45fc9492fc26. This again froze within a minute of starting firefox. So the offending commit definitely must be d594e46ace22afa1621254f6f669e65430048153 - drm/radeon/kms: simplify memory controller setup V2.

If you want to test this too you can do it like this:
$ git checkout d594e46ace22afa1621254f6f669e65430048153
$ git cherry-pick -n 8e36113082821980c60ce89a6c5d45fc9492fc26
cherry-pick applies a commit on top of the current state. The -n flag does not actually commit anything, but only makes local changes. You can now compile and test this version. To get rid of the local changes again run
$ git reset --hard

(In reply to comment #30)
> That uname report does not match with the version you've compiled
hm.. maybe I mixed it up with my bisect session.
I'm sorry and thank you for your patience. :)
> For me 7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 resets the computer when
> starting X. This is probably due to a bug in that version, which has been fixed
> a few commits later, in 8e36113082821980c60ce89a6c5d45fc9492fc26 -
> drm/radeon/kms: fix R3XX/R4XX memory controller intialization.
>
> I've compiled and tested a kernel based on
> d594e46ace22afa1621254f6f669e65430048153 with one additional commit,
> 8e36113082821980c60ce89a6c5d45fc9492fc26. This again froze within a minute of
> starting firefox. So the offending commit definitely must be
> d594e46ace22afa1621254f6f669e65430048153 - drm/radeon/kms: simplify memory
> controller setup V2.
At the moment I am on 2.6.32-00518-gd594e46-dirty (should be the d594.. commit with the 8e36... patch; I did as you explained) with an uptime of 1.5 hours. Hm.. somehow unusual for a "bad" kernel. To be honest, I hope a crash occurs soon. :D
Do you also know what that "-dirty" means in the release version?

d594e46ace22afa1621254f6f669e65430048153 finally caused a freeze.
Now I'm checking 7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 (2.6.32-00519-g7a9f0dd-dirty)

7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 froze for me as well.

> Do you also know, what that "-dirty" means in the release-version?
I believe it means that you have local, uncommitted changes to the source tree. This is expected since we instructed cherry-pick not to actually commit any of the changes it made.
> d594e46ace22afa1621254f6f669e65430048153 finally caused a freeze.
> 7a9f0dd9c49425e2b0e39ada4757bc7a38c84873 freezed for me as well.
Good, that confirms that the commit causing the freeze issue is indeed prior to the commit you identified at first, 32b3c2abaf8c61c80a8b02071c73f05252122ffe. If you could now confirm 44ca7478d46aaad488d916f7262253e000ee60f9 as not causing the freeze, we have finally isolated the exact commit that is causing the freezes. Hopefully the devs can then at last fix it.
Having more people point out the same commit is always more convincing than one person alone, especially in a bug which can be as random as this one (since sometimes it takes quite a bit of time before it happens, and sometimes it freezes almost instantly, for example when launching firefox).

(In reply to comment #34)
> If you could now confirm 44ca7478d46aaad488d916f7262253e000ee60f9 as not
> causing the freeze we have finally isolated the exact commit that is causing
> the freezes.
I'm on it.

(In reply to comment #28)
> On a side note, for those earlier commits (such as
> 44ca7478d46aaad488d916f7262253e000ee60f9, which I've been testing again during
> the day), do you also find that the system becomes very slow? As in there is a
> high CPU usage, but not caused by any running program?
> I've seen this again today, where top reports all programs using only a small
> amount of cpu (a grand total of 8% or so), and yet my CPU usage was almost 50%.
Yes, it becomes slower, but the CPU usage is only slightly higher. (I only notice it when watching movies, which lag every 5 seconds or so.)
But I guess that should not bother us, since this issue does not occur (at least for me) in higher kernel versions, and I doubt that it has anything to do with our bug. Do you agree?

(In reply to comment #34)
> If you could now confirm 44ca7478d46aaad488d916f7262253e000ee60f9 as not
> causing the freeze we have finally isolated the exact commit that is causing
> the freezes.
Confirmed. 44ca7478d46aaad488d916f7262253e000ee60f9 has been in use now for ~3 days and no freeze has occurred. That would support your assumption. And I think it's a very likely one.

Created attachment 38458 [details]
Output of "radeontool regmatch '*'" on a clean boot with 44ca7478d46aaad488d916f7262253e000ee60f9
Created attachment 38459 [details]
Output of "radeontool regmatch '*'" on a clean boot with d594e46ace22afa1621254f6f669e65430048153 + 8e36113082821980c60ce89a6c5d45fc9492fc26
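On the release strings discussed a few comments back (for example "2.6.32-00518-gd594e46-dirty"): as far as I know, the kernel build derives the suffix from `git describe`, so the five-digit number is the count of commits since the most recent tag, the part after "g" is the abbreviated sha1, and "-dirty" marks uncommitted changes in the tree. A minimal sketch for taking such a string apart is below. `decode_release` is a made-up helper name, and the pattern assumes no custom local version suffix (a string like "2.6.34-gentoo" is simply rejected).

```python
import re

RELEASE_RE = re.compile(
    r"(?P<base>\d+\.\d+\.\d+(?:-rc\d+)?)"        # base version, e.g. 2.6.36-rc3
    r"(?:-(?P<count>\d+)-g(?P<sha>[0-9a-f]+))?"  # commits since tag, short sha1
    r"(?P<dirty>-dirty)?"                        # uncommitted local changes
)


def decode_release(release):
    """Split a 'uname -r' style string into its parts, or return None."""
    m = RELEASE_RE.fullmatch(release)
    if m is None:
        return None
    return {
        "base": m.group("base"),
        "commits_since_tag": int(m.group("count")) if m.group("count") else 0,
        "sha": m.group("sha"),
        "dirty": m.group("dirty") is not None,
    }


# Strings quoted in this thread.
for release in ("2.6.33-00035-gaa71fa3",
                "2.6.32-00518-gd594e46-dirty",
                "2.6.36-rc3-00185-gd56557a-dirty"):
    print(release, "->", decode_release(release))
```

Comparing the decoded sha against the commit that was meant to be checked out is a quick way to catch a kernel built from the wrong revision.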
> Yes, it becomes slower, but the CPU usage is only slightly higher. I just > notice it, when watching movies by experiencing lags every 5 seconds for > instance.) > But I guess, that should not bother us, since this issue is none (at least for > me) in higher kernel-versions and I doubt, that this has something to do with > our bug. Do you agree? Agreed, 'first things first'. I've talked to one of the dev's on IRC ('airlied') and he requested we post the output of the following command: "radeontool regmatch '*'" for both the last known 'good' commit (44ca7478d46aaad488d916f7262253e000ee60f9) and the 'bad' commit (d594e46ace22afa1621254f6f669e65430048153 + 8e36113082821980c60ce89a6c5d45fc9492fc26). I've already attached my output, could you please do the same? p.s. You can find radeontool here (http://cgit.freedesktop.org/~airlied/radeontool/) if your distro does not provide a package. if you boot with a bad kernel and run radeontool regset 0x130 0x70000000 does it stabilise any? Created attachment 38468 [details]
radeontool regmatch '*' on 44ca7478d46aaad488d916f7262253e000ee60f9
Created attachment 38469 [details]
radeontool regmatch '*' on d594e46ace22afa1621254f6f669e65430048153 + 8e36113082821980c60ce89a6c5d45fc9492fc26
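With a "good" and a "bad" register dump from each machine, the interesting part is the set of registers whose values differ. The sketch below compares two dumps in Python. The line format it parses ("NAME (offset) 0xVALUE (decimal)") is an assumption modelled on the regset output quoted in this thread, not a documented radeontool format, and the register names and values in the sample are invented.

```python
import re

# Assumed line shape: "<name> (<hex offset>) 0x<hex value> (<decimal>)"
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+\((?P<offset>[0-9a-fA-F]+)\)\s+(?P<value>0x[0-9a-fA-F]+)"
)


def parse_dump(text):
    """Map register name to integer value; unparseable lines are skipped."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            regs[m.group("name")] = int(m.group("value"), 16)
    return regs


def diff_dumps(good, bad):
    """Return {name: (good_value, bad_value)} for registers that differ."""
    return {
        name: (good[name], bad[name])
        for name in sorted(good.keys() & bad.keys())
        if good[name] != bad[name]
    }


# Hypothetical sample data, for illustration only.
good_text = "REG_A (0130)\t0x40800000 (1082130432)\nREG_B (0148)\t0x00000000 (0)\n"
bad_text = "REG_A (0130)\t0x70800000 (1887436800)\nREG_B (0148)\t0x00000000 (0)\n"

for name, (g, b) in diff_dumps(parse_dump(good_text), parse_dump(bad_text)).items():
    # XOR shows exactly which bits changed between the two kernels.
    print(f"{name}: good=0x{g:08x} bad=0x{b:08x} changed_bits=0x{g ^ b:08x}")
```

Dumps should be taken in the same state (for example both from within X) so that differences reflect the kernel change and not the display mode.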
Since I don't know, what's important when calling radeontool, I'll better tell you what I did: I booted the new kernel, started X (via kdm), switched to a vt and used radeontools. (In reply to comment #40) > if you boot with a bad kernel and run > radeontool regset 0x130 0x70000000 > does it stabilise any? No. I booted a 2.6.35-r3 kernel with gentoo-patches, which is known for me to possess a high freeze-frequency and did as you said: $ radeontool regset 0x130 0x70000000 OLD: 0x130 (0130) 0x40800000 (1082130432) NEW: 0x130 (0130) 0x70000000 (1879048192) After a few minutes, the freeze appeared again. (In reply to comment #43) > Since I don't know, what's important when calling radeontool, I'll better tell > you what I did: > I booted the new kernel, started X (via kdm), switched to a vt and used > radeontools. I'm sorry for not elaborating more, but I'm no expert myself :) However I don't think it is necessary to switch to a vt, it may in fact not be what is required. My understanding is that radeontool captures the current state of the graphics card (the important bits at least). This is likely different between X and a vt, however we are interested in the state the card has in X (since that is where it freezes). Could please also post the results of running radeontool from within X, just to be sure we capture all relevant information? > (In reply to comment #40) > > if you boot with a bad kernel and run > > radeontool regset 0x130 0x70000000 > > does it stabilise any? > No. > I booted a 2.6.35-r3 kernel with gentoo-patches, which is known for me to > possess a high freeze-frequency and did as you said: > > $ radeontool regset 0x130 0x70000000 > OLD: 0x130 (0130) 0x40800000 (1082130432) > NEW: 0x130 (0130) 0x70000000 (1879048192) > > After a few minutes, the freeze appeared again. 
I have the same result, using d594e46ace22afa1621254f6f669e65430048153 + 8e36113082821980c60ce89a6c5d45fc9492fc26:
# radeontool regset 0x130 0x70000000
OLD: 0x130 (0130) 0x70800000 (1887436800)
NEW: 0x130 (0130) 0x70000000 (1879048192)
And a freeze soon afterwards.

I am currently testing d594e46ace22afa1621254f6f669e65430048153 + 8e36113082821980c60ce89a6c5d45fc9492fc26 and the following patch as suggested by Dave Airlie on IRC:
---8<---------
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index c827738..d1a7803 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -477,7 +477,7 @@ void r300_mc_init(struct radeon_device *rdev)
 	default: rdev->mc.vram_width = 128; break;
 	}
 	r100_vram_init_sizes(rdev);
-	base = rdev->mc.aper_base;
+	base = 0;
 	if (rdev->flags & RADEON_IS_IGP)
 		base = (RREG32(RADEON_NB_TOM) & 0xffff) << 16;
 	radeon_vram_location(rdev, &rdev->mc, base);
--->8---------
This seems to help for me, I'm still testing but I've been running for a couple of hours already and so far haven't seen a freeze yet. Could you please test this patch also?

(In reply to comment #44)
> Could you please test this patch also?
I'm testing the 2.6.36-rc3 kernel with this patch at the moment. Looks promising for me as well. But I still need some hours to really confirm it.
If I may ask, on which IRC-channel/server are you talking?

(In reply to comment #45)
> If I may ask, on which IRC-channel/server are you talking?
There is an IRC channel #radeon on irc.freenode.net for radeon users and developers.

Created attachment 38516 [details] [review]
possible fix
Does this patch help? It always aligns the MC vram and gtt bases to size.

(In reply to comment #47)
> Created an attachment (id=38516) [details]
> possible fix
>
> Does this patch help? It always aligns the MC vram and gtt bases to size.
I'm sorry to report that it does not. I've tried with 96576a9e1a0cdb8a43d3af5846be0948f52b4460 (current drm-next in airlied's tree).
This freezes without any patches, seems stable with airlied's patch to put vmem at address 0, but freezes still with your patch.
Lukas, can you confirm that this patch still freezes?

I've also noticed (rare) random freezes with 2.6.34.x kernels. Basically, I've tried to wake the PCs from "DPMS OFF", only to find them completely unresponsive and needing a reboot instead. However, only one of those PCs has an RV350 card. The other two have rv280 and rv100 cards instead, so Dave Airlie's patch to r300.c cannot possibly help them.

(In reply to comment #49)
> I've also noticed (rare) random freezes with 2.6.34.x kernels. Basically, I've
> tried to wake the PCs from "DPMS OFF", only to find them completely
> unresponsive and needing a reboot instead. However, only one of those PCs has
> an RV350 card. The other two have rv280 and rv100 cards instead, so Dave
> Airlie's patch to r300.c cannot possibly help them.
Chris: these freezes do occur during normal operation, i.e. while working with the computer, not only during DPMS. It happens during all kinds of activities, e.g. it may happen while browsing, typing a letter, chatting, alt-tabbing, or even not doing anything. However almost always it seems to be triggered by some activity. For me, for example, starting firefox after a fresh boot has a 99% chance of causing a freeze during the 'restore tabs from last time' phase.
Although it is quite possible that the freeze will also occur during DPMS sleep, I have not experienced it yet (mostly because the freeze will occur while working, so the computer didn't get a chance to go into DPMS sleep).
So the first thing to do, to verify that you are indeed experiencing the same issue (and not an unrelated DPMS problem), would be to keep using your computer and wait for a freeze to occur during usage.
Your best bet would be to try the rv350 card, I have mostly only seen people with r300 and/or rv350 describe this problem, and both me and lukas have an rv350 card (we both have a Mobility Radeon 9600 M10). Once you have confirmed that the freeze occurs during normal working operations also, you should proceed to verify our git-bisect results and test the patches provided by Dave Airlie and Alex Deucher. Best of luck! p.s. Is the rv350 card a PC or a laptop? I noticed both lukas and I have a laptop with an rv350 card, so perhaps it has something to do with mobility editions? (In reply to comment #45) > I'm testing the 2.6.36-rc3 kernel with airlied's patch at the moment. Looks > promising for me as well. But I still need some hours to really confirm it. 1.5 days uptime and no freeze. So definitely confirmed. (In reply to comment #48) > I'm sorry to report that it does not. I've tried with > 96576a9e1a0cdb8a43d3af5846be0948f52b4460 (current drm-next in airlied's tree). > This freezes without any patches, seems stable with airlied's patch to put vmem > at address 0, but freezes still with your patch. > > Lukas, can you confirm that this patch still freezes? 2.6.36-rc3 with alex' patch up for > 3 hours and waiting.. :) random - possibly Radeon DRM KMS related - freezes https://bugzilla.kernel.org/show_bug.cgi?id=16376 which I reported seems to be a duplicate of this one. I am having those freezes on a ThinkPad T42 with shambhala:~> lspci -nn | grep -i vga 01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] [1002:4e50] As per suggestion from Alex I will now test patch from comment #47. Then I will try the patches mentioned in comment #44. Da Fox, are d594e46ace22afa1621254f6f669e65430048153 + 8e36113082821980c60ce89a6c5d45fc9492fc26 in some drm related development branch? Can I apply these with git cherry-pick as well? 
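For readers following the two candidate fixes: one sets the VRAM base to 0 instead of the aperture base, the other is described as aligning the MC VRAM and GTT bases to their size. As a rough illustration of what "aligned to size" means arithmetically (this is not the driver code, and the helper names are made up), the sketch below checks the 128M aperture bases reported by lspci earlier in this thread.

```python
MiB = 1 << 20


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def is_aligned(base, size):
    """True if base is a multiple of size (size must be a power of two)."""
    assert is_power_of_two(size), "size must be a power of two"
    return base & (size - 1) == 0


def align_up(base, size):
    """Round base up to the next multiple of size (a power of two)."""
    assert is_power_of_two(size), "size must be a power of two"
    return (base + size - 1) & ~(size - 1)


# Aperture bases from the lspci outputs quoted earlier, both with size=128M.
for base in (0xe0000000, 0xd8000000):
    print(hex(base), "aligned to 128M:", is_aligned(base, 128 * MiB))

# An arbitrary unaligned address, rounded up to a 128M boundary.
print(hex(align_up(0xc0100000, 128 * MiB)))
```

Both quoted aperture bases are already multiples of 128M. Whether that has any bearing on the align patch's results cannot be concluded from this arithmetic alone.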
(In reply to comment #52) > I am having those freezes on a ThinkPad T42 with I have the same laptop, I'm glad to someone else still using an old ThinkPad :) > Da Fox, are d594e46ace22afa1621254f6f669e65430048153 + > 8e36113082821980c60ce89a6c5d45fc9492fc26 in some drm related development > branch? Can I apply these with git cherry-pick as well? Yes, they're from airlied's tree, in the drm-next branch. I think they are in Linus' tree too, which is what Lukas Schneiderbauer uses. Created attachment 38564 [details] [review] vram align patch does not seem to work, now trying this vmembase at 0 patch Alex, your patch from comment #47 does not work. Kernel froze a few seconds after Plasma from KDE 4.4.5 build up the OpenGL compositing desktop. Now testing with the vmem-base-0 patch from Dave from comment #44. I am attaching it here, since cut and paste it from the comment gives a malformed patch. I am using 60140c143b5cd04d85fec8085d56a1430a109846 from Nigel's tuxonice-head branch, since I am now pretty sure, the freeze is unrelated to TuxOnIce and when this vmem base 0 thing works, I also have a TuxOnIce kernel without compiling another time. Its 2.6.36-rc3 and seems to contain all the other patches from comment #44 and comment #48 already. Looks very good so far. I will reboot this kernel several times tomorrow - as a freeze so far only every happened *before* the first hibernation / snapshot cycle - but I looked some Startrek Voyager without a freeze with: martin@shambhala:~/Computer/Shambhala/Kernel/2.6.36> cat /proc/version Linux version 2.6.36-rc3-tp42-toi-3.2-rc1-vmembase-0-05032-g60140c1-dirty (martin@shambhala) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #2 PREEMPT Wed Sep 8 21:36:34 CEST 2010 Thanks. (In reply to comment #48) > (In reply to comment #47) > > Created an attachment (id=38516) [details] [details] > > possible fix > > > > Does this patch help? It always aligns the MC vram and gtt bases to size. 
> > I'm sorry to report that it does not. I've tried with > 96576a9e1a0cdb8a43d3af5846be0948f52b4460 (current drm-next in airlied's tree). > This freezes without any patches, seems stable with airlied's patch to put vmem > at address 0, but freezes still with your patch. > > Lukas, can you confirm that this patch still freezes? I've tried this patch again today, this time using vanilla 2.6.36-rc3. Unfortunately it froze again upon launching firefox. (In reply to comment #56) > (In reply to comment #48) > > (In reply to comment #47) > > > Created an attachment (id=38516) [details] [details] [details] > > > possible fix > > > > > > Does this patch help? It always aligns the MC vram and gtt bases to size. > > > > I'm sorry to report that it does not. I've tried with > > 96576a9e1a0cdb8a43d3af5846be0948f52b4460 (current drm-next in airlied's tree). > > This freezes without any patches, seems stable with airlied's patch to put vmem > > at address 0, but freezes still with your patch. > > > > Lukas, can you confirm that this patch still freezes? > > I've tried this patch again today, this time using vanilla 2.6.36-rc3. > Unfortunately it froze again upon launching firefox. Hm... damn. My 2.6.36-rc3 with alex' patch didn't give me a freeze for ~ 1 day. And I'm pretty sure, that I applied the patch correctly and didn't mix up any of these patches. (did some checks ...) However, I did a reset of the whole tree, pulled the newest version and applied alex' patch again. I'm on 2.6.36-rc3-00185-gd56557a-dirty and testing.. To what I see with git log | grep -A 4 96576a9e1a0cdb8a43d3af5846be0948f52b4460 this commit titled "agp: intel-agp: do not use PCI resources before pci_enable_device()" is already in 2.6.36-rc3. 
The vmem base at zero patch that fixes or at least works around the issue is the only difference I have:

martin@shambhala:~/Computer/Shambhala/Kernel/2.6.36/tuxonice-head> git diff | egrep "^(\+|\-)"
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
-       base = rdev->mc.aper_base;
+       base = 0;

So far this kernel works fine. It locked up during userspace software suspend while initiating a snapshot, but that seems to be a different issue. TuxOnIce hibernation has already worked for two cycles.

(In reply to comment #57)
> (In reply to comment #56)
> > (In reply to comment #48)
> > > (In reply to comment #47)
> > > > Created an attachment (id=38516) [details]
> > > > possible fix
> > > >
> > > > Does this patch help? It always aligns the MC vram and gtt bases to size.
> > >
> > > I'm sorry to report that it does not. I've tried with
> > > 96576a9e1a0cdb8a43d3af5846be0948f52b4460 (current drm-next in airlied's tree).
> > > This freezes without any patches, seems stable with airlied's patch to put vmem
> > > at address 0, but freezes still with your patch.
> > >
> > > Lukas, can you confirm that this patch still freezes?
> >
> > I've tried this patch again today, this time using vanilla 2.6.36-rc3.
> > Unfortunately it froze again upon launching firefox.
>
> Hm... damn. My 2.6.36-rc3 with alex' patch didn't give me a freeze for ~ 1 day.
> And I'm pretty sure, that I applied the patch correctly and didn't mix up any
> of these patches. (did some checks ...)
> However, I did a reset of the whole tree, pulled the newest version and applied
> alex' patch again.
> I'm on 2.6.36-rc3-00185-gd56557a-dirty and testing..

You seem to have the same gfx card, but different surrounding hardware, a Fujitsu-Siemens laptop? Maybe Alex' patch works on your hardware, but does not work on Da Fox' and my ThinkPad T42?
You have:

01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] (prog-if 00 [VGA controller])
    Subsystem: Fujitsu Limited. Device 127f
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B+ DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 66 (2000ns min), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at c8000000 (32-bit, prefetchable) [size=128M]
    Region 1: I/O ports at 2000 [size=256]
    Region 2: Memory at c0100000 (32-bit, non-prefetchable) [size=64K]
    [virtual] Expansion ROM at c0120000 [disabled] [size=128K]
    Capabilities: [58] AGP version 2.0
        Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
        Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: radeon

And on my ThinkPad T42:

01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] (prog-if 00 [VGA controller])
    Subsystem: IBM Device 0550
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B+ DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 66 (2000ns min), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
    Region 1: I/O ports at 3000 [size=256]
    Region 2: Memory at c0100000 (32-bit, non-prefetchable) [size=64K]
    [virtual] Expansion ROM at c0120000 [disabled] [size=128K]
    Capabilities: [58] AGP version 2.0
        Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
        Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: radeon

Region 0 memory and I/O ports are at different addresses. Maybe that explains it? Apart from that only PMEClk looks slightly different. I don't know what all that means exactly, but maybe it's a hint?

Maybe it also comes from a difference in userspace that does or does not trigger slightly different code paths? I have Debian Squeeze/Sid/Experimental with:

martin@shambhala:~> apt-show-versions | egrep "(xserver-xorg/|xserver-xorg-core/|xserver-xorg-video-radeon/|libgl1-mesa-dri/|libdrm2/|libdrm-radeon1/|kde-window-manager/|kdelibs5/)"
kde-window-manager/squeeze uptodate 4:4.4.5-3
kdelibs5/squeeze uptodate 4:4.4.5-1
libdrm-radeon1/experimental uptodate 2.4.21-2
libdrm2/experimental uptodate 2.4.21-2
libgl1-mesa-dri/experimental uptodate 7.8.2-2
xserver-xorg/squeeze uptodate 1:7.5+6
xserver-xorg-core/squeeze uptodate 2:1.7.7-4
xserver-xorg-video-radeon/squeeze uptodate 1:6.13.1-2

Created attachment 38576 [details]
x11 components version

(In reply to comment #59)
> You seem to be the same gfx card, but different surrounding hardware, a
> Fujitsu-Siemens laptop? Maybe Alex patch works on your hardware, but does not
> work on Da Fox' and my ThinkPad T42?

Yes, possible. The 2.6.36-rc3-00185-gd56557a-dirty kernel has been up for 2 hours so far. Let's see.

I should mention that, while reviewing my xorg.conf, I discovered an artifact from the early days of the "radeon" driver on this system. It is an Option "AGPMode" "4" and was necessary to stabilize my system (freezes occurred too). I'm sure that it was no longer needed with later kernel and userspace driver versions. However, I don't know whether this influenced the kernel behaviour during the patch tests, but I will remove this line and see if something changes.

My x11-package versions are attached.

Created attachment 38577 [details]
emerge --info
.. and additional info about my system.
(In reply to comment #60)
> However, I don't know, if this influenced the kernel behaviours during the
> patch tests, but I will remove this line and see, if something changes.

I removed this line and experienced a sudden freeze after restarting X and starting firefox.
Could you please add this option to your xorg.conf and test this case?

(In reply to comment #62)
> (In reply to comment #60)
> > However, I don't know, if this influenced the kernel behaviours during the
> > patch tests, but I will remove this line and see, if something changes.
>
> I removed this line and experienced a sudden freeze after X-restart and
> firefox-start.
> Could you please add this option to your xorg.conf and see and test this case?

I already have this AGP 4x line in my xorg.conf, too, and have had it for ages. Not to stabilize something, but AFAIR because otherwise the driver would only use AGP 2x or even 1x. One can see that in the X.org logs AFAIR.

2.6.36-rc3-00185-gd56557a-dirty (latest git with alex' patch) froze as well for me (even with the AGP 4x option in xorg.conf).
I will fall back to 2.6.36-rc3 and test again. Maybe I'll get a freeze this time. Then we would be in the same state again.

The AGPMode xorg option isn't used with kms (the AGP mode is set before X starts when the drm loads). To force a particular AGP mode with kms, use the agpmode module parameter: radeon.agpmode=x where x=-1,1,2,4,8. -1 disables AGP and uses the on-chip gart mechanism instead.

(In reply to comment #50)
> So the first thing to do would be to verify that you indeed are experiencing
> the same issue (and not an unrelated DPMS problem) is to keep using your
> computer and wait for a freeze to occur during usage.

Interesting, because this RV350 machine is my everyday desktop PC. It gets a lot of regular usage, and also a lot of intense CPU activity. And the DPMS-related freezes are the only ones I have seen.
Here are the PCI bus details: 01:00.0 VGA compatible controller: ATI Technologies Inc RV350 AS [Radeon 9550] (prog-if 00 [VGA controller]) Subsystem: C.P. Technology Co. Ltd Device 2084 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64 (2000ns min), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at e0000000 (32-bit, prefetchable) [size=256M] Region 1: I/O ports at ec00 [size=256] Region 2: Memory at ff8f0000 (32-bit, non-prefetchable) [size=64K] Expansion ROM at ff800000 [disabled] [size=128K] Capabilities: [58] AGP version 3.0 Status: RQ=256 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8 Command: RQ=32 ArqSz=2 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x8 Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: radeon Kernel modules: radeon 01:00.1 Display controller: ATI Technologies Inc RV350 AS [Radeon 9550] (Secondary) Subsystem: C.P. Technology Co. Ltd Device 2085 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64 (2000ns min), Cache Line Size: 64 bytes Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M] Region 1: Memory at ff8e0000 (32-bit, non-prefetchable) [size=64K] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- (In reply to comment #64) > I will fall back to 2.6.36-rc3 and test again. Maybe I'll get a freeze this > time. Then we would be on the same state again. And it froze... 
:)

I have two questions:

1) Now that we have established that the vmembase at zero patch fixes or works around the problem, while the patch to align vram from comment #47 does not, and now that I have bisected the range of commits down to about 10 and, as far as I understand, Da Fox and Lukas have even bisected down to exactly one commit: What next? Is the vmembase at zero patch the proper fix? Actually to me it seems more like a work-around. Is there another fix you propose? I would love to see a fix in time for 2.6.36, although I still have to figure out how to get a kernel after 2.6.33 that does either userspace software suspend or TuxOnIce stably on my ThinkPad T42 (see bug #18162 regarding userspace software suspend and the tuxonice-devel mailing list for TuxOnIce related stuff).

2) Re Comment #65:

"The AGPMode xorg option isn't used with kms (the AGP mode is set before X starts when the drm loads). To force a particular AGP mode with kms, use the agpmode module parameter: radeon.agpmode=x where x=-1,1,2,4,8. -1 disables AGP and uses the on-chip gart mechanism instead."

Is it necessary? How do I find out which AGP mode is used? I'd prefer it to use the best AGP mode (that should be 4x on my ThinkPad T42) automatically.

(In reply to comment #68)
> I have two questions:
>
> 1) Now since we established that the vmembase at zero patch fixes or works
> around the problem - while the patch to align vram from comment #47 does not,
> and now that I bisected the range of commits down to about 10 and as far as I
> understand Da Fox and Lukas even bisected down to exact one commit: What next?
> Is the vmembase at zero patch the proper fix? Actually to me it seems more like
> a work-around. Is there another fix you propose? I would love to see a fix in
> time for 2.6.36, although I still have to figure out on how to get a kernel
> after 2.6.33 that does either userspace software suspend or TuxOnIce stably on
> my ThinkPad T42 (see bug #18162 regarding userspace software suspend and
> tuxonice-devel mailing list for TuxOnIce related stuff).
>

We are currently testing a variation on this patch as suggested by Dave Airlie on IRC. It involves trying to put vram at memory addresses other than 0, but with some restrictions on alignment and overlap with the GTT. Interesting values to test would be 0x10000000, 0x18000000 and 0xf0000000, provided that they don't cause overlap with the GTT area.

You can see where your GTT area lives by looking at dmesg after boot:
---8<---------
$ dmesg | grep GTT
radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF
[drm] radeon: 256M of GTT memory ready.
--->8---------
This shows that gtt_start=0xD0000000 and gtt_end=0xDFFFFFFF. You should make sure that either 'base + "size of your vram" <= gtt_start' or that 'gtt_end < base', where base is one of 0x10000000, 0x18000000 or 0xf0000000.

I have tested placing vram at 0x10000000, which worked for me for two days without a freeze. I am currently testing vram at 0xf0000000, which so far has not caused a freeze either. Please post your results here too.

> 2) Re Comment #65:
>
> "The AGPMode xorg option isn't used with kms (the AGP mode is set before X
> starts when the drm loads). To force a particular AGP mode with kms, use the
> agpmode module parameter: radeon.agpmode=x where x=-1,1,2,4,8. -1 disables AGP
> and uses the on-chip gart mechanism instead."
>
> Is it necessary? How do I find out which AGP mode is used. I'd prefer when it
> used best AGP mode (that should be 4x on my ThinkPad T42) automatically.
Again you can get this info by looking at your dmesg output:
---8<---------
$ dmesg | grep -i AGP
Linux agpgart interface v0.103
agpgart-intel 0000:00:00.0: Intel 855PM Chipset
agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000
[drm] AGP mode requested: 4
agpgart-intel 0000:00:00.0: AGP 2.0 bridge
agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode
radeon 0000:01:00.0: putting AGP V2 device into 4x mode
--->8---------
So currently I am running with AGP mode 4x. As for whether it is necessary, I don't know, but I can't imagine it really making a difference.

Ok here's an update:
I've now tested putting the vram at the following locations (in order):
 - 0x00000000: This is the same location as vram used to be at before the identified bad commit and works.
 - 0x10000000: This is before GTT (which starts at 0xd0000000), with some room to spare. This works without freezing, tested for two days.
 - 0xf0000000: This is after GTT (which ends at 0xdfffffff), with some room to spare. This works, tested for two days.
 - 0xcc000000: This is directly in front of GTT, with no room to spare. This works, tested for several days.
 - 0xe0000000: This is directly behind GTT, with no room to spare. This is where vram is placed starting with the identified commit, and as expected it froze within minutes.

(In reply to comment #70)
> Ok here's an update:
> I've now tested putting the vram at the following locations (in order):
> - 0x00000000: This is the same location as vram used to be at before the
> identified bad commit and works.
> - 0x10000000: This is before GTT (which starts at 0xd0000000), with some
> room to spare. This works without freezing, tested for two days.
> - 0xf0000000: This is after GTT (which ends at 0xdfffffff), with some room to
> spare. This works, tested for two days.
> - 0xcc000000: This is directly in front of GTT, with no room to spare. This
> works, tested for several days.
> - 0xe0000000: This is directly behind GTT, with no room to spare. This is where
> vram is placed starting with the identified commit, and as
> expected it froze within minutes.

Da Fox, I seem to have the same setup, which is not surprising if you also have a ThinkPad T42:

shambhala:~> dmesg | grep GTT
radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF
radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF

Thus I do not think I need to test the same values again. Are there some other values I should test? Maybe we can share this work.

Thanks for your hints regarding AGP. I think it might make sense to use that agp mode option, because:

shambhala:~> lspci | grep AGP
00:01.0 PCI bridge: Intel Corporation 82855PM Processor to AGP Controller (rev 03)
shambhala:~> dmesg | grep -i AGP
[drm] AGP mode requested: 1
agpgart-intel 0000:00:00.0: AGP 2.0 bridge
agpgart-intel 0000:00:00.0: putting AGP V2 device into 1x mode
radeon 0000:01:00.0: putting AGP V2 device into 1x mode
[drm] AGP mode requested: 1
agpgart-intel 0000:00:00.0: AGP 2.0 bridge
agpgart-intel 0000:00:00.0: putting AGP V2 device into 1x mode
radeon 0000:01:00.0: putting AGP V2 device into 1x mode

Did you set the agpmode module parameter for radeon or are you getting 4x set up automatically? If the latter, I wonder why I get 1x automatically.

(In reply to comment #71)
> [drm] AGP mode requested: 1
[...]
> [...] I wonder why I get 1x automatically.

The line above means you have (the equivalent of) radeon.agpmode=1 somewhere, either on the kernel command line or maybe in /etc/modprobe.d/ . Without an explicit option, the default is determined by the BIOS and can sometimes be tweaked in the BIOS setup.

(In reply to comment #71)
> Did you set the agpmode module parameter for radeon or are you getting 4x setup
> automatically? If the later I wonder why I get 1x automatically.

I get AGP 1x as default as well. I changed it with the kernel parameter to 4x and it seems to work as well as the old setting.
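For anyone who wants to check candidate vram bases before rebuilding a kernel, the placement rule given earlier (either 'base + size of vram <= gtt_start' or 'gtt_end < base') can be tried out in a few lines of Python. This is only a sketch under assumptions: the helper names are made up for illustration, the GTT range and the 64M vram size are the ThinkPad T42 values quoted in this thread, and none of it is taken from the radeon driver.

```python
# Illustrative only: helper names are hypothetical, not radeon driver code.
GTT_START = 0xD0000000          # from "GTT: 256M 0xD0000000 - 0xDFFFFFFF"
GTT_END = 0xDFFFFFFF
VRAM_SIZE = 64 * 1024 * 1024    # 64M board, as reported in this thread


def overlaps_gtt(base, size=VRAM_SIZE, gtt_start=GTT_START, gtt_end=GTT_END):
    """True if [base, base + size) intersects the GTT range."""
    return not (base + size <= gtt_start or gtt_end < base)


def adjacent_to_gtt(base, size=VRAM_SIZE, gtt_start=GTT_START, gtt_end=GTT_END):
    """True if vram ends right where GTT starts, or starts right after GTT ends."""
    return base + size == gtt_start or base == gtt_end + 1


def classify(base):
    """Label a candidate vram base relative to the GTT range."""
    if overlaps_gtt(base):
        return "overlaps GTT"
    if adjacent_to_gtt(base):
        return "adjacent to GTT"
    return "clear of GTT"


if __name__ == "__main__":
    # The five locations tested in this thread.
    for base in (0x00000000, 0x10000000, 0xF0000000, 0xCC000000, 0xE0000000):
        print(f"0x{base:08X}: {classify(base)}")
```

All five tested values pass the overlap rule. Both 0xcc000000 and 0xe0000000 come out as adjacent to GTT, yet only 0xe0000000 froze in the tests above, so adjacency on its own does not seem to explain the freeze.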
(In reply to comment #72) > (In reply to comment #71) > > [drm] AGP mode requested: 1 > > [...] > > > [...] I wonder why I get 1x automatically. > > The line above means you have (the equivalent of) radeon.agpmode=1 somewhere, > either on the kernel command line or maybe in /etc/modprobe.d/ . Without an > explicit option, the default is determined by the BIOS and can sometimes be > tweaked in the BIOS setup. It does not seem so: shambhala:~> grep -r agpmode /etc shambhala:~#1> grep -r agpmode /boot shambhala:~#1> [Friday 17 September 2010] [09:50:58] <vootey> how comes, that my kms-setup with M10 (RV350) gpu defaults to agp 1x mode? [Friday 17 September 2010] [09:51:38] <airlied> vootey: because we have a quirk table and I guess is was unstable for someone in 4x [Friday 17 September 2010] [09:52:00] <MrCooper> or due to the BIOS setup (In reply to comment #75) > [Friday 17 September 2010] [09:50:58] <vootey> how comes, that my kms-setup > with M10 (RV350) gpu defaults to agp 1x mode? > [Friday 17 September 2010] [09:51:38] <airlied> vootey: because we have a quirk > table and I guess is was unstable for someone in 4x > [Friday 17 September 2010] [09:52:00] <MrCooper> or due to the BIOS setup Thanks. Set it to use 4x AGP manually, will see whether its stable on my ThinkPad T42. If was back when the Xorg option still worked. (In reply to comment #73) > (In reply to comment #71) > > Did you set the agpmode module parameter for radeon or are you getting 4x setup > > automatically? If the later I wonder why I get 1x automatically. > I get AGP 1x as default as well. I changed it with the kernel parameter to 4x > and it seems to work as good as the old setting. I also set radeon.agpmode=4 on the kernel commandline in grub.conf. I don't know what the default is, I could test it if it's important. 
But agpmode=4 has works with the older kernels, so I don't think that is the issue (In reply to comment #74) > (In reply to comment #72) > > (In reply to comment #71) > > > [drm] AGP mode requested: 1 > > > > [...] > > > > > [...] I wonder why I get 1x automatically. > > > > The line above means you have (the equivalent of) radeon.agpmode=1 somewhere, > > either on the kernel command line or maybe in /etc/modprobe.d/ . Without an > > explicit option, the default is determined by the BIOS and can sometimes be > > tweaked in the BIOS setup. > > It does not seem so: > > shambhala:~> grep -r agpmode /etc > shambhala:~#1> grep -r agpmode /boot > shambhala:~#1> That is odd, grep -r on /boot should match at least System.map: ~ # grep -r agpmode /boot/ /boot/System.map:c15a506d r __param_str_agpmode /boot/System.map:c1707b90 r __param_agpmode /boot/System.map:c173fcc0 d radeon_agpmode_quirk_list /boot/System.map:c17ec984 B radeon_agpmode /boot/grub/grub.conf:kernel (hd0,5)/boot/vmlinuz ro root=/dev/sda6 quiet splash=silent,theme:gerabellum CONSOLE=/dev/tty1 resume2=file:/dev/sda6:0x103130 radeon.agpmode=4 drm_kms_helper.poll=0 grep: warning: /boot/boot: recursive directory loop (In reply to comment #77) > I also set radeon.agpmode=4 on the kernel commandline in grub.conf. I don't > know what the default is, I could test it if it's important. But agpmode=4 has > works with the older kernels, so I don't think that is the issue So I've tested with radeon.agpmode=-1 yesterday and today. I've performed three tests, and each time the freeze happened. The first time the freeze happened soon after booting, while starting firefox (although I was also running the 'antspotlight' screensaver in a window. The other two times the freeze took a bit longer to manifest (45minutes to an hour). The third freeze occured while rebuilding my kernel to re-include the 'vram at zero' patch. So the freezing issue exists even in PCI mode. 
Tested with kernel 26bf62e47261142d528a6109fdd671a2e280b4ea - Merge branch 'drm-radeon-next' of ../drm-radeon-next into drm-core-next , with additional patch to print vram and gtt locations. dmesg | grep -iE 'radeon|agp' contains the following (lines starting with 'RADEON:' mine): ---8<--------- ... gpgart-intel 0000:00:00.0: Intel 855PM Chipset agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000 [drm] radeon defaulting to kernel modesetting. [drm] radeon kernel modesetting enabled. radeon 0000:01:00.0: power state changed by ACPI to D0 radeon 0000:01:00.0: power state changed by ACPI to D0 radeon 0000:01:00.0: PCI INT A -> Link[LNKA] -> GSI 11 (level, low) -> IRQ 11 [drm] Forcing AGP to PCI mode RADEON: base at e0000000, rdev->gtt_start at 0, base would have been at e0000000 RADEON: vram sizes: rdev->mc.mc_vram_size=4000000, rdev->mc.real_vram_size=4000000 rdev->mc.visible_vram_size=8000000 radeon 0000:01:00.0: VRAM: 64M 0xE0000000 - 0xE3FFFFFF (64M used) radeon 0000:01:00.0: GTT: 512M 0xC0000000 - 0xDFFFFFFF [drm] radeon: irq initialized. [drm] radeon: 64M of VRAM memory ready [drm] radeon: 512M of GTT memory ready. [drm] radeon: 1 quad pipes, 1 Z pipes initialized. radeon 0000:01:00.0: WB enabled [drm] radeon: ring at 0x00000000C0001000 ... --->8--------- This shows that VRAM is placed directly after GTT even in PCI mode. Can anyone please confirm these results? Created attachment 39455 [details] [review] Make sure gtt and vram are not directly adjacent Does this patch fix the issue? (In reply to comment #79) > Created an attachment (id=39455) [details] > Make sure gtt and vram are not directly adjacent > > Does this patch fix the issue? It should, since it simply puts vram at 0 when it is detected that gtt and vram are adjacent. dmesg says: ---8<--------- radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF radeon 0000:01:00.0: VRAM: 64M 0x00000000 - 0x03FFFFFF (64M used) --->8--------- so it has moved vram to 0. 
I'll test it just the same though :) One thing I am wondering is if it is possible to get vram/gtt overlapping with this, since this patch doesn't seem to perform any further checks to ensure this when moving either vram or gtt to 0? Presumable this already handled elsewhere? (In reply to comment #79) > Created an attachment (id=39455) [details] > Make sure gtt and vram are not directly adjacent > > Does this patch fix the issue? Testing your patch in: martin@shambhala:~> cat /proc/version Linux version 2.6.36-rc8-tp42-gtt-vram-not-adjacent-00020-g2d01971-dirty (martin@shambhala) (gcc version 4.4.5 (Debian 4.4.5-2) ) #1 PREEMPT Sun Oct 17 13:48:48 CEST 2010 I get vram aligned to zero as well - I also have a ThinkPad T42 like Da Fox - so everything should work: martin@shambhala:~/Computer/Shambhala/Kernel/2.6.36> dmesg | grep -i radeon [drm] radeon kernel modesetting enabled. radeon 0000:01:00.0: power state changed by ACPI to D0 radeon 0000:01:00.0: power state changed by ACPI to D0 radeon 0000:01:00.0: PCI INT A -> Link[LNKA] -> GSI 11 (level, low) -> IRQ 11 radeon 0000:01:00.0: putting AGP V2 device into 4x mode radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF radeon 0000:01:00.0: VRAM: 64M 0x00000000 - 0x03FFFFFF (64M used) [drm] radeon: irq initialized. [drm] radeon: 64M of VRAM memory ready [drm] radeon: 256M of GTT memory ready. [drm] radeon: 1 quad pipes, 1 Z pipes initialized. [drm] radeon: ring at 0x00000000D0000000 [drm] radeon: ib pool ready. [drm] Radeon Display Connectors [drm] radeon: power management initialized fb0: radeondrmfb frame buffer device [drm] Initialized radeon 2.6.0 20080528 for 0000:01:00.0 on minor 0 I will report back after longer term testing. 
I am not using radeon.agpmode=-1 but martin@shambhala:~> cat /etc/modprobe.d/radeon-kms.conf options radeon modeset=1 agpmode=4 cause it ran stable for me for martin@shambhala:~> uprecords -m 20 | grep "2\.6\.35" 16 12 days, 15:20:19 | Linux 2.6.35.5-tp42-vmem Mon Oct 4 23:44:41 2010 Created attachment 39595 [details] [review] updated patch Updated patch to make sure gtt and vram don't overlap if vram is at 0. Now testing 2.6.36 + your v2 patch. Three reboots all is well so far - I do not expect any surprises, since mem mapping seems to be the same: martin@shambhala:~> dmesg | grep "radeon" | egrep -i "(gtt|vram)" radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF radeon 0000:01:00.0: VRAM: 64M 0x00000000 - 0x03FFFFFF (64M used) [drm] radeon: 64M of VRAM memory ready [drm] radeon: 256M of GTT memory ready. I also tried to setup Radeon KMS DRM on my Dell Dimension 5100 at work, but I found quite some issues - ttys are blank after KMS switch, machine locks with a backtrace on enabling XRANDR for 1680x1050 + 1280x1024 or so, seemed to be a memory issue. 32-Bit, almost 4GB of RAM (no PAE). I hope to be able to try on this workstation in November again and to report some bugs. (In reply to comment #44) > I am currently testing d594e46ace22afa1621254f6f669e65430048153 + > 8e36113082821980c60ce89a6c5d45fc9492fc26 and the following patch as suggested > by Dave Airlie on IRC: > > ---8<--------- > diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c > index c827738..d1a7803 100644 > --- a/drivers/gpu/drm/radeon/r300.c > +++ b/drivers/gpu/drm/radeon/r300.c > @@ -477,7 +477,7 @@ void r300_mc_init(struct radeon_device *rdev) > default: rdev->mc.vram_width = 128; break; > } > r100_vram_init_sizes(rdev); > - base = rdev->mc.aper_base; > + base = 0; > if (rdev->flags & RADEON_IS_IGP) > base = (RREG32(RADEON_NB_TOM) & 0xffff) << 16; > radeon_vram_location(rdev, &rdev->mc, base); I have the same problem since 2.6.34. 
The above hack fixes it for me (tested with 2.6.35.7 and a few earlier 2.6.35 releases, in frequent use for more than a month now), while the patch from comment #82 does not help in my case (last freeze was while using the search bar of firefox). Video chip is a: 01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] (prog-if 00 [VGA controller]) Subsystem: ASUSTeK Computer Inc. Device 1772 Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 11 Memory at d0000000 (32-bit, prefetchable) [size=128M] I/O ports at d800 [size=256] Memory at ff8f0000 (32-bit, non-prefetchable) [size=64K] Expansion ROM at ff8c0000 [disabled] [size=128K] Capabilities: [58] AGP version 2.0 Capabilities: [50] Power Management version 2 Kernel driver in use: radeon With the updated patch I get these messages: Oct 21 16:12:28 [kernel] [drm] initializing kernel modesetting (RV350 0x1002:0x4E50). Oct 21 16:12:28 [kernel] [drm] register mmio base: 0xFF8F0000 Oct 21 16:12:28 [kernel] [drm] register mmio size: 65536 Oct 21 16:12:28 [kernel] agpgart-intel 0000:00:00.0: AGP 2.0 bridge Oct 21 16:12:28 [kernel] agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode Oct 21 16:12:28 [kernel] radeon 0000:01:00.0: putting AGP V2 device into 4x mode Oct 21 16:12:28 [kernel] radeon 0000:01:00.0: GTT: 256M 0xE0000000 - 0xEFFFFFFF Oct 21 16:12:28 [kernel] [drm] Generation 2 PCI interface, using max accessible memory Oct 21 16:12:28 [kernel] radeon 0000:01:00.0: VRAM: 64M 0xD0000000 - 0xD3FFFFFF (64M used) Oct 21 16:12:28 [kernel] [drm] radeon: irq initialized. Oct 21 16:12:28 [kernel] [drm] Detected VRAM RAM=64M, BAR=128M Oct 21 16:12:28 [kernel] [drm] RAM width 128bits DDR Oct 21 16:12:28 [kernel] [TTM] Zone kernel: Available graphics memory: 442550 kiB. Oct 21 16:12:28 [kernel] [TTM] Zone highmem: Available graphics memory: 1036090 kiB. Oct 21 16:12:28 [kernel] [TTM] Initializing pool allocator. 
Oct 21 16:12:28 [kernel] [drm] radeon: 64M of VRAM memory ready Oct 21 16:12:28 [kernel] [drm] radeon: 256M of GTT memory ready. *SNIP* While with the hack from comment #44 it looks like this: Oct 22 00:28:16 [kernel] [drm] radeon kernel modesetting enabled. Oct 22 00:28:16 [kernel] radeon 0000:01:00.0: PCI INT A -> Link[LNKA] -> GSI 11 (level, low) -> IRQ 11 Oct 22 00:28:16 [kernel] [drm] initializing kernel modesetting (RV350 0x1002:0x4E50). Oct 22 00:28:16 [kernel] [drm] register mmio base: 0xFF8F0000 Oct 22 00:28:16 [kernel] [drm] register mmio size: 65536 Oct 22 00:28:16 [kernel] agpgart-intel 0000:00:00.0: AGP 2.0 bridge Oct 22 00:28:16 [kernel] agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode Oct 22 00:28:16 [kernel] radeon 0000:01:00.0: putting AGP V2 device into 4x mode Oct 22 00:28:16 [kernel] radeon 0000:01:00.0: GTT: 256M 0xE0000000 - 0xEFFFFFFF Oct 22 00:28:16 [kernel] [drm] Generation 2 PCI interface, using max accessible memory Oct 22 00:28:16 [kernel] radeon 0000:01:00.0: VRAM: 64M 0x00000000 - 0x03FFFFFF (64M used) Oct 22 00:28:16 [kernel] [drm] radeon: irq initialized. Oct 22 00:28:16 [kernel] [drm] Detected VRAM RAM=64M, BAR=128M Oct 22 00:28:16 [kernel] [drm] RAM width 128bits DDR Oct 22 00:28:16 [kernel] [TTM] Zone kernel: Available graphics memory: 442550 kiB. Oct 22 00:28:16 [kernel] [TTM] Zone highmem: Available graphics memory: 1036090 kiB. Oct 22 00:28:16 [kernel] [TTM] Initializing pool allocator. Oct 22 00:28:16 [kernel] [drm] radeon: 64M of VRAM memory ready Oct 22 00:28:16 [kernel] [drm] radeon: 256M of GTT memory ready. Created attachment 39651 [details] [review] Make sure MC vram map is >= pci aperture size Ok, I think I found the root cause in this bug. The vram map in the memory controller needs to be >= the pci aperture size. For the systems here, the vram size is 64 MB and the aperture size is 128 MB so the vram map in the mc needs to be at least 128 MB. However, it's getting set to 64 MB. 
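To make the rule above concrete, here is a small Python sketch of the sizing described in that comment: the vram map in the memory controller must be at least as large as the pci aperture. It is not the attached patch; the function names are hypothetical, and the 64 MB / 128 MB figures and the 0xE0000000 base are simply the values reported in this thread.

```python
# Sketch of the stated rule only; names are hypothetical, not driver code.
MB = 1024 * 1024


def mc_vram_map_size(real_vram_size, aper_size):
    """Size of the MC vram map: never smaller than the PCI aperture."""
    return max(real_vram_size, aper_size)


def vram_map_range(base, real_vram_size, aper_size):
    """Inclusive (start, end) of the MC vram map placed at base."""
    size = mc_vram_map_size(real_vram_size, aper_size)
    return base, base + size - 1


if __name__ == "__main__":
    # 64 MB of vram behind a 128 MB aperture, as on the laptops in this bug.
    start, end = vram_map_range(0xE0000000, 64 * MB, 128 * MB)
    print(f"MC vram map: 0x{start:08X} - 0x{end:08X}")
    # prints "MC vram map: 0xE0000000 - 0xE7FFFFFF"
```

With a 64 MB map the range would end at 0xE3FFFFFF, which matches the "VRAM: 64M 0xE0000000 - 0xE3FFFFFF" line seen on the freezing kernels.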
(In reply to comment #85)
> Created an attachment (id=39651) [details]
> Make sure MC vram map is >= pci aperture size
>
> Ok, I think I found the root cause in this bug. The vram map in the memory
> controller needs to be >= the pci aperture size. For the systems here, the
> vram size is 64 MB and the aperture size is 128 MB so the vram map in the mc
> needs to be at least 128 MB. However, it's getting set to 64 MB.

The patch fixes this long-standing issue for me. The system is rock-solid with 4x AGP, and all sorts of firefox and 3d abuse didn't freeze it.
Thank you!

(In reply to comment #85)
> Created an attachment (id=39651) [details]
> Make sure MC vram map is >= pci aperture size
>
> Ok, I think I found the root cause in this bug. The vram map in the memory
> controller needs to be >= the pci aperture size. For the systems here, the
> vram size is 64 MB and the aperture size is 128 MB so the vram map in the mc
> needs to be at least 128 MB. However, it's getting set to 64 MB.

I've tested the patch from Comment 79 for over a week now, without any issues (as expected). The patch from Comment 82 is quite similar so I assume that would work too. I'm going to test this patch now on a fresh 2.6.36.

This bug affects my laptop as well. It has the rv350 with 64MB of RAM. I have been testing the patch in Comment 85 on the Ubuntu 2.6.36 Natty kernel. It seems to have fixed this freezing problem. I have tried the patches in Comment 79 and Comment 82 but the system still locked up.

On my system the lock-ups occurred quite fast (within minutes after boot) while doing regular stuff like web browsing or having gedit and a terminal open. When I use an unpatched kernel the system freeze happens every time within minutes.

I have been using this laptop with a patched kernel for the past few days and it hasn't frozen once. I've rebooted into an unpatched kernel every once in a while and sure enough it freezes shortly after booting up.
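Since most reports in this bug come down to comparing the "GTT:" and "VRAM:" lines that radeon prints at boot, a small Python sketch for pulling those ranges out of saved dmesg output may help other testers. The regular expression is an assumption based only on the log lines pasted in this thread, and the function names are hypothetical.

```python
import re

# Pattern inferred from the dmesg lines pasted in this bug; other kernel
# versions may format these lines differently.
RANGE_RE = re.compile(
    r"radeon \S+: (VRAM|GTT):\s+(\d+)M\s+0x([0-9A-Fa-f]+)\s+-\s+0x([0-9A-Fa-f]+)"
)


def parse_ranges(dmesg_text):
    """Return {"VRAM": (size_mb, start, end), "GTT": (...)} from dmesg text."""
    ranges = {}
    for kind, size_mb, start, end in RANGE_RE.findall(dmesg_text):
        ranges[kind] = (int(size_mb), int(start, 16), int(end, 16))
    return ranges


def directly_adjacent(ranges):
    """True if VRAM starts right after GTT ends, or ends right before it starts."""
    _, v_start, v_end = ranges["VRAM"]
    _, g_start, g_end = ranges["GTT"]
    return v_end + 1 == g_start or g_end + 1 == v_start


if __name__ == "__main__":
    sample = (
        "radeon 0000:01:00.0: VRAM: 64M 0xE0000000 - 0xE3FFFFFF (64M used)\n"
        "radeon 0000:01:00.0: GTT: 512M 0xC0000000 - 0xDFFFFFFF\n"
    )
    layout = parse_ranges(sample)
    print(layout, "adjacent:", directly_adjacent(layout))
```

Feeding it the output of 'dmesg | grep -i radeon' from a patched and an unpatched kernel gives a quick side-by-side of where vram ended up in each case.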
(In reply to comment #85)
> Created an attachment (id=39651) [details]
> Make sure MC vram map is >= pci aperture size
>
> Ok, I think I found the root cause in this bug. The vram map in the memory
> controller needs to be >= the pci aperture size. For the systems here, the
> vram size is 64 MB and the aperture size is 128 MB so the vram map in the mc
> needs to be at least 128 MB. However, it's getting set to 64 MB.

Could you please tell me which git archive this patch has gone into.
It doesn't seem to be the official linux-next: next-20101026

Many thanks for a hint (I do need to get a patchset against stock 2.6.36)

Helmut.

(In reply to comment #89)
> Could you please tell me which git archive this patch has gone into.
> It doesn't seem to be the official linux-next: next-20101026
>
> Many thanks for a hint (I do need to get a patchset against stock 2.6.36)

The patch is available here on this bug and Dave pulled it into the drm-next branch of his tree:
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=b7d8cce5b558e0c0aa6898c9865356481598b46d

It's not in Linus tree yet however.

(In reply to comment #90)
> (In reply to comment #89)
> > Could you please tell me which git archive this patch has gone into.
> > It doesn't seem to be the official linux-next: next-20101026
> >
> > Many thanks for a hint (I do need to get a patchset against stock 2.6.36)
>
> The patch is available here on this bug and Dave pulled it into the drm-next
> branch of his tree:
> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=b7d8cce5b558e0c0aa6898c9865356481598b46d
>
> It's not in Linus tree yet however.

Many thanks, Helmut.

(In reply to comment #90)
> (In reply to comment #89)
> The patch is available here on this bug and Dave pulled it into the drm-next
> branch of his tree:
> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=b7d8cce5b558e0c0aa6898c9865356481598b46d
>
> It's not in Linus tree yet however.
I'm using the drm-next kernel available here: http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-next/2010-10-26-maverick/
It seems to be working, no problems so far. Kernels from 2.6.33 on were always hanging. I'm using an asus a4500g with mobility radeon 9600 m10:

01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Device 1942
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 64 (2000ns min), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 19
    Region 0: Memory at d0000000 (32-bit, prefetchable) [size=128M]
    Region 1: I/O ports at c800 [size=256]
    Region 2: Memory at dfef0000 (32-bit, non-prefetchable) [size=64K]
    Expansion ROM at dfec0000 [disabled] [size=128K]
    Capabilities: [58] AGP version 3.0
        Status: RQ=256 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8
        Command: RQ=32 ArqSz=2 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x8
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: radeon
    Kernel modules: radeon, radeonfb

I hope the patch will be merged into Linus' tree as soon as possible. Thanks for your work.
Silvano

(In reply to comment #87)
> (In reply to comment #85)
> > Created an attachment (id=39651) [details]
> > Make sure MC vram map is >= pci aperture size
> >
> > Ok, I think I found the root cause in this bug. The vram map in the memory
> > controller needs to be >= the pci aperture size. For the systems here, the
> > vram size is 64 MB and the aperture size is 128 MB so the vram map in the mc
> > needs to be at least 128 MB. However, it's getting set to 64 MB.
>
> I've tested the patch from Comment 79 for over a week now, without any issues
> (as expected). The patch from Comment 82 is quite similar so I assume that
> would work too. I'm going to test this patch now on a fresh 2.6.36.

Ok, I've tested the patch from Comment #85 for the better part of a week now, and I haven't experienced a single freeze yet. VRAM is still placed directly after GTT:

---8<---------
Oct 29 08:47:38 localhost kernel: radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF
Oct 29 08:47:38 localhost kernel: radeon 0000:01:00.0: VRAM: 128M 0xE0000000 - 0xE7FFFFFF (64M used)
Oct 29 08:47:38 localhost kernel: [drm] Detected VRAM RAM=128M, BAR=128M
Oct 29 08:47:38 localhost kernel: [drm] radeon: 64M of VRAM memory ready
Oct 29 08:47:38 localhost kernel: [drm] radeon: 256M of GTT memory ready.
Oct 29 08:47:38 localhost kernel: [drm] vram apper at 0xE0000000
Oct 29 10:42:30 localhost kernel: radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF
--->8---------

And the VRAM sizes are still a bit confusing (it's listed as 128M with 64M 'used'?)

So I can confirm that this patch indeed fixes the issue. Job well done!

(In reply to comment #93)
> And the VRAM sizes are still a bit confusing (it's listed as 128M with 64M
> 'used'?)

The vram aperture size in the memory controller has to match or exceed the pci vram aperture. The pci aperture size is 128 MB, so the MC aperture has to be >= 128 MB. However, of that 128 MB aperture, only 64 MB is actually usable.

Fix is upstream:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b7d8cce5b558e0c0aa6898c9865356481598b46d

Thanks to all of you, guys! I'm looking forward to seeing the next (patched) kernel release!

This fix has been released in kernel 2.6.37-rc1, and queued for stable.

Just for the record: The patch in Comment #85 fixes it for me as well. Firefox, compositing (KWin 4.4), Extreme-Tuxracer - everything's just rock solid now. Thanks for your amazing work!
This is what I call great support for Linux!

*** Bug 32107 has been marked as a duplicate of this bug. ***

*** Bug 27525 has been marked as a duplicate of this bug. ***