Summary: | hard lock on startx with AGPFastWrite on radeon mobility 9600 (aka M10) | |
---|---|---|---
Product: | DRI | Reporter: | Simon <Simon80>
Component: | General | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED WONTFIX | QA Contact: |
Severity: | major | |
Priority: | high | CC: | sndirsch
Version: | XOrg git | |
Hardware: | x86 (IA32) | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | | i915 features: |

Description
Simon
2006-10-29 12:52:01 UTC
Created attachment 7572
output of startx when duplicating bug

I took this from another machine with an ssh client, the text ends at the point where the computer starting X stops responding.

Created attachment 7573
xorg.conf

Here's my pretty xorg.conf, though on looking at it, I realize that a large portion of it is irrelevant cruft, hehe..

AGPFastWrite is known to be bogus; moreover, I don't think you can expect a big performance boost from it. If you want to debug this further, you might want to search the dri archive or the web for agp fast write. Otherwise I will likely close the bug.

Oh, and you might want to enable debugging in radeon module and thus get a better view of what's going on (look at your kernel log not xorg log).

the canonical answer to this is 'yeah, don't do that'. various hardware bugs and combinations make it pretty close to impossible to get right.

(In reply to comment #3)
> Oh, and you might want to enable debugging in radeon module
> and thus get a better view of what's going on (look at your
> kernel log not xorg log).

Can you elaborate? I was aware that fast write usage is plain discouraged, but I do intend to try and debug this if possible, I just don't know how to get more detailed info out.

Fast write basically never works. We don't have documentation, hardware logical analyzers, or time to find the right work around for a weird hardware interaction that will give a 0.0001% performance increase. AGP fast writes are a waste of time.

(In reply to comment #6)
> Fast write basically never works. We don't have documentation, hardware logical
> analyzers, or time to find the right work around for a weird hardware
> interaction that will give a 0.0001% performance increase. AGP fast writes are
> a waste of time.

I know, I know, but I was still wondering if someone could provide guidance for me to turn on debugging, etc., I am interested in looking into this on my own time. Thanks

I've actually contemplated making the Linux agpgart drivers ignore requests to turn on fast writes a few times. I get at least one report a month from users being burned by this, and I don't think I've *ever* got a report of it working without some issue or other.

Let's remove the option then, or better still, make it a noop so people don't complain about it going away.

Used to work great on my Radeon M6, fwiw.

Well, great in the sense that it didn't lock anything up. But it still didn't do anything measurable. The only useful tweak is page flipping.

If this turns out to be an option that we care about at all, it should be a no-op (as keith suggested) on all combinations *except* the few that are known to work. We can keep a white-list of known good graphics card / motherboard chipset combinations. My recommendation would be to put that in the kernel. The AGP backend is probably the best bet. Since I have *never* seen fast writes be demonstrated to give *any* performance benefit, I seriously doubt that we care. I know that I don't care. :)

(In reply to comment #11)
> Well, great in the sense that it didn't lock anything up. But it still didn't do
> anything measurable. The only useful tweak is page flipping.

Are you sure that both your m6 and your motherboard actually supported it? Though I can confirm I got it working on a rv250 and a amd64 chipset to work too (without a performance diff neither). I think the no performance difference is pretty much guaranteed, as fast-writes only affects chipset to graphic card transactions. But generally with dri/drm, the graphic chip fetches all data (in the ring buffer / indirect buffers) itself from memory, so this just doesn't apply.

(In reply to comment #7)
> I know, I know, but I was still wondering if someone could provide guidance for
> me to turn on debugging, etc., I am interested in looking into this on my own time.

Well, you could try enabling debug with the drm/radeon kernel modules (debug=1 parameter). You'd probably need to mount the volume with your log files with sync, and even then I don't think you'd get anything useful in the log. The only strange thing about fast writes not working is that proprietary drivers sometimes report it working with the same hardware. I've no idea if they just plain lie about it, or do some workarounds or tweaking with the graphic chips.

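As a rough sketch of the debugging suggestion above: on kernels that export module parameters through sysfs, the current value of the drm `debug` parameter can be read back to confirm that `debug=1` took effect before reproducing the lockup. The `/sys/module/<module>/parameters/<name>` layout is the usual kernel convention; the helper name `read_module_param` and its `sysfs_root` argument are made up for this example, and whether the parameter is readable depends on the kernel build and on permissions.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, name: str,
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the current value of a kernel module parameter, or None.

    None means the module is not loaded, the parameter is not exported
    through sysfs, or the file is not readable by this user.
    """
    param = Path(sysfs_root) / module / "parameters" / name
    try:
        return param.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # "drm" and "debug" follow the debug=1 suggestion in the comment above.
    value = read_module_param("drm", "debug")
    if value is None:
        print("drm debug parameter not available")
    else:
        print(f"drm debug = {value}")
```

This only confirms that the parameter is set. As noted above, a hard lock may still prevent the kernel log from reaching disk unless the log volume is mounted with sync.
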
(In reply to comment #12)
> Since I have *never* seen fast writes be demonstrated to give *any* performance
> benefit, I seriously doubt that we care. I know that I don't care. :)

Thinking about that, I think it could make a difference when writing to the framebuffer directly (e.g. stuff like glWritePixels, which is at least currently not using gpu blits). Only if agp mode is higher than 1, however.

(In reply to comment #13)
> Well, you could try enabling debug with the drm/radeon kernel modules (debug=1
> parameter). You'd probably need to mount the volume with your log files with
> sync, and even then I don't think you'd get anything useful in the log.
> The only strange thing about fast writes not working is that proprietary drivers
> sometimes report it working with the same hardware. I've no idea if they just
> plain lie about it, or do some workarounds or tweaking with the graphic chips.

Thanks - I should have stuck to IRC, cause I knew this would be a wontfix and merely wanted help with trying to debug it on my own. It wasn't my intent to start this discussion about disabling the option, I personally think that despite the pain of having to tell users that AGPFastWrite is a bad idea, they should still have the choice of trying it out, so making the option a no-op is not nice. Worse, if you make it a no-op, users will actually think that it worked, and misinformation of already not very well informed users doesn't sound like a well thought out plan either.

(In reply to comment #15)
>
> Thanks - I should have stuck to IRC, cause I knew this would be a wontfix and
> merely wanted help with trying to debug it on my own.
>
> It wasn't my intent to start this discussion about disabling the option, I
> personally think that despite the pain of having to tell users that AGPFastWrite
> is a bad idea, they should still have the choice of trying it out, so making the
> option a no-op is not nice.
> Worse, if you make it a no-op, users will actually
> think that it worked, and misinformation of already not very well informed users
> doesn't sound like a well thought out plan either.

I'm sorry we sound so discouraging, but I guess we're a bit jaded. Too many users complain the driver is broken only to reveal much later (after much wasted developer time) that they have fastwrites turned on. As such it's a bit of a knee-jerk reaction. The problem is, there's not really a good way to debug this. There have been several suggestions, but the problem is it locks up bad and years later still no one knows why. You could try and track down what fglrx does (if anything) by dumping the radeon and AGP chipset regs. Since you haven't had much luck with conventional software means, you may need access to hardware analyzers or unpublished chipset errata. Perhaps others with more AGP chipset-side knowledge have some ideas.

(In reply to comment #15)
> Thanks - I should have stuck to IRC, cause I knew this would be a wontfix and
> merely wanted help with trying to debug it on my own.

After the completely fruitless debugging that we've done, I honestly believe that you'd need to use a logic analyzer to trace the bus signals during the write operations.

(In reply to comment #13)
> Though I can confirm I got it working on a rv250 and a amd64 chipset to work too
> (without a performance diff neither).

Meh. Wanted to measure performance difference (I think copypixrate might be the test to use), and it blew up right at xorg startup. So I can't confirm it works for me after all...
maybe I had tested by mistake with agp mode 1 before which will turn this feature off automatically (though actually, with agpgart from kernel 2.6.17, it still locks up here, since agpgart will try to put it into 0x mode, which doesn't really seem to hurt otherwise, but thus it will not detect that fast writes won't do anything and not disable them - apparently the 0x mode happens because bridge_agpstat is 1f000a14 (in agp 2.0 mode), thus the bridge claims it's not supporting 1x and 2x modes, which is afaik just plain illegal).

I fixed the 0x bug in 2.6.18

(In reply to comment #19)
> I fixed the 0x bug in 2.6.18

A bit OT, but no this is not the issue which is fixed in 2.6.18, it still reports 0x mode. In my case, the bridge is AGP 3.5, but the card is AGP 2.0. You can easily see why that happens when looking at the various agp status values, which look like that (printed out in agp_collect_device_status immediately after reading vga_agpstat):

agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: req mode 1f000201 bridge_agpstat 1f000a14 vga_agpstat 2f000217.
agpgart: Device is in legacy mode, falling back to 2.x
agpgart: Putting AGP V2 device at 0000:00:00.0 into 0x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 0x mode
agpgart: Putting AGP V2 device at 0000:01:00.1 into 0x mode

I think what the bridge reports (only supporting 4x rate) is illegal, or there is some problem when putting it in 2.0 mode, in any case, motherboard is a asus k8v se deluxe, chipset k8t800, the relevant lspci output:

0000:00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 3188 (rev 01)
    Subsystem: Asustek Computer, Inc.: Unknown device 80a3
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
    Latency: 8
    Region 0: Memory at f0000000 (32-bit, prefetchable)
    Capabilities: [80] AGP version 3.5
        Status: RQ=32 Iso- ArqSz=0 Cal=2 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x4
        Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=<none>
    Capabilities: [c0] #08 [0060]
    Capabilities: [68] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] #08 [8001]

sigh. it's in AGPv2 mode, but trying to use an AGPv3 rate. That isn't going to work. I'll fix that up.
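
To make the "0x mode" result easier to follow, here is a small sketch that decodes the three status words from the log above and intersects their rate bits. It assumes the AGP 2.0 register layout (rate in bits 0-2, fast write in bit 4, sideband addressing in bit 9, request queue depth in bits 24-31), which applies here because the bridge reports AGP3- and falls back to 2.x. The function names `decode_v2` and `common_rates` are invented for this example, and the logic is a simplified model, not the actual agpgart negotiation code.

```python
# AGP 2.0 status/command register bits (assumed layout, see lead-in).
RATE_BITS = {0x1: "1x", 0x2: "2x", 0x4: "4x"}
FW_BIT = 1 << 4    # fast write supported
SBA_BIT = 1 << 9   # sideband addressing supported


def decode_v2(status: int) -> dict:
    """Decode the fields of an AGP 2.0 status word that matter here."""
    return {
        "rates": [name for bit, name in RATE_BITS.items() if status & bit],
        "fast_write": bool(status & FW_BIT),
        "sba": bool(status & SBA_BIT),
        "rq_depth": (status >> 24) + 1,  # same convention lspci uses for RQ=
    }


def common_rates(requested: int, bridge: int, card: int) -> list:
    """Rates advertised by all three parties; empty list means '0x'."""
    common = requested & bridge & card & 0x7
    return [name for bit, name in RATE_BITS.items() if common & bit]


if __name__ == "__main__":
    req, bridge, vga = 0x1F000201, 0x1F000A14, 0x2F000217
    for label, word in (("req mode", req), ("bridge", bridge), ("vga", vga)):
        print(label, decode_v2(word))
    print("common rates:", common_rates(req, bridge, vga) or "none (0x)")
```

With the values from the log, the request asks for 1x only, the bridge advertises 4x only, and the card advertises 1x, 2x and 4x, so no rate is common to all three. That matches the "into 0x mode" lines, and the bridge word decodes to RQ=32, FW+ and SBA+, in line with the lspci Status line.
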