Bug 8816 - hard lock on startx with AGPFastWrite on radeon mobility 9600 (aka M10)
Summary: hard lock on startx with AGPFastWrite on radeon mobility 9600 (aka M10)
Status: RESOLVED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: General
Version: XOrg git
Hardware: x86 (IA32) Linux (All)
Importance: high major
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-29 12:52 UTC by Simon
Modified: 2007-08-08 12:09 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
output of startx when duplicating bug (2.02 KB, text/plain)
2006-10-29 12:58 UTC, Simon
xorg.conf (6.21 KB, text/plain)
2006-10-29 13:01 UTC, Simon

Description Simon 2006-10-29 12:52:01 UTC
This is simple: X windows works fine for me with AGPFastWrite disabled in my
xorg.conf, but when I enable it, starting X crashes everything, to the point
where I can't even debug over ssh.  The most useful output I can get is attached.
Comment 1 Simon 2006-10-29 12:58:06 UTC
Created attachment 7572
output of startx when duplicating bug

I captured this from another machine with an ssh client; the text ends at the point
where the computer starting X stops responding.
Comment 2 Simon 2006-10-29 13:01:17 UTC
Created attachment 7573
xorg.conf

Here's my pretty xorg.conf, though on looking at it, I realize that a large
portion of it is irrelevant cruft, hehe.
Comment 3 Jerome Glisse 2006-10-29 13:05:42 UTC
AGPFastWrite is known to be bogus; moreover, I don't think
you can expect a big performance boost from it. If you
want to debug this further, you might want to search the DRI
archives or the web for AGP fast writes. Otherwise I
will likely close the bug.

Oh, and you might want to enable debugging in the radeon module
to get a better view of what's going on (look at your
kernel log, not the Xorg log).
Comment 4 Daniel Stone 2006-10-29 13:19:47 UTC
the canonical answer to this is 'yeah, don't do that'.  various hardware bugs
and combinations make it pretty close to impossible to get right.
Comment 5 Simon 2006-10-29 13:36:31 UTC
(In reply to comment #3)
> Oh, and you might want to enable debugging in radeon module
> and thus get a better view of what's going on (look at your
> kernel log not xorg log).

Can you elaborate?  I was aware that fast write usage is plainly discouraged, but
I do intend to try to debug this if possible; I just don't know how to get more
detailed info out.
Comment 6 Ian Romanick 2006-10-30 08:54:03 UTC
Fast write basically never works.  We don't have documentation, hardware logic
analyzers, or time to find the right workaround for a weird hardware
interaction that will give a 0.0001% performance increase.  AGP fast writes are
a waste of time.
Comment 7 Simon 2006-10-30 10:29:44 UTC
(In reply to comment #6)
> Fast write basically never works.  We don't have documentation, hardware logical
> analyzers, or time to find the right work around for a weird hardware
> interaction that will give a 0.0001% performance increase.  AGP fast writes are
> a waste of time.

I know, I know, but I was still wondering if someone could give me guidance on
turning on debugging, etc.; I am interested in looking into this on my own time.

Thanks
Comment 8 Dave Jones 2006-10-30 11:14:37 UTC
A few times, I've actually contemplated making the Linux agpgart drivers ignore
requests to turn on fast writes. I get at least one report a month from users
being burned by this, and I don't think I've *ever* got a report of it working
without some issue or other.
Comment 9 Keith Whitwell 2006-10-30 11:21:54 UTC
Let's remove the option then, or better still, make it a no-op so people don't
complain about it going away.
Comment 10 Donnie Berkholz 2006-10-30 11:24:35 UTC
Used to work great on my Radeon M6, fwiw.
Comment 11 Donnie Berkholz 2006-10-30 11:26:02 UTC
Well, great in the sense that it didn't lock anything up. But it still didn't do
anything measurable. The only useful tweak is page flipping.
Comment 12 Ian Romanick 2006-10-30 12:49:20 UTC
If this turns out to be an option that we care about at all, it should be a
no-op (as Keith suggested) on all combinations *except* the few that are known
to work.  We can keep a white-list of known good graphics card / motherboard
chipset combinations.

My recommendation would be to put that in the kernel.  The AGP backend is
probably the best bet.

Since I have *never* seen fast writes be demonstrated to give *any* performance
benefit, I seriously doubt that we care.  I know that I don't care. :)
Comment 13 Roland Scheidegger 2006-10-30 13:02:26 UTC
(In reply to comment #11)
> Well, great in the sense that it didn't lock anything up. But it still didn't do
> anything measurable. The only useful tweak is page flipping.
Are you sure that both your M6 and your motherboard actually supported it?
That said, I can confirm I got it working on an RV250 with an AMD64 chipset too
(without a performance difference either). I think the lack of a performance
difference is pretty much guaranteed, as fast writes only affect chipset-to-graphics-card
transactions. But generally with DRI/DRM, the graphics chip fetches all data (in
the ring buffer / indirect buffers) from memory itself, so this just doesn't apply.

(In reply to comment #7)
> I know, I know, but I was still wondering if someone could provide guidance for
> me to turn on debugging, etc., I am interested in looking into this on my own
> time.
Well, you could try enabling debugging in the drm/radeon kernel modules (debug=1
parameter). You'd probably need to mount the volume holding your log files with
the sync option, and even then I don't think you'd get anything useful in the log.
The only strange thing about fast writes not working is that proprietary drivers
sometimes report them as working on the same hardware. I've no idea whether they
just plain lie about it or apply some workarounds or tweaks to the graphics chips.
Comment 14 Roland Scheidegger 2006-10-30 13:39:08 UTC
(In reply to comment #12)
> Since I have *never* seen fast writes be demonstrated to give *any* performance
> benefit, I seriously doubt that we care.  I know that I don't care. :)
Thinking about it, I think it could make a difference when writing to the
framebuffer directly (e.g. something like glDrawPixels, which currently, at least,
does not use GPU blits). That applies only if the AGP mode is higher than 1x, however.
Comment 15 Simon 2006-10-30 17:15:27 UTC
(In reply to comment #13)
> Well, you could try enabling debug with the drm/radeon kernel modules (debug=1
> parameter). You'd probably need to mount the volume with your log files with
> sync, and even then I don't think you'd get anything useful in the log.
> The only strange thing about fast writes not working is that proprietary drivers
> sometimes report it working with the same hardware. I've no idea if they just
> plain lie about it, or do some workarounds or tweaking with the graphic chips.

Thanks. I should have stuck to IRC, because I knew this would be a WONTFIX and
merely wanted help debugging it on my own.

It wasn't my intent to start this discussion about disabling the option. I
personally think that, despite the pain of having to tell users that AGPFastWrite
is a bad idea, they should still have the choice of trying it out, so making the
option a no-op is not nice.  Worse, if you make it a no-op, users will actually
think that it worked, and misinforming already poorly informed users doesn't
sound like a well-thought-out plan either.
Comment 16 Alex Deucher 2006-10-30 17:52:33 UTC
(In reply to comment #15)
> 
> Thanks - I should have stuck to IRC, cause I knew this would be a wontfix and
> merely wanted help with trying to debug it on my own.
> 
> It wasn't my intent to start this discussion about disabling the option, I
> personally think that despite the pain of having to tell users that AGPFastWrite
> is a bad idea, they should still have the choice of trying it out, so making the
> option a no-op is not nice.  Worse, if you make it a no-op, users will actually
> think that it worked, and misinformation of already not very well informed users
> doesn't sound like a well thought out plan either.

I'm sorry we sound so discouraging, but I guess we're a bit jaded.  Too many
users complain that the driver is broken, only to reveal much later (after much
wasted developer time) that they have fast writes turned on.  As such, it's a bit
of a knee-jerk reaction.  The problem is, there's not really a good way to debug
this.  There have been several suggestions, but the problem is that it locks up
hard, and years later still no one knows why.  You could try to track down what
fglrx does (if anything) by dumping the radeon and AGP chipset registers.  Since you
haven't had much luck with conventional software means, you may need access to
hardware analyzers or unpublished chipset errata.  Perhaps others with more AGP
chipset-side knowledge have some ideas.
Comment 17 Ian Romanick 2006-10-30 21:01:44 UTC
(In reply to comment #15)
> Thanks - I should have stuck to IRC, cause I knew this would be a wontfix and
> merely wanted help with trying to debug it on my own.

After the completely fruitless debugging that we've done, I honestly believe
that you'd need to use a logic analyzer to trace the bus signals during the
write operations.
Comment 18 Roland Scheidegger 2006-10-31 13:33:42 UTC
(In reply to comment #13)
> Though I can confirm I got it working on a rv250 and a amd64 chipset to work too
> (without a performance diff neither).
Meh. I wanted to measure the performance difference (I think copypixrate might be
the test to use), and it blew up right at Xorg startup. So I can't confirm it works
for me after all... Maybe I had mistakenly tested with AGP mode 1 before, which
turns this feature off automatically. (Though actually, with agpgart from kernel
2.6.17, it still locks up here, since agpgart tries to put it into 0x mode. That
doesn't really seem to hurt otherwise, but it means agpgart does not detect that
fast writes won't do anything and does not disable them. Apparently the 0x mode
happens because bridge_agpstat is 1f000a14 (in AGP 2.0 mode), so the bridge
claims it doesn't support the 1x and 2x modes, which is AFAIK just plain illegal.)
Comment 19 Dave Jones 2006-10-31 13:56:17 UTC
I fixed the 0x bug in 2.6.18
Comment 20 Roland Scheidegger 2006-11-02 09:06:09 UTC
(In reply to comment #19)
> I fixed the 0x bug in 2.6.18
A bit off-topic, but no, this is not the issue fixed in 2.6.18; it still
reports 0x mode. In my case, the bridge is AGP 3.5, but the card is AGP 2.0.
You can easily see why that happens by looking at the various AGP status
values, which look like this (printed in agp_collect_device_status
immediately after reading vga_agpstat):
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: req mode 1f000201 bridge_agpstat 1f000a14 vga_agpstat 2f000217.
agpgart: Device is in legacy mode, falling back to 2.x
agpgart: Putting AGP V2 device at 0000:00:00.0 into 0x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 0x mode
agpgart: Putting AGP V2 device at 0000:01:00.1 into 0x mode
I think what the bridge reports (supporting only the 4x rate) is illegal, or there
is some problem when putting it into 2.0 mode. In any case, the motherboard is an
Asus K8V SE Deluxe with the K8T800 chipset; the relevant lspci output:
0000:00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 3188 (rev 01)
        Subsystem: Asustek Computer, Inc.: Unknown device 80a3
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 8
        Region 0: Memory at f0000000 (32-bit, prefetchable)
        Capabilities: [80] AGP version 3.5
                Status: RQ=32 Iso- ArqSz=0 Cal=2 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x4
                Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=<none>
        Capabilities: [c0] #08 [0060]
        Capabilities: [68] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] #08 [8001]
Comment 21 Dave Jones 2006-11-03 08:41:15 UTC
sigh.  it's in AGPv2 mode, but trying to use an AGPv3 rate.
That isn't going to work.

I'll fix that up.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.