Bug 7697 - r300_check_offset fails on PCI-E R420 (5D4F)
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/other
Version: DRI git
Hardware: x86 (IA32) Linux (All)
Importance: high normal
Assigned To: Default DRI bug account
Depends on:
Blocks:
Reported: 2006-07-30 12:02 UTC by Timo Jyrinki
Modified: 2006-12-14 10:33 UTC

Attachments
dmesg output with drm debug=1 (121.49 KB, text/plain)
2006-10-16 10:05 UTC, Timo Jyrinki
r300_cmdbuf offset error (1.33 KB, application/x-gzip)
2006-12-06 10:30 UTC, Timo Jyrinki
r300_cmdbuf offset error, new version (1.63 KB, application/x-gzip)
2006-12-09 09:56 UTC, Timo Jyrinki
offset check fix (825 bytes, patch)
2006-12-09 10:01 UTC, Timo Jyrinki
Unify offset checking (5.90 KB, patch)
2006-12-12 22:57 UTC, Michel Dänzer

Description Timo Jyrinki 2006-07-30 12:02:47 UTC
I'm using an AMD64 computer with X800 GTO PCI-Express gfx card (5D4F). I'm using
both a) Ubuntu 6.06 LTS default configuration and b) Ubuntu edgy development
branch with kernel 2.6.17 drm modules, mesa 6.5.0.cvs.20060725,
xserver-xorg-driver-ati-6.6.1 and libdrm cvs. Glxinfo gives me "Direct
rendering: Yes", but glxgears gives: drmRadeonCmdBuffer: -22 (exiting)

In both configurations, dmesg shows the following after trying to run glxgears:
[ 1849.338955] [drm:r300_emit_carefully_checked_packet0] *ERROR* Offset failed range check (reg=4e28 sz=1)
[ 1849.338960] [drm:r300_do_cp_cmdbuf] *ERROR* r300_emit_packet0 failed

For some reason, r300_check_offset in r300_cmdbuf.c is failing while it probably
should not. At least if I force that function to return 0; (in the Ubuntu edgy
configuration), glxgears runs without problems (ca. 2800 fps) and so do a
couple of screensavers.

Any idea what's going wrong? If you need more information, just tell me what I
should find out.
Comment 1 Aapo Tahkola 2006-07-31 16:49:22 UTC
Mesa 6.5.x releases have broken PCIE support. I think r300 support in HEAD is
now as stable as it used to be when 6.5 went out.
Comment 2 Timo Jyrinki 2006-10-15 06:52:08 UTC
Now Ubuntu dev branch has mesa 6.5.1~20060817 and xserver-xorg-video-ati 6.6.2.
Additionally, I've installed libdrm and linux-core/drm.ko + linux-core/radeon.ko
from mesa-drm git HEAD. I'm still getting the same error messages when trying
e.g. glxgears. If I apply the following:
diff --git a/shared-core/r300_cmdbuf.c b/shared-core/r300_cmdbuf.c
index c65ffd5..561f614 100644
--- a/shared-core/r300_cmdbuf.c
+++ b/shared-core/r300_cmdbuf.c
@@ -259,7 +259,7 @@ static __inline__ int r300_check_offset(
        if (offset >= dev_priv->gart_vm_start &&
            offset < (dev_priv->gart_vm_start + dev_priv->gart_size))
                return 0;
-       return 1;
+       return 0;
 }
 
 static __inline__ int r300_emit_carefully_checked_packet0(drm_radeon_private_t *

and install the new radeon.ko generated, the error message disappears (offset
check always returns 0) and 3D starts to work. Glxgears runs, GL* Gnome
screensavers run etc.

The problem is also happening when running x86 version of Linux instead of
AMD64. Please tell me if I can gather relevant information for you or something. 
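
For readers following along, here is a minimal userspace sketch of the part of the check visible in the diff context and of what the workaround changes. It is an illustration only, not the driver code: `struct gart_window`, `check_gart_offset` and `check_gart_offset_patched` are hypothetical names, the window values are made up, and only the GART half shown in the diff is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two drm_radeon_private_t fields in the diff. */
struct gart_window {
    uint32_t gart_vm_start;
    uint32_t gart_size;
};

/* Same convention as the diff: 0 means accepted, 1 means rejected.
 * The arithmetic mirrors the diff context; it is not a corrected version. */
static int check_gart_offset(const struct gart_window *w, uint32_t offset)
{
    if (offset >= w->gart_vm_start &&
        offset < (w->gart_vm_start + w->gart_size))
        return 0;
    return 1;
}

/* What the workaround amounts to: every offset is accepted. */
static int check_gart_offset_patched(const struct gart_window *w, uint32_t offset)
{
    (void)w;
    (void)offset;
    return 0;
}

void check_offset_demo(void)
{
    /* Made-up window: 64 MiB of GART starting at 0x10000000. */
    const struct gart_window w = { 0x10000000u, 0x04000000u };

    assert(check_gart_offset(&w, 0x10001000u) == 0);         /* inside */
    assert(check_gart_offset(&w, 0x20000000u) == 1);         /* outside */
    assert(check_gart_offset_patched(&w, 0x20000000u) == 0); /* no longer rejected */
}
```

With the workaround applied, offsets outside the window are no longer rejected, so it hides the symptom rather than explaining why a valid offset was refused.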
Comment 3 Christian - Manny Calavera - Neumair 2006-10-15 10:21:06 UTC
This looks like you're essentially reducing the function to

 return 0;

and thus disabling GART boundary checking altogether. I doubt this is the right
solution; I rather suppose that gart_vm_start and gart_size are not set
correctly for PCIE cards.


I'm also having issues with a Xegl / radeon mobility M300 setup (i.e.
R300/CHIP_RV380). I'm trying to understand the DRM code and it actually looks
like some of the code assumes that the GART is right behind the framebuffer, but
it might be hard under some circumstances to get the actual FB size (cf.
http://lists.freedesktop.org/archives/xorg/2005-May/007671.html). For instance,
the kernel seems to limit the VRAM to MAX_VRAM_SIZE, and I fear that the GART
position might be located somewhere inside this VRAM.

It would also be interesting to load the drm module with "debug=1" and check
whether the

 dmesg | grep "Setting GART location based on old memory map"

shell command returns a matching line.

For me, forcing dev_priv->new_memmap in radeon_drv.c to 1, together with another
modification, got my PCIE card beyond a segfault and made it display
"vertical stripes", which seems to be a well-known radeonfb issue. It's quite
hard to determine whether bogus microcode is involved or not, and not many
experts seem to be available for the radeon driver.

For instance, I've spent roughly 15 hours on this issue and found no valuable
documentation explaining the detailed design decisions for FB implementations
and radeonFB layout. I'm CCing Dave Airlie because I still don't understand the
whole code.
Comment 4 Michel Dänzer 2006-10-16 02:06:23 UTC
There's a good chance this is fixed in xf86-video-ati git commit
6671c1b01bf29d8f1cacf9306ef658b967d8a3cf (not in any release yet), please test.
Comment 5 Timo Jyrinki 2006-10-16 10:04:12 UTC
(In reply to comment #4)
> There's a good chance this is fixed in xf86-video-ati git commit
> 6671c1b01bf29d8f1cacf9306ef658b967d8a3cf (not in any release yet), please test.

Does not seem to help, I installed ati_drv.so, atimisc_drv.so, r128_drv.so and
radeon_drv.so as GIT versions.

As to Chris's comments, I only got "Setting GART location based on new memory
map" (not old) on one drm module load, and now I got:
[  251.148528] [drm] Initialized drm 1.1.0 20060810
[  253.021470] [drm:drm_init] 
[  253.022024] [drm:drm_get_dev] 
[  253.022069] ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 18 (level, low) -> IRQ 233
[  253.022076] PCI: Setting latency timer of device 0000:01:00.0 to 64
[  253.022238] [drm:radeon_driver_load] PCIE card detected
[  253.022295] [drm:drm_ctxbitmap_next] drm_ctxbitmap_next bit : 0
[  253.022351] [drm:drm_ctxbitmap_init] drm_ctxbitmap_init : 0
[  253.022354] [drm:drm_get_head] 
[  253.022787] [drm:drm_get_head] new minor assigned 0
[  253.022791] [drm] Initialized radeon 1.25.0 20060524 on minor 0: 
Comment 6 Timo Jyrinki 2006-10-16 10:05:37 UTC
Created attachment 7431
dmesg output with drm debug=1

dmesg output with drm module loaded with debug=1, when glxgears is executed
Comment 7 Timo Jyrinki 2006-12-04 11:51:28 UTC
Interesting that the same error code is also shown for a person using AGP Radeon
9200,
https://bugs.launchpad.net/distros/ubuntu/+source/xserver-xorg-video-ati/+bug/65605
Not that I'd know if it helps anyone to pinpoint the actual problem, but the
problem seems (in one way or another) to occur also outside of r300_cmdbuf.c.
Comment 8 Michel Dänzer 2006-12-05 04:47:14 UTC
(In reply to comment #7)
> Interesting that the same error code is also shown for a person using AGP Radeon
> 9200,
> https://bugs.launchpad.net/distros/ubuntu/+source/xserver-xorg-video-ati/+bug/65605

That's bug 7595. This one might be a duplicate; could everybody make sure
they're using the DRM from kernel >= 2.6.19 (where the fix was integrated) or
from git. If it still happens with that, please add some debugging output that
shows what the offending offset is and why it gets rejected.
Comment 9 Timo Jyrinki 2006-12-06 10:30:46 UTC
Created attachment 7985
r300_cmdbuf offset error
Comment 10 Timo Jyrinki 2006-12-06 10:33:17 UTC
(In reply to comment #8)
> That's bug 7595. This one might be a duplicate; could everybody make sure
> they're using the DRM from kernel >= 2.6.19 (where the fix was integrated) or
> from git. If it still happens with that, please add some debugging output that
> shows what the offending offset is and why it gets rejected.

Happens with 2.6.19 and git version (for me). I added a DRM_ERROR debug output
in the check_offset function, so there's "Offset=nnn" in the attached log in the
place the checking fails. Do you need more debug output?
Comment 11 Michel Dänzer 2006-12-07 02:46:50 UTC
(In reply to comment #10)
> I added a DRM_ERROR debug output in the check_offset function, so there's
> "Offset=nnn" in the attached log in the place the checking fails.

Thanks.

> Do you need more debug output?

It would be nice if the offset was printed in hex and if it also printed the
value compared against.

Please also attach the corresponding full X log file.
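
A minimal userspace sketch of the kind of output requested here, assuming the check compares the offset against a start address and a size. `format_offset_failure` is a hypothetical helper, and the values in the demo are made up; in the kernel the string would go through DRM_ERROR instead of snprintf. The end address is computed in 64 bits so that it cannot be truncated when printed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats the rejected offset and the bounds it was compared against, in hex.
 * Returns whatever snprintf returns. */
static int format_offset_failure(char *buf, size_t len,
                                 uint32_t offset, uint32_t start, uint32_t size)
{
    /* Widen before adding so an end address above 32 bits is shown in full. */
    uint64_t end = (uint64_t)start + size;

    return snprintf(buf, len,
                    "Offset=0x%" PRIx32 " start=0x%" PRIx32 " end=0x%" PRIx64,
                    offset, start, end);
}

void format_demo(void)
{
    char buf[96];

    /* Made-up values: a 256 MiB range ending exactly at 4 GiB. */
    format_offset_failure(buf, sizeof buf, 0xF0001000u, 0xF0000000u, 0x10000000u);
    assert(strcmp(buf, "Offset=0xf0001000 start=0xf0000000 end=0x100000000") == 0);
}
```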
Comment 12 Timo Jyrinki 2006-12-09 09:56:54 UTC
Created attachment 8039
r300_cmdbuf offset error, new version

Here's a new version of the log file.
Comment 13 Timo Jyrinki 2006-12-09 10:01:25 UTC
Created attachment 8040
offset check fix

While trying to get the hex output, I noticed that I had to do a lot of casting
to get correct (long enough) values printed out. Next I noticed that actually
the check should be returning ok according to the debug output (see previous
attachment), but it did not... so doing similar casts in the check itself, like
with the patch attached, the problem goes away and 3D works!

You probably know how to make a better patch, but just for reference. Do you
need the Xorg.log anymore?
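
One way casts can change the outcome of such a check is unsigned 32-bit wraparound in the upper bound. The sketch below illustrates that effect under stated assumptions; it is not the attached patch. `in_range_u32` and `in_range_u64` are hypothetical names, and the addresses are made up because the attached log is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit arithmetic: start + size is reduced modulo 2^32, so a range that
 * ends exactly at 4 GiB gets an upper bound of 0. */
static int in_range_u32(uint32_t offset, uint32_t start, uint32_t size)
{
    return offset >= start && offset < start + size;
}

/* Same comparison with the operands widened to 64 bits before the addition. */
static int in_range_u64(uint32_t offset, uint32_t start, uint32_t size)
{
    return (uint64_t)offset >= (uint64_t)start &&
           (uint64_t)offset < (uint64_t)start + (uint64_t)size;
}

void wraparound_demo(void)
{
    /* Made-up layout: 256 MiB range at 0xF0000000, ending at 2^32. */
    const uint32_t start = 0xF0000000u, size = 0x10000000u;
    const uint32_t offset = 0xF0001000u;  /* clearly inside the range */

    assert(in_range_u32(offset, start, size) == 0);  /* wrongly rejected */
    assert(in_range_u64(offset, start, size) == 1);  /* accepted */
}
```

For ranges that end below 4 GiB the two functions agree, which would be consistent with the failure showing up only on some memory layouts.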
Comment 14 Roland Scheidegger 2006-12-09 16:03:08 UTC
(In reply to comment #13)
> While trying to get the hex output, I noticed that I had to do a lot of casting
> to get correct (long enough) values printed out. Next I noticed that actually
> the check should be returning ok according to the debug output (see previous
> attachment), but it did not... so doing similar casts in the check itself, like
> with the patch attached, the problem goes away and 3D works!
So there is a problem if the fb gets mapped exactly at the end of the 32bit
address space. Couldn't this happen on 32bit systems too? And with the gart
area? The same bug is certainly present in radeon_state.c too, and radeon_cp.c
uses the same calculation. Just storing fb_size - 1 instead of fb_size (and
changing the comparisons accordingly) might work too, instead of the casts all
over the place; one just needs to make sure the fb_size wasn't 0 before (which
shouldn't really happen)...
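
The "size minus one" idea can be sketched as follows. This is an illustration of the suggestion, not the patch that was eventually committed; `struct range32`, `make_range` and `range_contains` are hypothetical names and the addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical range descriptor that stores the last valid address
 * (start + size - 1) instead of the size. */
struct range32 {
    uint32_t first;
    uint32_t last;
};

static struct range32 make_range(uint32_t start, uint32_t size)
{
    assert(size != 0);  /* size - 1 would underflow for an empty range */

    struct range32 r = { start, start + (size - 1u) };
    return r;
}

/* Inclusive comparison on both ends, so no bound ever needs to reach 2^32. */
static int range_contains(struct range32 r, uint32_t offset)
{
    return offset >= r.first && offset <= r.last;
}

void inclusive_end_demo(void)
{
    /* Made-up layout: 256 MiB range ending exactly at 4 GiB. */
    const struct range32 fb = make_range(0xF0000000u, 0x10000000u);

    assert(fb.last == 0xFFFFFFFFu);
    assert(range_contains(fb, 0xF0001000u) == 1);
    assert(range_contains(fb, 0xEFFFFFFFu) == 0);
}
```

The trade-off against widening casts is that the comparisons stay in 32 bits, at the cost of having to guard against a zero size when the range is set up.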
Comment 15 Michel Dänzer 2006-12-11 03:05:55 UTC
Again, see bug 7595; unfortunately, I completely forgot about the r300 DRM being
a whole parallel DRM within radeon when I fixed that. Ideally, everything should
use a single function for this.
Comment 16 Roland Scheidegger 2006-12-11 04:08:39 UTC
(In reply to comment #15)
> Again, see bug 7595; unfortunately, I completely forgot about the r300 DRM being
> a whole parallel DRM within radeon when I fixed that. Ideally, everything should
> use a single function for this.
Ah right, I looked at old code and missed that it is already fixed for radeon.
You're right, ideally it should use the same function, though r300 doesn't have
to worry about old broken clients.
Comment 17 Michel Dänzer 2006-12-12 22:57:46 UTC
Created attachment 8080
Unify offset checking

Does this patch work for you as well?
Comment 18 Timo Jyrinki 2006-12-13 05:39:41 UTC
Yes, the patch works. Thanks! Please put a note when you commit the change so I
can mark this as fixed (it seems bugzilla's verified/closed are not used much here).
Comment 19 Michel Dänzer 2006-12-14 10:33:55 UTC
Fixed in drm git commit aefc7a34431a8f1540b261e23d8b8d05d824b60a.