Bug 27355

Summary: [R300] Xorg freezes if EXANoComposite is disabled
Product: xorg Reporter: SpOeK <SpOeK>
Component: Driver/Radeon    Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium    
Version: git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Attachments:
Description                          Flags
dmesg output                         none
Xorg output                          none
kernel output with drm.debug=1       none
R300 AD only has one quad pipe       none

Description SpOeK@DistroBit.Net 2010-03-28 09:18:37 UTC
Created attachment 34518 [details]
dmesg output

Enabling EXA makes my system freeze completely. At first, the login screen shows corrupted fonts, and once I've logged in successfully, the system freezes with a corrupted desktop (fonts and icons). Even the SysRq key combination doesn't work.

I've tried the following:
With "BusType" "PCI" I get "(WW) RADEON(0): Option "BusType" is not used".
Options "EXANoUploadToScreen" and "EXANoDownloadFromScreen" don't help; the
freeze persists.
Option "EXANoComposite" gives me a working system, but it seems to make X a
very CPU-hungry process.

I'm trying to use KMS. My GPU is an "ATI R300 AD [Radeon 9500 Pro]". I'm on a Gentoo box using the Git version of the graphics driver stack from the x11 overlay, as well as the Git kernel version 2.6.33-rc8.

I can attach my .config if you wish, but these are the radeon-related options
in it:
CONFIG_DRM_RADEON=m
CONFIG_DRM_RADEON_KMS=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FB=y
# CONFIG_FB_RADEON is not set

My last build of the graphics driver stack from Git was on March 5th, but I can update if necessary, kernel included.

Please feel free to ask for more information, or let me know if you want me to try a patch.

P.S.: I thought my bug was #23660 and added my data to it, but I was told that since I'm using KMS, my problem is unrelated.
Comment 1 SpOeK@DistroBit.Net 2010-03-28 09:19:17 UTC
Created attachment 34519 [details]
Xorg output

Xorg.0.log file.
Comment 2 Alex Deucher 2010-03-28 09:53:23 UTC
Does changing the agpmode or disabling AGP help? Add:
radeon.agpmode=X
to the kernel command line, where X = -1, 1, 2, or 4. -1 will use PCI
GART rather than AGP.
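To make the meaning of the radeon.agpmode values concrete, here is a small C sketch that classifies a requested value the way this thread describes it: -1 selects PCI GART, positive values request an AGP transfer rate, and (per comment 3) a rate the setup does not support leaves the default in place. The struct, the function choose_gart, the supported-mode bitmask and the fallback rule are all hypothetical illustrations, not the radeon driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical sketch of interpreting radeon.agpmode=X; not the
 * radeon driver's actual logic. */

struct gart_choice {
    bool use_pci_gart;   /* true: PCI GART instead of AGP */
    int agp_rate;        /* AGP transfer rate (1, 2, 4 or 8); 0 with PCI GART */
};

/* supported_mask uses the rate itself as the bit: 0x4 = 4x, 0x8 = 8x,
 * so 0x0C describes a setup that only offers 4x and 8x (assumed).
 * default_rate is kept when the request is 0 or cannot be honoured. */
static struct gart_choice choose_gart(int agpmode, unsigned supported_mask,
                                      int default_rate)
{
    struct gart_choice c = { false, default_rate };

    if (agpmode == -1) {                       /* -1: use PCI GART */
        c.use_pci_gart = true;
        c.agp_rate = 0;
    } else if ((agpmode == 1 || agpmode == 2 || agpmode == 4 || agpmode == 8) &&
               (supported_mask & (unsigned)agpmode)) {
        c.agp_rate = agpmode;                  /* requested rate is usable */
    }
    /* Anything else keeps default_rate, mirroring the
     * "not supported, defaults to 8" observation in comment 3. */
    return c;
}
```

For example, choose_gart(2, 0x0C, 8) keeps AGP at 8x in this model, while choose_gart(-1, 0x0C, 8) switches to PCI GART.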
Comment 3 SpOeK@DistroBit.Net 2010-03-28 10:16:44 UTC
(In reply to comment #2)
> Does changing the agpmode or disabling AGP help?  Add:
> radeon.agpmode=X
> where X = -1, 1, 2, or 4 to the kernel command line.  -1 will use pci
> gart rather than agp.
> 

-1, 4 and 8: freezes.
1 and 2: not supported, defaults to 8.
Comment 4 Michel Dänzer 2010-03-29 00:46:13 UTC
With ExaNoComposite, does the problem still occur when running 3D applications?

Can you provide more details about the kernel Git snapshot you're using?
Comment 5 SpOeK@DistroBit.Net 2010-03-29 12:05:06 UTC
(In reply to comment #4)
> With ExaNoComposite, does the problem still occur when running 3D applications?
> 
Yes, I tried GLHexen2 [1] and the system froze.

[1] http://uhexen2.sourceforge.net


> Can you provide more details about the kernel Git snapshot you're using?
> 
I'm not sure whether you need some other information about the Git snapshot, but this is what I found:

I'm using the version tagged 2.6.33-rc8 from [2].
Last commit:

commit 724e6d3fe8003c3f60bf404bf22e4e331327c596
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Feb 12 11:07:45 2010 -0800

    Linux 2.6.33-rc8

[2] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Comment 6 Michel Dänzer 2010-03-30 01:27:54 UTC
So it seems to be a general stability issue when using the 3D engine. Is it the same when not using KMS? Also, do you have a Windows installation to try if the 3D functionality is stable there?
Comment 7 SpOeK@DistroBit.Net 2010-03-30 12:14:33 UTC
(In reply to comment #6)
> So it seems to be a general stability issue when using the 3D engine. Is it the
> same when not using KMS?
>
(radeon.modeset=0) with EXANoComposite enabled: GLHexen2 works perfectly.
(radeon.modeset=0) without EXANoComposite: the fonts during the login process are corrupted. After logging in, GNOME tries to set up the desktop but fails. I was able to log in over SSH, and "top" showed the X process at almost 100% CPU time.

> Also, do you have a Windows installation to try if the
> 3D functionality is stable there?
> 
I've played UT2004 for a while with no problems.
Comment 8 SpOeK@DistroBit.Net 2010-03-31 12:00:24 UTC
Created attachment 34586 [details]
kernel output with drm.debug=1

I booted Linux with radeon.modeset=0, drm.debug=1 and EXANoComposite disabled.
The attachment is the resulting dmesg output.
I also attached gdb to X and tried to see where the process is stalled. This is what I found:

(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xb777a424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb777a424 in __kernel_vsyscall ()
#1  0xb74d18c9 in ioctl () from /lib/libc.so.6
#2  0xb73574bb in drmDMA () from /usr/lib/libdrm.so.2
#3  0xb726bfe7 in ?? () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#4  0x0000000b in ?? ()
#5  0xbfa780cc in ?? ()
#6  0xbfa780a4 in ?? ()
#7  0x080777a7 in dixLookupPrivate ()
#8  0xb726c439 in ?? () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#9  0x0921b690 in ?? ()
#10 0x0000000d in ?? ()
#11 0xbfa7815c in ?? ()
#12 0x00000010 in ?? ()
#13 0x09228fb0 in ?? ()
#14 0xb70951b4 in ?? () from /usr/lib/xorg/modules/libexa.so
#15 0x09228e60 in ?? ()
#16 0x080777a7 in dixLookupPrivate ()
#17 0xb73086a8 in ?? () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#18 0x0921b690 in ?? ()
#19 0x00000001 in ?? ()
#20 0xa2ba3008 in ?? ()
#21 0x00000000 in ?? ()

I know this is very basic information and possibly not very useful, but I don't know how I can extract more information from my system. Any hint would be appreciated.

P.S.: I work as a programmer and have written some Linux kernel modules and a Windows driver, so I could be more helpful, but I have no idea about the Linux graphics driver stack, so I'm completely lost. Sorry!
Comment 9 Pauli 2010-04-01 00:53:27 UTC
> --- Comment #8 from Rafael Antonio Porras Samaniego <SpOeK@DistroBit.Net>  2010-03-31 12:00:24 PST ---
>
> I know that this is very basic information and possibly not very useful but I
> don't know how can I extract more information from my system. Any hint would be
> appreciated.
>

You need debug symbols for all packages that appear in the backtrace.
That means passing --enable-debug when configuring the X server and making
sure that symbols are not stripped. For video-ati and libdrm it is most
likely enough that the resulting modules are not stripped.

That said, drmDMA is only called from one place in the source: when
allocating a command buffer that will be submitted to the GPU.

dmesg shows that the fence counter is not advancing, so something
is causing a GPU hang. Maybe something incorrect in the Composite
handler causes the hang. I don't know exactly, but it would help to
capture the command buffer currently being processed and analyze
what is being sent to the GPU.
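The fence argument above can be made concrete: each submitted command buffer is tagged with an increasing sequence number, the GPU writes that number back once it has finished the buffer, and a hang shows up as the written-back value stalling while newer work is outstanding. The following is a minimal C sketch of that check under those assumptions; the function names, parameters and the sample-array polling model are hypothetical and do not mirror the radeon DRM code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence bookkeeping for illustration; not radeon DRM code. */

/* True once the GPU-written counter 'signaled' has reached 'seq'.
 * Uses a signed difference so 32-bit wraparound is handled. */
static bool fence_done(uint32_t signaled, uint32_t seq)
{
    return (int32_t)(signaled - seq) >= 0;
}

/* last_emitted:   sequence number of the newest submitted command buffer.
 * start_signaled: counter value when polling started.
 * samples:        successive counter values read back while polling
 *                 (a real driver would sleep between reads).
 * Returns true when work is outstanding and the counter never moved. */
static bool gpu_seems_hung(uint32_t last_emitted, uint32_t start_signaled,
                           const uint32_t *samples, size_t n)
{
    if (fence_done(start_signaled, last_emitted))
        return false;                 /* nothing outstanding */
    for (size_t i = 0; i < n; i++) {
        if (fence_done(samples[i], last_emitted))
            return false;             /* all submitted work has completed */
        if (samples[i] != start_signaled)
            return false;             /* counter advanced: GPU is making progress */
    }
    return true;                      /* work pending, counter stuck */
}
```

In this model, a counter stuck at 5 while buffer 9 is outstanding is reported as a hang, whereas 5, 6, 7 is treated as slow but progressing.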

> P.S.: I work as a coder and I've coded some Linux kernel modules and a driver
> for Windows, so I could be more helpful, but I don't have any idea about the
> Linux graphics drivers stack so I'm completely lost. Sorry!
>

PS. When you disable Composite acceleration, that operation is done in
software, which is expensive per pixel. It might be much better to disable
all RENDER acceleration until the bug is fixed, because the fallback code
is more expensive than pure software rendering.
Comment 10 Michel Dänzer 2010-04-01 01:35:14 UTC
Created attachment 34594 [details] [review]
R300 AD only has one quad pipe

Does this kernel patch help with/out KMS?
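The patch title states that the R300 AD variant has only one quad pipe, which suggests the kernel had been configuring this chip's 3D engine with more quad pipes than it actually has. A hypothetical C sketch of selecting the pipe count per variant follows; the function name, the assumption that other R300 variants use two quad pipes, and the PCI device ID 0x4144 assumed for R300 AD are illustrative only, and the real change is the one in attachment 34594.

```c
#include <stdint.h>

/* Hypothetical sketch: not the actual kernel patch.
 * 0x4144 is assumed here to be the PCI device ID of the R300 AD variant. */
#define PCI_ID_R300_AD 0x4144u

/* Number of quad pipes to program into an R300-family 3D engine. */
static int r300_quad_pipe_count(uint16_t pci_device)
{
    if (pci_device == PCI_ID_R300_AD)
        return 1;   /* R300 AD: only one quad pipe, per the patch title */
    return 2;       /* other R300 variants: assumed two quad pipes */
}
```

The design point is simply that the pipe count has to be keyed on the specific device, not only on the chip family.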

(In reply to comment #9)
> That means --enable-debug when configuring xserver and making sure
> that symbols are not stripped.

--enable-debug is probably overkill; just make sure -g is passed to the compiler.


> PS. When you disable composite that operation is done in software
> which is expensive per pixel operation. It could be a lot better if
> disabling all render acceleration until the bug is fixed because
> fallback code is more expensive than pure software rendering.

Not sure what you're talking about; there's no other (let alone better) way to disable RENDER acceleration while keeping the rest of the 2D acceleration.
Comment 11 SpOeK@DistroBit.Net 2010-04-01 11:18:09 UTC
(In reply to comment #10)
> Created an attachment (id=34594) [details]
> R300 AD only has one quad pipe
> 
> Does this kernel patch help with/out KMS?
Yes, it helped a lot! Now EXANoComposite is completely unnecessary with and without KMS. Same goes for GLHexen2.

Just one small thing: without KMS, YouTube (Flash) in Firefox runs smoothly, with Firefox getting about 40% of CPU time and X around 6%. 'top' shows that only about 2% of the consumed CPU time is system time.
With KMS, however, YouTube doesn't run as well: Firefox gets 40% of CPU time and X the remaining 60%. In this case, 'top' shows that around 55% of the consumed CPU time is system time.

Are these values normal with KMS? If not, is this problem related to this bug, or should I file a new bug report? Should I first try experimenting with radeon options (ColorTiling, EnablePageFlip, etc.)? Currently I don't have any of them enabled:

        Driver  "radeon"
        Option  "AccelMethod"           "EXA"
#       Option "EXANoComposite"        "true"

Finally, with KMS disabled, dmesg shows:

glhexen2:3772 freeing invalid memtype d0102000-d0112000
glhexen2:3772 freeing invalid memtype d0112000-d0122000
glhexen2:3772 freeing invalid memtype d0122000-d0132000
glhexen2:3772 freeing invalid memtype d0132000-d0142000
glhexen2:3772 freeing invalid memtype d0142000-d0152000
glhexen2:3772 freeing invalid memtype d0152000-d0162000
glhexen2:3772 freeing invalid memtype d0162000-d0172000
glhexen2:3772 freeing invalid memtype d0172000-d0182000
glhexen2:3772 freeing invalid memtype d0182000-d0192000
glhexen2:3772 freeing invalid memtype d0192000-d01a2000
glhexen2:3772 freeing invalid memtype d01a2000-d01b2000
glhexen2:3772 freeing invalid memtype d01b2000-d01c2000
glhexen2:3772 freeing invalid memtype d01c2000-d01d2000
glhexen2:3772 freeing invalid memtype d01d2000-d01e2000
glhexen2:3772 freeing invalid memtype d01e2000-d01f2000
glhexen2:3772 freeing invalid memtype d01f2000-d0202000
glhexen2:3772 freeing invalid memtype d0202000-d0212000
glhexen2:3772 freeing invalid memtype d0212000-d0222000
glhexen2:3772 freeing invalid memtype d0222000-d0232000
glhexen2:3772 freeing invalid memtype d0232000-d0242000
glhexen2:3772 freeing invalid memtype d0242000-d0252000
glhexen2:3772 freeing invalid memtype d0252000-d0262000
glhexen2:3772 freeing invalid memtype d0262000-d0272000
glhexen2:3772 freeing invalid memtype d0272000-d0282000
glhexen2:3772 freeing invalid memtype d0282000-d0292000
glhexen2:3772 freeing invalid memtype d0292000-d02a2000
glhexen2:3772 freeing invalid memtype d02a2000-d02b2000
glhexen2:3772 freeing invalid memtype d02b2000-d02c2000
glhexen2:3772 freeing invalid memtype d02c2000-d02d2000
glhexen2:3772 freeing invalid memtype d02d2000-d02e2000
glhexen2:3772 freeing invalid memtype d02e2000-d02f2000
glhexen2:3772 freeing invalid memtype d02f2000-d0302000

Exactly the same output appears again, but for the firefox process. With KMS, dmesg shows nothing.

In conclusion, whether KMS is enabled or disabled, the system doesn't freeze anymore. But performance with YouTube (Flash) and OpenGL is worse with KMS than without it. It's up to you to decide whether I should file a new bug report. If so, feel free to close this bug.

And last but not least, many thanks to all of you for your time and work.
Comment 12 Michel Dänzer 2010-04-02 10:56:40 UTC
(In reply to comment #11)
> > Does this kernel patch help with/out KMS?
> Yes, it helped a lot! Now EXANoComposite is completely unnecessary with and
> without KMS. Same goes for GLHexen2.

Great, thanks for testing. I've submitted the patch to Dave Airlie and will resolve this report as fixed when it lands in Linus' tree or at least one of the drm trees.


Your other issues would need to be tracked separately, but note that KMS provides additional features which don't come for free, and it's known that some things still perform worse with it than without. We're working on fixing this; an additional report would probably only be useful if you can at least provide profiling information.
Comment 13 Michel Dänzer 2010-04-06 04:21:02 UTC
Fix landed in drm-linus, should get into 2.6.34 and hopefully 2.6.33.y.
