Bug 36952

Summary: segfault in crackberg xscreensaver on ATI RS482 in mesa git running kms & gallium
Product: Mesa
Reporter: dri tester <writemeanddie>
Component: Drivers/Gallium/r300
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
Version: git
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  backtrace
  possible fix
  fix for crackberg performance and segfaulting on rs482

Description dri tester 2011-05-07 17:20:42 UTC
system 32 bit gentoo
video card ATI RS482 configured to share 128MB system RAM
kernel 2.6.38.4
xorg-server 1.10.1
mesa git commit 37058c3497850f452bdaf70a5dda07ee4840b6b9 May 4 2011
xf86-video-ati git commit 62a4cd180fe884dca24586d453395472516e6496 May 4, 2011
mesa compile flags: classic egl gallium llvm nptl openvg shared-dricore video_cards_r300 video_cards_radeon
xscreensaver 5.12

When running the 'crackberg' xscreensaver on mesa from git, the screen freezes for 20-60 seconds (I think memory fills up and the system starts swapping) and then the program exits with a segfault.  The message in /var/log/kern.log is:
kernel: crackberg[30926]: segfault at b5a30000 ip b653b2c1 sp bf943e30 error 4.
If I downgrade to mesa 7.10.2-r1, the screensaver works as expected.  If I run with ums, the screensaver also works as expected.  Other screensavers are affected as well; I can compile a list if necessary.

I can collect any other logs or information that would help.
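
For readers of the kernel line above, here is a minimal sketch of how the fields can be decoded. It assumes the usual x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and that the kernel prints the code in hexadecimal. The function name `decode_segfault` and the regular expression are illustrative only and are not part of any tool mentioned in this report.

```python
import re

# x86 page-fault error-code bits, as reported after "error" in the kernel line.
PF_PROT = 0x1   # 0: page not present, 1: protection violation
PF_WRITE = 0x2  # 0: read access, 1: write access
PF_USER = 0x4   # 0: fault in kernel mode, 1: fault in user mode

# Hypothetical pattern for lines like the one quoted in the description.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return the parsed fields of a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumed to be printed in hex
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "address": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "cause": "protection violation" if err & PF_PROT else "page not present",
    }


info = decode_segfault(
    "kernel: crackberg[30926]: segfault at b5a30000 ip b653b2c1 sp bf943e30 error 4."
)
print(info["program"], info["mode"], info["access"], info["cause"])
```

Under these assumptions, "error 4" reads as a user-mode read of an address with no page mapped, which is consistent with a plain invalid-pointer read but does not by itself identify the faulting code.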
Comment 1 Iaroslav Andrusyak 2011-05-08 08:35:00 UTC
Created attachment 46449
backtrace
Comment 2 Iaroslav Andrusyak 2011-05-08 08:35:24 UTC
I have the same problem, and I also have an ATI RS482, but I think LLVM is at fault: without LLVM, or with LLVM < 2.9, everything works fine; with LLVM > 2.8, openarena, urbanterror and some xscreensavers segfault. See https://bugs.freedesktop.org/show_bug.cgi?id=36738
I made a backtrace with crackberg.
Comment 3 Alex Deucher 2011-05-08 19:11:01 UTC
Can you bisect mesa?
Comment 4 dri tester 2011-05-09 06:22:56 UTC
But if I downgrade to mesa 7.10.2, I am still using llvm 2.9 and the
crackberg screensaver works correctly, so that would point to mesa-git
being the problem for me.  I cannot compile mesa-git without llvm, as the
build process halts and complains that r300 gallium requires llvm to
build.  There are three gentoo patches that get applied on top of the
mesa 7.10.2-r1 build; I haven't looked at what they do yet.

(In reply to comment #2)
> I have the same problem, and I also have an ATI RS482, but I think LLVM is at
> fault: without LLVM, or with LLVM < 2.9, everything works fine; with LLVM > 2.8,
> openarena, urbanterror and some xscreensavers segfault. See
> https://bugs.freedesktop.org/show_bug.cgi?id=36738
> I made a backtrace with crackberg.
Comment 5 dri tester 2011-05-09 06:28:19 UTC
I will try to bisect.  I have to figure out how to install mesa-git without the ebuild (and without messing up my dependencies).  Or is there a way to pass commit tags to 'emerge' or specify in the ebuild?  Either way, once I figure that out I will bisect mesa.
Comment 6 Chris Bandy 2011-05-09 09:52:02 UTC
(In reply to comment #5)
> is there a way to pass commit tags to 'emerge' or specify in the ebuild?

The x11 overlay has live ebuilds. Use EGIT_COMMIT to specify a commit to check out.

http://devmanual.gentoo.org/eclass-reference/git-2.eclass/index.html
Comment 7 Iaroslav Andrusyak 2011-05-09 09:57:59 UTC
(In reply to comment #4)

Can you run crackberg under gdb and post the backtrace? Or run openarena with
mesa-git + llvm-2.9?
Comment 8 dri tester 2011-05-13 03:45:55 UTC
(In reply to comment #3)
> Can you bisect mesa?

Bisecting led me to this commit:

b10bff11350014e1bb49b0ce18704fdd66e850c0 is the first bad commit
commit b10bff11350014e1bb49b0ce18704fdd66e850c0
Author: Marek Olšák <maraeo@gmail.com>
Date:   Sat Dec 25 14:39:07 2010 +0100

    r300g: increase the size of upload buffers

On the way here, the bisect steps that failed did not all behave the same way.  All of them had abysmal frame rates, and CPU usage shot up to 100% on one CPU (it was only 50% prior to this commit).  Some continued to run; others locked up the X server and required an Alt-SysRq-B to reboot.
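
As a rough illustration of what the bisection above does, here is a minimal sketch of a binary search for the first bad commit. It assumes a linear history with a single good-to-bad transition, which is a simplification of what `git bisect` handles; the commit names, the `is_bad` callback and the culprit position are all hypothetical.

```python
def first_bad_commit(commits, is_bad):
    """Find the first bad commit by repeated halving.

    commits: list ordered oldest to newest, where commits[0] is known good
             and commits[-1] is known bad.
    is_bad:  callable that builds/tests one commit and returns True if it fails.
    Returns (first bad commit, number of commits tested).
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return commits[bad], tested


# Hypothetical linear history of 1000 commits with the regression at c613.
history = [f"c{i}" for i in range(1000)]
culprit, tested = first_bad_commit(history, lambda c: int(c[1:]) >= 613)
print(culprit, "found after testing", tested, "commits")
```

Each step trusts the verdict for one commit, so a single misjudged step (for example, a commit that fails for an unrelated reason) steers the search to the wrong commit. That is worth keeping in mind when the failing steps show varying behaviour, as they did here.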
Comment 9 Marek Olšák 2011-05-13 15:24:00 UTC
Created attachment 46696
possible fix

Could you try this patch?
Comment 10 dri tester 2011-05-14 17:55:51 UTC
(In reply to comment #9)
> Created an attachment (id=46696)
> possible fix
> 
> Could you try this patch?

With the patch applied to the latest mesa-git, crackberg runs at the expected frame rate and CPU usage for a few seconds (5-15 seconds), then segfaults.  The segfault might be caused by a different commit.  Would it be worth bisecting again, applying the possible-fix patch above before each compile, to see if I can find another bad commit?
Comment 11 Marek Olšák 2011-05-14 18:08:28 UTC
It would be better to experiment with the patch and try smaller buffer sizes. Please let me know when you find a size which works.
Comment 12 dri tester 2011-05-16 06:14:40 UTC
(In reply to comment #11)
> It would be better to experiment with the patch and try smaller buffer sizes.
> Please let me know when you find a size which works.

The current mesa-git value of R300_MAX_DRAW_VBO_SIZE is 1024 * 1024, and your patch drops it to 512 * 1024.  I tried 64 * 1024 (the value that worked before the commit I bisected above): crackberg works, the animation is smooth and CPU usage is 50% on one CPU, but it segfaults after 13-18 seconds.  I really think another bug is being uncovered here.  I do not know which buffer size to recommend for R300_MAX_DRAW_VBO_SIZE; 512 * 1024, as in your attached patch, worked as well as 64 * 1024.  I assume the value must be a power of 2, or are there other values to test?  Should I test the values between 64 * 1024 and 512 * 1024 for completeness?
Comment 13 dri tester 2011-05-16 06:49:15 UTC
(In reply to comment #12)

Disregard my last comment; I was totally incorrect.  Setting R300_MAX_DRAW_VBO_SIZE to 32 * 1024 fixed crackberg completely for me.  I will attach the working patch as 0003-r300g-allocate-smaller-upload-buffers.patch.  Thanks, Marek.
Comment 14 dri tester 2011-05-16 06:52:09 UTC
Created attachment 46763
fix for crackberg performance and segfaulting on rs482
Comment 15 dri tester 2011-05-16 06:56:39 UTC
(In reply to comment #14)
> Created an attachment (id=46763)
> fix for crackberg performance and segfaulting on rs482

Hmm, the latest patch with a value of 32 * 1024 fixes crackberg, but now antmaze gives me:
radeon: The kernel rejected CS, see dmesg for more information.

and dmesg says:
radeon 0000:01:05.0: (PW 2) Vertex array 0 need 8436 dwords have 8192 dwords
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
radeon 0000:01:05.0: (PW 2) Vertex array 0 need 8436 dwords have 8192 dwords
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
radeon 0000:01:05.0: (PW 2) Vertex array 0 need 8436 dwords have 8192 dwords
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
radeon 0000:01:05.0: (PW 2) Vertex array 0 need 8436 dwords have 8192 dwords
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
radeon 0000:01:05.0: (PW 2) Vertex array 0 need 8436 dwords have 8192 dwords
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
radeon 0000:01:05.0: (PW 2) Vertex array 0 need 8436 dwords have 8192 dwords
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
antmaze[13259]: segfault at 4 ip b5b57000 sp bfd14fcc error 6

I will play with the values some more.
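
A quick arithmetic check of the dmesg numbers above, as a sketch only. It assumes a dword is 4 bytes and that the "have 8192 dwords" figure reflects the 32 * 1024-byte value of R300_MAX_DRAW_VBO_SIZE; the report does not confirm that link, but the numbers match. The other sizes are the ones tried earlier in this thread.

```python
DWORD_BYTES = 4  # assumption: one dword is 4 bytes


def dwords(size_bytes):
    """Number of whole dwords that fit in a buffer of size_bytes."""
    return size_bytes // DWORD_BYTES


needed_dwords = 8436                      # "need 8436 dwords" from dmesg
needed_bytes = needed_dwords * DWORD_BYTES

# Buffer sizes mentioned in this report, in bytes.
candidates = {
    "32 * 1024": 32 * 1024,
    "64 * 1024": 64 * 1024,
    "512 * 1024": 512 * 1024,
    "1024 * 1024": 1024 * 1024,
}

for label, size in candidates.items():
    have = dwords(size)
    verdict = "fits" if have >= needed_dwords else "too small"
    print(f"{label:>11} bytes = {have:>6} dwords: {verdict}")

shortfall = needed_bytes - 32 * 1024
print("vertex array needs", needed_bytes, "bytes,", shortfall, "more than 32 * 1024")
```

Under these assumptions, the antmaze vertex array (33744 bytes) exceeds a 32 * 1024-byte buffer by 976 bytes, while every larger size tried in this thread would hold it. This says nothing about why crackberg misbehaves with the larger sizes.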
Comment 16 Marek Olšák 2011-05-16 17:19:46 UTC
It's possible the bisection went wrong and the upload buffer size has nothing to do with this issue.
Comment 17 Tomasz P. 2012-12-04 16:55:28 UTC
I just installed xscreensaver and ran crackberg on an RV350 / Radeon 9600 AGP. Memory usage is normal, nothing special. No warnings in the logs. The animation works without any artefacts or screen flashing.
