Bug 43893 - X is crashing very often X: radeon_cs_gem.c:181: cs_gem_write_reloc: Assertion `boi->space_accounted' failed.
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: xf86-video-ati maintainers
QA Contact: Xorg Project Team
Reported: 2011-12-16 08:38 UTC by Matthias
Modified: 2012-02-09 01:12 UTC (History)

Attachments
xorg.log of crash machine (30.54 KB, text/plain)
2011-12-16 09:07 UTC, Matthias
kernel log (49.55 KB, text/plain)
2011-12-16 09:07 UTC, Matthias
libdrm_radeon: account for write domain or read domains, not both (1.04 KB, patch)
2011-12-20 10:38 UTC, Michel Dänzer
xf86-video-ati: Specify read domains or write domain for space check, not both (1.10 KB, patch)
2011-12-20 10:39 UTC, Michel Dänzer

Description Matthias 2011-12-16 08:38:42 UTC
Hi there,

I encountered the following error:
X: radeon_cs_gem.c:181: cs_gem_write_reloc: Assertion `boi->space_accounted' failed.

on my x86_64 Arch Linux box. I verified this with the newest Arch Xorg packages, with the xf86-video-ati driver replaced by the git version.

Steps to reproduce (very simple):
Start a default xinit session (the one without any fancy window manager, just an xterm in it).
In the xterm, run "yaourt -S milkytracker", which produces some coloured output.
When yaourt asks whether I want to see the PKGBUILD, X crashes.

I attached gdb to X (with the radeon driver's symbols not stripped):


failed to revalidate
X: radeon_cs_gem.c:181: cs_gem_write_reloc: Assertion `boi->space_accounted' failed.

Program received signal SIGABRT, Aborted.
0x00007fd09ade5905 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007fd09ade5905 in raise () from /lib/libc.so.6
#1  0x00007fd09ade6d7b in abort () from /lib/libc.so.6
#2  0x00007fd09adde74e in ?? () from /lib/libc.so.6
#3  0x00007fd09adde7f2 in __assert_fail () from /lib/libc.so.6
#4  0x00007fd098b035eb in ?? () from /usr/lib/libdrm_radeon.so.1
#5  0x00007fd098dbcfd3 in r600_cp_set_surface_sync.isra.0 ()
   from /usr/lib/xorg/modules/drivers/radeon_drv.so
#6  0x00007fd098dc7c29 in r600_finish_op () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#7  0x00007fd098dbba0a in R600Copy () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#8  0x00007fd0982e1148 in ?? () from /usr/lib/xorg/modules/libexa.so
#9  0x00007fd0982e147f in ?? () from /usr/lib/xorg/modules/libexa.so
#10 0x00000000005456ca in miCopyRegion ()
#11 0x0000000000545bc2 in miDoCopy ()
#12 0x00007fd0982df786 in ?? () from /usr/lib/xorg/modules/libexa.so
#13 0x00000000004fa78c in ?? ()
#14 0x000000000042fcf3 in ?? ()
#15 0x0000000000433c59 in ?? ()
#16 0x0000000000422e8a in ?? ()
#17 0x00007fd09add214d in __libc_start_main () from /lib/libc.so.6
#18 0x000000000042317d in _start ()



I have a M4A78LT-M-LE, BIOS 0507    02/23/2010 board
with CPU0: AMD Athlon(tm) II X2 235e Processor stepping 02.

Don't ask me about the onboard video chip; here is what dmesg says:
radeon 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    3.968925] radeon 0000:01:05.0: setting latency timer to 64
[    3.969170] [drm] initializing kernel modesetting (RS780 0x1002:0x9616 0x1043:0x8388).
[    3.969187] [drm] register mmio base: 0xFEAF0000
[    3.969188] [drm] register mmio size: 65536
[    3.969815] ATOM BIOS: B27722_RS780C
[    3.969831] radeon 0000:01:05.0: VRAM: 32M 0x00000000C0000000 - 0x00000000C1FFFFFF (32M used)
[    3.969833] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
[    3.970062] [drm] Detected VRAM RAM=32M, BAR=32M
[    3.970065] [drm] RAM width 32bits DDR
[    3.970121] [TTM] Zone  kernel: Available graphics memory: 1010832 kiB.
[    3.970123] [TTM] Initializing pool allocator.
[    3.970146] [drm] radeon: 32M of VRAM memory ready
[    3.970148] [drm] radeon: 512M of GTT memory ready.
[    3.970167] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    3.970168] [drm] Driver supports precise vblank timestamp query.
[    3.970186] [drm] radeon: irq initialized.
[    3.970190] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    3.971148] [drm] Loading RS780 Microcode



I also installed XFCE4 and hit the same problem under very light usage. I did not try to find reproduction steps there, but it might be possible.

Does nobody else use this kind of graphics chip under Linux?! The other bug reports I found for this crash are not relevant; the only relevant one was closed as fixed in April 2011.
(http://comments.gmane.org/gmane.linux.debian.devel.x/102978)
Comment 1 Matthias 2011-12-16 08:59:38 UTC
I managed to compile libdrm and the radeon driver with debug symbols (and with -O0). Here is a new backtrace:

failed to revalidate
X: radeon_cs_gem.c:181: cs_gem_write_reloc: Assertion `boi->space_accounted' failed.

Program received signal SIGABRT, Aborted.
0x00007fe4d0943905 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007fe4d0943905 in raise () from /lib/libc.so.6
#1  0x00007fe4d0944d7b in abort () from /lib/libc.so.6
#2  0x00007fe4d093c74e in ?? () from /lib/libc.so.6
#3  0x00007fe4d093c7f2 in __assert_fail () from /lib/libc.so.6
#4  0x00007fe4ce5e9856 in cs_gem_write_reloc (cs=0x8e5410, bo=0xc756d0, read_domain=2, 
    write_domain=0, flags=0) at radeon_cs_gem.c:181
#5  0x00007fe4ce5eb380 in radeon_cs_write_reloc (cs=0x8e5410, bo=0xc756d0, read_domain=2, 
    write_domain=0, flags=0) at radeon_cs.c:20
#6  0x00007fe4ce8f4dcf in r600_cp_set_surface_sync (pScrn=0x8d8e40, ib=0x0, sync_type=8388608, 
    size=48, mc_addr=0, bo=0xc756d0, rdomains=2, wdomain=0) at r6xx_accel.c:333
#7  0x00007fe4ce8f92f2 in r600_set_vtx_resource (pScrn=0x8d8e40, ib=0x0, res=0x7fff1b876570, 
    domain=2) at r6xx_accel.c:583
#8  0x00007fe4ce905429 in r600_finish_op (pScrn=0x8d8e40, vtx_size=16) at r6xx_accel.c:1267
#9  0x00007fe4ce8e9240 in R600DoCopy (pScrn=0x8d8e40) at r600_exa.c:514
#10 0x00007fe4ce8e9dd3 in R600Copy (pDst=0x92f000, srcX=4, srcY=17, dstX=4, dstY=4, w=480, h=299)
    at r600_exa.c:752
#11 0x00007fe4cddc7148 in ?? () from /usr/lib/xorg/modules/libexa.so
#12 0x00007fe4cddc747f in ?? () from /usr/lib/xorg/modules/libexa.so
#13 0x00000000005456ca in miCopyRegion ()
#14 0x0000000000545bc2 in miDoCopy ()
#15 0x00007fe4cddc5786 in ?? () from /usr/lib/xorg/modules/libexa.so
#16 0x00000000004fa78c in ?? ()
#17 0x000000000042fcf3 in ?? ()
#18 0x0000000000433c59 in ?? ()
#19 0x0000000000422e8a in ?? ()
#20 0x00007fe4d093014d in __libc_start_main () from /lib/libc.so.6
#21 0x000000000042317d in _start ()



Also, the reproduction steps do not work 100% of the time. Sometimes it crashes only after I perform an action inside yaourt, so some more output in the xterm seems to be needed, but mostly it crashes without me doing anything except calling yaourt.
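
For readers decoding the backtrace above: the read_domain, write_domain, rdomains and wdomain arguments are GEM memory-domain bitmasks. The following is a minimal sketch of decoding them, assuming the domain values from the kernel's radeon_drm.h (CPU = 0x1, GTT = 0x2, VRAM = 0x4). The helper domain_mask_to_string is hypothetical and is not part of libdrm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Domain bits, assumed to match the kernel's radeon_drm.h. */
#define RADEON_GEM_DOMAIN_CPU  0x1
#define RADEON_GEM_DOMAIN_GTT  0x2
#define RADEON_GEM_DOMAIN_VRAM 0x4

/* Hypothetical helper (not a libdrm function): render a domain
 * bitmask as text such as "GTT" or "GTT|VRAM"; 0 becomes "none". */
static const char *domain_mask_to_string(unsigned int mask, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } names[] = {
        { RADEON_GEM_DOMAIN_CPU,  "CPU"  },
        { RADEON_GEM_DOMAIN_GTT,  "GTT"  },
        { RADEON_GEM_DOMAIN_VRAM, "VRAM" },
    };
    size_t used = 0;
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    if (mask == 0) {
        snprintf(buf, len, "none");
        return buf;
    }
    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        int n;

        if (!(mask & names[i].bit))
            continue;
        /* Separate entries with '|' once something has been written. */
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated; stop appending */
        used += (size_t)n;
    }
    return buf;
}
```

Under that assumption, read_domain=2 with write_domain=0 in frame #4 would decode as a buffer that is only read, from GTT.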
Comment 2 Alex Deucher 2011-12-16 09:02:48 UTC
What version of the ddx (xf86-video-ati) are you using?  Please attach your xorg log and dmesg output.
Comment 3 Matthias 2011-12-16 09:07:14 UTC
Created attachment 54504
xorg.log of crash machine
Comment 4 Matthias 2011-12-16 09:07:50 UTC
Created attachment 54505
kernel log
Comment 5 Matthias 2011-12-16 09:09:23 UTC
Attached dmesg and Xorg log.

The version is git aacbd629b02cbee3f9e6a0ee452b4e3f21376bd3
(also reproducible with extra/xf86-video-ati 6.14.3-1).

I also tried the latest libdrm from git. The latest backtrace is with both packages from git, as are the attached logs.
Comment 6 Michel Dänzer 2011-12-20 10:38:08 UTC
Created attachment 54605
libdrm_radeon: account for write domain or read domains, not both
Comment 7 Michel Dänzer 2011-12-20 10:39:40 UTC
Created attachment 54606
xf86-video-ati: Specify read domains or write domain for space check, not both
Comment 8 Michel Dänzer 2011-12-20 10:41:27 UTC
I've attached two patches, one of which should fix the root problem in libdrm_radeon, the other of which should avoid the problem in xf86-video-ati. Can you verify that either patch alone fixes / avoids the problem?
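
The rule named in the two patch titles (account for the write domain or the read domains, not both) can be illustrated with a toy model. This is only a sketch under stated assumptions: the toy_bo and toy_sizes types, the toy_account and toy_write_reloc functions, and the encoding of space_accounted are all hypothetical and do not reproduce the real libdrm_radeon or xf86-video-ati code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy buffer object (hypothetical, not libdrm's struct). */
struct toy_bo {
    uint32_t size;
    /* 0 = not accounted yet. Toy encoding: the write domain, or the
     * read domains shifted left by 16 when there is no write domain. */
    uint32_t space_accounted;
};

/* Toy running totals for one command stream (hypothetical). */
struct toy_sizes {
    uint32_t op_read;
    uint32_t op_write;
};

/* Account a buffer exactly once: by its write domain if it has one,
 * otherwise by its read domains. It is never counted under both. */
static void toy_account(struct toy_bo *bo, struct toy_sizes *sizes,
                        uint32_t read_domains, uint32_t write_domain)
{
    if (bo->space_accounted)
        return; /* already accounted */
    if (write_domain) {
        sizes->op_write += bo->size;
        bo->space_accounted = write_domain;
    } else if (read_domains) {
        sizes->op_read += bo->size;
        bo->space_accounted = read_domains << 16;
    }
}

/* Stand-in for writing a relocation: returns 0 on success and -1 in
 * the situation where the reported assertion would fire, that is,
 * when the buffer was never accounted. */
static int toy_write_reloc(const struct toy_bo *bo)
{
    return bo->space_accounted ? 0 : -1;
}
```

In this model, calling toy_write_reloc on a buffer before toy_account returns -1, which corresponds to the `boi->space_accounted' assertion in the report; after accounting with either a write domain or read domains, it returns 0.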
Comment 9 Matthias 2011-12-20 10:45:29 UTC
Thank you very much.

Unfortunately I'm not at home until January 9th. I will try both patches (combined and individually) and reply here then.

Maybe someone else is affected too and can try them earlier than I can? Three weeks is a long time :)
Comment 10 Michel Dänzer 2012-01-17 02:24:14 UTC
(In reply to comment #9)
> Unfortunately I'm not at home until January 9th. I will try both patches
> (combined and individually) and reply here then.

Did you get a chance to test the patches?
Comment 11 Matthias 2012-02-03 09:20:56 UTC
Hi,

I tested both patches, alone and combined, just today; sorry for the delay.

I used the "old" Arch packages as a base. The patches applied without any problems, and each one alone fixed the bug.

I did not encounter any other problems during my test session (worked about 2 hours with firefox/graphical terminal/milkytracker on the system - this would not have been possible for even one minute without the fix).

Thank you very much!
Comment 12 Matthias 2012-02-03 09:22:49 UTC
Ah, sorry,
I used these versions from today's Arch Linux repository, in case it matters:
libdrm-2.4.28
xf86-video-ati-6.14.3
Comment 13 Michel Dänzer 2012-02-08 06:07:10 UTC
Fix pushed, thanks for testing.
Comment 14 aceman 2012-02-08 11:02:28 UTC
Was this the whole X server crashing?
Could it be related to bug 33038? There, though, only a single program was crashing.
Comment 15 Michel Dänzer 2012-02-09 01:12:31 UTC
(In reply to comment #14)
> Could it be related to bug 33038?

No, that was a bug in r600c, which is now dead in favour of r600g.