I haven't found a sequence of actions that reproduces this reliably. After startx, dragging any window back and forth across the screen several times will eventually hang the system, but there is no telling exactly when. When the system hangs, I get this message on stdout: exaCopyDirty: Pending damage region empty! I will attach the xorg.conf I am using and the Xorg.0.log produced.
Created attachment 25609: xorg.conf
http://www.gentoo-cn.org/~zhangle/Xorg.0.log.bz2 The Xorg.0.log is too large, I bzip2'ed it.
BTW, please take a look at this picture: http://www.gentoo-cn.org/~zhangle/2009-05-08-045027_1024x600_scrot.png There are some black lines which should not exist. Any idea?
Does it work right when using XAA instead of EXA? You might also try: > Option "NoAccel" or > Option "MCLK" "0Hz"
I have tried XAA, NoAccel and 0Hz, but no luck. :( I also found that this "exaCopyDirty: Pending damage region empty!" message sometimes appears before the crash.
(In reply to comment #5) > I have tried XAA, NoAccel and 0Hz, but no luck, :(. > > Also I found sometimes, this "exaCopyDirty: Pending damage region empty!" > happened before the crash. > It is very strange that it crashes with acceleration disabled. Could you attach the logs generated with a high verbosity level (like "-logverbose 7") and with both config options set? Like: > Option "NoAccel" > Option "MCLK" "0Hz" Thanks.
Is there any specific application that tends to trigger it? Does it also happen when you leave the X server idling without any clients, not even a window manager? To discard that this is a duplicate of #15898 (It doesn't sound like the same, but just to be sure), you might try something like: $ xset dpms force off; sleep 1; xset dpms force on Does that hang your computer?
(In reply to comment #7) > To discard that this is a duplicate of #15898 (It doesn't sound like the same, > but just to be sure), you might try something like: > $ xset dpms force off; sleep 1; xset dpms force on > > Does that hang your computer? I have tried this at depth 24 and at depth 16; in both cases the system does not hang. However, "xset dpms force on" does not turn the screen back on; I have to touch the keyboard to wake it.
(In reply to comment #7) > Is there any specific application that tends to trigger it? Does it also happen > when you leave the X server idling without any clients, not even a window > manager? I haven't found any specific application which tends to trigger it. If I start X alone, then the system won't hang. Also I found if I mount the partition where log resides as sync, it became harder to hang the system.
(In reply to comment #9) > (In reply to comment #7) > > Is there any specific application that tends to trigger it? Does it also happen > > when you leave the X server idling without any clients, not even a window > > manager? > > I haven't found any specific application which tends to trigger it. > If I start X alone, then the system won't hang. > > Also I found if I mount the partition where log resides as sync, it became > harder to hang the system. > You could try to use something like "x11perf -repeat 1 -all" to find out if there is an specific request that tends to hang your server. I think I would do it with acceleration disabled (Option "NoAccel" set in the "Device" section) and no wm running, to avoid other interactions.
(In reply to comment #10) > I think I would do it with acceleration disabled (Option "NoAccel" set in the > "Device" section) and no wm running, to avoid other interactions. Yes, I have been using these two options: > Option "NoAccel" > Option "MCLK" "0Hz"
(In reply to comment #10) > (In reply to comment #9) > > (In reply to comment #7) > > > Is there any specific application that tends to trigger it? Does it also happen > > > when you leave the X server idling without any clients, not even a window > > > manager? > > > > I haven't found any specific application which tends to trigger it. > > If I start X alone, then the system won't hang. > > > > Also I found if I mount the partition where log resides as sync, it became > > harder to hang the system. > > > > You could try to use something like "x11perf -repeat 1 -all" to find out if > there is an specific request that tends to hang your server.
X got a bus error. The last three lines were:
160000 reps @ 0.0497 msec ( 20100.0/sec): Char in 80-char rgb core line (Charter 10)
22400 reps @ 0.2502 msec ( 4000.0/sec): Char in 30-char rgb core line (Charter 24)
480000 reps @ 0.0106 msec ( 94500.0/sec): Char in 80-char rgb core line (Courier 12)
I will try to fix it and run x11perf again.
I found this test tends to crash X when 16 depth is used: x11perf -repeat 1 -scroll10 Sometimes bus error, sometimes segfault, sometimes completely hang. It works well if 24 depth is used.
(In reply to comment #13) > I found this test tends to crash X when 16 depth is used: > > x11perf -repeat 1 -scroll10 > > Sometimes bus error, sometimes segfault, sometimes completely hang. > > It works well if 24 depth is used. > You might be able to get a backtrace if you run the server with gdb and it doesn't completely hang. That might be useful.
(In reply to comment #14) > You might be able to get a backtrace if you run the server with gdb and it > doesn't completely hang. That might be useful.
I forgot to save the core file I generated previously. The current core file doesn't give any useful information:
(gdb) bt
#0 0xbe754028 in ?? ()
#1 0xbe754028 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
The previous core file did not make sense to me. It segfaulted because it tried to load a value from 0xffffffff. The correct address is in memory, as can be seen from the core file, and the assembly instruction sequence that loads the address seems correct too. I don't know why the loaded address would become 0xffffffff.
BTW, I found that this command is enough to crash X:
x11perf -scroll10
I have also tried the -sync option and found that, whether the system hangs completely or just gets a bus error/segfault, it happens at an XCopyArea call in the DoScroll function. I will trace into that function to find out what's really going on. With depth 24, x11perf -scroll10 works well.
I'm experiencing the same problem. I compiled xorg-server 1.6.5 with debug symbols and got this after attaching gdb to X and running x11perf -scroll10:
Program received signal SIGBUS, Bus error.
0x2b1914a4 in _fbGetWindowPixmap (pWindow=Cannot access memory at address 0x4002e02c ) at fbscreen.c:88
88 }
(gdb) bt
#0 0x2b1914a4 in _fbGetWindowPixmap (pWindow=Cannot access memory at address 0x4002e02c ) at fbscreen.c:88
Cannot access memory at address 0x4002e054
(gdb) info threads
* 1 Thread 0x2b138000 (LWP 2421) 0x2b1914a4 in _fbGetWindowPixmap (pWindow=Cannot access memory at address 0x4002e02c ) at fbscreen.c:88
But I think the real problem happens earlier: although the bus error seems to trigger instantaneously, the display is already wrong by then (there is a wrong pattern of approximately 8x80 pixels in the bottom left of the picture). Also, sometimes the system just hangs completely, usually without anything on the kernel console, although once I got this: spurious 8259A interrupt: IRQ13. I wouldn't be surprised if, on this hardware (Lemote Yeeloong), IRQ13 were bound to the SM712 (I still have to check this).
Another time I was luckier:
Program received signal SIGSEGV, Segmentation fault.
0x10216c58 in getDrawableDamageRef (pDrawable=0x10378ed8) at damage.c:92
92 if (pScreen->GetWindowPixmap
(gdb) bt
#0 0x10216c58 in getDrawableDamageRef (pDrawable=0x10378ed8) at damage.c:92
#1 0x10217ac0 in damageRegionProcessPending (pDrawable=0x10378ed8) at damage.c:386
#2 0x1021a3fc in damageCopyArea (pSrc=0x10378ed8, pDst=0x10378ed8, pGC=0x10379bd8, srcx=10, srcy=23, width=10, height=10, dstx=10, dsty=10) at damage.c:951
#3 0x1004bb90 in ProcCopyArea (client=0x10376980) at dispatch.c:1575
#4 0x10047a18 in Dispatch () at dispatch.c:456
#5 0x100211b4 in main (argc=4, argv=0x7fda8154, envp=0x7fda8168) at main.c:397
Also:
(gdb) l
87 if (pDrawable->type == DRAWABLE_WINDOW)
88 {
89 ScreenPtr pScreen = pDrawable->pScreen;
90
91 pPixmap = 0;
92 if (pScreen->GetWindowPixmap
93 #ifdef ROOTLESS_WORKAROUND
94 && ((WindowPtr)pDrawable)->viewable
95 #endif
96 )
(gdb) p *pDrawable
$1 = {type = 0 '\000', class = 1 '\001', depth = 16 '\020', bitsPerPixel = 16 '\020', id = 2097153, x = 3, y = 0, width = 600, height = 600, pScreen = 0x10302e28, serialNumber = 11}
(gdb) p pScreen
$2 = (ScreenPtr) 0xffffffff
This is strange, since it was compiled with -O0!?
So : (gdb) p *pDrawable->pScreen $4 = {myNum = 0, id = 0, width = 1024, height = 600, mmWidth = 270, mmHeight = 158, numDepths = 7, rootDepth = 16 '\020', allowedDepths = 0x103031d8, rootVisual = 33, defColormap = 32, minInstalledCmaps = 1, maxInstalledCmaps = 1, backingStoreSupport = 0 '\000', saveUnderSupport = 0 '\000', whitePixel = 65535, blackPixel = 0, rgf = 0, GCperDepth = {0x1030c730, 0x1030c818, 0x1030c900, 0x1030ca10, 0x1030cb30, 0x1030cc50, 0x1030cd70, 0x1030ce90, 0x0}, PixmapPerDepth = {0x1030cfb0}, devPrivate = 0x1030c1c0, numVisuals = 3, visuals = 0x1030c468, CloseScreen = 0x101a2ff0 <compCloseScreen>, QueryBestSize = 0x2b4a1300 <fbQueryBestSize>, SaveScreen = 0x2b431030 <SMI_SaveScreen>, GetImage = 0x1016992c <miSpriteGetImage>, GetSpans = 0x10169c10 <miSpriteGetSpans>, PointerNonInterestBox = 0x1015e888 <miPointerPointerNonInterestBox>, SourceValidate = 0x10169f58 <miSpriteSourceValidate>, CreateWindow = 0x101a64b0 <compCreateWindow>, DestroyWindow = 0x101a66d4 <compDestroyWindow>, PositionWindow = 0x101a4e44 <compPositionWindow>, ChangeWindowAttributes = 0x101a3300 <compChangeWindowAttributes>, RealizeWindow = 0x101a5070 <compRealizeWindow>, UnrealizeWindow = 0x101a5158 <compUnrealizeWindow>, ValidateTree = 0x1016d93c <miValidateTree>, PostValidateTree = 0, WindowExposures = 0x1015051c <miWindowExposures>, PaintWindowBackground = 0, PaintWindowBorder = 0, CopyWindow = 0x101a5e40 <compCopyWindow>, ClearToBackground = 0x10175880 <miClearToBackground>, ClipNotify = 0x101a5240 <compClipNotify>, RestackWindow = 0, CreatePixmap = 0x2b49f348 <fbCreatePixmap>, DestroyPixmap = 0x101c482c <ShmDestroyPixmap>, SaveDoomedAreas = 0, RestoreAreas = 0, ExposeCopy = 0, TranslateBackingStore = 0, ClearBackingStore = 0, DrawGuarantee = 0, BackingStoreFuncs = { SaveAreas = 0, RestoreAreas = 0, SetClipmaskRgn = 0, GetImagePixmap = 0, GetSpansPixmap = 0}, RealizeFont = 0x2b4a12a8 <fbRealizeFont>, UnrealizeFont = 0x2b4a12d4 <fbUnrealizeFont>, ConstrainCursor = 0x1015e764 
<miPointerConstrainCursor>, CursorLimits = 0x102124f4 <AnimCurCursorLimits>, DisplayCursor = 0x102129b0 <AnimCurDisplayCursor>, RealizeCursor = 0x10212d54 <AnimCurRealizeCursor>, UnrealizeCursor = 0x10212e54 <AnimCurUnrealizeCursor>, RecolorCursor = 0x10212fd0 <AnimCurRecolorCursor>, SetCursorPosition = 0x10212c2c <AnimCurSetCursorPosition>, CreateGC = 0x10217dd4 <damageCreateGC>, CreateColormap = 0x100a7710 <CMapCreateColormap>, DestroyColormap = 0x100b3358 <DGADestroyColormap>, InstallColormap = 0x101a31d0 <compInstallColormap>, UninstallColormap = 0x100b3584 <DGAUninstallColormap>, ListInstalledColormaps = 0x2b47b1a0 <fbListInstalledColormaps>, StoreColors = 0x100a79cc <CMapStoreColors>, ResolveColor = 0x2b47b2a4 <fbResolveColor>, BitmapToRegion = 0x2b49f4e4 <fbPixmapToRegion>, SendGraphicsExpose = 0x1014fe34 <miSendGraphicsExpose>, BlockHandler = 0x101a3514 <compBlockHandler>, WakeupHandler = 0x1005a4dc <NoopDDA>, blockData = 0x0, wakeupData = 0x0, devPrivates = 0x10306110, CreateScreenResources = 0x10168a24 <miCreateScreenResources>, ModifyPixmapHeader = 0x10168630 <miModifyPixmapHeader>, GetWindowPixmap = 0x2b4a1448 <_fbGetWindowPixmap>, SetWindowPixmap = 0x1021f0e4 <damageSetWindowPixmap>, GetScreenPixmap = 0x10168d2c <miGetScreenPixmap>, SetScreenPixmap = 0x10168d58 <miSetScreenPixmap>, pScratchPixmap = 0x0, totalPixmapSize = 48, MarkWindow = 0x10175bc0 <miMarkWindow>, MarkOverlappedWindows = 0x10175c70 <miMarkOverlappedWindows>, ChangeSaveUnder = 0, PostChangeSaveUnder = 0, MoveWindow = 0x101a5618 <compMoveWindow>, ResizeWindow = 0x101a583c <compResizeWindow>, GetLayerWindow = 0x10177810 <miGetLayerWindow>, HandleExposures = 0x10175fd8 <miHandleValidateExposures>, ReparentWindow = 0x101a5c28 <compReparentWindow>, SetShape = 0x1017783c <miSetShape>, ChangeBorderWidth = 0x101a5a50 <compChangeBorderWidth>, MarkUnrealizedWindow = 0x10177db8 <miMarkUnrealizedWindow>, DeviceCursorInitialize = 0x1015e9dc <miPointerDeviceInitialize>, DeviceCursorCleanup = 
0x1015eb54 <miPointerDeviceCleanup>} (just in case it's informative). The crash location is not always the same, nor is the error (segfault, bus error or hang); sometimes the whole register set seems bogus (including the stack pointer)... Note that here too it works well if the depth is 24 or 8. I didn't find any other config parameter that seems to have any influence.
If someone wants to have a look, I've got a core file now. Download it here: http://happyleptic.org/~rixed/X_siliconmotion_core.tgz It crashed again after ProcCopyArea(). Also, while playing with this bug I got another spurious IRQ13; not sure it's related. I also changed the x11perf test windows so that they are not clipped and moved them away from the screen border (and thus from the video memory borders), but this didn't change anything. I also noticed that the bug seems to hang the machine if you start X at 16bpp after a fresh reboot, but not if you first run X at 32bpp, then kill it and restart at 16bpp. You can then do many experiments before hanging the host. I keep looking for clues...
It would be interesting to know what happens with some other DDX, e.g. xf86-video-dummy.
As for the strange register that gets the value 0xffffffff from nowhere: could the 16bpp code trigger an exception that's not handled properly by the kernel or the hardware, thus leaving -1 in a register? I can't think of any other explanation for now.
(In reply to comment #19) > It would be interesting to know what happens with some other DDX, e.g. > xf86-video-dummy. > I already tried the fb driver at 16bpp and it worked all right.
All the crashes I was able to catch with gdb seemed to be related to the stack being corrupted asynchronously. For instance, 0xffffffff is present on the stack instead of a valid address in this kind of instruction sequence:
lw v0,4(s8) ; load the word at offset 4 from the stack frame pointed to by s8 into v0
lw v0,0(v0) ; dereference it, successfully
...do other reads...
lw v0,4(s8) ; reload the same value, which was not modified
lw v0,0(v0) ; <- segfault here, v0=0xffffffff
and you can see with gdb's x command that $s8+4 actually holds 0xffffffff. So my theory is that, when copying an area, some pixels get written to stack addresses (0xffffffff would be a pixel, and the stack appears to be not very far from the /dev/mem mapping used to access video memory), but with different cache settings, so that the former, correct values keep being read from the stack up to the point where the cache gets refreshed? I've looked through the xorg-server CopyArea code for some other kind of asynchronous memory write but couldn't find any (no hardware ROP, no DMA transfer and no pixman).
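One way to see why 0xffffffff could plausibly be pixel data: at depth 16 the screen's whitePixel is 65535 (0xffff, as shown in the pScreen dump earlier in this thread), so two adjacent white pixels occupy one 32-bit word as 0xffffffff. A minimal sketch of that arithmetic in C; the helper name pack_two_pixels16 is made up for illustration and is not from xorg-server:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine two 16bpp pixels into the 32-bit word they
 * occupy together in memory. Which pixel lands in the low half depends on
 * endianness; for two identical pixels the result is the same either way. */
static inline uint32_t pack_two_pixels16(uint16_t first, uint16_t second)
{
    return ((uint32_t)second << 16) | (uint32_t)first;
}
```

For example, pack_two_pixels16(0xffff, 0xffff) yields 0xffffffff. This only shows that the value is consistent with the pixel theory; it does not prove where the stray write comes from.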
An even simpler case of magic:
Program received signal SIGSEGV, Segmentation fault.
0x2ac4a118 in pixman_region_selfcheck (reg=0x4008802c) at pixman-region.c:2415
2415 if ((reg->extents.x1 > reg->extents.x2) ||
(gdb) disassemble pixman_region_selfcheck
Dump of assembler code for function pixman_region_selfcheck:
0x2ac4a0f8 <pixman_region_selfcheck+0>: addiu sp,sp,-64
0x2ac4a0fc <pixman_region_selfcheck+4>: sd s8,56(sp)
0x2ac4a100 <pixman_region_selfcheck+8>: move s8,sp
0x2ac4a104 <pixman_region_selfcheck+12>: lui a1,0x6
0x2ac4a108 <pixman_region_selfcheck+16>: addu a1,a1,t9
0x2ac4a10c <pixman_region_selfcheck+20>: addiu a1,a1,30328
0x2ac4a110 <pixman_region_selfcheck+24>: sw a0,32(s8)
0x2ac4a114 <pixman_region_selfcheck+28>: lw v0,32(s8)
0x2ac4a118 <pixman_region_selfcheck+32>: lh v1,0(v0)
So we segfaulted here, right after storing a0 (the reg address) onto the stack, re-reading it into v0, then trying to dereference v0. Now guess what:
(gdb) info registers
zero at v0 v1
R0 0000000000000000 ffffffffcfffffff 000000004008802c 00000000103888e8
a0 a1 a2 a3
R4 00000000103888e8 000000002acb1770 00000000103894b8 000000000000000a
v0 is 4008802c, hence the segfault, but a0 is correct (103888e8), and the stack location holds:
(gdb) x $s8+32
0x7f978ce0: 0x4008802c
The wrong one! (Notice how the wrong values appearing on the stack are always either 0xffffffff or 0x400XXXXX.) So either "sw a0,32(s8)" is bogus, or some vicious interrupt changed the stack just after it, or we jumped to this instruction from somewhere else. Note that the other values on the stack seem all right and the stack frame is OK, so I do not believe we came here after a misplaced jump. This bug is driving me mad. I'm not familiar with MIPS or X11, but this looks like magic to me.
If an X11 guru would check this code in fbBlt I would be thankful, since the scroll10 test works when I comment it out:

    if (alu == GXcopy && pm == FB_ALLONES && !reverse &&
        !(srcX & 7) && !(dstX & 7) && !(width & 7)) {
        int i;
        CARD8 *src = (CARD8 *) srcLine;
        CARD8 *dst = (CARD8 *) dstLine;

        srcStride *= sizeof(FbBits);
        dstStride *= sizeof(FbBits);
        width >>= 3;
        src += (srcX >> 3);
        dst += (dstX >> 3);

        if (!upsidedown)
            for (i = 0; i < height; i++)
                MEMCPY_WRAPPED(dst + i * dstStride, src + i * srcStride, width);
        else
            for (i = height - 1; i >= 0; i--)
                MEMCPY_WRAPPED(dst + i * dstStride, src + i * srcStride, width);

        return;
    }

The code that follows handles this case correctly, as well as the more general case, anyway. I'm currently running the whole x11perf test suite, but in any case I managed to display the reddit homepage in Firefox without any destroyed glyphs (wrong glyphs were the main manifestation of the bug for me), so it's clearly better without this code.
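To make the fast-path condition concrete: the `>> 3` conversions in the quoted code suggest that srcX, dstX and width are bit counts there, so the test `!(x & 7)` asks whether each one is a whole number of bytes. The sketch below restates that arithmetic on its own; all three function names are made up for illustration and none of them exists in xorg-server:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fast-path test: true when all three
 * bit quantities are whole bytes, so a plain byte copy is possible. */
static inline bool is_byte_aligned(int src_bits, int dst_bits, int width_bits)
{
    return !(src_bits & 7) && !(dst_bits & 7) && !(width_bits & 7);
}

/* Convert a pixel coordinate to a bit offset for a given bits-per-pixel. */
static inline int pixel_to_bits(int x, int bpp)
{
    assert(x >= 0 && bpp > 0);
    return x * bpp;
}

/* Convert a byte-aligned bit offset to a byte offset (the ">> 3" step). */
static inline size_t bits_to_bytes(int bits)
{
    assert(bits >= 0 && (bits & 7) == 0);
    return (size_t)bits >> 3;
}
```

With the values from the backtrace above (x = 10, width = 10, 16 bits per pixel), pixel_to_bits(10, 16) is 160 bits, which is byte aligned and converts to a byte offset of 20. At any bits-per-pixel that is a multiple of 8, every pixel offset passes this test.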
OK, so now I have recompiled my whole xorg server with the usual CFLAGS, with the above code commented out. I still get the wrong glyphs (black lines on top of some characters), but only with EXA acceleration. If I choose NoAccel or XAA it works all right (so far). With EXA some fonts are corrupted and after a while the computer freezes. With XAA, so far so good.
The cited code apparently causes the problem because the source and destination lines being copied are allowed to overlap. Replacing memcpy with memmove in fb/fb.h solves the problem with NoAccel. I still have to look for what's causing the EXA acceleration mode to crash. I noticed that the memory barrier for MIPS is merely a sequence of nops. I have to check the manual, but I guess it's inappropriate for the Loongson.
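The overlap problem can be shown in isolation. In standard C, memcpy has undefined behaviour when the source and destination ranges overlap, while memmove is defined for that case. A minimal sketch, using a small byte buffer as a stand-in for one framebuffer scanline; scroll_left is a made-up helper and not code from fb/fb.h:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift the contents of one scanline left by
 * `shift` bytes, within the same buffer. The source range
 * [shift, len) and the destination range [0, len - shift) overlap
 * whenever shift < len - shift, so memmove is required here;
 * memcpy would be undefined behaviour. */
static inline void scroll_left(unsigned char *line, size_t len, size_t shift)
{
    assert(shift <= len);
    memmove(line, line + shift, len - shift);
}
```

Calling scroll_left on the eight bytes 0..7 with a shift of 2 leaves 2 3 4 5 6 7 6 7 in the buffer: the first six bytes are the shifted data and the last two are untouched.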
I'm seeing this too on my Yeeloong, SM712. Thanks to rixed for putting 'spurious 8259A interrupt' in his post, that's how I found this bug. I've started using Depth 24 (which requires putting Virtual 1024 600 into xorg.conf so that it allocates a 1024x600 framebuffer instead of 1024x1024 which won't fit into the VRAM given the tiny amount needed by the cursor).
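The arithmetic behind the Virtual 1024 600 setting can be checked quickly. The sketch assumes 4 bytes per pixel at depth 24 and 4 MiB of video memory; both figures are assumptions for illustration and are not stated in this report, and the `reserved` parameter is a made-up stand-in for whatever the cursor needs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed VRAM size (4 MiB); check the Xorg log for the real value. */
#define ASSUMED_VRAM_BYTES ((size_t)4 * 1024 * 1024)

/* Bytes needed for a width x height framebuffer at the given bytes per pixel. */
static inline size_t fb_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* True when the framebuffer plus `reserved` extra bytes fit in the assumed VRAM. */
static inline bool fits_in_vram(size_t fb, size_t reserved)
{
    return fb + reserved <= ASSUMED_VRAM_BYTES;
}
```

Under these assumptions a 1024x1024 framebuffer takes exactly 4194304 bytes, which leaves no room at all for anything else, while 1024x600 takes 2457600 bytes and fits comfortably.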
Since XAA has been removed as of xorg-server-1.12.99, only EXA remains, and I experience lock-ups when using any GTK2 application (Loongson MIPS, 16 bpp depth). The common visual symptom is that some widgets (icons, shaded buttons) get corrupted. I suspected pixman, but downgrading pixman or disabling the Loongson-optimized paths in pixman did not help. So I think it has something to do with the SMI driver.
Same problem here. I'll try the solution from Comment #24.
After some tests, I realized that I have found another issue. It can be reproduced with
x11perf -copywinpix100
after which the system hangs completely. The last log lines from X are:
> SMI_SetupForSolidFill color=0000FFFF rop=03
DPR14 = 0000FFFF
DPR34 = FFFFFFFF
DPR38 = FFFFFFFF
< SMI_SetupForSolidFill
> SMI_SubsequentSolidFillRect x=3 y=0 w=600 h=600
DPR04 = 00030000
DPR08 = 02580258
DPR0C = 800000F0
< SMI_SubsequentSolidFillRect
> SMI_AccelSync
< SMI_AccelSync
(complete hang)
The last SMI_AccelSync returns, so what happens next? Because the system is completely hung, neither SysRq nor networking works, so I can't find out which function runs next. Running at depth 24 doesn't have this problem.
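The register values in this log are consistent with a simple packing of two 16-bit fields per register: DPR04 = 00030000 matches x=3, y=0, and DPR08 = 02580258 matches w=600, h=600 (600 is 0x258). The sketch below only reproduces that arithmetic; the function name is hypothetical, and the layout is inferred from the logged values, not taken from the driver source:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place one 16-bit value in the upper half of a
 * 32-bit register word and another in the lower half. */
static inline uint32_t pack_hi_lo16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}
```

pack_hi_lo16(3, 0) gives 0x00030000 and pack_hi_lo16(600, 600) gives 0x02580258, matching the two logged values, so the rectangle parameters at least appear to have reached the registers intact.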
I'm using an old version of X with XAA, so the problem can't be caused by EXA.
When the kernel hangs, I got [ 1222.876000] spurious 8259A interrupt: IRQ0. from netconsole.
I attached gdb to a running X server, then set a breakpoint and used
while 1
shell sleep 0.1
next
end
to see which line of code crashes the machine.
437 in /var/tmp/portage/x11-base/xorg-server-1.11.4-r3/work/xorg-server-1.11.4/dix/dispatch.c
438 in /var/tmp/portage/x11-base/xorg-server-1.11.4-r3/work/xorg-server-1.11.4/dix/dispatch.c
439 in /var/tmp/portage/x11-base/xorg-server-1.11.4-r3/work/xorg-server-1.11.4/dix/dispatch.c
(hang)
Code:
437 result = XaceHookDispatch(client, client->majorOp);
438 if (result == Success)
439 result = (* client->requestVector[client->majorOp])(client);
So a memory dereference crashes the machine. This may mean that some areas of memory were corrupted by the buggy driver. Luckily, this time a kernel panic occurred instead of a complete hang:
[ 583.648000] spurious 8259A interrupt: IRQ0.
[ 583.656000] spurious 8259A interrupt: IRQ13.
[ 583.660000] spurious 8259A interrupt: IRQ6.
[ 583.664000] CPU 0 Unable to handle kernel paging request at virtual address 0000000000000020, epc == 0000000000000020, ra == ffffffff80279ae4
[ 583.664000] Oops[#1]:
[ 583.664000] CPU: 0 PID: 235 Comm: X Not tainted 3.14.4-yeeloong-gaizi+ #10
[ 583.664000] task: 98000000bfbb6db0 ti: 98000000b8370000 task.ti: 98000000b8370000
[ 583.664000] $ 0 : 0000000000000000 ffffffffcfffffff 0000000000000020 0000000000000000
[ 583.664000] $ 4 : 0000000000000008 98000000bf360000 0000000000000020 0000000000000002
[ 583.664000] $ 8 : 0000000000000001 000000000000ffff 000000000000ffff 000000007628fafc
[ 583.664000] $12 : 00000000140044e0 000000001000001f 00000000760b4dfa ffffffffffffffff
[ 583.664000] $16 : 0000000000000000 0000000000000000 0000000000000008 ffffffff80b61cc0
[ 583.664000] $20 : 000000000000ffff ffffffff80adef58 0000000000000008 ffffffff80ba0000
[ 583.664000] $24 : 00000000000004b0 0000000000000800
[ 583.664000] $28 : 98000000b8370000 98000000b8373e00 98000000bf345080 ffffffff80279ae4
[ 583.664000] Hi : 0000000000000000
[ 583.664000] Lo : 0000000000000000
[ 583.664000] epc : 0000000000000020 0x20
[ 583.664000] Not tainted
[ 583.664000] ra : ffffffff80279ae4 handle_irq_event_percpu+0x6c/0x220
[ 583.664000] Status: 140044e2 KX SX UX KERNEL EXL
[ 583.664000] Cause : 10008408
[ 583.664000] BadVA : 0000000000000020
[ 583.664000] PrId : 00006303 (ICT Loongson-2)
[ 583.664000] Modules linked in: netconsole configfs arc4 rtl8187 eeprom_93cx6 led_class mac80211 cfg80211 rfkill loongson2_cpufreq psmouse snd_cs5535audio 8139too mii snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore ipv6
[ 583.664000] Process X (pid: 235, threadinfo=98000000b8370000, task=98000000bfbb6db0, tls=000000007712b4a0)
[ 583.664000] Stack : ffffffff80b61cc0 000000000000ffff 000000000000ffff 000000000000ffff 000000000000ffff ffff00000000ffff 000000000000ffff 000000001089c848 00000000000001e0 ffffffff80279d04 000000000000ffff 000000000000ffff ffffffff80b61cc0 ffffffff8027d278 0000000000000000 ffffffff80279154 ffff000000000000 ffffffff80209310 0000000000000008 ffffffff80203f38 00000000000001e0 ffffffff80206f40 0000000000000000 ffffffffcfffffff 000000007628f382 00000000760b4a10 000000007628f302 00000000760b4950 00000000000000c2 0000000000000002 0000000000000001 000000000000ffff 000000000000ffff 000000007628fafc 000000000000001d 00000000000000c8 00000000760b4dfa ffffffffffffffff 0000000000000000 000000000000ffff ...
[ 583.664000] Call Trace:
[ 583.664000] [<ffffffff80279d04>] handle_irq_event+0x6c/0xa8
[ 583.664000] [<ffffffff8027d278>] handle_level_irq+0xb0/0x170
[ 583.664000] [<ffffffff80279154>] generic_handle_irq+0x5c/0x80
[ 583.664000] [<ffffffff80209310>] do_IRQ+0x18/0x28
[ 583.664000] [<ffffffff80203f38>] mach_irq_dispatch+0x50/0x78
[ 583.664000] [<ffffffff80206f40>] ret_from_irq+0x0/0x4
[ 583.664000]
[ 583.664000] Code: (Bad address in epc)
[ 583.664000]
[ 583.672000] ---[ end trace b4344c9ded821fc4 ]---
[ 583.672000] Kernel panic - not syncing: Fatal exception in interrupt
BTW, the spurious IRQ numbers reported before the hang are almost random:
[ 168.972000] spurious 8259A interrupt: IRQ3.
[ 251.516000] spurious 8259A interrupt: IRQ13.
[ 254.968000] spurious 8259A interrupt: IRQ3.
[ 254.968000] spurious 8259A interrupt: IRQ10.
[ 260.968000] spurious 8259A interrupt: IRQ6.
[ 46.704000] spurious 8259A interrupt: IRQ13.
[ 47.940000] spurious 8259A interrupt: IRQ10.
It is not a problem with the XAA implementation: it also crashes if I use NoAccel.
> SMI_SaveScreen
> SMILynx_DisplayPowerManagementSet
< SMILynx_DisplayPowerManagementSet
< SMI_SaveScreen
dmesg:
[ 65.168000] spurious 8259A interrupt: IRQ13.
[ 65.172000] spurious 8259A interrupt: IRQ0.
With EXA, I get corrupted fonts when running any GTK application, and the kernel receives a spurious IRQ; the system then hangs completely after a while. However, the x11perf -copywinpix100 test works fine. It is very strange.
Another kernel panic:
[ 1556.616000] spurious 8259A interrupt: IRQ0.
[ 1556.620000] CPU 0 Unable to handle kernel paging request at virtual address 000000000000a400, epc == 000000000000a400, ra == ffffffff80279ae4
[ 1556.620000] Oops[#1]:
[ 1556.620000] CPU: 0 PID: 4661 Comm: X Not tainted 3.14.4-yeeloong-gaizi+ #10
[ 1556.620000] task: 98000000b86dba80 ti: 98000000bfef8000 task.ti: 98000000bfef8000
[ 1556.620000] $ 0 : 0000000000000000 ffffffffcfffffff 000000000000a400 0000000000000000
[ 1556.620000] $ 4 : 0000000000000008 98000000bf360000 0000000000000020 0000000000000006
[ 1556.620000] $ 8 : 0000000000000001 000000000000ffff 000000000000ffff 00000000764cf0d0
[ 1556.620000] $12 : 00000000140044e0 000000001000001f 0000000076474a76 ffffffffffffffff
[ 1556.620000] $16 : 0000000000000000 0000000000000000 0000000000000008 ffffffff80b61cc0
[ 1556.620000] $20 : 000000000000ffff ffffffff80adef58 0000000000000008 ffffffff80ba0000
[ 1556.620000] $24 : 00000000000004b0 0000000000000800
[ 1556.620000] $28 : 98000000bfef8000 98000000bfefbe00 98000000bf345080 ffffffff80279ae4
[ 1556.620000] Hi : 0000000000000000
[ 1556.620000] Lo : 0000000000000000
[ 1556.620000] epc : 000000000000a400 0xa400
[ 1556.620000] Not tainted
[ 1556.620000] ra : ffffffff80279ae4 handle_irq_event_percpu+0x6c/0x220
[ 1556.620000] Status: 140044e2 KX SX UX KERNEL EXL
[ 1556.620000] Cause : 10008408
[ 1556.620000] BadVA : 000000000000a400
[ 1556.620000] PrId : 00006303 (ICT Loongson-2)
[ 1556.620000] Modules linked in: ctr ccm netconsole configfs arc4 rtl8187 eeprom_93cx6 led_class mac80211 cfg80211 psmouse loongson2_cpufreq rfkill 8139too mii snd_cs5535audio snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore ipv6
[ 1556.620000] Process X (pid: 4661, threadinfo=98000000bfef8000, task=98000000b86dba80, tls=00000000773e74a0)
[ 1556.620000] Stack : ffffffff80b61cc0 000000000000ffff 000000000000ffff 000000000000ffff 000000000000ffff ffff00000000ffff 000000000000ffff 00000000102e4848 0000000000000000 ffffffff80279d04 000000000000ffff 000000000000ffff ffffffff80b61cc0 ffffffff8027d278 0000000000000000 ffffffff80279154 0000000000000000 ffffffff80209310 0000000000000008 ffffffff80203f38 0000000000000000 ffffffff80206f40 0000000000000000 ffffffffcfffffff 00000000764ce992 0000000076474688 00000000764ce8d2 00000000764745c8 00000000000000c6 0000000000000006 0000000000000001 000000000000ffff 000000000000ffff 00000000764cf0d0 000000000000002e 00000000000000c8 0000000076474a76 ffffffffffffffff 0000000000000000 000000000000ffff ...
[ 1556.620000] Call Trace:
[ 1556.620000] [<ffffffff80279d04>] handle_irq_event+0x6c/0xa8
[ 1556.620000] [<ffffffff8027d278>] handle_level_irq+0xb0/0x170
[ 1556.620000] [<ffffffff80279154>] generic_handle_irq+0x5c/0x80
[ 1556.620000] [<ffffffff80209310>] do_IRQ+0x18/0x28
[ 1556.620000] [<ffffffff80203f38>] mach_irq_dispatch+0x50/0x78
[ 1556.620000] [<ffffffff80206f40>] ret_from_irq+0x0/0x4
[ 1556.620000]
[ 1556.620000] Code: (Bad address in epc)
[ 1556.620000]
[ 1556.624000] ---[ end trace 6928418bef65e208 ]---
[ 1556.624000] Kernel panic - not syncing: Fatal exception in interrupt
It isn't a memory corruption issue, I think. I said the system hangs at:
438 if (result == Success)
439 result = (* client->requestVector[client->majorOp])(client);
In fact, it hangs at:
upsidedown=0, bitplane=0, closure=0x0) at /var/tmp/portage/x11-base/xorg-server-1.12.4-r2/work/xorg-server-1.12.4/fb/fbcopy.c:79
79 fbGetDrawable(pDstDrawable, dst, dstStride, dstBpp, dstXoff, dstYoff);
81 while (nbox--) {
83 if (pm == FB_ALLONES && alu == GXcopy && !reverse && !upsidedown) {
86 srcBpp, dstBpp, (pbox->x1 + dx + srcXoff),
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
87 (pbox->y1 + dy + srcYoff), (pbox->x1 + dstXoff),
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
87 (pbox->y1 + dy + srcYoff), (pbox->x1 + dstXoff),
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
88 (pbox->y1 + dstYoff), (pbox->x2 - pbox->x1),
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
88 (pbox->y1 + dstYoff), (pbox->x2 - pbox->x1),
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
88 (pbox->y1 + dstYoff), (pbox->x2 - pbox->x1),
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
89 (pbox->y2 - pbox->y1)))
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
89 (pbox->y2 - pbox->y1)))
85 ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
Code:

    if (pm == FB_ALLONES && alu == GXcopy && !reverse && !upsidedown) {
        if (!pixman_blt
            ((uint32_t *) src, (uint32_t *) dst, srcStride, dstStride,
             srcBpp, dstBpp, (pbox->x1 + dx + srcXoff),
             (pbox->y1 + dy + srcYoff), (pbox->x1 + dstXoff),
             (pbox->y1 + dstYoff), (pbox->x2 - pbox->x1),
             (pbox->y2 - pbox->y1)))
            goto fallback;
        else
            goto next;
    }

So I think something is wrong with pixman. I recompiled pixman with:
USE="-loongson2f" emerge pixman
and all the problems went away.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-siliconmotion/issues/2.