Bug 53423 - The MGA driver and EXA acceleration causes server crash
Summary: The MGA driver and EXA acceleration causes server crash
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/mga (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: medium major
Assignee: Tilman Sauerbeck
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-12 20:26 UTC by Göran Uddeborg
Modified: 2018-08-10 20:43 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
xorg.conf file used during the test (645 bytes, text/plain)
2012-08-12 20:26 UTC, Göran Uddeborg
no flags Details
Xorg log after the crash (33.39 KB, text/x-log)
2012-08-12 20:28 UTC, Göran Uddeborg
no flags Details

Description Göran Uddeborg 2012-08-12 20:26:10 UTC
Created attachment 65480 [details]
xorg.conf file used during the test

The MGA driver uses XAA acceleration by default, but that acceleration was removed in Fedora 17.  Without any acceleration the display have several issues and is significantly slower, so I tried to use EXA acceleration instead.  But that causes the server to crash as soon as any client connects.

I originally reported this to Fedora's bugzilla, bug 826000, but it was suggested it might be better to report it here instead.

I'm using the most recent packages for Fedora 17:

xorg-x11-server-Xorg-1.12.2-4.fc17.x86_64
xorg-x11-drv-mga-1.4.13-19.20120104git9223c44a7.fc17.x86_64

To reproduce, I gdb on Xorg, and start the process with ":1" as the only argument.  The screen now gets black.  Then I run the import command from the ImageMagick set, using "import -display :1 -window root x.png".  That always causes the server to crash.

The backtrace in gdb looks like this:
#0  __memcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:273
#1  0x00007f9fe20a62fe in memcpy (__len=6400, __src=0x0, __dest=0x273c950) at /usr/include/bits/string3.h:52
#2  mgaDownloadFromScreen (pSrc=<optimized out>, x=<optimized out>, y=<optimized out>, w=<optimized out>, h=<optimized out>, dst=0x273c950 "", dst_pitch=6400) at mga_exa.c:745
#3  0x00007f9fe1c7a007 in exaGetImage (pDrawable=0x22dfcc0, x=0, y=0, w=1600, h=10, format=2, planeMask=4294967295, d=0x273c950 "") at exa_accel.c:1290
#4  0x00000000005600f3 in miSpriteGetImage (pDrawable=0x22dfcc0, sx=0, sy=0, w=1600, h=10, format=2, planemask=4294967295, pdstLine=0x273c950 "") at misprite.c:413
#5  0x00000000004c99f0 in compGetImage (pDrawable=0x22dfcc0, sx=<optimized out>, sy=<optimized out>, w=<optimized out>, h=<optimized out>, format=<optimized out>, planemask=4294967295, pdstLine=0x273c950 "") at compinit.c:148
#6  0x000000000043161a in DoGetImage (planemask=<optimized out>, height=1200, width=<optimized out>, y=<optimized out>, x=0, drawable=<optimized out>, format=2, client=0x2709e00, im_return=<optimized out>) at dispatch.c:2128
#7  ProcGetImage (client=0x2709e00) at dispatch.c:2205
#8  0x000000000043444a in Dispatch () at dispatch.c:428
#9  0x0000000000423485 in main (argc=2, argv=0x7fff9e23c288, envp=<optimized out>) at main.c:288
Comment 1 Göran Uddeborg 2012-08-12 20:28:08 UTC
Created attachment 65481 [details]
Xorg log after the crash
Comment 2 Robert Jacobs 2013-10-18 22:54:50 UTC
With the removal of XAA, I was testing EXA and reproduced this by accident on x86. 

Testing mga HEAD against debian xserver-xorg-core 1.14.3 yields a whole slew of instances of the exa methods being called with inputs that are 0, which the mga EXA driver obligingly dereferences and segfaults.

Some of my initial discovery process is at bugs.debian.org/726002, where I put it even though that's an unrelated problem. (oops)

Breakpoint 2, mgaDownloadFromScreen (pSrc=0xb6122008, x=0, y=0, w=176, h=16, 
    dst=0xb6122088 "", dst_pitch=1024) at mga_exa.c:730
732	    char *src = pSrc->devPrivate.ptr;
devPrivate here contains only 0s. 

Well, ok, fine, I go and add a check to see if src is NULL here and if
so it returns FALSE, and now it seems to work fine.

I made an annotated version, it works correctly about 9% of the
time... is that bad? Do you have any ideas why the pSrc->devPrivate
structure would be {0,0,0,0} ? 

Here's another random one:

Program received signal SIGSEGV, Segmentation fault.
0xb7a3a46c in mgaCheckComposite (op=3, pSrcPict=0x806d4b70, pMaskPict=0x805ffd70, 
    pDstPict=0x806d4d98) at mga_exa.c:359
359	    MGAPtr pMga = xf86ScreenToScrn(pSrcPict->pDrawable->pScreen)->driverPrivate;
(gdb) p pSrcPict->pDrawable 
$8 = (DrawablePtr) 0x0

I can be paranoid and add checks that it's never dereferencing NULL,
but that seems wrong. Surely that means something significant is
missing? Why would the acceleration methods be being passed what seems
to be garbage input? 

Is this a problem in the mga EXA driver? In the main EXA methods library? I don't even know how to start tracing back why these things are 0.
Comment 3 Tormod Volden 2013-10-31 22:55:33 UTC
I have applied a commit e9109a0b04695d6971c94abe271dda2dc1a5e886 to git which gets rid of the use of Pixmap devPrivate.ptr and should help for many of NULL pointers seen here. Please test and report back, thanks in advance.
Comment 4 Robert Jacobs 2013-11-01 05:16:43 UTC
Context: (I already emailed this to Tormod)
So I sat down and struggled through this until I found what the problem is; unsurprisingly it seems to be an implicit API change.

As we know, MGA's EXA driver expects devPrivate.ptr to point to somewhere useful.  Pidgin reliably triggered the crash when something tried to mgaDownloadFromScreen XID 0x20006D, which contained drawable 0x20006B, which was made by exaCreatePixmap_classic.

exaCreatePixmap_classic, at least with mga, ultimately calls fbCreatePixmap, which populates devPrivate.ptr.

And then exaCreatePixmap_classic takes the value that was in devPrivate.ptr, moves it to pExaPixmap->sys_ptr, and sets devPrivate.ptr to NULL.

This change was 5bf3f0fd in xserver.git, made in late 2008. No-one detected it at the time because MGA EXA performance was inferior to XAA performance, and XAA was default.

Then I went silent for about a week. (Sorry)

I've tested the patch against xserver 1.14.99.3. I haven't tested it against HEAD because the DRI3 stuff made building it a mess.

In any case, it seems to cause infinite recursion?

I've got the following backtrace:
#0  exaDoMoveOutPixmap (migrate=0xbfffd2e0) at exa_migration_classic.c:401
#1  0xb78f3a25 in exaDoMigration_classic (pixmaps=0xbfffd2e0, npixmaps=1, can_accel=0) at exa_migration_classic.c:719
#2  0xb78f16be in exaDoMigration (pixmaps=0xbfffd2e0, npixmaps=1, can_accel=0) at exa.c:1134
#3  0xb78f3b29 in exaPrepareAccessReg_classic (pPixmap=0xb40b4008, index=1, pReg=0xbfffd330) at exa_migration_classic.c:758
#4  0xb79005e3 in ExaFallbackPrepareReg (pDrawable=0xb40b4008, pGC=0x0, x=0, y=0, width=1, height=1, index=1, checkReads=0)
    at exa_unaccel.c:198
#5  0xb790160d in ExaCheckGetImage (pDrawable=0xb40b4008, x=0, y=0, w=1, h=1, format=2, planeMask=4294967295,
    d=0xbfffd4a9 "\313,\bF%\220\267\b@\v\264\240\006\223\267\370\324\377\277\366\223\222\267\b@\v\264@@ط\001") at exa_unaccel.c:407
#6  0xb78f965e in exaGetImage (pDrawable=0xb40b4008, x=0, y=0, w=1, h=1, format=2, planeMask=4294967295,
    d=0xbfffd4a9 "\313,\bF%\220\267\b@\v\264\240\006\223\267\370\324\377\277\366\223\222\267\b@\v\264@@ط\001") at exa_accel.c:1299
#7  0x082143b2 in miSpriteGetImage (pDrawable=0xb40b4008, sx=0, sy=0, w=1, h=1, format=2, planemask=4294967295,
    pdstLine=0xbfffd4a9 "\313,\bF%\220\267\b@\v\264\240\006\223\267\370\324\377\277\366\223\222\267\b@\v\264@@ط\001")
    at misprite.c:405
#8  0x0812acf8 in compGetImage (pDrawable=0xb40b4008, sx=0, sy=0, w=1, h=1, format=2, planemask=4294967295,
    pdstLine=0xbfffd4a9 "\313,\bF%\220\267\b@\v\264\240\006\223\267\370\324\377\277\366\223\222\267\b@\v\264@@ط\001")
    at compinit.c:148
#9  0xb790266b in exaGetPixmapFirstPixel (pPixmap=0xb40b4008) at exa_unaccel.c:728
#10 0xb79293f6 in mgaDownloadFromScreen (pSrc=0xb40b4008, x=0, y=0, w=176, h=16, dst=0xb40b4088 "", dst_pitch=1024) at mga_exa.c:732
#11 0xb78f2b5e in exaCopyDirty (migrate=0xbfffd680, pValidDst=0xb40b406c, pValidSrc=0xb40b4078,
    transfer=0xb79293c1 <mgaDownloadFromScreen>, fallback_index=1, sync=0xb78f15d7 <exaWaitSync>) at exa_migration_classic.c:220
#12 0xb78f2d70 in exaCopyDirtyToSys (migrate=0xbfffd680) at exa_migration_classic.c:285
#13 0xb78f307e in exaDoMoveOutPixmap (migrate=0xbfffd680) at exa_migration_classic.c:404
#14 0xb78f3a25 in exaDoMigration_classic (pixmaps=0xbfffd680, npixmaps=1, can_accel=0) at exa_migration_classic.c:719
#15 0xb78f16be in exaDoMigration (pixmaps=0xbfffd680, npixmaps=1, can_accel=0) at exa.c:1134
#16 0xb78f3b29 in exaPrepareAccessReg_classic (pPixmap=0xb40b4008, index=1, pReg=0x82d1b5c) at exa_migration_classic.c:758
#17 0xb7901e06 in ExaPrepareCompositeReg (pScreen=0x82d0cd0, op=12 '\f', pSrc=0x84999f8, pMask=0x0, pDst=0x8499908, xSrc=0, ySrc=0,
    xMask=0, yMask=0, xDst=0, yDst=1, width=9, height=10) at exa_unaccel.c:560
#18 0xb790209b in ExaCheckComposite (op=12 '\f', pSrc=0x84999f8, pMask=0x0, pDst=0x8499908, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0,
    yDst=1, width=9, height=10) at exa_unaccel.c:608
#19 0xb78fdb3b in exaCompositeRects (op=12 '\f', pSrc=0x84999f8, pMask=0x0, pDst=0x8499908, nrect=16, rects=0xbfffd960)
    at exa_render.c:604
#20 0xb78fa9ca in exaGlyphsToMask (pMask=0x8499908, buffer=0xbfffd95c) at exa_glyphs.c:613
#21 0xb78fb340 in exaGlyphs (op=3 '\003', pSrc=0x8499250, pDst=0x8499128, maskFormat=0x82d1168, xSrc=0, ySrc=0, nlist=-1,
    list=0xbffff32c, glyphs=0xbfffef4c) at exa_glyphs.c:822
#22 0x0818d63d in damageGlyphs (op=3 '\003', pSrc=0x8499250, pDst=0x8499128, maskFormat=0x82d1168, xSrc=0, ySrc=0, nlist=3,
    list=0xbffff308, glyphs=0xbfffef08) at damage.c:572
#23 0x0817a131 in CompositeGlyphs (op=3 '\003', pSrc=0x8499250, pDst=0x8499128, maskFormat=0x82d1168, xSrc=0, ySrc=0, nlist=3,
    lists=0xbffff308, glyphs=0xbfffef08) at glyph.c:558
#24 0x081847cf in ProcRenderCompositeGlyphs (client=0x83ce400) at render.c:1390
#25 0x081861d0 in ProcRenderDispatch (client=0x83ce400) at render.c:1989
#26 0x080728c0 in Dispatch () at dispatch.c:432
#27 0x0807fda1 in dix_main (argc=5, argv=0xbffff7f4, envp=0xbffff80c) at main.c:294
#28 0x080645eb in main (argc=5, argv=0xbffff7f4, envp=0xbffff80c) at stubmain.c:34

The top 13 lines repeat about 7 times until it dereferences something it shouldn't (in dixGetPrivateAddr, going past the end of a pointer, i.e. not NULL). Everywhere 0xb40b4008 shows up it repeats every time that function is called, so that seems confused.


I'll play with this some tomorrow.
Comment 5 Göran Uddeborg 2013-11-05 21:16:42 UTC
Thank you Robert for testing this!

I'm unfortunately not able to try it out easily.  The machine where this happened broke and has been replaced.  It wasn't the graphics card that broke, and I might be able to put it in some other machine later on.  But for the time being I'm glad if Robert can do the testing.
Comment 6 Tormod Volden 2013-11-05 21:54:02 UTC
I suppose one fixed bug was just masking another. The code in the stacktrace is barely touching into the mga driver. It could be that the mga driver fails to set up things correctly, but it could also be a bug in the server that doesn't get much exposure from other drivers. Can you please try different MigrationHeuristic options? And other EXA options (man exa)?

What if you don't set pExa->DownloadFromScreen in src/mga_exa.c:898 or let mgaDownloadFromScreen() simply return FALSE (which should fall back to software download in the server - doesn't look like mgaDownloadFromScreen() does much magic anyway)?
Comment 7 GitLab Migration User 2018-08-10 20:43:44 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-mga/issues/10.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.