Problem (maybe): same memory mapped twice on Silicon Motion card, causes fatal error.
== System ==
Computer: Acer Travelmate C100 tablet PC, ~6 years old
Graphics: Silicon Motion SM720 Lynx3DM
Distros: Fedora 9 and Ubuntu 8.10
== Context ==
At the start of the summer, I installed Fedora 9 over my Ubuntu 8.04 installation and found that X would not start, giving me the above error message. Rather than being a Good User, I just switched back to Ubuntu 8.04, which I knew had worked, as I didn't feel I had time to diagnose or track down the issue. I recently upgraded to Ubuntu 8.10, and it now suffers from the same issue.
I didn't see a recent existing bug, so I poked around in the code a little. The problem might be that SMI_MapMem maps memory at a given base and size once during PreInit, and then AddScreen triggers a second SMI_MapMem call for the same base and size.
== Description with Details ==
The logic seems to flow like this:
>> "xf86Screens[i]->PreInit(xf86Screens[i], 0))" == SM/smi_driver.c/SMI_PreInit()
>> "(*pfnInit)(i, pScreen, argc, argv)" == SM/smi_driver.c/SMI_ScreenInit()
Unfortunately, the base at the given size is already mapped, so pci_device_map_range returns EINVAL (22). Having got an error back from pci_device_map_range, SMI_MapMem opts to return FALSE, and so does SMI_ScreenInit. AddScreen then xfree's the pScreen, decrements screenInfo.numScreens, and returns -1. -1 is taken as scr_index back in main(), which ultimately prompts the terrifying error message:
"Fatal server error:
AddScreen/ScreenInit failed for driver 0"
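The failure chain described above can be sketched in isolation. This is only a mock under stated assumptions: every `fake_*` name and the base/size constants are hypothetical, nothing here calls libpciaccess or the real driver, and the "reject a second mapping of the same range with EINVAL" rule is simply modelled on what the report observes.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical values; the real base and size come from the card's PCI BAR. */
#define FAKE_FB_BASE 0x10000000u
#define FAKE_FB_SIZE 0x00800000u

/* Hypothetical stand-in for a PCI device that remembers one mapped range. */
struct fake_pci_dev {
    uint64_t mapped_base;
    uint64_t mapped_size;
    int      is_mapped;
};

/* Mock of the mapping call: refuses to map the same base/size twice. */
int fake_map_range(struct fake_pci_dev *dev, uint64_t base, uint64_t size)
{
    if (dev->is_mapped && dev->mapped_base == base && dev->mapped_size == size)
        return EINVAL;              /* range is already mapped */
    dev->mapped_base = base;
    dev->mapped_size = size;
    dev->is_mapped = 1;
    return 0;
}

/* Mirrors the reported SMI_MapMem behaviour: any mapping error becomes FALSE (0). */
int fake_map_mem(struct fake_pci_dev *dev)
{
    return fake_map_range(dev, FAKE_FB_BASE, FAKE_FB_SIZE) == 0;
}

/* PreInit maps the memory once... */
int fake_pre_init(struct fake_pci_dev *dev)
{
    return fake_map_mem(dev);
}

/* ...and ScreenInit asks for the same range again. */
int fake_screen_init(struct fake_pci_dev *dev)
{
    return fake_map_mem(dev);
}

/* AddScreen reports -1 when ScreenInit fails, otherwise a screen index (0 here). */
int fake_add_screen(struct fake_pci_dev *dev)
{
    return fake_screen_init(dev) ? 0 : -1;
}
```

With a zero-initialised `struct fake_pci_dev`, calling `fake_pre_init()` succeeds, and a following `fake_add_screen()` returns -1, which is the condition that ends in the fatal error quoted above. Whether the right fix is to unmap after PreInit or to skip the second mapping is a question for the driver maintainers.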
== Notes ==
I know developer resources are sparse, and I'm willing to do dirty ugly work for this, but with my own limited time, it would be a great boon to know whether something is actually behaving incorrectly here and any clues on what needs doing.
I'm going to try the siliconmotion driver from git head in the morning.
Head seems to suffer from the same issue. I notice that the 1.5.1 release that Ubuntu shipped for 8.04 doesn't use libpciaccess, and that the 1.6.0 release they ship now has #ifndef XSERVER_LIBPCIACCESS sections. If I try to build the 1.5.1 version, or the 1.6.0 version without XSERVER_LIBPCIACCESS defined, it complains about a few symbols being undefined for anonymous structs or some such. I'll poke at that some more, though I think I'd rather see it work well with libpciaccess :D
Created attachment 20709 [details]
Here is the Xorg log.
Created attachment 20710 [details]
It's a tablet, hence all the things that are commented out by Ubuntu's package manager. If I force-install the older X from Ubuntu 8.04, this xorg.conf seems to work, so I don't think there's anything particularly terrible in it.
Francisco Jerez speculated that my issue might be fixed in git head, which I
had checked, and thought I ran into the same problem when X failed to start
and, instead, left my screen still black. However, trying again, I see that
the error is different.
The following is printed to the screen. I'll attach the Xorg.0.log too.
X.Org X Server 1.5.2
Release Date: 10 October 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.24-19-server i686 Ubuntu
Current Operating System: Linux skedge 2.6.27-7-generic #1 SMP Tue Nov 4
19:33:20 UTC 2008 i686
Build Date: 24 October 2008 08:00:16AM
xorg-server 2:1.5.2-2ubuntu3 (email@example.com)
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Tue Dec 2 00:24:54 2008
(==) Using config file: "/etc/X11/xorg.conf"
error setting MTRR (base = 0x14200000, size = 0x00800000, type = 1) Invalid
error setting MTRR (base = 0x14200000, size = 0x00800000, type = 1) Invalid
error setting MTRR (base = 0x14200000, size = 0x00800000, type = 1) Invalid
Created attachment 20738 [details]
Xorg.0.log using Silicon Motion driver from GIT
I'm also going to embark on trying to download all the necessary parts of X from git and see if that affects the situation.
Created attachment 20753 [details]
Xorg.0.log with SMI_DEBUG defined, post-patch 0001
This is after applying the patch from Francisco Jerez, with SMI_DEBUG enabled.
I'll note that trying to compile with it enabled gave me an error at first, because __VA_ARGS__ is used in the definition of the LEAVE() macro when it actually takes a value (called 'value') instead of being variadic (...).
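To illustrate the macro problem just described, here is a minimal sketch. The macro bodies are hypothetical (the driver's actual LEAVE() definition is not shown in this report); only the shape of the mistake, `__VA_ARGS__` used inside a macro declared with a named parameter, is taken from the description above.

```c
#include <stdio.h>

/* Broken shape, kept as a comment because compilers reject or warn about it:
 *   #define LEAVE(value)  do { ...; return __VA_ARGS__; } while (0)
 * __VA_ARGS__ is only meaningful when the parameter list ends in "...". */

/* Fix 1: keep the named parameter and actually use it. */
#define LEAVE_NAMED(value) \
    do { fprintf(stderr, "LEAVE %s\n", __func__); return (value); } while (0)

/* Fix 2: make the macro variadic, so void functions can call it with no argument. */
#define LEAVE_VARIADIC(...) \
    do { fprintf(stderr, "LEAVE %s\n", __func__); return __VA_ARGS__; } while (0)

/* Hypothetical callers, just to exercise both variants. */
int probe_named(int ok)
{
    LEAVE_NAMED(ok ? 1 : 0);
}

int probe_variadic(int ok)
{
    LEAVE_VARIADIC(ok ? 1 : 0);
}

void probe_void(int *flag)
{
    *flag = 1;
    LEAVE_VARIADIC();   /* expands to a plain "return;" */
}
```

Either variant compiles; the variadic form is likely what the original author intended if LEAVE() is also used in functions returning void, but that is a guess.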
This X session worked as expected at first (after a sleep/resume cycle, and I suspect it would have worked with a fresh power cycle), but then went out of centre and began flickering after a VT switch to a console. (Experience shows that restarting the X server afterwards does not correct it; only a power cycle or resuming from sleep.)
So, pretty close now.
I think the only changes that have been made were the off-centre patch, setting the option "UseBIOS" to "off", and using the current source from git.
Created attachment 21043 [details]
Xorg.0.log, post-patch 0002, XAA
Log of card using XAA. Sadly, SMI_DEBUG isn't on. This is after Francisco's 2nd patch that aimed to fix my issues after VT switching.
Posting this because X with XAA is slower now than before, and X seems to hang for a period of a few seconds whenever a major new drawing operation (?) occurs (that is, if I open a new window, or click on an applet).
Something of note, perhaps, is the repeated appearance of:
"SMI_GEReset called from smi_xaa.c line 341"
after the X server has been running.
It was suggested that SMI_DEBUG, which I had enabled previously, was the source of the slowness. Disabling it seemed to improve things a bit for EXA (I should have used a benchmarking tool...), but XAA didn't seem to improve.
Created attachment 21044 [details]
Xorg.0.log, SMI_DEBUG post-patch 0002, EXA
This is after running X after Francisco's 2nd patch but using EXA as my acceleration method. It works much better than XAA for me; XAA hangs the X server fairly frequently.
This log has SMI_DEBUG enabled, so it got to be rather large.
Created attachment 21046 [details]
Xorg log from GIT build with 2 patches applied.
As I experience similar problems (the stock siliconmotion driver is dysfunctional, sometimes giving me corruption, sometimes crashes), I tried building the newest driver from Git and applied two patches (0001-Disable-screen-centering-on-mode-initialization.patch and 0002-Some-corrections-on-the-Lynx-modesetting-code.patch).
Alas, X refuses to start, but dumps the following message frequently:
(II) SMI(0): SMI_GEReset called from smi_accel.c line 107
Full log attached.
Do you also have the following option set for your card in your xorg.conf?
Option "UseBIOS" "off"
As I understand the situation, Francisco already fixed the issue that led to X finding no screens, but avoiding the BIOS was also necessary, since it lies to us :) The two patches were to help get a centred and functional display afterward :)
Does that help?
(In reply to comment #9)
> Created an attachment (id=21046) [details]
> Xorg log from GIT build with 2 patches applied.
> As I experience similar problems (stock siliconmotion driver is dysfunctional,
> serving me sometimes corruption, sometimes crashes), I tried building the
> newest driver from Git and applied two patches
> (0001-Disable-screen-centering-on-mode-initialization.patch and
> 0002-Some-corrections-on-the-Lynx-modesetting-code.patch).
> Alas, X refuses to start, but dumps the following message frequently:
> (II) SMI(0): SMI_GEReset called from smi_accel.c line 107
> Full log attached.
Indeed, I'm sorry, using this option and both patches makes X work. I'm still getting those "SMI_GEReset" calls though; is that something to worry about? (Mainly from smi_xaa.c:466.)
(In reply to comment #11)
> Indeed, I'm sorry, using this option and both patches makes X work. Still
> getting those "SMI_GEReset" calls though, something to worry about? (mainly
> from smi_xaa.c:466)
Excuse me for the double reply, but I have another thing to add. When using XAA, some tasks (like browsing the web) are /extremely/ laggy, practically unusable in fact. Using EXA solves this, but I get plenty of corruption.
Created attachment 21056 [details] [review]
Created attachment 21057 [details] [review]
Created attachment 21058 [details] [review]
Created attachment 21059 [details] [review]
(In reply to comment #12)
> Excuse me for the double reply, but I have another thing to add. When using
> XAA, some tasks (like browsing the web) are /extremely/ laggy, quite undoable
> in fact. Using EXA solves this, but I get plenty of corruption.
I think patch 0003 will solve the XAA issue. About the graphic corruption with EXA... Could you attach a log? (Better with -logverbose 7 and SMI_DEBUG defined during the driver compilation) Does it make any difference if you switch the framebuffer depth e.g. to 16?
Maybe patch 0004 will make Option "UseBIOS" "off" unnecessary, but I'm not sure because UseBIOS works for me...
Patch 0003 indeed solves the extreme lagginess when using XAA at 24-bit depth, while patch 0004 makes the driver correctly detect when it should not use the BIOS (confirmed by a warning in Xorg's log).
I'm now going to compile the driver with those verbosity options enabled and test EXA acceleration. At first glance, I get the impression that a bit depth of 16 makes the corruption's impact a bit smaller, but it still happens and makes the environment unusable.
Thanks for the efforts.
One problem with patch 0004, though: with the patch applied and no "UseBIOS" option specified, the X server starts and works correctly, but I cannot switch VTs afterwards, or exit the server and get back to the prompt: the screen blanks. Forcing the "UseBIOS" option with the patch applied, though, makes everything work as it should.
Created attachment 21060 [details]
XAA acceleration, 24 bitdepth, forced UseBIOS=off.
Observation: no corruption caused by the driver, and quite a smooth experience.
Created attachment 21061 [details]
EXA acceleration, 24 bitdepth, forced UseBIOS=off.
Observation: a lot of corruption; windows get very unreadable after a while. Hovering/moving/maximizing them sometimes helps. The server reacted smoothly to everything I did.
EXA acceleration, 16 bitdepth, forced UseBIOS=off.
Observation: a lot of corruption; windows get very unreadable after a while. Hovering/moving/maximizing them sometimes helps. Possibly a tiny bit less corruption than at 24-bit depth. The X server is very laggy though, with low responsiveness. Warning: big log (over 100 MB extracted)!
URL (too big to post here): http://maleadt.no-ip.org:8080/files/Xorg-EXA-16.log.bz2
(In reply to comment #22)
> EXA acceleration, 16 bitdepth, forced UseBIOS=off.
> Observation: a lot of corruption; windows get very unreadable after a while.
> Hovering/moving/maximizing them sometimes helps. Possibly a tiny bit less
> corruption than on 24 bit. Very laggy Xserver though, with low
> responsiveness. Warning: big log (+100mb extracted)!
> URL (too big to post here):
Hi, I don't see anything strange in your logs. Maybe it's just some hardware subtlety that isn't taken into account in the EXA implementation.
To discover which acceleration primitive is causing the corruption, you could stick an instruction like:
e.g. after the debug output on SMI_PrepareCopy, at smi_exa.c. That would provoke a software fallback. If the screen then looks okay, we would know which one is misbehaving. (You could also try with SMI_PrepareSolid, but Copy is most likely the problem...BTW, Does the stipple pattern display correctly on server startup?)
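A sketch of the kind of change being suggested, assuming the intent is an unconditional early `return FALSE` from the Prepare hook (an assumption: the exact instruction is not shown here). EXA treats a FALSE return from a Prepare hook as "not accelerated" and falls back to software for that operation. The types and `fake_*` names below are hypothetical miniatures, not the real EXA or driver structures.

```c
#include <stdbool.h>

/* Hypothetical miniature of a pixmap and of a PrepareCopy-style hook. */
struct fake_pixmap { int width, height; };

typedef bool (*prepare_copy_fn)(struct fake_pixmap *src, struct fake_pixmap *dst);

enum copy_path { COPY_ACCELERATED, COPY_SOFTWARE };

/* Normal hook: accepts the operation for hardware acceleration. */
bool fake_prepare_copy_accel(struct fake_pixmap *src, struct fake_pixmap *dst)
{
    (void)src;
    (void)dst;
    return true;
}

/* Debugging variant: bail out at the top so every copy takes the software path. */
bool fake_prepare_copy_forced_fallback(struct fake_pixmap *src, struct fake_pixmap *dst)
{
    (void)src;
    (void)dst;
    return false;   /* the early return; everything after it would be skipped */
}

/* Mock of the caller: a false return from the hook selects the software path. */
enum copy_path fake_do_copy(prepare_copy_fn prepare,
                            struct fake_pixmap *src, struct fake_pixmap *dst)
{
    return prepare(src, dst) ? COPY_ACCELERATED : COPY_SOFTWARE;
}
```

Disabling one primitive at a time this way narrows down which accelerated path produces the corruption: if the screen looks right with copies forced to software, the copy path is the suspect.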
It may behave more deterministically if you set
> Option "MigrationHeuristic" "always"
in the Device section of the config file.
I completely understand if you don't want to dig this deep into the issue :-)
About the UseBIOS patch, it should probably default to off for this specific chipset, instead of probing.
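For reference, a Device section combining the options discussed so far might look roughly like the fragment below. This is a sketch, not a recommended configuration: the Identifier is a made-up name, and "AccelMethod" is assumed to be the option used to choose between XAA and EXA.

```
Section "Device"
    # Hypothetical identifier; use whatever your Screen section refers to.
    Identifier "SMI Lynx3DM"
    Driver     "siliconmotion"
    # Work around the BIOS reporting wrong information on this chipset.
    Option     "UseBIOS" "off"
    # Assumed option name for selecting the acceleration architecture.
    Option     "AccelMethod" "EXA"
    # Makes EXA pixmap migration more deterministic while debugging.
    Option     "MigrationHeuristic" "always"
EndSection
```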
I'm happy to dig into this bug a bit; it's a good way for me to get used
to driver "development" too.
Anyway, if I understood it correctly: I tried disabling the EXA
primitives that return a bool for software fallback (are those only
PrepareCopy, PrepareSolid, CheckComposite and PrepareComposite?). Sadly, the corruption did not go away; every time I got the exact same level of corruption.
A bit more on the corruption: when I start the X server, the initial stipple pattern is displayed properly, without any corruption. Then the background gets rendered, together with the initial cursor. Still no corruption. Then, however, the cursor corrupts into a rectangle containing random data, and the top of the screen gets filled with more corrupted data when the statusbar gets initialised. During the loading of the statusbar, the corruption seems to evolve a bit. When I open a menu or application, the corruption spreads over the whole screen.
These are some screenshots:
- Picture of the just initialised X-server (only cursor and top-screen corruption): http://maleadt.no-ip.org:8080/files/Afb055.jpg
- Picture when having opened a menu: http://maleadt.no-ip.org:8080/files/Afb056.jpg
- Picture when taking a screenshot with Xpaint: http://maleadt.no-ip.org:8080/files/Afb057.jpg
- The screenshot: http://maleadt.no-ip.org:8080/files/screenshot.png
- Picture of what was displayed after taking the screenshot: http://maleadt.no-ip.org:8080/files/Afb058.jpg
(In reply to comment #24)
Could you attach the configuration file you are using?
Created attachment 21122 [details]
Xorg configuration file.
When I test EXA, I just comment out the accelmethod option, and make no other changes. Corruption happens with fluxbox as well as with enlightenment (e16).
Right now (using XAA), I do see some slight corruption too, but couldn't say for sure whether it's caused by the driver, or by the windowmanager. I'll try some more windowmanagers to see if the same slight (mainly taskbar) corruption occurs too.
(In reply to comment #26)
> Created an attachment (id=21122) [details]
> Xorg configuration file.
> When I test EXA, I just comment out the accelmethod option, and make no other
> changes. Corruption happens with fluxbox as well as with enlightenment (e16).
> Right now (using XAA), I do see some slight corruption too, but couldn't say
> for sure whether it's caused by the driver, or by the windowmanager. I'll try
> some more windowmanagers to see if the same slight (mainly taskbar) corruption
> occurs too.
Does it help to set:
> VideoRam 4096
in the config file Device section?
It does indeed :)
The XAA corruption I mentioned isn't caused by the driver either, but most likely an e16 issue.
I seem to have declared victory too early: though the amount of corruption has been reduced drastically, it occasionally appears again, e.g. after clicking a URL or opening an application. It mostly concerns movement around the title bar (for instance, buttons from the menu bar under the title bar get shifted "up" across the border of the screen to the bottom of the window (the statusbar)). I'll make a screenshot later on.
Created attachment 21125 [details] [review]
(In reply to comment #29)
> I seemed to have called victory too early: though the amount of corruption has
> been reduced drastically, it occasionally appears again, i.e. after clicking an
> url or opening an application. It mostly concerns movement around the title bar
> (i.e. buttons from the menu bar under the title bar get shifted "up" across the
> border of the screen to the downside of the window (the statusbar)). I'll make
> a screenshot later on.
What's the effect of patch 0005? I think it may fix the memory detection issue. Could you try it out without specifying either UseBIOS or VideoRam?
Are you now getting corruption with both EXA and XAA? Does it happen with Option "NoAccel"?
Some logs after applying patch 0005 may be helpful.
Patch 0005 seems to be quite successful: VTs no longer remain blank after closing down the X server, EXA acceleration doesn't heavily corrupt the screen when the amount of video RAM is not specified, and the remaining corruption with EXA acceleration, which occurred even with VideoRam specified, seems to have disappeared too :) I'll keep on testing though. 2 logs of (corruption-free) X server sessions attached.
Created attachment 21127 [details]
Xorg log using patch 0005 with EXA accel
Created attachment 21128 [details]
Xorg log using patch 0005 with XAA accel
Created attachment 21186 [details]
Xorg log with backtrace.
Corruption has completely disappeared, but a new small bug has come up: when switching VT's from and to the X-server, it occasionally crashes. Xorg log with backtrace included. Need a log with more verbosity?
Created attachment 21190 [details] [review]
(In reply to comment #35)
> Created an attachment (id=21186) [details]
> Xorg log with backtrace.
> Corruption has completely disappeared, but a new small bug has come up: when
> switching VT's from and to the X-server, it occasionally crashes. Xorg log with
> backtrace included. Need a log with more verbosity?
I think I reproduced it. Probably patch 0006 fixes it.
Created attachment 21201 [details]
Verbose log of Xorg crashing due to VT switch.
Nope, doesn't fix it. I tried to narrow down the bug a bit, and this reproduces it 100%: start the X server (using startx) as a normal, default-privileged user, and switch back to the calling VT as soon as possible, before the environment has fully loaded.
The X server crashes, with a backtrace. The crash also happens when I switch back to the VT a bit after the stipple pattern, so it doesn't seem to be strictly related to X loading (but rather X+WM). Strangely, when I try this as root, the server doesn't crash...
Verbose log included.
(In reply to comment #38)
> Created an attachment (id=21201) [details]
> Verbose log of Xorg crashing due to VT switch.
> Nope, doesn't fix it. I tried to narrow down the bug a bit, and this reproduces
> it 100%: start the X server (using startx) as a normal, default-privileged
> user, and switch back to the calling VT as soon as possible, before the
> environment has fully loaded.
> The X server crashes, with a backtrace. The crash also happens when I switch
> back to the VT a bit after the stipple pattern, so it doesn't seem to be
> strictly related to X loading (but rather X+WM). Strangely, when I try this as
> root, the server doesn't crash...
> Verbose log included.
So, patch 0006 fixes a completely unrelated crash... I think your problem is not a driver bug... That shouldn't happen on any server version newer than 1.5.
This should be fixed now in master, so I'm closing it.
Thanks for reporting :-)