|Summary:||"AddScreen/ScreenInit failed for driver 0" error with Silicon Motion card|
|Product:||xorg||Reporter:||Richard Schwarting <aquarichy>|
|Component:||Driver/siliconmotion||Assignee:||Xorg Project Team <xorg-team>|
|Status:||RESOLVED FIXED||QA Contact:||Xorg Project Team <xorg-team>|
Description Richard Schwarting 2008-11-30 01:17:11 UTC
Problem (maybe): the same memory is mapped twice on a Silicon Motion card, which causes a fatal error.

== System ==
Computer: Acer Travelmate C100 tablet PC, ~6 years old
Graphics: Silicon Motion SM720 Lynx3DM
Distros: Fedora 9 and Ubuntu 8.10

== Context ==
At the start of the summer, I installed Fedora 9 over my Ubuntu 8.04 installation, and found that X would not start, giving me the above error message. Rather than being a Good User, I just switched back to Ubuntu 8.04, which I knew had worked, as I didn't feel I had time to diagnose or track down the issue. I recently upgraded to Ubuntu 8.10 and it now suffers the same issue.

I didn't see an existing bug that was recent, so I poked around at the code a little. The problem might be that during preInit, SMI_MapMem maps memory from a base at a certain size once, and then it happens again during AddScreen, where another call to SMI_MapMem is made for the same base + size.

== Description with Details ==
The logic seems to flow like this:

Xorg/dix/main.c/main()
> Xorg/xf86Init.c/InitOutput()
>> "xf86Screens[i]->PreInit(xf86Screens[i], 0))" == SM/smi_driver.c/SMI_PreInit()
>>> SM/smi_driver.c/SMI_MapMem()
>>>> pci/common_interface.c/pci_device_map_range()
> Xorg/main.c/AddScreen()
>> "(*pfnInit)(i, pScreen, argc, argv)" == SM/smi_driver.c/SMI_ScreenInit()
>>> SM/smi_driver.c/SMI_MapMem()
>>>> pci/common_interface.c/pci_device_map_range()

Unfortunately, the base at the given size is already mapped, so pci_device_map_range returns EINVAL (22). Having got an error back from pci_device_map_range, SMI_MapMem opts to return FALSE, and so does SMI_ScreenInit. AddScreen then xfree's the pScreen, decrements screenInfo.numScreens, and returns -1.

-1 is taken as scr_index back in main(), which ultimately prompts the terrifying error message: "Fatal server error: AddScreen/ScreenInit failed for driver 0"

== Notes ==
I know developer resources are sparse, and I'm willing to do dirty ugly work for this, but with my own limited time, it would be a great boon to know whether something is actually behaving incorrectly here and any clues on what needs doing. I'm going to try the siliconmotion driver from git head in the morning.

Cheers.
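The suspected failure mode in the description can be modelled in a few lines of C. This is only a toy sketch under stated assumptions: every `toy_*` name is hypothetical, the mapping table merely imitates the "same base + size is rejected with EINVAL" behaviour reported above, and the guarded variant is one possible way a driver could avoid the second mapping, not necessarily what the siliconmotion driver does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-device mapping table. */
#define TOY_MAX_MAPPINGS 8

struct toy_mapping {
    uint64_t base;
    uint64_t size;
};

struct toy_device {
    struct toy_mapping mappings[TOY_MAX_MAPPINGS];
    size_t num_mappings;
};

/* Imitates the reported behaviour: mapping an identical (base, size)
 * range a second time fails with EINVAL. Returns 0 on success. */
int toy_map_range(struct toy_device *dev, uint64_t base, uint64_t size)
{
    for (size_t i = 0; i < dev->num_mappings; i++) {
        if (dev->mappings[i].base == base && dev->mappings[i].size == size)
            return EINVAL;
    }
    if (dev->num_mappings == TOY_MAX_MAPPINGS)
        return ENOMEM;
    dev->mappings[dev->num_mappings].base = base;
    dev->mappings[dev->num_mappings].size = size;
    dev->num_mappings++;
    return 0;
}

/* Hypothetical driver state that remembers whether the range is mapped. */
struct toy_driver {
    struct toy_device dev;
    int mem_mapped;
};

/* Unguarded: every call maps again, like one mapping routine being
 * invoked from both PreInit and ScreenInit. Returns 1 (TRUE) or 0 (FALSE). */
int toy_map_mem_unguarded(struct toy_driver *drv, uint64_t base, uint64_t size)
{
    return toy_map_range(&drv->dev, base, size) == 0;
}

/* Guarded: a repeated call is a no-op once the mapping exists. */
int toy_map_mem_guarded(struct toy_driver *drv, uint64_t base, uint64_t size)
{
    if (drv->mem_mapped)
        return 1;
    if (toy_map_range(&drv->dev, base, size) != 0)
        return 0;
    drv->mem_mapped = 1;
    return 1;
}
```

Calling the unguarded routine twice with the same arguments returns FALSE on the second call, which is the path that would end in AddScreen returning -1; the guarded routine returns TRUE both times and leaves a single mapping in the table.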
Comment 1 Richard Schwarting 2008-11-30 21:32:29 UTC
Head seems to suffer the same issue. I notice that the 1.5.1 release that Ubuntu shipped for 8.04 doesn't use libpciaccess, and that the 1.6.0 release that they ship has #ifndef XSERVER_LIBPCIACCESS sections. If I try to build the 1.5.1 version or the 1.6.0 version without XSERVER_LIBPCIACCESS defined, it complains about a few symbols being undefined for anonymous structs or such. I'll poke at that some more, though I think I'd rather see it work well with libpciaccess :D
Comment 2 Richard Schwarting 2008-12-01 00:21:18 UTC
Created attachment 20709 [details] Xorg log Here is the Xorg log.
Comment 3 Richard Schwarting 2008-12-01 00:22:42 UTC
Created attachment 20710 [details] xorg.conf It's a tablet, hence all the things that are commented out by Ubuntu's package manager. If I force-install the older X from Ubuntu 8.04, this xorg.conf seems to work, so I don't think there's anything particularly terrible in it.
Comment 4 Richard Schwarting 2008-12-02 00:32:44 UTC
Francisco Jerez speculated that my issue might be fixed in git head, which I had checked, and thought I ran into the same problem when X failed to start and, instead, left my screen still black. However, trying again, I see that the error is different. The following is printed to the screen. I'll attach the Xorg.0.log too.

=================================
X.Org X Server 1.5.2
Release Date: 10 October 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.24-19-server i686 Ubuntu
Current Operating System: Linux skedge 2.6.27-7-generic #1 SMP Tue Nov 4 19:33:20 UTC 2008 i686
Build Date: 24 October 2008 08:00:16AM
xorg-server 2:1.5.2-2ubuntu3 (firstname.lastname@example.org)
Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Tue Dec 2 00:24:54 2008
(==) Using config file: "/etc/X11/xorg.conf"
error setting MTRR (base = 0x14200000, size = 0x00800000, type = 1) Invalid argument (22)
error setting MTRR (base = 0x14200000, size = 0x00800000, type = 1) Invalid argument (22)
error setting MTRR (base = 0x14200000, size = 0x00800000, type = 1) Invalid argument (22)
Comment 5 Richard Schwarting 2008-12-02 00:34:38 UTC
Created attachment 20738 [details] Xorg.0.log using Silicon Motion driver from GIT I'm also going to embark on trying to download all the necessary parts of X from git and see if that affects the situation.
Comment 6 Richard Schwarting 2008-12-02 21:49:20 UTC
Created attachment 20753 [details] Xorg.0.log with SMI_DEBUG defined,post-patch 0001 This is after applying patch 0001-Disable-screen-centering-on-mode-initialization.patch from Francisco Jerez and with SMI_DEBUG enabled. I'll note that trying to compile with it enabled gave me an error at first, because __VA_ARGS__ is used in the definition of the LEAVE() macro when it actually takes a value (called 'value') instead of being variadic (...). This X session worked as expected at first (after a sleep/resume cycle, and I suspect it would have worked with a fresh power cycle), but then went out of centre and began flickering after a VT switch to a console. (Experience shows that restarting the X server afterwards does not correct it; only a power cycle or resuming from sleep.) So, pretty close now. I think the only changes that have been made were the off-centre patch, setting the option "UseBIOS" to "off", and using the current source from git.
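The compile error mentioned above comes from using __VA_ARGS__ in a macro whose parameter list is not variadic. A minimal sketch of a variadic form follows; `TOY_LEAVE` and the two helper functions are hypothetical names for illustration and are not the driver's actual LEAVE() definition.

```c
#include <stdio.h>

/* Hypothetical tracing macro. Declaring the parameter list as "..." is
 * what makes __VA_ARGS__ legal inside the body. The alternative fix is to
 * keep a named parameter, e.g. (value), and write "return value;" instead. */
#define TOY_LEAVE(...)                              \
    do {                                            \
        fprintf(stderr, "LEAVE %s\n", __func__);    \
        return __VA_ARGS__;                         \
    } while (0)

/* Value-returning function: TOY_LEAVE(x) expands to "return x;". */
int toy_check_positive(int n)
{
    if (n <= 0)
        TOY_LEAVE(0);
    TOY_LEAVE(1);
}

/* void function: TOY_LEAVE() expands to a bare "return;". */
void toy_bump(int *counter)
{
    (*counter)++;
    TOY_LEAVE();
}
```

The variadic form has the side benefit that one macro serves both void and value-returning functions, which a single named parameter cannot do cleanly.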
Comment 7 Richard Schwarting 2008-12-10 22:30:20 UTC
Created attachment 21043 [details] Xorg.0.log, post-patch 0002, XAA
Log of card using XAA. Sadly, SMI_DEBUG isn't on. This is after Francisco's 2nd patch that aimed to fix my issues after VT switching. Posting this because X with XAA is slower now than before, and X seems to hang for a period of a few seconds whenever a major new drawing operation (?) occurs (that is, if I open a new window, or click on an applet). Something of note, perhaps, is the repeated appearance of: "SMI_GEReset called from smi_xaa.c line 341" after the X server has been running. It was suggested that SMI_DEBUG, which I had on previously, was the source of slowness, and while disabling it seemed to improve the situation a bit for EXA (I should have used some benchmarking tool...), XAA didn't seem to improve.
Comment 8 Richard Schwarting 2008-12-10 22:41:27 UTC
Created attachment 21044 [details] Xorg.0.log, SMI_DEBUG post-patch 0002, EXA
This is after running X after Francisco's 2nd patch but using EXA as my acceleration method. It works much better than XAA for me, which hangs the X server fairly frequently. This log has SMI_DEBUG enabled, so it got to be rather large.
Comment 9 Tim Besard 2008-12-11 00:28:38 UTC
Created attachment 21046 [details] Xorg log from GIT build with 2 patches applied.
As I experience similar problems (the stock siliconmotion driver is dysfunctional, sometimes serving me corruption, sometimes crashes), I tried building the newest driver from Git and applied two patches (0001-Disable-screen-centering-on-mode-initialization.patch and 0002-Some-corrections-on-the-Lynx-modesetting-code.patch). Alas, X refuses to start, but dumps the following message frequently:
(II) SMI(0): SMI_GEReset called from smi_accel.c line 107
Full log attached.
Comment 10 Richard Schwarting 2008-12-11 01:07:13 UTC
Do you also have the following option set for your card in your xorg.conf?

Option "UseBIOS" "off"

As I understand the situation, Francisco already fixed the issue that led to X finding no screens, but avoiding the BIOS was also necessary since it lies to us :) The two patches were to help get a centred and functional display afterward :) Does that help?

(In reply to comment #9)
> Created an attachment (id=21046) [details]
> Xorg log from GIT build with 2 patches applied.
>
> As I experience similar problems (stock siliconmotion driver is dysfunctional,
> serving me sometimes corruption, sometimes crashes), I tried building the
> newest driver from Git and applied two patches
> (0001-Disable-screen-centering-on-mode-initialization.patch and
> 0002-Some-corrections-on-the-Lynx-modesetting-code.patch).
> Alas, X refuses to start, but dumps the following message frequently:
> (II) SMI(0): SMI_GEReset called from smi_accel.c line 107
> Full log attached.
>
Comment 11 Tim Besard 2008-12-11 02:47:46 UTC
Indeed, I'm sorry, using this option and both patches makes X work. I'm still getting those "SMI_GEReset" calls though (mainly from smi_xaa.c:466). Is that something to worry about?
Comment 12 Tim Besard 2008-12-11 02:50:44 UTC
(In reply to comment #11)
> Indeed, I'm sorry, using this option and both patches makes X work. Still
> getting those "SMI_GEReset" calls though, something to worry about? (mainly
> from smi_xaa.c:466)
>

Excuse me for the double reply, but I have another thing to add. When using XAA, some tasks (like browsing the web) are /extremely/ laggy, quite undoable in fact. Using EXA solves this, but I get plenty of corruption.
Comment 13 Francisco Jerez 2008-12-11 07:37:24 UTC
Created attachment 21056 [details] [review] 0001-Disable-screen-centering-on-mode-initialization.patch
Comment 14 Francisco Jerez 2008-12-11 07:38:07 UTC
Created attachment 21057 [details] [review] 0002-Some-corrections-on-the-Lynx-modesetting-code.patch
Comment 15 Francisco Jerez 2008-12-11 07:38:30 UTC
Created attachment 21058 [details] [review] 0003-Fix-XAA-SolidFill-with-32-bpp-framebuffer.patch
Comment 16 Francisco Jerez 2008-12-11 07:38:53 UTC
Created attachment 21059 [details] [review] 0004-Fall-back-to-UseBIOS-off-when-VBEInit-fails.patch
Comment 17 Francisco Jerez 2008-12-11 07:49:33 UTC
(In reply to comment #12)
> Excuse me for the double reply, but I have another thing to add. When using
> XAA, some tasks (like browsing the web) are /extremely/ laggy, quite undoable
> in fact. Using EXA solves this, but I get plenty of corruption.
>

I think patch 0003 will solve the XAA issue.

About the graphic corruption with EXA... Could you attach a log? (Better with -logverbose 7 and SMI_DEBUG defined during the driver compilation.) Does it make any difference if you switch the framebuffer depth, e.g. to 16?

Maybe patch 0004 will make Option "UseBIOS" "off" unnecessary, but I'm not sure because UseBIOS works for me...
Comment 18 Tim Besard 2008-12-11 09:45:26 UTC
Patch 0003 indeed solves the extreme lagginess while using XAA at 24 bit, while patch 0004 makes the driver correctly detect when it should not use the BIOS (confirmed by a warning in Xorg's log). I'm now going to compile the driver with those verbosity options enabled and test EXA acceleration. At first glance I do get the impression that using a bit depth of 16 makes the corruption's impact a bit smaller, but it still happens and makes the environment unusable. Thanks for the efforts.
Comment 19 Tim Besard 2008-12-11 10:03:31 UTC
There is a problem with patch 0004, though: with the patch applied and no "UseBIOS" option specified, the X server works correctly, but I cannot switch VTs afterwards, or exit the server and go back to the prompt: the screen blanks. Forcing the "UseBIOS" option with the patch applied, though, makes everything go as it should.
Comment 20 Tim Besard 2008-12-11 10:33:48 UTC
Created attachment 21060 [details] Xorg log.
XAA acceleration, 24 bitdepth, forced UseBIOS=off. Observation: no corruption caused by the driver, and quite a smooth experience.
Comment 21 Tim Besard 2008-12-11 10:34:54 UTC
Created attachment 21061 [details] Xorg log.
EXA acceleration, 24 bitdepth, forced UseBIOS=off. Observation: a lot of corruption, windows get very unreadable after a while. Hovering/moving/maximizing them sometimes helps. The server reacted smoothly to everything I did.
Comment 22 Tim Besard 2008-12-11 10:37:46 UTC
EXA acceleration, 16 bitdepth, forced UseBIOS=off. Observation: a lot of corruption, windows get very unreadable after a while. Hovering/moving/maximizing them sometimes helps. Possibly a tiny bit less corruption than on 24 bit. Very laggy X server though, with low responsiveness. Warning: big log (+100mb extracted)!
URL (too big to post here): http://maleadt.no-ip.org:8080/files/Xorg-EXA-16.log.bz2
Comment 23 Francisco Jerez 2008-12-12 11:13:45 UTC
(In reply to comment #22)
> EXA acceleration, 16 bitdepth, forced UseBIOS=off.
> Observation: a lot of corruption, windows get very unreadable after a while.
> Hovering/moving/maximizing them sometimes helps. Possibly a tiny bit less
> corruption than on 24 bit. Very laggy Xserver though, with low
> responsiveness. Warning: big log (+100mb extracted)!
>
> URL (too big to post here):
> http://maleadt.no-ip.org:8080/files/Xorg-EXA-16.log.bz2
>

Hi, I don't see anything strange in your logs. Maybe it's just some hardware subtlety that isn't taken into account in the EXA implementation. To discover which acceleration primitive is causing the corruption, you could stick an instruction like:

> LEAVE(FALSE);

e.g. after the debug output in SMI_PrepareCopy, in smi_exa.c. That would provoke a software fallback. If the screen then looks okay, we would know which one is misbehaving. (You could also try with SMI_PrepareSolid, but Copy is most likely the problem... BTW, does the stipple pattern display correctly on server startup?)

It may behave more deterministically if you set

> Option "MigrationHeuristic" "always"

in the Device section of the config file.

I completely understand if you don't want to dig so much into this issue :-)

About the UseBIOS patch, it should probably default to off for this specific chipset, instead of probing.
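The debugging trick suggested above relies on a prepare hook being allowed to refuse an operation, after which the caller does the work in software. A toy sketch of that control flow follows. It is not the real EXA API: all `toy_*` names and the counters are assumptions made for illustration only.

```c
#include <stdio.h>

#define TOY_TRUE 1
#define TOY_FALSE 0

/* Hypothetical acceleration state with one replaceable prepare hook. */
struct toy_accel {
    int (*prepare_copy)(int width, int height);
    int hw_copies; /* copies handed to the (pretend) hardware */
    int sw_copies; /* copies done by the software fallback */
};

/* Normal hook: accepts every copy for hardware acceleration. */
int toy_prepare_copy_hw(int width, int height)
{
    (void)width;
    (void)height;
    return TOY_TRUE;
}

/* Debug hook: the equivalent of returning FALSE early from the prepare
 * function, so every copy is refused and falls back to software. */
int toy_prepare_copy_refuse(int width, int height)
{
    fprintf(stderr, "refusing %dx%d copy, forcing fallback\n", width, height);
    return TOY_FALSE;
}

/* Caller side: use the hardware path only if the hook accepts. */
void toy_do_copy(struct toy_accel *accel, int width, int height)
{
    if (accel->prepare_copy != NULL && accel->prepare_copy(width, height))
        accel->hw_copies++;
    else
        accel->sw_copies++;
}
```

Swapping `toy_prepare_copy_refuse` in for one primitive at a time mirrors the bisection idea: if the corruption disappears while a given primitive is forced onto the software path, that primitive's hardware path is the likely culprit.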
Comment 24 Tim Besard 2008-12-13 02:15:17 UTC
I'm happy to dig a bit into this bug; it's a good way for me to get used to driver "development" too. Anyway, if I understood it correctly, I tried disabling the EXA primitives which return a bool for software fallback (which is only PrepareCopy, PrepareSolid, CheckComposite and PrepareComposite?). Sadly, the corruption did not cease to happen, and every time I got the exact same level of corruption.

A bit more on the corruption: when I start the X server, the initial stippled map is displayed properly, without any corruption. Then the background gets rendered, together with the initial cursor. Still no corruption. Then, however, the cursor corrupts into a rectangle containing random data, and the top of the screen gets filled with more corrupted data when the statusbar gets initialised. During the loading of the statusbar, the corruption seems to be evolving a bit. When I open a menu or application, the corruption spreads over the whole screen.

These are some screenshots:
- Picture of the just initialised X-server (only cursor and top-screen corruption): http://maleadt.no-ip.org:8080/files/Afb055.jpg
- Picture when having opened a menu: http://maleadt.no-ip.org:8080/files/Afb056.jpg
- Picture when taking a screenshot with Xpaint: http://maleadt.no-ip.org:8080/files/Afb057.jpg
- The screenshot: http://maleadt.no-ip.org:8080/files/screenshot.png
- Picture of what was displayed after taking the screenshot: http://maleadt.no-ip.org:8080/files/Afb058.jpg
Comment 25 Francisco Jerez 2008-12-13 05:25:16 UTC
(In reply to comment #24) Could you attach the configuration file you are using? Thanks.
Comment 26 Tim Besard 2008-12-13 05:53:41 UTC
Created attachment 21122 [details] Xorg configuration file. When I test EXA, I just comment out the accelmethod option, and make no other changes. Corruption happens with fluxbox as well as with enlightenment (e16). Right now (using XAA), I do see some slight corruption too, but couldn't say for sure whether it's caused by the driver, or by the windowmanager. I'll try some more windowmanagers to see if the same slight (mainly taskbar) corruption occurs too.
Comment 27 Francisco Jerez 2008-12-13 06:31:46 UTC
(In reply to comment #26)
> Created an attachment (id=21122) [details]
> Xorg configuration file.
>
> When I test EXA, I just comment out the accelmethod option, and make no other
> changes. Corruption happens with fluxbox as well as with enlightenment (e16).
>
> Right now (using XAA), I do see some slight corruption too, but couldn't say
> for sure whether it's caused by the driver, or by the windowmanager. I'll try
> some more windowmanagers to see if the same slight (mainly taskbar) corruption
> occurs too.
>

Does it help to set:

> VideoRam 4096

in the config file Device section?
Comment 28 Tim Besard 2008-12-13 07:55:55 UTC
It does indeed :) The XAA corruption I mentioned isn't caused by the driver either, but most likely an e16 issue.
Comment 29 Tim Besard 2008-12-13 08:58:10 UTC
I seem to have declared victory too early: though the amount of corruption has been reduced drastically, it occasionally appears again, e.g. after clicking a URL or opening an application. It mostly concerns movement around the title bar (i.e. buttons from the menu bar under the title bar get shifted "up" across the border of the screen to the bottom of the window (the statusbar)). I'll make a screenshot later on.
Comment 30 Francisco Jerez 2008-12-13 10:29:29 UTC
Created attachment 21125 [details] [review] 0005-Enable-linear-memory-mode-on-SMI_MapMmio.patch
Comment 31 Francisco Jerez 2008-12-13 10:44:28 UTC
(In reply to comment #29)
> I seem to have declared victory too early: though the amount of corruption has
> been reduced drastically, it occasionally appears again, e.g. after clicking a
> URL or opening an application. It mostly concerns movement around the title bar
> (i.e. buttons from the menu bar under the title bar get shifted "up" across the
> border of the screen to the bottom of the window (the statusbar)). I'll make
> a screenshot later on.
>

What's the effect of patch 0005? I think it may fix the memory detection issue. Could you try it out without specifying either UseBIOS or VideoRam?

Are you now getting corruption with both EXA and XAA? Does it happen with Option "NoAccel"? Some logs after applying patch 0005 may be helpful.
Comment 32 Tim Besard 2008-12-13 11:55:10 UTC
Patch 0005 seems to be quite successful: VTs don't remain blank after closing down the X server, EXA acceleration doesn't heavily corrupt the screen when the amount of video RAM isn't specified, and the remaining corruption on EXA acceleration which occurred even with VideoRam specified seems to have disappeared too :) I'll keep on testing though. 2 logs of (corruption-free) X server sessions attached.
Comment 33 Tim Besard 2008-12-13 11:56:27 UTC
Created attachment 21127 [details] Xorg log using patch 0005 with EXA accel
Comment 34 Tim Besard 2008-12-13 11:56:51 UTC
Created attachment 21128 [details] Xorg log using patch 0005 with XAA accel
Comment 35 Tim Besard 2008-12-15 11:09:31 UTC
Created attachment 21186 [details] Xorg log with backtrace. Corruption has completely disappeared, but a new small bug has come up: when switching VT's from and to the X-server, it occasionally crashes. Xorg log with backtrace included. Need a log with more verbosity?
Comment 36 Francisco Jerez 2008-12-15 18:06:38 UTC
Created attachment 21190 [details] [review] 0006-Fix-crashes-when-switching-VTs-with-EXA-enabled.patch
Comment 37 Francisco Jerez 2008-12-15 18:10:13 UTC
(In reply to comment #35)
> Created an attachment (id=21186) [details]
> Xorg log with backtrace.
>
> Corruption has completely disappeared, but a new small bug has come up: when
> switching VT's from and to the X-server, it occasionally crashes. Xorg log with
> backtrace included. Need a log with more verbosity?
>

I think I reproduced it. Probably patch 0006 fixes it.
Comment 38 Tim Besard 2008-12-16 02:25:38 UTC
Created attachment 21201 [details] Verbose log of Xorg crashing due to VT switch.
Nope, doesn't fix it. I tried to narrow down the bug a bit, and this reproduces it 100%: start the X server (using startx) as a normal, default-privileged user, and switch back to the calling VT right away, before the environment has fully loaded. The X server crashes, with a backtrace. The crash also happens when I switch back to the VT a bit after the stipple pattern, so it doesn't seem to be strictly related to X loading (but rather X+WM). Strangely, when I try this as root, the server doesn't crash... Verbose log included.
Comment 39 Francisco Jerez 2008-12-16 05:49:12 UTC
(In reply to comment #38)
> Created an attachment (id=21201) [details]
> Verbose log of Xorg crashing due to VT switch.
>
> Nope, doesn't fix it. I tried to narrow down the bug a bit, and this reproduces
> it 100%: start the X server (using startx) as a normal, default-privileged
> user, and switch back to the calling VT right away, before the environment has
> fully loaded.
> The X-server crashes, with a backtrace. The crash also happens when I switch
> back to the VT a bit after the stipple pattern, so it doesn't seem to be
> strictly related to X loading (but rather X+WM). Strangely, when I try this as
> root, the server doesn't crash...
> Verbose log included.
>

So, patch 0006 fixes a completely unrelated crash... I think your problem is not a driver bug... That shouldn't happen on any server version newer than 1.5.
Comment 40 Francisco Jerez 2008-12-19 09:43:18 UTC
This should be fixed now in master, so I'm closing it. Thanks for reporting :-)