Summary: | [945GM] starting second X server fails | ||
---|---|---|---|
Product: | xorg | Reporter: | Will Stephenson <wstephenson> |
Component: | Driver/intel | Assignee: | Jesse Barnes <jbarnes> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | major | ||
Priority: | high | CC: | eich, kent.liu, ling.yue, mat, pachoramos1, quanxian.wang, sndirsch |
Version: | 7.4 (2008.09) | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Will Stephenson
2008-10-09 00:12:07 UTC
Created attachment 19517 [details]
Log of first X server after starting second server
Created attachment 19518 [details]
Log of second, hung X server
Created attachment 19519 [details]
xorg.conf
Since Kent asked me about the severity/priority.

I reproduced the symptom with a GEM-enabled kernel, but got a kernel backtrace instead. Still trying to reproduce this with an older kernel...

That DSPARB-related message probably indicates we're doing something wrong in our LeaveVT function, which could explain the hangs. Anyway, I'll keep trying to reproduce it. In the meantime, can you enable the ModeDebug option in your xorg.conf? It should catch register differences between startup and VT switch time, so it might help.

Stefan, would you enable the ModeDebug option to get more information for Jesse?
Option "ModeDebug" "on" (in the Device section)

Jesse, any progress on this with the GEM kernel?

Thanks, Quanxian; unfortunately I don't have such a machine. Will, could you provide the requested information, please? Thanks.

Created attachment 20543 [details]
Log with ModeDebug on
Does this log tell you what you need to know? Interestingly, this time I started a second server from a running session, and the display only went black (and stayed black) after logging in. I will try again, starting both servers from the console.
Created attachment 20544 [details]
Crashed X server's log
This is the logfile from a second run with ModeDebug on.
No display manager was running:
$ X :0 &
$ X :1 &
After Ctrl-Alt-F7, :0 dies with the attached log.
Created attachment 20545 [details]
2nd X server's log
Info provided; sorry for the delay, I was on holiday.

Jesse, I have just debugged this. Based on your comments, the problem may be a wrong DSPARB setting. The information is below:
------------
First X server
(II) intel(0): i830_update_dsparb-qxwang:num_crtc-2, fifo_entries-95
(II) intel(0): i830_update_dsparb-qxwang:crtc-0,enabled-0, plane-0
(II) intel(0): i830_update_dsparb-qxwang:crtc-1,enabled-1, plane-1
(II) intel(0): i830_update_dsparb-qxwang:total_hdisplay-1024,planea_hdisplay-0, planeb_hdisaply-1024
Second X server
(II) intel(0): i830_update_dsparb-qxwang:num_crtc-2, fifo_entries-95
(II) intel(0): i830_update_dsparb-qxwang:crtc-0,enabled-0, plane-1
(II) intel(0): i830_update_dsparb-qxwang:crtc-1,enabled-1, plane-0
(II) intel(0): i830_update_dsparb-qxwang:total_hdisplay-1024,planea_hdisplay-1024, planeb_hdisaply-0
----------------
DSPARB is set like this in i830_display.c:
------------
planea_entries = fifo_entries * planea_hdisplay / total_hdisplay;
planeb_entries = fifo_entries * planeb_hdisplay / total_hdisplay;
if (IS_I9XX(pI830))
    OUTREG(DSPARB,
           ((planea_entries + planeb_entries) << DSPARB_CSTART_SHIFT) |
           (planea_entries << DSPARB_BSTART_SHIFT));
else if (IS_MOBILE(pI830))
    OUTREG(DSPARB,
           ((planea_entries + planeb_entries) << DSPARB_BEND_SHIFT) |
           (planea_entries << DSPARB_AEND_SHIFT));
-------------
My question: if we start two X servers, there are two planes (plane A, plane B) on one CRTC. Based on the test data above, every time we switch to the other server, planea_entries and planeb_entries become 0 or 95 (I9XX chipset). Is that reasonable? Should we give plane A 95/2 and plane B 95/2 no matter which X server we chvt to? Is my understanding right? If so, we should allocate fifo_entries based on the number of planes currently in use.

Regards

Jesse, are you still looking into this?

I have no idea about that. :( Both servers should be running on the same pipe though, so DSPARB should be the same.
However, it'll switch back to its original value at VT switch time, then get reprogrammed when the next server's EnterVT gets called. At least, that's what *should* happen...

OK. That means that on a VT switch, one server calls LeaveVT and the other calls EnterVT, and based on the log, the synchronization seems to have a problem. When one server leaves the VT, it doesn't clean up pipe B or other registers, which leaves pipe B still running. I have checked the source code: this is done in the DPMS operation, where on DPMS off the DPLL, pipe, plane and port are disabled. Is there any other place where this happens? And is that enough to clean up the registers shared by the two servers?

Regards
Quanxian Wang

Well, we could very well be missing something in our Enter/LeaveVT code. When the driver does a LeaveVT, it's supposed to put the hardware back into the state it found it in, using the RestoreHWState function (after turning everything off). If, after this, some register values aren't the same as they were when the server started, that could definitely cause trouble for a subsequent X server. On EnterVT, the driver uses xf86SetDesiredModes to get things back to where it wants them...

On this hardware, it looks like the pipe A quirk might be active? That leaves pipe A on so that, for example, the BIOS can bang on it at suspend time without hanging the machine. Also, the second server seems to be choosing a different plane/pipe mapping for some reason. Does the second server work better if you disable framebuffer compression?

Hi all,
I cannot reproduce this issue on our 945GM; my system environment is below. Is there any difference from yours?
Host: 945gm
Arch: i386
OSD: Fedora release 7 (Moonshine)
Libdrm: (master)0243c9f801a35de3465a0321c02f18a4d07ce5b8
Mesa: (intel-2008-q4)f96baeaac3ef41260ac3975750627ece073fdce0
Xserver: (server-1.6-branch)32e81074b967716865aef08b66ec29caf0fec2c5
Xf86_video_intel: (xf86-video-intel-2.6-branch) 83f3c376b5942e134047a220e6e5f2432ffc492c
GEM_kernel: (for-airlied)0fbdb7c9455a05eb89f358f0eb66fb8ab094a0c5

Haien,
We use the intel Q3 release, which is gem-classic. That is very different from your environment. Also, we cannot backport yours into the SLE11 system. See comment 1.

Hi Jesse,
After testing, I think the 945GM is a special platform, so I added some code to i830_display.c. It works fine.

Do you have any comments on the code? My idea is to make both servers use the same configuration whether plane A or plane B is used. Of course, this code is specific to the 945GM; other platforms don't have this problem, so we don't need to care about them. Seems like another quirk. :)
---------------
--- i830_display.c_orig	2008-12-23 22:06:45.000000000 +0800
+++ i830_display.c	2008-12-23 22:07:33.000000000 +0800
@@ -1156,9 +1156,18 @@ i830_update_dsparb(ScrnInfoPtr pScrn)
     planeb_entries = fifo_entries * planeb_hdisplay / total_hdisplay;
 
     if (IS_I9XX(pI830))
+    {
+        if(IS_I945GM(pI830))
+        {
+            OUTREG(DSPARB, (95 << DSPARB_CSTART_SHIFT) |
+                           (48 << DSPARB_BSTART_SHIFT));
+        }
+        else{
         OUTREG(DSPARB,
-               ((planea_entries + planeb_entries) << DSPARB_CSTART_SHIFT) |
-               (planea_entries << DSPARB_BSTART_SHIFT));
+                   ((planea_entries + planeb_entries) << DSPARB_CSTART_SHIFT) |
+                   (planea_entries << DSPARB_BSTART_SHIFT));
+        }
+    }
     else if (IS_MOBILE(pI830))
         OUTREG(DSPARB,
                ((planea_entries + planeb_entries) << DSPARB_BEND_SHIFT) |
----------------

(In reply to comment #18)
> Haien,
> We use intel-Q3 release which is gem-classic. It is big different with yours
> environment. Also we can not backport yours into SLE11 system.

I thought this also happened upstream, since Jesse said he could reproduce it with a GEM kernel (in comment #5).
Then I'm removing this from the Q4 release blockers now.

Jesse, should I set the pipe entries to 128? I read in one of your comments that the 945GM has 128 FIFO entries. Is that right?
---------------
--- i830_display.c_orig	2008-12-23 22:06:45.000000000 +0800
+++ i830_display.c	2008-12-23 22:07:33.000000000 +0800
@@ -1156,9 +1156,18 @@ i830_update_dsparb(ScrnInfoPtr pScrn)
     planeb_entries = fifo_entries * planeb_hdisplay / total_hdisplay;
 
     if (IS_I9XX(pI830))
+    {
+        if(IS_I945GM(pI830))
+        {
+            OUTREG(DSPARB, (128 << DSPARB_CSTART_SHIFT) |
+                           (48 << DSPARB_BSTART_SHIFT));
+        }
+        else{
         OUTREG(DSPARB,
-               ((planea_entries + planeb_entries) << DSPARB_CSTART_SHIFT) |
-               (planea_entries << DSPARB_BSTART_SHIFT));
+                   ((planea_entries + planeb_entries) << DSPARB_CSTART_SHIFT) |
+                   (planea_entries << DSPARB_BSTART_SHIFT));
+        }
+    }
     else if (IS_MOBILE(pI830))
         OUTREG(DSPARB,
                ((planea_entries + planeb_entries) << DSPARB_BEND_SHIFT) |
----------------

(In reply to comment #19)
> Hi, Jesse
> After testing, I think 945GM is a special platform. Therefore I add some code
> into the i830_display.c. It works fine.
>
> Do you have some comments for the code? My idea is to make them use the same
> configuration whatever plane a or plane b is used. Of course, this code is
> special for 945GM, others platform doesn't have such problem, we don't need
> care. Seems another quirk process. :)
>
> [...]

qwang, yeah, sorry I missed your earlier addition. Yeah, I think we have the wrong DSPARB values for 945. There's another patch (that uses the entry count variables instead) in 18651 that you should test.

(In reply to comment #22)
> qwang, yeah sorry I missed your earlier add. Yeah I think we have the wrong
> DSPARB values for 945. There's another patch (that uses the entry count
> variables instead) in 18651 that you should test.

Jesse,
The patch doesn't work (actually, it just changes 95 to 127 for the 945GM). After the patch, we cannot get to the login window; it just shows a mass of colorful lines.

This also happens on GM965: we started two X servers and found one showing a white screen.

02/26 is the final release (GMC) for SLED11, and we can only deliver the patch one week or more before that date. This bug will impact many more customers who use Intel graphics chipsets with the Novell SLED11 release. I'm raising the priority to critical.

Reproduced again with SLED11 RC3. FYI, the hang happens when switching back to the first server's VT. There is still pixel puke between gdm and the first user session coming up...

If it's the pipe mapping problem Zhenyu noticed, the patch in 19603 might be the fix.
Can you give it a try?

(In reply to comment #26)
> If it's pipe mapping problem Zhenyu noticed, the patch in 19603 might be the
> fix. Can you give it a try?

i830-create-known-state.patch from 19603 doesn't apply cleanly to 2.5.0, but I'm building with Helge's patch (xf86-video-intel-2.5.99-vesa-vtswitch.patch), which does.

Created attachment 23005 [details]
Log of first X server with debug open
After applying the patch from 19603 and testing, we no longer see the underrun message. However, the second server goes straight to a white screen.
Checking the log of the second server:
1) DRI is disabled (DRI screen initialization failed; only one server can use DRI).
2) The DRM front, back and depth buffers are not allocated in the second server (since DRI is disabled, that should be OK).
Also, there is no EDID information in the second server's output.
I've attached the Xorg output.
Created attachment 23006 [details]
Log of second X server with debug open
Log of second server
Actually, the problem is still there. I just came across another problem, a compiz issue. When I disable compiz and test again, the underrun message still appears. The patch does nothing for this bug.

This same configuration (server 1.5, driver 2.5) works on my 915 test box, so it could be something specific to the 945. Note that the second server is properly swapping pipes & planes, while the first server isn't (probably because DRI is enabled and the DRM module doesn't support that option), though for some reason the first server isn't allocating a compressed framebuffer (it does in my case).

--- working.out	2009-02-20 10:17:30.406528992 -0800
+++ broken.out	2009-02-20 10:17:09.082552621 -0800
@@ -23,7 +23,7 @@
 (II) intel(0): SDVOB: 0x00480000 (disabled, pipe A, stall disabled, not detected)
 (II) intel(0): SDVOC: 0x00480000 (disabled, pipe A, stall disabled, not detected)
 (II) intel(0): SDVOUDI: 0x0000003f
-(II) intel(0): DSPARB: 0x00002f80
+(II) intel(0): DSPARB: 0x00002fdf
 (II) intel(0): DSPFW1: 0x00000000
 (II) intel(0): DSPFW2: 0x00000000
 (II) intel(0): DSPFW3: 0x00000000
@@ -44,10 +44,10 @@
 (II) intel(0): PFIT_PGM_RATIOS: 0x00000000
 (II) intel(0): PORT_HOTPLUG_EN: 0x00000020
 (II) intel(0): PORT_HOTPLUG_STAT: 0x00000000
-(II) intel(0): DSPACNTR: 0x58000000 (disabled, pipe A)
+(II) intel(0): DSPACNTR: 0xd9000000 (enabled, pipe B)
 (II) intel(0): DSPASTRIDE: 0x00002000 (8192 bytes)
 (II) intel(0): DSPAPOS: 0x00000000 (0, 0)
-(II) intel(0): DSPASIZE: 0x01df027f (640, 480)
+(II) intel(0): DSPASIZE: 0x02ff03ff (1024, 768)
 (II) intel(0): DSPABASE: 0x01000000
 (II) intel(0): DSPASURF: 0x00000000
 (II) intel(0): DSPATILEOFF: 0x00000000
@@ -66,10 +66,10 @@
 (II) intel(0): VSYNC_A: 0x01ea01e8 (489 start, 491 end)
 (II) intel(0): BCLRPAT_A: 0x00000000
 (II) intel(0): VSYNCSHIFT_A: 0x00000000
-(II) intel(0): DSPBCNTR: 0xd9000000 (enabled, pipe B)
+(II) intel(0): DSPBCNTR: 0x58000000 (disabled, pipe A)
 (II) intel(0): DSPBSTRIDE: 0x00002000 (8192 bytes)
 (II) intel(0): DSPBPOS: 0x00000000 (0, 0)
-(II) intel(0): DSPBSIZE: 0x02ff03ff (1024, 768)
+(II) intel(0): DSPBSIZE: 0x01df027f (640, 480)
 (II) intel(0): DSPBBASE: 0x01000000
 (II) intel(0): DSPBSURF: 0x00000000
 (II) intel(0): DSPBTILEOFF: 0x00000000

I can start compiz & a full desktop on screen 0, then a simple xterm & twm on screen 1, and switch back & forth without problems. If I try to use the same user desktop session on both heads (a full GNOME session running as root on both), the apps on the second head hang, but this may be because they're confused about which head to run on or are waiting for updates from apps on the first head (which are likely not responding). But all of that is with a GEM-enabled kernel, which it doesn't look like you have. I'll build your kernel too and see what happens.

Hm, nope, 2.6.27.8 works OK too, with a full GNOME+compiz session on one screen and an xterm on another, and I can swap between them.

But the original report referenced a pipe B underrun, which could hang the chip. So on that platform you could comment out the calls to update_dsparb altogether and just use the default value (it should work for most configs) to see if that prevents the hang.

(In reply to comment #32)
> Hm, nope 2.6.27.8 works ok too, with a full GNOME+compiz session on one screen
> and xterm on another, and I can swap between them.
>
> But the original report referenced a pipe B underrun, which could hang the
> chip. So on that platform you may comment out the calls to update_dsparb
> altogether and just use the default value (it should work for most configs) to
> see if that prevents the hang.

Basically, it should work. See comment #21; I had already used this as a workaround. I will try your idea to check what happens.

Yes, the bug disappears if I disable update_dsparb. However, I came across another problem: when I enable compiz, the second server also hangs with a white screen. I will open another bug for this.
This is a general issue on all platforms.
Created attachment 23191 [details]
This is the diff for 2008Q3 release
I've pushed my diff based on the Q3 release.
I just wonder why 2008Q4 doesn't have this problem.
Jesse, have you tried the latest upstream packages to see whether the problem exists in the latest upstream code?
Patch submitted for SLE11-RC5.

OK, sounds like this one is resolved. I think the DSPARB/FIFO code will be changing soon as part of 18651 and related bugs anyway, but please open a new bug if you find issues with more recent code (in particular KMS).
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.