Bug 96352 - Complete lockup of Wyse Xn0L laptop using 0.4.158
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/openchrome
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium blocker
Assignee: Openchrome development list
Reported: 2016-06-03 03:34 UTC by Eric
Modified: 2016-07-01 05:10 UTC

Attachments
Working 0.4.108 Xorg.0.log from Xn0L (35.28 KB, text/x-log)
2016-06-03 03:36 UTC, Eric
0.4.158 Xorg.0.log from after Xn0L lockup. (36.34 KB, text/x-log)
2016-06-03 03:38 UTC, Eric
0.4.901 corrupted login (3.48 MB, image/jpeg)
2016-06-22 07:35 UTC, Eric
Garbage leftover from login dialog (3.56 MB, image/jpeg)
2016-06-22 07:38 UTC, Eric
0.4.901 Xorg.0.log from Xn0l with DVI and VGA hooked up (66.04 KB, text/x-log)
2016-06-22 07:40 UTC, Eric
0.4.901 Xorg.0.log from Xn0l boot after disconnecting VGA/DVI cables. (52.00 KB, text/x-log)
2016-06-22 07:42 UTC, Eric
Proposed fix to prevent a freeze in Wyse X class mobile thin client (1.10 KB, patch)
2016-06-30 17:38 UTC, Kevin Brace

Description Eric 2016-06-03 03:34:14 UTC
Current setup: Wyse Xn0L laptop (has LVDS panel, DVI-D and VGA outputs)
Ubuntu 12.04, Kernel 3.2.0-101

The last version I tried, 0.4.108, worked fine on LVDS.
After upgrading to 0.4.158, when it goes to X the screen flashes and shows some random garbage pixels. The machine seems to be completely locked at this point. Power button does nothing, Ctrl-Alt-F1 does nothing.

I have included Xorg.0.log from 0.4.158 (I booted from another Linux install and pulled it out of the filesystem) and Xorg.0.log.old from 0.4.108.
Comment 1 Eric 2016-06-03 03:36:45 UTC
Created attachment 124294
Working 0.4.108 Xorg.0.log from Xn0L
Comment 2 Eric 2016-06-03 03:38:20 UTC
Created attachment 124297
0.4.158 Xorg.0.log from after Xn0L lockup.
Comment 3 Kevin Brace 2016-06-04 07:59:57 UTC
Hi Eric,

I compared the two log files you attached.
The first excerpt is from the log that you said came from Version 0.4.108.

______________________________________________________________________
. . .
[   163.639] (II) LoadModule: "openchrome"
[   163.645] (II) Loading /usr/lib/xorg/modules/drivers/openchrome_drv.so
[   163.645] (II) Module openchrome: vendor="http://www.freedesktop.org/wiki/Openchrome/"
[   163.646] 	compiled for 1.11.3, module version = 0.4.115
[   163.646] 	Module class: X.Org Video Driver
[   163.646] 	ABI class: X.Org Video Driver, version 11.0
. . .
______________________________________________________________________


The second excerpt is from the log that you said came from Version 0.4.158.

______________________________________________________________________
. . .
[   151.907] (II) LoadModule: "openchrome"
[   151.913] (II) Loading /usr/lib/xorg/modules/drivers/openchrome_drv.so
[   151.914] (II) Module openchrome: vendor="http://www.freedesktop.org/wiki/Openchrome/"
[   151.914] 	compiled for 1.11.3, module version = 0.4.115
[   151.914] 	Module class: X.Org Video Driver
[   151.914] 	ABI class: X.Org Video Driver, version 11.0
. . .
______________________________________________________________________


Note that both logs report module version 0.4.115, not the versions you named.
Regardless, it is interesting that the Wyse Xn0L's 1280 x 800 panel does not have an I2C bus connected to it, yet OpenChrome still recognizes the panel. OpenChrome has always contained code for automatic panel detection, but that code was not put to good use until Version 0.4.
Since Version 0.4, in the absence of an I2C bus, OpenChrome reads a register that tells the device driver the panel resolution (this register is set by the BIOS at boot time).
In both cases, OpenChrome is recognizing the panel.
You may want to rerun the test using the latest Version 0.4.167 from the repository.
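
The version check made above by reading the log excerpts can also be done mechanically. The following is only a minimal sketch in C: parse_module_version is a hypothetical helper (it is not part of OpenChrome or the X.Org Server), and it assumes the line format shown in the excerpts. Xorg.0.log contains a "module version" line for every loaded module, so the helper should be fed only the line that follows the LoadModule: "openchrome" entry.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract X.Y.Z from one Xorg.0.log line of the form
 *   "[   163.646] \tcompiled for 1.11.3, module version = 0.4.115"
 * Returns 1 and fills major/minor/patch on success, 0 if the line does not
 * carry a complete "module version = X.Y.Z" field.
 */
int parse_module_version(const char *line, int *major, int *minor, int *patch)
{
    static const char key[] = "module version = ";
    const char *p = strstr(line, key);

    if (p == NULL)
        return 0;

    /* Skip past the key so that p points at the first digit. */
    p += sizeof(key) - 1;

    return sscanf(p, "%d.%d.%d", major, minor, patch) == 3;
}
```

Running this on both attached logs would make it obvious when two test runs actually loaded the same driver build.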
Comment 4 Eric 2016-06-22 07:30:58 UTC
OK, so running 0.4.901, nothing is displayed when you boot. The machine seems to be completely locked at this point. Power button does nothing, Ctrl-Alt-F1 does nothing.

There was a zero-byte Xorg.0.log file.

Next I hooked up both the VGA and DVI (to HDMI adapter) outputs to an HDTV (I also tried a DVI/VGA monitor with the same results) and got the Lubuntu login to show up on both VGA and DVI, but it was scrambled. I was able to enter my password, but the dialog box went away and left garbage behind. The machine seems to be completely locked at this point. Power button does nothing, Ctrl-Alt-F1 does nothing, Ctrl-Alt-Backspace does nothing. I noticed in the light that I could see the same corrupted display on the laptop LCD, but the inverter/backlight was turned off.

It did generate an Xorg.0.log file.

Next I unhooked the monitor cables and booted. It came up with a clean display and the backlight on, and I logged in, but it froze right away: no clock advance, nothing.

Also including Xorg.0.log file from this one.
Comment 5 Eric 2016-06-22 07:35:43 UTC
Created attachment 124655
0.4.901 corrupted login
Comment 6 Eric 2016-06-22 07:38:16 UTC
Created attachment 124656
Garbage leftover from login dialog
Comment 7 Eric 2016-06-22 07:40:29 UTC
Created attachment 124657
0.4.901 Xorg.0.log from Xn0l with DVI and VGA hooked up
Comment 8 Eric 2016-06-22 07:42:12 UTC
Created attachment 124658
0.4.901 Xorg.0.log from Xn0l boot after disconnecting VGA/DVI cables.
Comment 9 Kevin Brace 2016-06-25 05:48:12 UTC
Hi Eric,

I apologize for the inconvenience this has caused you.
If I were to guess where things went wrong, I would say Version 0.4.130.
Hence, let's try Version 0.4.129.

https://cgit.freedesktop.org/openchrome/xf86-video-openchrome/commit/?id=f45c5765bac55d69fa9637cea67844e6f95ff25f

Some of the commits made for Version 0.4.130 were very specific to the P4M900 / VN896 / CN896 chipsets and, in particular, were likely too specific to the Epic 1314 I use for OpenChrome development.
I would imagine that this caused the breakage.
Anyway, you may want to compile the Version 0.4.129 code by checking out the old commit.

git checkout f45c5765bac55d69fa9637cea67844e6f95ff25f

From here, regenerate the build scripts, compile, and install.
If this version works with the LCD, then you can try Version 0.4.130.

https://cgit.freedesktop.org/openchrome/xf86-video-openchrome/commit/?id=46fa8e812f13e6cb00f1ec5eee936e089cd9886c

You can check out Version 0.4.130 the same way as Version 0.4.129.

git checkout 46fa8e812f13e6cb00f1ec5eee936e089cd9886c

If Version 0.4.130 is the one that introduced the LCD bug, then I will have to fix the code before I can release OpenChrome Version 0.5.
I will not release a new version until this bug is fixed.
In either case, if you want to get out of the detached HEAD state, run:

git checkout master

That should get you out of it.
Comment 10 Kevin Brace 2016-06-25 20:00:11 UTC
Hi Eric,

Another thing to note is that, due to the way OpenChrome is currently written, it really cannot handle a triple head configuration.
My recommendation is to avoid using DVI simultaneously with LVDS FP for now.
LVDS FP does work with VGA simultaneously, and in fact, I use it like this almost every day.
That said, triple head configurations are not uncommon; many VIA IGP based laptops were sold in an LVDS FP + VGA + TV configuration.
Probably around OpenChrome Version 0.6 or 0.7, the code will be updated to accommodate more than 2 display devices.
I am still largely fixing the existing code, and very little of my own code has made its way into OpenChrome at this point (also, I have had OpenChrome Git repository access for "only" 4 1/2 months).
The work I am currently doing is really preparation for the future KMS (Kernel Mode Setting) mainlining work that has to happen at some point.
If my guess as to which version caused the regression is correct, then I will make a patch for you to test so that I can exonerate the rest of the code.
Comment 11 Eric 2016-06-27 05:33:43 UTC
I narrowed it down

0.4.151 works

0.4.152 crashes

- Log -----------------------------------------------------------------
commit 9243d288410857a8c38d11c391af2734d8d482cf
Author: Kevin Brace <kevinbrace@gmx.com>
Date:   Sun May 22 22:24:16 2016 -0700

    Version bumped to 0.4.152
   
    Signed-off-by: Kevin Brace <kevinbrace@gmx.com>

commit 7738d22741cd7cce32b6d3df1e809e7403f74bc5
Author: Kevin Brace <kevinbrace@gmx.com>
Date:   Sun May 22 22:22:03 2016 -0700

    Major rewrite of viaIGAInitCommon
   
    Reorganized viaIGAInitCommon function, and add initialization code
    for a few registers. This function is located inside via_display.c.
   
    Signed-off-by: Kevin Brace <kevinbrace@gmx.com>
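
One systematic way to arrive at a "last good / first bad" pair like the one above is a binary search over patch levels, which is also what git bisect automates. The following is only a sketch in C of that search logic, not a record of how the narrowing was actually done. It assumes that every patch level before the regression works and every one from the regression onward fails. The names first_bad_patch_level and example_is_bad are hypothetical; in practice the predicate would be a manual checkout, build, and boot test of that patch level.

```c
/* Predicate type: returns non-zero if the given patch level locks up. */
typedef int (*is_bad_fn)(int patch_level);

/*
 * Binary search for the first failing patch level.
 * Precondition: `good` is known to work, `bad` is known to fail, good < bad.
 * Assumes a single working-to-broken transition between the two.
 */
int first_bad_patch_level(int good, int bad, is_bad_fn is_bad)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (is_bad(mid))
            bad = mid;   /* regression is at mid or earlier */
        else
            good = mid;  /* regression is after mid */
    }
    return bad;
}

/*
 * Stand-in for the manual test, hard-coded to the result reported above
 * (0.4.151 works, 0.4.152 crashes). Purely illustrative.
 */
int example_is_bad(int patch_level)
{
    return patch_level >= 152;
}
```

Starting from a known good 0.4.129 and a known bad 0.4.158, this search needs only about five build-and-boot cycles instead of testing every patch level in between.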
Comment 12 Kevin Brace 2016-06-27 23:47:37 UTC
(In reply to Eric from comment #11)

Hi Eric,

> I narrowed it down
> 
> 0.4.151 works
> 
> 0.4.152 crashes
> 
> - Log -----------------------------------------------------------------
> commit 9243d288410857a8c38d11c391af2734d8d482cf
> Author: Kevin Brace <kevinbrace@gmx.com>
> Date:   Sun May 22 22:24:16 2016 -0700
> 
>     Version bumped to 0.4.152
>    
>     Signed-off-by: Kevin Brace <kevinbrace@gmx.com>
> 
> commit 7738d22741cd7cce32b6d3df1e809e7403f74bc5
> Author: Kevin Brace <kevinbrace@gmx.com>
> Date:   Sun May 22 22:22:03 2016 -0700
> 
>     Major rewrite of viaIGAInitCommon
>    
>     Reorganized viaIGAInitCommon function, and add initialization code
>     for a few registers. This function is located inside via_display.c.
>    
>     Signed-off-by: Kevin Brace <kevinbrace@gmx.com>

Thank you for narrowing down the bug to a specific patch level (the ZZZ part of Version XXX.YYY.ZZZ).
That really saves a lot of time on my end.
I will get a patch ready for testing by tomorrow.
It will likely remove some of the code within the viaIGAInitCommon function for testing purposes, so that I can narrow down further to determine which hardware register access is the offending party here.
In the meantime, there will be a new RC (RC5) that is not really related to your bug fix.
The patch will be generated against that version, although it might still work with RC4 or RC3.
Hopefully, by RC7 or RC8, it will be a release version.
Comment 13 Kevin Brace 2016-06-30 17:38:21 UTC
Created attachment 124806
Proposed fix to prevent a freeze in Wyse X class mobile thin client

On the Wyse X class mobile thin client, it was observed that setting 
SR2E[3:2] (3C5.2E[3:2]; PCI Master / DMA) to 0b11 (clock on / off 
according to the engine IDLE status) causes an X.Org Server boot 
failure. Setting this field to 0b10 (clock always on) corrects 
the problem.

Signed-off-by: Eric Kudzin <"Eric's e-mail address">
Signed-off-by: Kevin Brace <"Kevin's e-mail address">
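
The register change described above amounts to a read-modify-write of a two-bit field. The following C sketch is for illustration only and is not the attached patch: the macro and function names are hypothetical, and the actual register I/O is left out (the driver would use its own VGA sequencer access routines). Only the bit positions and the two field values are taken from the description.

```c
#include <stdint.h>

/* SR2E[3:2]: PCI Master / DMA clock control field (per the description). */
#define SR2E_PCI_DMA_SHIFT      2
#define SR2E_PCI_DMA_MASK       (0x3u << SR2E_PCI_DMA_SHIFT)

#define SR2E_CLOCK_ALWAYS_ON    0x2u  /* 0b10: clock always on */
#define SR2E_CLOCK_IDLE_GATED   0x3u  /* 0b11: on / off per engine IDLE status */

/*
 * Hypothetical helper: return the SR2E value with bits [3:2] replaced by
 * `mode`, leaving every other bit of the register untouched.
 */
uint8_t sr2e_set_pci_dma_clock(uint8_t sr2e, uint8_t mode)
{
    uint8_t field = (uint8_t)((mode << SR2E_PCI_DMA_SHIFT) & SR2E_PCI_DMA_MASK);

    return (uint8_t)((sr2e & ~SR2E_PCI_DMA_MASK) | field);
}
```

For example, a register that currently reads 0xFF would be written back as 0xFB after selecting SR2E_CLOCK_ALWAYS_ON, since only bit 2 changes.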
Comment 15 Kevin Brace 2016-07-01 05:10:14 UTC
For those wondering, it took about 4 hours to fix this bug.
It took 10 test patches to figure out which register was causing the freeze, and the tenth patch nailed the bug.
It was actually really hard to fix, since the register whose setting I changed was the one I least expected to cause a problem like this.