Bug 31156 - Graphical distortion in Citrix XenApp with Intel driver for GL40 (part of thin client HP 5745)
Status: RESOLVED NOTOURBUG
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Carl Worth
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-27 02:27 UTC by Moritz Mühlenhoff
Modified: 2011-08-17 05:13 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
Kernel config (67.63 KB, text/plain)
2010-10-27 02:27 UTC, Moritz Mühlenhoff
lspci -vvv (21.04 KB, text/plain)
2010-10-27 02:27 UTC, Moritz Mühlenhoff
xorg.conf (3.22 KB, application/octet-stream)
2010-10-27 02:28 UTC, Moritz Mühlenhoff
Xorg.log (35.62 KB, application/octet-stream)
2010-10-27 02:28 UTC, Moritz Mühlenhoff
Horizontal duplication with GL40 under Citrix XenApp (534.02 KB, video/mp4)
2011-07-11 07:05 UTC, Moritz Mühlenhoff
Redraw artefacts using the GL40 under Citrix XenApp (1.82 MB, video/mp4)
2011-07-11 07:07 UTC, Moritz Mühlenhoff

Description Moritz Mühlenhoff 2010-10-27 02:27:17 UTC
Created attachment 39804
Kernel config

Dear Xorg developers,
we're running into a problem with the Intel driver on the HP 5745 thin client. According to HP's data sheets, the chipset's brand name is GL40, but I'm attaching the output of "lspci -vvv" to be precise.

We're seeing two problems when running the Linux client for Citrix XenApp (formerly known as Citrix Presentation Server):

1. Mouse positions in the Citrix session are not aligned with the position of the mouse cursor: if I keep the mouse button pressed to mark a region of the Windows desktop (e.g. to select multiple icons), the X position is doubled. If I press the mouse button five pixels from the left screen border, the selection frame appears five pixels from the screen border, but the mouse cursor is ten pixels away. If I start the selection 50 pixels away, the mouse cursor is at 100 pixels. On the Y axis there is no duplication!

2. Screen contents are not updated consistently: fragments of windows that have been moved remain on the screen. It seems as if screen repaints are triggered inconsistently.

Both symptoms occur together, as explained later.
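The two data points given for symptom 1 can be checked with a little arithmetic. The minimal Python sketch below is only an illustration (the function name fit_linear is hypothetical): it fits cursor = a * selection + b through the reported (selection frame, mouse cursor) X positions and shows that they are consistent with a pure factor-of-two scaling with no offset. It describes the observed mapping, not its cause.

```python
def fit_linear(pairs):
    """Fit cursor = a * selection + b through two (selection, cursor) X positions."""
    (s0, c0), (s1, c1) = pairs
    a = (c1 - c0) / (s1 - s0)  # slope: cursor pixels per selection-frame pixel
    b = c0 - a * s0            # offset at selection X = 0
    return a, b

# X positions in pixels as reported: (selection frame, mouse cursor).
a, b = fit_linear([(5, 10), (50, 100)])
print(a, b)  # 2.0 0.0
```

A slope of 2 with zero offset is what one might expect if some component believed the desktop to be twice as wide as it really is; that reading is an assumption, not something the measurements alone establish.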

You may ask yourself why this bug is filed against the Intel driver: these symptoms only occur with the HP 5745 thin clients equipped with an Intel GL40. Other thin clients running the same installation (e.g. HP 5735 with the ati driver and FSC Futro thin clients with the sis driver) work fine against the same Citrix servers (running XenApp 6 on Windows 2008).

All thin clients run Linux based on Debian Lenny (5.0) with a few updated components: the Linux kernel is 2.6.35.4 (.config attached) and the Xserver 2.9.1 (the Debian package, but recompiled for the Lenny base).
By default, the thin clients use KMS.

As for reproducibility:
- If we disable KMS via modprobe, we can always reproduce the problem.

- When running under KMS, the problem occurs randomly on some systems but not systematically; there's not yet a clear pattern for reproducing the error with KMS.

Users log in via GDM and connect to Citrix XenApp with a session script. Both errors occur only in Citrix, not in the GDM screen.

Do the symptoms of the errors give you any hints as to what's going wrong?

I'm attaching the following files, please get back to me if you need further debug information:

config.txt: The configuration of the kernel used on the thin client
lspci.txt: The output of "lspci -vvv" on the thin client
xorg.log: The Xorg.log of the thin client running UMS (and thus exposing the problem)
xorg.conf: The X.org configuration used on the thin client

Cheers,
Moritz
Comment 1 Moritz Mühlenhoff 2010-10-27 02:27:45 UTC
Created attachment 39805
lspci -vvv
Comment 2 Moritz Mühlenhoff 2010-10-27 02:28:34 UTC
Created attachment 39806
xorg.conf
Comment 3 Moritz Mühlenhoff 2010-10-27 02:28:53 UTC
Created attachment 39807
Xorg.log
Comment 4 Chris Wilson 2010-10-27 13:25:34 UTC
Sounds like Citrix is using the Damage extension to track areas of the screen that need to be updated. xorg-1.9 contains a fair number of fixes for damage and would be a good starting point for testing.
Comment 5 Moritz Mühlenhoff 2010-11-01 05:46:39 UTC
(In reply to comment #4)
> Sounds like Citrix is using the Damage extension to track areas of the screen
> that need to be updated. xorg-1.9 contains a fair number of fixes for damage
> and would be a good starting point for testing.

Thanks for that very valuable advice!

Adding the following to xorg.conf indeed fixes the problem:

Section "Extensions"
        Option  "DAMAGE" "disabled"
EndSection

Unfortunately it's very difficult to test xorg 1.9, since the problem only shows up with the graphics chip of the HP thin client, not on desktop computers, where we could upgrade xorg more easily.

Is the damage extension something which is negotiated between the device-specific driver and the xserver? Would it be possible to force a different graphics adapter or the VESA driver into using the damage extension? In that case we could test xorg 1.9.
Comment 6 Chris Wilson 2011-06-26 02:06:29 UTC
(In reply to comment #5)
> Is the damage extension something which is negotiated between the
> device-specific driver and the xserver? Would it be possible to force a
> different graphics adapter or the VESA driver into using the damage extension?
> In that case we could test xorg 1.9.

The damage extension is part of the core Xserver. The complication is in the sequence of when the damage is advertised to the listening clients and when it is rendered by the DDX - the bugs have all been where that notification is sent before the driver has had a chance to render the damage, and so the client copies the old contents. As such, it does need to be tested with the right combination of driver/xserver. :|
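The ordering problem described above can be illustrated with a toy simulation. The Python sketch below is not X server code: the run function and the dictionary standing in for the framebuffer are hypothetical. It only shows that if the damage notification reaches the listening client before the driver has rendered, the client copies the old contents.

```python
def run(notify_before_render: bool) -> str:
    """Simulate one screen update and return what the listening client copies."""
    framebuffer = {"region": "old"}  # stand-in for the damaged screen area
    copied = None

    def render():
        # The DDX (driver) draws the new contents into the framebuffer.
        framebuffer["region"] = "new"

    def notify():
        # The client reacts to the damage event by copying the region at once.
        nonlocal copied
        copied = framebuffer["region"]

    if notify_before_render:
        notify()  # buggy ordering: event delivered before the driver has drawn
        render()
    else:
        render()  # correct ordering: draw first, then advertise the damage
        notify()
    return copied

print(run(notify_before_render=True))   # old
print(run(notify_before_render=False))  # new
```

This is why, as noted above, such bugs depend on the exact driver/Xserver combination: the result hinges on when rendering completes relative to the notification.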

An alternative to you doing the testing is if you can provide instructions on how I might replicate one of your thin clients?
Comment 7 Moritz Mühlenhoff 2011-06-29 05:59:27 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Is the damage extension something which is negotiated between the
> > device-specific driver and the xserver? Would it be possible to force a
> > different graphics adapter or the VESA driver into using the damage extension?
> > In that case we could test xorg 1.9.
> 
> The damage extension is part of the core Xserver. The complication is in the
> sequence of when the damage is advertised to the listening clients and when it
> is rendered by the DDX - the bugs have all been where that notification is sent
> before the driver has had a chance to render the damage, and so the client
> copies the old contents. As such, it does need to be tested with the right
> combination of driver/xserver. :|
> 
> An alternative to you doing the testing is if you can provide instructions on
> how I might replicate one of your thin clients?

We did an experimental build of more recent Xorg components: with version 1.9 of the core Xserver and version 2.12 of the Intel driver, the same bugs are exposed as in my initial report (points 1 and 2).

I suspect the biggest obstacle to replicating this is that it only occurs when connecting to Citrix XenApp. Also, the Citrix client is closed source, so it's not really possible to check what is being done on the client side :-/

This doesn't occur with other chipsets (tested with SiS and ATI), so your advice on the interaction with the DDX is probably correct.
Comment 8 Moritz Mühlenhoff 2011-06-29 08:09:50 UTC
(In reply to comment #7)

> We did an experimental build of more recent Xorg components: With version 1.9
> of the core Xserver and version 2.12 the same bugs are exposed as in my initial
> report (points 1 and 2 of my initial report).

Intel driver 2.14 built against core server 1.9 exposes the same problem. I'm currently building a 1.10 server to see if it makes any difference.
Comment 9 Moritz Mühlenhoff 2011-06-30 03:04:31 UTC
(In reply to comment #8)
> (In reply to comment #7)
> 
> > We did an experimental build of more recent Xorg components: With version 1.9
> > of the core Xserver and version 2.12 the same bugs are exposed as in my initial
> > report (points 1 and 2 of my initial report).
> 
> Intel driver 2.14 built against core server 1.9 exposes the same problem. I'm
> currently building a 1.10 server to see if it makes any difference.

The same errors occur with a build of the Intel driver 2.15 against the core server 1.10.1.
Comment 10 Moritz Mühlenhoff 2011-07-07 23:28:53 UTC
We did further tests to rule out a bug specific to our custom distribution (which is based on Debian Lenny with updated Xorg/kernel bits to support more recent hardware). To that end, we installed Ubuntu 10.10 and 11.04 on the thin clients:
The error occurs with Ubuntu 11.04, while with 10.10 everything works correctly.

We'll soon provide a screen capture video to show the graphics corruption.
Comment 11 Moritz Mühlenhoff 2011-07-11 07:04:23 UTC
I'm attaching two short mp4 videos showcasing the bug:

1. horizontal-duplication.mp4 shows the effect I described in my initial report:

"The position of the mouse cursor in the Citrix session is not aligned with
the position of the mouse cursor: If I keep the mouse button pressed to mark a
region of the Windows desktop (e.g. to select multiple icons), then the X
position is doubled: If I press the mouse button five pixels from the left
screen border, then selection frame appears five pixels from the screen border,
but the mouse cursor ten pixels away. If I start the selection 50 pixels away,
then the mouse cursor is at 100 pixels. On the Y axis there is no duplication!"

In this video I'm marking two regions on the desktop. The second marked region is a bit hard to see due to the MPEG compression, but the box only appears to the right of the Firefox desktop icon, while the mouse cursor is on the left.

2. redraw-corruption.mp4 shows the graphical artefacts present when moving windows around.

Which additional information would be needed to narrow this down further? Would xtrace logs help (one for the working and one for the non-working X server)?

As for replicating the bug: 

- For the client part (I'm not sure for which types of systems the GL40 was sold):
1. If you have access to an HP 5745 thin client, we can provide a Squashfs image of the installation running on the HP 5745 thin client that exposes the problem.
2. If you have access to a desktop system with a GL40, we can provide the necessary steps for an Ubuntu 11.04 installation to install the Citrix client (available as a DEB as well) and start it through a GDM session script after installation.

- For the server part:
Installing Citrix is a rather complex process requiring Active Directory etc., but we can arrange an SSH tunnel to a Citrix installation for debugging.
Comment 12 Moritz Mühlenhoff 2011-07-11 07:05:49 UTC
Created attachment 48973
Horizontal duplication with GL40 under Citrix XenApp
Comment 13 Moritz Mühlenhoff 2011-07-11 07:07:17 UTC
Created attachment 48974
Redraw artefacts using the GL40 under Citrix XenApp
Comment 14 Giles Atkinson 2011-07-22 06:21:00 UTC
The DAMAGE extension is not relevant - the program does not use it.
Comment 15 Moritz Mühlenhoff 2011-08-17 03:05:03 UTC
(In reply to comment #14)
> The DAMAGE extension is not relevant - the program does not use it.

Thanks!

In the meantime, we've been able to reproduce this problem on a thin client with a non-Intel chipset as well. This bug should therefore be reassigned to a different component (though I'm unsure which one):

The same errors (as shown in the videos in the attachments of this bug) also occur on a "Terra Nettop 3100" thin client with the following Nvidia card:

01:00.0 VGA compatible controller: nVidia Corporation Device 0a6f (rev a2) (prog-if 00 [VGA controller])
        Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 4003
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at ce000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at dc00 [size=128]
        Expansion ROM at fea80000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us
                        ClockPM+ Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [b4] Vendor Specific Information <?>
        Capabilities: [100] Virtual Channel <?>
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information <?>
        Kernel driver in use: nouveau
        Kernel modules: nouveau, nvidiafb

We're using the Nouveau driver in version 1:0.0.16+git20100805+b96170a (as packaged in Debian). These thin clients come preinstalled with Ubuntu 10.10 (which we don't use) and the same error is present there.

In the case of the Nouveau driver the distortions only occur under very specific conditions (which is why we didn't spot this earlier):

The bug only occurs if two monitors are connected to the thin client without a multi-monitor setup being configured with xrandr, i.e. the graphical output of the thin client is mirrored to both monitors.
When using a single monitor or configuring an xrandr display spanning the two monitors, everything works fine!

This is therefore probably a deeper-rooted X11 issue. So far we've been unable to reproduce the visual errors with anything other than the Citrix Receiver client.
Comment 16 Giles Atkinson 2011-08-17 03:32:21 UTC
In comment #15, Moritz Mühlenhoff wrote:

> The bug only occurs if two monitors are connected to the thin client w/o a
> multi monitor setup being configured with xrandr, i.e. the graphical output of
> the thin client is mirrored to the two monitors.
> When using a single monitor or configuring a xrandr display spanning the two
> monitors everything works fine!

That sounds like a bug recently identified and fixed at Citrix. In that case, the XINERAMA extension was reporting two monitors with identical size and position. That information was passed on to the (Windows) server, which seems to confuse it. Graphics corruption as described was also seen, so it seems likely that this is purely a Citrix bug. It will be fixed in the next release of "Citrix Receiver for Linux", due in the next few months, but if you want to take it to Citrix Technical Support to get a fix sooner, quoting BUG0156864 should speed things up.
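The failure mode described here (XINERAMA reporting two monitors with identical size and position when the outputs are mirrored) suggests a simple client-side guard. The Python sketch below assumes the screen rectangles have already been obtained, for example via libXinerama's XineramaQueryScreens; the Screen type and distinct_screens function are hypothetical, and this is not Citrix's actual fix. It collapses exact duplicates before the list would be forwarded to a server.

```python
from typing import List, NamedTuple

class Screen(NamedTuple):
    """One Xinerama screen rectangle: origin and size in pixels."""
    x: int
    y: int
    width: int
    height: int

def distinct_screens(screens: List[Screen]) -> List[Screen]:
    """Drop screens whose rectangle exactly duplicates an earlier one, keeping order."""
    seen = set()
    result = []
    for s in screens:
        if s not in seen:
            seen.add(s)
            result.append(s)
    return result

# Hypothetical example values: two mirrored outputs reported with identical geometry.
mirrored = [Screen(0, 0, 1280, 1024), Screen(0, 0, 1280, 1024)]
print(distinct_screens(mirrored))  # [Screen(x=0, y=0, width=1280, height=1024)]
```

Only exact duplicates are removed; a spanning setup with side-by-side screens passes through unchanged, which matches the configurations reported to work.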
Comment 17 Moritz Mühlenhoff 2011-08-17 05:13:13 UTC
(In reply to comment #16)
> In comment #15, Moritz Mühlenhoff wrote:
> 
> > The bug only occurs if two monitors are connected to the thin client w/o a
> > multi monitor setup being configured with xrandr, i.e. the graphical output of
> > the thin client is mirrored to the two monitors.
> > When using a single monitor or configuring a xrandr display spanning the two
> > monitors everything works fine!
> 
> That sounds like a bug recently identified and fixed at Citrix.  In that case,
> the XINERAMA extension was reporting two monitors with identical size and
> position.  That information was passed on to the (Windows) server, which seems
> to confuse it.  Graphics corruption as described was also seen, so it seems
> likely that this is purely a Citrix bug.  It will be fixed in the next release
> of "Citrix Receiver for Linux", due in the next few months, but if you want to
> take it to Citrix Technical Support, to get a fix sooner, quoting BUG0156864
> should speed things up.

Confirmed! The bug is triggered on the HP 5745 because the hardware includes an additional LVDS1 display identifier, which is activated but not connected to any VGA/DisplayPort/HDMI etc. display link. Adding

Section "Monitor"
    Identifier      "LVDS1"
    Option "Ignore" "true"
EndSection

to xorg.conf fixes the display problems as a workaround. We'll pass on the bug reference to the local administrator of the Citrix installation.
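To check whether a machine is in the mirrored configuration that triggers the bug, one could look for active outputs sharing the same geometry. The Python sketch below parses text in the format printed by `xrandr --query`; the SAMPLE text is hypothetical (not captured from the HP 5745), the function name mirrored_groups is made up for illustration, and the regular expression only covers the common "NAME connected [primary] WxH+X+Y" form. Output names that end up in the same group are candidates for an "Ignore" option like the LVDS1 one above.

```python
import re
from collections import defaultdict

# Matches active outputs, e.g. "VGA1 connected 1280x1024+0+0 (...)".
ACTIVE = re.compile(r"^(\S+) connected (?:primary )?(\d+)x(\d+)\+(\d+)\+(\d+)")

def mirrored_groups(xrandr_text):
    """Group active outputs by geometry; return groups with more than one output."""
    by_geometry = defaultdict(list)
    for line in xrandr_text.splitlines():
        m = ACTIVE.match(line)
        if m:
            name, w, h, x, y = m.groups()
            by_geometry[(int(x), int(y), int(w), int(h))].append(name)
    return [names for names in by_geometry.values() if len(names) > 1]

# Hypothetical sample output with an unused LVDS1 mirroring VGA1.
SAMPLE = """Screen 0: minimum 320 x 200, current 1280 x 1024, maximum 4096 x 4096
VGA1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 338mm x 270mm
   1280x1024      60.0*+
LVDS1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1280x1024      60.0*
HDMI1 disconnected (normal left inverted right x axis y axis)
"""

print(mirrored_groups(SAMPLE))  # [['VGA1', 'LVDS1']]
```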

Thanks for everyone's comments! While it was a false alarm on the Xorg side, this bug is hopefully still a useful reference for people experiencing the same Citrix Receiver bug.

