Bug 24748

Summary: [965G] Graphics crashes when resolution is changed with KMS enabled
Product: xorg Reporter: Christian Eggers <ceggers>
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: chris, maximlevitsky, michael.fu, n-roeser, torsten, walch.martin
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Output of dmesg
none
Output of intel_gpu_dump (before resolution change)
none
Output of intel_gpu_dump (after resolution change)
none
Xorg.0.log
none
Screenshot
none
Output of dmesg (linux-2.6.32.2)
none
Xorg.0.log (linux-2.6.32.2)
none
Output of intel_reg_dumper (linux-2.6.32.2)
none
dmesg (for comment #28)
none
xrandr -q --verbose (for comment #28)
none
try the debug patch that updates the self-refresh watermark on 965 platform
none
try the debug patch that updates the self-refresh watermark on 965 platform
none
dmesg (for comment #33)
none
dmesg (for comment #38)
none
try the debug patch that dumps the output pixel clock range of sdvo device
none
dmesg (for comment #41)
none
xrandr -q --verbose (for comment #41)
none
dmesg (for comment #43)
none
try the debug patch that disable memory self-refresh on 965 desktop platform
none
dmesg (for comment #45)
none
dmesg (for comment #47)
none
dmesg for 2.6.33.2 drm
none
Break the mouse cursor, fix resolution changing
none
Unset cursor if out of bounds.
none
Unset cursor if out of bounds.
none

Description Christian Eggers 2009-10-27 00:15:11 UTC
--- Bug description ---
When I try to change the display resolution with KMS enabled (i915.modeset=1), the graphics output becomes corrupted: I can only see parts of the screen, and the content starts moving (similar to a CRT that has lost sync).

--- System environment ---
# uname -m
x86_64
# pkg-config --modversion libdrm
2.4.14
# glxinfo | grep Mesa
OpenGL renderer string: Mesa DRI Intel(R) 965G GEM 20090712 2009Q2 RC3
OpenGL version string: 2.1 Mesa 7.6
# cat /var/log/Xorg.0.log
X.Org X Server 1.6.5
Release Date: 2009-10-11
X Protocol Version 11, Revision 0
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux linux 2.6.31.3-1-desktop #1 SMP PREEMPT 2009-10-08 00:27:25 +0200 x86_64
Build Date: 12 October 2009  08:58:03PM
...
(II) LoadModule: "intel"
(II) Loading /usr/lib64/xorg/modules//drivers/intel_drv.so
(II) Module intel: vendor="X.Org Foundation"
        compiled for 1.6.5, module version = 2.9.0                        
        Module class: X.Org Video Driver
        ABI class: X.Org Video Driver, version 5.0
...
# uname -r
2.6.31.3-1-desktop
# cat /etc/SuSE-release
openSUSE 11.2 RC 1 (x86_64)
VERSION = 11.2
# Mainboard:
<I'll have to look... (Intel mainboard with i965G chipset)>
Display connector: DVI (via ADD2 card)

--- Steps to reproduce (about 90% reproducible) ---
1. Start openSUSE 11.2-RC1 from KDE4 Live-CD. Enter "i915.modeset=1" at the boot prompt. (The problem also happens with the current Ubuntu Live-CD)
2. Wait until the desktop is loaded.
3. Start "krandrtray" (using "xrandr" usually leads to the same result)
4. Switch resolution several times (from 1600x1200 to 1024x768)
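
The reproduction steps above can be approximated from a terminal instead of krandrtray. The following is only a minimal sketch: the function name cycle_modes, the XRANDR override variable and the 5 second default delay are illustrative assumptions, and the output name (for example DVI1) has to be taken from the "xrandr -q" output of the machine under test.

```shell
#!/bin/sh
# Sketch: alternate between two resolutions to provoke the failure.
# XRANDR may be overridden (e.g. XRANDR=echo) for a dry run; that
# variable is an assumption of this sketch, not an xrandr feature.
XRANDR="${XRANDR:-xrandr}"

# cycle_modes OUTPUT MODE_A MODE_B COUNT [DELAY_SECONDS]
# Switches OUTPUT to MODE_B and back to MODE_A, COUNT times.
cycle_modes() {
    output=$1 mode_a=$2 mode_b=$3 count=$4 delay=${5:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        $XRANDR --output "$output" --mode "$mode_b" || return 1
        sleep "$delay"
        $XRANDR --output "$output" --mode "$mode_a" || return 1
        sleep "$delay"
        i=$((i + 1))
    done
}

# Example call (not executed here):
# cycle_modes DVI1 1600x1200 1024x768 10
```

Running it first with XRANDR=echo prints the commands without touching the display, which is a cheap way to check the arguments before the screen possibly becomes unusable.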
Comment 1 Christian Eggers 2009-10-27 00:23:21 UTC
Created attachment 30728 [details]
Output of dmesg
Comment 2 Christian Eggers 2009-10-27 00:25:24 UTC
Created attachment 30729 [details]
Output of intel_gpu_dump (before resolution change)
Comment 3 Christian Eggers 2009-10-27 00:25:54 UTC
Created attachment 30730 [details]
Output of intel_gpu_dump (after resolution change)
Comment 4 Christian Eggers 2009-10-27 00:26:19 UTC
Created attachment 30731 [details]
Xorg.0.log
Comment 5 Christian Eggers 2009-10-27 00:41:07 UTC
Created attachment 30732 [details]
Screenshot
Comment 6 Carl Worth 2009-11-06 12:53:20 UTC
Hi Christian,

Thanks for the bug report. I'm curious if the output of "xrandr --verbose" is
different between when things work and when things don't. Could you look at
that and perhaps attach the output if different.

I'll assign this bug to yakui.zhao@intel.com who has some experience in this
area.

Thanks,

-Carl
Comment 7 Christian Eggers 2009-11-08 08:49:00 UTC
(In reply to comment #6)
Hi Carl,

> Hi Christian,
> 
> Thanks for the bug report. I'm curious if the output of "xrandr --verbose" is
> different between when things work and when things don't. Could you look at
> that and perhaps attach the output if different.

does xrandr work over ssh? How can I get the information when the graphics output is disturbed?
Comment 8 Christian Eggers 2009-11-08 23:54:38 UTC
(In reply to comment #6)
> I'm curious if the output of "xrandr --verbose" is
> different between when things work and when things don't. Could you look at
> that and perhaps attach the output if different.

The output is nearly identical for both cases. Only the two lines with "Timestamp:" have different values.

It seems to me that resolution switching fails slightly more often when I use krandrtray instead of xrandr.
Comment 9 Christian Eggers 2009-12-16 22:58:31 UTC
(In reply to comment #6)
> I'll assign this bug to yakui.zhao@intel.com who has some experience in this
> area.

Hi Carl,

is there any progress with this issue? Can I expect that this bug will be fixed in the near future? My housemate told me that he has similar problems on his laptop so I assume that more people are suffering from this error.

regards
Christian
Comment 10 ykzhao 2009-12-21 23:02:42 UTC
Hi, Christian
     Sorry for the late response.
     Could you please try the latest Linux kernel (for example, 2.6.32.2) and see whether the issue still exists? (Please add the boot option "drm.debug=0x06" and boot the system with KMS enabled.)
     Could you also add the "modedebug" option to xorg.conf and attach the resulting Xorg.0.log?
     >Option "modedebug" "True"

     It would be great if you could attach the output of intel_reg_dumper when the issue happens.

Thanks.
Comment 11 Michael Fu 2009-12-26 00:54:09 UTC
ping Chris...
Comment 12 Christian Eggers 2009-12-28 11:56:50 UTC
(In reply to comment #11)
> ping Chris...
> 

Sorry, I was on holiday for the previous days...

Unfortunately it's difficult for me to test with a particular kernel/Xorg driver because I did the investigation with a Live-CD of my Linux distro. My real working environment is a little bit outdated and I think it might be difficult to update all components.

My housemate uses Gentoo, so for him it should be much easier to run the tests on his laptop. Unfortunately he's also on holiday until 8 January...

Do you know whether there's another Live-CD available which uses the versions of Kernel/xorg-driver which I shall test for you? I could also install the most recent Kernel on my machine but I think it would be difficult to update the X-Server and other required components...

regards
Christian
Comment 13 Christian Eggers 2010-01-07 22:46:20 UTC
Created attachment 32515 [details]
Output of dmesg (linux-2.6.32.2)
Comment 14 Christian Eggers 2010-01-07 22:46:53 UTC
Created attachment 32516 [details]
Xorg.0.log (linux-2.6.32.2)
Comment 15 Christian Eggers 2010-01-07 22:47:22 UTC
Created attachment 32517 [details]
Output of intel_reg_dumper (linux-2.6.32.2)
Comment 16 Christian Eggers 2010-01-07 22:53:21 UTC
(In reply to comment #10)
>      Will you please try the latest linux kernel(for example:2.6.32.2) and see
> whether the issue still exists?(Please add the boot option of "drm.debug=0x06"
> and boot the system with KMS enabled).
>      Will you please add the "modedebug" option in xorg.conf and attach the
> output of Xorg.0.log?
>      >Option "modedebug" "True"
> 
>      It will be great if you can attach the output of intel_reg_dumper when the
> issue happens.

In the meantime I've installed openSUSE 11.2 on hard disk. I hope I can respond faster now if you need further input.

Even after upgrading to 2.6.32.2 ("SUSE Kernel Of The Day"), the problem persists (the probability that it happens on mode switching may even be higher!). The result is the same on the second PC (laptop with Gentoo).

regards
Christian
Comment 17 Christian Eggers 2010-01-19 10:45:24 UTC
Hi,

any news...?

regards
Christian
Comment 18 ykzhao 2010-01-30 03:35:16 UTC
(In reply to comment #17)
> Hi,
> any news...?
> regards
> Christian

Sorry for the late response.

Can you try the following patch set on 2.6.33-rc5 kernel and see whether
the issue still exists?
    >http://lists.freedesktop.org/archives/intel-gfx/2010-January/005505.html

BTW: the patch 1 can be skipped as it is already shipped in 2.6.33-rc5 kernel.

Thanks.
   Yakui

Comment 19 Christian Eggers 2010-02-02 21:32:57 UTC
(In reply to comment #18)
> Can you try the following patch set on 2.6.33-rc5 kernel and see whether
> the issue still exists?
>     >http://lists.freedesktop.org/archives/intel-gfx/2010-January/005505.html

Dear Yakui,

kernel compilation stressed my hard drive a little bit. Maybe I'll have to replace it...

I hope I can continue testing the next days and can give you the result end of this week.

For now I've tested with "vanilla" 2.6.33-rc5+your patches. With i915.modeset=1 the monitor goes to standby mode in the middle of the boot process (maybe when X starts).

regards
Christian
Comment 20 Michael Fu 2010-02-02 22:05:12 UTC
(In reply to comment #16)
> (In reply to comment #10)
> Also after upgrading to 2.6.32.2 ("SUSE Kernel Of The Day"), the problem still
> persists (perhaps the possibility that the problem happens on mode switching
> may be even higher!). The same result is for the second PC (Laptop with
> Gentoo).
> 

Do you mean you have the same problem on another PC (laptop with Gentoo)? What is its HW configuration? Is it the same as the first PC? Thanks.

> regards
> Christian
> 

Comment 21 Christian Eggers 2010-02-03 13:18:15 UTC
(In reply to comment #20)
Dear Michael,
> 
> Do you mean you have same problem another PC (laptop with Gentoo)? What's it HW
> configuration ? Is it the same as the first PC? thanks.

1st PC: Intel mainboard with i965G chipset
2nd PC: Laptop with GM965 (X3100) chipset

All test outputs are generated on the first system, but the symptoms are the same on the second system.

I'll try to test with kernel 2.6.33-RC5 the next days after replacing my defective hard drive.

regards
Christian
Comment 22 Christian Eggers 2010-02-07 02:09:22 UTC
(In reply to comment #18)
> Can you try the following patch set on 2.6.33-rc5 kernel and see whether
> the issue still exists?
>     >http://lists.freedesktop.org/archives/intel-gfx/2010-January/005505.html
> 
After serious hard drive problems I've reinstalled everything and tested 3 configurations (on System "1"):

a) 2.6.31 (openSUSE, w/o your patches)
b) 2.6.33-rc5 (vanilla, w/o your patches)
c) 2.6.33-rc5 (vanilla, w/ your patches)

Results:
a) Problems when the resolution is changed with krandrtray (original problem, looks like "loss of sync" on CRTs)
b) Similar to a), but usually the monitor enters standby mode instead of showing the "loss of sync" pattern
c) The monitor enters standby mode during booting. It seems that this happens when the i915 module is loaded (not when starting X, as previously guessed). So I could not test what happens when the resolution is changed with krandrtray.

So something has changed between 2.6.31 and 2.6.33, but this doesn't solve the problem. And unfortunately your patches also didn't help.

regards
Christian

I'll be on vacation between Feb. 12 and Feb. 28
Comment 23 Christian Eggers 2010-02-09 09:45:42 UTC
d) 2.6.33-rc7 (vanilla, w/o your patches)

Results:
d) Same as b)
Comment 24 Christian Eggers 2010-03-01 10:36:03 UTC
(In reply to comment #23)
2.6.33 (final) also doesn't work. 

Is there any chance that this bug will be fixed in the near future?

regards
Christian
Comment 25 walch.martin 2010-03-07 07:01:50 UTC
If it helps in any way: I have the same problem with

System environment: 
 -- chipset: G965
 -- system architecture: x86_64
 -- xf86-video-intel: git snapshot with last commit 8ece6cf5afa1bb0d8d9328696422f42f3c3adbd6 from Sat Mar 6 14:09:12 2010 -0500
 -- xserver: Server 1.7.5
 -- mesa: 7.7
 -- libdrm: 2.4.17
 -- kernel: 2.6.31-gentoo-r6
 -- Linux distribution: Gentoo
 -- Machine or mobo model: Intel DG965SS
 -- Display connector: VGA/DE-15

I have *two* working resolutions with a monitor "Captiva E1902W" (cheap flatscreen): 1440x900@59.9 and 1280x1024@75.0. I can switch between these two modes and everything works fine. For any other tested mode (1024x768, 800x600, 640x480), I encounter the same out-of-sync problem as described above. From that state, the only way to recover that I have found is a reboot. Switching back to a working mode does not help: the screen stays corrupted. Switching to a VT makes things even worse (black screen, sometimes a freeze when switching back to X afterwards).

Please let me know if you need any further information from me.
Comment 26 walch.martin 2010-03-26 11:06:03 UTC
I have tested this again with a recent git snapshot of xf86-video-intel, but with a different monitor (acer AL1917). As far as I remember, this problem existed with both monitors, the Captiva E1902W and the acer AL1917.

However, with the acer monitor, I just made about 20 mode switches and I did not experience any problems. Maybe this has been fixed during the last weeks?

I will check with the Captiva monitor as soon as I get my hands on it again. :)

Changes that happened to my system configuration:

-- xf86-video-intel: git snapshot from one day ago (last commit
362a49e71fc41541b6dc121660d98e29da4b14e8)
-- xserver: Server 1.7.6
-- mesa: git snapshot from one day ago (last commit
9eaadfeaa54d15fc3eb90d4137795ace4f920b2f)
-- libdrm: 2.4.19
-- kernel: 2.6.31-gentoo-r10
Comment 27 Christian Eggers 2010-03-27 15:15:04 UTC
(In reply to comment #26)
> Changes that happened to my system configuration:
> 
> -- xf86-video-intel: git snapshot from one day ago (last commit
> 362a49e71fc41541b6dc121660d98e29da4b14e8)
> -- xserver: Server 1.7.6
> -- mesa: git snapshot from one day ago (last commit
> 9eaadfeaa54d15fc3eb90d4137795ace4f920b2f)
> -- libdrm: 2.4.19
> -- kernel: 2.6.31-gentoo-r10
> 
I've updated my openSUSE 11.2 to the following package versions:
-- xf86-video-intel: git snapshot from 2010-03-10
-- xserver: Server 1.8.0 RC2
-- mesa: 7.7.99
-- libdrm: 2.4.19
-- kernel: 2.6.33

Result: No changes. The problem is still present as before. Btw: Does it make any sense to update components other than the kernel?

> However, with the acer monitor, I just made about 20 mode switches and I did
> not experience any problems. Maybe this has been fixed during the last weeks?

This doesn't surprise me. Sometimes everything seems to work, but I only need to reboot once and the problem is present again.

regards
Christian
Comment 28 ykzhao 2010-03-29 00:54:57 UTC
> > 
> I've updated my openSUSE 11.2 to the following package versions:
> -- xf86-video-intel: git snapshot from 2010-03-10
> -- xserver: Server 1.8.0 RC2
> -- mesa: 7.7.99
> -- libdrm: 2.4.19
> -- kernel: 2.6.33
> 
> Result: No changes. The problem is still present as before. Btw: Does it make
> any sense to update components other than the kernel?

Sorry for the late response. 
Can you add the boot option of "drm.debug=0x04" and attach the output of dmesg, xrandr -q --verbose?

Do you have an opportunity to try another monitor and see whether the issue still can be reproduced?

Thanks. 

Comment 29 Christian Eggers 2010-03-29 12:39:56 UTC
Created attachment 34533 [details]
dmesg (for comment #28)
Comment 30 Christian Eggers 2010-03-29 12:41:02 UTC
Created attachment 34534 [details]
xrandr -q --verbose (for comment #28)
Comment 31 Christian Eggers 2010-03-29 12:46:16 UTC
(In reply to comment #28)
> Sorry for the late response. 
> Can you add the boot option of "drm.debug=0x04" and attach the output of dmesg,
> xrandr -q --verbose?
Done

> Do you have an opportunity to try another monitor and see whether the issue
> still can be reproduced?

Same problem with Monitor A, B and A+B.

A: HP LP2065, 20.1" LCD, DVI-D (SDVO card), 9.2 kg
B: Viewsonic E790, 19" CRT, VGA, 22.5 kg (~30 kg when moving from/to the basement)

regards
Christian
Comment 32 ykzhao 2010-03-29 20:07:31 UTC
Created attachment 34537 [details] [review]
try the debug patch that updates the self-refresh watermark on 965 platform

From the log in comment #29 it seems that the SR watermark is 1, which is incorrect.

Will you please try the attached debug patch on 2.6.33 kernel and see whether the issue still exists?

thanks.
Comment 33 ykzhao 2010-03-29 20:15:36 UTC
Created attachment 34538 [details] [review]
try the debug patch that updates the self-refresh watermark on 965 platform

Sorry for the typo.

Please try the updated patch.
Comment 34 Christian Eggers 2010-03-30 22:48:40 UTC
Created attachment 34575 [details]
dmesg (for comment #33)

Applied your patch to 2.6.33. The result is (at least nearly) the same: After switching resolution from 1600x1200 to 1024x768, the monitor went to standby mode. After a few seconds I pressed ESC and krandrtray switched back to the previous settings.

Then I tried another mode change which produced the "out of sync" pattern. Some other guy has posted a screenshot which shows the result [1]. Reverting to the previous mode didn't work anymore.

After that I tried to switch to the console, which filled the screen completely with a single color (also the same behavior as before).

[1]
http://lists.freedesktop.org/archives/intel-gfx/2009-September/004362.html
Comment 35 walch.martin 2010-03-31 07:37:10 UTC
(In reply to comment #27)
> This doesn't surprise me. Sometimes everything seems to work, but I only need
> to reboot once and the problem is present again.

You are right. A reboot broke things again. I made a lot of reboots now and I have not seen any pattern in when things break and when not.


I compared the dmesg output with drm.debug=0x04 from a working case and from a broken case: I did not see any differences besides minor changes in detected cpu frequency, BogoMIPS and the order of some lines about hard disks and network interfaces.

However, I guess I am missing something in my kernel configuration, because I see far less drm output than in the log file in comment #29.

$ grep drm dmesg
Command line: root=/dev/sdb1 drm.debug=0x04
Kernel command line: root=/dev/sdb1 drm.debug=0x04
[drm] Initialized drm 1.1.0 20060810
[drm] DAC-6: set mode 1280x1024 17
[drm] fb0: inteldrmfb frame buffer device
[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
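
Comparing a "working" and a "broken" dmesg capture by eye is error-prone because the timestamps differ on every boot. A minimal sketch of one way to do the comparison follows; the helper names drm_lines and compare_drm are hypothetical, and the timestamp pattern assumes the usual "[   12.345678] " prefix that the kernel adds when printk timestamps are enabled.

```shell
#!/bin/sh
# drm_lines FILE: print DRM-related lines from a saved dmesg log,
# with any leading "[   12.345678] " timestamp removed so that two
# captures can be compared line by line.
drm_lines() {
    grep -i 'drm' "$1" | sed 's/^\[ *[0-9][0-9]*\.[0-9]* *\] *//'
}

# compare_drm GOOD_LOG BAD_LOG: show DRM lines that differ.
# Returns the exit status of diff (0 = identical, 1 = different).
compare_drm() {
    good=$(mktemp) || return 2
    bad=$(mktemp) || { rm -f "$good"; return 2; }
    drm_lines "$1" > "$good"
    drm_lines "$2" > "$bad"
    status=0
    diff "$good" "$bad" || status=$?
    rm -f "$good" "$bad"
    return "$status"
}

# Example call (file names are placeholders):
# compare_drm dmesg-working.txt dmesg-broken.txt
```

Lines such as "[drm] Initialized ..." are left untouched because the pattern only strips a bracketed prefix that starts with a number.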
Comment 36 Christian Eggers 2010-04-06 13:02:18 UTC
(In reply to comment #35)
> You are right. A reboot broke things again. I made a lot of reboots now and I
> have not seen any pattern in when things break and when not.

Did you use krandrtray or xrandr? Please try "xrandr -q --verbose" and then switch through all available modes 

# xrandr --output DVI1 --mode 0x45
# xrandr --output DVI1 --mode 0x44
# xrandr --output DVI1 --mode 0x46
...

In my case, for instance mode 0x49 (1024x768@85Hz) doesn't work. But this mode is chosen by xrandr when no explicit refresh rate is given. With xrandr I could successfully revert to the previous (working) mode by blindly using the bash history.
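
To step through every mode without copying the IDs by hand, the mode IDs can be extracted from the verbose xrandr listing. This is a minimal sketch: list_mode_ids is a hypothetical helper, and it assumes that mode lines in the "xrandr -q --verbose" output are indented and look like "  1024x768 (0x49)  94.5MHz ...".

```shell
#!/bin/sh
# list_mode_ids: read `xrandr -q --verbose` output on stdin and print
# the mode IDs (e.g. 0x45), one per line, in order of appearance.
# Only indented lines that start with WIDTHxHEIGHT are treated as
# mode lines, so the "DVI1 connected ..." header line is ignored.
list_mode_ids() {
    sed -n 's/^[[:space:]][[:space:]]*[0-9][0-9]*x[0-9][0-9]*[^(]*(\(0x[0-9a-fA-F][0-9a-fA-F]*\)).*/\1/p'
}

# Example loop (not executed here; DVI1 is the output name used above):
# xrandr -q --verbose | list_mode_ids | while read -r id; do
#     xrandr --output DVI1 --mode "$id"
#     sleep 10
# done
```

If several outputs are connected, the same ID can appear more than once, because each output lists its own modes.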

Assumptions:
- Maybe krandrtray (which I used for nearly all previous tests) also chooses 85Hz for 1024x768, regardless of the refresh rate actually selected by the user.

- Maybe krandrtray has a bug in its "restore the previous mode" function.

Could you please check this?

regards
Christian
Comment 37 ykzhao 2010-04-07 02:12:16 UTC
Hi, Christian
     From the dmesg log it seems that the VESA fb driver is also loaded:
     > vesafb: framebuffer at 0x80000000, mapped to 0xffffc90008980000, using 7500k, total 7616k
     > vesafb: mode is 1600x1200x16, linelength=3200, pages=1

     And after the i915 driver is loaded, the following message is reported:
     > fb: conflicting fb hw usage inteldrmfb vs VESA VGA - removing generic driver

     Can you disable the VESA fb driver in the kernel configuration and see whether the issue still exists?

Thanks.
Comment 38 Christian Eggers 2010-04-07 22:33:32 UTC
Created attachment 34795 [details]
dmesg (for comment #38)

> Can you disable the vesa fb driver in kernel configuration and see whether the issue still exists?

Without the vesa fb driver there is no big difference; only the "vga=" kernel parameter doesn't work anymore. The display switches from text mode to the higher resolution later, when the i915 module is loaded.

Yesterday I had again the situation where I could NOT recover from a "bad mode" (0x49) to a working one. After running "xrandr --output DVI1 --mode 0x49" the monitor went to standby mode. Then I tried to revert to mode 0x43, which resulted in the "out of sync" pattern (see attached dmesg).

Intermediary result:
At least some modes which are shown by "xrandr -q --verbose" are "bad". Switching to these modes puts the monitor in standby mode. Sometimes it's possible to switch back to a working mode by blindly using the shell history. Other times this causes the "out of sync" pattern.
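
Instead of typing the revert command blindly, the switch back can be scheduled in advance. The following is only a sketch under stated assumptions: try_mode and the XRANDR override are hypothetical names, the 15 second default is arbitrary, and, as noted above, reverting does not always recover the display.

```shell
#!/bin/sh
# XRANDR may be overridden (e.g. XRANDR=echo) for a dry run; that
# variable is an assumption of this sketch, not an xrandr feature.
XRANDR="${XRANDR:-xrandr}"

# try_mode OUTPUT NEW_MODE FALLBACK_MODE [SECONDS]
# Switch OUTPUT to NEW_MODE, wait, then switch back to FALLBACK_MODE
# unconditionally, so no blind typing is needed if the screen goes dark.
try_mode() {
    output=$1 new_mode=$2 fallback=$3 wait_s=${4:-15}
    $XRANDR --output "$output" --mode "$new_mode"
    sleep "$wait_s"
    $XRANDR --output "$output" --mode "$fallback"
}

# Example call (mode IDs as reported in this comment):
# try_mode DVI1 0x49 0x43
```

Because the fallback is applied regardless of the outcome, a mode that works is also reverted; the function is meant for probing suspect modes, not for normal use.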

For the moment I'm not sure whether the problem can only happen with specific "bad modes" or also with other ones.

regards
Christian
Comment 39 ykzhao 2010-04-08 00:42:54 UTC
(In reply to comment #38)
> Created an attachment (id=34795) [details]
> dmesg (for comment #38)
> 
> > Can you disable the vesa fb driver in kernel configuration and see whether the issue still exists?
> 

Thanks for the testing.

> Without the vesa fb driver there is no big difference, only the "vga=" kernel
> parameter doesn't work anymore. The display switches from text mode to higher
> resolution later when the i915 module is loaded.

It seems that the issue still exists after removing the vesa fb driver.

> 
> Yesterday I had again the situation where I could NOT recover from a "bad mode"
> (0x49) to a working one. After running "xrandr --output DVI1 --mode 0x49" the
> monitor went to standby mode. Then I tried to revert to mode 0x43, which
> resulted in the "out of sync" pattern (see attached dmesg).

The message "out of sync" is related to SDVO DVI. I am not sure whether the issue is related to SDVO.

Can you connect this monitor by using VGA connector and see whether the issue can also be reproduced?

Thanks
   Yakui
Comment 40 ykzhao 2010-04-08 07:25:10 UTC
Created attachment 34811 [details] [review]
try the debug patch that dumps the output pixel clock range of sdvo device

From the dmesg log we can get one message related with SDVO.
    > [drm:intel_sdvo_debug_write], SDVOB: W: 16 48 3F 40 30 62 B0 32 40 (SDVO_CMD_SET_OUTPUT_TIMINGS_PART1)
    > [drm:intel_sdvo_debug_response], SDVOB: R: (Not supported)
    
    It seems that this SDVO device does not support the command for setting its output timings. I am not sure whether the high resolution is supported by this SDVO device.

    Will you please try the debug patch and attach the output of dmesg?

Thanks.
   Yakui
Comment 41 Christian Eggers 2010-04-08 10:20:09 UTC
Created attachment 34819 [details]
dmesg (for comment #41)

(In reply to comment #39)
> The message of "out of sync" is related with SDVO DVI. I am not sure whether
> the issue is related with SDVO.

Sorry, this may be a misunderstanding. "out of sync" is no kernel message, it is a description of the distortion of my screen content. It looks similar to this: http://lists.freedesktop.org/archives/intel-gfx/attachments/20090925/bf583987/attachment-0001.png

When I press any key or move the mouse, the content starts moving very quickly which looks similar to "out of sync" problems on very old crts. If you need, I can try to take a picture with my camera.

> Can you connect this monitor by using VGA connector and see whether the issue
> can also be reproduced?
Done. I've connected the same monitor to the vga connector instead of dvi. The result is the same, but with this configuration the output of xrandr shows different mode entries (see next attachment). This time the 0x4a was the "bad" mode.

I'll test comment #40 in the next minutes...

regards
Christian
Comment 42 Christian Eggers 2010-04-08 10:22:14 UTC
Created attachment 34820 [details]
xrandr -q --verbose (for comment #41)

xrandr -q --verbose with HP LP2065 TFT connected via VGA
Comment 43 Christian Eggers 2010-04-08 11:13:32 UTC
Created attachment 34821 [details]
dmesg (for comment #43)

(In reply to comment #40)

> Will you please try the debug patch and attach the output of dmesg?

Done

regards
Christian
Comment 44 ykzhao 2010-04-12 00:53:11 UTC
Created attachment 34906 [details] [review]
try the debug patch that disable memory self-refresh on 965 desktop platform

Will you please try the attached debug patch and see whether the issue still exists?

Thanks.
Comment 45 Christian Eggers 2010-04-12 13:06:52 UTC
Created attachment 34924 [details]
dmesg (for comment #45)

(In reply to comment #44)

It seems that your patch changed the behavior a little bit. Now I need more mode switches to get an error. When the error happens, the whole screen is white and flickers a little bit. It is not possible to recover from this state by switching back to the previous state.

Do we have two independent errors?

regards
Christian
Comment 46 ykzhao 2010-04-12 17:54:55 UTC
(In reply to comment #45)
> Created an attachment (id=34924) [details]
> dmesg (for comment #45)
> 
> (In reply to comment #44)
> 
> It seems that your patch changed the behavior a little bit. Now I need more
> mode switches to get an error. When the error happens, the whole screen is
> white and flickers a little bit. It is not possible to recover from this state
> by switching back to the previous state.

It seems that you still use DVI to connect the monitor. How about using VGA?

thanks.
> 
> Do we have two independent errors?
> 
> regards
> Christian
Comment 47 Christian Eggers 2010-04-12 23:11:29 UTC
Created attachment 34948 [details]
dmesg (for comment #47)

(In reply to comment #46)

> It seems that you still use the DVI to connector the monitor. 
This is correct. I use VGA only if requested.

> How about using VGA?
Result is very similar compared to DVI:

- With VGA, mode 0x4a puts the monitor into standby mode, which can usually be undone by switching back to the previous mode (0x49 in my test). Mode 0x4d causes the "flickering white screen". Sometimes the white screen lasts about half a second and then it works. But when I switch to this mode a second time, the error stays and I cannot switch to a working mode again.

- With DVI I have the same results but with mode 0x49 and 0x4c instead of 0x4a and 0x4b.

regards
Christian
Comment 48 Christopher James Halse Rogers 2010-06-14 18:41:08 UTC
Created attachment 36274 [details]
dmesg for 2.6.33.2 drm

I think that we're also seeing this bug in Ubuntu.  I've attached the relevant dmesg from our bug https://bugs.edge.launchpad.net/bugs/586325 .

In this case it's failing on the i965q card found in the Fujitsu Siemens Esprimo E.  For several monitors mode switching from the initial preferred resolution fails, and switching back does not restore correct behaviour.  Apparently for some monitors mode switching works fine.

Video of mode switch failure is here:
http://launchpadlibrarian.net/49431241/IMG_0569.MOV
lspci:
http://launchpadlibrarian.net/49203360/lspci-vvnn
Xorg log:
http://launchpadlibrarian.net/49994258/Xorg.0.log

If you have something which needs more testing, just ask.
Comment 49 Christian Eggers 2010-06-15 10:05:01 UTC
(In reply to comment #48)
Thank you for contributing to my bug report. Unfortunately it has been a little bit quiet here for some time.

I hope that the developers at Intel are able AND willing to fix this. I think that changing the screen resolution is a "basic" feature which should be implemented correctly by any graphics driver. Particularly laptop users might want to connect external beamers/displays to their devices without crashing their whole graphical system. I'm quite sure that on other platforms this would have been repaired much earlier.

I've spent many hours providing test feedback for this bug (among other things this required to buy another hard disk and to install "unstable" versions of several packages). It seems that nobody was able to get the real error from my descriptions.

I suggest that some paid developer at Intel should get a board/laptop which is affected and try for him-/herself. When he/she was successful in reproducing and fixing the error I'm inclined to test again...

regards
Christian
Comment 50 Daniel Stone 2010-06-15 11:20:42 UTC
On Tue, Jun 15, 2010 at 10:05:01AM -0700, bugzilla-daemon@freedesktop.org wrote:
> Thank you for contributing to  my bug report. Unfortunately it has been a
> little bit quiet here for some time. 

It's been four days since Yakui last offered a patch, of which two were
the weekend.

> I hope that the developers at Intel are able AND willing to fix this. I think
> that changing the screen resolution is a "basic" feature which should be
> implemented correctly by any graphics driver. Particularly laptop users might
> want to connect external beamers/displays to their devices without crashing
> their whole graphical system. I'm quite sure that on other platforms this would
> have been repaired much earlier.

Actually, I know of very, very few other operating system vendors who
will take a bug report from the general public (particularly if that
person has paid $0 for it), and have a suggested fix prepared within a
couple of days.

Of course, if you have experience of OS X or Windows teams having fixed
a bug you've reported quicker than that, please let me know.  Else, just
sit tight and wait.

> I've spent many hours providing test feedback for this bug (among other things
> this required to buy another hard disk and to install "unstable" versions of
> several packages). It seems that nobody was able to get the real error from my
> descriptions.

If all bugs could be magically solved by looking at a single logfile,
then half of us would be out of a job.
Comment 51 Christian Eggers 2010-06-15 12:53:58 UTC
(In reply to comment #50)
> On Tue, Jun 15, 2010 at 10:05:01AM -0700, bugzilla-daemon@freedesktop.org
> wrote:
>
> It's been four days since Yakui last offered a patch, of which two were
> the weekend.

4 days? For me it looks more like 2 months and 4 days. And 12th of April was a Monday...
 
> Actually, I know of very, very few other operating system vendors who
> will take a bug report from the general public (particularly if that
> person has paid $0 for it), and have a suggested fix prepared within a
> couple of days.

Does Intel sell operating systems? Are you offering me to fix this for money?

> Of course, if you have experience of OS X or Windows teams having fixed
> a bug you've reported quicker than that, please let me know.  Else, just
> sit tight and wait.

The initial bug report was created more than half a year ago. What do you think is an adequate time for a hardware vendor to fix a driver bug? I bought a mainboard with Intel graphics because I wanted to have good Linux support. So now I'm at least irritated that such a basic function doesn't work properly.

> If all bugs could be magically solved by looking at a single logfile,
> then half of us would be out of a job.

I know that it is at least difficult to fix this bug. Therefore I've proposed that the developer(s) at Intel should get a mainboard from stock instead of trying to solve such complicated things remotely.

May I ask how you are related to this bug, because I have never seen you contributing anything to this topic before. 

regards
Christian
Comment 52 Daniel Stone 2010-06-16 05:52:29 UTC
On Tue, Jun 15, 2010 at 12:53:58PM -0700, bugzilla-daemon@freedesktop.org wrote:
> --- Comment #51 from Christian Eggers <ceggers@gmx.de> 2010-06-15 12:53:58 PDT ---
> (In reply to comment #50)
> > On Tue, Jun 15, 2010 at 10:05:01AM -0700, bugzilla-daemon@freedesktop.org
> > wrote:
> >
> > It's been four days since Yakui last offered a patch, of which two were
> > the weekend.
> 
> 4 days? For me it looks more like 2 months and 4 days. And 12th of April was a
> Monday...

Oops, misread the date.  My point remains, however.  (The last time I
reported a bug on OS X, by the way, was 2003.  It was still unfixed as
of 2008, at least.  And this was something I actually paid money for!)

> > Actually, I know of very, very few other operating system vendors who
> > will take a bug report from the general public (particularly if that
> > person has paid $0 for it), and have a suggested fix prepared within a
> > couple of days.
> 
> Does Intel sell operating systems? Are you offering me to fix this for money?

No, I'm not.  If you paid anyone for it, I recommend you contact them
for a full refund.

> > If all bugs could be magically solved by looking at a single logfile,
> > then half of us would be out of a job.
> 
> I know that it is at least difficult to fix this bug. Therefore I've proposed
> that the developer(s) at Intel should get a mainboard from stock instead of
> trying to solve such complicated things remotely.
> 
> May I ask how you are related to this bug, because I have never seen you
> contributing anything to this topic before. 

Mainly, I just sit on xorg-team@lists.x.org and respond whenever someone
with an overdeveloped sense of entitlement starts flaming people for
daring to not have their bug (which is one of quite a large number)
fixed yet.
Comment 53 Christopher James Halse Rogers 2010-07-05 23:21:26 UTC
Created attachment 36780 [details] [review]
Break the mouse cursor, fix resolution changing

I suspect that the resolution changes are only a part of this problem.  Now that I've got some hardware to test on I noticed that moving the mouse changes the behaviour of the broken screen, so I suspected that the hardware cursor might be involved.

Investigating this, I came up with the attached patch which simply causes all calls to set the cursor to set a null cursor.  This fixes resolution changing for me - I can cycle through the xrandr mode list to my heart's content.

Of course, this patch also ensures that you can never see a mouse cursor, so it's hardly a fix.
Comment 54 Chris Wilson 2010-07-06 00:28:30 UTC
Wow, never would have suspected that. The next step would seem to be to test disabling the cursor around modesetting.
Comment 55 Christian Eggers 2010-07-06 12:13:12 UTC
(In reply to comment #53)
> Created an attachment (id=36780) [details]
> Break the mouse cursor, fix resolution changing
> 
Thank you for giving the hint for the real source of the problem:

http://fstatic1.mtb-news.de/img/photos/3/3/8/8/0/_/large/ScheeseamAbgrund.JPG

The bike is the cursor, the cliff is your screen. Now reduce the size of the cliff by about 10-50% without moving the bike...

How can we move the bike BEFORE reducing the size of the cliff?

regards
Christian
Comment 56 Christopher James Halse Rogers 2010-07-06 19:05:49 UTC
Yup, just disabling the cursor around modesetting works.
Comment 57 Christian Eggers 2010-07-06 21:28:28 UTC
(In reply to comment #56)
> Yup, just disabling the cursor around modesetting works.

Does this also work when the cursor is in an area which will be "outside" the new resolution? Where will the cursor be positioned when it's re-enabled after switching?

Instead of disabling the cursor it may be sufficient to ensure that the cursor will not be outside the new resolution. I "parked" my cursor on the upper left of the screen and cycled resolution about fifty times without any problems.

I'll be on vacation from Thursday until Sunday, 18th. If you would provide a patch, I'll test either today or in the week after my vacation.

regards
Christian
Comment 58 Christopher James Halse Rogers 2010-07-06 23:40:25 UTC
(In reply to comment #57)
> (In reply to comment #56)
> > Yup, just disabling the cursor around modesetting works.
> 
> Does this also work when the cursor is in an area which will be "outside" the
> new resolution? Where will the cursor be positioned when it's re-enabled after
> switching?

It does indeed work - until you move the pointer.

> 
> Instead of disabling the cursor it may be sufficient to ensure that the cursor
> will not be outside the new resolution. I "parked" my cursor on the upper left
> of the screen and cycled resolution about fifty times without any problems.
> 

I've also just noticed this.  So, the problem manifests when the pointer is outside the framebuffer and gets touched.  Is this as simple as the cursor scribbling on memory it's not meant to?
Comment 59 Chris Wilson 2010-07-07 05:30:37 UTC
Created attachment 36827 [details] [review]
Unset cursor if out of bounds.

Following on from Christopher's hint is this patch that should disable the cursor on a mode change if it results in an invalid cursor position.
Comment 60 Chris Wilson 2010-07-07 05:57:30 UTC
Created attachment 36828 [details] [review]
Unset cursor if out of bounds.
Comment 61 Chris Wilson 2010-07-07 15:34:53 UTC
I scanned through the docs I have on hand and the only caveat for cursor positioning is that the VGA popup cursor must entirely be within the bounds of the pipe. Since we don't use that cursor...
Comment 62 Christopher James Halse Rogers 2010-07-07 18:30:02 UTC
(In reply to comment #61)
> I scanned through the docs I have on hand and the only caveat for cursor
> positioning is that the VGA popup cursor must entirely be within the bounds of
> the pipe. Since we don't use that cursor...

It's in the X/Y sign bit register documentation for CURAPOS and CURBPOS - “For normal high resolution display modes, the cursor must have at least a single pixel positioned over the active screen.” (p143, p148 of the hardware registers docs).
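The quoted requirement can be written as a simple rectangle-overlap test. The following is only a minimal sketch: `cursor_touches_screen` and its parameters are hypothetical names for illustration, not code from the i915 driver, and it assumes (x, y) is the top-left corner of the cursor image.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from the i915 driver: returns true when a
 * cursor image of cur_w x cur_h pixels whose top-left corner sits at (x, y)
 * overlaps the active area [0, fb_w) x [0, fb_h) by at least one pixel. */
static bool cursor_touches_screen(int x, int y, int cur_w, int cur_h,
                                  int fb_w, int fb_h)
{
    return x < fb_w && y < fb_h && x + cur_w > 0 && y + cur_h > 0;
}
```

With a 64x64 cursor on a 1440x900 mode, a position of (-63, -63) still passes (one pixel is visible), while (1440, 0) does not.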

(In reply to comment #60)
> Created an attachment (id=36828) [details]
> Unset cursor if out of bounds.

Ding!  This works.  Thanks for fixing this while I slept :)

Tested-by: Christopher Halse Rogers <christopher.halse.rogers@canonical.com>
Comment 63 Christian Eggers 2010-07-08 00:13:08 UTC
(In reply to comment #60)
> Created an attachment (id=36828) [details]
> Unset cursor if out of bounds.

Thank you!!!

This patch works great for xrandr but not for krandrtray. With krandrtray the problem is still present when the mouse cursor is outside the new resolution.

When I move the control bar (and systray) to the upper left of the screen, everything is fine. When the krandrtray icon is at the bottom right, the graphics crash.

Perhaps there's a bug in krandrtray, because even if the graphics don't crash, the window sizes are not adjusted to the new resolution (in contrast to xrandr). But even in this case that should not provoke a crash of the graphics.

I'll be on vacation for the next 10 days, so I cannot provide any test feedback during this time. If the problem with krandrtray is too complicated, I suggest applying the current patch immediately and treating the krandrtray problem as a different bug.

regards
Christian
Comment 64 Chris Wilson 2010-07-08 02:39:20 UTC
Christian, I've seen other reports where krandrtray behaves differently than xrandr, so I'm inclined to believe that therein lie a few other bugs.

I'll send this off to Eric with the tested-by, thanks!
Comment 65 Christopher James Halse Rogers 2010-07-08 05:36:37 UTC
The patch is still slightly broken:

In intel_crtc_update_cursor you have
...
if (crtc->fb) {
	base = intel_crtc->cursor_addr;
	if (x > crtc->fb->width)
		base = 0;
...
with x a signed int and crtc->fb->width an unsigned integer, and similarly for y and height.  This makes the cursor disappear near the far left and top of the screen (as (unsigned)-1 > 1440) - there's no guarantee the hot point is the top-left of the image, and indeed for me it's not.
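The conversion can be reproduced outside the kernel. This is a minimal sketch with hypothetical function names; the "fixed" variant is just one way to avoid the problem and is not claimed to be what the final patch does.

```c
#include <stdbool.h>

/* Same result as `x > width` with a signed x and an unsigned width: the
 * usual arithmetic conversions turn x into an unsigned int first, so a
 * small negative x becomes a huge positive value. The cast only makes
 * that implicit conversion visible. */
static bool off_right_edge_buggy(int x, unsigned int width)
{
    return (unsigned int)x > width;
}

/* One possible fix (an assumption, not the committed patch): do the
 * comparison in a signed type wide enough to hold both operands. */
static bool off_right_edge_fixed(int x, unsigned int width)
{
    return (long long)x > (long long)width;
}
```

For x = -1 and a width of 1440, the first function reports the cursor as off the right edge, while the second does not.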

Also, reading the docs it seems that it's required that CUR*BASE be written to update any of the cursor regs (vol 3, p142).  I presume that I'm reading it incorrectly, since the cursor appears to update fine without doing that.
Comment 66 Chris Wilson 2010-07-08 05:52:30 UTC
I forgot to check whether fb->width was signed or not.  Hmm, should check with sparse more often I guess. Thanks for spotting that.

CURAPOS:
"This register specifies the physical memory address at which the cursor image data is located.   Writes to this register acts like a trigger that enables atomic updates of the cursor registers.  When updating the cursor registers, this register should be written last in the sequence.  This register should be written even if the actual contents did not change to allow the holding registers to move to the active registers on the next VBLANK."

Hmm, but if the cursor is enabled first with an invalid position, what happens? Yes, the updates to base and pos are buffered until the next vblank, but that smells like a race, and elsewhere the docs say that CUR*BASE should be written last to trigger the updates...
Comment 67 Chris Wilson 2010-07-08 06:24:46 UTC
(In reply to comment #66)
> CURAPOS:

Idiot left in charge of keyboard, again. This is CURABASE:

> "This register specifies the physical memory address at which the cursor image
> data is located.   Writes to this register acts like a trigger that enables
> atomic updates of the cursor registers.  When updating the cursor registers,
> this register should be written last in the sequence.  This register should be
> written even if the actual contents did not change to allow the holding
> registers to move to the active registers on the next VBLANK."

CURAPOS:

"This register can be loaded atomically (requires that the base address be
written) and is double buffered."
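To illustrate the ordering the two quoted passages describe, here is a toy model of double-buffered cursor registers. Everything in it (the struct, the function names, the `armed` flag) is hypothetical and only mirrors the wording of the docs; it is not the i915 code, and as noted in comment 65 the real hardware may be more forgiving.

```c
#include <stdint.h>

/* Toy model of double-buffered cursor registers; all names are
 * hypothetical and do not come from the i915 driver. */
struct cursor_regs {
    uint32_t pending_cntl, pending_pos, pending_base; /* holding registers */
    uint32_t active_cntl, active_pos, active_base;    /* used for scanout */
    int armed;                                        /* set by a base write */
};

static void write_cntl(struct cursor_regs *r, uint32_t v) { r->pending_cntl = v; }
static void write_pos(struct cursor_regs *r, uint32_t v)  { r->pending_pos = v; }

/* Writing the base arms the update, so it goes last in the sequence. */
static void write_base(struct cursor_regs *r, uint32_t v)
{
    r->pending_base = v;
    r->armed = 1;
}

/* At vblank the holding registers move to the active set only if armed. */
static void vblank(struct cursor_regs *r)
{
    if (!r->armed)
        return;
    r->active_cntl = r->pending_cntl;
    r->active_pos = r->pending_pos;
    r->active_base = r->pending_base;
    r->armed = 0;
}
```

In this model a position write on its own never reaches the active registers; only the sequence control, position, then base followed by a vblank applies all three together.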
Comment 68 Jesse Barnes 2010-07-15 11:22:41 UTC
I think this one is fixed now.
Comment 69 Christian Eggers 2013-01-30 16:06:13 UTC
Unfortunately it seems that with openSUSE 12.2 (kernel 3.4.11) the problem (or a similar one) is present again. Switching to another mode crashes the graphics system with a probability of about 90 percent!

I've filed a new bug report here:
https://bugs.freedesktop.org/show_bug.cgi?id=59066

Unfortunately there hasn't been much progress in the last weeks so I hope someone who has been related to this bug could help.

Could somebody please check whether there's at least a chance to fix it, or whether it would be wiser to purchase another graphics adapter?

Thank you very much
Christian
