Summary: | [915GM][KMS] resume fails with backlight turned off | ||
---|---|---|---|
Product: | xorg | Reporter: | Tobias Diedrich <ranma+freedesktop> |
Component: | Driver/intel | Assignee: | Jesse Barnes <jbarnes> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | normal | ||
Priority: | medium | CC: | kenyon, yakui.zhao, zhenyu.z.wang |
Version: | 7.4 (2008.09) | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Description
Tobias Diedrich
2010-02-24 20:04:54 UTC
Created attachment 33545 [details]
dmesg
Created attachment 33546 [details]
xorg.log
Created attachment 33547 [details]
xorg.conf
Created attachment 33577 [details]
regdump with working display
Created attachment 33578 [details]
reg dump after resume with black screen
This time I didn't get xrandr errors, so I didn't log the xrandr --verbose output.
I did determine that at least this time it was only the backlight that stayed off.
I was unable to enable the backlight using the ThinkPad brightness keys or bl_power in /sys/devices/virtual/backlight/thinkpad_screen/. Right now bl_power is 0 despite the backlight being on...
I wonder if the backlight is only controlled by the embedded controller, or whether it can also be switched on/off by the 915GM?
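A minimal sketch for recording the sysfs side of the backlight state alongside a register dump. The function name `show_backlight` is hypothetical; the attribute names are the standard backlight class ones, and the default path is the one mentioned above. As far as the kernel's backlight class ABI documentation goes, bl_power 0 means FB_BLANK_UNBLANK (on) and 4 means FB_BLANK_POWERDOWN (off), so reading 0 while the backlight is lit may simply be the expected state.

```shell
#!/bin/sh
# Print the backlight class attributes of one device directory.
# show_backlight is an illustrative helper, not part of any existing tool.
show_backlight() {
    dir=${1:-/sys/devices/virtual/backlight/thinkpad_screen}
    for attr in bl_power brightness actual_brightness max_brightness; do
        if [ -r "$dir/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
        else
            # Attribute missing or not readable on this system.
            printf '%s=unreadable\n' "$attr"
        fi
    done
}
```

Running `show_backlight` once with a working display and once after a resume to black would show whether the thinkpad_screen device state differs at all between the two cases.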
--- intel_reg_dumper_baseline_display_working 2010-02-25 13:11:58.453333790 +0900
+++ intel_reg_dumper_display_black 2010-02-26 11:17:20.432191101 +0900
@@ -39,9 +39,9 @@
  DVOC_SRCDIM: 0x00000000
  PP_CONTROL: 0x00000001 (power target: on)
  PP_STATUS: 0xc0000008 (on, ready, sequencing idle)
- PP_ON_DELAYS: 0x00fa09c4
- PP_OFF_DELAYS: 0x00fa09c4
- PP_DIVISOR: 0x00270f02
+ PP_ON_DELAYS: 0x00000000
+ PP_OFF_DELAYS: 0x00000000
+ PP_DIVISOR: 0x00270f04
  PFIT_CONTROL: 0x00000008
  PFIT_PGM_RATIOS: 0x00000000
  PORT_HOTPLUG_EN: 0x00000000
@@ -50,7 +50,7 @@
  DSPASTRIDE: 0x00001000 (4096 bytes)
  DSPAPOS: 0x00000000 (0, 0)
  DSPASIZE: 0x02ff03ff (1024, 768)
- DSPABASE: 0x00c00000
+ DSPABASE: 0x01000000
  DSPASURF: 0x00000000
  DSPATILEOFF: 0x00000000
  PIPEACONF: 0x00000000 (disabled, single-wide)
@@ -89,9 +89,9 @@
  PIPEB_GMCH_DATA_N: 0x00000000
  PIPEB_DP_LINK_M: 0x00000000
  PIPEB_DP_LINK_N: 0x00000000
- CURSOR_B_BASE: 0x00000000
- CURSOR_B_CONTROL: 0x10000000
- CURSOR_B_POSITION: 0x02c903df
+ CURSOR_B_BASE: 0x36558000
+ CURSOR_B_CONTROL: 0x14000027
+ CURSOR_B_POSITION: 0x021602c2
  FPB0: 0x00030e09 (n = 3, m1 = 14, m2 = 9)
  FPB1: 0x00030e09 (n = 3, m1 = 14, m2 = 9)
  DPLL_B: 0x98026000 (enabled, non-dvo, spread spectrum clock, LVDS mode, p1 = 2, p2 = 14)
@@ -144,9 +144,9 @@
  TV_H_CHROMA_59: 0x0000b060
  FBC_CFB_BASE: 0x5f800000
  FBC_LL_BASE: 0x36893000
- FBC_CONTROL: 0xc1f407e3
+ FBC_CONTROL: 0xc1f407e4
  FBC_COMMAND: 0x00000000
- FBC_STATUS: 0x60000000
+ FBC_STATUS: 0x40000000
  FBC_CONTROL2: 0x00000000
  FBC_FENCE_OFF: 0x00000000
  FBC_MOD_NUM: 0x00000000
@@ -190,9 +190,9 @@
  FENCE 0: 0x00000000 (disabled, X tiled, 0 pitch, 0x00000000 - 0x00000000 (0kb))
  FENCE 1: 0x00000000 (disabled, X tiled, 0 pitch, 0x00000000 - 0x00000000 (0kb))
  FENCE 2: 0x00000000 (disabled, X tiled, 0 pitch, 0x00000000 - 0x00000000 (0kb))
- FENCE 3: 0x00c00231 ( enabled, X tiled, 16 pitch, 0x00c00000 - 0x00e00000 (2048kb))
- FENCE 4: 0x01400131 ( enabled, X tiled, 16 pitch, 0x01400000 - 0x01500000 (1024kb))
- FENCE 5: 0x05400011 ( enabled, X tiled, 4 pitch, 0x05400000 - 0x05400000 (0kb))
+ FENCE 3: 0x00c00131 ( enabled, X tiled, 16 pitch, 0x00c00000 - 0x00d00000 (1024kb))
+ FENCE 4: 0x01000231 ( enabled, X tiled, 16 pitch, 0x01000000 - 0x01200000 (2048kb))
+ FENCE 5: 0x00000000 (disabled, X tiled, 0 pitch, 0x00000000 - 0x00000000 (0kb))
  FENCE 6: 0x00000000 (disabled, X tiled, 0 pitch, 0x00000000 - 0x00000000 (0kb))
  FENCE 7: 0x00000000 (disabled, X tiled, 0 pitch, 0x00000000 - 0x00000000 (0kb))
  INST_PM: 0x00000000

Ok, I've been looking at the kernel code, at x.org/docs/intel and poking with intel_reg_write/intel_reg_read a bit. Here is what I found:

First of all, I thought this part is especially suspicious:

@@ -39,9 +39,9 @@
  DVOC_SRCDIM: 0x00000000
  PP_CONTROL: 0x00000001 (power target: on)
  PP_STATUS: 0xc0000008 (on, ready, sequencing idle)
- PP_ON_DELAYS: 0x00fa09c4
- PP_OFF_DELAYS: 0x00fa09c4
- PP_DIVISOR: 0x00270f02
+ PP_ON_DELAYS: 0x00000000
+ PP_OFF_DELAYS: 0x00000000
+ PP_DIVISOR: 0x00270f04
  PFIT_CONTROL: 0x00000008
  PFIT_PGM_RATIOS: 0x00000000
  PORT_HOTPLUG_EN: 0x00000000

I got the same on a second diff against a new baseline dump and a new dump after resume to black once again.

When I tried writing to them using intel_reg_write the write didn't work and the register documentation confirmed that these can only be written when PP_STATUS is OFF. After inserting a delay after the 0-write to PP_CONTROL, writing the registers worked fine, but even with the new values the display would switch on and off properly, so this is a red herring apparently.

Note that the new values are the same as the power-up defaults according to the register documentation, so it looks like the register restore after suspend is not working as it should.

Looking at the kernel code:
In i915_restore_display:
[...]
Skipping the ironlake part
[...]
I915_WRITE(PFIT_PGM_RATIOS, dev_priv->savePFIT_PGM_RATIOS);
I915_WRITE(BLC_PWM_CTL, dev_priv->saveBLC_PWM_CTL);
I915_WRITE(BLC_HIST_CTL, dev_priv->saveBLC_HIST_CTL);
I915_WRITE(PP_ON_DELAYS, dev_priv->savePP_ON_DELAYS);
I915_WRITE(PP_OFF_DELAYS, dev_priv->savePP_OFF_DELAYS);
I915_WRITE(PP_DIVISOR, dev_priv->savePP_DIVISOR);
I915_WRITE(PP_CONTROL, dev_priv->savePP_CONTROL);

Note how restore display doesn't seem to care about the write protection. It really should write a 0 to PP_CONTROL and wait until PP_STATUS signals OFF before writing PP_(ON|OFF)_DELAYS and PP_DIVISOR. That may explain how the values ended up being default values.

I thought the BLC_* registers looked interesting since they are missing from intel_reg_dumper output and are also backlight related. I amended the reg_dumper to also output those:

# intel_reg_dumper | grep BLC
BLC_PWM_CTL: 0x6c676c66
BLC_HIST_CTL: 0x80000000

The documentation says the default for PWM_CTL is 0x00000000, let's see what happens if we write that one...
Oh, if it's 0x00000000 the backlight doesn't turn on!

I think that has a good chance of being the issue I'm seeing, but I still have to confirm by running intel_reg_dumper the next time I get the black display issue.

I also created a 'turn backlight on' script that pokes all involved registers directly:

#!/bin/sh
# PP_CONTROL, POWER_OFF
intel_reg_write 0x61204 0x0
sleep 1
# now power should be off and control registers writeable
# BLC_PWM_CTL
intel_reg_write 0x61254 0x6c676c66
# PP_ON_DELAYS
intel_reg_write 0x61208 0x00fa09c4
# PP_OFF_DELAYS
intel_reg_write 0x6120c 0x00fa09c4
# PP_DIVISOR
intel_reg_write 0x61210 0x00270f02
# PP_CONTROL, POWER_ON
intel_reg_write 0x61204 0x1

BTW, I just tried writing different PWM values besides 0x6c676c66 and 0x0 to BLC_PWM_CTL and found that I can control brightness using this register, which apparently works independently of the normal thinkpad brightness control.
It looks like both PWM signals are most likely ANDed together to create the real PWM signal for the display.

Intel developers, any comments?

Created attachment 34097 [details] [review]
try the debug patch that writes the protected off key to restore LVDS panel sequence register

Will you please try the debug patch and see whether the issue can be fixed?
Please attach the output of intel_reg_dumper after resuming.
Thanks.

Right now I'm still waiting for the issue to happen again to have a look at BLC_PWM_CTL. But it's _very_ sporadic. I think it hasn't happened in a week. At the moment that's my most likely candidate, and it's not affected by the write-protect anyway. When I've confirmed whether or not I can fix the backlight by register poking I'll look at the patch.

Took quite long this time, but I finally got the error again just now. I have a slight inkling this may be a weird temperature related hardware issue. Sure hope I'm wrong, but it started happening during winter when the room would get cold during the night, and now that it's getting warmer again it took longer for the issue to happen again.

But anyway:
Modified intel_reg_dumper shows that BLC_PWM_CTL is indeed 0x00000000 when this happens. My blon script worked and restored the display. No idea why this happens though, as BLC_PWM_CTL isn't one of the 'may be write-protected' registers AFAICS.

Created attachment 34388 [details]
Output of blon script.
Created attachment 34389 [details]
Updated blon script
--- /tmp/blon.old 2010-03-24 11:22:27.895017153 +0900
+++ blon.20100324.sh 2010-03-24 11:20:12.526506894 +0900
@@ -1,4 +1,7 @@
#!/bin/sh
+
+(
+intel_reg_dumper
# PP_CONTROL, POWER_OFF
intel_reg_write 0x61204 0x0
sleep 1
@@ -13,3 +16,7 @@
intel_reg_write 0x61210 0x00270f02
# PP_CONTROL, POWER_ON
intel_reg_write 0x61204 0x1
+# setpci
+setpci -s 0:2.0 f4.l
+setpci -s 0:2.0 f4.l=0x000000ff
+) 2>&1 | tee /home/ranma/blon.log
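A small sketch to make the BLC_PWM_CTL value the blon script writes easier to read: it splits a 32-bit register value into its upper and lower 16-bit halves. The helper name `blc_split` is hypothetical. Reading the upper half as the PWM period field and the lower half as the duty cycle is an assumption that should be checked against the register documentation; under that reading, 0x6c66 out of 0x6c67 is close to full duty, and an all-zero register means zero duty, which would fit the observation that the backlight stays off when the register is 0x00000000.

```shell
#!/bin/sh
# Split a 32-bit register value into upper and lower 16-bit halves.
# blc_split is an illustrative helper; the meaning of each half is an
# assumption, see the text above.
blc_split() {
    val=$(( $1 ))
    printf 'upper=0x%04x lower=0x%04x\n' \
        $(( (val >> 16) & 0xffff )) $(( val & 0xffff ))
}

blc_split 0x6c676c66   # prints: upper=0x6c67 lower=0x6c66
blc_split 0x00000000   # prints: upper=0x0000 lower=0x0000
```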
> But anyway:
> Modified intel_reg_dumper shows that BLC_PWM_CTL is indeed 0x00000000 when this
> happens. My blon script worked and restored display. No Idea why this happens
> though, as BLC_PWM_CTL isn't one of the 'may be write-protected' registers
> AFAICS.
Sorry for the late response.
Yes. The register of BLC_PWM_CTL doesn't belong to the write_protected register. But the brightness will be affected by the POWER_ON/DELAY register. And from the script in comment #13/14 it seems that the system can work well again after you change the POWER_ON/DELAY register.
And your proposal in comment #7 seems reasonable. The restore of some registers doesn't consider the write-protected register (e.g. POWER_ON/DELAY register).
Can you try the patch in comment #10 and see whether the issue still can be reproduced?
thanks.

I applied the patch and am currently waiting for the issue to appear again, which sometimes happens three days in a row or even more than once a day and sometimes it doesn't trigger for two weeks...

I had a 'resume to black' again today, despite the patch.
Also, I used a minimized blon script that just writes to BLC_PWM_CTL:
intel_reg_write 0x61254 0x6c676c66
This sufficed to restore the display.
In addition to https://bugs.freedesktop.org/attachment.cgi?id=34097 I also added my own debug patch that shows old and new values of BLC_PWM_CTL in the suspend path.
Partial log of this suspend/resume cycle:
[277090.584016] i915_save_display(): BLC_PWM_CTL:6c676c66
[277090.584016] i915 0000:00:02.0: PCI INT A disabled
[277090.604053] i915 0000:00:02.0: power state changed by ACPI to D3
[277090.654464] PM: suspend of devices complete after 1067.602 msecs
[...]
[277091.015055] PM: late suspend of devices complete after 309.562 msecs
[277091.104009] ACPI: Preparing to enter system sleep state S3
[277091.508009] Extended CMOS year: 2000
[277091.508009] Back to C!
[277091.508009] Extended CMOS year: 2000
[277091.508009] Force enabled HPET at resume
[...]
[277092.154172] PM: early resume of devices complete after 82.086 msecs
[277092.340009] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[277092.347842] i915 0000:00:02.0: setting latency timer to 64
[277092.347842] i915_restore_display(): BLC_PWM_CTL=00000000
[277092.354103] i915_restore_display(): BLC_PWM_CTL:00000000
[...]
[277094.704429] PM: resume of devices complete after 2543.795 msecs
[277094.710432] Restarting tasks ... done.

A working cycle looks the same except that it's:
-i915_restore_display(): BLC_PWM_CTL=00000000
-i915_restore_display(): BLC_PWM_CTL:00000000
+i915_restore_display(): BLC_PWM_CTL=6c676c66
+i915_restore_display(): BLC_PWM_CTL:6c676c66

So it looks like dev_priv->saveBLC_PWM_CTL sometimes gets set to 0 for some reason...

Created attachment 36019 [details] [review]
suspend debug patch

(In reply to comment #18)
> Created an attachment (id=36019) [details]
> suspend debug patch
Hi, Tobias
Thanks for your test. It is a useful discovery. From the info in comment #17 it seems that the content of saved_PWM_CTL is spontaneously changed in the course of suspend/resume. Not sure whether it is related to the hardware?
Can you help to print more info related to the saved register info in the course of suspend/resume and see whether they are also changed in the course of suspend/resume?
As the content of saved_PWM_CTL is changed spontaneously in the course of suspend/resume, can we downgrade the priority of this bug?
Thanks.
Yakui

Assuming this still happens with 2.6.35-rc?

Sorry for the delay.
Jesse Barnes wrote:
> Assuming this still happens with 2.6.35-rc?
Yes, still happens on 2.6.35-rc3.
ykzhao wrote:
> Can you help to print more info related with saved register info in course
> of suspend/resume and see whether they are also changed in course of
> suspend/resume?
I'm not sure how I can debug this further, it seems to be changed at a point between storing the old state and restoring it, where printks won't show up in syslog AFAIK.
> As the content of saved_PWM_CTL is changed spontaneously in course of
> suspend/resume, can we downgrade the priority of this bug?
I suppose so, at least I have a reliable 'workaround' to turn it back on when it happens. :)

> I'm not sure how I can debug this further, it seems to be changed at a point
> between storing the old state and restoring it, where printks won't show up in
> syslog AFAIK.
>
> > As the content of saved_PWM_CTL is changed spontaneously in course of
> > suspend/resume, can we downgrade the priority of this bug?
>
> I suppose so, at least I have a reliable 'workaround' to turn it back on when
> it happens. :)
It doesn't look like you included the whole log with your debug output here, but if dev_priv->saveBLC_PWM_CTL really is changing between the read and the restore there's one of two things happening:
1) there's another save going on that we don't see due to suspend/resume
2) you've got memory corruption
I can easily believe (1); it could be that we're racing with ACPI or the thinkpad backlight controller at suspend time, and we occasionally lose the race and grab the zeroed value some firmware code wrote out during the suspend process, then happily restore it later.
But (2) is also possible, have you tried running memtest on your machine recently?
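One way to check a complete dmesg capture for the save/restore mismatch discussed here is sketched below. It assumes the reporter's debug patch is applied and prints lines in the format shown in the partial log earlier in this report (`i915_save_display(): BLC_PWM_CTL:<hex>` and `i915_restore_display(): BLC_PWM_CTL=<hex>`); stock kernels do not print these. What exactly the `=` and `:` variants distinguish depends on that patch, so treating the `=` line as the value about to be restored is an assumption. The function name `check_blc_log` is hypothetical.

```shell
#!/bin/sh
# Compare the BLC_PWM_CTL value logged at save time with the one logged
# at restore time. Reads a dmesg capture on stdin. If the log holds
# several suspend cycles, only the last save/restore pair is compared.
check_blc_log() {
    awk '
        /i915_save_display\(\): BLC_PWM_CTL:/ {
            sub(/.*BLC_PWM_CTL:/, ""); sub(/[ \t\r]+$/, ""); saved = $0
        }
        /i915_restore_display\(\): BLC_PWM_CTL=/ {
            sub(/.*BLC_PWM_CTL=/, ""); sub(/[ \t\r]+$/, ""); restored = $0
        }
        END {
            if (saved == "" || restored == "") { print "incomplete"; exit 2 }
            if (saved == restored) { print "ok " saved; exit 0 }
            print "mismatch saved=" saved " restored=" restored
            exit 1
        }'
}
```

Typical use would be `dmesg | check_blc_log` right after a resume; a non-zero exit status could then trigger saving the full log for attachment.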
(In reply to comment #22)
> It doesn't look like you included the whole log with your debug output here,
> but if dev_priv->saveBLC_PWM_CTL really is changing between the read and the
> restore there's one of two things happening:
> 1) there's another save going on that we don't see due to suspend/resume
> 2) you've got memory corruption
Could it be i915 suspend memory corruption that is the root cause here? Tobias, please retry with 2.6.35-rc6 which should fix that obnoxious bug and so provide a clean base for checking for other bugs.

(In reply to comment #22)
> It doesn't look like you included the whole log with your debug output here,
I'll attach the complete dmesg buffer from the current resume, which also resumed to black.
I didn't try memtest, but I doubt it's broken main memory, since it should be unlikely that it will always zero exactly this address over different kernel versions, and I haven't had any other problems. I.e. kernel compiles etc. run just fine.

Created attachment 37360 [details]
dmesg from resume to black on 25th july 2010
> Could it be i915 suspend memory corruption that is the root cause here? Tobias,
> please retry with 2.6.35-rc6 which should fix that obnoxious bug and so provide
> a clean base for checking for other bugs.
I'll do that. Maybe I'm just unlucky enough that the memory corruption bug never hit any important memory except this...
Created attachment 37367 [details]
dmesg from resume to black on 25th july 2010 with 2.6.35-rc6
No luck with 2.6.35-rc6...
Thanks Tobias, useful to rule that out. This is concerning:
[ 1968.612009] render error detected, EIR: 0x00000010
[ 1968.612009] page table error
[ 1968.612009] PGTBL_ER: 0x00000001
[ 1968.612009] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 1968.612009] render error detected, EIR: 0x00000010
[ 1968.612009] page table error
[ 1968.612009] PGTBL_ER: 0x00000001
2.6.35-rc6 has all known bugs fixed wrt this particular render error! :-(
Can you also grab a fresh intel_reg_dumper and then restart with drm.debug=0x4?

Created attachment 37381 [details]
dmesg (normal suspend to memory and resume), 2.6.36-rc6 with drm.debug=0x4
Note that this render error during suspend/resume doesn't seem to have any adverse effects AFAICT
Does this still happen with 2.6.37? What about Linus's git tree or 2.6.38-rc2?

(In reply to comment #31)
> Does this still happen with 2.6.37? What about Linus's git tree or 2.6.38-rc2?
I haven't seen the bug since upgrading to 2.6.37 a few days ago.
(Instead I now have an alsa resume bug where it sometimes doesn't switch to headphones on plugging them in, but I haven't found a reliable trigger case yet... I'm not sure if that's a separate bug or maybe still the same underlying memory corruption bug.)
So I'm inclined to say it's probably solved, but I can't say I'm 100% certain.

Well, I'm always hopeful.