Summary: | [855GM KMS] freezes on suspend to RAM | ||
---|---|---|---|
Product: | DRI | Reporter: | Ferenc Wágner <wferi> |
Component: | DRM/Intel | Assignee: | Jesse Barnes <jbarnes> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | diegoe, jbarnes, shiningxc |
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Ferenc Wágner
2009-06-06 13:32:35 UTC
Created attachment 26500
my 2.6.30-rc7 kernel config
Hi, Ferenc
Do you mean that the suspend/resume can work well with KMS disabled? But it can't work with KMS enabled. Right? Will you please double check it again?
Will you please do the following test with KMS enabled?
a. set the "CONFIG_PM_DEBUG" in kernel configuration
b. echo cores > /sys/power/pm_test
c. echo mem > /sys/power/state and see whether it can be resumed. (It is unnecessary to press the power button to wake up the system)
d. please also echo processors/platform/devices/freezer to pm_test and re-do the test again.
Thanks.

> Do you mean that the suspend/resume can work well with KMS disabled? But it
> can't work with KMS enabled. Right? Will you please double check it again?

Yes, exactly, but for suspend to RAM only. Suspend to disk works all right in each case. I rechecked it with current rc8 from git.

> Will you please do the following test with KMS enabled?
> a. set the "CONFIG_PM_DEBUG" in kernel configuration
> b. echo cores > /sys/power/pm_test
> c. echo mem > /sys/power/state and see whether it can be resumed.(It is
> unnecessary to press the power button to wake up the system)
> d. please also echo processors/platform/devices/freezer to pm_test and
> re-do the test again.

Sure. It doesn't matter if I write cores, processors or platform into pm_test, the machine just freezes hard all the same. But if I write devices or freezer, I get the prompt back in a couple of seconds. I enabled PM_TRACE_RTC as well, but it didn't even manage to clobber my RTC.

To stress it again, the problem is not resume, but suspend. The machine freezes on suspend, it does not manage to power down.

Thanks,
Feri.

Created attachment 27324
restore the modeset for every activated crtc

Will you please try the attached patch on the latest linus git tree and see whether the issue still exists?
Thanks.

I was on holiday, and now I noticed that this patch is already present in the current Linus git tree.
modprobe i915 modeset=1 now gives some render errors as the attached annotated kernel logs show (first I loaded i915 without modeset=1, but it does not matter). Even if I load i915 with modeset=1 for the first time, the freeze on suspend happens all the same as before.

Created attachment 27806
Kernel log of a full KMS bootup (with messages of other modules elided)
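
The pm_test procedure requested earlier in this thread can be sketched as a small script. This is only an illustration, not something posted in the original report: the function names `plan` and `run` are made up here, and it assumes a kernel built with CONFIG_PM_DEBUG so that /sys/power/pm_test exists. The thread writes the deepest level as "cores"; the sketch uses "core", the spelling a later comment in this thread uses and, to the best of my knowledge, the one the kernel accepts. It defaults to a dry run that only prints the equivalent shell commands.

```python
"""Dry-run sketch of walking the pm_test levels, shallowest first."""

PM_TEST = "/sys/power/pm_test"
PM_STATE = "/sys/power/state"

# Shallowest to deepest. The thread reports that "devices" and "freezer"
# returned to the prompt while the three deeper levels froze the machine.
LEVELS = ["freezer", "devices", "platform", "processors", "core"]


def plan(levels=LEVELS):
    """Return the list of (sysfs path, value) writes for a full test pass."""
    steps = []
    for level in levels:
        steps.append((PM_TEST, level))   # select how far suspend may proceed
        steps.append((PM_STATE, "mem"))  # start a suspend-to-RAM test cycle
    steps.append((PM_TEST, "none"))      # restore normal suspend behaviour
    return steps


def run(steps, dry_run=True):
    """Print the writes, or perform them when dry_run is False (needs root)."""
    for path, value in steps:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as handle:
                handle.write(value + "\n")


if __name__ == "__main__":
    run(plan())
```

Running the levels from shallowest to deepest means the first level that hangs narrows down where the suspend path breaks; on the machine in this report a hang requires a reboot, so each deeper level would have to be tried in a separate boot.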
Created attachment 27836
try the debug patch which ignores the drm class for the connector device

Will you please try the debug patch on the latest kernel and see whether this issue still exists?
Thanks.

This debug patch on current git (actually the same as in the previous test) does not make any difference, unfortunately.

(In reply to comment #8)
> This debug patch on current git (actually the same as in the previous test)
> does not make any difference, unfortunately.

Sorry for the late response.
Will you please do the following test and confirm whether the suspend/resume can work well under console mode? Of course the KMS should be enabled.
1. echo mem > /sys/power/state; dmesg >dmesg_after; sync;
2. press the power button and see whether it can be resumed.
3. If it can't be resumed, please reboot the system and check whether the file of "dmesg_after" is created.
Thanks.

> (In reply to comment #9)
> Will you please do the following test and confirm whether the suspend/resume
> can work well under console mode? Of course the KMS should be enabled.
> 1. echo mem > /sys/power/state; dmesg >dmesg_after; sync;
> 2. press the power button and see whether it can be resumed.
> 3. If it can't be resumed, please reboot the system and check whether the
> file of "dmesg_after" is created.

Hi Yakui,

I think you mixed up bug reports: this bug is about a failure to suspend from console mode. I've used the command you propose above from the very beginning, as you can read in the bug description. As such, I can't even attempt to resume, since the machine freezes before suspending itself. The issue can be reproduced from initramfs even, before X or any user space thing has a chance to mess with the system.

Regards,
Feri.

2.6.31-rc5 still doesn't suspend on echo mem >/sys/power/state but freezes instead of powering down.

Still the same under 2.6.31-rc8. :(

Will you please try the following patch from Chris Wilson and see whether the issue still exists?
Patch: drm/i915: Check that the relocation points to within the target
http://lists.freedesktop.org/archives/intel-gfx/2009-September/004243.html
Thanks.

Hi,

I tested Chris' patch on top of 2.6.31. Unfortunately, no effect, the system freezes as usual. I'm trying to find out how far it gets, whether it leaves i915_suspend() at all. Is it possible to get anything on screen after pci_disable_device(dev->pdev) and pci_set_power_state(dev->pdev, PCI_D3hot);? Or should the screen only show whatever was on it right before invoking those functions? Or should it turn blank if those were successful? There's no serial port in this laptop, so I've got to find some other means to get feedback...

(In reply to comment #14)
> Hi,
>
> I tested Chris' patch on top of 2.6.31. Unfortunately, no effect, the system
> freezes as usual. I'm trying to find out how far it gets, whether it leaves
> i915_suspend() at all. Is it possible to get anything on screen after
> pci_disable_device(dev->pdev) and pci_set_power_state(dev->pdev, PCI_D3hot);?
> Or should the screen only show whatever was on it right before invoking those
> functions? Or should it turn blank if those were successful? There's no
> serial port in this laptop, so I've got to find some other means to get
> feedback...

Sorry that I mixed up this bug with other bugs. The issue on this box is that it can't be resumed from S3 when in KMS mode. Right?
Will you please do the following test under console mode?
1. echo mem > /sys/power/state; dmesg >dmesg_after; sync;
2. press the power button and see whether it can be resumed.
3. If it can't be resumed, please reboot the system and check whether the file of "dmesg_after" is created.
Note that the above test should be done in both KMS and UMS mode.

> (In reply to comment #15)
> Sorry that I mix up this bug with other bugs. The issue on this box is that it
> can't be resumed from S3 when in KMS mode. Right?

Actually, no, see my comment #10.
This system freezes *during* S3 suspend when in KMS mode. That is, it does not enter S3, instead it freezes with

    Shutting down device
    i915 0000:00:02.0: PCI INT A disabled

on the screen, with i915_suspend() in drivers/gpu/drm/i915/i915_drv.c modified as

    [...]
    if (state.event == PM_EVENT_SUSPEND) {
            DRM_INFO("Shutting down device\n");
            pci_disable_device(dev->pdev);
            pci_set_power_state(dev->pdev, PCI_D3hot);
    }
    DRM_INFO("Leaving i915_suspend\n");
    [...]

All this happens with 100% reproducibility right after booting, from the initramfs, with nothing else but the intel_agp and the i915 module loaded. If I don't specify modeset=1 for the latter, suspend and resume works OK.

Feri.

(In reply to comment #16)

After hooking up some LEDs to the parallel port, I was able to determine that under 2.6.31 the freeze happens in drivers/acpi/acpica/hwsleep.c, function acpi_enter_sleep_state_prep, line 176, that is

    /* Run the _PTS method */
    status = acpi_evaluate_object(NULL, METHOD_NAME__PTS, &arg_list, NULL);

does not return if I load i915 with modeset=1. I compiled with CONFIG_ACPI_DEBUG=y, but the framebuffer console is already off by the time we get here, so I wasn't able to extract any info from that. Not that I understand ACPI the least.

I wonder if NETCONSOLE or LP_CONSOLE could help here (this ThinkPad R50e has no serial port) or would those be suspended regardless of no_console_suspend before the suspend procedure got to the interesting part...

Thanks,
Feri.

> I wonder if NETCONSOLE or LP_CONSOLE could help here (this ThinkPad R50e has no
> serial port) or would those be suspended regardless of no_console_suspend
> before the suspend procedure got to the interesting part...

Thanks for your finding.
Will you please attach the output of acpidump on your box?
The latest acpidump tool (pmtools-20071116) can be found in:
http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

>
> Thanks,
> Feri.
Created attachment 30231
acpidump output

Compressed acpidump output attached, as requested by comment #18.

This problem persists under 2.6.32-rc6 as well.

ping.

I am another unfortunate owner of a r50e. The issue still exists on 2.6.32, and also drm-intel-next of today (based on 2.6.33-rc5).

@Ferenc: Did you lose hope ? :)
And did you get to try netconsole ?

> @Ferenc: Did you lose hope ? :)

Yes, pretty much. Rafael Wysocki called this a "BIOS issue" in http://article.gmane.org/gmane.linux.acpi.devel/42754, but it looks like nobody is interested in fixing this. Nothing forces me to use KMS right now, so I can live with this, but I'm afraid this will change. :(

> And did you get to try netconsole?

I had no reason to do so. As things stand, I really don't know what I could do to get this issue fixed. Should I lobby IBM for a BIOS upgrade? Should I keep nagging the ACPI or the Intel folks? Unfortunately, I know very little about ACPI, and haven't got the time to dive in either.

Too bad that such an easily reproducible bug gets no attention. My hibernation related phantom bug is much more popular, although totally unreproducible... I just don't get it.

(In reply to comment #22)
> > @Ferenc: Did you lose hope ? :)
>
> Yes, pretty much. Rafael Wysocki called this a "BIOS issue" in
> http://article.gmane.org/gmane.linux.acpi.devel/42754, but it looks like nobody
> is interested in fixing this. Nothing forces me to use KMS right now, so I can
> live with this, but I'm afraid this will change. :(
>

Actually the last ddx release dropped UMS. And my distrib already uses that :
http://www.archlinux.org/news/484/

Upstream forces/wants us to use KMS by dropping UMS support.

>
> Too bad that such an easily reproducible bug gets no attention. My hibernation
> related phantom bug is much more popular, although totally unreproducible... I
> just don't get it.
>

So all the following discussion is about hibernation, not suspend to ram ?
I read a suggestion to follow up on suspend-devel. Did this happen ?

(In reply to comment #23)
> (In reply to comment #22)
>>> @Ferenc: Did you lose hope ? :)
>>
>> Yes, pretty much. Rafael Wysocki called this a "BIOS issue" in
>> http://article.gmane.org/gmane.linux.acpi.devel/42754, but it looks like nobody
>> is interested in fixing this. Nothing forces me to use KMS right now, so I can
>> live with this, but I'm afraid this will change. :(
>
> Actually the last ddx release dropped UMS. And my distrib already uses that :
> http://www.archlinux.org/news/484/
>
> Upstream forces/wants us to use KMS by dropping UMS support.

Yes, that's what I expected. We're trapped.

>> Too bad that such an easily reproducible bug gets no attention. My hibernation
>> related phantom bug is much more popular, although totally unreproducible... I
>> just don't get it.
>
> So all the following discussion is about hibernation, not suspend to ram ?

Yes, that thread is about kernel bug #14504, I just mentioned this problem there.

> I read a suggestion to follow up on suspend-devel. Did this happen ?

I'm not sure which one you mean, the thread was quite widely crossposted... Suspend-devel was also involved on several occasions.

(In reply to comment #24)
> > I read a suggestion to follow up on suspend-devel. Did this happen ?
>
> I'm not sure which one you mean, the thread was quite widely crossposted...
> Suspend-devel was also involved on several occasions.
>

Sorry, I just misread the mail, it was only about white-listing the r50e for s2ram in UMS mode.

So the only clue we have is Rafael Wysocki calling it a "BIOS issue", that's not much. How can we get more information and details about what that BIOS issue is ? Should we bug Rafael again or where are the experts in this domain ?

And why is this BIOS issue only triggered in KMS mode ? We don't know for a fact that getting KMS suspend to work with that BIOS is impossible, do we ?
(In reply to comment #25)
> So the only clue we have is Rafael Wysocki calling it a "BIOS issue", that's
> not much. How can we get more information and details about what that BIOS
> issue is? Should we bug Rafael again or where are the experts in this domain?

Bugging Rafael is certainly an option, but I didn't want to hijack that thread too much and I also felt like I bugged him too much already. :) I guess the Linux ACPI list would be the most appropriate place to ask about this, but they were crossposted if I remember correctly, and nobody spoke up. Maybe a direct question would result in some answer, it's definitely worth a try. I just didn't find the time for this yet.

> And why is this BIOS issue only triggered in KMS mode? We don't know for a
> fact that getting KMS suspend to work with that BIOS is impossible, do we?

Certainly not. KMS is not a machine state, it's rather an alternative path to get there (besides UMS), as I understand it. However, if you suspend from UMS, the screen is reset to the standard VGA text mode, while if you suspend from KMS it is not. This isn't the problem in itself, though: I'm almost sure that suspend with a framebuffer console worked for me. And that seemed at least visually equivalent to the KMS console. So your question about why the issue is triggered under KMS is very important, and the answer might well be the solution to this bug. But I'm mostly speculating here.

Hi, I followed the bug report guide and got it to suspend/resume only on 'devices' and 'freezer' @pm_test. core, platform, processors, all failed and got stuck.

Obviously I share the annoyance of completely broken suspend, it hangs in "suspending consoles..." when suspending.

Suspend/resume works ok without KMS in X (well, while the driver still had UMS support). It works ok also without X when KMS is not enabled.

I got the dmesg after|before logs for all the cases of pm_test. Which ones should I include?
Please let me help since I really miss suspend in my laptop :)

My system info:
-- chipset: 855GM
-- system architecture: 32-bit
-- xf86-video-intel: 3d4b3f257fbbb69c6f236d9803abe54a90d7d434 (master up to today march 18)
-- xserver: X.Org X Server 1.7.5.902 (1.7.6 RC 2)
-- mesa: 7.7.1-DEVEL (debian 7.7-4)
-- libdrm: 2.4.18 (debian 2.4.18-2)
-- kernel: 2.6.33-2-686
-- Linux distribution: Debian unstable+experimental
-- Machine or mobo model: Thinkpad R50e
-- Display connector: LVDS

By the way, Rafael asked me to open a bug on bugzilla.kernel.org so I did :
https://bugzilla.kernel.org/show_bug.cgi?id=15322

As the first line of the original report said, it's a pure kernel problem after all :)

I've just posted some fresh news to the kernel bug 2 minutes ago.
This bug can be worked around with a custom DSDT.

Assuming this old bug is fixed now. In case it's not fixed, I'll let Chris look at it. :)

(In reply to comment #30)
> In case it's not fixed, I'll let Chris look at it. :)

Hi Jesse, unfortunately it is still broken :(.

Last I checked the DSDT fix for the kernel helped but not really much, I got it to suspend a few times and then it broke again:
https://bugzilla.kernel.org/show_bug.cgi?id=15322

I tested again with 2.6.34 (no dsdt fix) and intel master but no luck. I can't test *right now* but I bet it's still broken.

I'm up for helping to debug, although I know that doesn't help a lot.

I just upgraded to 35-rc5, the problem is still there, systematic hard freeze on every suspend to ram.
But the dsdt workaround still works perfectly for me, I just suspended 5 times in a row, it suspends and resumes very fast and reliably in my testing, better than other laptops.

Created attachment 37130
never disable pipe a once enabled

This patch will leave pipe a enabled, which may prevent the hang. The BIOS is probably touching something on the GPU that needs to be on; hopefully this will be sufficient.

Thanks a lot to Jesse Barnes, this pipea trick fixes the issue for me !
Ferenc or Diego, can you confirm ?

Created attachment 37132
dmidecode output
as required by jbarnes
jbarnes said : "dmidecode provides a bunch of machine specific info we can use to key the quirk from."

Diego and Ferenc : could you also attach your dmidecode output, so that we are sure the quirk will be applied on your systems ?

Created attachment 37136
port old pipea force quirks to kernel

Does this patch also make things work for you?

(In reply to comment #37)
> Created an attachment (id=37136)
> port old pipea force quirks to kernel
>
> Does this patch also make things work for you?

Jesse. I'm running a patched kernel with this last patch and it seems to be working ok. I'll try a few more times before giving a full OK but if you say it's an old quirk, likely it's what was missing.

Xavier?

Ah right, I already tested that last patch yesterday and confirmed to Jesse it was still working, but I said that on irc, not here :)

Ok I'll clean up that patch and submit it upstream. Thanks for testing.

(In reply to comment #40)
> Ok I'll clean up that patch and submit it upstream. Thanks for testing.

I tested the "port old pipea force quirks to kernel" on 2.6.35-rc5, and can also confirm that it fixes suspend to RAM. Thank you very much! I'm attaching my dmidecode output as requested. It would be fantastic if this fix could also be added to the long term supported stable version (2.6.32) of the Linux kernel if at all possible, so distros could benefit from it.

Created attachment 37235
dmidecode output from another affected ThinkPad R50e
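
Keying a quirk on machine-specific identifiers, as discussed in this thread, amounts to a table lookup in which a wildcard entry can make more specific entries for the same device unnecessary. The following is a minimal sketch of that general idea only. It is not the kernel's data structure or the contents of the attached patch; the `Quirk` type, the helper names, and every ID value below are hypothetical placeholders.

```python
"""Sketch of a quirk table with wildcard entries (all IDs are made up)."""
from typing import List, NamedTuple, Optional

ANY = None  # wildcard: matches any subsystem vendor or device


class Quirk(NamedTuple):
    device: int
    subsystem_vendor: Optional[int]
    subsystem_device: Optional[int]
    hook: str  # name of the workaround to apply


def matches(q: Quirk, device: int, sub_vendor: int, sub_device: int) -> bool:
    """True if the quirk entry applies to the given machine identifiers."""
    return (q.device == device
            and q.subsystem_vendor in (ANY, sub_vendor)
            and q.subsystem_device in (ANY, sub_device))


def covers(general: Quirk, specific: Quirk) -> bool:
    """True if `general` matches everything `specific` does, with the same hook."""
    return (general != specific
            and general.hook == specific.hook
            and general.device == specific.device
            and general.subsystem_vendor in (ANY, specific.subsystem_vendor)
            and general.subsystem_device in (ANY, specific.subsystem_device))


def redundant_entries(table: List[Quirk]) -> List[Quirk]:
    """Entries that another, more general entry already covers."""
    return [q for q in table if any(covers(other, q) for other in table)]


# Hypothetical table: the last entry is a per-device wildcard.
TABLE = [
    Quirk(0x1111, 0xAAAA, 0x0001, "pipea_force"),
    Quirk(0x2222, 0xBBBB, 0x0002, "pipea_force"),
    Quirk(0x1111, ANY, ANY, "pipea_force"),
]

if __name__ == "__main__":
    for q in redundant_entries(TABLE):
        print("covered by a wildcard entry:", q)
```

In this made-up table the first entry is reported, because the wildcard entry for device 0x1111 already applies the same hook to every subsystem.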
Review of attachment 37136:

You probably know, but your intel_quirks[] array is redundant: the last stanza shadows several previous entries.

(In reply to comment #37)
> Created an attachment (id=37136)
> port old pipea force quirks to kernel
>
> Does this patch also make things work for you?

Well, at first thanks for your work guys! :)
I found this here with a comment in bug https://bugzilla.kernel.org/show_bug.cgi?id=14640.
I'm sorry, I feel really uncomfortable when asking this, but, how to apply this patch?

Being in my Lucid-git-tree on my hd I tried to

$ /media/datenplatte/ubuntu-lucid$ patch -Np1 -i i915_suspend_fix.patch

, and it works partly, but the first part of the patch fails, here's the terminal-output:

patching file drivers/gpu/drm/i915/i915_drv.h
Hunk #1 FAILED at 222.
Hunk #2 succeeded at 305 with fuzz 2 (offset -31 lines).
1 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_drv.h.rej
patching file drivers/gpu/drm/i915/intel_display.c
Hunk #1 succeeded at 2015 (offset -240 lines).
Hunk #2 succeeded at 2035 (offset -240 lines).
Hunk #3 succeeded at 4738 (offset -739 lines).
Hunk #4 succeeded at 4832 (offset -739 lines).

I'm sorry for my question, but please, could anybody explain what I'm doing wrong or what to do with that first section of the patch, so I could manually add it (it's just a few lines I think, but I don't really know where to put them in the original file).

Thanks in advance, really looking forward that this will fix the problem for me, too :)

Created attachment 37348
i915_drv.h.rej
sorry, forgot to add the *.rej - file, maybe this helps :)
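
The `patch` output quoted earlier in this thread can be summarized mechanically to see which hunks need manual attention. The following is a rough sketch, not a tool anyone in this thread used: the function name `failed_hunks` is invented here, and the regular expressions assume the message wording shown in that output ("patching file ...", "Hunk #N FAILED at L.").

```python
"""Sketch: list the hunks that `patch` reported as FAILED, per file."""
import re

FILE_RE = re.compile(r"^patching file (?P<path>\S+)")
HUNK_RE = re.compile(r"^Hunk #(?P<num>\d+) (?P<status>FAILED|succeeded) at (?P<line>\d+)")


def failed_hunks(patch_output):
    """Map each file to a list of (hunk number, target line) that failed."""
    failures = {}
    current = None
    for raw in patch_output.splitlines():
        line = raw.strip()
        file_match = FILE_RE.match(line)
        if file_match:
            current = file_match.group("path")
            continue
        hunk_match = HUNK_RE.match(line)
        if hunk_match and hunk_match.group("status") == "FAILED" and current:
            entry = (int(hunk_match.group("num")), int(hunk_match.group("line")))
            failures.setdefault(current, []).append(entry)
    return failures


# The output reported in this thread.
SAMPLE = """\
patching file drivers/gpu/drm/i915/i915_drv.h
Hunk #1 FAILED at 222.
Hunk #2 succeeded at 305 with fuzz 2 (offset -31 lines).
1 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_drv.h.rej
patching file drivers/gpu/drm/i915/intel_display.c
Hunk #1 succeeded at 2015 (offset -240 lines).
Hunk #2 succeeded at 2035 (offset -240 lines).
Hunk #3 succeeded at 4738 (offset -739 lines).
Hunk #4 succeeded at 4832 (offset -739 lines).
"""

if __name__ == "__main__":
    for path, hunks in failed_hunks(SAMPLE).items():
        for number, target_line in hunks:
            print(f"{path}: hunk #{number} (around line {target_line}) needs manual merging")
```

For the output above, only hunk #1 of drivers/gpu/drm/i915/i915_drv.h is reported; its rejected lines are in the .rej file and have to be merged by hand near the indicated line. The large offsets on the other hunks suggest the tree differs noticeably from the one the patch was made against.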
(In reply to comment #45)
> Created an attachment (id=37348)
> i915_drv.h.rej
>
> sorry, forgot to add the *.rej - file, maybe this helps :)

Well, I maybe should add, obviously this patch is intended to be used with 2.6.35-rc6, but I'd like to know how to port it back to 2.6.33, because -I don't know exactly why- but with kernels newer than 2.6.33 my thinkpad-audio stopped working =/

And, 5 min. ago, I tested a kernel with your patch J. Barnes, and I'm happy I can confirm it's working, tested with an IBM thinkpad R51-2888 with an Intel 855GM-GPU.

Thanks a lot for your work and I'm looking forward to this patch making its way to the kernel :)

author Jesse Barnes <jbarnes@virtuousgeek.org>
Mon, 19 Jul 2010 20:53:12 +0000 (13:53 -0700)
committer Eric Anholt <eric@anholt.net>
Mon, 26 Jul 2010 19:00:43 +0000 (12:00 -0700)
commit b690e96cf9e6a6cde6f0393de47bdd6317ddb5de
tree 8438bf5540d4f71d0fcc8b6acb8bf472780e4579
parent 0cc4d4300c28d5c3fc73e5ec91bfd4b0c2c744af

drm/i915: add pipe A force quirks to i915 driver

Ported over from the old UMS list. Unfortunately they're still
necessary especially on older laptop platforms.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=22126.

Tested-by: Xavier <shiningxc@gmail.com>
Tested-by: Diego Escalante Urrelo <diegoe@gnome.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>

(In reply to comment #47)
> author Jesse Barnes <jbarnes@virtuousgeek.org>
> Mon, 19 Jul 2010 20:53:12 +0000 (13:53 -0700)
> committer Eric Anholt <eric@anholt.net>
> Mon, 26 Jul 2010 19:00:43 +0000 (12:00 -0700)
> commit b690e96cf9e6a6cde6f0393de47bdd6317ddb5de
> tree 8438bf5540d4f71d0fcc8b6acb8bf472780e4579
> parent 0cc4d4300c28d5c3fc73e5ec91bfd4b0c2c744af
> drm/i915: add pipe A force quirks to i915 driver
>
> Ported over from the old UMS list. Unfortunately they're still
> necessary especially on older laptop platforms.
>
> Fixes https://bugs.freedesktop.org/show_bug.cgi?id=22126.
>
> Tested-by: Xavier <shiningxc@gmail.com>
> Tested-by: Diego Escalante Urrelo <diegoe@gnome.org>
> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> Signed-off-by: Eric Anholt <eric@anholt.net>

Well, after a short time with a kernel with this patch included in use, I'm facing a new problem: My Thinkpad is crashing suddenly while playing especially flash videos (e.g. youtube, zshare and similar). I can't really say that the patch causes this issue, because I switched for the patch directly from the ubuntu-stable-kernel 2.6.32 up to 2.6.35.
Kind of annoying, the only thing left is a hard-reboot with ALT+MAGICSYSRECOVERY, I'd like to know if anyone here is facing the same issue?

But in fact, the patch works, standby works again flawlessly, thanks James :)

Just a short info: This patch breaks external monitor support on certain notebooks (including mine, unfortunately). For more details please have a look at the following bug report:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/796030

Should I reopen this bug?