Description
rasmus
2015-03-05 23:01:39 UTC
Created attachment 114046 [details]
glxinfo output
Created attachment 114047 [details]
intel_reg_dumper output
Created attachment 114048 [details]
output of lspci -nn
Created attachment 114049 [details]
LIBGL_DEBUG=verbose start_furmark_windowed_1024x640.sh > stdout.txt 2> stderr.txt
Created attachment 114050 [details]
output of journalctl -b-1 -e _COMM=Xorg.bin after a crash
Created attachment 114051 [details]
xorg_crash2.txt output of journalctl -b-1 -e _COMM=Xorg.bin after another crash but with more debug kernel modules.
Created attachment 114052 [details]
journalctl -b -e _COMM=Xorg.bin of a system that has not crashed.
If it is of any relevance, I have discussed the issue in this thread on the interwebs: http://forum.thinkpads.com/viewtopic.php?f=70&t=116472.
I tried to get a kdump following the Fedora wiki instructions, but nothing is saved to /var/dumps. Sorry.
System reboot is a processor event. A GPU failure just kills the system; I have yet to hear of one that could cause a spontaneous reboot. I would suggest you try setting up netconsole.
Created attachment 114076 [details]
netconsole output of crash
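Netconsole sends kernel log messages over UDP to another machine, so messages printed just before a crash can be captured even if they never reach the local disk. Below is a minimal receiver sketch for the second machine. It assumes the crashing laptop was booted with a `netconsole=` parameter pointing at this machine and that the target port is 6666 (netconsole's default). The function names and the log file name `netconsole.log` are hypothetical.

```python
import socket
from datetime import datetime, timezone

# Assumption: netconsole on the crashing machine targets this host on
# UDP port 6666, netconsole's default target port.
DEFAULT_PORT = 6666


def format_record(payload: bytes, sender: str) -> str:
    """Decode one netconsole packet and prefix it with a UTC timestamp and sender."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp} {sender} {text}"


def receive(port: int = DEFAULT_PORT, logfile: str = "netconsole.log") -> None:
    """Listen for netconsole UDP packets forever and append them to a log file."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    with open(logfile, "a", encoding="utf-8") as out:
        while True:
            payload, (addr, _port) = sock.recvfrom(4096)
            line = format_record(payload, addr)
            print(line)
            out.write(line + "\n")
            out.flush()  # keep the file current if the receiver is stopped
```

Start it by calling `receive()` on the receiving machine. Then boot the laptop with something like `netconsole=@/,6666@192.168.1.2/` (the IP is a placeholder; the parameter syntax is described in the kernel's netconsole documentation). Raising the console log level on the laptop, for example with `dmesg -n 8`, makes sure low-priority messages are forwarded too.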
Hi Chris,
> System reboot is a processor event. A GPU failure just kills the system - I have yet to hear of one that could cause a spontaneous reboot.
I don't understand most of what you are saying above. I'm merely a *user* of software and hardware. I can record a video of the screen if that helps.
> I would suggest you try setting up netconsole.
I have attached the requested output now.
Looks very, very suspicious. The reboot is not at the OS level, so it is down to firmware. Look at your BIOS settings and version.
Created attachment 114077 [details]
output of dmidecode
Oops, I didn't delete the serial number (warranty) from the dmidecode file. Could you delete it?
Created attachment 114079 [details]
currently active bios settings
I attached my current BIOS settings. I skipped a couple of sections of the BIOS, but hopefully the info you need is there.
Created attachment 114080 [details]
dmidecode output
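dmidecode prints machine identifiers such as the serial number, so it is worth redacting them before attaching the output to a public bug. Below is a minimal sketch of such a filter. It assumes the usual `Field: value` line layout of dmidecode's text output. The field list and the function name `redact_dmidecode` are illustrative and not exhaustive, so check the result by eye before uploading.

```python
import re

# Fields that identify one specific machine. This list is an assumption,
# not a complete inventory of sensitive dmidecode fields.
SENSITIVE_FIELDS = ("Serial Number", "UUID", "Asset Tag")

# Match "<indent><field>: <value>" on a single line; [ \t] avoids
# matching across newlines.
_PATTERN = re.compile(
    r"^([ \t]*(?:" + "|".join(re.escape(f) for f in SENSITIVE_FIELDS) + r"):[ \t]*)\S.*$",
    re.MULTILINE,
)


def redact_dmidecode(text: str) -> str:
    """Replace the value of every sensitive field with 'REDACTED'."""
    return _PATTERN.sub(r"\1REDACTED", text)
```

For example, save the output of `sudo dmidecode` to a file, pass its contents through `redact_dmidecode`, and attach the result.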
Chris,
> The reboot is not at the OS level, so down to firmware.
Just so I know how to proceed: how should I interpret the above statement? Should I try to get in touch with Lenovo engineers? (I don't know whether they have any open channels.)
Again, there's no issue on Windows 7, which is why I suspected the Linux drivers.
I do not think this is a firmware bug. Rather, I think it's a bug in Linux or Xorg and friends. I have run the test specified in "2 Reproduce steps" with Fedora 20 and Fedora 19 (ISO, no updates). In Fedora 20 the problem is present. In Fedora 19 I ran the test for approximately 3 hours without any reboots. Of course that does not mean the bug isn't present, but in Fedora {20, 21} and Arch the reboot usually occurs within 10 minutes.
Your dmesg does not show a controlled shutdown. A GPU hang, even a low-level hardware hang, should not result in the machine rebooting. Your dmesg does show that the kernel disagrees with the ACPI firmware implementation and that it's actively managing the thermal throttling. At this point, your best bet is to bisect the kernel and see where that leads.
Just so I understand: your claim is that there's no bug in the Intel drivers, but there's a bug in Linux? By now Fedora 19 (ISO version) has been running for 8 hours, so almost certainly something introduced after Fedora 19 causes the reboots. Also, how would I bisect the kernel in this case? The error involves a pretty big crash. I would appreciate hints on how to write a bisect program that handles (a) potential reboots and (b) upgrading the kernel.
> At this point, your best bet is to bisect the kernel and see where that leads.
The bug does not seem to be present in Linux 3.9 (the system ran Furmark for 18 hours). In Linux 3.10 the system crashed within 10 minutes. The rest of the Xorg stack was the current one (from the Arch repos).
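Here is a rough way to see why 18 hours without a crash on 3.9 is strong evidence. Assume, as a simplification, that a bad kernel's time to crash is exponentially distributed with a mean of about 10 minutes ("usually within 10 minutes"). Then the chance that a bad kernel survives a test of length t is exp(-t / 10 min). The sketch below computes that probability. It also computes the run length needed so that a bad kernel is mislabelled "good" with at most a chosen probability. Both function names are hypothetical.

```python
import math


def survival_probability(test_minutes: float, mean_crash_minutes: float) -> float:
    """P(a bad kernel survives the whole test), assuming exponential crash times."""
    return math.exp(-test_minutes / mean_crash_minutes)


def min_test_minutes(mean_crash_minutes: float, miss_probability: float) -> float:
    """Shortest run so a bad kernel passes as 'good' with at most miss_probability."""
    return mean_crash_minutes * math.log(1.0 / miss_probability)


# Assumption: a bad kernel crashes after 10 minutes on average.
MEAN = 10.0
print(f"3 h clean run on a bad kernel:  p = {survival_probability(180, MEAN):.1e}")
print(f"18 h clean run on a bad kernel: p = {survival_probability(18 * 60, MEAN):.1e}")
print(f"run length for a 1-in-1000 miss rate: {min_test_minutes(MEAN, 1e-3):.0f} min")
```

Under this assumption, a 3-hour clean run on a bad kernel has a probability of about 1.5e-8, and roughly 69 minutes per test is enough for a 1-in-1000 miss rate. The exponential model is only an approximation. If the crash depends on heat building up, the first minutes of a run say less than the model implies, so longer runs are safer.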
Perhaps this is not a driver issue after all. I guess I will try to open a bug report with Linux, though "between 3.9 and 3.10" is still terribly imprecise...
Any idea where in Linux the bug might be, so that I can pass it on to the right maintainer? Reported on the Linux bugzilla here: https://bugzilla.kernel.org/show_bug.cgi?id=94551
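On the question of bisecting across crashes: `git bisect` stores its state inside the kernel's git repository, so that state survives reboots. The part that needs help is deciding, after a boot, whether the previous test ended in a spontaneous reboot. Below is a minimal sketch of one way to do that, assuming a marker file on persistent storage. Write the marker (and sync it to disk) before starting Furmark, and delete it when the test completes its planned duration. A marker still present at the next boot then means the kernel under test was "bad". The marker path and all function names are hypothetical.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional

# Hypothetical marker location; it must live on a disk that survives reboots.
MARKER = Path.home() / ".furmark-bisect-marker.json"


def start_test(kernel_release: str, marker: Path = MARKER) -> None:
    """Record that a stress test is about to start on this kernel."""
    data = json.dumps({"kernel": kernel_release, "started": time.time()})
    with open(marker, "w", encoding="utf-8") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # flush the marker's contents to disk
    # Also sync the directory so the new file entry itself is durable.
    dir_fd = os.open(marker.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def finish_test(marker: Path = MARKER) -> str:
    """Call after the test ran its full planned duration: the kernel is good."""
    marker.unlink(missing_ok=True)
    return "good"


def verdict_after_boot(marker: Path = MARKER) -> Optional[str]:
    """Call once per boot: a leftover marker means the last test ended in a reboot."""
    if not marker.exists():
        return None  # no test was interrupted
    info = json.loads(marker.read_text(encoding="utf-8"))
    marker.unlink()
    print(f"kernel {info['kernel']} rebooted during the test")
    return "bad"


def bisect_steps(candidates: int) -> int:
    """Test rounds needed to isolate one commit among `candidates` (n >= 1): ceil(log2(n))."""
    return (candidates - 1).bit_length()
```

A possible workflow:
1. Run `git bisect start v3.10 v3.9` in a kernel tree.
2. Build and install the kernel that git checks out, then boot it.
3. Call `verdict_after_boot()`. If it returns `"bad"`, run `git bisect bad` and go back to step 2.
4. Call `start_test(os.uname().release)` right before starting Furmark.
5. Once Furmark has run long enough without a crash, call `finish_test()` and run `git bisect good`.

`bisect_steps` gives the number of build-and-test rounds to expect for a given number of candidate commits. If you reboot deliberately in the middle of a test, delete the marker first.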