Bug 92464 - 8086:0f31_freezes_totally
Summary: 8086:0f31_freezes_totally
Status: CLOSED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-14 17:11 UTC by Chris Rainey
Modified: 2016-05-04 09:20 UTC
CC List: 3 users

See Also:
i915 platform: BYT
i915 features:


Attachments
DMESG from 4.3.0-040300rc5-generic using drm.debug=14 boot parameter (31 bytes, text/plain)
2015-10-14 17:11 UTC, Chris Rainey
[ CORRECTED ] DMESG from 4.3.0-040300rc5-generic using drm.debug=14 boot parameter (114.56 KB, text/x-log)
2015-10-14 17:15 UTC, Chris Rainey
DMESG from 3.19.0-31-generic using drm.debug=14 boot parameter (131.55 KB, text/plain)
2015-10-21 17:16 UTC, Chris Rainey

Description Chris Rainey 2015-10-14 17:11:34 UTC
Created attachment 118875
DMESG from 4.3.0-040300rc5-generic using drm.debug=14 boot parameter

Total lockup (cannot even switch to a console via Alt-F1, F2, etc.).

The bug is most easily reproduced in the Chromium browser when opening multiple background tabs (middle-button (wheel) mouse clicks) and when using Alt-Tab to switch between the browser and xterms, etc.

Very difficult to pin down due to randomness (i.e. the system may run for 30 seconds, 30 minutes or 30 hours before freezing). However, it rarely runs stably for more than 24-48 hours without a lockup.


*** Attaching DMESG with drm.debug=14
Comment 1 Chris Rainey 2015-10-14 17:15:09 UTC
Created attachment 118876
[ CORRECTED ]  DMESG from 4.3.0-040300rc5-generic using drm.debug=14 boot parameter

Sorry for not correctly attaching to first comment.
Comment 2 Jani Nikula 2015-10-15 09:16:43 UTC
Comment on attachment 118875
DMESG from 4.3.0-040300rc5-generic using drm.debug=14 boot parameter

>(Nothing has been logged yet.)
Comment 3 Jani Nikula 2015-10-15 09:17:52 UTC
(In reply to Jani Nikula from comment #2)
> Comment on attachment 118875
> DMESG from 4.3.0-040300rc5-generic using drm.debug=14 boot parameter
> 
> >(Nothing has been logged yet.)

Where did *that* come from? Please disregard.

Can you ssh into the box when it locks up? Any chance to get dmesg when that happens?
Comment 4 Chris Rainey 2015-10-15 15:20:44 UTC
(In reply to Jani Nikula from comment #3)
> (In reply to Jani Nikula from comment #2)
> > Comment on attachment 118875
> > DMESG from 4.3.0-040300rc5-generic using drm.debug=14 boot parameter
> > 
> > >(Nothing has been logged yet.)
> 
> Where did *that* come from? Please disregard.
> 
> Can you ssh into the box when it locks up? Any chance to get dmesg when that
> happens?

Yeah ... *weird* --- I did a direct attach of my /var/log/dmesg and that was what was in it(!?!?). Am I able to delete bad attachments like that or wait for you to do it?

I'll switch over my daily work to another machine and then let this one run till lockup and see if I can SSH into it and pull a _real_ dmesg from it.
Comment 5 Jani Nikula 2015-10-16 11:24:52 UTC
(In reply to Chris Rainey from comment #4)
> Yeah ... *weird* --- I did a direct attach of my /var/log/dmesg and that was
> what was in it(!?!?). Am I able to delete bad attachments like that or wait
> for you to do it?

Heh, actually I was referring to my comment #2, I did not intend to make that comment at all!

The attachment list has a "Details" link, which leads to an "edit details" link, where you can check the "obsolete" box and submit. At least with whatever permissions I have. The attachment doesn't get deleted, but it will be hidden from the default view.

> I'll switch over my daily work to another machine and then let this one run
> till lockup and see if I can SSH into it and pull a _real_ dmesg from it.

That would be great, thanks.
Comment 6 Chris Rainey 2015-10-21 17:16:38 UTC
Created attachment 119045
DMESG from 3.19.0-31-generic using drm.debug=14 boot parameter
Comment 7 Chris Rainey 2015-10-21 17:18:26 UTC
Just got my first 'total-freeze' after switching my daily workload to another machine.

I was _NOT_ able to SSH into the machine while it was frozen. I did confirm that I could SSH in _before_ the freeze.

In case it is of any use, I'm pasting in the last log entry to /var/log/syslog before cold-restart of machine was required:

... <snip>

Oct 21 11:55:04 CKR-DKM kernel: [ 2023.105714] [drm:drm_mode_addfb2] [FB:66]
Oct 21 12:02:12 CKR-DKM rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="674" x-info="http://www.rsyslog.com"] start

<snip> ...


BTW: this is using an earlier (stock) kernel, since I decided to do a fresh (clean) reinstall of the OS (Ubuntu 15.04) for added purity of the test:

chris@CKR-DKM:~$ uname -a
Linux CKR-DKM 3.19.0-31-generic #36-Ubuntu SMP Wed Oct 7 15:04:02 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I'd be glad to grab the latest daily upstream mainline kernel build or the latest daily drm-intel-nightly kernel build if you would like.

I'm also attaching the current(running) dmesg.out with drm.debug=14 for your consideration.

Thank You!
Comment 8 Chris Rainey 2015-10-29 21:27:25 UTC
It looks like my issue is a combination of i915 and intel_pstate.

I'm following the thread at this bug: https://bugs.freedesktop.org/show_bug.cgi?id=88012

After switching my Dell Inspiron 3646 to the 3.16 kernel, I've had little to no trouble (even when stressing the system with: glmark2 --run-forever).

I got my 3.16 kernel here: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.16.7-ckt18-utopic/

This is on the new Ubuntu 15.10 release.

I hope this helps !!
Comment 9 gfl3162+xbugzilla 2015-11-18 23:17:52 UTC
I am having the same issue on an Asus X205TA laptop with the Intel Atom Z3735F SOC and the same graphics hardware (8086:0f31).

A few more observations (on a vanilla 4.3 kernel, with compositor enabled):

- Setting the sysctl kernel.nmi_watchdog to enable the NMI watchdog does not detect the lockup.
- It appears that the kernel is completely locked up. I have a USB sound card that lights up when enumerated by the kernel, but that does not occur when the freeze occurs. As such, trying to SSH into the system does not work, and system logs do not show any traces. Setting kernel.panic and kernel.panic_on_oops also does not automatically reboot the system, suggesting that neither a panic nor an oops occurs when the freeze occurs.
- While in console mode (Ctrl-Alt-F1), the screen flickers every now and then. Not sure if related.
- The freeze occurs when using the "intel" and "modeset" xorg drivers. Interestingly enough, the freeze is /completely/ mitigated when using the "fbdev" video driver. Performance is not optimal, but I have been using it as a workaround.
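
For anyone repeating the experiment, it may help to confirm that the sysctls mentioned above are really set before waiting for the next freeze. The following is only a minimal sketch: it assumes a Linux system with /proc/sys mounted, and the names read_sysctls and WATCHED are made up for illustration, not part of any existing tool.

```python
from pathlib import Path

# sysctl names mentioned in the observations above, as paths under /proc/sys
WATCHED = ("kernel/nmi_watchdog", "kernel/panic", "kernel/panic_on_oops")


def read_sysctls(root: str = "/proc/sys", names=WATCHED) -> dict:
    """Return {dotted.name: value or None} for each sysctl found under root."""
    values = {}
    for name in names:
        try:
            text = (Path(root) / name).read_text().strip()
        except OSError:
            text = None  # missing or unreadable on this kernel
        values[name.replace("/", ".")] = text
    return values


if __name__ == "__main__":
    for key, value in read_sysctls().items():
        print(f"{key} = {value}")
```

A value of None means the entry could not be read on the running kernel. This only reports the current settings, it does not change them.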
Comment 10 Chris Rainey 2015-12-03 16:50:50 UTC
Confirming that "intel_idle.max_cstate=1" has solved my complete freeze issues on Bay Trail running Linux 4.1.13 (Slackware64-current (pre-4.2); the machine formerly ran Ubuntu 15.04/15.10 with stock kernels).
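
In case it is useful to others testing this workaround, here is a rough sketch for checking that the parameter actually reached the running kernel. It assumes the command line is readable from /proc/cmdline; the helper names (parse_cmdline, max_cstate_limit, read_cmdline) are illustrative only, and the parser deliberately ignores quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def max_cstate_limit(cmdline: str):
    """Return intel_idle.max_cstate as an int, or None if it is not set."""
    value = parse_cmdline(cmdline).get("intel_idle.max_cstate")
    return int(value) if value and value.isdigit() else None


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line; empty string if unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    print("intel_idle.max_cstate =", max_cstate_limit(read_cmdline()))
```

If this prints None after a reboot, the bootloader configuration did not pass the parameter to the kernel.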

Thanks for all the hard work and long efforts to see this through!


*** PLEASE CONSIDER MARKING THIS AS A DUPLICATE OF: 

https://bugs.freedesktop.org/show_bug.cgi?id=88012
Comment 11 Chris Wilson 2016-05-04 09:20:55 UTC
Dupe of https://bugzilla.kernel.org/show_bug.cgi?id=109051

