Bug 104401 - GPU HANG: ecode 7:0:0xab0f77f9, in Xorg [1348], reason: Hang on rcs0, action: reset
Summary: GPU HANG: ecode 7:0:0xab0f77f9, in Xorg [1348], reason: Hang on rcs0, action:...
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-28 10:03 UTC by Luca Bonissi
Modified: 2019-07-02 06:11 UTC
CC List: 3 users

See Also:
i915 platform: BYT
i915 features: GPU hang


Attachments
dmesg (51.20 KB, text/plain)
2017-12-28 10:03 UTC, Luca Bonissi
Crash dump (44.73 KB, text/plain)
2017-12-28 10:04 UTC, Luca Bonissi
Test case - disabling hpd polling (5.11 KB, patch)
2018-01-05 11:45 UTC, Luca Bonissi
dmesg with drm debug (315.63 KB, application/octet-stream)
2019-04-06 12:27 UTC, Luca Bonissi

Description Luca Bonissi 2017-12-28 10:03:43 UTC
Created attachment 136422 [details]
dmesg

Starting from kernel 4.7 (I tried 4.7.4, 4.9.40, 4.11.12, 4.13.0-rc4, 4.14.0-rc6), the whole system freezes at least once per day (usually during the night, when the system is idle). Only once did I manage to capture the "GPU HANG" message in dmesg.

The last known good kernel is 4.6.7.

The hang does not occur if I disable the i915 kernel module.

Xorg version is 1.18.3.
Comment 1 Luca Bonissi 2017-12-28 10:04:36 UTC
Created attachment 136423 [details]
Crash dump
Comment 2 Elizabeth 2017-12-28 15:13:02 UTC
Hello Luca, could you bisect the issue?
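
Bisecting here means binary-searching the history between the last known good kernel and the first known bad one. The following is only a minimal sketch of that idea, not of any tool's implementation: `git bisect` automates the same search over real commits, and the function name, the version labels, and the `is_bad` predicate below are all hypothetical. It assumes the first entry is good, the last is bad, and everything after the first bad entry is also bad.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first entry of `versions` for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    every entry after the first bad one is bad as well.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return versions[hi]


# Hypothetical history c0..c9 in which the regression lands in c6.
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda v: int(v[1:]) >= 6))  # prints c6
```

Each step halves the remaining range, so even a few thousand commits need only about a dozen test boots. With a hang that shows up roughly once a day, each "good" verdict needs a long enough soak time to be trusted.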
Comment 3 Luca Bonissi 2018-01-05 11:45:45 UTC
Created attachment 136567 [details] [review]
Test case - disabling hpd polling

Hi Elizabeth,
  the hang/bug appears from 4.7.3, when hpd polling was introduced.

I disabled the hpd polling in 4.7.3 and 4.14.11: since then there have been no more hangs.

On my platform (HP 250 G3 Notebook) disabling it does not seem to cause any problems; even pm-suspend works fine (but I did not try with an external monitor).

Is there any other test I could perform to better address the problem?

Thank you!
  Luca
Comment 4 Jani Saarinen 2018-03-29 07:11:35 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 5 Jani Saarinen 2018-04-25 10:56:17 UTC
Closing; please re-open if the issue still exists.
Comment 6 Luca Bonissi 2019-03-27 21:53:08 UTC
Bug still present in 5.0.4.
The latest attached patch for 4.14.11 (disabling HPD polling) still works without any changes.
Comment 7 Lakshmi 2019-03-28 08:12:31 UTC
(In reply to Luca Bonissi from comment #6)
> Bug still present in 5.0.4.
> Latest attached patch for 4.14.11 (disabling HPD polling) works without any
> update.

Can you please attach the latest dmesg (kernel 5.0.4) from boot with kernel parameters drm.debug=0x1e log_buf_len=4M?

Also, can you attach the GPU crash dump file?
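
For readers wondering what `drm.debug=0x1e` selects: the value is a bitmask of DRM debug categories. The sketch below decodes it, assuming the bit assignments from the kernel's `drm_print.h` around the 5.0 time frame; check the table against the source of the kernel actually in use. The helper name `decode_drm_debug` is made up for this example.

```python
from typing import List

# Bit values of the drm.debug module parameter as assumed from drm_print.h;
# verify against the running kernel's source before relying on them.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the names of the debug categories enabled in `mask`."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0x1E))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

So the requested value enables driver, modesetting, PRIME and atomic messages while leaving out the much noisier core and vblank categories. `log_buf_len=4M` enlarges the kernel log buffer so that the extra output is not overwritten before it can be saved.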
Comment 8 Luca Bonissi 2019-04-06 12:27:03 UTC
Created attachment 143885 [details]
dmesg with drm debug
Comment 9 Luca Bonissi 2019-04-06 12:28:47 UTC
(In reply to Lakshmi from comment #7)
> Can you please attach the latest dmesg (kernel 5.0.4) from boot with kernel
> parameters drm.debug=0x1e log_buf_len=4M?

Done: one just after boot, and the other one after some time (during which DPMS was active)
 
> Also, Can you attach the GPU crash dump file?

Sorry, but I got a GPU crash dump only once: the other times the system was totally frozen...
Comment 10 Lakshmi 2019-04-10 10:33:03 UTC
(In reply to Luca Bonissi from comment #8)
> Created attachment 143885 [details]
> dmesg with drm debug

From the attached logs, I don't see any GPU hang messages. The attached crash dump is from kernel 4.13.

So, does the system hang/freeze with the latest kernel?
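
Checking a large dmesg capture for hang messages can be scripted. This is a rough sketch that searches log text for lines shaped like the one in this bug's title. The regular expression is derived only from that single sample line, and the names given to the first two `ecode` numbers (`gen`, `engine`) are an assumption about their meaning, not something confirmed in this report.

```python
import re

# Pattern modelled on the one sample line from this bug's title.
# The group names "gen" and "engine" are assumptions about those fields.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.*?), action: (?P<action>\w+)"
)


def find_gpu_hangs(log_text):
    """Return one dict of parsed fields per GPU HANG line in log_text."""
    return [m.groupdict() for m in HANG_RE.finditer(log_text)]


sample = (
    "[drm] GPU HANG: ecode 7:0:0xab0f77f9, in Xorg [1348], "
    "reason: Hang on rcs0, action: reset"
)
for hang in find_gpu_hangs(sample):
    print(hang["process"], hang["pid"], hang["reason"])  # Xorg 1348 Hang on rcs0
```

An empty result for a log only means that the kernel did not get as far as reporting a hang. It does not rule out a freeze that took the whole system down before anything was logged.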
Comment 11 Luca Bonissi 2019-04-10 21:29:20 UTC
(In reply to Lakshmi from comment #10)
>
> From the attached logs, I don't see GPU hang messages. Attached crash dump
> is from kernel 4.13.
> 
> So, the system hangs/frozen with latest kernel?

Since kernel 4.7.3 the system hangs/freezes nearly once a day (it usually hangs during the night, when DPMS has switched off the monitor), so it also hangs with the latest kernel, 5.0.4.

Unfortunately, only once (with kernel 4.13) did the system not freeze completely, so that I could get the GPU crash dump.

Another system, with an Intel N2840 CPU and the same integrated GPU, also suffers from the same problem (I lost some audio recording data due to the bug...). When I disabled DPMS, no problems occurred.

Anyway, the problem has been narrowed down to the "HPD polling" routines introduced in kernel 4.7.3: if I remove these routines, the system works fine without any hangs (neither GPU hangs nor total freezes).
It looked like a mutex issue (an attempt to lock a mutex that is already locked...), but I am not a mutex/GPU expert...
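
To illustrate the general failure mode guessed at above (taking a non-recursive lock that the same execution path already holds), here is a small user-space sketch. It is not derived from the i915 source and does not claim that this is what the HPD polling code does; the function names are hypothetical. A timeout stands in for the hang so that the example terminates.

```python
import threading

lock = threading.Lock()  # non-reentrant, like a plain mutex


def poll_outputs():
    """Hypothetical worker that needs the lock to do its job."""
    # Without the timeout this call would block forever if the
    # current thread already holds the lock.
    if not lock.acquire(timeout=0.1):
        return "would deadlock"
    try:
        return "polled"
    finally:
        lock.release()


def caller_holding_lock():
    """Hypothetical caller that invokes the worker while holding the lock."""
    with lock:
        return poll_outputs()


print(poll_outputs())          # polled
print(caller_holding_lock())   # would deadlock
```

If something like this happens in a kernel path that the rest of the system depends on, the visible result can be a total freeze with no chance to write a log, which would fit the symptoms described but remains unconfirmed here.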
Comment 12 Lakshmi 2019-05-02 12:31:29 UTC
Can you verify with kernel 5.1 and attach logs if the hang occurs again?
Comment 13 Lakshmi 2019-05-29 09:52:37 UTC
Can you please try to reproduce the issue with drm-tip (https://cgit.freedesktop.org/drm-tip)?
If it persists on drm-tip, please upload the dmesg and crash dump file.
Comment 14 Lakshmi 2019-07-02 06:11:25 UTC
No feedback for more than a month; closing as RESOLVED WORKSFORME.
Please re-open this issue if it persists with the latest drm-tip (https://cgit.freedesktop.org/drm-tip) and send dmesg from boot with kernel parameters drm.debug=0x1e log_buf_len=4M.

Also attach the GPU crash dump file.
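
When the kernel does survive a hang, i915 normally exposes the crash dump through sysfs until it is cleared or the machine reboots, so it is worth copying it out immediately. A minimal sketch, assuming the dump is at `/sys/class/drm/card0/error` (the card index may differ per system) and that the script runs with enough privileges to read it; the function name and the output file name are made up for this example.

```python
from pathlib import Path
from typing import Optional


def save_error_state(
    src: str = "/sys/class/drm/card0/error",  # assumed location; card index may vary
    dst: str = "gpu-crash-dump.txt",          # hypothetical output file name
) -> Optional[Path]:
    """Copy the GPU error state to dst; return dst, or None if unavailable."""
    src_path = Path(src)
    if not src_path.is_file():
        return None
    try:
        data = src_path.read_bytes()
    except OSError:
        # Typically a permissions problem; reading usually requires root.
        return None
    if not data:
        return None
    dst_path = Path(dst)
    dst_path.write_bytes(data)
    return dst_path
```

Calling `save_error_state()` right after a "GPU HANG" line appears in dmesg gives a file that can be attached here. Note that the file may contain only a short placeholder message when no error state has been captured, so check its contents before attaching.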

