Created attachment 140080 [details]
dmesg and /sys/class/drm/card0/error
The UI on my GIGABYTE Q21B/Q21B is always broken. I found GPU hang errors in dmesg. The GPU is reset multiple times after the hang.
model name: GIGABYTE Q21B/Q21B
[ 35.849634] [drm] GPU HANG: ecode 8:1:0x21204b77, in X , reason: Hang on bcs0, action: reset
[ 35.849639] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 35.849640] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 35.849642] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 35.849643] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 35.849645] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 35.849663] i915 0000:00:02.0: Resetting bcs0 after gpu hang
[ 42.848209] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 51.840245] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 61.824125] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 70.848282] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 79.840180] i915 0000:00:02.0: Resetting rcs0 after gpu hang
blt command stream:
HEAD: 0x00000068 [0x00000000]
head = 0x00000068, wraps = 0
TAIL: 0x00000130 [0x00000078, 0x00000098]
ACTHD: 0x00000000 0008801c
at ring: 0x00000000
batch: [0x00000000_00000000, 0x00000000_00001000]
FADDR: 0x00000000 00088200
RC PSMI: 0x00000010
Invalid PTE Fault
Source ID 25
hangcheck stall: yes
hangcheck action: dead
hangcheck action timestamp: 4294899752, 4725752 ms ago
engine reset count: 0
ELSP: pid 437, ban score 0, seqno 2:00000005, prio 1024, emitted 4726568ms ago, head 000000e8, tail 00000130
Active context: X user_handle 0 hw_id 2, prio 0, ban score 0 guilty 0 active 0
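For anyone triaging similar logs, the hang and reset lines in the dmesg excerpt above can be tallied per engine with a small script. This is only a rough sketch: the regular expressions are written against the exact line shapes quoted in this report, and the function name summarize_hangs is made up for illustration, not part of any i915 tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ 42.848209] i915 0000:00:02.0: Resetting rcs0 after gpu hang
RESET_RE = re.compile(r"Resetting (\w+) after gpu hang")

# Matches the "[drm] GPU HANG: ecode ..., reason: ..., action: ..." line.
HANG_RE = re.compile(r"GPU HANG: ecode (\S+), .*reason: (.+), action: (\w+)")


def summarize_hangs(dmesg_text):
    """Return (list of hang records, dict of reset counts per engine).

    Hypothetical helper: it assumes the message formats shown in this
    report, which may differ between kernel versions.
    """
    hangs = []
    resets = Counter()
    for line in dmesg_text.splitlines():
        hang = HANG_RE.search(line)
        if hang:
            hangs.append({
                "ecode": hang.group(1),
                "reason": hang.group(2),
                "action": hang.group(3),
            })
        reset = RESET_RE.search(line)
        if reset:
            resets[reset.group(1)] += 1
    return hangs, dict(resets)
```

Fed the output of dmesg, this makes it easy to see that the first hang is reported on bcs0 while the repeated resets afterwards are on rcs0.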
Another early batch (first user), a garbage IPEHR, a page fault. Houston, we have a problem.
Please build a kernel from https://cgit.freedesktop.org/drm-tip and test. It will be important later on for testing patches, anyway.
See also #106828
Created attachment 140086 [details]
dmesg, /sys/class/drm/card0/error, and build config
Thanks for your quick reply.
I built kernel 4.17.0-rc7+ from https://cgit.freedesktop.org/drm-tip.
Unfortunately, the issue still persists.
I have also uploaded an attachment containing dmesg, the error dump, and my build config. I hope it helps to resolve this issue.
Hmm. Can you do a quick run with CONFIG_INTEL_IOMMU disabled (./scripts/config -d CONFIG_INTEL_IOMMU)?
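Before rebooting into the rebuilt kernel, it may be worth confirming that the option really ended up disabled in the generated .config. The following is a minimal sketch that relies only on the standard kernel .config syntax (OPTION=value, or "# OPTION is not set"); the helper name config_option_state is hypothetical.

```python
def config_option_state(config_text, option):
    """Return the value of a kernel config option, or None if it is
    disabled or absent.

    Hypothetical helper; it only understands the plain .config syntax.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return None
        # Require the "=" so that CONFIG_INTEL_IOMMU does not match
        # longer names such as CONFIG_INTEL_IOMMU_DEFAULT_ON.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None
```

For example, config_option_state(open(".config").read(), "CONFIG_INTEL_IOMMU") should return None once the option has been turned off.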
Oh, and just in case I forget later, this is no longer hanging on the first batch.
(In reply to Chris Wilson from comment #6)
> Oh, and just in case I forget later, this is no longer hanging on the first
> batch.
Yes it is. Just rcs this time, and not a garbage IPEHR.
The issue still occurs with CONFIG_INTEL_IOMMU disabled.
Can you re-attach the dmesg log after adding the following kernel parameters:
Also, does this happen every time?
Created attachment 140130 [details]
Debug log with DRM logging enabled.
I've uploaded the log captured with DRM debug logging enabled.
(In reply to James Ausmus from comment #10)
> Also, does this happen every time?
Yes, the issue happens every time startx runs.
Created attachment 140131 [details]
Thanks for the additional details and logs!
Do you have any updates on it?
Francesco, any updates on this issue?
(In reply to circle_chen from comment #15)
> Do you have any updates on it?
Can you try this again, but this time stick to the modesetting driver (e.g. remove xorg-x11-drv-intel (Fedora) or xserver-xorg-video-intel (Debian))? Please attach the Xorg logs as well.
Created attachment 142274 [details]
Xorg.0.log with the Intel driver disabled.
Added the log with the Intel driver disabled. (The screen hangs on a blank cursor.)
Hmm, it looks like our chances of narrowing down the problem are getting dimmer now that the ddx is apparently innocent.
At this point I'd just like to verify that this is not a hw problem of some sort in your system. Can you drop your run-level down to the command line and run IGT testdisplay as root?
Also, you reported the hang on a blank cursor? You can try running tests/kms_cursor_crc as well.
If one or both tests trigger a hang, please attach /sys/class/drm/card0/error.
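Since the error state is easy to lose across a reboot, it can be copied out right after a test run. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that i915 reports "No error state collected" in that file when no hang has been captured.

```python
from pathlib import Path

# Assumption: text reported by i915 when no hang has been captured.
NO_ERROR_MARKER = "No error state collected"


def is_real_error_state(text):
    """True if the text looks like an actual crash dump (hypothetical check)."""
    stripped = text.strip()
    return bool(stripped) and not stripped.startswith(NO_ERROR_MARKER)


def save_error_state(src="/sys/class/drm/card0/error", dest="i915-error.txt"):
    """Copy the error state to dest; return dest, or None if nothing to save.

    Needs root, like the tests themselves.
    """
    text = Path(src).read_text(errors="replace")
    if not is_real_error_state(text):
        return None
    Path(dest).write_text(text)
    return dest
```

Calling save_error_state() once after testdisplay and once after kms_cursor_crc, with different dest names, keeps the dumps from the two tests apart.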
circle_chen, were you able to try Abdiel's test?
Created attachment 142591 [details]
Yes, sorry for the late reply.
I will do the test next week.
<firstname.lastname@example.org> wrote on Friday, 23 November 2018 at 7:21 PM:
> Comment #20 <https://bugs.freedesktop.org/show_bug.cgi?id=106858#c20> on
> bug 106858 <https://bugs.freedesktop.org/show_bug.cgi?id=106858> from
> Francesco Balestrieri <email@example.com>
> circle_chen, were you able to try Abdiel's test?
I downloaded xorg-intel-gpu-tools-intel-gpu-tools-1.19 and compiled it successfully.
I then booted the kernel, installed the tools and libraries to the file system, and ran ./testdisplay.
However, I got "syntax error: unexpected end of file".
We are not sure what is wrong with our environment.
Could you provide a boot image so that we can test more easily?
If you didn't manage to compile IGT, please use your distro's packaged binaries.
Circle Chen, any updates here? Were you able to run the tests as in Comment 19? Feedback is needed to proceed further with this bug.
No feedback for more than 2 months, closing this bug as WORKSFORME.
If you experience the same problem with drm-tip, please attach the dmesg log from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M.
Remember to attach the error file and the Xorg log as well.
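A quick way to make sure the requested parameters were actually picked up before capturing the log is to compare them against the running kernel's command line. This is a small sketch, assuming the command line is read from /proc/cmdline; the function name missing_debug_params is hypothetical.

```python
def missing_debug_params(cmdline, wanted=("drm.debug=0x1e", "log_buf_len=4M")):
    """Return the requested kernel parameters that are absent from cmdline.

    Hypothetical helper: it does an exact token match, so a differently
    spelled value (e.g. drm.debug=30) counts as missing.
    """
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]
```

Typical use is missing_debug_params(open("/proc/cmdline").read()); an empty list means both parameters are set and the dmesg log is worth attaching.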