Created attachment 142794: /sys/class/drm/card0/error

After running a Chrome-based browser with GLES2-based acceleration for several weeks, with lots of tabs open, on a machine with a small amount of memory, the screen flickered and a GPU hang appeared in dmesg. From the user's point of view the problem is minor, since the GPU recovered fine and even the browser is still working. I am filing this mainly to share the error state, in case a similar problem causes non-recoverable hangs for other users.

GPU HANG: ecode 3:0:0x7ee06741, in Chrome_InProcGp [7079], reason: hang on rcs0, action: reset

System environment:
-- chipset: i945g (82945G/GZ Integrated Graphics Controller 8086:2772)
-- system architecture: 64-bit
-- xf86-video-intel: 2:2.99.917+git20180925-2
-- xserver: 2:1.20.3-1
-- mesa: 18.2.5-3
-- libdrm: 2.4.95-1
-- kernel: 4.19.0-rc7-amd64
-- Linux distribution: Debian testing
-- Mobo model: DMI: Gigabyte Technology Co., Ltd. GC330UD, BIOS F2 03/17/2009
-- Display connector: D-sub

The browser, Opera 59.0.3154.0, was started with many non-default arguments, including --use-gl=egl, which enables the GLES2 backend (this is what makes it usably fast on i945):

opera-developer --no-zygote --use-gl=egl --enable-zero-copy --enable-native-gpu-memory-buffers --disable-gpu-sandbox --in-process-gpu --ui-disable-partial-swap --ui-enable-zero-copy --disable-gpu-driver-bug-workarounds --enable-features=CheckerImaging,UseSkiaRenderer,SkiaDeferredDisplayList --enable-media-suspend --enable-background-timer-throttling --enable-prefer-compositing-to-lcd-text --no-sandbox --no-pings --use-skia-deferred-display-list --use-skia-renderer --renderer-process-limit=1 --limit-fps=15

This bug is on the same physical machine as https://bugs.freedesktop.org/show_bug.cgi?id=92732. That bug was reported 3 years ago, with an odd mix of 32-bit userspace on a 64-bit kernel, and that setup crashed every 2-3 weeks. Now everything is 64-bit and MUCH more stable: this is the first hang in 3 months.
So, despite this bug, from the user's point of view i945g with GLES2 is still in good shape with the current kernel and Mesa.
Created attachment 142795: full dmesg with default debug level
Although this bug is similar to the closed bug about gen3 hangs, https://bugs.freedesktop.org/show_bug.cgi?id=90841, it doesn't seem to be a duplicate: that fix landed in 2016, and the current bug was observed on much newer software.
Hi, this might be wishful thinking, but our drm-tip is at 4.20.0-rc5 today. Would you mind trying to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) with the kernel parameters drm.debug=0x1e log_buf_len=4M, and, if the problem persists, attaching the full dmesg from boot?
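For readers unfamiliar with the requested parameter: drm.debug is a bitmask of DRM debug-message categories. The sketch below is a hypothetical helper (decode_drm_debug is not a kernel or library API) that assumes the DRM_UT_* bit values from the kernel's include/drm/drm_print.h as they stood around 4.20; later kernels define additional bits. Under that assumption, 0x1e enables the DRIVER, KMS, PRIME and ATOMIC categories, which fits the [drm:intel_power_well_enable] and [drm:drm_atomic_state_init] lines in the later logs.

```python
# Assumed DRM_UT_* category bits from include/drm/drm_print.h (circa 4.20).
# Newer kernels add more bits (e.g. for DP), which are not listed here.
DRM_UT_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """Return the category names enabled by a drm.debug bitmask, lowest bit first."""
    return [name for bit, name in sorted(DRM_UT_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    # 0x1e = 0x02 | 0x04 | 0x08 | 0x10
    print(decode_drm_debug(0x1e))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```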
Created attachment 143111: Compressed logs of crash on 4.20 with drm.debug=0x1e

I've installed Ubuntu's build of drm-tip from https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/2018-12-13/, version:
4.20.0-994-generic #201812122102 SMP Thu Dec 13 02:04:59 UTC 2018 x86_64 GNU/Linux
compiled from cod/tip/drm-tip/2018-12-13 (1f86f1fb70f082ed93450c328e518d8013d23953 - 2018y-12m-13d-01h-20m-07s UTC integration manifest), and booted it with drm.debug=0x1e log_buf_len=4M.

After 21.5 days of uptime, a similar GPU hang reproduced:

[1851925.030801] vgalkin-desktop kernel: [drm:intel_power_well_enable [i915]] enabling always-on
[1851929.100893] vgalkin-desktop kernel: [drm:intel_power_well_disable [i915]] disabling always-on
[1851949.683477] vgalkin-desktop kernel: [drm:intel_power_well_enable [i915]] enabling always-on
[1851953.132869] vgalkin-desktop kernel: [drm:intel_power_well_disable [i915]] disabling always-on
[1851964.572983] vgalkin-desktop kernel: [drm:intel_power_well_enable [i915]] enabling always-on
[1851973.028999] vgalkin-desktop kernel: [drm] GPU HANG: ecode 3:0:0x407bcfc5, in Xorg [910], reason: hang on rcs0, action: reset
[1851973.029007] vgalkin-desktop kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[1851973.029010] vgalkin-desktop kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[1851973.029013] vgalkin-desktop kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[1851973.029015] vgalkin-desktop kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[1851973.029018] vgalkin-desktop kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
[1851973.029262] vgalkin-desktop kernel: [drm:i915_reset_device [i915]] resetting chip
[1851973.029384] vgalkin-desktop kernel: [drm:drm_atomic_state_init [drm]] Allocated atomic state 000000004846a8c9

Nobody was using the computer at that moment, so the visual effect at the time of the hang is unknown (I think the monitor was in standby when the hang occurred). When I looked at it several hours later, everything looked and worked fine.

The 21-day log with drm.debug=0x1e is too large, so I'm attaching several compressed logs in a tar.xz:
-- boot time: hang-on-4.20-boot-logs.txt
-- several hours before the hang, plus the hang itself: hang-on-4.20-hang-logs.txt
-- error state: hang-on-4.20-drm-card0-error.txt
Created attachment 143112: output of journalctl -o short-monotonic | grep -C 30 i915_reset_device

After the first hang, which generated the error state, the "[drm:i915_reset_device [i915]] resetting chip" message has been appearing roughly once a day; see the attached log. (I haven't rebooted yet.)

Twice it immediately follows the message "[1855512.225483] vgalkin-desktop barriers[24178]: Barrier 2.2.0-Release: [2019-01-11T03:36:07] NOTE: client "LURAT-PC" is dead"

Barrier is keyboard/mouse-sharing software, and I think that after printing this message it shows the previously invisible mouse cursor and may wake the monitor from sleep.
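To check the "roughly once a day" observation without reading the grep output by hand, the reset messages can be bucketed by day of uptime. This is a hypothetical sketch (the script name count_resets.py and the function resets_per_uptime_day are made up for illustration), assuming the "[seconds.microseconds] host kernel: ..." line format shown in the logs above and that each reset prints "resetting chip". Monotonic timestamps restart at every boot, so the input should come from a single boot.

```python
import re
import sys
from collections import Counter

# Monotonic timestamp at the start of a `journalctl -o short-monotonic` line,
# e.g. "[1851973.029262] vgalkin-desktop kernel: ..." (leading padding allowed).
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
SECONDS_PER_DAY = 86400


def resets_per_uptime_day(lines):
    """Count i915 chip-reset messages per whole day of uptime since boot."""
    counts = Counter()
    for line in lines:
        # Keep only the reset message itself, not the surrounding grep context.
        if "i915_reset_device" not in line or "resetting chip" not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match:
            uptime_seconds = float(match.group(1))
            counts[int(uptime_seconds // SECONDS_PER_DAY)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_resets.py journal.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for day, count in resets_per_uptime_day(log).items():
            print(f"uptime day {day}: {count} reset(s)")
```

For example, the journal of the current boot could be saved with journalctl -b -o short-monotonic > journal.txt and then passed to the script; the reset at [1851973.029262] above would be counted under uptime day 21.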
(In reply to Vasily Galkin from comment #5)
> Created attachment 143112
> output of journalctl -o short-monotonic | grep -C 30 i915_reset_device
>
> After first hang with generating the error state, the
> [drm:i915_reset_device [i915]] resetting chip message is appearing nearly 1
> time a day - see attached log. (I didn't rebooted yet).
>
> Two times it is immediately following the message "[1855512.225483]
> vgalkin-desktop barriers[24178]: Barrier 2.2.0-Release:
> [2019-01-11T03:36:07] NOTE: client "LURAT-PC" is dead"
>
> Barrier is keyboard-mouse-switcher-like software and I think that after
> outputting this message it shows previously invisible mouse cursor and maybe
> awaking monitor from sleep.

Can you attach the GPU crash dump (/sys/class/drm/card0/error)? Have you tried the latest drm-tip (https://cgit.freedesktop.org/drm-tip)?
Created attachment 143322: /sys/class/drm/card0/error on 4.20

I forgot to mention that /sys/class/drm/card0/error was already included in attachment 143111 (Compressed logs of crash on 4.20 with drm.debug=0x1e). For simplicity I'm now attaching it as a separate file (and marking obsolete all attachments from earlier reproductions).

About testing on drm-tip: the problem typically reproduces only after several weeks of uptime, so my testing is always roughly one kernel release behind; for example, my last test used drm-tip/2018-12-13.

Since each test takes so long, it may be worth planning which version to test next: it may be more useful to wait 1-3 weeks and test a build with substantial changes than to restart testing with nearly the same code. Does drm-tip since 2018-12-13 include any changes that may affect this bug, or is it better to wait for a future pull?
(In reply to Vasily Galkin from comment #7)
> Created attachment 143322
> /sys/class/drm/card0/error on 4.20
>
> Forgot to mention that /sys/class/drm/card0/error was already included in
> attachment 143111: Compressed logs of crash on 4.20 with
> drm.debug=0x1e
>
> However now attaching it for simplicity as separate file (and obsoleting all
> attachments from non-last reproduction).

Thanks for attaching the error file. There are no clues in the attached error file. Is the scenario you described in the bug description the only way to reproduce the hang? If so, then this error file is also from that scenario. Can you attach Xorg.0.log?

>
> About testing on drm-tip:
> note that the problem typically reproduces only after several weeks of
> uptime - so it is always "nearly 1 kernel release late" - for example my
> last test was from drm-tip/2018-12-13
>
> Since testing time is so long - it may need some planning of what version
> start to test - it may be more useful to wait 1-2-3 weeks and test some "new
> release full of changes" than restarting testing with "nearly same code".
>
> Does current drm-tip since 2018-12-13 include any changes that may affect
> this bug or it's better to wait for some future pull?

Many changes go into drm-tip regularly, so we always recommend using the latest drm-tip; logs from it will help the investigation.
Unfortunately the Xorg log is already lost. I'll attach it if the bug reproduces again.

About the scenario: the first time (when the bug was initially reported with pre-4.20 kernels), I had opened many new tabs in the Chromium-based browser seconds before the problem, and the hang was in a request from the browser process. The second time (the current 4.20 attachments), nothing was being done at all: the "office" machine was sitting locked and unused overnight, and the hang was in a request from the Xorg process.
Reporter, any updates from drm-tip? I'll close this bug if the issue is not seen on drm-tip.
I haven't tested drm-tip yet, so closing for now. Initially I was afraid I would see the "flicker-or-*freeze* every 2 weeks" problem I saw in earlier years with the 32-bit setup, but it turns out that even when it reproduces (I've seen this bug 4 times now), it always just flickers and then continues working fine. I'll reopen and report if the issue reproduces with a newer drm-tip.
The problem reproduced with the 5.1-rc1-based drm-tip:
drm-tip: 2019y-03m-18d-22h-01m-31s UTC integration manifest
commit 7f60fa0e
Source: https://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack/commit/?id=7f60fa0e
I used the Ubuntu drm-tip binaries from https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/2019-03-19/

As with the previous hang, it occurred at night when the machine wasn't in use, but a browser window was left open, so some rendering may have occurred. Despite the GPU error, the system continues to work fine. This time I saved Xorg.0.log from the still-running server, but it doesn't contain anything at the time of the hang. However, the main error message was a bit different:

> i915 0000:00:02.0: GPU HANG: ecode 3:1:0x00000000, in Xorg [877], hang on rcs0

I'm going to attach it along with fresh logs.
Created attachment 144022: /sys/class/drm/card0/error on drm-tip 5.1-rc1
Created attachment 144023: Xorg.0.log on drm-tip 5.1-rc1
Created attachment 144024: Compressed logs of boot & crash with drm.debug=0x1e on drm-tip 5.1-rc1
The problem now reproduces in a more informative way (with a dump in the error file!). Since the history of this bug is full of not very informative error files lacking the dump, and has different error messages for older kernels, I've reported it as new bug #110628 and am closing this one.
Created attachment 144398: journalctl priority 7