Since upgrading from 5.1 to 5.2, the cursor sprite set via (non-atomic) KMS is sometimes not properly shown on screen when running mutter/GNOME Shell on top of KMS. It happens somewhat randomly and fairly seldom (I have personally seen it only once), with no clear way to reproduce it, but often enough that we get regular bug reports. It feels like a race condition somewhere.

E.g.
https://bugzilla.redhat.com/show_bug.cgi?id=1738614 (contains drm.debug log)
https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165

In all cases, downgrading to 5.1 makes the issue go away. If it's not a kernel bug/regression, any hints on what could cause it?
(In reply to Jonas Ådahl from comment #0)
> Since upgrading from 5.1 to 5.2, the cursor sprite set via (non-atomic) KMS
> is not properly shown on screen sometimes when running mutter/GNOME Shell on
> top of KMS. It happens somewhat randomly and fairly seldom (personally only
> once), with no clear way of how to reproduce, but often enough to get
> regular bug reports. It somewhat feels like a race condition somewhere.
>
> E.g.
> https://bugzilla.redhat.com/show_bug.cgi?id=1738614 (contains drm.debug log)
> https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165
>
> In all cases, downgrading to 5.1 makes the issue go away. If it's not a
> kernel bug/regression, any hints on what could cause it?

Can you please verify the issue with drm-tip (https://cgit.freedesktop.org/drm-tip)? Full logs (from 0 sec) from drm-tip will be helpful for the investigation. By the way, the attached logs are not from boot.

Which platform is this?
cat /sys/kernel/debug/dri/0/i915_display_info
intel_reg read --count 12 0x70080 0x71080 0x72080

Running these when the cursor has vanished should at least tell us whether the kernel thinks the cursor should be enabled, and whether it's actually enabled in hardware.
(In reply to Ville Syrjala from comment #2)
> cat /sys/kernel/debug/dri/0/i915_display_info
> intel_reg read --count 12 0x70080 0x71080 0x72080
>
> when the cursor has vanished should at least tell us whether the kernel
> thinks the cursor should be enabled, and whether it's actually enabled in
> hardware.

I just hit the issue again. In i915_display_info the cursor is reported as visible, but it's not showing on screen:

CRTC 47: pipe: A, active=yes, (size=1920x1080), dither=no, bpp=24
  fb: 118, pos: 0x0, size: 1920x1080
  encoder 106: type: DP-MST A, connectors:
    connector 117: type: DP-3, status: connected, mode:
    "1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x48 0x5
  cursor visible? yes, position (334, 6), size 256x256, addr 0x00880000
  num_scalers=2, scaler_users=0 scaler_id=-1, scalers[0]: use=no, mode=10000000, scalers[1]: use=no, mode=0
  --Plane id 30: type=PRI, crtc_pos= 0x 0, crtc_size=1920x1080, src_pos=0.0000x0.0000, src_size=1920.0000x1080.0000, format=XR24 little-endian (0x34325258), rotation=0 (0x00000001)
  --Plane id 37: type=OVL, crtc_pos= 0x 0, crtc_size= 0x 0, src_pos=0.0000x0.0000, src_size=0.0000x0.0000, format=N/A, rotation=0 (0x00000001)
  --Plane id 44: type=CUR, crtc_pos= 334x 6, crtc_size= 256x 256, src_pos=0.0000x0.0000, src_size=256.0000x256.0000, format=AR24 little-endian (0x34325241), rotation=0 (0x00000001)
  underrun reporting: cpu=yes pch=yes

When the cursor is showing, the cursor part of the above text (including the plane with id 44) is identical apart from the 'addr'.

The intel_reg command first printed an error:

Error: /usr/share/igt-gpu-tools/registers/gen8_interrupt.txt:1: ('GEN8_MASTER_IRQ', '0x00044200', '')
Error: /usr/share/igt-gpu-tools/registers/skylake:1: gen8_interrupt.txt
Error: /usr/share/igt-gpu-tools/registers/kabylake:2: skylake
Warning: reading '/usr/share/igt-gpu-tools/registers/kabylake' failed. Using builtin register spec.

but then printed some registers. I'm attaching the output for both the visible and the invisible case.
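For comparing the visible and invisible dumps mechanically, the "cursor visible?" line can be parsed. This is a minimal sketch that assumes the line has exactly the layout shown in the dump above; `struct cursor_state` and `parse_cursor_line` are hypothetical names and not part of igt-gpu-tools or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of the "cursor visible?" line. */
struct cursor_state {
	bool visible;
	int x, y;            /* position */
	int w, h;            /* size */
	unsigned long addr;  /* address as printed by i915_display_info */
};

/*
 * Parse one line of i915_display_info output. Returns true only if the
 * line contains a complete "cursor visible?" record; on failure the
 * contents of *out are unspecified.
 */
bool parse_cursor_line(const char *line, struct cursor_state *out)
{
	char vis[8];
	const char *p;

	assert(line != NULL && out != NULL);

	p = strstr(line, "cursor visible?");
	if (!p)
		return false;

	/* %lx accepts the optional "0x" prefix, as strtoul() does in base 16. */
	if (sscanf(p, "cursor visible? %7[a-z], position (%d, %d), size %dx%d, addr %lx",
		   vis, &out->x, &out->y, &out->w, &out->h, &out->addr) != 6)
		return false;

	out->visible = strcmp(vis, "yes") == 0;
	return true;
}
```

Feeding each line of the two attachments through this and comparing the resulting structs should show that only `addr` differs, which matches the observation above.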
Attaching dump --all too for good measure.
Created attachment 145243 [details] i915_display_info: cursor invisible
Created attachment 145244 [details] i915_display_info: cursor visible
Created attachment 145245 [details] intel_reg dump --all: invisible
Created attachment 145246 [details] intel_reg dump --all: visible
Created attachment 145247 [details] intel_reg read 12: invisible
Created attachment 145248 [details] intel_reg read 12: visible
Hmm. Yeah, looks like both the kernel and hw think the cursor should be enabled.

One theory might be that the alpha channel is all zeroes. We would need to dump the relevant chunk of memory to confirm. I have a tool to do just that, and I just pimped it to handle cursors:

git://github.com/vsyrjala/intel-gpu-tools.git gtt_dump_2

intel_gtt_dump -f cursor.png -C a # dumps cursor image for pipe A

The downside of my tool is that it requires a kernel patch on PAT machines, because the kernel is silly and won't allow you to mmap RAM via /dev/mem:

git://github.com/vsyrjala/linux.git pat_vs_dev_mem
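The all-zero-alpha theory can be checked on any raw dump of the cursor image. The following is a minimal sketch, assuming the buffer is in the AR24 (ARGB8888, little-endian) format reported for the cursor plane earlier in this thread; `cursor_alpha_is_zero` and its `stride` parameter are hypothetical and not part of intel_gtt_dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if every pixel of an ARGB8888 image
 * has alpha == 0, i.e. the cursor would be fully transparent on screen.
 * 'stride' is the row pitch in bytes and may be larger than width * 4.
 */
bool cursor_alpha_is_zero(const uint8_t *pixels, uint32_t width,
			  uint32_t height, size_t stride)
{
	assert(pixels != NULL);
	assert(stride >= (size_t)width * 4);

	for (uint32_t y = 0; y < height; y++) {
		const uint8_t *row = pixels + (size_t)y * stride;

		for (uint32_t x = 0; x < width; x++) {
			/* Little-endian ARGB8888 is stored B, G, R, A: alpha is byte 3. */
			if (row[(size_t)x * 4 + 3] != 0)
				return false;
		}
	}
	return true;
}
```

For a 256x256 cursor this would typically be called as `cursor_alpha_is_zero(buf, 256, 256, stride)`, with `stride` taken from wherever the buffer was dumped.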
Jani, here are the affected platforms: KBL, BDW, HSW, CFL.

From the original issue:
KBL: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_574025
BDW: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_576056
HSW: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_579478
KBL: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_584631
CFL: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_586072
HSW: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_586161
KBL: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_586442 (multiple of those)
BDW: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_587079
HSW: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_587669
(and more…)
IVB as well (ThinkPad x230 w/ i7-3520M)
If v5.1 is good and v5.2 is bad, a bisect may give a clearer hint.
Created attachment 145329 [details] dump-gbm-bo.c

Using the attached function (compile, break the compositor process using gdb, then run "print dlopen("/path/to/compiled/dump-gbm-bo.so", 2)"), I attempted to look at the contents of the cursor buffer before it was passed to drmModeSetCursor2().

When the issue reproduces, the buffer seems to contain only 0s, while the rest is correct (e.g. the size). When it does not reproduce, the dump at the equivalent point in time shows correct content. I haven't verified that a dump of the same gbm_bo is correct immediately after writing the pixels.

In the cursor renderer code in mutter, we only ever write to a gbm_bo immediately after its construction, and we never map the memory after that. Since this only reproduces with some kernel versions, I doubt that we are uploading empty pixels, but I will add some code that dumps after construction too, to be sure.

What could cause the memory of a gbm_bo to be cleared after its construction, but before its destruction?
I wonder if this might be related to a VT switch somehow. It seems to me every time I had that issue with the cursor disappearing, I had switched to another (console) VT shortly before... It's not a 100% reproducer, but I suspect this might be a factor to trigger the issue.
Not sure about VT switch, but I'm seeing this often in gnome when switching between virtual desktops. Pointer is drawn on first but not on second. There is also something with active windows, since pointer may be drawn on active console app window but not outside of it. And sometimes no pointer visible on lock/login screen... bisecting kernel now.
Thanks Mikko.
I finally found a way to reproduce this issue easily. It appears that rapidly increasing CPU/memory usage triggers it.

Steps to reproduce:
- Start a GNOME session
- Open Firefox
- Hold Ctrl-t to open tabs rapidly
- Once system memory reaches about 80%, use the hot corner to switch to the Activities view in GNOME Shell
- Move the cursor in the Activities overview
- The cursor should disappear

My current kernel version is:
```
$ uname -a
Linux galago 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6 14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```

I haven't done any debugging to figure out why this is happening, but hopefully this information is useful!
Thanks to being able to reproduce at will, I did some more digging.

1. Take note of which gbm_bo a successful drmModeSetCursor2() call used when showing a cursor on the GNOME Shell panel.
2. Move the cursor to a maximized Firefox window just below the top panel (this caused the drmModeSetCursor2() calls to change to the cursor from a wl_buffer).
3. Hold down ^T for a while to open a bunch of tabs.
4. Move the cursor back up to the top panel. This triggered a call to drmModeSetCursor2().

What I could observe is that the same gbm_bo was used in (4) as in (1), but it had become empty. There were no gbm_bo cursor allocations done in between, nor any drmModeSetCursor2() calls.
Can you please try https://patchwork.freedesktop.org/series/67000/
I am reasonably confident this should be resolved by

commit 5028851cdfdf78dc22eacbc44a0ab0b3f599ee4a (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 20 13:18:21 2019 +0100

    drm/i915: Mark contents as dirty on a write fault

    Since dropping the set-to-gtt-domain in commit a679f58d0510 ("drm/i915:
    Flush pages on acquisition"), we no longer mark the contents as dirty on
    a write fault. This has the issue of us then not marking the pages as
    dirty on releasing the buffer, which means the contents are not written
    out to the swap device (should we ever pick that buffer as a victim).
    Notably, this is visible in the dumb buffer interface used for cursors.
    Having updated the cursor contents via mmap, and swapped away, if the
    shrinker should evict the old cursor, upon next reuse, the cursor would
    be invisible.

    E.g. echo 80 > /proc/sys/kernel/sysrq ; echo f > /proc/sysrq-trigger

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111541
    Fixes: a679f58d0510 ("drm/i915: Flush pages on acquisition")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: <stable@vger.kernel.org> # v5.2+
    Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190920121821.7223-1-chris@chris-wilson.co.uk
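The failure mode described in the commit message can be illustrated with a toy model. This is a sketch only and not i915 code: `toy_bo`, `toy_write`, `toy_evict` and `toy_acquire` are hypothetical names, and the "swap device" is just a second in-memory array. It shows why a buffer that was written without its dirty flag being set comes back zeroed once it has been evicted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BO_SIZE 16

/* Toy stand-in for a buffer object; none of these names exist in i915. */
struct toy_bo {
	uint8_t pages[TOY_BO_SIZE]; /* resident backing pages */
	uint8_t swap[TOY_BO_SIZE];  /* contents last written out to "swap" */
	bool resident;
	bool dirty;
};

void toy_init(struct toy_bo *bo)
{
	memset(bo, 0, sizeof(*bo));
	bo->resident = true;
}

/*
 * Write through a mapping. 'mark_dirty' models whether the write-fault
 * path records that the contents changed.
 */
void toy_write(struct toy_bo *bo, uint8_t value, bool mark_dirty)
{
	assert(bo->resident);
	memset(bo->pages, value, sizeof(bo->pages));
	if (mark_dirty)
		bo->dirty = true;
}

/* Eviction: only dirty contents are written back before the pages go away. */
void toy_evict(struct toy_bo *bo)
{
	if (bo->dirty)
		memcpy(bo->swap, bo->pages, sizeof(bo->pages));
	memset(bo->pages, 0, sizeof(bo->pages));
	bo->resident = false;
	bo->dirty = false;
}

/* Reacquire the pages: contents are whatever was last written to swap. */
void toy_acquire(struct toy_bo *bo)
{
	memcpy(bo->pages, bo->swap, sizeof(bo->pages));
	bo->resident = true;
}
```

In this model, calling `toy_write(&bo, 0xff, false)` followed by `toy_evict()` and `toy_acquire()` leaves `bo.pages` all zero, which corresponds to the invisible cursor; with `mark_dirty` set to true the contents survive the round trip.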
Let's resolve this. Reporter and commenters, please verify on your side, and re-open if the issue is not fixed.
I've tried 5.2.17 with the mentioned patch applied, and cannot reproduce anymore.
I got stuck in bisecting when the bug no longer reproduced on laptop without the docking station. Now with 10 days uptime and back to docking station use, the bug reproduces again. Will try the patch.
The patch doesn't seem to apply to the 5.2 stable tree, so I tried to backport it like this. I hope this is correct.

--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1908,7 +1908,11 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		list_add(&obj->userfault_link, &dev_priv->mm.userfault_list);
 	GEM_BUG_ON(!obj->userfault_count);
 
-	i915_vma_set_ggtt_write(vma);
+	if (write) {
+		GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
+		i915_vma_set_ggtt_write(vma);
+		obj->mm.dirty = true;
+	}
 
 err_fence:
 	i915_vma_unpin_fence(vma);
With the patch above applied on top of v5.2.17 kernel the issue does not seem to reproduce anymore. The patch is not yet in 5.2.20 or 5.3.5 stable trees but would be nice if you could submit there once the fix lands in Linus's tree.
(In reply to Mikko Rapeli from comment #26)
> With the patch above applied on top of v5.2.17 kernel the issue does not
> seem to reproduce anymore.
>
> The patch is not yet in 5.2.20 or 5.3.5 stable trees but would be nice if
> you could submit there once the fix lands in Linus's tree.

At the moment the fix is in drm-tip. I cannot say when it will land in Linus's tree; you will have to check regularly.