Since upgrading from 5.1 to 5.2, the cursor sprite set via (non-atomic) KMS is sometimes not shown properly on screen when running mutter/GNOME Shell on top of KMS. It happens somewhat randomly and fairly seldom (personally only once), with no clear way to reproduce it, but often enough to get regular bug reports. It feels somewhat like a race condition somewhere.
https://bugzilla.redhat.com/show_bug.cgi?id=1738614 (contains drm.debug log)
In all cases, downgrading to 5.1 makes the issue go away. If it's not a kernel bug/regression, any hints on what could cause it?
(In reply to Jonas Ådahl from comment #0)
> Since upgrading from 5.1 to 5.2, the cursor sprite set via (non-atomic) KMS
> is not properly shown on screen sometimes when running mutter/GNOME Shell on
> top of KMS. It happens somewhat randomly and fairly seldom (personally only
> once), with no clear way of how to reproduce, but often enough to get
> regular bug reports. It somewhat feels like a race condition somewhere.
> https://bugzilla.redhat.com/show_bug.cgi?id=1738614 (contains drm.debug log)
> In all cases, downgrading to 5.1 makes the issue go away. If it's not a
> kernel bug/regression, any hints on what could cause it?
Can you please verify the issue with drm-tip (https://cgit.freedesktop.org/drm-tip)? Full logs (from 0 sec) from drm-tip will be helpful for investigation.
Btw attached logs are not from boot. Which platform is this?
cat /sys/kernel/debug/dri/0/i915_display_info
intel_reg read --count 12 0x70080 0x71080 0x72080
when the cursor has vanished should at least tell us whether the kernel thinks the cursor should be enabled, and whether it's actually enabled in hardware.
(In reply to Ville Syrjala from comment #2)
> cat /sys/kernel/debug/dri/0/i915_display_info
> intel_reg read --count 12 0x70080 0x71080 0x72080
> when the cursor has vanished should at least tell us whether the kernel
> thinks the cursor should be enabled, and whether it's actually enabled in
> hardware.
I just hit the issue again, and in i915_display_info, the cursor is reported as visible, but it's not showing on screen:
CRTC 47: pipe: A, active=yes, (size=1920x1080), dither=no, bpp=24
fb: 118, pos: 0x0, size: 1920x1080
encoder 106: type: DP-MST A, connectors:
connector 117: type: DP-3, status: connected, mode:
"1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x48 0x5
cursor visible? yes, position (334, 6), size 256x256, addr 0x00880000
num_scalers=2, scaler_users=0 scaler_id=-1, scalers: use=no, mode=10000000, scalers: use=no, mode=0
--Plane id 30: type=PRI, crtc_pos= 0x 0, crtc_size=1920x1080, src_pos=0.0000x0.0000, src_size=1920.0000x1080.0000, format=XR24 little-endian (0x34325258), rotation=0 (0x00000001)
--Plane id 37: type=OVL, crtc_pos= 0x 0, crtc_size= 0x 0, src_pos=0.0000x0.0000, src_size=0.0000x0.0000, format=N/A, rotation=0 (0x00000001)
--Plane id 44: type=CUR, crtc_pos= 334x 6, crtc_size= 256x 256, src_pos=0.0000x0.0000, src_size=256.0000x256.0000, format=AR24 little-endian (0x34325241), rotation=0 (0x00000001)
underrun reporting: cpu=yes pch=yes
When it's showing, the cursor part of the above text (including the plane with id 44) is identical, apart from the 'addr'.
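To compare the visible and invisible captures mechanically, the "cursor visible?" line can be parsed and diffed. This is only a minimal sketch: the regular expression is modelled on the single line quoted above, and the function names are made up for illustration, not part of any existing tool.

```python
import re

# Pattern modelled on the "cursor visible?" line in the dump quoted above.
CURSOR_RE = re.compile(
    r"cursor visible\?\s+(?P<visible>yes|no),\s+"
    r"position \((?P<x>-?\d+),\s*(?P<y>-?\d+)\),\s+"
    r"size (?P<w>\d+)x(?P<h>\d+),\s+"
    r"addr (?P<addr>0x[0-9a-fA-F]+)"
)


def parse_cursor_state(text):
    """Return one dict per 'cursor visible?' line found in text."""
    states = []
    for m in CURSOR_RE.finditer(text):
        states.append({
            "visible": m.group("visible") == "yes",
            "position": (int(m.group("x")), int(m.group("y"))),
            "size": (int(m.group("w")), int(m.group("h"))),
            "addr": int(m.group("addr"), 16),
        })
    return states


def differing_fields(a, b):
    """Names of the fields whose values differ between two cursor states."""
    return sorted(k for k in a if a[k] != b[k])
```

Feeding it the two i915_display_info captures and calling differing_fields() on the results should list only "addr" if the kernel state really is the same in both cases.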
The intel_reg command just printed an error:
Error: /usr/share/igt-gpu-tools/registers/gen8_interrupt.txt:1: ('GEN8_MASTER_IRQ', '0x00044200', '')
Error: /usr/share/igt-gpu-tools/registers/skylake:1: gen8_interrupt.txt
Error: /usr/share/igt-gpu-tools/registers/kabylake:2: skylake
Warning: reading '/usr/share/igt-gpu-tools/registers/kabylake' failed. Using builtin register spec.
but then printed some registers. Attaching the output for when the cursor is visible and when it is invisible. Attaching dump --all too for good measure.
Created attachment 145243 [details]
i915_display_info: cursor invisible
Created attachment 145244 [details]
i915_display_info: cursor visible
Created attachment 145245 [details]
intel_reg dump --all: invisible
Created attachment 145246 [details]
intel_reg dump --all: visible
Created attachment 145247 [details]
intel_reg read 12: invisible
Created attachment 145248 [details]
intel_reg read 12: visible
Hmm. Yeah, looks like both the kernel and hw think the cursor should be enabled.
One theory might be that the alpha channel is all zeroes. Would need to dump the relevant chunk of memory to confirm.
I have a tool to do just that, and I just pimped it to handle cursors.
intel_gtt_dump -f cursor.png -C a # dumps cursor image for pipe A
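Once the raw pixel bytes of the cursor buffer are available, the all-zero-alpha theory is cheap to check. A minimal sketch, assuming a tightly packed ARGB8888 buffer (AR24 little-endian, so each pixel is stored as B, G, R, A); the function name is hypothetical and getting the bytes out of the dump is not shown.

```python
def alpha_is_all_zero(pixels: bytes) -> bool:
    """True if every pixel in a little-endian ARGB8888 buffer has alpha 0."""
    if len(pixels) % 4 != 0:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    # Pixels are stored as B, G, R, A, so alpha is every 4th byte from offset 3.
    return not any(pixels[3::4])
```

A fully transparent buffer returns True even if the colour channels still hold data, which would distinguish "alpha cleared" from "whole buffer cleared".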
The downside of my tool is that it requires a kernel patch on PAT machines because the kernel is silly and won't allow you to mmap RAM via /dev/mem:
Jani, here are the affected platforms: KBL, BDW, HSW, CFL.
From the original issue:
KBL: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1165#note_586442 (multiple of those)
IVB as well (ThinkPad x230 w/ i7-3520M)
If v5.1 is good and v5.2 is bad, a bisect may give a clearer hint.
Created attachment 145329 [details]
Using the attached function (compile, break the compositor process using gdb, then run "print dlopen("/path/to/compiled/dump-gbm-bo.so", 2)") I attempted to look at the contents of the cursor buffer before it was passed to drmModeSetCursor2().
When reproducing, it seems that it contains only 0s, but the rest is correct (e.g. size). When it's not reproducing, the dump at the equivalent timing shows correct content. I haven't verified that a dump of the same gbm_bo is correct immediately after writing pixels.
In the cursor renderer code in mutter, we only ever write to a gbm_bo immediately after its construction, and we never map the memory after that. Since this only reproduces with some kernel versions, I doubt that we upload empty pixels, but I will add some code that dumps after construction too, to be sure. What could cause the memory of a gbm_bo to be cleared after its construction, but before its destruction?
I wonder if this might be related to a VT switch somehow.
It seems to me every time I had that issue with the cursor disappearing, I had switched to another (console) VT shortly before...
It's not a 100% reproducer, but I suspect this might be a factor to trigger the issue.
Not sure about VT switch, but I'm seeing this often in gnome when switching between virtual desktops. Pointer is drawn on first but not on second. There is also something with active windows, since pointer may be drawn on active console app window but not outside of it. And sometimes no pointer visible on lock/login screen... bisecting kernel now.
I finally found a way to reproduce this issue easily. It appears that increasing CPU/memory usage rapidly causes this issue to be triggered.
Steps to reproduce:
- Start a GNOME session
- Open Firefox
- Hold Ctrl-t to open tabs rapidly
- Once system memory reaches about 80%, use the hot corner to switch to Activities view in GNOME Shell
- Move cursor in Activities overview
- The cursor should disappear
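To know when the 80% mark in the steps above has been reached, memory use can be computed from /proc/meminfo. This is a rough sketch: treating "used" as MemTotal minus MemAvailable is an assumption, since the steps do not say which figure was watched, and the function name is made up for illustration.

```python
def memory_used_percent(meminfo_text: str) -> float:
    """Percentage of memory in use, computed from /proc/meminfo contents."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # Values are integers, most of them followed by a "kB" unit.
            fields[name.strip()] = int(parts[0])
    total = fields["MemTotal"]
    # Assumption: "used" means everything that is not MemAvailable.
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total
```

Passing it the contents of /proc/meminfo in a loop while holding Ctrl-t shows when to switch to the Activities view.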
My current kernel version is:
$ uname -a
Linux galago 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6 14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
I haven't done any debugging to figure out why this is happening, but hopefully this information was useful!
Thanks to being able to reproduce at will, I did some more digging.
1. Take note of which gbm_bo a successful drmModeSetCursor2() call used when showing a cursor on the GNOME Shell panel
2. Move cursor to a maximized Firefox window just below the top panel (this caused drmModeSetCursor2() calls to change to the cursor from a wl_buffer).
3. Hold down ^T for a while to open a bunch of tabs
4. Move the cursor back up to the top panel. This triggered a call to drmModeSetCursor2().
What I could observe is that the same gbm_bo was used in (4) as in (1), but it had become empty. There were no gbm_bo cursor allocations done in between, nor any drmModeSetCursor2() calls.
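The comparison between (1) and (4) can be made systematic by recording a digest of each cursor buffer when it is filled and checking it again right before it is handed to the kernel. The following only sketches the bookkeeping: BufferTracker and its keys are hypothetical, and reading the bytes back out of a gbm_bo is not shown.

```python
import hashlib


class BufferTracker:
    """Remember a digest per buffer and report how its contents changed."""

    def __init__(self):
        self._digests = {}

    def record(self, key, data: bytes):
        """Store the digest of a buffer right after its pixels were written."""
        self._digests[key] = hashlib.sha256(data).hexdigest()

    def check(self, key, data: bytes) -> str:
        """Return 'unknown', 'unchanged', 'zeroed' or 'modified'."""
        if key not in self._digests:
            return "unknown"
        if hashlib.sha256(data).hexdigest() == self._digests[key]:
            return "unchanged"
        if not any(data):
            return "zeroed"
        return "modified"
```

A "zeroed" result for a buffer that was recorded with real content would match the observation above: the contents were lost at some point after construction, without the compositor writing to the buffer again.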