Bug 111197

Summary: [skl] assertion hits
Product: xorg Reporter: Jiri Slaby <jirislaby>
Component: Driver/intelAssignee: Chris Wilson <chris>
Status: RESOLVED MOVED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
xorg 0 log
none
debug=full log
none
debug=full log
none
x11trace -o x11trace dbus-launch plasmashell
none
xorg 0 log
none
debug=full log none

Description Jiri Slaby 2019-07-23 08:39:12 UTC
Created attachment 144850 [details]
xorg 0 log

Being at git rev 60022507.

After switching debug on, I see:
> #4  0x0000558446930cc6 in FatalError (f=f@entry=0x7fbd6f1abb47 "%s:%d assertion '%s' failed\n") at log.c:1017
> #5  0x00007fbd6f15eb26 in sna_dri2_schedule_swap (client=0x55844816a690, draw=0x558448288e80, front=0x5584479cb840, back=0x55844831e2d0, target_msc=0x7ffd3af4bf88, divisor=0, remainder=0, func=0x5584468f57c0 <DRI2SwapEvent>, data=0x558448288e80) at sna_dri2.c:3321
> #6  0x00005584468f490d in DRI2SwapBuffers (client=client@entry=0x55844816a690, pDraw=0x558448288e80, target_msc=<optimized out>, divisor=0, remainder=0, swap_target=swap_target@entry=0x7ffd3af4c028, func=0x5584468f57c0 <DRI2SwapEvent>, data=0x558448288e80) at dri2.c:1182
> #7  0x00005584468f5d5a in ProcDRI2SwapBuffers (client=0x55844816a690) at dri2ext.c:424
> #8  ProcDRI2Dispatch (client=0x55844816a690) at dri2ext.c:621
> #9  0x00005584467ce4b4 in Dispatch () at dispatch.c:478
> #10 0x00005584467d24c8 in dix_main (argc=14, argv=0x7ffd3af4c218, envp=<optimized out>) at main.c:280

and

> #4  0x0000556e73825cc6 in FatalError (f=f@entry=0x7f2a22406b47 "%s:%d assertion '%s' failed\n") at log.c:1017
> #5  0x00007f2a223b9656 in sna_dri2_schedule_swap (client=0x556e757ca770, draw=0x556e75a46920, front=0x556e75953790, back=0x556e758c4780, target_msc=0x7ffc60457a88, divisor=0, remainder=0, func=0x556e737ea7c0 <DRI2SwapEvent>, data=0x556e75a46920) at sna_dri2.c:3316
> #6  0x0000556e737e990d in DRI2SwapBuffers (client=client@entry=0x556e757ca770, pDraw=0x556e75a46920, target_msc=<optimized out>, divisor=0, remainder=0, swap_target=swap_target@entry=0x7ffc60457b28, func=0x556e737ea7c0 <DRI2SwapEvent>, data=0x556e75a46920) at dri2.c:1182
> #7  0x0000556e737ead5a in ProcDRI2SwapBuffers (client=0x556e757ca770) at dri2ext.c:424
> #8  ProcDRI2Dispatch (client=0x556e757ca770) at dri2ext.c:621
> #9  0x0000556e736c34b4 in Dispatch () at dispatch.c:478
> #10 0x0000556e736c74c8 in dix_main (argc=14, argv=0x7ffc60457d18, envp=<optimized out>) at main.c:280
Comment 1 Chris Wilson 2019-07-23 09:38:37 UTC
assert(!get_private(back)->bo->active_scanout);

That's quite a major screw up that would explain your flickering bug. I thought we would catch the inversion earlier. Oh well, care to attach an xz-compressed --enable-debug=full log?
Comment 2 Jiri Slaby 2019-07-23 10:38:28 UTC
Created attachment 144851 [details]
debug=full log
Comment 3 Jiri Slaby 2019-07-23 10:43:49 UTC
(In reply to Chris Wilson from comment #1)
> assert(!get_private(back)->bo->active_scanout);

For the record the other crash above is:
assert(get_private(back)->bo != get_private(front)->bo);
Comment 4 Chris Wilson 2019-07-23 10:44:16 UTC
Did that full-debug run exhibit the flickering? The assert itself was not recorded.
Comment 5 Jiri Slaby 2019-07-23 10:50:04 UTC
(In reply to Chris Wilson from comment #4)
> Did that full-debug run exhibit the flickering? The assert itself was not
> recorded.

It crashed right after start dozen times. With -core to X, the server shut downs without putting any info into the log, it seems.

Or maybe I missed it because the X server is restarted by sddm like mad:

-rw------- 1 root   root    44990464 2019-07-23 12:34:02.866783418 +0200 core.0.19147.X
-rw------- 1 root   root    44982272 2019-07-23 12:34:05.354766502 +0200 core.0.19245.X
-rw------- 1 root   root    44986368 2019-07-23 12:34:07.838749615 +0200 core.0.19278.X
-rw------- 1 root   root    44986368 2019-07-23 12:34:10.142733950 +0200 core.0.19312.X
-rw------- 1 root   root    44986368 2019-07-23 12:34:12.370718803 +0200 core.0.19347.X
-rw------- 1 root   root    44982272 2019-07-23 12:34:14.550703981 +0200 core.0.19381.X
-rw------- 1 root   root    44978176 2019-07-23 12:34:16.802688672 +0200 core.0.19411.X
-rw------- 1 root   root    44986368 2019-07-23 12:34:19.146672735 +0200 core.0.19445.X
-rw------- 1 root   root    44982272 2019-07-23 12:34:21.586656146 +0200 core.0.19478.X
-rw------- 1 root   root    44990464 2019-07-23 12:34:23.814640999 +0200 core.0.19510.X
-rw------- 1 root   root    44982272 2019-07-23 12:34:26.078625607 +0200 core.0.19602.X
-rw------- 1 root   root    44986368 2019-07-23 12:34:28.318610377 +0200 core.0.19637.X
-rw------- 1 root   root    44978176 2019-07-23 12:34:30.566595094 +0200 core.0.19670.X
-rw------- 1 root   root    44978176 2019-07-23 12:34:32.774580082 +0200 core.0.19700.X
-rw------- 1 root   root    44986368 2019-07-23 12:34:34.962565205 +0200 core.0.19732.X
-rw------- 1 root   root    44978176 2019-07-23 12:34:37.210549922 +0200 core.0.19764.X
-rw------- 1 root   root    44982272 2019-07-23 12:34:39.466534585 +0200 core.0.19818.X
-rw------- 1 root   root    44974080 2019-07-23 12:34:42.006517316 +0200 core.0.19853.X

So possibly, I did not pack up the right xorg.log.
Comment 6 Jiri Slaby 2019-07-23 10:55:07 UTC
Created attachment 144853 [details]
debug=full log

Ah, this time it is the good one.
Comment 7 Chris Wilson 2019-07-23 12:03:00 UTC
The client never asked for the updated back buffer after the swap. It kept on rendering into the buffer it just swapped into the frontbuffer. Oh dear.
Comment 8 Chris Wilson 2019-07-23 14:38:21 UTC
If you can, an x11trace of the misbehaving app would be useful. That would show whether the DRI2GetBuffers() request is there between the DRI2SwapBuffers() or not.
Comment 9 Jiri Slaby 2019-07-23 17:14:01 UTC
(In reply to Chris Wilson from comment #8)
> If you can, an x11trace of the misbehaving app would be useful. That would
> show whether the DRI2GetBuffers() request is there between the
> DRI2SwapBuffers() or not.

The problem is that X crashes during startup, so I cannot run anything yet. Or I could run X, but I have no idea what app is causing the crash. Would x11trace of startkde be of any help?
Comment 10 Chris Wilson 2019-07-23 17:28:37 UTC
That's inconvenient. Having X die as soon as the error occurs will help to keep the trace concise. If you could launch X, then trace startkde? Or turn off kwin compositing, then see if the bug can be hit by another app that is traceable.
Failing that, turn off the debug asserts and we can grep the trace once the error is visible.
Comment 11 Jiri Slaby 2019-07-24 07:09:43 UTC
Created attachment 144865 [details]
x11trace -o x11trace dbus-launch plasmashell

So it was enough to run plasmashell.
Comment 12 Chris Wilson 2019-07-24 10:12:02 UTC
The trace is quite short, and there's about 3 flip sequences, but afaict all seemed to do a back-to-back swap without a GetBuffers after being invalidated. E.g, the last sequence:

001:<:0320: 32: DRI2-Request(154,8): SwapBuffers drawable=0x0040000c target_msc_hi=0 target_msc_lo=0 divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:<:0321:  4: Request(43): GetInputFocus 
001:>:0321: Event DRI2-BufferSwapComplete(119) drawable=0x00000001 ust_hi=4194316 ust_lo=9 msc_hi=1456050911 msc_lo=4294967295 sbc_hi=3640489432 sbc_lo=2
001:>:0320:32: Reply to SwapBuffers: swap_hi=0 swap_lo=3
001:>:0321:32: Reply to GetInputFocus: revert-to=None(0x00) focus=PointerRoot(0x00000001)
001:<:0322: 20: DRI2-Request(154,7): GetBuffersWithFormat drawable=0x0040000c attachments={attachment=BackLeft(0x00000001) format=0x00000020};
001:>:0322:52: Reply to GetBuffersWithFormat: width=1920 height=1080 buffers={attachment=BackLeft(0x00000001) name=0x00000005 pitch=7680 cpp=4 flags=0x00000000};
001:>:0322: Event DRI2-BufferSwapComplete(119) drawable=0x00000002 ust_hi=4194316 ust_lo=9 msc_hi=1456067584 msc_lo=4294967295 sbc_hi=3640489433 sbc_lo=3
001:<:0323: 32: DRI2-Request(154,8): SwapBuffers drawable=0x0040000c target_msc_hi=0 target_msc_lo=0 divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:<:0324:  4: Request(43): GetInputFocus 
001:>:0324: Event DRI2-InvalidateBuffers(120) drawable=0x0040000c
001:>:0323:32: Reply to SwapBuffers: swap_hi=0 swap_lo=4
001:>:0324:32: Reply to GetInputFocus: revert-to=None(0x00) focus=PointerRoot(0x00000001)
001:<:0325: 20: DRI2-Request(154,7): GetBuffersWithFormat drawable=0x0040000c attachments={attachment=BackLeft(0x00000001) format=0x00000020};
001:>:0325:52: Reply to GetBuffersWithFormat: width=1920 height=1080 buffers={attachment=BackLeft(0x00000001) name=0x00000003 pitch=7680 cpp=4 flags=0x00000000};
001:<:0326: 32: DRI2-Request(154,8): SwapBuffers drawable=0x0040000c target_msc_hi=0 target_msc_lo=0 divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:<:0327:  4: Request(43): GetInputFocus 
001:>:0327: Event DRI2-InvalidateBuffers(120) drawable=0x0040000c
001:>:0326:32: Reply to SwapBuffers: swap_hi=0 swap_lo=5
001:>:0327:32: Reply to GetInputFocus: revert-to=None(0x00) focus=PointerRoot(0x00000001)
001:<:0328: 20: DRI2-Request(154,7): GetBuffersWithFormat drawable=0x0040000c attachments={attachment=BackLeft(0x00000001) format=0x00000020};
001:>:0328: Event DRI2-BufferSwapComplete(119) drawable=0x00000002 ust_hi=4194316 ust_lo=9 msc_hi=1456084246 msc_lo=4294967295 sbc_hi=3640489434 sbc_lo=4
001:>:0328:52: Reply to GetBuffersWithFormat: width=1920 height=1080 buffers={attachment=BackLeft(0x00000001) name=0x00000005 pitch=7680 cpp=4 flags=0x00000000};
001:<:0329: 32: DRI2-Request(154,8): SwapBuffers drawable=0x0040000c target_msc_hi=0 target_msc_lo=0 divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:<:032a:  4: Request(43): GetInputFocus 
001:>:032a: Event DRI2-InvalidateBuffers(120) drawable=0x0040000c
001:>:0329:32: Reply to SwapBuffers: swap_hi=0 swap_lo=6
001:>:032a:32: Reply to GetInputFocus: revert-to=None(0x00) focus=PointerRoot(0x00000001)
001:<:032b: 32: DRI2-Request(154,8): SwapBuffers drawable=0x0040000c target_msc_hi=0 target_msc_lo=0 divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:<:032c:  4: Request(43): GetInputFocus 
001:>:032c: Event DRI2-BufferSwapComplete(119) drawable=0x00000002 ust_hi=4194316 ust_lo=9 msc_hi=1456100912 msc_lo=4294967295 sbc_hi=3640489435 sbc_lo=5
001:>:032c: Event DRI2-BufferSwapComplete(119) drawable=0x00000002 ust_hi=4194316 ust_lo=9 msc_hi=1456117583 msc_lo=4294967295 sbc_hi=3640489436 sbc_lo=6

Without asserts enabled, we just ignore the swap.
Comment 13 Chris Wilson 2019-07-24 10:24:30 UTC
commit 364c91697805e3d6504c14128ec21d95026a2551 (HEAD -> master, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 23 18:29:26 2019 +0100

    sna/dri2: Skip stale client swap before asserting
    
    Check for a stale backbuffer (the client didn't call DRI2GetBuffers
    between DRI2SwapBuffers) before asserting so that we should be
    more resilient with asserts enabled for client errors.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111197
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

back to full-debug and see if we get another error.
Comment 14 Jiri Slaby 2019-07-24 10:41:06 UTC
Created attachment 144866 [details]
xorg 0 log

(In reply to Jiri Slaby from comment #3)
> For the record the other crash above is:
> assert(get_private(back)->bo != get_private(front)->bo);

Now I see these crashes. full debug attached.
Comment 15 Chris Wilson 2019-07-24 11:03:48 UTC
Apologies, another chunk of asserts need to be after the skip for stale back buffers for the logging to continue.

commit fffb1f52b61fd4983e4a1ff29c4673481ea63f17 (HEAD -> master, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 23 18:29:26 2019 +0100

    sna/dri2: Skip asserting distinct front/back for stale swaps
    
    If the backbuffer is stale (i.e. the client didn't call DRI2GetBuffers
    before swapping) the front/back bo may not be distinct. Move the
    assertion for a valid swap after the handling of a stale swap so that
    the assertions are more robust for a client error.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111197
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 16 Jiri Slaby 2019-07-24 12:04:16 UTC
Created attachment 144867 [details]
debug=full log

Now I started X server, and didn't see the problem.

So I removed Option "DRI"      "2" from the configuration, restarted and ran mpv to see this crash:

> #4  0x0000560088d46cc6 in FatalError (f=0x7fd4e1d25647 "%s:%d assertion '%s' failed\n") at log.c:1017
> #5  0x00007fd4e1c5dbf1 in sna_dri2_get_back (sna=0x7fd4e123c000, draw=0x56008a59a7e0, back=0x56008a9dd2e0) at sna_dri2.c:352
> #6  0x00007fd4e1c5e0a2 in sna_dri2_reuse_buffer (draw=0x56008a59a7e0, buffer=0x56008a9dd2e0) at sna_dri2.c:445
> #7  0x0000560088d09537 in allocate_or_reuse_buffer (pDraw=pDraw@entry=0x56008a59a7e0, ds=ds@entry=0x560089e75070, pPriv=pPriv@entry=0x56008a56df50, attachment=attachment@entry=1, format=format@entry=32, dimensions_match=dimensions_match@entry=1, buffer=0x56008a6fc5a0) at dri2.c:523
> #8  0x0000560088d0a018 in do_get_buffers (pDraw=0x56008a59a7e0, width=width@entry=0x7ffc10cb3ce8, height=height@entry=0x7ffc10cb3cf0, attachments=0x56008ac9a434, attachments@entry=0x56008ac9a42c, count=1, out_count=out_count@entry=0x7ffc10cb3cf8, has_format=1) at dri2.c:591
> #9  0x0000560088d0a3fb in DRI2GetBuffersWithFormat (pDraw=<optimized out>, width=width@entry=0x7ffc10cb3ce8, height=height@entry=0x7ffc10cb3cf0, attachments=attachments@entry=0x56008ac9a42c, count=<optimized out>, out_count=out_count@entry=0x7ffc10cb3cf8) at dri2.c:708
> #10 0x0000560088d0be38 in ProcDRI2GetBuffersWithFormat (client=0x56008a534db0) at dri2ext.c:317
> #11 ProcDRI2Dispatch (client=0x56008a534db0) at dri2ext.c:619
> #12 0x0000560088be44b4 in Dispatch () at dispatch.c:478
Comment 17 Chris Wilson 2019-07-24 13:26:02 UTC
commit 6f4972d5c368c30e971a23c1dc370d3e43761282 (HEAD -> master, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 24 14:10:24 2019 +0100

    sna/dri2: Relinquish back-buffer cache on change of scanout status
    
    If we change scanout status (i.e. whether or not this flip chain may be
    presented directly on the CRTC), throwaway the previous back buffer
    cache as those buffers may not be suitable for presentation.
    
    Reported-by: Jiri Slaby <jirislaby@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111197
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

At first I thought eureka, but it looks like it is just complaining about being silly. Not the warning about front/back inversion as I first thought.
Comment 18 Jiri Slaby 2019-07-25 06:02:25 UTC
(In reply to Chris Wilson from comment #17)
> commit 6f4972d5c368c30e971a23c1dc370d3e43761282 (HEAD -> master,

OK, no more crashes.
Comment 19 Jiri Slaby 2019-07-26 06:33:58 UTC
(In reply to Chris Wilson from comment #1)
> assert(!get_private(back)->bo->active_scanout);

I have just hit it even after the move: 
#5  0x00007f00322cb166 in sna_dri2_schedule_swap (client=0x55f38c48fa50, draw=0x55f38cc13f70, front=0x55f38c82d200, back=0x55f38ccc6fa0, target_msc=0x7ffd7ecee278, divisor=0, remainder=0, func=0x55f38ba5b7c0 <DRI2SwapEvent>, data=0x55f38cc13f70) at sna_dri2.c:3339
#6  0x000055f38ba5a90d in DRI2SwapBuffers (client=client@entry=0x55f38c48fa50, pDraw=0x55f38cc13f70, target_msc=<optimized out>, divisor=0, remainder=0, swap_target=swap_target@entry=0x7ffd7ecee318, func=0x55f38ba5b7c0 <DRI2SwapEvent>, data=0x55f38cc13f70) at dri2.c:1182
#7  0x000055f38ba5bd5a in ProcDRI2SwapBuffers (client=0x55f38c48fa50) at dri2ext.c:424
#8  ProcDRI2Dispatch (client=0x55f38c48fa50) at dri2ext.c:621
#9  0x000055f38b9344b4 in Dispatch () at dispatch.c:478

It took a day to hit it -- in the meantime, I disabled full debug, so no more info I have. It happened during switch of virtual desktops.
Comment 20 Chris Wilson 2019-07-26 22:54:26 UTC
(In reply to Jiri Slaby from comment #19)
> (In reply to Chris Wilson from comment #1)
> > assert(!get_private(back)->bo->active_scanout);
> 
> I have just hit it even after the move: 
> #5  0x00007f00322cb166 in sna_dri2_schedule_swap (client=0x55f38c48fa50,
> draw=0x55f38cc13f70, front=0x55f38c82d200, back=0x55f38ccc6fa0,
> target_msc=0x7ffd7ecee278, divisor=0, remainder=0, func=0x55f38ba5b7c0
> <DRI2SwapEvent>, data=0x55f38cc13f70) at sna_dri2.c:3339
> #6  0x000055f38ba5a90d in DRI2SwapBuffers
> (client=client@entry=0x55f38c48fa50, pDraw=0x55f38cc13f70,
> target_msc=<optimized out>, divisor=0, remainder=0,
> swap_target=swap_target@entry=0x7ffd7ecee318, func=0x55f38ba5b7c0
> <DRI2SwapEvent>, data=0x55f38cc13f70) at dri2.c:1182
> #7  0x000055f38ba5bd5a in ProcDRI2SwapBuffers (client=0x55f38c48fa50) at
> dri2ext.c:424
> #8  ProcDRI2Dispatch (client=0x55f38c48fa50) at dri2ext.c:621
> #9  0x000055f38b9344b4 in Dispatch () at dispatch.c:478
> 
> It took a day to hit it -- in the meantime, I disabled full debug, so no
> more info I have. It happened during switch of virtual desktops.

Bah. That's going to be nasty; it's definitely in the "should never happen" category. It's at times like that you wish there was a trace_printk.
Comment 21 Jiri Slaby 2019-07-28 15:50:42 UTC
A different one:
> #5  0x00007f44bb4f4db6 in sna_dri2_copy_region (draw=0x55a4f0317e20, region=0x55a4f0323970, dst=0x55a4efdc6240, src=0x55a4f0382f10) at sna_dri2.c:1442
> #6  0x000055a4ee7f9790 in dri2_copy_region (pDraw=pDraw@entry=0x55a4f0317e20, pRegion=pRegion@entry=0x55a4f0323970, pDest=0x55a4efdc6240, pSrc=0x55a4f0382f10) at dri2.c:899
> #7  0x000055a4ee7f9ec5 in DRI2CopyRegion (pDraw=0x55a4f0317e20, pRegion=0x55a4f0323970, dest=0, src=1) at dri2.c:940
> #8  0x000055a4ee7fc19e in ProcDRI2CopyRegion (client=0x55a4ef9ed180) at dri2ext.c:340
> #9  ProcDRI2Dispatch (client=0x55a4ef9ed180) at dri2ext.c:617
> #10 0x000055a4ee6d44b4 in Dispatch () at dispatch.c:478
> #11 0x000055a4ee6d84c8 in dix_main (argc=14, argv=0x7ffe7d968c98, envp=<optimized out>) at main.c:280
Comment 22 Martin Peres 2019-11-27 13:51:05 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/issues/166.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.