Summary: | KDE Konsole very slow scrolling with TearFree and QT graphics system native | | |
---|---|---|---|
Product: | xorg | Reporter: | Ildar Nurislamov <absorbb> |
Component: | Driver/intel | Assignee: | Chris Wilson <chris> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | | |
Priority: | medium | | |
Version: | unspecified | | |
Hardware: | All | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Attachments: | | | |
Description
Ildar Nurislamov
2014-04-14 14:50:40 UTC
Disabling TearFree affects the problem in a funny way. Rendering of content becomes laggy. If you move the scrollbar up and down a few times very fast, the window content spends a few seconds after that trying to catch up and draw all the states. The problem is not affected by the method I use to achieve Vsync (the driver's or KWin's).

There is no apparent lag here with TearFree enabled, cpu/gpu is well below maximum. This is ivb hd graphics (i.e. slightly less than HD2500), but with an up-to-date kernel. I guess the first thing to do is to confirm this problem with xf86-video-intel.git and see if you can capture an --enable-debug=full log of the slowdown.

Xorg.0.log with driver from git and --enable-debug=full: https://dl.dropboxusercontent.com/u/42927427/Xorg.tar.bz2

My actions: started KDE, started Konsole, ran dmesg, spent about 10 sec moving the scroll bar up and down, switched VT and killed Xorg.

The weird thing is that scrolling Konsole with --enable-debug=full was a little bit faster, and it was the only thing that was faster. Other things are slow as hell.

P.S.: commit http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=cb35b397395cff8ff7f5155759f58c1840bfeeb5 breaks compiling with --enable-debug=full with the error:
sna_display.c:3176:43: error: 'struct <anonymous>' has no member named 'rotation'
assert(cursor->rotation != sna->cursor.rotation);
so I had to roll back this one.

So far, I have only seen a pattern of good behaviour in the glyph rendering. About the only missed improvement would be to remove overdrawn copies back from the TearFree buffers. Back to normal (still using xf86-video-intel.git), are there any abnormalities in CPU loading? Where does sudo perf top say all the time is being spent?

Created attachment 97394 [details]
High iowait
So I grab the scroll bar and start to move it like ∞ (to exclude disk io from reading the scrollback buffer) very fast. After some time it gets very laggy, until the display stops redrawing completely (even the mouse cursor hangs). After that I can see a spike of iowait on 1 core for a moment, sometimes up to 100%.
It depends on window size. If the Konsole window is small, e.g. 869,653, everything is fast and I cannot reproduce the problem no matter how hard I try.
Do you have sysrq enabled? It would be great to get a kernel stacktrace to see where that iowait is coming from (sysrq-T). If the cursor stops moving, we have serious problems, it suggests that X is being trapped in a throttle() waiting for the GPU.

One thing to try for poor GPU performance is my branch http://cgit.freedesktop.org/~ickle/linux-2.6/ #bug77436

Created attachment 97406 [details]
SysRq-t
The last one, at 246 sec: I think I've caught the exact hang moment.
This is the stalled process:

[ 246.122116] Xorg D ffff88021fa13800 0 1343 1298 0x00400004
[ 246.122117] ffff8802115bbac8 0000000000000086 ffff880212418210 ffff8802115bbfd8
[ 246.122119] 0000000000013800 0000000000013800 ffff880212418210 ffff88021fa140b8
[ 246.122120] ffff8802116d0000 0000000000000001 0000000000000000 ffff880212418210
[ 246.122121] Call Trace:
[ 246.122124] [<ffffffff815cdfb8>] io_schedule+0x98/0x130
[ 246.122135] [<ffffffffa0117915>] __wait_seqno+0x315/0x580 [i915]
[ 246.122156] [<ffffffffa0118a6a>] i915_gem_object_wait_rendering+0x3a/0x70 [i915]
[ 246.122174] [<ffffffffa0118f3b>] i915_gem_object_finish_gpu.part.50+0xb/0x20 [i915]
[ 246.122192] [<ffffffffa011d6b8>] i915_gem_object_set_cache_level+0x298/0x350 [i915]
[ 246.122211] [<ffffffffa011d896>] i915_gem_object_pin_to_display_plane+0x46/0x190 [i915]
[ 246.122230] [<ffffffffa013608f>] intel_pin_and_fence_fb_obj+0x9f/0x140 [i915]
[ 246.122262] [<ffffffffa013617f>] intel_gen7_queue_flip+0x4f/0x300 [i915]
[ 246.122295] [<ffffffffa013e819>] intel_crtc_page_flip+0x1d9/0x3c0 [i915]
[ 246.122328] [<ffffffffa0096bde>] drm_mode_page_flip_ioctl+0x15e/0x2d0 [drm]
[ 246.122346] [<ffffffffa008696d>] drm_ioctl+0x46d/0x570 [drm]
[ 246.122350] [<ffffffff811a23f7>] do_vfs_ioctl+0x2d7/0x4b0
[ 246.122356] [<ffffffff811a2650>] SyS_ioctl+0x80/0xa0
[ 246.122359] [<ffffffff815d807d>] system_call_fastpath+0x1a/0x1f
[ 246.122361] [<00007fd174c481e7>] 0x7fd174c481e6

As I see, you've made a commit to your kernel branch to fix this. I will wait for a fix in the mainline kernel. Not sure about compiling a 2.6 kernel for OpenSuse 13.1.

It's weird, though: if this is caused by page flipping issues, why are only Qt applications affected (kwrite and okular appear to be affected too, while dolphin is fine)? Mozilla Firefox never hangs and scrolls very fast even in fullscreen mode on text- and image-heavy pages. Even Java draws Netbeans fast. Maybe that is the reason why QT uses the raster backend by default.
Right, I don't have an explanation as to how it becomes backlogged, but that seems to be the culprit behind the bad behaviour here. This may help improve things in the meantime:

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 0b8af54..30adeb1 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -4295,6 +4295,11 @@ struct kgem_bo *kgem_create_2d(struct kgem *kgem,
 		bo = __kgem_bo_create_as_display(kgem, size, tiling, pitch);
 		if (bo)
 			return bo;
+
+		if (bucket >= NUM_CACHE_BUCKETS)
+			goto large_inactive;
+		else
+			goto search_inactive;
 	}

 	if (bucket >= NUM_CACHE_BUCKETS) {

tried. caused segfault at X start:

[ 4940.341] (EE) Backtrace:
[ 4940.341] (EE) 0: /usr/bin/Xorg (xorg_backtrace+0x48) [0x58d6c8]
[ 4940.341] (EE) 1: /usr/bin/Xorg (0x400000+0x1916b9) [0x5916b9]
[ 4940.341] (EE) 2: /lib64/libc.so.6 (0x7fa9e2e3a000+0x358d0) [0x7fa9e2e6f8d0]
[ 4940.341] (EE) 3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fa9dfed4000+0x27840) [0x7fa9dfefb840]
[ 4940.341] (EE) 4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fa9dfed4000+0x3ea8c) [0x7fa9dff12a8c]
[ 4940.341] (EE) 5: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fa9dfed4000+0x6a755) [0x7fa9dff3e755]
[ 4940.341] (EE) 6: /usr/bin/Xorg (0x400000+0xafabe) [0x4afabe]
[ 4940.341] (EE) 7: /usr/bin/Xorg (0x400000+0x41436) [0x441436]
[ 4940.341] (EE) 8: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7fa9e2e5bbe5]
[ 4940.342] (EE) 9: /usr/bin/Xorg (0x400000+0x2ca91) [0x42ca91]
[ 4940.342] (EE)
[ 4940.342] (EE) Segmentation fault at address 0x6c

Oops, the patch was silly and overlooked gcc warning about a valid issue. Anyway I've applied the couple of ideas discussed today to xf86-video-intel.git, and would very much appreciate some feedback.

There is one other thing to be wary of, though I would expect it to affect every setup, Xorg-1.15.0 + mesa-10.1 have a bug in that kwin tries to use a 8x msaa framebuffer. This causes everything to be much slower than normal, trigger GPU hangs and cause corruption.

Thanks.
I hope tomorrow I will have time to test your last fixes. I prefer to use Xrender for compositing in kwin. I don't know why they call it "software", because with your driver it seems to be 100% hardware. It is faster (in gtkperf) and less CPU hungry than OpenGL. And how is it "software" if I can cover a window running a 3D game with another transparent window and still have 60fps?

Tested the latest from git. Didn't notice any changes.

Can you please look out for more iowaits? 'sudo perf record -g -a -e i915:i915_gem_request_wait_begin' should help.

And :( for no change. I thought there would have at least been some small improvement.

Created attachment 97446 [details]
perf record
Maybe it takes longer to cause a freeze now. This perf record contains a lot of scrolling and a spike of 100% iowait at the end.
And one more thing that I didn't mention before: after this freeze it draws a lot of frames very fast, as if it were drawing all the frames that it should have drawn earlier.
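The catch-up burst described above is what one would expect if frames are queued faster than they are retired and nothing drops the stale ones. The following is only a sketch of that arithmetic, not driver code: `backlog_after` and `drain_seconds` are hypothetical helpers, and the rates are made-up numbers rather than measurements from this bug.

```c
#include <assert.h>

/* Hypothetical model: a client submits `produce_fps` frames per second while
 * the display path retires only `consume_fps`. Nothing is dropped, so the
 * surplus accumulates as a backlog. The rates are illustrative only. */
static long backlog_after(long produce_fps, long consume_fps, long seconds)
{
	long surplus = produce_fps - consume_fps;
	return surplus > 0 ? surplus * seconds : 0;
}

/* Seconds needed to drain a backlog once the client stops submitting,
 * rounded up to a whole second. */
static long drain_seconds(long backlog, long consume_fps)
{
	assert(consume_fps > 0);
	return (backlog + consume_fps - 1) / consume_fps;
}
```

Under these assumed rates, submitting 100 frames per second against 20 retired per second for 10 seconds leaves 800 frames queued, which then take 40 seconds to drain. If the consumer keeps up, no backlog forms at all.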
Apologies, should have mentioned the output of perf record has to be decoded on the same machine for symbol resolution (it is much simpler that way). Please run it through "perf report" and hope that it managed to unwind through the ioctl. Created attachment 97454 [details]
perf.stdio
Hm. It could not find much symbol info on my system.
Annoying, it appears that perf has dropped all the relevant frames to figure out if I need to be worried, or to suggest what is going on. Do you still see the high iowaits? Do you think you could hit one with a sysrq-T again? Created attachment 97492 [details]
SysRq-t
freeze was long enough to make 2 SysRq-t prints
Hmm.

[ 253.551569] Xorg D ffff88021fa13800 0 1348 1302 0x00400004
[ 253.551571] ffff880036459a68 0000000000000082 ffff8802143d2050 ffff880036459fd8
[ 253.551572] 0000000000013800 0000000000013800 ffff8802143d2050 ffff88021fa140b8
[ 253.551573] ffff88021165c000 0000000000000001 0000000000000000 ffff8802143d2050
[ 253.551575] Call Trace:
[ 253.551577] [<ffffffff815cdfb8>] io_schedule+0x98/0x130
[ 253.551589] [<ffffffffa0117915>] __wait_seqno+0x315/0x580 [i915]
[ 253.551606] [<ffffffffa0118a6a>] i915_gem_object_wait_rendering+0x3a/0x70 [i915]
[ 253.551622] [<ffffffffa0118f3b>] i915_gem_object_finish_gpu.part.50+0xb/0x20 [i915]
[ 253.551638] [<ffffffffa011bef8>] i915_vma_unbind+0x1d8/0x2b0 [i915]
[ 253.551657] [<ffffffffa011ca5f>] i915_gem_object_pin+0x17f/0x750 [i915]
[ 253.551676] [<ffffffffa011d917>] i915_gem_object_pin_to_display_plane+0xc7/0x190 [i915]
[ 253.551696] [<ffffffffa013608f>] intel_pin_and_fence_fb_obj+0x9f/0x140 [i915]
[ 253.551726] [<ffffffffa013617f>] intel_gen7_queue_flip+0x4f/0x300 [i915]
[ 253.551756] [<ffffffffa013e819>] intel_crtc_page_flip+0x1d9/0x3c0 [i915]
[ 253.551789] [<ffffffffa0096bde>] drm_mode_page_flip_ioctl+0x15e/0x2d0 [drm]
[ 253.551807] [<ffffffffa008696d>] drm_ioctl+0x46d/0x570 [drm]
[ 253.551811] [<ffffffff811a23f7>] do_vfs_ioctl+0x2d7/0x4b0
[ 253.551817] [<ffffffff811a2650>] SyS_ioctl+0x80/0xa0
[ 253.551820] [<ffffffff815d807d>] system_call_fastpath+0x1a/0x1f
[ 253.551822] [<00007fe5cdca91e7>] 0x7fe5cdca91e6

[ 257.990590] Xorg D ffff88021fa13800 0 1348 1302 0x00400004
[ 257.990591] ffff880036459a68 0000000000000082 ffff8802143d2050 ffff880036459fd8
[ 257.990593] 0000000000013800 0000000000013800 ffff8802143d2050 ffff88021fa140b8
[ 257.990594] ffff88021165c000 0000000000000001 0000000000000000 ffff8802143d2050
[ 257.990596] Call Trace:
[ 257.990598] [<ffffffff815cdfb8>] io_schedule+0x98/0x130
[ 257.990614] [<ffffffffa0117915>] __wait_seqno+0x315/0x580 [i915]
[ 257.990634] [<ffffffffa0118a6a>] i915_gem_object_wait_rendering+0x3a/0x70 [i915]
[ 257.990652] [<ffffffffa0118f3b>] i915_gem_object_finish_gpu.part.50+0xb/0x20 [i915]
[ 257.990671] [<ffffffffa011bef8>] i915_vma_unbind+0x1d8/0x2b0 [i915]
[ 257.990693] [<ffffffffa011ca5f>] i915_gem_object_pin+0x17f/0x750 [i915]
[ 257.990712] [<ffffffffa011d917>] i915_gem_object_pin_to_display_plane+0xc7/0x190 [i915]
[ 257.990735] [<ffffffffa013608f>] intel_pin_and_fence_fb_obj+0x9f/0x140 [i915]
[ 257.990767] [<ffffffffa013617f>] intel_gen7_queue_flip+0x4f/0x300 [i915]
[ 257.990802] [<ffffffffa013e819>] intel_crtc_page_flip+0x1d9/0x3c0 [i915]
[ 257.990836] [<ffffffffa0096bde>] drm_mode_page_flip_ioctl+0x15e/0x2d0 [drm]
[ 257.990859] [<ffffffffa008696d>] drm_ioctl+0x46d/0x570 [drm]
[ 257.990864] [<ffffffff811a23f7>] do_vfs_ioctl+0x2d7/0x4b0
[ 257.990870] [<ffffffff811a2650>] SyS_ioctl+0x80/0xa0
[ 257.990872] [<ffffffff815d807d>] system_call_fastpath+0x1a/0x1f
[ 257.990875] [<00007fe5cdca91e7>] 0x7fe5cdca91e6

Do you have VT-d (iommu) enabled? Otherwise I expect it had to move the fb into the mappable portion. I still haven't explained how we end up with the backlog, but perhaps this will help again with alleviating the symptoms:

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index a840ed6..7ba3ac9 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -4194,6 +4194,12 @@ static void __kgem_bo_make_scanout(struct kgem *kgem,
 	    do_ioctl(kgem->fd, DRM_IOCTL_MODE_ADDFB, &arg) == 0) {
 		bo->scanout = true;
 		bo->delta = arg.fb_id;
+
+		/* Pre-emptively move the object into the mappable
+		 * portion to avoid rebinding later. */
+		bo->map__gtt = __kgem_bo_map__gtt(kgem, bo);
+		if (bo->map__gtt)
+			*bo->map__gtt = 0;
 	}
 }

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index a840ed6..24841dd 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -4194,6 +4194,12 @@ static void __kgem_bo_make_scanout(struct kgem *kgem,
 	    do_ioctl(kgem->fd, DRM_IOCTL_MODE_ADDFB, &arg) == 0) {
 		bo->scanout = true;
 		bo->delta = arg.fb_id;
+
+		/* Pre-emptively move the object into the mappable
+		 * portion to avoid rebinding later. */
+		bo->map__gtt = __kgem_bo_map__gtt(kgem, bo);
+		if (bo->map__gtt)
+			*(uint32_t *)bo->map__gtt = 0;
 	}
 }

Created attachment 97499 [details]
SysRq-t with last patch
I can't get the mouse cursor to freeze after the last patch, and my hand is tired. I'm not sure about iowaits anymore. Sometimes they happen and I tried to catch them with SysRq-t, but either something changed in those traces or I failed to catch the right moment.
The screen still stops refreshing. The mouse cursor moves fine and CPU load looks ok most of the time. But everything is still as laggy as before.
My CPU, a Core™ i5-3450, doesn't support VT-d.
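The two versions of the patch posted above differ only in the final store: the second one casts `bo->map__gtt` to `uint32_t *` before dereferencing it. Assuming `map__gtt` is declared as a `void *` (the cast suggests this, but it is an assumption here), the uncast form cannot compile, because C does not allow a store through a `void *`. A stand-alone sketch, with a hypothetical `struct fake_bo` in place of the driver's buffer object:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's buffer object. Only the one field
 * used here is modelled, and its type is assumed to be void *. */
struct fake_bo {
	void *map__gtt;
};

/* Store a zero through the mapping, as the patch does.
 * `*bo->map__gtt = 0;` would be rejected by the compiler because a void
 * pointer cannot be dereferenced, hence the cast to uint32_t *.
 * Returns 1 if the store happened, 0 if there was no mapping. */
static int touch_mapping(struct fake_bo *bo)
{
	assert(bo != NULL);
	if (bo->map__gtt == NULL)
		return 0;
	*(uint32_t *)bo->map__gtt = 0;
	return 1;
}
```

Note that the store overwrites the first 32-bit word of whatever the pointer refers to, so in this sketch it should only be pointed at scratch memory.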
(In reply to comment #28)
> Screen still stops refreshing. Mouse cursor moves fine and CPU load looks ok
> most of the time. But everything is still as laggy as before.

Good. That is actually what I expect. We haven't done anything to prevent the backlog, so eventually we have to service all that rendering debt and stop processing new commands. However, whereas previously that happened at an unexpected point and blocked the Xserver, it now happens more or less under our control and we allow X to continue processing requests and update the cursor.

Thanks for all your help so far, I need to think how we can get a clear picture of what is causing the backlog.

Here's something you can try. Roll back a few commits in xf86-video-intel.git so that you can easily trigger the massive iowaits again. Then in the middle of a freeze, "echo 1 | sudo tee /sys/kernel/debug/dri/0/i915_wedged" and please attach /sys/class/drm/card0/error

Created attachment 97501 [details]
/sys/class/drm/card0/error
For the first 3, I switched VT and executed the command there.
For the last 3, I just pressed enter in the same Konsole that I scrolled.
I'm not sure I caught the exact moment. It seemed like this command unfreezes the GPU.
Hm. I mentioned at the beginning that "Problem is not affected by method i use to achieve Vsync (driver or Kwin's one)". This is not the case anymore. Or rather, I was wrong that time. If I disable TearFree and enable OpenGL 3.1 in kwin with vsync, no problem happens. But the cursor flickers annoyingly and black boxes appear randomly on screen. Is that the mesa bug that you mentioned?

Also I rolled back through a few previous revisions, as far as 15 Oct, to check whether it was a regression. Nope. TearFree always worked this way. It's just that I started using it a few months ago.

OMG. My previous comment was not correct. I forgot to come back to the HEAD version. So that comment applies to the 15 Oct version.

I believe this should be fixed by

commit fdc28205d590d560a6f3886189ef974d1d00f2cd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Apr 18 11:34:07 2014 +0100

sna: Always flush the shadow scanout

When the shadow scanout is active, make sure we call the BlockHandler every time, not just when we about to sleep. This is in case we are swamped by clients trying to render and forgo the important step of ensuring that their output reaches the screen.

Reported-by: Ildar Nurislamov <absorbb@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77436
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

It is sad, but nothing changed for me. I've made some additional tests with different modes. Maybe this can shed some light:
1. TearFree true, kwin Xrender - lag that grows over time. After some time of intense scrolling it takes seconds to draw 1 frame.
2. TearFree false, kwin Xrender - smooth, kind of. There is a delay in drawing the actual screen content that grows over time, so after I stop doing anything my screen keeps drawing all the frames until it catches up.
3. TearFree false, kwin OpenGL 3.1, Vsync Full screen repaint - same as 1, but the lag between redraws grows more slowly.
4. TearFree false, kwin OpenGL 3.1, Vsync None - same as 1, but the lag between redraws grows more slowly.
Something is wrong with the way KDE applications draw themselves. Even kwin can't handle it. Or maybe this is kwin's bug. I'm going to test with a different WM... It is the same with Icewm.

I don't know if this problem deserves so much attention. QT doesn't use xrender by default anyway. Of course it is surprising how the application produces frames without caring about whether they were drawn on screen, and that no proper frame skipping takes place. I believe that an app must always try to show its actual state. The most surprising thing for me is where this whole queue of frames to be drawn is stored. In mode 2 (TearFree false, kwin Xrender) it takes seconds or even dozens of seconds to catch up to the actual state, and all the frames are drawn. Where were they stored? That is so much data.

Offtopic: I've noticed an sna performance regression with the 3.14 kernel: from 1.18s in gtkperf to 1.88s. Should I report a bug, or is this something in OpenSuSE?

I'll take another pass at seeing if I can reproduce the slow rendering. You did lead me to find a real issue with TearFree, but it appears to have been a separate issue entirely. :(

(In reply to comment #36)
> Offtopic: i've noticed sna performance regression with 3.14 kernel. from
> 1.18s in gtkperf to 1.88s. should i report bug or this is smth in OpenSuSE?

Please do file a bug report. (When in doubt, always file a new bug report.) Try to give the simplest (but as detailed as possible) steps to reproduce and include an Xorg.0.log so that I can check hw and sw versions later.

Try to test on a bigger screen resolution. I have 1920x1080. As I said before, the problem happens only with a big window size. Maybe your GPU needs 2560x1600 or 4k to reach its limits. I will be able to assist you and file a bug report on Monday, because I have Linux and a Core i5 on my PC at work.

I haven't yet reproduced the symptoms of the render backlog spiralling out of control (went up to 1366x768 + 2560x1440 with a lowly IVB celeron).
But then I realised I am testing on my tweaked kernels and should try a vanilla 3.13/3.14...

Can you reproduce the problem now? Tested 2.99.911-91-gb259cbf - no difference. Compiled a kernel from your bug77436 branch - 3.14.0-6.ga68bc7c-desktop+ #1 SMP PREEMPT (I hope I did it the right way) - no difference for me. The performance regression described in https://bugs.freedesktop.org/show_bug.cgi?id=77712 also applies to it.

I've been playing with the stock Ubuntu 3.13.0-24 kernel to see if anything unusual happens, no luck so far. I'm still puzzling how we end up with a rendering backlog here. As always if you can check the latest git updates and see if the damage reduction for TearFree helps, that would be useful.

I have built the updated git version every working day since then, especially after TearFree-related commits. No changes have been noticed since. By the way, is it possible right now to use DRI3?

It's possible -- just needs some not upstreamed code. I am not yet impressed by the Present implementation, it is not a replacement for DRI2 yet. But DRI3, modulo a few protocol bugs!, is ready.

Ok, DRI3/Present support is available upstream now. As are more refinements to TearFree. If you do have the opportunity, I would like to know how it all fares (DRI2 and/or DRI3, with/without TearFree).

I saw all these commits and built the driver after every bunch of them. No changes were noticed.

[281306.077] (II) intel(0): [DRI2] Setup complete
[281306.077] (II) intel(0): [DRI2] DRI driver: i965
[281306.077] (II) intel(0): [DRI2] VDPAU driver: i965
[281306.077] (II) intel(0): direct rendering: DRI2 DRI3 enabled
[281306.090] (II) GLX: Initialized DRI2 GL provider for screen 0

I believe this means that DRI3 is enabled.

I did more testing today: AccelMethod "uxa" and AccelMethod "none" are still doing great here. TearFree "false" with sna still produces a huge delay in drawing actual frames. That is why I believe the problem is not TearFree related. It just appears differently, and without TearFree it is much more weird.
Low framerate and hangs are common things, but a delay in drawing the actual screen content while drawing at a quite smooth framerate is something I am experiencing for the first time in all my years of using PCs.

Some new observations:
1. Lag depends on the amount of text drawn in Konsole. Maybe.
2. Running Diablo 3 in parallel improves the situation. The amount of improvement depends on the scene that is open in Diablo. If it is the character menu, scrolling is almost smooth; if it is the actual game, results are mixed.

Hmm, that is starting to sound like cpufreq - glyph rendering should be limited to how fast we can construct the GPU commands. I don't think we played around with cpufreq or pstate in this bug yet...

Maybe, but not entirely. I have been using the performance governor since then. And I don't think that this is a glyph rendering performance problem, even if it depends on the number of symbols (which I'm not entirely sure of). Other applications like gtkperf draw and scroll text very fast even in a maximized window. What Diablo may be doing is maxing out the GPU frequency (does your GPU use a higher freq in 3D mode?). So in the character menu we have max frequency but very low GPU usage, and scrolling becomes faster. In the actual game GPU usage is very high, so even though the GPU freq is still at its max value, this doesn't help much. I'm going to check TearFree off with Diablo...

If you do suspect that the GPU is not reclocking as required, you can get my monitor from http://cgit.freedesktop.org/~ickle/linux-2.6/ (#master is current) and the intel-gpu-overlay from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ which provide an overlay showing what the GPU is doing.

Same story - with Diablo it is hardly possible to get the delay. I came up with a different theory: what if Diablo actually makes everything slower (as it should), and Konsole manages to draw far fewer fps, so the frame buffers or queues (I don't know which) don't overrun. Yes, I have no idea how the driver works.
But the behavior with TearFree disabled makes me believe that there must be some queue somewhere, and that this queue can be insanely long, like a few seconds of drawing at ~20fps (which I think is weird; before, I thought that there was a triple buffer and nothing more). Or maybe this is not a frame queue in the driver but rather an X protocol command queue in the X server, and QT is the one who stupidly overwhelms the server with commands without much care for the produced result. QT doesn't use the xrender backend by default. It may be abandoned and broken.

Damn. This theory doesn't explain why uxa is doing well.

Created attachment 100808 [details]
intel-gpu-overlay output + intel_gpu_top
tried your kernel. But intel-gpu-overlay 1.6 says that GPU freq not found in debugfs.
Anyway i caught some unusual output from dmesg:
[ 89.523354] ------------[ cut here ]------------
[ 89.523362] WARNING: CPU: 2 PID: 2623 at drivers/gpu/drm/i915/intel_display.c:9494 intel_check_page_flip+0x117/0x130()
[ 89.523363] Kicking stuck page flip: queued at 5336, now 5341
[ 89.523364] Modules linked in: fuse af_packet nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal snd_hda_intel intel_powerclamp snd_hda_controller coretemp snd_hda_codec snd_hwdep snd_pcm kvm snd_seq snd_timer snd_seq_device keucr(C) crct10dif_pclmul crc32_pclmul crc32c_intel snd ghash_clmulni_intel usb_storage aesni_intel ablk_helper iTCO_wdt cryptd r8169 iTCO_vendor_support lrw gf128mul mei_me glue_helper mii aes_x86_64 pcspkr soundcore mei sr_mod serio_raw cdrom lpc_ich shpchp mfd_core i2c_i801 tpm_tis wmi tpm battery sg dm_mod efivarfs fan thermal processor scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh
[ 89.523394] CPU: 2 PID: 2623 Comm: konsole Tainted: G C 3.15.0-rc7-2.g0de0f93-desktop+ #1
[ 89.523395] Hardware name: Hewlett-Packard HP Pro 3500 Series/2ABF, BIOS 8.13 01/11/2013
[ 89.523396] 0000000000000009 ffff88021fb03d80 ffffffff816e80cc ffff88021fb03dc8
[ 89.523398] ffff88021fb03db8 ffffffff8105b02d ffff8802124f9000 ffff88021247c000
[ 89.523399] ffff88021247c198 0000000000000086 0000000000000009 ffff88021fb03e18
[ 89.523401] Call Trace:
[ 89.523402] <IRQ> [<ffffffff816e80cc>] dump_stack+0x4d/0x6f
[ 89.523409] [<ffffffff8105b02d>] warn_slowpath_common+0x7d/0xa0
[ 89.523410] [<ffffffff8105b09c>] warn_slowpath_fmt+0x4c/0x50
[ 89.523414] [<ffffffff81493309>] ? __intel_pageflip_stall_check+0xc9/0x140
[ 89.523416] [<ffffffff814a1e17>] intel_check_page_flip+0x117/0x130
[ 89.523418] [<ffffffff8147a06b>] ironlake_irq_handler+0x15b/0xdf0
[ 89.523420] [<ffffffff8153e76c>] ? __usb_hcd_giveback_urb+0x9c/0x100
[ 89.523423] [<ffffffff810b793e>] handle_irq_event_percpu+0x3e/0x1f0
[ 89.523424] [<ffffffff810b7b33>] handle_irq_event+0x43/0x60
[ 89.523426] [<ffffffff810ba811>] handle_edge_irq+0x91/0x160
[ 89.523428] [<ffffffff81004d61>] handle_irq+0x21/0x30
[ 89.523431] [<ffffffff816fb93f>] do_IRQ+0x4f/0xf0
[ 89.523433] [<ffffffff816f1b6d>] common_interrupt+0x6d/0x6d
[ 89.523434] <EOI> [<ffffffff816f972d>] ? system_call_fastpath+0x1a/0x1f
[ 89.523436] ---[ end trace ff48f125529813ef ]---
I also took a picture of gpu_top and gpu-overlay. Xorg waits showed some extreme numbers for a few moments but were 300-500ms most of the time.
Ugh. That's really, really bad. It says that the render engine on the GPU is stalling for ages upon ages waiting for memory access. I suspect the reason why UXA is not so bad in this case is that it simply doesn't use the GPU that much... If you repeat this experiment with Option "AccelMethod" "blt", how does it fare?

intel-gpu-overlay from i-g-t-1.7 should show the frequencies with the recent kernels.

It reminds me of severe TLB miss: Can you please try

diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c
index ce137bd..b08dc5c 100644
--- a/src/sna/gen7_render.c
+++ b/src/sna/gen7_render.c
@@ -1295,6 +1295,11 @@ gen7_bind_bo(struct sna *sna,
 		return offset * sizeof(uint32_t);
 	}

+	if (bo->tiling == I915_TILING_NONE && bo->pitch >= 4096) {
+		ErrorF("slow %s access to linear bo with pitch %d (handle=%d, size=%d), scanout? %d, io? %d\n",
+		       is_dst ? "write" : "read", bo->pitch, bo->handle, kgem_bo_size(bo), bo->scanout, bo->io);
+	}
+
 	offset = sna->kgem.surface -= sizeof(struct gen7_surface_state) / sizeof(uint32_t);
 	ss = sna->kgem.batch + offset;

and see if that warns about anything.

Created attachment 100819 [details]
intel-gpu-overlay 1.7 output + intel_gpu_top + Diablo 3
I am testing with my own kernel now, 3.14.4. Yours hangs the PC completely when Diablo 3 is running.
Frequency is maxed out when I scroll. With Diablo it is always maxed out. So it is 1100MHz in both cases. Everything else looks different.
I didn't see any warnings after your patch in dmesg or Xorg.log. Should I compile a debug version?
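The condition in the diagnostic patch above flags any untiled (linear) buffer whose row pitch is 4096 bytes or more. As a rough illustration of which surfaces would qualify, here is a stand-alone sketch. It assumes 4 bytes per pixel and no pitch padding, and `linear_pitch`, `is_slow_linear` and `SKETCH_TILING_NONE` are hypothetical names introduced for this example, not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's I915_TILING_NONE, defined here only so the
 * sketch compiles without the DRM headers. */
enum { SKETCH_TILING_NONE = 0 };

/* Bytes per row for an unpadded linear surface. Real drivers may round the
 * pitch up for alignment; this sketch does not. */
static uint32_t linear_pitch(uint32_t width_px, uint32_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);
	return width_px * bytes_per_pixel;
}

/* Same condition as the diagnostic patch: an untiled surface whose rows are
 * 4096 bytes or longer. */
static bool is_slow_linear(uint32_t tiling, uint32_t pitch)
{
	return tiling == SKETCH_TILING_NONE && pitch >= 4096;
}
```

Under those assumptions a 1920-pixel-wide linear surface has a pitch of 7680 bytes and would be flagged, while a 1000-pixel-wide one (4000 bytes) would not. The threshold sits at exactly 1024 pixels. A tiled surface is never flagged by this check, whatever its pitch.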
(In reply to comment #56)
> Created attachment 100819 [details]
> intel-gpu-overlay 1.7 output + intel_gpu_top + Diablo 3
>
> I tests with my kernel now 3.14.4. Yours one hangs PC completely when Diablo
> 3 is running.

That is scary. Would be nice to know if it oopsed.

> Frequency is maxed out when i scroll. with Diablo it is maxed out always. So
> 1100Mhz in both cases. Everything else seems differently.
>
> I didnt see any warnings after your patch in dmesg or Xorg.log. Should i
> compile debug version?

No. It will print out into the logfile (and to stderr) if it spots anything egregious. So back to the drawing board.

(In reply to comment #57)
> That is scary. Would be nice to know if it oopsed.

I didn't find anything about that in /var/log/messages.

> No. It will print out into the logfile (and to stderr) if it spots anything
> egregious. So back to the drawing board.

And where is that logfile? I can't find those messages anywhere. I assume the condition never matched.

I continue to play with your kernel. The last time I tried to run it, it couldn't start X because i915 failed to open VT7 o_O

It used to be in /var/log/Xorg.0.log. If you are using systemd there is an arcane spell to retrieve it. The VT bug is not new, the only thing that is new is that kernel detects it and X dies instead of it hanging indefinitely.

Something new in Xorg.0.log:

(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/Xorg (xorg_backtrace+0x48) [0x58dd48]
(EE) 1: /usr/bin/Xorg (mieqEnqueue+0x22b) [0x56fc5b]
(EE) 2: /usr/bin/Xorg (QueuePointerEvents+0x52) [0x454632]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fc777f9b000+0x54c0) [0x7fc777fa04c0]
(EE) 4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fc777f9b000+0x593a) [0x7fc777fa093a]
(EE) 5: /usr/bin/Xorg (0x400000+0x7aa58) [0x47aa58]
(EE) 6: /usr/bin/Xorg (0x400000+0xa3800) [0x4a3800]
(EE) 7: /lib64/libc.so.6 (0x7fc77f049000+0x358d0) [0x7fc77f07e8d0]
(EE) 8: /lib64/libc.so.6 (ioctl+0x7) [0x7fc77f1291e7]
(EE) 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fc77baa7000+0x232dc) [0x7fc77baca2dc]
(EE) 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fc77baa7000+0x2ba3c) [0x7fc77bad2a3c]
(EE) 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fc77baa7000+0x3b7bd) [0x7fc77bae27bd]
(EE) 12: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fc77baa7000+0x44d4b) [0x7fc77baebd4b]
(EE) 13: /usr/bin/Xorg (0x400000+0x17ea4b) [0x57ea4b]
(EE) 14: /usr/bin/Xorg (0x400000+0xc8a35) [0x4c8a35]
(EE) 15: /usr/bin/Xorg (0x400000+0x3abd6) [0x43abd6]
(EE) 16: /usr/bin/Xorg (0x400000+0x3d6ce) [0x43d6ce]
(EE) 17: /usr/bin/Xorg (0x400000+0x414ea) [0x4414ea]
(EE) 18: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7fc77f06abe5]
(EE) 19: /usr/bin/Xorg (0x400000+0x2cb41) [0x42cb41]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
(EE) [mi] EQ overflow continuing. 100 events have been dropped.

Here we are:

[ 852.533] slow read access to linear bo with pitch 5464 (handle=697, size=1642496), scanout? 0, io? 0
[ 852.551] slow read access to linear bo with pitch 5464 (handle=697, size=1642496), scanout? 0, io? 0
[ 852.567] slow read access to linear bo with pitch 5464 (handle=697, size=1642496), scanout? 0, io? 0

The thing is, these messages never appear when I scroll in Konsole. That is why I didn't see them.
Running OpenSuSE YaST triggers them, but it works fine.

(In reply to comment #54)
> If you repeat this experiment with Option "AccelMethod" "blt", how does it
> fare?

No bug with Option "AccelMethod" "blt". But drawing is broken in some places. Gtkperf can't even draw its results properly.

Hmm, did not see gtkperf fail, but I did see some unhappiness with PutImage, fixed by

commit 53c1faa5093f8a0c969e3ebe9f27ee1ce5137e14
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jun 11 07:59:41 2014 +0100

sna: Mark the GPU as all damaged when discarding CPU bo during uploads

Do you have any more failures with BLT?

Created attachment 100870 [details]
Screen garbage with blt

It seems that all GTK applications are affected except Firefox (which is barely GTK these days): gtkperf, mysql-workbench, GIMP. If I use Xrender for compositing, the area where the left panel drops its shadow sometimes forgets to update.

Also http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/ results dropped from 24-25fps to 14-15 in Firefox 33.0a1

(In reply to comment #64)
> Also http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/
> results dropped from 24-25fps to 14-15 in Firefox 33.0a1

As a result of BLT, or with normal acceleration? BLT is a fallback acceleration in case where there is no render acceleration for that GPU. (But it also provides acceleration for legacy X drawing using the dedicated hw.)

(In reply to comment #65)
> As a result of BLT, or with normal acceleration?

As a result of BLT.

Btw.
Lastest commits made Xorg crash with BLT: [ 74651.198] (EE) Backtrace: [ 74651.198] (EE) 0: /usr/bin/Xorg (xorg_backtrace+0x48) [0x58dd48] [ 74651.198] (EE) 1: /usr/bin/Xorg (0x400000+0x191d89) [0x591d89] [ 74651.198] (EE) 2: /lib64/libc.so.6 (0x7f700c256000+0x358d0) [0x7f700c28b8d0] [ 74651.198] (EE) 3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f7008cb4000+0x5ce6a) [0x7f7008d10e6a] [ 74651.198] (EE) 4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f7008cb4000+0x452c2) [0x7f7008cf92c2] [ 74651.198] (EE) 5: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f7008cb4000+0x45b59) [0x7f7008cf9b59] [ 74651.198] (EE) 6: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f7008cb4000+0x5abb3) [0x7f7008d0ebb3] [ 74651.198] (EE) 7: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f7008cb4000+0x5ef37) [0x7f7008d12f37] [ 74651.198] (EE) 8: /usr/bin/Xorg (0x400000+0x117fd2) [0x517fd2] [ 74651.198] (EE) 9: /usr/bin/Xorg (0x400000+0x10e1ea) [0x50e1ea] [ 74651.198] (EE) 10: /usr/bin/Xorg (0x400000+0x3d6ce) [0x43d6ce] [ 74651.198] (EE) 11: /usr/bin/Xorg (0x400000+0x414ea) [0x4414ea] [ 74651.198] (EE) 12: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7f700c277be5] [ 74651.198] (EE) 13: /usr/bin/Xorg (0x400000+0x2cb41) [0x42cb41] [ 74651.198] (EE) [ 74651.198] (EE) Segmentation fault at address 0x82c000 [ 74651.198] (EE) Fatal server error: [ 74651.198] (EE) Caught signal 11 (Segmentation fault). Server aborting Exciting. Please run addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so -i 0x5ce6a 0x452c2 /home/absorbb/Documents/xf86-video-intel/src/sna/sna_blt.c:3796 (discriminator 1) /home/absorbb/Documents/xf86-video-intel/src/sna/sna_accel.c:3295 If I understand correctly, the only way that could happen was if the damage was empty. Hmm. I couldn't see a way we would make that mistake, it would cause various issues (but shouldn't actually affect rendering) as the assumption is that no damage is tracked by the NULL pointer. 
This is hopefully the cause of your crash: commit a90160dcecf0a3df21a04b4f467e660f69ddae54 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 11 15:40:25 2014 +0100 sna: Update damage pointer after the implicit reduction sna_damage_contains_box() routine implicitly reduces the damage before performing its check. This may alter and even destroy the damage entry, so pass in the handle so that it can be updated correctly. References: https://bugs.freedesktop.org/show_bug.cgi?id=77436 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (In reply to comment #70) > This is hopefully the cause of your crash: Yep. No crashes now. This KDE Konsole bug was also fixed. Until this commit: sna: Remove debugging hunk Now i have it again :( Haha! Ok, I have no idea yet, but at least we have narrowed down the exact code path that is troublesome! I've rolled back to a90160dcecf0a3df21a04b4f467e660f69ddae54. And can confirm that it works fine. No significant slowdowns noticed. Maybe gtkperf is slower by 1-2% - hard to say. We are going to have 4-day holidays here. Next time i be able to test smth at Monday. 
I think we will want to test next: diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c index 3fcca3e..ad6ea32 100644 --- a/src/sna/gen7_render.c +++ b/src/sna/gen7_render.c @@ -2965,7 +2965,8 @@ fallback_blt: if (too_large(extents.x2-extents.x1, extents.y2-extents.y1)) goto fallback_blt; - if (can_switch_to_blt(sna, dst_bo, flags) && + if ((sna->render_state.gen7.gt == 0 || + can_switch_to_blt(sna, dst_bo, flags)) && sna_blt_copy_boxes(sna, alu, src_bo, src_dx, src_dy, dst_bo, dst_dx, dst_dy, Sorry, that should be diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c index 3fcca3e..d1115b9 100644 --- a/src/sna/gen7_render.c +++ b/src/sna/gen7_render.c @@ -2965,7 +2965,8 @@ fallback_blt: if (too_large(extents.x2-extents.x1, extents.y2-extents.y1)) goto fallback_blt; - if (can_switch_to_blt(sna, dst_bo, flags) && + if ((sna->render_state.gen7.gt < 2 || + can_switch_to_blt(sna, dst_bo, flags)) && sna_blt_copy_boxes(sna, alu, src_bo, src_dx, src_dy, dst_bo, dst_dx, dst_dy, I pushed a variant as commit 1beaa980ea6a9617f7dd4dc87b881c37cc7a277b Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 11 16:41:22 2014 +0100 sna/gen6+: On small GT avoid the intermediate copy for overlaps If the memory bw is constrained on the GPU avoid doing the 2-pass copy for overlaps on the render ring, and do the single pass slower BLT copy instead - as since it has to transfer less data it will be faster. Reported-by: Ildar Nurislamov <absorbb@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77436 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> If that makes scrolling fluid, great. I'd like to dig more into why it slowed down so much though. My guess is due to cache thrashing on the GPU and memory bandwidth, but I don't feel that adequately explains how far behind the GPU got in the command stream, nor how it interacted with a loaded system. Current git version has the bug. Drat. I thought I had understood the failure path. 
:| If you try: diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c index 3fcca3e..f0e68a2 100644 --- a/src/sna/gen7_render.c +++ b/src/sna/gen7_render.c @@ -2858,6 +2858,8 @@ prefer_blt_copy(struct sna *sna, struct kgem_bo *dst_bo, unsigned flags) { + return true; + if (sna->kgem.mode == KGEM_BLT) return true; does that prevent the lag? And in particular, diff --git a/src/sna/gen7_render.c b/src/sna/gen7_render.c index 3fcca3e..41950ed 100644 --- a/src/sna/gen7_render.c +++ b/src/sna/gen7_render.c @@ -2965,8 +2965,7 @@ fallback_blt: if (too_large(extents.x2-extents.x1, extents.y2-extents.y1)) goto fallback_blt; - if (can_switch_to_blt(sna, dst_bo, flags) && - sna_blt_copy_boxes(sna, alu, + if (sna_blt_copy_boxes(sna, alu, src_bo, src_dx, src_dy, dst_bo, dst_dx, dst_dy, dst->drawable.bitsPerPixel, Nope. 1nd alone and 1nd and 2nd together don't help. Actually, it is not quite as cut and dried as I earlier thought. If we force the fallback there, we move the entire pixmap to the CPU which encourages the next set of operations to be done on the CPU as well. So rather than just the copy operation being suspect, we prevent loading the GPU at all. Sigh. What happens with? diff --git a/src/sna/sna_glyphs.c b/src/sna/sna_glyphs.c index 5d8dd78..df9e24c 100644 --- a/src/sna/sna_glyphs.c +++ b/src/sna/sna_glyphs.c @@ -67,7 +67,7 @@ #include "sna_render_inline.h" #include "fb/fbpict.h" -#define FALLBACK 0 +#define FALLBACK 1 #define NO_GLYPH_CACHE 0 #define NO_GLYPHS_TO_DST 0 #define NO_GLYPHS_VIA_MASK 0 What happens with? diff --git a/src/sna/gen6_common.h b/src/sna/gen6_common.h index 6668620..1277804 100644 --- a/src/sna/gen6_common.h +++ b/src/sna/gen6_common.h @@ -106,6 +106,8 @@ static int prefer_blt_bo(struct sna *sna, struct kgem_bo *bo) inline static bool force_blt_ring(struct sna *sna) { + return true; + if (sna->flags & SNA_POWERSAVE) return true; The other approach would be to bisect to find the slowdown. 1st patch helps. 2nd patch don't. 
I tested them separately. Should i apply them together? (In reply to comment #86) > 1st patch helps. > 2nd patch don't. > I tested them separately. Should i apply them together? Nope. They were different tress I was barking at. This is slightly more conclusive that is the glyph rendering that is producing too much load. First can you quickly test master as I tweaked placement of glyph operations. Then check diff --git a/src/sna/sna_glyphs.c b/src/sna/sna_glyphs.c index e8d92b8..5350eb0 100644 --- a/src/sna/sna_glyphs.c +++ b/src/sna/sna_glyphs.c @@ -70,7 +70,7 @@ #define FALLBACK 0 #define NO_GLYPH_CACHE 0 #define NO_GLYPHS_TO_DST 0 -#define NO_GLYPHS_VIA_MASK 0 +#define NO_GLYPHS_VIA_MASK 1 #define NO_SMALL_MASK 0 #define NO_GLYPHS_SLOW 0 #define NO_DISCARD_MASK 0 and then diff --git a/src/sna/sna_glyphs.c b/src/sna/sna_glyphs.c index e8d92b8..f7b8607 100644 --- a/src/sna/sna_glyphs.c +++ b/src/sna/sna_glyphs.c @@ -71,7 +71,7 @@ #define NO_GLYPH_CACHE 0 #define NO_GLYPHS_TO_DST 0 #define NO_GLYPHS_VIA_MASK 0 -#define NO_SMALL_MASK 0 +#define NO_SMALL_MASK 1 #define NO_GLYPHS_SLOW 0 #define NO_DISCARD_MASK 0 Master: nope NO_GLYPHS_VIA_MASK 1: helps NO_SMALL_MASK 1: nope FALLBACK and NO_GLYPHS_VIA_MASK regresses gtkperf ~5%. But i suppose no surprise here. On the off-chance that it helps: commit 62102f505cd13840e4c910adbe762b3fb46dfaec Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jun 16 09:46:54 2014 +0100 sna: Promote better active buffer reuse Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> And another glyph test... 
Update to commit 10cb36e1ddc6f4bf41941b24d6557343aa595a13 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jun 16 10:33:07 2014 +0100 sna/glyphs: Add DBG option to force use of the glyph image mask Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Then apply diff --git a/src/sna/sna_glyphs.c b/src/sna/sna_glyphs.c index 1ce7170..f433573 100644 --- a/src/sna/sna_glyphs.c +++ b/src/sna/sna_glyphs.c @@ -71,7 +71,7 @@ #define NO_GLYPH_CACHE 0 #define NO_GLYPHS_TO_DST 0 #define NO_GLYPHS_VIA_MASK 0 -#define FORCE_SMALL_MASK 0 /* -1 = never, 1 = always */ +#define FORCE_SMALL_MASK 1 /* -1 = never, 1 = always */ #define NO_GLYPHS_SLOW 0 #define NO_DISCARD_MASK 0 Compiled 6b82962e58c0959c94f2df1f0ebd9a478b15cb6b. Still has the bug. But i noticed with latest master versions that lag duration is lower across the board but not significantly. Going to try FORCE_SMALL_MASK... It didn't help Played with font size. Droid Sans Mono - 5pt still has the bug. 4pt - no bug. i made a script that fills entire console area with some text - still no bug ouch. with this script output no bug even with a larger font size. it seems like it has something to do with glyph cash. dmesg has a large variety of symbols. but text that i use in script doesn't - i just repeat some text paragraph in a loop. nope. i made a script that print entire dmesg output but with no line breaks. it works fast event with 7pt. When original dmesg lags as usual. Sorry for such an intensive flooding. But it is getting weirder. Now i have almost fluent scrolling even with 8pt font size with both dmesg and dmesg with no breaks output. All i did was reseting the console. No worries, the information is helpful. If you find a good script, I'll give it a go as well. At the moment, I am struggling to make konsole step out of the bottom GPU frequency whilst scrolling through several dmesgs. Didn't you forget about "-graphicssystem native"? At 8pt font size results are not stable. 
Sometimes in can be fast like there is no bug at all. And sometimes it lags. I caught a moment when even 9pt was fluent for a moment. But nevertheless 8pt seems like a threshold. It seems switching to Mozilla and back improves a situation for a some time. Looks like current master has some improvements. At the very beginning i haven't noticed any dependencies on font size. And i use freetype2 with infinality patches if that matters. Going to test with original. The same with original freetype2 It did indeed take me a moment to remember to add -graphicssystem native, and that did make it cleaner but still the GPU could render at 60fps whilst at lowest frequency. (An ivb gt1 celeron.) Do the characteristics of the lag remain the same -- very longs waits whilst stalling for memory (GAM)? Im not sure. Situating is changing continuously. I mean i forgot which font i used before. But i see now that it depends on font too: DejaVu Sans Mono is much faster then Droid Sans Mono or Liberation Mono. And depends on composting engine also:
Now I'm on DejaVu Sans Mono 9pt, and XRender lags while OpenGL is fast.
>Do the characteristics of the lag remain the same -- very long waits whilst stalling for memory (GAM)?
Looks like it, but it is harder to achieve, and the waits (in ms) are shorter.
Hm. I hit a total hang, maybe because of using OpenGL or gpu-overlay, so I had to restart. After the restart everything is a little bit slower again. It seems I need to do all the window-switching and font-changing magic again.
(In reply to comment #102) > Hm. I caught total hang maybe because of using OpenGL or gpu-overlay so i > had to restart. intel-gpu-overlay or intel-gpu-top will cause a total system hang if used against stock kernels (and intel-gpu-top requires another patch I have to be safe). I'll wait patiently until you have a nasty reproducible case again. Still have to solve why memory spontaneously? becomes the bottleneck. I still can reproduce it even it little bit less nasty than before. There are conditions that lower severity of lags (like font and composition engine) and many changes in master version showed little improvements. What seemed like a big relief in my last comments is actually shaky and depends on magic or solar activity. But bug is still there and i'm ready to continue testing things. Let's attack the other angle for a bit. Can you trigger the bad behaviour at all without TearFree enabled? Same as usual. Smooth rendering but growing delay. Only speed of delay growth is lower. Oh, ok. TearFree is not a critical component here, it just generates a wait that triggers unhappiness. I was about to ask you to try commit a4d845bc8f77e4a36064f132720ce462d0bdd5b4 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jun 16 14:25:16 2014 +0100 sna: Regularly check the cache level on bo Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> with --enable-debug, but I don't that is going to be insightful. I can't compile it anyway: sna_glyphs.c: In function 'clipped_glyphs': sna_glyphs.c:561:2: error: invalid use of void expression DBG(("%s? 
%d glyph in %d lists extents (%d, %d), (%d, %d), region (%d, %d), (%d, %d): %s\n", ^ sna_glyphs.c: In function 'glyphs_via_mask': sna_glyphs.c:1130:2: error: invalid use of void expression DBG(("%s: nlist=%d, count=%d, bounds=((%d, %d), (%d, %d))\n", __FUNCTION__, ^ sna_glyphs.c: In function 'glyphs_fallback': sna_glyphs.c:1628:2: error: invalid use of void expression DBG(("%s: nlist=%d, count=%d, extents (%d, %d), (%d, %d)\n", __FUNCTION__, ^ sna_glyphs.c: In function 'glyphs_via_image': sna_glyphs.c:2054:2: error: invalid use of void expression DBG(("%s: nlist=%d, count=%d, bounds=((%d, %d), (%d, %d))\n", __FUNCTION__, (In reply to comment #108) > I can't compile it anyway: Oops fixed. First try --enable-debug not --enable-debug=full, the later will slow your machine down considerably and probably mask the bug in the process. --enable-debug doesn't produce any extra output at all. It must be in Xorg.0.log correct? It should just generate one extra line: [ 42.772] (II) intel(0): SNA compiled with assertions enabled and then abort if it detects an error. This line exists. Apparently there are no errors https://dl.dropboxusercontent.com/u/42927427/Xorg.0.log.tar.bz2 enable-debug=full as it was before - with debug scrolling is faster. tried to find operations that took at least 1ms is [157786.128] glyphs_format: overlapping glyph inside line, current bbox (54, 290), (72, 299), glyph (70, 292), (79, 299) a bad thing? (In reply to comment #113) > https://dl.dropboxusercontent.com/u/42927427/Xorg.0.log.tar.bz2 > enable-debug=full > as it was before - with debug scrolling is faster. Which is unfortunate as it means we don't get to see the true hog. Thanks, I'll look at that later. > tried to find operations that took at least 1ms > is > [157786.128] glyphs_format: overlapping glyph inside line, current bbox (54, > 290), (72, 299), glyph (70, 292), (79, 299) > a bad thing? It's fairly normal. It means that the glyphs bounding box overlaps (e.g. 
kerning, but it is usually due to subpixel filtering blurring the glyphs larger than designed) and prevents us from taking a fast path (due to Render semantics). It's only about 5x slower... Created attachment 101284 [details]
Screen garbage with latest master
New thing. It is not always like this; maybe a few times per minute.
(In reply to comment #115) > Created attachment 101284 [details] > Screen garbage with latest master > > New thing. It not always like this. Few times per minute maybe Which commit are you running? > Which commit are you running?
That was HEAD at the time I made this comment. Tomorrow I will be more specific.
Just wondering if was before commit 62aaf2ff4f8597067cf387865707baa00ed9a123 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 18 18:41:15 2014 +0100 sna: Skip redundant copies when already cloned References: https://bugs.freedesktop.org/show_bug.cgi?id=77436 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> which obviously I didn't think through very well, and had to fixup. The dangers of trying to work on multiple problems at once. commit 51a0559b120dd0a57a938b300bccefbf1142e495 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 18 20:55:02 2014 +0100 sna: Do not treat both src/dst having no clones as being the same Fixes regression from commit 62aaf2ff4f8597067cf387865707baa00ed9a123 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 18 18:41:15 2014 +0100 sna: Skip redundant copies when already cloned Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk It was before both of them and it still present in c257c936b42b92827b784cee0b7caa54e6040364. So i've made a separate bug: https://bugs.freedesktop.org/show_bug.cgi?id=80215 Maybe something like: diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c index 9c24ea6..6835bda 100644 --- a/src/sna/sna_display.c +++ b/src/sna/sna_display.c @@ -6495,6 +6495,7 @@ fixup_flip: mode = KGEM_BLT; DBG(("%s: marking flip bo as busy [%x -> mode=%d]\n", __FUNCTION__, busy.busy, mode)); kgem_bo_mark_busy(&sna->kgem, flip_bo, mode); + sna->kgem.need_throttle = true; } else __kgem_bo_clear_busy(flip_bo); } @@ -6511,6 +6512,11 @@ fixup_flip: assert(old->refcnt >= 1); set_shadow(sna, region); } + + if (sna->kgem.need_throttle) { + drmCommandNone(sna->kgem.fd, DRM_IOCTL_I915_GEM_THROTTLE); + sna->kgem.need_throttle = false; + } } else kgem_submit(&sna->kgem); Something that's been bothering me is that I think we end up not calling the BlockHandler often enough. That's called after a client uses up his time slice or has no more commands to execute. 
diff --git a/src/sna/sna_driver.c b/src/sna/sna_driver.c index e0995ce..4eaf1d5 100644 --- a/src/sna/sna_driver.c +++ b/src/sna/sna_driver.c @@ -719,6 +719,9 @@ sna_block_handler(BLOCKHANDLER_ARGS_DECL) sna->BlockHandler(BLOCKHANDLER_ARGS); + ErrorF("%s (tv=%ld.%06ld), shadow?=%d\n", __FUNCTION__, + *tv ? (*tv)->tv_sec : -1, *tv ? (*tv)->tv_usec : 0, has_shadow(sna)); + if (*tv == NULL || ((*tv)->tv_usec | (*tv)->tv_sec) || has_shadow(sna)) sna_accel_block_handler(sna, tv); } Would give a rough indication of how often we are called, and how often we would proceed. The complication here is that whilst TearFree is active, continuation is done immediately after the flip. On that front, "trace-cmd record -e drm -e i915" during the lag would be useful. Attach the output of "trace-cmd report". (See http://git.kernel.org/cgit/linux/kernel/git/rostedt/trace-cmd.git) Last commits already fixed the bug. Maybe its not always 60fps but i spent some more time testing to insure that there are no more lags. Should i apply the patch? Fixed only for TearFree. Without it delay still increases. And as result "no tearfree" + "kwin opengl vsync" still lags The commit supersedes comment 120 - I thought a little more and decided that catching it when we were rendering faster than flipping was a more useful throttle point. Let me know how this feels, and once you have a better handle on the behaviour we can think about how to improve it further. (I.e. try to tackle the root cause rather than fashion better bandaids.) Indeed, the throttling was only added to TearFree. Without TearFree is still an issue, so working out where the missing throttle is will be important. It should be hitting the BlockHandler and from there throttling at least every 40ms. (In reply to comment #126) > It should be hitting the BlockHandler and from there throttling at least every > 40ms. I think that is why i wrote "Maybe its not always 60fps". I can feel it. I used Kwin Show Fps plugin. fps if quite high ~50. 
But there are red spikes on the graph that appears only when i scroll konsole. It represents frame draw time. It become red in 20ms-50ms range - if i read kwin sources correctly. Is it possible to throttle for 15ms and not more? It's a balance between render lag and input lag. If we forgo throttling, we can build long streams of render commands on the GPU, and the display lags behind. If we wait for the GPU for too long, we stop sending clients updates. I have two problems here: 1. The throttling should already exist to prevent the huge +2s stalls. Unless it is just a single massive batch constructed within one throttle period that generates the massive stall. trace-cmd should help identify if that is the case. 2. There is no way that we should be able to queue more than the GPU can handle in the first place, certainly not +2s! So something is not quite right with the command stream. This is why I have been trying to spot likely culprits such as using large untiled buffers. Time for something different: diff --git a/src/sna/kgem.c b/src/sna/kgem.c index 7a7e2f2..52e78f2 100644 --- a/src/sna/kgem.c +++ b/src/sna/kgem.c @@ -3369,6 +3369,9 @@ void _kgem_submit(struct kgem *kgem) kgem_reset(kgem); + if (kgem->scanout_busy) + isItTimeToYield = TRUE; + assert(kgem->next_request != NULL); } Created attachment 101742 [details]
trace-cmd report
It seems nothing has changed
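A trace like this can be boiled down to per-frame intervals to see how often a frame misses a refresh. The following is only a minimal sketch: the timestamps are a handful of the event times quoted in this thread, the 60 Hz refresh rate is an assumption, and the helper names are made up for illustration.

```python
# Event timestamps in seconds, copied from the per-frame summary quoted
# in this thread. Only a short excerpt is used here.
FLIP_TIMES = [
    14338.958318,
    14338.974979,
    14339.008323,
    14339.058291,
    14339.074952,
]

REFRESH_HZ = 60.0  # assumption: the display refreshes at 60 Hz


def frame_intervals(times):
    """Return the gap in seconds between consecutive events."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def vblanks_per_frame(times, refresh_hz=REFRESH_HZ):
    """Round each gap to a whole number of refresh periods."""
    return [round(gap * refresh_hz) for gap in frame_intervals(times)]


def late_frames(times, refresh_hz=REFRESH_HZ):
    """Count frames that took more than one refresh period."""
    return sum(1 for n in vblanks_per_frame(times, refresh_hz) if n > 1)


if __name__ == "__main__":
    gaps = frame_intervals(FLIP_TIMES)
    for gap, n in zip(gaps, vblanks_per_frame(FLIP_TIMES)):
        print(f"interval={gap * 1000:.3f} ms  fps={1.0 / gap:.2f}  vblanks={n}")
    print("late frames:", late_frames(FLIP_TIMES))
```

A gap of two or three refresh periods is what shows up as a drop to roughly 30 or 20 fps in the summary.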
That's mostly 60fps, with a few periods where it dropped to 32ms and even 48ms between the flip-request and the actual flip.

14338.924991: render=0.0165519999991375 fps=60.092542514884
14338.941653: render=0.0166360000002896 fps=60.0168047053296
14338.958318: render=0.0166340000014316 fps=60.0060005964152
14338.974979: render=0.0166399999998248 fps=60.0204069395908
14339.008323: render=0.0333019999998214 fps=29.9904030714778
14339.058291: render=0.0499429999999847 fps=20.0128081975053
14339.074952: render=0.0166440000011789 fps=60.0204069330379
14339.091615: render=0.0166379999991477 fps=60.0132029099832
14339.108294: render=0.016658999998981 fps=59.955632830645
14339.124943: render=0.016631000000416 fps=60.0636674837393
14339.141607: render=0.016646000000037 fps=60.0096015403702
14339.158289: render=0.0166600000011385 fps=59.944850732613
14339.174930: render=0.0166149999986374 fps=60.0925425214526
14339.191595: render=0.0166370000006282 fps=60.0060005964152
14339.224940: render=0.0330740000008518 fps=29.9895036738702
14339.274915: render=0.0470879999993485 fps=20.0100050025397
14339.291575: render=0.0166399999998248 fps=60.0240096062931
14339.324906: render=0.03326399999969 fps=30.0021001465206
14339.341564: render=0.0166119999994407 fps=60.0312162307778
14339.358221: render=0.016631000000416 fps=60.0348201952704
14339.374882: render=0.0166349999999511 fps=60.0204069395908
14339.391549: render=0.0166420000005019 fps=59.9988000244664
14339.408213: render=0.0165100000012899 fps=60.0096015338197
14339.441539: render=0.0333039999986795 fps=30.0066014533538
14339.491547: render=0.0498279999992519 fps=19.9968005118534
14339.508191: render=0.00871700000061537 fps=60.0817111228772
14339.524856: render=0.016646000000037 fps=60.0060006029649
14339.558187: render=0.0332960000014282 fps=30.0021001465206
14339.574842: render=0.0102200000001176 fps=60.0420294225921

Which says that the slowdown is due to the GPU load and not latency in preparing the next flip.

I scroll manually.
so i could pause for a moment. And i was changing a font in konsole in the middle of the trace. second half of it made with liberation - and kwin's show fps showed spikes of >50ms screen draws. Any idea why you couldn't reproduce it on your machine? The big difference is that on this machine I haven't been able to saturate the GPU at 100% busy let alone 100% busy at max frequency. It's only an Ivybridge Celeron with GT1 graphics, so not a powerhouse, but on the other hand only 1366x768. Looking through the trace-cmd, I still haven't had that eureka moment of spotting the beginning of the snowball, let alone understand the cause. There are some long throttling periods which corresponds to the low framerate, but the commands running up to those do not look absurd. There is the tendency for the batch buffer size to grow when loaded, as expected though, and I don't think scary. What I do see though is that I perhaps neutered the regular throttling too much. So, commit 5ad1661b870b5b189e52383e45a9b8f570b40501 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 25 13:21:58 2014 +0100 sna: Only mark throttle as done after success References: https://bugs.freedesktop.org/show_bug.cgi?id=77436 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> should be a step in the right direction. didn't notice any changes. i rolled back the last path, right? Yup. :( As a final stab in this direction, please try: diff --git a/src/sna/kgem.c b/src/sna/kgem.c index 50a7da2..3058086 100644 --- a/src/sna/kgem.c +++ b/src/sna/kgem.c @@ -967,7 +967,7 @@ static bool __kgem_throttle(struct kgem *kgem) * too much and completely starve X. We will sleep again shortly, * and so catch up or detect the hang. 
*/ - if (ioctl(kgem->fd, DRM_IOCTL_I915_GEM_THROTTLE) == 0) { + if (drmIoctl(kgem->fd, DRM_IOCTL_I915_GEM_THROTTLE) == 0) { kgem->need_throttle = 0; return false; } kgem.c:970:2: error: too few arguments to function 'drmIoctl' if (drmIoctl(kgem->fd, DRM_IOCTL_I915_GEM_THROTTLE) == 0) { ^ Picky. Just give it an extra 0: if (drmIoctl(kgem->fd, DRM_IOCTL_I915_GEM_THROTTLE, 0) == 0) { Hard to say. Maybe red spikes appears less often and not so high or maybe not ) Try without TearFree to see if the long lag periods are gone. Yep. its gone. I maybe feels more snappier without TearFree. im not sure. No lag with kwin openGL vsync either. but performance drop quite huge - to 20-25fps I've restored the throttling and tried to compromise to not over-throttle during allocations: commit ca0d06add926eb17fcec6c031adabb656498d744 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 25 14:19:33 2014 +0100 sna: Improve throttling during bo allocation By controlling how long we may block during buffer allocation, we can relax the throttle elsewhere to prevent render lag buildup. References: https://bugs.freedesktop.org/show_bug.cgi?id=77436 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Can you please check that prevents the lag buildup on both TearFree and non-TearFree setups? Due to the well flips are scheduled in TearFree there can easily be a 20ms delay between render and seeing the results, at best 16ms. That should be enough to make non-TearFree feel snappier. To improve, I would need to submit the next frame just before the vblank to minimise the render output latency. The current solution is nice for its simplicity. :) Yep no lags in both mode. Cant say if there is difference between this commit and the patch though. About snappiness. Actually i doubt that i would ever noticed smth if it was like that in a first time. It simply me looking at the performance side very captiously now. Btw. Will these changes be helpful in any other cases? 
Or is that just a workaround for the buggy Qt XRender backend, so that only I am going to be happy about it?

It's not the only pathological renderer! I still think something is subpar and anomalous about the behavior on your machine, but this bug has gone on long enough...

For reference, can you please run
x11perf -aa10text -aa24text -rgb10text -rgb24text
under the various compositor guises (TearFree, vsync, vsync+TearFree, etc).

For reference on my baby ivb, with tear-free enabled and bare X I get:
Sync time adjustment is 0.0324 msecs.
200000000 trep @ 0.0001 msec (7920000.0/sec): Char in 80-char aa line (Charter 10)
112000000 trep @ 0.0003 msec (3850000.0/sec): Char in 30-char aa line (Charter 24)
120000000 trep @ 0.0003 msec (3280000.0/sec): Char in 80-char rgb line (Charter 10)
32000000 trep @ 0.0011 msec (871000.0/sec): Char in 30-char rgb line (Charter 24)
and without:
Sync time adjustment is 0.0328 msecs.
280000000 trep @ 0.0001 msec (10200000.0/sec): Char in 80-char aa line (Charter 10)
112000000 trep @ 0.0002 msec (4130000.0/sec): Char in 30-char aa line (Charter 24)
120000000 trep @ 0.0003 msec (3400000.0/sec): Char in 80-char rgb line (Charter 10)
32000000 trep @ 0.0011 msec (890000.0/sec): Char in 30-char rgb line (Charter 24)

Bare X + NoTearFree:
120000000 trep @ 0.0002 msec (4080000.0/sec): Char in 80-char aa line (Charter 10)
32000000 trep @ 0.0009 msec (1100000.0/sec): Char in 30-char aa line (Charter 24)
120000000 trep @ 0.0002 msec (4080000.0/sec): Char in 80-char rgb line (Charter 10)
32000000 trep @ 0.0009 msec (1100000.0/sec): Char in 30-char rgb line (Charter 24)
Bare X + TearFree:
120000000 trep @ 0.0002 msec (4040000.0/sec): Char in 80-char aa line (Charter 10)
32000000 trep @ 0.0009 msec (1100000.0/sec): Char in 30-char aa line (Charter 24)
120000000 trep @ 0.0002 msec (4040000.0/sec): Char in 80-char rgb line (Charter 10)
32000000 trep @ 0.0009 msec (1100000.0/sec): Char in 30-char rgb line (Charter 24)
KDE + TearFree + Kwin OpenGL Vsync:
120000000 trep @ 0.0002 msec (4130000.0/sec): Char in 80-char aa line (Charter 10)
32000000 trep @ 0.0009 msec (1110000.0/sec): Char in 30-char aa line (Charter 24)
120000000 trep @ 0.0002 msec (4130000.0/sec): Char in 80-char rgb line (Charter 10)
32000000 trep @ 0.0009 msec (1110000.0/sec): Char in 30-char rgb line (Charter 24)
My CPU is a Core™ i5-3450, ivb too, with Intel® HD Graphics 2500. And our results are nothing alike.

Forgot to mention that 70-90% of the CPU load was iowait.

That's really, really odd. aa10 should be much faster than rgb10. Can you attach Xorg.0.log? I just want to check if you have enabled any kernel features, fbc being the important one.

Created attachment 101786 [details]
Xorg.0.log
Btw, when I ran bare X it printed something to the console. GLYPHS was also mentioned there.
Created attachment 101787 [details]
X output
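The "aa10 should be much faster than rgb10" observation can be checked mechanically from the x11perf result lines. This is a rough sketch, not an official tool: the regular expression is an assumption based on the lines posted in this thread, and `parse_rates`, `aa_over_rgb`, `DEVELOPER` and `REPORTER` are illustrative names.

```python
import re

# Matches the "(<rate>/sec): <label>" tail of an x11perf result line.
# The pattern is an assumption derived from the lines in this thread.
RATE_RE = re.compile(r"\(\s*([\d.]+)/sec\):\s*(.+?)\s*$")

# Developer's machine, bare X without TearFree (lines from this thread).
DEVELOPER = """\
280000000 trep @ 0.0001 msec (10200000.0/sec): Char in 80-char aa line (Charter 10)
120000000 trep @ 0.0003 msec (3400000.0/sec): Char in 80-char rgb line (Charter 10)
"""

# Reporter's machine, bare X without TearFree (lines from this thread).
REPORTER = """\
120000000 trep @ 0.0002 msec (4080000.0/sec): Char in 80-char aa line (Charter 10)
120000000 trep @ 0.0002 msec (4080000.0/sec): Char in 80-char rgb line (Charter 10)
"""


def parse_rates(text):
    """Map each test label to its operations-per-second rate."""
    rates = {}
    for line in text.splitlines():
        match = RATE_RE.search(line)
        if match:
            rates[match.group(2)] = float(match.group(1))
    return rates


def aa_over_rgb(text, font="Charter 10"):
    """Return how many times faster the aa test ran than the rgb test."""
    rates = parse_rates(text)
    aa = next(v for k, v in rates.items() if " aa line" in k and font in k)
    rgb = next(v for k, v in rates.items() if " rgb line" in k and font in k)
    return aa / rgb


if __name__ == "__main__":
    print("developer aa/rgb:", aa_over_rgb(DEVELOPER))
    print("reporter  aa/rgb:", aa_over_rgb(REPORTER))
```

A ratio near 1 means the aa and rgb tests ran at the same speed, which is the anomaly pointed out above.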
Puzzling, didn't spot anything anomalous. (Other than a ugly use of "enabled".) Just to put my concerns to rest, what does /sys/kernel/debug/dri/0/i915_fbc_status report? (In reply to comment #148) > Created attachment 101787 [details] > X output Ah, that's ok. I was worried there for a moment, but that is just some leak debugging output that X does on shutdown/regeneration. # cat /sys/kernel/debug/dri/0/i915_fbc_status FBC disabled: disabled per chip default Let's run through some other highlights: x11perf -shmput500 -shmget500 -copywinwin500 -copywinpix500 -copypixwin500 -comppixwin500 That should more or less tell us how fast memory access is in different modes (reading/writing by the CPU/GPU/display). KDE + TearFree 200000 trep @ 0.1356 msec ( 7380.0/sec): Copy 500x500 from window to window 200000 trep @ 0.1356 msec ( 7370.0/sec): Copy 500x500 from pixmap to window 200000 trep @ 0.1357 msec ( 7370.0/sec): Copy 500x500 from window to pixmap 400000 trep @ 0.0878 msec ( 11400.0/sec): ShmPutImage 500x500 square 400000 trep @ 0.0991 msec ( 10100.0/sec): ShmGetImage 500x500 square Some tests where 100% pure iowait. Now im going to disable intel pstate driver and see what happens.. disabling intel p-state didnt help. This is my memory info btw: # dmidecode --type 17 # dmidecode 2.12 # SMBIOS entry point at 0x000f04c0 SMBIOS 2.7 present. 
Handle 0x0029, DMI type 17, 34 bytes Memory Device Array Handle: 0x0028 Error Information Handle: Not Provided Total Width: 64 bits Data Width: 64 bits Size: 4096 MB Form Factor: DIMM Set: None Locator: DIMM1 Bank Locator: BANK 0 Type: DDR3 Type Detail: Synchronous Speed: 1333 MHz Manufacturer: Kingston Serial Number: 9D3AA568 Asset Tag: 9876543210 Part Number: 99U5471-012.A00LF Rank: 2 Configured Clock Speed: 1333 MHz Handle 0x002B, DMI type 17, 34 bytes Memory Device Array Handle: 0x0028 Error Information Handle: Not Provided Total Width: 64 bits Data Width: 64 bits Size: 4096 MB Form Factor: DIMM Set: None Locator: DIMM3 Bank Locator: BANK 2 Type: DDR3 Type Detail: Synchronous Speed: 1333 MHz Manufacturer: Elpida Serial Number: 2A5629D7 Asset Tag: 9876543210 Part Number: EBJ41UF8BDW0-GN-F Rank: 2 Configured Clock Speed: 1333 MHz (In reply to comment #152) > KDE + TearFree > 200000 trep @ 0.1356 msec ( 7380.0/sec): Copy 500x500 from window to > window > 200000 trep @ 0.1356 msec ( 7370.0/sec): Copy 500x500 from pixmap to > window > 200000 trep @ 0.1357 msec ( 7370.0/sec): Copy 500x500 from window to > pixmap > 400000 trep @ 0.0878 msec ( 11400.0/sec): ShmPutImage 500x500 square > 400000 trep @ 0.0991 msec ( 10100.0/sec): ShmGetImage 500x500 square Your memory is much higher performance than mine! Yours is running at dual channel compared to the single channel here, so is roughly 2 faster. So your CPU is faster, your memory is faster, but your EU are slower... Odd. Can you run xf86-video-intel/tests/lowlevel-blt-bench? Despite its name that should exercise the EU. Created attachment 101843 [details]
lowlevel-blt-bench results
in bare X
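As a rough cross-check of the x11perf copy rates quoted earlier in this thread, the operations per second can be converted into approximate memory traffic and set against the theoretical DDR3-1333 peak. This is only a sketch under assumptions that the report does not state: 4 bytes per pixel, each copy reading the source once and writing the destination once, and the dmidecode "1333 MHz" figure meaning 1333 mega-transfers per second on a 64-bit channel. The helper names are hypothetical, and caching may well make the real traffic lower than this estimate.

```python
# Rough bandwidth estimate from the x11perf figures in this report.
# Assumptions (not stated in the report): 4 bytes per pixel, and each
# copy reads the source once and writes the destination once.

BYTES_PER_PIXEL = 4  # assumed 32bpp framebuffer/pixmap format


def copy_bandwidth_gbs(ops_per_sec, width=500, height=500):
    """Approximate read+write traffic in GB/s for a width x height copy."""
    bytes_per_op = width * height * BYTES_PER_PIXEL * 2  # read + write
    return ops_per_sec * bytes_per_op / 1e9


def ddr3_peak_gbs(mega_transfers_per_sec, channels):
    """Theoretical peak in GB/s: 8 bytes per transfer per 64-bit channel."""
    return mega_transfers_per_sec * 1e6 * 8 * channels / 1e9


if __name__ == "__main__":
    # 7380 copies/sec is the window-to-window rate reported under KDE + TearFree.
    print("estimated copy traffic: %.2f GB/s" % copy_bandwidth_gbs(7380.0))
    print("DDR3-1333 single channel peak: %.2f GB/s" % ddr3_peak_gbs(1333, 1))
    print("DDR3-1333 dual channel peak: %.2f GB/s" % ddr3_peak_gbs(1333, 2))
```

Under these assumptions the estimated copy traffic lands between the single-channel and dual-channel peaks, which fits the dual-channel explanation given in the thread but does not prove it.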
Comparing our machines:

Yours:
Testing a8r8g8b8 with Src: ref=1.348951, out=1.340763
Testing x8r8g8b8 with Src: ref=1.342294, out=1.334717
Testing a8r8g8b8 with Over: ref=2.078094, out=1.460116
Testing x8r8g8b8 with Over: ref=1.338495, out=1.325701

Mine:
Testing a8r8g8b8 with Src: ref=0.895362, out=0.570604
Testing x8r8g8b8 with Src: ref=0.883392, out=0.579583
Testing a8r8g8b8 with Over: ref=1.833890, out=0.755831
Testing x8r8g8b8 with Over: ref=0.882829, out=0.578426

Not quite the results I was expecting.

Oh darn, now I remember it runs at fullscreen. Who wrote this test! Do you mind repeating that test but first doing a 'xrandr --output HDMI1 --mode 1024x768'

Created attachment 101850 [details]
lowlevel-blt-bench results 1024x768
Yours @ 1024x768:
Testing a8r8g8b8 with Src: ref=0.273359, out=0.420982
Testing x8r8g8b8 with Src: ref=0.273589, out=0.421422
Testing a8r8g8b8 with Over: ref=0.775864, out=0.441080
Testing x8r8g8b8 with Over: ref=0.274387, out=0.416628

Mine @ 1024x768:
Testing a8r8g8b8 with Src: ref=0.870605, out=0.593283
Testing x8r8g8b8 with Src: ref=0.858216, out=0.589730
Testing a8r8g8b8 with Over: ref=1.947748, out=0.767362
Testing x8r8g8b8 with Over: ref=0.857667, out=0.586849

Looking at the reference, which uses pure software rasterisation, your cpu/memory is a little over 2x faster (indeed that is most likely due to the dual channel advantage you have). But it seems that the GPU is completely memory bandwidth limited. Comparing Over with argb, which is the only operation there that does perform a read from dst and blend with src before writing, we do expect to see a significant impact from the extra read.

Can you please 'cat /sys/kernel/debug/dri/0/i915_ring_freq_table' and 'cat /proc/cpuinfo'

Created attachment 101852 [details]
i915_ring_freq_table
Created attachment 101853 [details]
cpuinfo
Drat, no debugfs hack, so do you mind compiling a kernel with

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d23d114832fa..db61a1bd9b06 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3707,6 +3707,7 @@ static void __gen6_update_ring_freq(struct drm_device *dev)
 else
 ia_freq = max_ia_freq - ((diff * scaling_factor) / 2);
 ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);
+ ia_freq = max_ia_freq;
 }
 sandybridge_pcode_write(dev_priv,

applied?

On a GT2 ivb with dual channel I get @1024x768:
Testing a8r8g8b8 with Src: ref=0.336485, out=0.397242
Testing x8r8g8b8 with Src: ref=0.339070, out=0.389645
Testing a8r8g8b8 with Over: ref=1.031118, out=0.350234
Testing x8r8g8b8 with Over: ref=0.339027, out=0.386291
which more closely match your results. Maybe not quite the lead I hoped. Time to ponder again.

That's an i7-3720QM CPU @ 2.60GHz, which is exactly in line with your CPU results (when scaling your results at 3.5GHz to this 2.6GHz CPU). Similarly, the max GPU clocks here are 1250MHz, which accounts for the difference in GPU results. So, it looks like your results are nearly perfectly scaling for clocks with mine. I.e. no discrepancy here. :|

Ah, I have an idea that would account for the glyph discrepancy - it is as if your aa glyphs miss the fast path entirely. That explains why both aa and rgb glyphs are the same speed as well, and why the impact of the extra damage tracking for a compositor doesn't affect performance much.

So I guess the characteristic of the slowdown on your machine is going to be best modeled by rgb24text. And as far as I can see, everything is in order with your hardware - it just seems to be a particularly brutal rendering path for your GPU.

(In reply to comment #164)
> Ah, I have an idea that would account for the glyph discrepancy - it is as
> if your aa glyphs miss the fast path entirely.

Is it fixable?
And what are the x11perf -aa10text -aa24text -rgb10text -rgb24text results on GT2?

Btw, I couldn't compile your kernel:

CC drivers/gpu/drm/i915/i915_gem.o
drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_phys_pwrite’:
drivers/gpu/drm/i915/i915_gem.c:396:2: error: ‘ret’ undeclared (first use in this function)
  ret = i915_gem_object_wait_rendering(obj, false);
  ^
drivers/gpu/drm/i915/i915_gem.c:396:2: note: each undeclared identifier is reported only once for each function it appears in

DRM and i915 were set to be built into the kernel.

This system has the opposite issue (the glyph mask is also discarded for rgb glyphs!):

Sync time adjustment is 0.0164 msecs.
800000000 trep @ 0.0001 msec (18100000.0/sec): Char in 80-char aa line (Charter 10)
320000000 trep @ 0.0001 msec (6680000.0/sec): Char in 30-char aa line (Charter 24)
800000000 trep @ 0.0001 msec (18000000.0/sec): Char in 80-char rgb line (Charter 10)
320000000 trep @ 0.0001 msec (6680000.0/sec): Char in 30-char rgb line (Charter 24)

Turning off GLYPHS_TO_DST:
200000000 trep @ 0.0001 msec (6670000.0/sec): Char in 80-char aa line (Charter 10)
64000000 trep @ 0.0005 msec (1930000.0/sec): Char in 30-char aa line (Charter 24)
200000000 trep @ 0.0001 msec (6670000.0/sec): Char in 80-char rgb line (Charter 10)
64000000 trep @ 0.0005 msec (1930000.0/sec): Char in 30-char rgb line (Charter 24)
which starts to look very similar to your results.

I can't see anything else to squeeze out of masked glyphs. It basically requires 3 passes: (1) clear temporary mask, (2) render glyphs to mask, (3) composite mask onto dst. That of course doesn't help your situation, where konsole is able to queue up rendering faster than it can be processed by the GPU. So I tweaked the cheat:

commit a6ba93283b20a50ff36758624d2967562b7bdae9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 27 13:42:37 2014 +0100

sna/glyphs: Add DBG option for glyph tolerance

And bump it to 3 so that glyph filtering doesn't force us to use the mask too often.
References: https://bugs.freedesktop.org/show_bug.cgi?id=77436
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

KDE + Tearfree + Kwin Xrender
240000000 trep @ 0.0001 msec (8680000.0/sec): Char in 80-char aa line (Charter 10)
80000000 trep @ 0.0004 msec (2680000.0/sec): Char in 30-char aa line (Charter 24)
240000000 trep @ 0.0001 msec (8690000.0/sec): Char in 80-char rgb line (Charter 10)
80000000 trep @ 0.0004 msec (2680000.0/sec): Char in 30-char rgb line (Charter 24)

Oh thanks :). Twice as fast, but still aa==rgb.

Yeah. On my GT2 + dual-channel aa==rgb, whereas on my GT1 + single-channel aa==1.5*rgb.

That means I am memory bound with single-channel, but compute bound with dual-channel. I think. Will have to ponder a little bit more, because despite that, my baby ivb is still 10% faster than yours. But I think it does mean that there is room in the shaders to squeeze more performance out of the GPU. :|

But back to the bigger question... How does the system behave now when scrolling?

Considering that scrolling is already 95% perfect, no difference was noticed. The red spikes are still in place.

(In reply to comment #169)
> Yeah. On my GT2 + dual-channel aa==rgb, whereas on my GT1 + single-channel
> aa==1.5*rgb.
>
> That means I am memory bound with single-channel, but compute bound with
> dual-channel. I think. Will have to ponder a little bit more, because
> despite that, my baby ivb is still 10% faster than yours.

And your GT2 is 2.5x faster than mine.

> But back to the bigger question... How does the system behave now when
> scrolling?

Considering that scrolling is already 95% perfect, no difference was noticed. The red spikes are still in place.

Hi. It seems something has changed in the last few weeks or days. I no longer see throttling take place during scrolling. Scrolling is absolutely perfect and shows a constant 60fps with both Xrender and OpenGL compositing. Great job!

Bummer. I know what has changed.
I replaced my A4tech XL-750F 1600dpi mouse with a cheap Logitech one. I plugged the A4tech back in and the throttling came back. And now we have a reason: the A4tech mouse is very fast and produces too many mouse move events.

I'm not sure just what improved it for you, but any improvement is welcome! If you do feel bored, a reverse bisect to find the fix would be interesting, to see where the problem (or at least the low hanging fruit) lay.

(In reply to Ildar Nurislamov from comment #172)
> Bummer. I know what has changed. I replaced my A4tech XL-750F 1600dpi mouse
> with a cheap Logitech one. Plugged A4tech - and throttling came back.
> And now we have a reason - A4tech mouse is very fast and produce too many of
> mouse move events.

Yep. I tested it - the A4Tech refresh rate is 500Hz. It's insane :) An ordinary mouse has 125Hz. No surprise that throttling takes place when I'm using this mouse: Konsole just tried to draw a frame for every event. Don't bother. Everything is fine. And I use an ordinary mouse now, which ironically gives a smoother result )
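The arithmetic behind that conclusion can be sketched as follows. It assumes a 60 Hz display (the thread reports a constant 60fps) and one redraw request per mouse motion event; both function names are hypothetical. The `coalesce` helper only illustrates the general idea of keeping the newest motion event per frame and is not a description of what Konsole, Qt or the X server actually do.

```python
# Sketch: how many mouse motion events arrive per displayed frame, and a
# simple way to collapse them. Timestamps are integer microseconds to keep
# the frame bucketing exact.

def events_per_frame(poll_hz, refresh_hz=60.0):
    """Average number of motion events delivered during one display frame."""
    return poll_hz / refresh_hz


def coalesce(events, frame_us=16667):
    """Keep only the last (timestamp_us, position) event in each frame slot."""
    latest = {}
    for t_us, pos in events:
        latest[t_us // frame_us] = (t_us, pos)  # later events overwrite earlier ones
    return [latest[slot] for slot in sorted(latest)]


if __name__ == "__main__":
    print("500 Hz mouse: %.2f events per frame" % events_per_frame(500))
    print("125 Hz mouse: %.2f events per frame" % events_per_frame(125))

    # 0.1 s of motion from a 500 Hz mouse: one event every 2000 microseconds.
    burst = [(i * 2000, i) for i in range(50)]
    kept = coalesce(burst)
    print("events before: %d, after coalescing: %d" % (len(burst), len(kept)))
```

With one redraw per event, the 500 Hz mouse asks for roughly eight redraws per displayed frame, against about two for a 125 Hz mouse, which matches the observation that the slower mouse scrolls more smoothly.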