Summary: | xorg-server 1.18.2 freeze desktop on i915 | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | FadeMind <fademind> | ||||||||||||
Component: | Driver/intel | Assignee: | Chris Wilson <chris> | ||||||||||||
Status: | RESOLVED NOTOURBUG | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||||||||||
Severity: | normal | ||||||||||||||
Priority: | medium | CC: | aleksey, alexander, andyrtr, axelfischer12, bugs, fademind, iskatu, jpsutton, leho, lordheavym, manuel, michel, mike, nrndda, rdieter, sait.a.umar, tobias.johannes.klausmann, torsten.veller+fdo | ||||||||||||
Version: | unspecified | ||||||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||||||
OS: | Linux (All) | ||||||||||||||
Whiteboard: | |||||||||||||||
i915 platform: | i915 features: | ||||||||||||||
Attachments: |
|
Description
FadeMind
2016-03-12 19:28:36 UTC
Comment on attachment 122254 [details]
dmesg, journalctl, xorg.0.log
Save as file: logs.tar.gz
i can confirm this bug (sometimes right at startup), it always leads to the following backtrace (sorry for no line numbers): #0 0x00007f1c47e8b487 in ioctl () at /lib64/libc.so.6 #1 0x00007f1c491cdbf8 in drmIoctl () at /usr/lib64/libdrm.so.2 #2 0x00007f1c43972fff in sna_present_queue_vblank () at /usr/lib64/xorg/modules/drivers/intel_drv.so #3 0x000000000050ff4c in present_execute () #4 0x00007f1c43973161 in sna_present_vblank_handler () at /usr/lib64/xorg/modules/drivers/intel_drv.so #5 0x00007f1c438e4cff in sna_mode_wakeup () at /usr/lib64/xorg/modules/drivers/intel_drv.so #6 0x00007f1c438e7ff8 in sna_wakeup_handler () at /usr/lib64/xorg/modules/drivers/intel_drv.so #7 0x000000000043db1a in WakeupHandler () #8 0x0000000000586ce7 in WaitForSomething () #9 0x0000000000438f5e in Dispatch () #10 0x000000000043cfe3 in dix_main () #11 0x00007f1c47dc9610 in __libc_start_main () at /lib64/libc.so.6 #12 0x0000000000428539 in _start () Hardware is an Notebook with intel (gen7/ivb)+ nvidia. (In reply to Tobias Klausmann from comment #2) > i can confirm this bug (sometimes right at startup), it always leads to the > following backtrace (sorry for no line numbers): > > #0 0x00007f1c47e8b487 in ioctl () at /lib64/libc.so.6 > #1 0x00007f1c491cdbf8 in drmIoctl () at /usr/lib64/libdrm.so.2 > #2 0x00007f1c43972fff in sna_present_queue_vblank () at > /usr/lib64/xorg/modules/drivers/intel_drv.so That can't be the same bug as FadeMind isn't using DRI3/Present. Please file a separate bug, and here both drm.debug=0x7 dmesg and/or Xorg.0.log with xf86-video-intel compiled with --enable-debug=full will be vital. (In reply to FadeMind from comment #0) > After installing lastest Xorg update 1.18.2 and reboot I have serious issue > . After while whole GUI just freeze Mouse and Keyboard not repond at all. > Only hard power off (via pushing power button 5 sec. works) shutdown laptop. > I using linux-zen kernel 4.4.5 now. > In Xorg I have weird: > > [ 232.965] (II) GetModeLine - scrn: 0 clock: 71800 > [ 232.965] (II) GetModeLine - hdsp: 1366 hbeg: 1414 hend: 1446 httl: 1526 > > in lots of lines after launching google-chrome too. That's extremely suspect indeed. That message comes from Xext/vidmode.c which was recently rewritten. Maybe using x11trace with google-chrome would reveal the loop it gets stuck in, or gdb'ing the Xserver, or using perf. Bisecting the Xserver would also be useful, my guess would be commit f175cf45aebcdda53f3ae49c0eaf27da1f194e92 Author: Olivier Fourdan <ofourdan@redhat.com> Date: Wed Feb 10 09:34:34 2016 +0100 vidmode: move to a separate library of its own but hopefully it is a smaller commit. Created attachment 122257 [details]
logs-debug_enabled.tar.gz
Xorg file with extended msgs.
I forgot: I using KDE Plasma 5 with enabled Rendering OpenGL 3.1 AND I tested two scenarios with DRI2 and DRI3. The same freeze happened.
Ok, the GetModeLine spam is a separate issue, from the looks of the debug log it doesn't look to be related to any hw/ddx interaction. Logs look fairly normal up to the end. I presume the event has to yet to happen, could you keep capturing and attach the logs from after the freeze? Probably need to grab the logs remotely, or else disable filesystem caching (depending on hard you need to reboot). Mr Chris Wilson how to do this debug catch safety? I using SSD. One partition with Ext4 standard options. / ext4 rw,relatime,data=ordered 0 1 Like I mentioned whole GUI and input devices are frozen when this issue happened. IMO nothing is saved except this what was saved before hangs. I can trying boot from live usb AFTER this happened, so xorg file will be not overwritten. UPDATE I have this at beginning error (loop?) [ 158.158] vblank_complete: 1 events complete [ 158.158] vblank_complete: pipe=0 tv=158.155330 msc=9449 (target=4294976744), event=10093 complete: MISS [ 158.158] sna_present_queue_vblank(pipe=0, event=10093, msc=4294976744) [ 158.158] sna_present_queue_vblank: appending event=10093 to vblank 4294976744 x 2 at the end of file I have: 174.213] sna_present_queue_vblank: appending event=10093 to vblank 4294976744 x 1399566 [ 174.213] vblank_complete: pipe=0 tv=158.155330 msc=9449 (target=4294976744), event=10093 complete: MISS [ 174.213] sna_present_queue_vblank(pipe=0, event=10093, msc=4294976744) [ 174.213] sna_present_queue_vblank: appending event=10093 to vblank 4294976744 x 1399567 [ 174.213 Xorg log have 470 MB (with debug flags enabled in xorg intel driver) HERE it is: https://www.dropbox.com/s/25lzliobalk7861/error_94515.tar.xz?dl=1 SHA-1: a587d8329196eca2a2c0694176e22aa185471735 (In reply to FadeMind from comment #9) > UPDATE > > I have this at beginning error (loop?) > > [ 158.158] vblank_complete: 1 events complete > [ 158.158] vblank_complete: pipe=0 tv=158.155330 msc=9449 > (target=4294976744), event=10093 complete: MISS > [ 158.158] sna_present_queue_vblank(pipe=0, event=10093, msc=4294976744) > [ 158.158] sna_present_queue_vblank: appending event=10093 to vblank > 4294976744 x 2 > > at the end of file I have: > > 174.213] sna_present_queue_vblank: appending event=10093 to vblank > 4294976744 x 1399566 > [ 174.213] vblank_complete: pipe=0 tv=158.155330 msc=9449 > (target=4294976744), event=10093 complete: MISS > [ 174.213] sna_present_queue_vblank(pipe=0, event=10093, msc=4294976744) > [ 174.213] sna_present_queue_vblank: appending event=10093 to vblank > 4294976744 x 1399567 > [ 174.213 > > Xorg log have 470 MB (with debug flags enabled in xorg intel driver) > > HERE it is: https://www.dropbox.com/s/25lzliobalk7861/error_94515.tar.xz?dl=1 > SHA-1: a587d8329196eca2a2c0694176e22aa185471735 Hmm that is the same as in my backtrace. Maybe (just to make sure): Can you ssh into your maschine and debug the hanging xserver with gdb? Sorry, but I cannot do SSH connect here. Created attachment 122258 [details]
xserver 1.18.2 log
Created attachment 122259 [details]
dmesg drm.debug=0x7
Despite your request to open another bug, following the information from the xorg log, i think this is indeed the same bug. The above posted backtrace is still valid.
I filed a bug with Fedora earlier on this problem. Using Fedora 23 with xorg-x11-xserver 1.18.2, KDE plasma 5.19 on a Lenovo Thinkpad 450s. The mouse/keyboard/display freezes for periods of 20-40 seconds but if you wait long enough it starts until the next freeze. I get messages: =========================== [ 1120.069] (II) SYN_DROPPED event from "TPPS/2 IBM TrackPoint" - some input events have been lost. [ 11[ 1120.069] (II) SYN_DROPPED event from "TPPS/2 IBM TrackPoint" - some input events have been lost. [ 1120.070] (EE) libinput bug: timer offset negative [ 1120.070] (EE) libinput bug: timer offset negative [ 1120.070] (EE) libinput bug: timer offset negative [ 1120.070] (EE) libinput bug: timer offset negative [ 1120.070] (EE) libinput bug: timer offset negative [ 1120.070] (EE) libinput bug: timer offset negative [ 1120.070] (EE) libinput bug: timer offset negative [ 1120.071] (EE) libinput bug: timer offset negative [ 1120.071] (II) SYN_DROPPED event from "SynPS/2 Synaptics TouchPad" - some input events have been lost. ============================ Yje problem I think actually exists with 1.18.1 as well but less evident. Here is my analysis: xorg-x11-server-1.18.1-3, EGL interface, libinput-1.2.1-3 NO PROBLEMS xorg-x11-server-1.18.1-3, glx interface, libinput-1.2.1-3 OCCASIONAL FREEZE xorg-x11-server-1.18.2-1, either EGL or glx interface, FREQUENT FREEZE Tobias, are you using KDE plasma 5 as well? (In reply to Tobias Klausmann from comment #13) > Created attachment 122259 [details] > dmesg drm.debug=0x7 > > Despite your request to open another bug, following the information from the > xorg log, i think this is indeed the same bug. The above posted backtrace is > still valid. Yes, but we have 2 bugs here! The first bug report does not involve Present, the latter backtrace and logs are a Present bug. Both will freeze the UI, but they have different underlying causes. I got the same GetModeLine flood starting on Gentoo + Chrome, after updating to 1.18.2. Haven't experienced freezes yet, but the upgrade may be too fresh and hasn't received much workload yet. Perhaps also configuration differences. Is the GetModeLine flood merely cosmetic (in which case, any way to silence it?) or should I downgrade back to 1.18.1 for time being if actual work needs to be done? (In reply to Leho Kraav (:macmaN :lkraav) from comment #17) > Is the GetModeLine flood merely cosmetic (in which case, any way to silence > it?) or should I downgrade back to 1.18.1 for time being if actual work > needs to be done? It seems to be in response to a XVidMode requests from the client, so both log spam and excess X11 traffic. The debug log showed that it only the Xext/vidmode.c layer and wasn't causing extra work for the hardware (we were not continuing reprobing the hardware which is what I feared, it's both slow and any time we touch the hardware there be dragons). From my pov, it is cosmetic, but it may also be impacting system performance. (In reply to Chris Wilson from comment #18) > > It seems to be in response to a XVidMode requests from the client, so both > log spam and excess X11 traffic. The debug log showed that it only the > Xext/vidmode.c layer and wasn't causing extra work for the hardware (we were > not continuing reprobing the hardware which is what I feared, it's both slow > and any time we touch the hardware there be dragons). From my pov, it is > cosmetic, but it may also be impacting system performance. Thanks for the breakdown. Should this thing be branched off into a separate bug? The original problem description also includes freezing and I for one don't have that (at least yet...). Would be useful to have a drop-in patch attached here to eliminate the cosmetics perhaps. That'd enable further testing 1.18.2 for regressions and wouldn't force reverting back. This message flood is really bad. Could I please ask you to regnerate the debug Xorg.log with current xf86-video-intel.git? There are some details missing from the log files that are included in commit 5489402ca1fe17d5b4d20e4dcdc6ac8e6e415b0b Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Mar 13 13:28:42 2016 +0000 sna/present: Add a modicum more debug info References: https://bugs.freedesktop.org/show_bug.cgi?id=94515 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (In reply to Leho Kraav (:macmaN :lkraav) from comment #19) > (In reply to Chris Wilson from comment #18) > > > > It seems to be in response to a XVidMode requests from the client, so both > > log spam and excess X11 traffic. The debug log showed that it only the > > Xext/vidmode.c layer and wasn't causing extra work for the hardware (we were > > not continuing reprobing the hardware which is what I feared, it's both slow > > and any time we touch the hardware there be dragons). From my pov, it is > > cosmetic, but it may also be impacting system performance. > > Thanks for the breakdown. Should this thing be branched off into a separate > bug? The original problem description also includes freezing and I for one > don't have that (at least yet...). Right, I think the GetModeLine spam is a separate issue from the freezing and separating that out into its own bug report should help prevent it being forgotten. > Would be useful to have a drop-in patch attached here to eliminate the > cosmetics perhaps. That'd enable further testing 1.18.2 for regressions and > wouldn't force reverting back. This message flood is really bad. Looking at the changes, it appears entirely cosmetic. That logging used to exist in 1.18.1 but was hidden behind a verbosity check, now it is just using a standard LogMessage(): diff --git a/Xext/vidmode.c b/Xext/vidmode.c index 7c838f4..e7f9947 100644 --- a/Xext/vidmode.c +++ b/Xext/vidmode.c @@ -267,13 +267,13 @@ ProcVidModeGetModeLine(ClientPtr client) rep.vtotal = VidModeGetModeValue(mode, VIDMODE_V_TOTAL); rep.flags = VidModeGetModeValue(mode, VIDMODE_FLAGS); - LogMessage(X_INFO, "GetModeLine - scrn: %d clock: %ld\n", - stuff->screen, (unsigned long) rep.dotclock); - LogMessage(X_INFO, "GetModeLine - hdsp: %d hbeg: %d hend: %d httl: %d\n", - rep.hdisplay, rep.hsyncstart, rep.hsyncend, rep.htotal); - LogMessage(X_INFO, " vdsp: %d vbeg: %d vend: %d vttl: %d flags: %ld\n", - rep.vdisplay, rep.vsyncstart, rep.vsyncend, - rep.vtotal, (unsigned long) rep.flags); + DebugF("GetModeLine - scrn: %d clock: %ld\n", + stuff->screen, (unsigned long) rep.dotclock); + DebugF("GetModeLine - hdsp: %d hbeg: %d hend: %d httl: %d\n", + rep.hdisplay, rep.hsyncstart, rep.hsyncend, rep.htotal); + DebugF(" vdsp: %d vbeg: %d vend: %d vttl: %d flags: %ld\n", + rep.vdisplay, rep.vsyncstart, rep.vsyncend, + rep.vtotal, (unsigned long) rep.flags); /* * Older servers sometimes had server privates that the VidMode will quieten it up again. Though any client actually still using XVidMode should be fixed! So This bug has been fixed ? because i hit on it too on gentoo and archlinux . Graphics: Card: Intel Core Processor Integrated Graphics Controller Display Server: X.Org 1.18.2 driver: Intel Resolution: 1366x768@60.17hz GLX Renderer: Mesa DRI Intel Ironlake Mobile GLX Version: 2.1 Mesa 11.1.2 the system freezes a lot and lagging a lot. With Sna + Dri3 + Mate Desktop. (In reply to Carlos Jimenez from comment #22) > So This bug has been fixed ? because i hit on it too > on gentoo and archlinux . No, still waiting on the debug logs. I experienced similar frequent hangs of maybe 15-20sec with a Lenovo ThinkPad X1 Carbon. Downgrading to xorg-x11-server-Xorg-1.18.1-3.fc23.x86_64 (Fedora, from 1.18.2-something) did the trick, it seems. It seems I also got my first hang on 1.18.2. Resumed from suspend, and i3lock password circle suddenly just hung on flashing that green bar on a password character entry. Soon after Caps-Lock started flashing. Magic-SysRq was unavailable after the hang. (In reply to Chris Wilson from comment #23) > ... > No, still waiting on the debug logs. Hmm, aren't the ones in #12 and #13 enough? If something is missing, I'll gladly provide it... (In reply to Tobias Klausmann from comment #26) > (In reply to Chris Wilson from comment #23) > > ... > > No, still waiting on the debug logs. > > Hmm, aren't the ones in #12 and #13 enough? If something is missing, I'll > gladly provide it... No, they are missing some debug information from later builds which should help understand the issue. A patch was introduced as a part of the below bug report that is working. I also tested it on Fedora 23 and freezes are gone. FYI https://bugs.archlinux.org/task/48549 (In reply to Sait Umar from comment #28) > A patch was introduced as a part of the below bug report that is working. I > also tested it on Fedora 23 and freezes are gone. > FYI > > https://bugs.archlinux.org/task/48549 I am unable to locate the source for the specific patch. What am I missing? I posted it (with due credits) in another bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1317153 (In reply to Leho Kraav (:macmaN :lkraav) from comment #29) > (In reply to Sait Umar from comment #28) > > A patch was introduced as a part of the below bug report that is working. I > > also tested it on Fedora 23 and freezes are gone. > > FYI > > > > https://bugs.archlinux.org/task/48549 > > I am unable to locate the source for the specific patch. What am I missing? FYI https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/xorg-server&id=bb37fae55e67f14bc491ca6bcccde3907f73b21c I have the GetModeline logspam as well on Debian unstable (1.18.2). It also severely impacts UI performance, switching between windows or workspaces has a noticeable (~1s) delay as soon as I start Chrome or Spotify (both of which trigger the logspam). Unfortunately I can't downgrade to 1.18.1 as that had a different issue that caused Xwayland to segfault (resolved in 1.18.2) :) I have the same issue, gentoo, xorg-server-1.18.2, kde-plasma-5.19 and VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (i915). It always happens when I click on an icon in the kde tray, the Xorg process uses 100% of one of the cpus, everything hangs for 20-30 seconds. Downgrade to xorg-server-1.18.1 resolves the issue. Created attachment 122483 [details]
xserver 1.18.2 log w 74cd4d0
Another take on the desired log. Cut from an 1.8G sized file
It looks like the Client sent PresentPixmap with a divisor=1<<32, or even target_msc of 1<<32. I think commit f656f6afa288c63968f45d6d32c96a4cf6213117 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Mar 22 22:11:18 2016 +0000 sna/present: Report BadValue if target_msc exceeds last known msc by 1<<32 As we cannot queue a vblank further than 1<<32 into the future, nor can we set a Timer to expire that far into the future, creating such a vblank event is futile. Report BadValue and hope for the best. References: https://bugs.freedesktop.org/show_bug.cgi?id=94515 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> should hide the immediate issue with the freeze / busy spin. Can someone attach gdb and put a breakpoint on present_pixmap() line 953, just after present_queue_vblank() returns an error and print the locals? I am most interested in getting the arguments to the function so we can compare what the client requested with the computed vblank. The problem seems to be gone after this patch: https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=f656f6afa288c63968f45d6d32c96a4cf6213117 Sorry, I was wrong, after an hour it appeared again. (In reply to Vadim Dyadkin from comment #36, #37) > The problem seems to be gone after this patch: > https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/ > ?id=f656f6afa288c63968f45d6d32c96a4cf6213117 > Sorry, I was wrong, after an hour it appeared again. i can confirm it is still possible to hit the problem, but way more unlikely to happen after the commit. Thanks for the emergency (half)fix! (In reply to Chris Wilson from comment #35) > It looks like the Client sent PresentPixmap with a divisor=1<<32, or even > target_msc of 1<<32. > > I think > > commit f656f6afa288c63968f45d6d32c96a4cf6213117 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Tue Mar 22 22:11:18 2016 +0000 > > sna/present: Report BadValue if target_msc exceeds last known msc by > 1<<32 > > As we cannot queue a vblank further than 1<<32 into the future, nor can > we set a Timer to expire that far into the future, creating such a > vblank event is futile. Report BadValue and hope for the best. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=94515 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > should hide the immediate issue with the freeze / busy spin. > > Can someone attach gdb and put a breakpoint on present_pixmap() line 953, > just after present_queue_vblank() returns an error and print the locals? I > am most interested in getting the arguments to the function so we can > compare what the client requested with the computed vblank. +if (warn_unless(msc - swap->msc < 1ull<<32)) Should that be <= ? or possibly just cap the time to 1<<32 ? I got the freeze and I'm on UXA GLAMOR, not SNA. Are we sure this patch is related? (In reply to Joakim Tjernlund from comment #39) > +if (warn_unless(msc - swap->msc < 1ull<<32)) > Should that be <= ? No. if msc was current + 1<<32 (swap->msc should be the current msc give or take a vblank occurring between the start of PresentPixmap and the end of the function), then (uint32_t)msc == swap->msc and the scheduled event returns instantly. We then see that that the target msc hasn't occurred and requeue the event, which fires instantly again. And we get trapped into the loop. > or possibly just cap the time to 1<<32 ? Possibly, but given the choice of reporting "I can't support this value" and faking it, we should punt that decision up. It saves having to make a policy decision in the driver. (In reply to Leho Kraav (:macmaN :lkraav) from comment #40) > I got the freeze and I'm on UXA GLAMOR, not SNA. Are we sure this patch is > related? Yes. The bug is the client is requesting a target_msc that wraps around onto the current msc. No backend handled that overflow and so all are susceptible. The patch adds a check for the overflow when casting from the 64bit target_msc into the 32bit hardware sequence number and reporting the error from the driver. The bug as far as I can tell so far is that the requested divisor received from the Client is 1<<32 rather than 1. That suggests that the two CARD32 in the protocol have been swapped, but that seems unlikely... On a slight tangent, it seems the bogus MSC is not limited to current + 1<<32, but we can also get current + 1<<32-1, which at a guess would be the vblank before a flip for a missed divisor=1<<32. I had these freezes, on 1.18.1 and more frequently on 1.18.2, until I changed my cpu_freq governor from "ondemand" to "conservative". Have been running without freezes for days (tested with different versions of xf86-video-intel, e.g. with commits 0204e55b618af285eb31c6eecc3c57b54319a0f4 and 4e108afe55f6b13aa946d1252ff0b828929245b3 for the last couple of days). The freezes are either gone or maybe became much less frequent (with "conservative" governor I have encountered zero). May be this will help finding the issue. There is a new one line patch at the xorg/xserver freedesktop git that solved the problem for me. The previous proposed patch had taken this section of code out. This one seems to fix it instead. I have been using it for a few hours and had no freezes. Arch Linux provides fix patch for solving freeze issue in xorg-server 1.18.2-4 release. Already in stable repository. https://projects.archlinux.org/svntogit/packages.git/tree/trunk/0001-present-Only-requeue-for-next-MSC-after-flip-failure.patch?h=packages/xorg-server&id=9fd963e75ec9745e07a5e2595dae57de3f925210 commit: https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/xorg-server&id=9fd963e75ec9745e07a5e2595dae57de3f925210 (In reply to Sait Umar from comment #45) > There is a new one line patch at the xorg/xserver freedesktop git that solved > the problem for me. Note that it would still be interesting to find out where the bogus MSC values are coming from, per the last paragraph of comment 35, in case there's a client side bug which needs to be addressed there. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.