|Summary:||[ilk] Font and screen corruption in GTK+ applications|
|Product:||xorg||Reporter:||Tristan Miller <psychonaut>|
|Component:||Driver/intel||Assignee:||Chris Wilson <chris>|
|Status:||NEW ---||QA Contact:||Intel GFX Bugs mailing list <intel-gfx-bugs>|
|Priority:||medium||CC:||adeptsmail, alexander, azurlay, giuseppe.pandolfo, gottwald, hamer.mk, jtojnar, marci_r, mar.kolya, ponymarzanna, psychonaut, rdieter, seb128, shawvrana, solstag, thejoe|
|i915 platform:||i915 features:|
Description Tristan Miller 2015-01-19 13:11:14 UTC
After running my system for some time (several hours or days), certain text characters in GTK+ applications become blank or garbled. This renders these applications completely unusable until I restart. (See the attached screenshots.) Qt applications are unaffected.

I'm not sure if this is related, but in addition to the font corruption, I sometimes get black boxes or black streaking over non-text elements of GTK+ applications.

It's not clear to me what the cause of the problem is. There's a Debian bug report at <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760435> which suggests that the problem is in libglib. However, there are also several freedesktop.org bug reports for the Intel video driver which describe similar symptoms. In particular, my font corruption looks just like Attachment 35720 [details] of Bug 28151 and Attachment 37173 [details] of Bug 20560.

I think it's more likely a problem with the Intel driver than with libglib, since I didn't notice any problems after my last update to libglib (from 2.38.2 to 2.42.0 on 3 December 2014) but I did start encountering this problem after migrating my OS from a system with an Nvidia card to one with a Core Processor Integrated Graphics Controller.

I am running KDE 4.14.3 on openSUSE 13.2 for x86-64. I am using xorg 7.6 and version 2.99.916 of the i915 driver. My graphics card is a

Elsewhere on the web this problem has been reported on ArchLinux and Kubuntu: <https://bbs.archlinux.org/viewtopic.php?id=186783>

I have reported the problem downstream on the openSUSE bug tracker: <https://bugzilla.novell.com/show_bug.cgi?id=913425>
Comment 1 Tristan Miller 2015-01-19 13:12:01 UTC
Created attachment 112465 [details] Screenshot showing missing characters in various GTK+ applications
Comment 2 Tristan Miller 2015-01-19 13:13:05 UTC
Created attachment 112466 [details] Screenshot showing corrupted characters in SeaMonkey

Note in this screenshot how one character, the uppercase W, is systematically corrupted.
Comment 3 Tristan Miller 2015-01-19 13:13:42 UTC
Created attachment 112467 [details] Screenshot showing black streaks over widgets in SeaMonkey
Comment 4 Tristan Miller 2015-01-19 13:15:33 UTC
Created attachment 112468 [details] Screenshot showing black boxes over widgets in Thunderbird
Comment 5 Chris Wilson 2015-01-19 13:20:44 UTC
Please attach your Xorg.0.log and dmesg.
Comment 6 Tristan Miller 2015-01-19 13:24:21 UTC
Created attachment 112469 [details] Output of dmesg
Comment 7 Tristan Miller 2015-01-19 13:24:42 UTC
Created attachment 112470 [details] /var/log/Xorg.0.log
Comment 8 Chris Wilson 2015-01-19 13:56:56 UTC
Hmm, drat I was expecting/hoping for a GPU hang. Do you mind just confirming that the dmesg/Xorg.0.log are from after the corruption starts showing?
Comment 9 Tristan Miller 2015-01-19 14:03:07 UTC
Yes, the dmesg and Xorg.0.log files are from shortly after taking the screenshots in Attachment 112466 [details] and Attachment 112468 [details], in the same login session. (Attachment 112465 [details] and Attachment 112467 [details] are from an earlier session last week, after which I rebooted.)
Comment 10 Sebastien Bacher 2015-05-06 07:43:41 UTC
I see similar issues on my Intel i5 gen5 using Unity on Ubuntu. There is no GPU hang error in the logs either. A session logout/login fixes it (no need to reboot).
Comment 11 Tristan Miller 2015-06-01 20:18:46 UTC
I can reproduce this problem on two additional openSUSE 13.2 machines, both using the i915 driver (but with different models of video controller). Anything else I can do to help troubleshoot? Unfortunately the bug is making the computers practically unusable.
Comment 12 Orion Poplawski 2015-06-30 17:01:31 UTC
Fedora report - https://bugzilla.redhat.com/show_bug.cgi?id=1163689
Comment 14 Chris Wilson 2015-06-30 17:13:02 UTC
(In reply to Orion Poplawski from comment #13)
> Perhaps bug #63595 is related?

No. Different GPUs, different rendering engines. This bug is a missing GPU flush; that one is a missing piece of state setup.
Comment 15 Christian Stadelmann 2015-07-26 19:59:20 UTC
There is another related bug in Fedora at https://bugzilla.redhat.com/show_bug.cgi?id=742776. According to people there, it also affects radeon drivers. I am seeing this bug too, using an Intel Core i5 iGPU (first generation) with the i915 kernel module loaded. I am seeing this in both Gtk2 and Gtk3 applications (including firefox built with Gtk2 and firefox built with Gtk3).
Comment 16 Orion Poplawski 2015-10-15 20:16:17 UTC
Any progress here? Anything that can be done to help? This machine is becoming pretty unusable.
Comment 17 Tristan Miller 2015-11-11 08:59:09 UTC
(In reply to Orion Poplawski from comment #16)
> Any progress here? Anything that can be done to help? This machine is
> becoming pretty unusable.

If you follow the link to the openSUSE bug from my original report, you'll find that two workarounds have been suggested (one from Stefan Dirsch in Comment #5 and one from Egbert Eich in Comment #7). The first one seems to have worked for me; the second one I haven't tested.

I reproduce Egbert Eich's comment here, as it contains technical information which may be of use to the Intel driver developers:

> I've chased this issue on Intel Ilk for weeks. This is the same gen
> as you have used for the log you posted on fd.o. I don't know which
> other GPUs you have used.
>
> I've been able to trace this to the 2D textures COGL uses to store
> glyphs and icons in (as sub-textures) called 'atlases': when a new
> object needs to be cached, COGL tries to find room in the current
> atlas. If there is no room it tries to resize the atlas. If the
> maximum size of the texture is reached it creates a new 'atlas'.
>
> What I found was, that once the texture size exceeded 4kx4k (the
> i965_dri.so driver announces a max texture level of 14 - ie. a size
> of 8kx8k) the content got corrupted, ie some elements got lost during
> copying. However, what I also found was that this depended on the
> blit mode used for atlases: the blit modes can be specified with the
> environment variable COGL_ATLAS_DEFAULT_BLIT_MODE - available
> settings are: 'texture-render', 'framebuffer', 'copy-tex-sub-image',
> 'get-tex-data'. 'texture-render' is the default. Specifying any of
> the others made the issue go away for me.
>
> I've never gotten around to dump the opengl state and shaders used in
> this mode to generate a simple test case which would have allowed to
> debug the intel driver.
>
> The workaround I suggested to our customer was to set the environment
> variable:
>
> COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer
>
> in /etc/environment. This made things work for him.
>
> You may want to give this a try as well.
>
> Please let me know if it does the job.
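The atlas behaviour described in the quoted comment (find room, otherwise resize, otherwise start a new atlas) can be sketched roughly as follows. This is only an illustrative model and not COGL's actual code: the `Atlas` class, the `cache_object` and `in_suspect_range` functions, the 256-pixel starting size, and the area-only bookkeeping are all assumptions made for the example. The only figures taken from the comment are the texture level of 14 (8kx8k) and the 4kx4k threshold above which corruption was observed.

```python
from dataclasses import dataclass

# Figures taken from the quoted comment.
MAX_TEXTURE_LEVELS = 14                   # announced by i965_dri.so per the comment
MAX_SIDE = 2 ** (MAX_TEXTURE_LEVELS - 1)  # 8192, i.e. "8kx8k"
SUSPECT_SIDE = 4096                       # corruption reported above 4kx4k


@dataclass
class Atlas:
    """Hypothetical stand-in for one atlas texture (square, area bookkeeping only)."""
    side: int = 256   # assumed starting size, not taken from COGL
    used: int = 0     # occupied area in pixels; real packing is more involved

    def has_room(self, area: int) -> bool:
        return self.used + area <= self.side * self.side


def cache_object(atlases: list, area: int) -> str:
    """Place an object of the given area and report which step was taken."""
    current = atlases[-1]
    if current.has_room(area):
        current.used += area
        return "placed"
    # No room: grow the current atlas. In the real library this is where the
    # old contents are copied across using the configured blit mode.
    while current.side < MAX_SIDE:
        current.side *= 2
        if current.has_room(area):
            current.used += area
            return "resized"
    # Maximum texture size reached: start a fresh atlas.
    atlases.append(Atlas(used=area))
    return "new-atlas"


def in_suspect_range(atlas: Atlas) -> bool:
    """True once the atlas is larger than the 4kx4k size named in the comment."""
    return atlas.side > SUSPECT_SIDE
```

In this model, the copy that happens in the "resized" step is the operation whose result reportedly depended on the blit mode, which would be consistent with the corruption only appearing after the system has been running long enough for an atlas to grow past 4kx4k.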
Comment 18 Zeljko Tomic 2015-11-20 17:31:59 UTC
I am experiencing this as well, ever since I switched from UXA to SNA. With UXA I had other issues, and SNA seems faster on my 2nd gen Intel, so I'd really prefer to stick with it.

I'm not sure exactly how to reproduce it, but in my case it seems to have something to do with suspend/resume. I can work for 10 hours straight without glitches, but a few minutes after I resume from sleep, fonts start getting messed up (missing letters, or rectangles displayed instead of letters) and GTK apps look broken (pretty much as the reporter's screenshots show). Restarting X solves the issue, until I sleep/resume again. Maybe it has nothing to do with sleep/resume, but that's just what I noticed. I never had this issue when working straight after X starts, only after sleep/resume. In my case, about 7 out of 10 sleep/resume cycles break the driver within 20-30 minutes.

I reported this recently on irc, and Chris suggested I try running a 3.2 kernel (based on my suspend/resume observation). Unfortunately, my wifi driver doesn't work with the 3.2 kernel and I couldn't stay up long enough to experience the issue.

As suggested in the previous comment, I have now tried setting COGL_ATLAS_DEFAULT_BLIT_MODE to "framebuffer". I'll report back in a couple of days with results.

Thinkpad W520
CPU: i7-2720QM
BIOS: Lenovo, Version: 8BET62WW (1.42 ), Firmware Revision: 1.36
OS & DE: Ubuntu 15.10, Gnome 3.18
Kernel: 4.2.0-18-generic, x86_64
Xorg: 7.7
Xserver: 1.17.2
Mesa: 11.0.2

No errors in dmesg or xorg.0.log.
Comment 19 Zeljko Tomic 2015-11-26 16:33:39 UTC
Just to report back, the workaround I tried (setting COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer) seems to be working. In the last 7 days I did not experience major breakage of the UI, but I did notice slowdowns, mostly on input (lag when scrolling or typing). I'm using the latest driver: 2.99.917-513-g0995ad2
Comment 21 Bogdan B 2016-01-08 09:29:27 UTC
Created attachment 120892 [details] OutputOfVarLogXorg0Log
Comment 22 Bogdan B 2016-01-08 09:39:10 UTC
Hello,

Reproduced on Dell Inspiron 15-3531 Reg Model P28F Reg Type No P28F005 DPN K2DJT A01. Kubuntu 16.04. Reproduced with pidgin, emacs GUI.

Fixed by

    echo COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer >> /etc/environment

This laptop model is also affected by another nasty bug: constant noise in headphones
http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5e6db6699b7651f02f4b7cc6a86f5b3d9359d636
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1315770

Thanks.
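One caveat with the `echo ... >>` form is that it appends a new line every time it is run, so repeated attempts leave duplicate or conflicting entries behind. A minimal sketch of an idempotent alternative is shown below. The function name `set_blit_mode` is made up for this example, the list of modes is the one given earlier in this report, and the file path is passed in so it can be tried on a scratch copy first. Writing the real /etc/environment requires root, and the file is typically only read at login, so a fresh session is probably needed before the setting takes effect.

```python
from pathlib import Path

KEY = "COGL_ATLAS_DEFAULT_BLIT_MODE"

# Modes listed in the comment quoted earlier in this report.
VALID_BLIT_MODES = ("texture-render", "framebuffer",
                    "copy-tex-sub-image", "get-tex-data")


def set_blit_mode(env_file: str, mode: str = "framebuffer") -> list:
    """Set KEY=mode in an environment file, replacing any earlier setting.

    Hypothetical helper: returns the lines that were written.
    """
    if mode not in VALID_BLIT_MODES:
        raise ValueError(f"unknown blit mode: {mode!r}")
    path = Path(env_file)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop previous assignments of the same variable, keep everything else.
    kept = [line for line in lines if not line.strip().startswith(KEY + "=")]
    kept.append(f"{KEY}={mode}")
    path.write_text("\n".join(kept) + "\n")
    return kept
```

Calling `set_blit_mode("/tmp/environment.test")` twice leaves exactly one assignment in the file, which makes it easier to switch between modes when comparing results.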
Comment 23 Alex 2016-04-03 04:21:32 UTC
That fix doesn't seem to work for me on a Dell Venue 11 Pro running the newest kernel/intel gfx/Gnome on Arch. I am still getting the issue every several suspend cycles.
Comment 24 Stefan Gottwald 2016-04-11 08:07:59 UTC
Created attachment 122858 [details] [review] Possible Fix for font rendering (glyphs) issues

We also encountered some font rendering issues with missing or garbled fonts. After some searching and experimenting, the attached patch fixed the issue for us. I am not really sure whether it is correct, but in our case it helped, so you can try it.

The patch has two parts: the first only corrects an if condition that does not look valid, and the second is what made the difference. All in all it reverts https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=05cf93287419992208493f5098fc7b089e95b20c which is the commit we found to be causing the problem in our case.
Comment 25 François Guerraz 2016-04-12 18:05:46 UTC
Created attachment 122881 [details] same bug with XWayland

Hello,

I was having the same issue and I recently switched to wayland. I guess it should not come as too much of a surprise, but the problem is now confined to applications running in XWayland! See the attached screenshot. (running up to date Arch)

Does that exclude problems with the intel driver?

F.
Comment 26 François Guerraz 2016-04-15 10:50:39 UTC
Is it worth my trying the xf86-video-intel patch, or is that driver not used in XWayland?
Comment 27 Yury 2016-04-22 13:30:12 UTC
I have the same problem on Kubuntu 16.04 (4.4 kernel, xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1, xorg 7.7), Dell XPS 13 9333 (i7-4500U)
Comment 28 Yury 2016-04-23 07:54:27 UTC
COGL_ATLAS_DEFAULT_BLIT_MODE didn't help me
Comment 29 Aaron Sloman 2016-10-04 22:44:09 UTC
There seem to be lots of bug reports on lots of different web sites complaining about text corruption in gnome utilities (apparently not in other applications, e.g. firefox, libreoffice, opera, terminal windows, etc.) -- including freedesktop bug reports, bugzilla.redhat.com and elsewhere. I wonder whether there is a central place where this should be reported so as to get it fixed?

I am using a workaround that was recommended here: https://fedoramagazine.org/solution-graphics-issues-intel-graphics-chipsets-fedora-22/ and also in various bug reports, namely: create a new text file /etc/X11/xorg.conf.d/20-intel.conf containing

    Section "Device"
        Identifier "card0"
        Driver "intel"
        Option "AccelMethod" "uxa"
    EndSection

I am using a test kernel that was recommended to fix another i915 bug (hanging):

    4.8.0-0.rc8.git2.2.fc26.x86_64 #1 SMP Thu Sep 29 21:09:26 UTC 2016

though it did not fix the text rendering bug.

My hardware (vintage 2010, and generally still excellent):

    00:02.0 VGA compatible controller : Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Dell Latitude E6410 [1028:040a]
        Flags: bus master, fast devsel, latency 0, IRQ 31
        Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 70b0 [size=8]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities:  MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915
        Kernel modules: i915

I have only had this problem in the last year or so, using fedora 22. I hoped that a switch to f24 would fix it, but not so. Did someone change graphic drivers to use acceleration facilities on new hardware without checking whether the software is running on old hardware?

It's strange that the bug affects only text in gnome displays (e.g. network panel, volume control, sound recorder, etc.).
Comment 30 Stefan Gottwald 2016-10-04 23:00:05 UTC
Created attachment 127009 [details] attachment-18877-0.html

Dear Sir or Madam,

Thank you very much for your message. I am not in the office until October 24, 2016. For urgent topics please contact Klaus Lang firstname.lastname@example.org.

Greetings
Stefan Gottwald
Comment 31 Mihail Kasadjikov 2016-10-05 11:02:06 UTC
Hello. I use "blt" instead of "uxa":

    # cat /etc/X11/xorg.conf.d/10-intel.conf
    Section "Device"
        Identifier "Intel HD"
        Driver "intel"
        Option "AccelMethod" "blt"
    EndSection

    # lspci -vnn -d 8086:0046
    00:02.0 VGA compatible controller : Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device [17aa:215a]
        Flags: bus master, fast devsel, latency 0, IRQ 25
        Memory at f2000000 (64-bit, non-prefetchable) [size=4M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 1800 [size=8]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities:  MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915
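For anyone comparing acceleration methods, the configuration fragments posted in this report differ only in the Identifier and the AccelMethod value. The sketch below generates such a fragment so the variants can be swapped consistently. The helper name `intel_device_section` and the default identifier are invented for this example, and the list of methods is limited to the ones mentioned in this report. As far as the intel(4) man page describes it, "blt" restricts the driver to the BLT engine and disables render acceleration, which would explain why it behaves differently from "sna" and "uxa"; treat that as a reading of the man page rather than a statement from the driver developers.

```python
# Acceleration methods mentioned in this report; intel(4) may list others.
KNOWN_ACCEL_METHODS = ("sna", "uxa", "blt")


def intel_device_section(accel_method: str,
                         identifier: str = "Intel Graphics") -> str:
    """Return an xorg.conf.d "Device" section selecting the given AccelMethod.

    Hypothetical helper for illustration; it only builds text and does not
    touch /etc/X11.
    """
    if accel_method not in KNOWN_ACCEL_METHODS:
        raise ValueError(f"unexpected AccelMethod: {accel_method!r}")
    return (
        'Section "Device"\n'
        f'    Identifier "{identifier}"\n'
        '    Driver "intel"\n'
        f'    Option "AccelMethod" "{accel_method}"\n'
        'EndSection\n'
    )


if __name__ == "__main__":
    # Print the fragment; redirect it to a file under /etc/X11/xorg.conf.d/
    # (as root) and restart X to try it.
    print(intel_device_section("blt", identifier="Intel HD"), end="")
```

Only one file under /etc/X11/xorg.conf.d/ should set AccelMethod for the device at a time; otherwise it is hard to tell which setting the server actually picked up.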
Comment 32 Aaron Sloman 2016-10-18 18:59:25 UTC
(In reply to Mihail Kasadjikov from comment #31)
> I use "blt" instead of "uxa":

Is there any reason "blt" would be preferable? I have searched a little but found no useful information. Many sites mention sna and uxa without mentioning blt. Some web sites suggest avoiding the intel driver and just using common xorg facilities, but I have not found any really clear information for someone with little knowledge.

"uxa" has been working for me since I switched from "sna". Would I get any advantage from "blt"?

Thanks
Comment 33 Mihail Kasadjikov 2016-10-18 19:12:04 UTC
(In reply to Aaron Sloman from comment #32)
> (In reply to Mihail Kasadjikov from comment #31)
>
> > I use "blt" instead of "uxa":
>
> Is there any reason "blt" would be preferable? I have searched a little but
> found no useful information. Many sites mention sna and uxa without
> mentioning blt.

The reason is performance. I used test programs from bug 55296.
Comment 34 Aaron Sloman 2016-10-18 22:03:41 UTC
(In reply to Mihail Kasadjikov from comment #33)
> (In reply to Aaron Sloman from comment #32)
> >
> > Is there any reason "blt" would be preferable? I have searched a little but
> > found no useful information. Many sites mention sna and uxa without
> > mentioning blt.
>
> The reason is performance. I used test programs from bug 55296.

Many thanks. There does not seem to be a standard Fedora 24 version of the test package gtkperf, but I found gtkperf-0.40-21.fc22.x86_64.rpm here https://www.rpmfind.net/linux/RPM/fedora/22/x86_64/g/gtkperf-0.40-21.fc22.x86_64.html and installed it on my six year old Dell Latitude E6410.

I was able to run gtkperf with "uxa" and then run it again after switching to "blt" and re-starting X. The results were very impressive (apart from a couple of warnings that I have not investigated, but seem to be trivial):

BEFORE TEST: Using gtkperf in ctwm window manager on Fedora 24 (XFCE)
TEST WITH "uxa"

    # gtkperf -c 200
    (gtkperf:15734): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated
    (gtkperf:15734): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated
    GtkPerf 0.40 - Starting testing: Tue Oct 18 21:17:03 2016
    GtkEntry - time: 0.11
    GtkComboBox - time: 2.18
    GtkComboBoxEntry - time: 1.74
    GtkSpinButton - time: 0.27
    GtkProgressBar - time: 0.18
    GtkToggleButton - time: 0.42
    GtkCheckButton - time: 0.27
    GtkRadioButton - time: 0.40
    GtkTextView - Add text - time: 1.25
    GtkTextView - Scroll - time: 0.46
    GtkDrawingArea - Lines - time: 2.41
    GtkDrawingArea - Circles - time: 1.33
    GtkDrawingArea - Text - time: 2.14
    GtkDrawingArea - Pixbufs - time: 0.33
    ---
    Total time: 13.50

=======================================

AFTER TEST: RUN WITH "blt"

    # gtkperf -c 200
    (gtkperf:16369): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated
    (gtkperf:16369): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated
    GtkPerf 0.40 - Starting testing: Tue Oct 18 21:24:14 2016
    GtkEntry - time: 0.10
    GtkComboBox - time: 1.69
    GtkComboBoxEntry - time: 1.16
    GtkSpinButton - time: 0.23
    GtkProgressBar - time: 0.10
    GtkToggleButton - time: 0.37
    GtkCheckButton - time: 0.21
    GtkRadioButton - time: 0.30
    GtkTextView - Add text - time: 0.58
    GtkTextView - Scroll - time: 0.16
    GtkDrawingArea - Lines - time: 0.55
    GtkDrawingArea - Circles - time: 0.73
    GtkDrawingArea - Text - time: 0.34
    GtkDrawingArea - Pixbufs - time: 0.05
    ---
    Total time: 6.57

================================================================

The reduction from 13.50

I have checked that video works as normal in firefox, e.g. BBC news, Youtube, and also vlc running connected to a digital TV device. I may be imagining things, but I feel everything is a bit more responsive than it was previously: videos start up more promptly, and the slider control on a youtube video works much better to fast-forward. I also have the impression that focus follows the mouse more reliably than previously, when using firefox, but I have not done systematic testing.

Hibernate / resume works as normal and so does suspend / resume. And so far there are no signs of any text corruption.

In view of all this I can't understand why there are not more sites recommending use of "blt" for people complaining about text corruption, etc. Perhaps this web page will help with this new information.

Thanks very much for the tip!
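The two gtkperf runs above can be compared test by test with a few lines of arithmetic. The sketch below copies the timings from the two listings and works out the speedup for each test and for the reported totals; the dictionary and function names are just labels chosen for this example. These are single runs on one machine, so the ratios are indicative at best.

```python
# Timings in seconds, copied from the two gtkperf listings above.
uxa = {
    "GtkEntry": 0.11, "GtkComboBox": 2.18, "GtkComboBoxEntry": 1.74,
    "GtkSpinButton": 0.27, "GtkProgressBar": 0.18, "GtkToggleButton": 0.42,
    "GtkCheckButton": 0.27, "GtkRadioButton": 0.40,
    "GtkTextView - Add text": 1.25, "GtkTextView - Scroll": 0.46,
    "GtkDrawingArea - Lines": 2.41, "GtkDrawingArea - Circles": 1.33,
    "GtkDrawingArea - Text": 2.14, "GtkDrawingArea - Pixbufs": 0.33,
}
blt = {
    "GtkEntry": 0.10, "GtkComboBox": 1.69, "GtkComboBoxEntry": 1.16,
    "GtkSpinButton": 0.23, "GtkProgressBar": 0.10, "GtkToggleButton": 0.37,
    "GtkCheckButton": 0.21, "GtkRadioButton": 0.30,
    "GtkTextView - Add text": 0.58, "GtkTextView - Scroll": 0.16,
    "GtkDrawingArea - Lines": 0.55, "GtkDrawingArea - Circles": 0.73,
    "GtkDrawingArea - Text": 0.34, "GtkDrawingArea - Pixbufs": 0.05,
}

# Totals as printed by gtkperf itself.
UXA_TOTAL = 13.50
BLT_TOTAL = 6.57


def speedup(before: float, after: float) -> float:
    """How many times faster the second run was."""
    return before / after


def reduction_percent(before: float, after: float) -> float:
    """Percentage of the original time that was saved."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name in uxa:
        print(f"{name:28s} {speedup(uxa[name], blt[name]):5.2f}x")
    print(f"{'Total':28s} {speedup(UXA_TOTAL, BLT_TOTAL):5.2f}x "
          f"({reduction_percent(UXA_TOTAL, BLT_TOTAL):.1f}% less time)")
```

On these figures the total time roughly halves, and the drawing-area tests (lines, text, pixbufs) account for most of the difference, while simple widgets such as GtkEntry barely change.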
Comment 35 Giuseppe Pandolfo 2016-12-02 17:55:55 UTC
I am seeing random, low-occurrence issues with font distortion, missing text, and characters rendered as blocks. It looks to be an XORG server side issue and possibly an Intel X11 video driver issue. If I run a separate X11 client/process like the GTK3-DEMO, this separate client sees the font problems as well.

I took a snapshot of the Intel X11 video driver from "https://cgit.freedesktop.org/xorg/driver/xf86-video-intel" on October 17, 2016, but I'm still seeing the same problem.

According to "https://wiki.archlinux.org/index.php/intel_graphics", there are two things to try to get around font issues:
• Disable SNA acceleration and move to UXA. (In the Troubleshooting section under "SNA".)
• Set the environment variable "COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer"

I tried using "COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer" but unfortunately it didn't resolve the problem. SNA acceleration offers "TearFree" support, whereas disabling SNA and switching to UXA produces screen tearing, so I need to stick with SNA acceleration. Also, choosing either "copy-tex-sub-image" or "get-tex-data" for COGL_ATLAS_DEFAULT_BLIT_MODE significantly impacts the UI's GPU usage, so this would not be an option.

The only recovery is to restart XORG and the applications!

Does Intel have any insight on this issue? The Open Source community has been looking for the 3.x release of the Intel X11 video driver. How is that coming along?

Here's my current System environment:
===================================
• Intel HD Graphics device ID: 0F31
• x86_64 system architecture
• XORG version 1.15.0
• xf86-video-intel version 2.99.917-713-geb01cc5, from "git clone https://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel.git" on October 17, 2016
• kernel version: 3.10.62
• Linux distribution: WindRiver 6.0
• Motherboard: Intel Bay Trail
• Display connector: eDP1
• Occurrence rate is low and random, with no reproducible scenario.

Attached are:
=====================
• CorruptedFonts.png - Example of an occurrence.
• xorg.conf - XORG configuration file used at runtime.

Currently this is the XORG startup options used:
=====================================
    Xorg :2 -ac -br -bs -r -nocursor -s 0 -v -dpms -nolisten tcp -extension XVideo +extension Composite -extension XFree86-VidModeExtension -extension XFree86-DGA +extension X-Resource +extension DPMS +extension DAMAGE +extension "Generic Events" +extension DOUBLE-BUFFER +extension RANDR -noreset vt02 -logfile /var/log/ui/Xorg.log -logverbose 10 -fp /opt/xorg/share/fonts/X11/misc -config /opt/xorg/share/X11/xorg.conf.d/xorg.conf

Thank you for your support!
Giuseppe Pandolfo
Comment 36 Giuseppe Pandolfo 2016-12-02 18:09:51 UTC
Created attachment 128310 [details] Screenshot of missing characters from "Comment 35"
Comment 37 Giuseppe Pandolfo 2016-12-02 18:11:18 UTC
Created attachment 128311 [details] xorg.conf file from "Comment 35"
Comment 38 Elvis Stansvik 2017-02-10 08:36:44 UTC
Stefan Gottwald: Did you ever bring your patch/findings (in comment #24) to the attention of the Intel folks? If the fix is indeed correct, then perhaps Chris Wilson at Intel should be notified? Looking at xf86-video-intel master, the if condition is still using UNCACHED and not CREATE_UNCACHED as in your patch, and bo->domain is not assigned DOMAIN_NONE.
Comment 39 Chris Wilson 2017-02-10 08:46:04 UTC
You have hijacked this bug report about a potential missing pipeline flush inside the Ironlake state emission. This bug has nothing to do with the DRI issues, i.e. everything from comment 12 onwards.
Comment 40 Elvis Stansvik 2017-02-10 08:51:26 UTC
Chris Wilson: Ah, my bad. I found this bug by coming from https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1573959 and thought this was the same.