Bug 59771 - [uxa PNV] EDEADLK: Characters and fonts overwriting and/or missing, bad rendering and frozen sections of windows on >2.20.18
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.7 (2012.06)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL: https://bbs.archlinux.org/viewtopic.p...
Whiteboard:
Keywords:
Duplicates: 59769 61717 80096 86031
Depends on:
Blocks:
 
Reported: 2013-01-23 17:05 UTC by freedesktop
Modified: 2014-11-21 10:01 UTC

See Also:
i915 platform:
i915 features:


Attachments
Fix up fence counts (1.30 KB, patch)
2013-03-03 10:05 UTC, Chris Wilson
Avoid overcounting fences for self-relocs (1.65 KB, patch)
2013-05-08 16:45 UTC, Chris Wilson

Description freedesktop 2013-01-23 17:05:57 UTC
I've got a bug since 2.20.18, and I'm not the only one: https://bbs.archlinux.org/viewtopic.php?id=156486

I'm running on a netbook with atom n450 and archlinux, using libdrm 2.4.41.

Basically everything works until a video-intensive app such as mplayer2 or Skype is launched. After that nothing renders properly: characters are drawn on top of each other, half of them are missing, and some sections are just black.

Here are some logs (Xorg and dmesg): http://pastebin.mozilla.org/2080842 http://pastebin.mozilla.org/2080843

You can see the repeating error "(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Resource deadlock avoided."

I have been able to remove this behaviour by downgrading to 2.20.17 (not 2.20.18).
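
The text after the last colon in that message appears to be the C library's description of an errno value; on glibc systems "Resource deadlock avoided" is the string for EDEADLK. As a minimal sketch for anyone comparing logs, the following counts these failures per reported reason. The function name and the sample lines are made up for illustration; only the message text comes from this report.

```python
import errno
import os
import re

# Matches the driver message quoted in this report, with or without the
# "(EE)" prefix. The text after the last colon is the error description.
BATCH_FAILURE = re.compile(
    r"intel\(\d+\): Failed to submit batch buffer.*: (?P<reason>[^:]+?)\.?\s*$"
)


def summarize_batch_failures(log_text):
    """Return a dict mapping each reported reason to its number of occurrences."""
    counts = {}
    for line in log_text.splitlines():
        match = BATCH_FAILURE.search(line)
        if match:
            reason = match.group("reason").strip()
            counts[reason] = counts.get(reason, 0) + 1
    return counts


if __name__ == "__main__":
    # The wording of this description depends on the C library in use.
    print("EDEADLK is described here as:", os.strerror(errno.EDEADLK))
    # Hypothetical log excerpt; a real one would be read from the Xorg log.
    sample = (
        "(EE) intel(0): Failed to submit batch buffer, expect rendering "
        "corruption or even a frozen display: Resource deadlock avoided.\n"
        "(II) intel(0): unrelated informational line\n"
    )
    print(summarize_batch_failures(sample))
```

The Xorg log is commonly found at /var/log/Xorg.0.log, but that path is an assumption and varies between distributions.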
Comment 1 freedesktop 2013-01-23 17:08:50 UTC
*** Bug 59769 has been marked as a duplicate of this bug. ***
Comment 2 pjb1 2013-01-23 17:54:12 UTC
Chris Wilson asked for the libdrm version on the duplicate bug 59769. The upgrade that exposed the bug for me was:

upgraded libdrm (2.4.40-1 -> 2.4.41-1)
Comment 3 pjb1 2013-01-23 17:58:13 UTC
Oh, he also asked if it is a "composited wm". I don't know what that is, but I am running LXDE.
Comment 4 freedesktop 2013-01-23 18:00:43 UTC
I'm using i3wm, and libdrm 2.4.41. Since the combination of libdrm 2.4.41 + xf86-video-intel 2.20.17 is not affected, I don't think libdrm is involved in this bug.
Comment 5 pjb1 2013-01-23 18:06:10 UTC
It appears all 3 users reporting the error are on Intel Atom processors. It looks like we are pushing those little things too hard!
Comment 6 Chris Wilson 2013-01-23 18:08:26 UTC
(In reply to comment #5)
> It appears all 3 users reporting the error are on Intel Atom processors. It
> looks like we are pushing those little things too hard!

No. Switch to SNA to understand just how wrong that statement is.
Comment 7 Chris Wilson 2013-01-24 16:28:29 UTC
I have tried to reproduce this by using mplayer + lxsession on a pineview machine. It remains happy. Are you able to refine your test case to a single application running under a bare X session?
Comment 8 freedesktop 2013-01-24 17:36:52 UTC
Try connecting an external screen on VGA. I haven't been able to reproduce the bug on LVDS only.
Comment 9 pjb1 2013-01-24 20:26:39 UTC
It took some doing but I reproduced the error. I closed all windows in LXDE, opened a single console session, expanded the window to full size, did an "su root", and ran the command "journalctl --no-pager|grep -i error". By the time I got through the log I had seen some blankouts. I got another console going and looked at the X log and sure enough the errors were there.

It seemed more difficult than last time I ran into this, but last time I had Seamonkey going with multiple tabs and other sorts of things running at the same time, along with multiple console windows.
Comment 10 pjb1 2013-01-24 20:29:48 UTC
Oh, by the way, as my machine is an Intel D510MO, VGA is all I have. The monitor is running at 1440x900 pixels.
Comment 11 freedesktop 2013-02-04 15:11:21 UTC
I tried the latest 2.21.0 driver. On LVDS-1 alone there is no problem, as usual, but when adding VGA-1 (and running a video player) it segfaults instead of just messing up the rendering:

http://pastebin.mozilla.org/2113784

-> Back to 2.20.17
Comment 12 freedesktop 2013-02-16 10:23:59 UTC
Tried with 2.21.2, same result -> segfault and X crash when playing video.
Back to 2.20.17

log: http://pastebin.mozilla.org/2144623
Comment 13 Chris Wilson 2013-02-16 10:29:05 UTC
Can you please run 'addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so 0x22a0a 0x1df1a ; addr2line -e /usr/bin/X 0x8fb5c 0xd9b55 0x37e51 0x2695a'? Please make sure you have the debug symbols installed.
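
For anyone who needs to repeat this for several backtraces, here is a rough sketch that wraps the same addr2line invocation. The helper names are hypothetical, and the extra -f flag (print function names as well) is my addition rather than part of the request above.

```python
import shutil
import subprocess


def addr2line_command(binary, addresses):
    """Build the addr2line argument list for one binary and its offsets."""
    return ["addr2line", "-f", "-e", binary, *addresses]


def resolve(binary, addresses):
    """Run addr2line and return its raw output, or None if it is not installed."""
    if shutil.which("addr2line") is None:
        return None
    result = subprocess.run(
        addr2line_command(binary, addresses),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Binaries and offsets as given in the comment above.
    frames = {
        "/usr/lib/xorg/modules/drivers/intel_drv.so": ["0x22a0a", "0x1df1a"],
        "/usr/bin/X": ["0x8fb5c", "0xd9b55", "0x37e51", "0x2695a"],
    }
    for binary, addresses in frames.items():
        print(binary)
        print(resolve(binary, addresses))
```

If the output consists only of "??" entries, the binary was most likely built without debug information.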
Comment 14 freedesktop 2013-02-20 16:41:55 UTC
addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so 0x22a0a 0x1df1a:
xf86-video-intel-2.21.2/src/intel_batchbuffer.h:88
xf86-video-intel-2.21.2/src/i915_render.c:486

That's all I have for now; I haven't yet managed to build X with debug symbols.
Comment 15 auxsvr 2013-02-25 21:18:36 UTC
The message 

intel(0): Failed to submit batch buffer, expect rendering corruption: Resource deadlock avoided.

appears on an Atom N450 netbook here, too, with libdrm2-2.4.42-100, xf86-video-intel-2.21.3-54.1, openSUSE 12.2. It is not necessary to play video to trigger this, and the effect it has in my case is that graphics operations become slow; no corruption, no crashes so far.
Comment 16 Chris Wilson 2013-03-03 00:19:05 UTC
*** Bug 61717 has been marked as a duplicate of this bug. ***
Comment 17 Chris Wilson 2013-03-03 10:05:29 UTC
Created attachment 75820
Fix up fence counts

My belief is that the error in fence counting is magnified through the clear_relocs() function, and if true this patch should fix up the leak.
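
To illustrate the general idea of such a fix-up, here is a toy model. It is not the libdrm code and not the attached patch: the class, its fields, and both functions are invented for this sketch. It recounts the fences from scratch and overwrites a tracked counter that has drifted; the message format mirrors the one a tester quotes later in this thread.

```python
class ToyBuffer:
    """Invented stand-in for a buffer object with a cached fence count."""

    def __init__(self, name, needs_fence=False):
        self.name = name
        self.needs_fence = needs_fence
        self.targets = []  # buffers this one holds relocations to
        self.tree_fences = 1 if needs_fence else 0  # cached, may drift


def expected_fences(bo):
    """Recount fences over the relocation tree, visiting each buffer once."""
    seen = set()
    stack = [bo]
    total = 0
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        if current.needs_fence:
            total += 1
        stack.extend(current.targets)
    return total


def fix_up_fence_count(bo):
    """Reset a drifted cached count; return a message if a fix was needed."""
    expected = expected_fences(bo)
    if bo.tree_fences == expected:
        return None
    message = f"Fixing up fence counts; was {bo.tree_fences}, expected {expected}"
    bo.tree_fences = expected
    return message


if __name__ == "__main__":
    batch = ToyBuffer("batch")
    batch.tree_fences = -1  # simulate a count that has leaked below zero
    print(fix_up_fence_count(batch))
```

A fix-up like this hides the symptom without explaining where the miscount comes from, so it is best read as a diagnostic aid.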
Comment 18 auxsvr 2013-05-03 13:39:13 UTC
Glyph corruption has been absent for the past two days after I applied the patch, even after suspend to RAM. This seems to be fixed, thanks.
Comment 19 Chris Wilson 2013-05-08 16:45:15 UTC
Created attachment 79029
Avoid overcounting fences for self-relocs

I think this is the root cause of the miscounting issue. Please test without the other patch applied.
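
As a hedged illustration of how a self-referential relocation could inflate a count, consider the toy model below. All names are invented and it only mimics the class of error described here, assuming the count is accumulated by adding the target's count to the parent's on every relocation.

```python
class ToyBuffer:
    """Invented stand-in for a buffer with a cached fence count."""

    def __init__(self, needs_fence=False):
        self.tree_fences = 1 if needs_fence else 0


def emit_reloc_naive(bo, target):
    """Fold the target's count into the parent, even if target is the parent."""
    bo.tree_fences += target.tree_fences


def emit_reloc_guarded(bo, target):
    """Same accounting, but skip relocations that point back at the parent."""
    if target is not bo:
        bo.tree_fences += target.tree_fences


if __name__ == "__main__":
    naive = ToyBuffer(needs_fence=True)
    guarded = ToyBuffer(needs_fence=True)
    for _ in range(3):
        emit_reloc_naive(naive, naive)      # count doubles on each self-reloc
        emit_reloc_guarded(guarded, guarded)  # count is left alone
    print("naive:", naive.tree_fences, "guarded:", guarded.tree_fences)
```

In this model the naive count doubles with every self-relocation, so a buffer that needs a single fence soon appears to need more fences than the hardware provides, while the guarded variant stays at one.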
Comment 20 auxsvr 2013-05-08 20:05:26 UTC
Empty windows and the message 

(EE) intel(0): Failed to submit batch buffer, expect rendering corruption: Resource deadlock avoided.

just appeared with the second patch applied.
Comment 21 M.R 2013-05-11 18:20:42 UTC
Second patch didn't fix it for me either.

Atom N550
Desktop: GNOME 3.8 (Fedora 19)
Patch applied on libdrm commit 040f6b015e
Comment 22 Gordon Jin 2013-05-24 03:42:46 UTC
clear needinfo
Comment 23 Adam Huffman 2013-09-20 11:28:58 UTC
Has there been any progress? This corruption is happening very frequently for me now, on a Q35 system running Fedora 19, driver version 2.21.12. libdrm version is 2.4.46.

While I haven't seen a pattern in what triggers the bug, I do have a *lot* of Firefox tabs, over two windows.
Comment 24 Chris Wilson 2013-09-20 11:34:25 UTC
The fix in libdrm lies unreviewed. In the meantime the default has changed to SNA which renders this code obsolete.
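
For drivers where SNA is not yet the default, the acceleration method can be selected with the "AccelMethod" option of the intel driver. The sketch below only renders such a Device section as text; the function name and the "Intel Graphics" identifier are arbitrary choices, and only the two methods discussed in this report are accepted.

```python
def intel_device_section(accel_method="sna", identifier="Intel Graphics"):
    """Render an xorg.conf Device section selecting an acceleration method."""
    # Restricted to the two methods mentioned in this bug report.
    if accel_method not in ("sna", "uxa"):
        raise ValueError(f"unsupported accel method in this sketch: {accel_method}")
    return (
        'Section "Device"\n'
        f'    Identifier "{identifier}"\n'
        '    Driver     "intel"\n'
        f'    Option     "AccelMethod" "{accel_method}"\n'
        'EndSection\n'
    )


if __name__ == "__main__":
    print(intel_device_section("sna"))
```

The output would typically go into a file under /etc/X11/xorg.conf.d/, although the exact location is an assumption that depends on the distribution.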
Comment 25 Ben Armstrong 2013-11-11 15:08:33 UTC
This bug currently affects Debian sid and probably also jessie, since it has the same versions of libdrm and the intel driver as are now in sid. See:

http://bugs.debian.org/725781

Like some others reporting problems, my system is an Atom n450 netbook and I'm using an external VGA display with it.
Comment 26 Ingo Saitz 2014-04-27 13:59:54 UTC
I can confirm that the first patch works for me, for details see https://bugs.freedesktop.org/show_bug.cgi?id=78000

After applying the patch to libdrm 2.4.52-1 (the current Debian unstable package), I get the expected error on stdout immediately after the X server starts:

Fixing up fence counts; was -1, expected 0

I had not been experiencing any glyph errors lately, however; instead, DRI had completely stopped working. With this patch applied it continues to work.
Comment 27 Chris Wilson 2014-07-19 11:23:47 UTC
*** Bug 80096 has been marked as a duplicate of this bug. ***
Comment 28 Chris Wilson 2014-11-08 19:23:19 UTC
*** Bug 86031 has been marked as a duplicate of this bug. ***
Comment 29 Daniel Vetter 2014-11-21 09:16:05 UTC
I reviewed the fix over a year ago, but somehow forgot to push it. Done now:

commit ec65f8d71eb3eb065c7cadf4153138435ac3b388
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 8 16:30:44 2013 +0100

    intel: Avoid overcounting fences when emitting self-referential relocs
Comment 30 mikhail.v.gavrilov 2014-11-21 09:57:42 UTC
(In reply to Daniel Vetter from comment #29)
> I reviewed the fix over a year ago, but somehow forgot to push it.
> Done now:
> 
> commit ec65f8d71eb3eb065c7cadf4153138435ac3b388
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed May 8 16:30:44 2013 +0100
> 
>     intel: Avoid overcounting fences when emitting self-referential relocs

Please look at this: https://bugs.freedesktop.org/show_bug.cgi?id=86378

Is this the same bug or not?
Comment 31 Chris Wilson 2014-11-21 10:01:27 UTC
(In reply to mikhail.v.gavrilov from comment #30)
> Please look at this: https://bugs.freedesktop.org/show_bug.cgi?id=86378
> 
> Is this the same bug or not?

No, that's a different bug. Most likely the lack of synchronisation between the compositor and X.

