Bug 80745

Summary: [ALL Regression] master device leak
Product: DRI
Reporter: zhoujian <jianx.zhou>
Component: DRM/Intel
Assignee: Antti Koskipaa <antti.koskipaa>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: highest
CC: eero.t.tamminen, imre.deak, intel-gfx-bugs, mengmeng.meng
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description        Flags
Xorg.0.log         none
dmesg.log          none
dmesg_hsw25.log    none

Description zhoujian 2014-07-01 05:15:54 UTC
Platform: HSW
Libdrm: (master)libdrm-2.4.54-17-ge8c3c1358ecaf4e90f7d43762357ae6f8e2022b6
Mesa: (master)15b5e663b050505683b7b4c9c489e46863b8441d
Xserver: (master)xorg-server-1.15.99.902-121-g2f5cf9ff9a0f713b7e038636484c77f11
Xf86_video_intel: (master)2.99.912-188-ga8b0ba0ed5f27e0d671b650d6f4bfe2fd86a0f3f
Cairo: (master)f574fec8d2d1f83525fd7e4dbb266b6e5091627d
Libva: (master)c61d8c6ce9ffc27320e9e177c1e1123d5f1b5014
Libva_intel_driver: (master)745340dd013399f64507de73401ab3adb712dad5
Kernel: drm-intel-nightly git-1087d4


Bug detailed description:
--------------------------------------------------------------------------------
A “Fatal server error” is shown when running some games on HSW. This issue does not occur on IVB/BYT-M/BDW. Please see Xorg.0.log and dmesg.log.
Output:
Fatal server error:
(EE) no screens found(EE)
(EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
(EE) Please also check the log file at "/opt/X11R7/var/log/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.


Reproduce steps:
--------------------------------------------------------------------------------
1. ./piglit-run.py  tests/performance.tests Performance
2. After running some of the games, the bug occurs.
3. “Fatal server error” is shown.
Comment 1 zhoujian 2014-07-01 08:45:55 UTC
Created attachment 102061 [details]
Xorg.0.log
Comment 2 zhoujian 2014-07-01 08:46:25 UTC
Created attachment 102062 [details]
dmesg.log
Comment 3 zhoujian 2014-07-01 09:02:35 UTC
Updated reproduce steps:
1. Run the doom3 game about 20 times.
2. “Fatal server error” is shown when starting X.

On BYT, once we get the X error, X cannot be restarted.
Comment 4 Antti Koskipaa 2014-07-03 12:57:11 UTC
Ok, I need a longer dmesg log file with drm.debug=0xe. The one provided doesn't show anything useful.

Also, is this reproducible on SNB?
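
For reference, a minimal sketch of what the requested drm.debug=0xe mask selects. It assumes the debug category bits from the kernel's DRM headers of that period (DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS, DRM_UT_PRIME); the helper name decode_drm_debug is only illustrative and is not part of any tool.

```python
# Debug category bits, assumed to match the DRM_UT_* values in the
# kernel's DRM headers around the time of this report.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # 0xe = 0x2 | 0x4 | 0x8, so everything except the core messages.
    print(decode_drm_debug(0xE))
```

Under that assumption, 0xe enables driver, KMS and PRIME messages while leaving out the noisier core messages.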
Comment 5 zhoujian 2014-07-07 06:47:05 UTC
Created attachment 102346 [details]
dmesg_hsw25.log
Comment 6 zhoujian 2014-07-07 06:51:23 UTC
I have updated the dmesg log file with drm.debug=0xe; please see dmesg_hsw25.log.
Comment 7 Eero Tamminen 2014-07-10 09:43:27 UTC
Is this issue reproducible with a test-case that:
- doesn't require QA's own (internal) piglit tests
- doesn't require commercial games

(Antti doesn't have Doom3 nor access to your piglit tests.)
Comment 8 Chris Wilson 2014-07-10 09:49:37 UTC
It can be reproduced with just X. However higher frequency seems to correlate with having multiple active DRI clients and sending a kill signal to X.
Comment 9 Eero Tamminen 2014-07-10 09:54:03 UTC
(In reply to comment #8)
> It can be reproduced with just X. However higher frequency seems to
> correlate with having multiple active DRI clients and sending a kill signal
> to X.

So the correct steps also involve restarting X, not just the DRI clients on top of X?  Is e.g. glxgears enough?


Bug descriptions says that this doesn't happen on IVB/BYT-M/BDW, but title states "ALL".  Which one is correct?
Comment 10 Antti Koskipaa 2014-07-10 13:00:20 UTC
(In reply to comment #8)
> It can be reproduced with just X. However higher frequency seems to
> correlate with having multiple active DRI clients and sending a kill signal
> to X.

By "frequency" you mean frequency of occurrence, right? So how often should I expect the bug to happen? Tried two rounds of about 40 instances of glxgears and then killing X with SIGTERM. Tried on Ubuntu 13.10 with kernel drm-intel-nightly 26d2131dd6cc564431d75e56d7d00d99a2f5b29d. Everything else stock. Could not reproduce.
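
The attempt described above (start a batch of DRI clients, then send SIGTERM to X) could be scripted roughly as follows. This is only a sketch: the function name stress_round, the settle delay, and the way the X server PID is obtained are assumptions, not part of the original test setup.

```python
import os
import signal
import subprocess
import time


def stress_round(client_cmd, server_pid, clients=40, settle=5.0):
    """Run one round: start DRI clients, let them render, then SIGTERM X.

    Returns the exit codes of the clients after cleanup.
    """
    # Start the requested number of client processes (e.g. glxgears).
    procs = [subprocess.Popen(client_cmd) for _ in range(clients)]
    time.sleep(settle)                     # give the clients time to start
    os.kill(server_pid, signal.SIGTERM)    # same signal as used in the comment
    for proc in procs:
        if proc.poll() is None:            # clients usually exit with X;
            proc.terminate()               # terminate any that linger
    return [proc.wait() for proc in procs]
```

A call such as stress_round(["glxgears"], x_pid), with x_pid being the PID of the running X server, would correspond to one of the two rounds mentioned; X then has to be restarted before the next round.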
Comment 11 zhoujian 2014-07-16 03:23:10 UTC
> Bug descriptions says that this doesn't happen on IVB/BYT-M/BDW, but title
> states "ALL".  Which one is correct?
The title should state "ALL". On our side, the reproduction rate is low on IVB/BYT-M/BDW.
Comment 12 zhoujian 2014-08-07 02:41:05 UTC
Hi Chris, could you please tell me which branch this patch is in? Thanks.
patch: http://patchwork.freedesktop.org/patch/30102/
Comment 13 wendy.wang 2014-08-19 07:11:25 UTC
Applied patch: http://patchwork.freedesktop.org/patch/30102/ against drm-intel-nightly kernel

Ran the following games and did not reproduce the master device leak failure, checked through
lsof /dev/dri/card0 or /sys/kernel/debug/dri/0/i915_gem_objects.

Doom3 v1.3.1
etqw-demo
Lightsmark v2008
OpenArena v0.8.8
Padman v1.2
Smokin-Guns v1.1
UrbanTerror 4.1
Warsow v1.0
X11perf v1.5--aa10text
X11perf v1.5--rgb10text

However, with this patch applied, the games run very slowly: the games above took 6 hours to finish running.
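
The lsof check mentioned in this comment can be approximated without lsof by walking /proc. The sketch below assumes a Linux /proc layout and enough privileges to read other processes' fd directories; find_device_holders is a hypothetical helper name, not an existing tool.

```python
import os


def find_device_holders(device="/dev/dri/card0", proc_root="/proc"):
    """Map PID -> number of open file descriptors that point at `device`."""
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        count = 0
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == device:
                    count += 1
            except OSError:
                continue  # descriptor closed while scanning
        if count:
            holders[int(entry)] = count
    return holders


if __name__ == "__main__":
    for pid, count in sorted(find_device_holders().items()):
        print(pid, count)
```

If X and all DRI clients have exited and this still reports holders, that would be consistent with the kind of leak described in this report.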
Comment 14 wendy.wang 2014-08-20 06:15:08 UTC
We confirmed that the result in comment 13 was caused by a kernel bug.
 
Re-applied the patch http://patchwork.freedesktop.org/patch/30102/ against another drm-intel-nightly commit, 2b6e6b9c29dbdaf596cad99877384af8b406d103.
Cannot reproduce the master device leak issue after running the games/demos below:

x11perf_aa10text 23266666.67
x11perf_rgb10text 15233333.33
doom3 96.83
doom3.sh.power 25.19
etqw_1_10 59.8
etqw_1_10.sh.power 21.05
etqw-demo 59.77
etqw-demo.sh.power 30.14
lightsmark 115.93
lightsmark.sh.power 26.24
openarena 80.3
openarena.sh.power 38.52
padman 375.57
padman.sh.power 21.19
smokin-guns 326.7
smokin-guns.sh.power 12.15
urbanterror 229.63
urbanterror.sh.power 12.55
warsow01 195.6
warsow01.sh.power 23.93
xonotic07 413.79
xonotic07.sh.power 21.5
Comment 15 wendy.wang 2014-08-20 08:11:21 UTC
Pls merge up the patch, thanks.
Comment 16 Jani Nikula 2014-09-02 12:47:31 UTC
(In reply to comment #15)
> Pls merge up the patch, thanks.

commit 47bd34e99649ea2905fd97112793c300c8eeddf7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 7 14:20:40 2014 +0100

    drm/i915: Prevent recursive deadlock on releasing a busy userptr

has been applied. Please reopen if the problem persists with current nightly.
Comment 17 zhoujian 2014-09-09 07:35:23 UTC
Verified. The commit verified is git-51c49e (drm-intel-nightly).
Comment 18 Jari Tahvanainen 2016-11-22 07:30:25 UTC
Closing verified+fixed. commit 850ebc0.
