| Summary: | [ILK i386 Bisected]X start fail | | |
|---|---|---|---|
| Product: | DRI | Reporter: | lu hua <huax.lu> |
| Component: | DRM/Intel | Assignee: | Daniel Vetter <daniel> |
| Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | critical | | |
| Priority: | high | CC: | htd, intel-gfx-bugs, ugis.germanis, ville.syrjala |
| Version: | unspecified | | |
| Hardware: | x86 (IA32) | | |
| OS: | Linux (All) | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |
| Attachments: | | | |

Description
lu hua
2014-11-25 02:20:31 UTC
Created attachment 109973 [details]
Xorg.0.log

Oh this looks bad, division by 0 hooray.

Please supply the bisect result.

And smells like watermark fun for Ville.

Note that you may not hit the failure during X start like QA, but you should when using "xset dpms force off; xset dpms force on".

This looks somewhat familiar. I think it might be the zeroed mode problem I also hit with my ILK a while back. I'll have to see if I can reproduce it still.

(In reply to Daniel Vetter from comment #2)
> Oh this looks bad, division by 0 hooray.
>
> Please supply the bisect result.

Between the good commit and the bad commit, bug 85277 causes a boot failure, and I am not sure whether it impacts the bisect. I will give it a try.

Bisect shows 83f45fc360c8e16a330474860ebda872d1384c8c is the first bad commit. With it reverted, this issue goes away.

commit 83f45fc360c8e16a330474860ebda872d1384c8c
Author:     Daniel Vetter <daniel.vetter@ffwll.ch>
AuthorDate: Wed Aug 6 09:10:18 2014 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Aug 6 10:41:13 2014 +0200

    drm: Don't grab an fb reference for the idr

    The current refcounting scheme is that the fb lookup idr also holds a reference. This works out nicely bacause thus far we've always explicitly cleaned up idr entries for framebuffers:
    - Userspace fbs get removed in the rmfb ioctl or when the drm file gets closed.
    - Kernel fbs (for fbdev emulation) get cleaned up by the driver code at module unload time.

    But now i915 also reconstructs the bios fbs for a smooth transition. And that fb is purely transitional and should get removed immmediately once all crtcs stop using it. Of course if the i915 fbdev code decides to reuse it as the main fbdev fb then it shouldn't be cleaned up, but in that case the fbdev code will grab it's own reference.

    The problem is now that we also want to register that takeover fb in the idr, so that userspace can do a smooth transition (animated maybe even!) itself. But currently we have no one who will clean up the idr reference once that fb isn't useful any more, and so essentially leak it.

    Fix this by no longer holding a full fb reference for the idr, but instead just have a weak reference using kref_get_unless_zero. But that requires us to synchronize and clean up with the idr and fb_lock in drm_framebuffer_free, so add that. It's a bit ugly that we have to unconditionally grab the fb_lock, but without that someone might creep through a race.

    This leak was caught by the fb leak check in drm_mode_config_cleanup. Originally the leak was introduced in

    commit 46f297fb83d4f9a6f6891964beb184664341a28b
    Author: Jesse Barnes <jbarnes@virtuousgeek.org>
    Date:   Fri Mar 7 08:57:48 2014 -0800

        drm/i915: add plane_config fetching infrastructure v2

    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77511
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

The bisect is misleading as that fixes the leak of the framebuffer (thus preventing the subsequent PIN_BIAS failure thus preventing from switching the displays off/on). Does it still blow up in the wm code if you do xset dpms force off; xset dpms force on?

My ILK was hitting this already for some time but got magically fixed recently. Reverse bisect points at:

commit c211a47c2c28562f8a3fff9e027be1a3ed9e154a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Nov 24 11:12:42 2014 +0100

    drm/i915: Disallow pin ioctl completely for kms drivers

Oh and that machine is running 64bit gentoo.

I dumped the setcrtc struct coming into the kernel and it looks like this:

[ 52.334815] [drm:drm_ioctl] setcrtc asize=104 usize=104 drv_size=104
[ 52.334823] 00000000: a7b709b0 00007fff 00000001 00000008
[ 52.334828] 00000010: 00000022 00000000 00000000 00000000
[ 52.334832] 00000020: 00000001 00000000 00000000 00000000
[ 52.334837] 00000030: 00000000 00000000 00000000 00000000
[ 52.334842] 00000040: 00000000 00000000 00000000 00000000
[ 52.334846] 00000050: 00000000 00000000 00000000 00000000
[ 52.334850] 00000060: 00000000 00000000

So X really is trying to set a zeroed mode :P

This is the third setcrtc coming in when starting X; the previous two were requests to disable both crtcs (and they were totally zeroed apart from the crtc id).

I suppose the next question is how X came up with that zeroed mode. Did the kernel tell it that such a mode was already in use and it's trying to set it again, or did it extract it from somewhere else...

Obviously we should add some checks to the kernel as well to make sure we don't explode when encountering such an invalid mode...

The simplest way would be to compile xf86-video-intel with --enable-debug=full and read back the reasons for the last modeset before the crash.

*** Bug 87330 has been marked as a duplicate of this bug. ***

*** Bug 87279 has been marked as a duplicate of this bug. ***

Regresses here and there, time for a revert?

(In reply to Jani Nikula from comment #14)
> Regresses here and there, time for a revert?

Revert of what? You have already applied the "fix" from Daniel to ban the pin ioctl completely. This bug is about userspace being able to oops the kernel by feeding it a malicious mode. Bug 87279 is about corruption in the ddx using sw fallbacks, and bug 87330 has not been analysed to find the actual root cause.

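For anyone who wants to read the setcrtc dump earlier in this thread field by field, here is a minimal sketch of how it could be decoded. It assumes the dump shows host-endian 32-bit words from a little-endian x86 machine and that the 104 bytes follow the `struct drm_mode_crtc` / `struct drm_mode_modeinfo` layout from the DRM UAPI header. `mode_is_sane` is a hypothetical helper that only illustrates the kind of check discussed here; it is not the check that went into the kernel.

```python
import struct

# 32-bit words copied from the setcrtc dump in this thread.
DUMP_WORDS = """
a7b709b0 00007fff 00000001 00000008
00000022 00000000 00000000 00000000
00000001 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000
"""

# Assumed layout of struct drm_mode_crtc up to the embedded mode (36 bytes).
HEADER_FMT = "<Q7I"
HEADER_FIELDS = ("set_connectors_ptr", "count_connectors", "crtc_id",
                 "fb_id", "x", "y", "gamma_size", "mode_valid")

# Assumed layout of struct drm_mode_modeinfo (68 bytes).
MODE_FMT = "<I10H3I32s"
MODE_FIELDS = ("clock",
               "hdisplay", "hsync_start", "hsync_end", "htotal", "hskew",
               "vdisplay", "vsync_start", "vsync_end", "vtotal", "vscan",
               "vrefresh", "flags", "type", "name")


def words_to_bytes(text):
    """Rebuild the raw buffer, assuming each word is a little-endian u32."""
    return b"".join(struct.pack("<I", int(w, 16)) for w in text.split())


def decode_setcrtc(raw):
    """Split the buffer into the crtc header fields and the mode fields."""
    header = struct.unpack_from(HEADER_FMT, raw, 0)
    mode = struct.unpack_from(MODE_FMT, raw, struct.calcsize(HEADER_FMT))
    return dict(zip(HEADER_FIELDS, header)), dict(zip(MODE_FIELDS, mode))


def mode_is_sane(mode):
    """Hypothetical check: reject modes with a zero clock or zero sizes,
    since those values are likely to be used as divisors later on."""
    keys = ("clock", "hdisplay", "htotal", "vdisplay", "vtotal")
    return all(mode[k] > 0 for k in keys)


RAW = words_to_bytes(DUMP_WORDS)
CRTC, MODE = decode_setcrtc(RAW)

for field in HEADER_FIELDS:
    print(f"{field:20s} {CRTC[field]:#x}")
print("mode sane:", mode_is_sane(MODE))
```

Under those assumptions the request names crtc 8 and fb 0x22 with one connector, sets mode_valid, and leaves every mode field at zero, which is consistent with the "zeroed mode" reading of the dump.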
Some kernel fixes that prevent the kernel from blowing up with the zeroed mode:
http://lists.freedesktop.org/archives/dri-devel/2014-December/074160.html

(In reply to Chris Wilson from comment #15)
> (In reply to Jani Nikula from comment #14)
> > Regresses here and there, time for a revert?
>
> Revert of what? You have already applied the "fix" from Daniel to ban the
> pin ioctl completely. This bug is about userspace being able to oops the
> kernel by feeding it a malicious mode. Bug 87279 is about corruption in the
> ddx using sw fallbacks, and bug 87330 has not been analysed to find the
> actual root cause.

drm-i915-Disallow-pin-ioctl-completely-for-kms-drive seems to fix https://bugs.freedesktop.org/show_bug.cgi?id=87279 as well, which has been marked as a duplicate of this bug.

drm-i915-Disallow-pin-ioctl-completely-for-kms-drive applies cleanly to 3.18.1, but the kernel then fails to compile. Thus, I can't check if this patch fixes my bug, which has been marked as a duplicate of this one.

(In reply to Heinz from comment #18)
> drm-i915-Disallow-pin-ioctl-completely-for-kms-drive applies cleanly to
> 3.18.1, but the kernel then fails to compile. Thus, I can't check if this
> patch fixes my bug, which has been marked as a duplicate of this one.

Can't confirm. 3.18.1 builds fine here with the patch. Check your configuration, maybe you enabled something experimental.

I'm writing this from 3.18.1:
Linux ChakraPC 3.18.1-1-CHAKRA #1 SMP PREEMPT Thu Dec 18 10:53:51 EET 2014 x86_64 GNU/Linux

Here's the compiler output where it fails on an otherwise bog standard vanilla 3.18.1, which clearly indicates that it's not some weird .config options which are the cause.
GCC is: gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) (GCC)

drivers/gpu/drm/i915/i915_gem.c:4282:1: error: redefinition of ‘i915_gem_pin_ioctl’
 i915_gem_pin_ioctl(struct drm_device *dev, void *data,
 ^
drivers/gpu/drm/i915/i915_gem.c:4189:1: note: previous definition of ‘i915_gem_pin_ioctl’ was here
 i915_gem_pin_ioctl(struct drm_device *dev, void *data,
 ^
drivers/gpu/drm/i915/i915_gem.c:4335:1: error: redefinition of ‘i915_gem_unpin_ioctl’
 i915_gem_unpin_ioctl(struct drm_device *dev, void *data,
 ^
drivers/gpu/drm/i915/i915_gem.c:4245:1: note: previous definition of ‘i915_gem_unpin_ioctl’ was here
 i915_gem_unpin_ioctl(struct drm_device *dev, void *data,
 ^
scripts/Makefile.build:257: recipe for target 'drivers/gpu/drm/i915/i915_gem.o' failed
make[4]: *** [drivers/gpu/drm/i915/i915_gem.o] Error 1
scripts/Makefile.build:402: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:402: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:402: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:937: recipe for target 'drivers' failed
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....

(In reply to Heinz from comment #20)
> Here's the compiler output where it fails on an otherwise bog standard
> vanilla 3.18.1, which clearly indicates that it's not some weird .config
> options which are the cause.
> GCC is: gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) (GCC)
>
> drivers/gpu/drm/i915/i915_gem.c:4282:1: error: redefinition of
> ‘i915_gem_pin_ioctl’
> [...]

Chakra's gcc is 4.9.1.

Here's our build process:
https://gitorious.org/chakra-packages/core/source/399186728de12a5117ca508de724e9f4da1df0c6:linux/PKGBUILD

The gist of it is:
1) make mrproper
2) Apply patches (3.18.1 patch first)
3) make prepare
4) Load configuration
5) make ${MAKEFLAGS} LOCALVERSION= bzImage modules
6) Package everything

(In reply to Heinz from comment #20)
> Here's the compiler output where it fails on an otherwise bog standard
> vanilla 3.18.1, which clearly indicates that it's not some weird .config
> options which are the cause.
> GCC is: gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) (GCC)
>
> drivers/gpu/drm/i915/i915_gem.c:4282:1: error: redefinition of
> ‘i915_gem_pin_ioctl’
> [...]

Can you build the drm-intel-nightly kernel
http://cgit.freedesktop.org/drm-intel?h=drm-intel-nightly
to see if the same problem exists there?

Current drm-intel git clone from today builds flawlessly and boots fine into X.

(In reply to Heinz from comment #23)
> Current drm-intel git clone from today builds flawlessly and boots fine into
> X.

Thanks for letting me know. I warned Chakra devs against using this patch as there could be some unforeseen consequences, it seems.

(In reply to Heinz from comment #23)
> Current drm-intel git clone from today builds flawlessly and boots fine into
> X.

Hmm, one of our devs said that if you get redefinition errors (as in your previous post) it usually means that the patch was not applied correctly.

For sanity maybe try this patch directly from cgit's page:
http://cgit.freedesktop.org/drm-intel/patch/?id=83f45fc360c8e16a330474860ebda872d1384c8c

(In reply to Ugis Germanis from comment #25)
> (In reply to Heinz from comment #23)
> > Current drm-intel git clone from today builds flawlessly and boots fine into
> > X.
>
> Hmm, One of our devs told that if you get redefinition errors (as in your
> previous post) it usually means that the patch is not correctly applied
>
> For sanity maybe try this patch directly from cgit's page
> http://cgit.freedesktop.org/drm-intel/patch/
> ?id=83f45fc360c8e16a330474860ebda872d1384c8c

I'm sorry, I posted the wrong patch (I need sleep). This is the correct patch:
http://cgit.freedesktop.org/drm-intel/patch/?id=d472fcc8379c062bd56a3876fc6ef22258f14a91

Thanks! Applies cleanly and boots fine.

Presumed fixed by commits

commit 05acaec334fcc1132d1e48c5042e044651e0b75b
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Dec 17 13:56:22 2014 +0200

    drm: Reorganize probed mode validation

commit abc0b1447d4974963548777a5ba4a4457c82c426
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Dec 17 13:56:23 2014 +0200

    drm: Perform basic sanity checks on probed modes

commit 23e1ce89af5404c7a35dbd008ca85fb6adb16aad
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Dec 17 13:56:24 2014 +0200

    drm: Do basic sanity checks for user modes

in drm-next, queued for 3.20. Also included in drm-intel-nightly. Please reopen if the problem persists.

It works well. Verified.

Closing old verified.