Summary: | i965_dri.so calls abort() if it doesn't recognize the PCI ID. | | |
---|---|---|---|
Product: | Mesa | Reporter: | lu hua <huax.lu> |
Component: | Drivers/DRI/i965 | Assignee: | Kenneth Graunke <kenneth> |
Status: | VERIFIED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | critical | | |
Priority: | high | CC: | ben, chris, mengmeng.meng |
Version: | git | | |
Hardware: | All | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Attachments: | Xorg.0.log | | |
Description
lu hua
2013-12-25 05:26:10 UTC
It only happens on Broadwell. It happens on master branch and 1.14 branch.

Please let me know if the problem still exists with latest X server, DDX, and BDW mesa.

(In reply to comment #2)
> Please let me know if the problem still exists with latest X server, DDX,
> and BDW mesa.

It still exists.

Just to confirm a theory, can you please build xserver with ./configure --disable-dri --disable-dri2 --disable-dri3 --disable-aiglx?

(In reply to comment #4)
> Just to confirm a theory can you please build xserver with ./configure
> --disable-dri --disable-dri2 --disable-dri3 --disable-aiglx?

With these parameters added, it works well.

Ok, we are getting somewhere!

Can you please now try each of the parameters independently:

--disable-dri
--disable-dri2
--disable-dri3
--disable-aiglx

and see which extension is triggering the assertion?

(In reply to comment #6)
> Ok, we are getting somewhere!
>
> Can you please now try each of the parameters independently:
>
> --disable-dri

bad

> --disable-dri2

good

> --disable-dri3

bad

> --disable-aiglx

good

So that conclusively points toward AIGLX triggering the broken code. (I say triggering as I suspect something it dlopens throws the error).

Can you please launch X under gdb, set a breakpoint on _exit and grab a backtrace? (something like gdb Xorg -ac -noreset ; b _exit ; bt;)

Actually, dri2 not aiglx.

Reading through the changes between the two endpoints (I should have asked for a bisect I gather), I only see a couple of relevant patches:

commit 6e926b18ca1b182253bac435a1d53caaff7ffff6
Author: Eric Anholt <eric@anholt.net>
Date: Thu Nov 14 17:40:46 2013 -0800

    glx: Fix incorrect use of dri_interface.h version defines in extensions.

    Those defines are so you can compile-time check "do I have a
    dri_interface.h that defines this new field of the struct?" You don't
    want the server to claim it implements the new struct just because you
    installed a new copy of Mesa.

    Signed-off-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit ac772cb187ddf7e04b8f4b3a071b90f18f4488d0
Author: Eric Anholt <eric@anholt.net>
Date: Thu Nov 14 17:40:47 2013 -0800

    glx: Fix incorrect use of dri_interface.h version defines in driver probing.

    If we extend __DRI_CORE or __DRI_SWRAST in dri_interface.h to allow a
    new version, it shouldn't make old server code retroactively require
    the new version from swrast drivers.

    Notably, new Mesa defines __DRI_SWRAST version 4, but we still want to
    be able to probe version 1 drivers, since we don't use any features
    beyond version 1 of the struct.

    Signed-off-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

You can either run a bisect, or just try reverting those two. However, those still look like collateral damage.

Note: QA team is on holiday and will be back Feb 7.

(In reply to comment #9)
> Actually, dri2 not aiglx.
>
> Reading through the changes between the two endpoints (I should have asked
> for a bisect I gather), I only see a couple of relevant patches:
>
> commit 6e926b18ca1b182253bac435a1d53caaff7ffff6
> Author: Eric Anholt <eric@anholt.net>
> Date: Thu Nov 14 17:40:46 2013 -0800
>
> glx: Fix incorrect use of dri_interface.h version defines in extensions.
>
> Those defines are so you can compile-time check "do I have a
> dri_interface.h that defines this new field of the struct?" You don't
> want the server to claim it implements the new struct just because you
> installed a new copy of Mesa.
>
> Signed-off-by: Keith Packard <keithp@keithp.com>
> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
>
> commit ac772cb187ddf7e04b8f4b3a071b90f18f4488d0
> Author: Eric Anholt <eric@anholt.net>
> Date: Thu Nov 14 17:40:47 2013 -0800
>
> glx: Fix incorrect use of dri_interface.h version defines in driver
> probing.
>
> If we extend __DRI_CORE or __DRI_SWRAST in dri_interface.h to allow a
> new version, it shouldn't make old server code retroactively require
> the new version from swrast drivers.
>
> Notably, new Mesa defines __DRI_SWRAST version 4, but we still want to
> be able to probe version 1 drivers, since we don't use any features
> beyond version 1 of the struct.
>
> Signed-off-by: Keith Packard <keithp@keithp.com>
> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
>
>
> You can either run a bisect, or just try reverting those two. However, those
> still look like colateral damage.

With the above two commits reverted, this issue still exists. I tried to bisect it:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
ebcc1c214c466582d7b92826b4860256fd9c582a revert this commit, still fails
81c123ea2dd833864f7ba217791e59acca0f7c97 revert this commit, still fails
f70a8bf3714d89bccaad36841ef9149e91ad3bba revert this commit, still fails
a239e6faf3fce848ac0d10c48f8e817db68a493c revert fail (it is a merge commit)

a239e6faf3fce848ac0d10c48f8e817db68a493c
Merge: 43e5a43 f70a8bf
Author: Keith Packard <keithp@keithp.com>
Date: Mon Nov 11 15:26:12 2013 -0800

    Merge remote-tracking branch 'jeremyhu/master'

It passes on 43e5a43 and fails on f70a8bf.

(In reply to comment #8)
> So that conclusively points toward AIGLX triggering the broken code. (I say
> triggering as I suspect something it dlopens throws the error).
>
> Can you please launch X under gdb, set a breakpoint on _exit and grab a
> backtrace? (something like gdb Xorg -ac -noreset ; b _exit ; bt;)

(gdb) bt
#0 0x000000372f035819 in raise () from /usr/lib64/libc.so.6
#1 0x000000372f036f28 in abort () from /usr/lib64/libc.so.6
#2 0x00007f8a380096bd in brw_get_device_info (devid=<optimized out>) at brw_device_info.c:233
#3 0x00007f8a37fe1adc in intelInitScreen2 (psp=0x10fe890) at intel_screen.c:1330
#4 0x00007f8a37fa06a0 in driCreateNewScreen2 (scrn=0, fd=14, extensions=<optimized out>, driver_extensions=<optimized out>, driver_configs=0x10f8590, data=0x10f84d0) at dri_util.c:159
#5 0x00007f8a39a365ad in __glXDRIscreenProbe (pScreen=0x10dc2e0) at glxdri2.c:910
#6 0x00007f8a39a2eecb in GlxExtensionInit () at glxext.c:362
#7 0x00000000004c2b59 in InitExtensions (argc=argc@entry=3, argv=argv@entry=0x7fff84aea9f8) at ../../../mi/miinitext.c:338
#8 0x000000000043ce55 in dix_main (argc=3, argv=<optimized out>, envp=<optimized out>) at main.c:204
#9 0x000000372f021b75 in __libc_start_main () from /usr/lib64/libc.so.6
#10 0x0000000000427761 in _start ()

So you managed to link the xserver against the wrong mesa lib, but mesa should not be using abort(!) there either.

commit 6e9f427ed8a20d78e7d832b163d757827dd3e74f
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Thu Jul 4 12:11:36 2013 -0700

    i965: Add a new brw_device_info structure.

    The idea is that struct brw_device_info should store statically-known
    information about hardware features. Using the new family name in the
    PCI ID table, we can easily grab the right structure.

    This is basically the equivalent of intel_device_info in the kernel.

    This patch also makes the new structure available from intel_screen,
    but nothing uses it. Right now, it looks very redundant with existing
    fields, but that will change.

    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

You're correct, of course...there's a proper way to fail and I didn't use it. Sorry for the trouble, Chris.

Patch on list: http://lists.freedesktop.org/archives/mesa-dev/2014-February/053693.html

(In reply to comment #14)
> You're correct, of course...there's a proper way to fail and I didn't use
> it. Sorry for the trouble, Chris.
>
> Patch on list:
> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053693.html

Fixed by this patch.

commit eaf3358e0a1323ed417b6875e70fdcdc30ed97e0
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Mon Feb 10 01:54:23 2014 -0800

    i965: Don't call abort() on an unknown device.

    If we don't recognize the PCI ID, we can't reasonably load the driver.
    However, calling abort() is quite rude - it means the application that
    tried to initialize us (possibly the X server) can't continue via
    fallback paths.

    We already have a more polite mechanism - failing to create the
    context. So, just use that.

    While we're at it, improve the error message.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73024
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    Tested-by: Lu Hua <huax.lu@intel.com>

Verified. Fixed.
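For readers who want the shape of the fix without opening the patch, here is a minimal sketch of the pattern the final commit message describes: the device lookup returns NULL for an unrecognized PCI ID, and the caller reports failure instead of calling abort(). This is not the actual Mesa code. The struct, the table contents, the PCI IDs, and the names lookup_device_info and init_screen are all hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for a per-device info structure. */
struct device_info {
    int gen;           /* hardware generation */
    const char *name;  /* human-readable label */
};

struct pci_entry {
    unsigned devid;
    struct device_info info;
};

/* Made-up PCI IDs; a real driver would use its own PCI ID table. */
static const struct pci_entry known_devices[] = {
    { 0x1001, { 7, "example gen7 part" } },
    { 0x1002, { 8, "example gen8 part" } },
};

/* Return the info for a known device, or NULL if the ID is unrecognized.
 * The important part: no abort(), so the caller can fall back. */
const struct device_info *lookup_device_info(unsigned devid)
{
    size_t count = sizeof(known_devices) / sizeof(known_devices[0]);

    for (size_t i = 0; i < count; i++) {
        if (known_devices[i].devid == devid)
            return &known_devices[i].info;
    }

    fprintf(stderr, "driver does not support PCI ID 0x%04x\n", devid);
    return NULL;
}

/* Hypothetical init step: returns 1 on success, 0 on failure.
 * On failure *out is left untouched and the process keeps running. */
int init_screen(unsigned devid, const struct device_info **out)
{
    const struct device_info *info = lookup_device_info(devid);

    if (info == NULL)
        return 0;  /* polite failure: the loader may try another driver */

    *out = info;
    return 1;
}
```

With this shape, a process that loads the driver (the X server in this bug) sees an ordinary initialization failure for an unknown device and can continue via its fallback paths rather than being killed.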