Bug 73024

Summary: i965_dri.so calls abort() if it doesn't recognize the PCI ID.
Product: Mesa
Component: Drivers/DRI/i965
Status: VERIFIED FIXED
Severity: critical
Priority: high
Version: git
Hardware: All
OS: Linux (All)
Reporter: lu hua <huax.lu>
Assignee: Kenneth Graunke <kenneth>
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
CC: ben, chris, mengmeng.meng
Whiteboard:
i915 platform: i915 features:
Attachments: Xorg.0.log

Description lu hua 2013-12-25 05:26:10 UTC
Created attachment 91181
Xorg.0.log

System Environment:
--------------------------
Platform: Broadwell
Libdrm:		(master)libdrm-2.4.50-3-g068ea68b3f7ebd5efcfcc2f6ae417651423c8382
Mesa:		(master)35a34143026785e015adb906756651807de89bde
Xserver:	(master)xorg-server-1.14.99.905
Xf86_video_intel:(master)2.99.906-98-g9289e2c56b7f0cc78c5123691ad96611f0e04bed
Cairo:		(master)040a9f678bfb0f0b89a0273b729c4e9f2bc23e4f
Libva:		(staging)d349f2bb779c596290a493f3c1344f912565e568
Libva_intel_driver:(staging)5b211d3e2f2e4eab95b8697d9109fb7ea29fbfb3
Kernel:	(drm-intel-nightly) 164a4cb4c1431a0689f85507868356fae24da638

Bug detailed description:
-------------------------
xinit fails with "Unknown Intel device." followed by "xinit: giving up".

Running on commit 264fc3abe5f18341d0cf9ddb6766e10e4154e447 works well.

The latest known good commit: 264fc3abe5f18341d0cf9ddb6766e10e4154e447
The latest known bad commit: a68df147421da21528b5be2d34678383922fa352

output:
_XSERVTransSocketOpenCOTSServer: Unable to open socket for inet6
_XSERVTransOpen: transport open failed for inet6/x-bdw01:0
_XSERVTransMakeAllCOTSServerListeners: failed to open listener for inet6

This is a pre-release version of the X server from The X.Org Foundation.
It is not supported in any way.
Bugs may be filed in the bugzilla at http://bugs.freedesktop.org/.
Select the "xorg" product for bugs you find in this release.
Before reporting bugs in pre-release versions please check the
latest version in the X.Org Foundation git repository.
See http://wiki.x.org/wiki/GitPage for git access instructions.

X.Org X Server 1.14.99.905 (1.15.0 RC 5)
Release Date: 2013-12-19
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.3.4-3.fc16.x86_64 x86_64
Current Operating System: Linux x-bdw01 3.13.0-rc4_drm-intel-nightly_5f7867_20131225+ #2 SMP Wed Dec 25 09:41:45 CST 2013 x86_64
Kernel command line: BOOT_IMAGE=kernels//nightly_parents/2013_12_25/drm-intel-nightly/5f7867f14f8b3688584f4bf589fe4b8aa8c718b3/bzImage_x86_64 root=/dev/sda4 acpi_rsdp=0x00000000ab8e1014 modules_path=kernels//nightly_parents/2013_12_25/drm-intel-nightly/5f7867f14f8b3688584f4bf589fe4b8aa8c718b3/modules_x86_64/lib/modules/3.13.0-rc4_drm-intel-nightly_5f7867_20131225+
Build Date: 23 December 2013  03:36:11PM

Current version of pixman: 0.33.1
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/opt/X11R7/var/log/Xorg.0.log", Time: Thu Dec 12 02:16:04 2013
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using config directory: "/etc/X11/xorg.conf.d"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
Initializing built-in extension Generic Event Extension
Initializing built-in extension SHAPE
Initializing built-in extension MIT-SHM
Initializing built-in extension XInputExtension
Initializing built-in extension XTEST
Initializing built-in extension BIG-REQUESTS
Initializing built-in extension SYNC
Initializing built-in extension XKEYBOARD
Initializing built-in extension XC-MISC
Initializing built-in extension XINERAMA
Initializing built-in extension XFIXES
Initializing built-in extension RENDER
Initializing built-in extension RANDR
Initializing built-in extension COMPOSITE
Initializing built-in extension DAMAGE
Initializing built-in extension MIT-SCREEN-SAVER
Initializing built-in extension DOUBLE-BUFFER
Initializing built-in extension RECORD
Initializing built-in extension DPMS
Initializing built-in extension Present
Initializing built-in extension DRI3
Initializing built-in extension X-Resource
Initializing built-in extension XVideo
Initializing built-in extension XVideo-MotionCompensation
Initializing built-in extension XFree86-VidModeExtension
Initializing built-in extension XFree86-DGA
Initializing built-in extension XFree86-DRI
Initializing built-in extension DRI2
Loading extension GLX
Unknown Intel device.xinit: giving up
xinit: unable to connect to X server: Bad file descriptor
xinit: server error
Comment 1 lu hua 2013-12-26 02:44:17 UTC
It only happens on Broadwell.
It happens on master branch and 1.14 branch.
Comment 2 Ben Widawsky 2014-01-10 17:39:20 UTC
Please let me know if the problem still exists with latest X server, DDX, and BDW mesa.
Comment 3 lu hua 2014-01-16 01:07:20 UTC
(In reply to comment #2)
> Please let me know if the problem still exists with latest X server, DDX,
> and BDW mesa.

It still exists.
Comment 4 Chris Wilson 2014-01-16 09:30:39 UTC
Just to confirm a theory can you please build xserver with ./configure --disable-dri --disable-dri2 --disable-dri3 --disable-aiglx?
Comment 5 lu hua 2014-01-17 02:04:59 UTC
(In reply to comment #4)
> Just to confirm a theory can you please build xserver with ./configure
> --disable-dri --disable-dri2 --disable-dri3 --disable-aiglx?

With these parameters added, it works well.
Comment 6 Chris Wilson 2014-01-28 21:08:29 UTC
Ok, we are getting somewhere!

Can you please now try each of the parameters independently:

--disable-dri
--disable-dri2
--disable-dri3
--disable-aiglx

and see which extension is triggering the assertion?
Comment 7 lu hua 2014-01-29 02:21:32 UTC
(In reply to comment #6)
> Ok, we are getting somewhere!
> 
> Can you please now try each of the parameters independently:
> 
> --disable-dri
bad

> --disable-dri2
good

> --disable-dri3
bad

> --disable-aiglx
good
Comment 8 Chris Wilson 2014-01-29 12:37:00 UTC
So that conclusively points toward AIGLX triggering the broken code. (I say triggering as I suspect something it dlopens throws the error).

Can you please launch X under gdb, set a breakpoint on _exit and grab a backtrace? (something like gdb Xorg -ac -noreset ; b _exit ; bt;)
Comment 9 Chris Wilson 2014-01-29 16:33:01 UTC
Actually, dri2 not aiglx.

Reading through the changes between the two endpoints (I should have asked for a bisect I gather), I only see a couple of relevant patches:

commit 6e926b18ca1b182253bac435a1d53caaff7ffff6
Author: Eric Anholt <eric@anholt.net>
Date:   Thu Nov 14 17:40:46 2013 -0800

    glx: Fix incorrect use of dri_interface.h version defines in extensions.
    
    Those defines are so you can compile-time check "do I have a
    dri_interface.h that defines this new field of the struct?"  You don't
    want the server to claim it implements the new struct just because you
    installed a new copy of Mesa.
    
    Signed-off-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit ac772cb187ddf7e04b8f4b3a071b90f18f4488d0
Author: Eric Anholt <eric@anholt.net>
Date:   Thu Nov 14 17:40:47 2013 -0800

    glx: Fix incorrect use of dri_interface.h version defines in driver probing.
    
    If we extend __DRI_CORE or __DRI_SWRAST in dri_interface.h to allow a
    new version, it shouldn't make old server code retroactively require
    the new version from swrast drivers.
    
    Notably, new Mesa defines __DRI_SWRAST version 4, but we still want to
    be able to probe version 1 drivers, since we don't use any features
    beyond version 1 of the struct.
    
    Signed-off-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>


You can either run a bisect, or just try reverting those two. However, those still look like collateral damage.
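
For reference, the idea in the second commit message can be sketched as follows. This is only an illustration under stated assumptions, not the xserver code: `demo_extension`, `demo_extension_usable`, and the `DEMO_EXT_*` macros are hypothetical stand-ins and do not come from dri_interface.h. The point is that the probe compares the driver's version against the version the server actually uses, not against the newest version the installed header happens to define.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real dri_interface.h definitions. */
#define DEMO_EXT_NAME            "DEMO_SWRAST"
#define DEMO_EXT_HEADER_VERSION  4   /* newest version the header knows about */
#define DEMO_EXT_NEEDED_VERSION  1   /* newest version this code actually uses */

struct demo_extension {
    const char *name;
    int version;
};

/* Accept a driver extension if it provides at least what we use,
 * regardless of how new the installed header is. */
static int demo_extension_usable(const struct demo_extension *ext)
{
    return ext != NULL &&
           strcmp(ext->name, DEMO_EXT_NAME) == 0 &&
           ext->version >= DEMO_EXT_NEEDED_VERSION;
}
```

With this check a version 1 driver is still accepted even though the header advertises version 4; comparing against `DEMO_EXT_HEADER_VERSION` instead would reject it.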
Comment 10 Gordon Jin 2014-01-31 07:40:28 UTC
Note: QA team is on holiday and will be back Feb 7.
Comment 11 lu hua 2014-02-07 05:36:10 UTC
(In reply to comment #9)
> Actually, dri2 not aiglx.
> 
> Reading through the changes between the two endpoints (I should have asked
> for a bisect I gather), I only see a couple of relevant patches:
> 
> commit 6e926b18ca1b182253bac435a1d53caaff7ffff6
> Author: Eric Anholt <eric@anholt.net>
> Date:   Thu Nov 14 17:40:46 2013 -0800
> 
>     glx: Fix incorrect use of dri_interface.h version defines in extensions.
>     
>     Those defines are so you can compile-time check "do I have a
>     dri_interface.h that defines this new field of the struct?"  You don't
>     want the server to claim it implements the new struct just because you
>     installed a new copy of Mesa.
>     
>     Signed-off-by: Keith Packard <keithp@keithp.com>
>     Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
> 
> commit ac772cb187ddf7e04b8f4b3a071b90f18f4488d0
> Author: Eric Anholt <eric@anholt.net>
> Date:   Thu Nov 14 17:40:47 2013 -0800
> 
>     glx: Fix incorrect use of dri_interface.h version defines in driver
> probing.
>     
>     If we extend __DRI_CORE or __DRI_SWRAST in dri_interface.h to allow a
>     new version, it shouldn't make old server code retroactively require
>     the new version from swrast drivers.
>     
>     Notably, new Mesa defines __DRI_SWRAST version 4, but we still want to
>     be able to probe version 1 drivers, since we don't use any features
>     beyond version 1 of the struct.
>     
>     Signed-off-by: Keith Packard <keithp@keithp.com>
>     Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
> 
> 
> You can either run a bisect, or just try reverting those two. However, those
> still look like collateral damage.

With the above 2 commits reverted, this issue still exists.
I then tried to bisect it:
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
ebcc1c214c466582d7b92826b4860256fd9c582a  (reverted this commit, still fails)
81c123ea2dd833864f7ba217791e59acca0f7c97  (reverted this commit, still fails)
f70a8bf3714d89bccaad36841ef9149e91ad3bba  (reverted this commit, still fails)
a239e6faf3fce848ac0d10c48f8e817db68a493c  (revert fails; it is a merge commit)


a239e6faf3fce848ac0d10c48f8e817db68a493c
Merge: 43e5a43 f70a8bf
Author: Keith Packard <keithp@keithp.com>
Date:   Mon Nov 11 15:26:12 2013 -0800

    Merge remote-tracking branch 'jeremyhu/master'

It passes on 43e5a43 and fails on f70a8bf
Comment 12 lu hua 2014-02-10 06:04:52 UTC
(In reply to comment #8)
> So that conclusively points toward AIGLX triggering the broken code. (I say
> triggering as I suspect something it dlopens throws the error).
> 
> Can you please launch X under gdb, set a breakpoint on _exit and grab a
> backtrace? (something like gdb Xorg -ac -noreset ; b _exit ; bt;)

(gdb) bt
#0  0x000000372f035819 in raise () from /usr/lib64/libc.so.6
#1  0x000000372f036f28 in abort () from /usr/lib64/libc.so.6
#2  0x00007f8a380096bd in brw_get_device_info (devid=<optimized out>)
    at brw_device_info.c:233
#3  0x00007f8a37fe1adc in intelInitScreen2 (psp=0x10fe890)
    at intel_screen.c:1330
#4  0x00007f8a37fa06a0 in driCreateNewScreen2 (scrn=0, fd=14,
    extensions=<optimized out>, driver_extensions=<optimized out>,
    driver_configs=0x10f8590, data=0x10f84d0) at dri_util.c:159
#5  0x00007f8a39a365ad in __glXDRIscreenProbe (pScreen=0x10dc2e0)
    at glxdri2.c:910
#6  0x00007f8a39a2eecb in GlxExtensionInit () at glxext.c:362
#7  0x00000000004c2b59 in InitExtensions (argc=argc@entry=3,
    argv=argv@entry=0x7fff84aea9f8) at ../../../mi/miinitext.c:338
#8  0x000000000043ce55 in dix_main (argc=3, argv=<optimized out>,
    envp=<optimized out>) at main.c:204
#9  0x000000372f021b75 in __libc_start_main () from /usr/lib64/libc.so.6
#10 0x0000000000427761 in _start ()
Comment 13 Chris Wilson 2014-02-10 08:32:21 UTC
So you managed to link the xserver against the wrong mesa lib, but mesa should not be using abort(!) there either.

commit 6e9f427ed8a20d78e7d832b163d757827dd3e74f
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu Jul 4 12:11:36 2013 -0700

    i965: Add a new brw_device_info structure.
    
    The idea is that struct brw_device_info should store statically-known
    information about hardware features.  Using the new family name in the
    PCI ID table, we can easily grab the right structure.
    
    This is basically the equivalent of intel_device_info in the kernel.
    
    This patch also makes the new structure available from intel_screen, but
    nothing uses it.  Right now, it looks very redundant with existing
    fields, but that will change.
    
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
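
The lookup the commit message describes (a family name in the PCI ID table selecting a static info structure) could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of brw_device_info.c: the struct fields, the family names `alpha` and `beta`, the ID values, and every `demo_`/`DEMO_` identifier are made up.

```c
#include <stddef.h>

/* Hypothetical per-family record of statically known hardware facts. */
struct demo_device_info {
    int gen;      /* hardware generation */
    int has_llc;  /* illustrative feature flag */
};

static const struct demo_device_info demo_info_alpha = { 7, 1 };
static const struct demo_device_info demo_info_beta  = { 8, 1 };

/* Made-up ID table: each entry carries its family name (X-macro style). */
#define DEMO_PCI_IDS(X) \
    X(0x1001, alpha)    \
    X(0x1002, alpha)    \
    X(0x2001, beta)

/* Expand each table entry into a case that selects demo_info_<family>. */
static const struct demo_device_info *demo_lookup(unsigned int devid)
{
    switch (devid) {
#define DEMO_CASE(id, family) case id: return &demo_info_##family;
    DEMO_PCI_IDS(DEMO_CASE)
#undef DEMO_CASE
    default:
        return NULL; /* unknown ID: report failure to the caller */
    }
}
```

In this sketch the default case returns NULL; the backtrace in comment 12 shows the real function reaching abort() instead.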
Comment 14 Kenneth Graunke 2014-02-10 09:51:44 UTC
You're correct, of course...there's a proper way to fail and I didn't use it.  Sorry for the trouble, Chris.  Patch on list:
http://lists.freedesktop.org/archives/mesa-dev/2014-February/053693.html
Comment 15 lu hua 2014-02-11 01:07:26 UTC
(In reply to comment #14)
> You're correct, of course...there's a proper way to fail and I didn't use
> it.  Sorry for the trouble, Chris.  Patch on list:
> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053693.html

Fixed by this patch.
Comment 16 Kenneth Graunke 2014-02-11 10:21:02 UTC
commit eaf3358e0a1323ed417b6875e70fdcdc30ed97e0
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Mon Feb 10 01:54:23 2014 -0800

    i965: Don't call abort() on an unknown device.
    
    If we don't recognize the PCI ID, we can't reasonably load the driver.
    However, calling abort() is quite rude - it means the application that
    tried to initialize us (possibly the X server) can't continue via
    fallback paths.  We already have a more polite mechanism - failing to
    create the context.  So, just use that.
    
    While we're at it, improve the error message.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73024
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    Tested-by: Lu Hua <huax.lu@intel.com>
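
As a rough illustration of why failing politely matters (a sketch only, not the committed patch; `demo_hw_init_screen`, `demo_pick_driver`, the enum, and the ID value are all hypothetical): when the hardware driver reports failure instead of calling abort(), the code that tried to load it is still running and can take a fallback path.

```c
#include <stdio.h>

enum demo_driver { DEMO_DRIVER_HW, DEMO_DRIVER_SW };

/* Hypothetical driver entry point: returns 1 on success, 0 on failure. */
static int demo_hw_init_screen(unsigned int devid)
{
    if (devid != 0x1001u) { /* made-up "known" PCI ID */
        fprintf(stderr, "demo: unknown device ID 0x%04x, not loading\n", devid);
        return 0;           /* polite failure: no abort() */
    }
    return 1;
}

/* Hypothetical loader: tries the hardware driver, then falls back. */
static enum demo_driver demo_pick_driver(unsigned int devid)
{
    if (demo_hw_init_screen(devid))
        return DEMO_DRIVER_HW;
    return DEMO_DRIVER_SW;  /* still reachable because nothing aborted */
}
```

Had `demo_hw_init_screen` called abort() on the unknown ID, `demo_pick_driver` would never get the chance to return `DEMO_DRIVER_SW`; the whole process would die, which matches the X server failure reported here.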
Comment 17 lu hua 2014-02-13 08:22:36 UTC
Verified. Fixed.
