Bug 48894

Summary: plymouthd crashed with SIGABRT in __assert_fail_base()
Product: DRI
Component: libdrm
Reporter: Bryce Harrington <bryce>
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: major
Priority: high
CC: apw, steve.langasek
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description             Flags
BootDmesg.txt           none
CurrentDmesg.txt        none
Lspci.txt               none
ProcModules.txt         none
ProcModules.txt         none
ThreadStacktrace.txt    none

Description Bryce Harrington 2012-04-18 14:22:54 UTC
Forwarding this bug from Ubuntu reporter Florin:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/966868

[Problem]
On Ubuntu there appears to be a race condition in libdrm during boot.  It appears the i915 drm device exists but isn't fully initialized at the time plymouth wants to use it.

Note I'm filing this against -intel just because it's the intel portion of libdrm where the code is passing through; I think this is really a libdrm bug.

[Original Description]
After a force restart of Ubuntu, I've got a System Crash error after logging in.

lsb_release -rd
Description:	Ubuntu precise (development branch)
Release:	12.04

This looks more like a libdrm bug. There's a race condition with the i915 device not being ready by the time plymouth is starting. Possibly it's because it doesn't have drm master.

<Sarvatt> apparently chromeos works around it with http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel.git;a=commit;h=32a8c5b67163a6ae211ff2683c999b6ad2c76d1f but that's just working around the problem.

Googling intel/intel_bufmgr_gem.c:2783 turns up a lot of hits.

The code in question with the assert is:

        if (IS_GEN2(bufmgr_gem->pci_device))
                bufmgr_gem->gen = 2;
        else if (IS_GEN3(bufmgr_gem->pci_device))
                bufmgr_gem->gen = 3;
        else if (IS_GEN4(bufmgr_gem->pci_device))
                bufmgr_gem->gen = 4;
        else if (IS_GEN5(bufmgr_gem->pci_device))
                bufmgr_gem->gen = 5;
        else if (IS_GEN6(bufmgr_gem->pci_device))
                bufmgr_gem->gen = 6;
        else if (IS_GEN7(bufmgr_gem->pci_device))
                bufmgr_gem->gen = 7;
        else
                assert(0);

$ xpci 8086:0126
snb-m-gt2+ (8086:0126) sandybridge

So it should be going into the IS_GEN6 branch.
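
To illustrate why the assert can still fire for a supported chip, here is a minimal sketch. It assumes (this is not confirmed anywhere in the report) that the device ID seen by the chain above is something other than 0x0126 when the device is opened too early, for example 0 after a failed query. SKETCH_IS_GEN6 and sketch_detect_gen are hypothetical names for illustration only; they are not libdrm's macros or functions, and the sketch matches only the single ID from this report. It returns 0 for an unrecognized ID rather than aborting, which is one possible way a caller could fail gracefully.

```c
/* Hypothetical stand-in for libdrm's IS_GEN6(); it only knows the one
 * device ID mentioned in this report (8086:0126). */
#define SKETCH_IS_GEN6(devid) ((devid) == 0x0126)

/* Hypothetical helper: returns the hardware generation for a PCI device
 * ID, or 0 when the ID is not recognized. The code quoted above calls
 * assert(0) in that case, which is what aborts plymouthd. */
static int sketch_detect_gen(int pci_device)
{
        if (SKETCH_IS_GEN6(pci_device))
                return 6;
        return 0; /* unrecognized: let the caller decide how to fail */
}
```

With this shape, an ID of 0x0126 yields 6, while any other value (including an uninitialized or zero ID) yields 0 instead of SIGABRT.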


Thanks!

ProblemType: Crash
DistroRelease: Ubuntu 12.04
Package: plymouth 0.8.2-2ubuntu28
ProcVersionSignature: Ubuntu 3.2.0-20.33-generic 3.2.12
Uname: Linux 3.2.0-20-generic x86_64
ApportVersion: 1.95-0ubuntu1
Architecture: amd64
Date: Wed Mar 28 09:33:16 2012
DefaultPlymouth: /lib/plymouth/themes/ubuntu-logo/ubuntu-logo.plymouth
ExecutablePath: /sbin/plymouthd
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Alpha amd64 (20120322)
MachineType: LENOVO 4284BZ4
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-20-generic root=UUID=f1bb4518-a890-49c0-9339-ecc3d8bd2658 ro quiet splash vt.handoff=7
ProcCmdline: /sbin/plymouthd --mode=boot --attach-to-session
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-20-generic root=UUID=f1bb4518-a890-49c0-9339-ecc3d8bd2658 ro quiet splash vt.handoff=7
Signal: 6
SourcePackage: plymouth
TextPlymouth: /lib/plymouth/themes/ubuntu-text/ubuntu-text.plymouth
Title: plymouthd crashed with SIGABRT in raise()
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
 
dmi.bios.date: 01/19/2012
dmi.bios.vendor: LENOVO
dmi.bios.version: 8BET56WW (1.36 )
dmi.board.asset.tag: Not Available
dmi.board.name: 4284BZ4
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr8BET56WW(1.36):bd01/19/2012:svnLENOVO:pn4284BZ4:pvrThinkPadW520:rvnLENOVO:rn4284BZ4:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4284BZ4
dmi.product.version: ThinkPad W520
dmi.sys.vendor: LENOVO
Comment 1 Bryce Harrington 2012-04-18 14:27:11 UTC
Created attachment 60277 [details]
BootDmesg.txt
Comment 2 Bryce Harrington 2012-04-18 14:27:27 UTC
Created attachment 60278 [details]
CurrentDmesg.txt
Comment 3 Bryce Harrington 2012-04-18 14:27:38 UTC
Created attachment 60279 [details]
Lspci.txt
Comment 4 Bryce Harrington 2012-04-18 14:27:49 UTC
Created attachment 60280 [details]
ProcModules.txt
Comment 5 Bryce Harrington 2012-04-18 14:28:01 UTC
Created attachment 60281 [details]
ProcModules.txt
Comment 6 Bryce Harrington 2012-04-18 14:28:24 UTC
Created attachment 60282 [details]
ThreadStacktrace.txt
Comment 7 Bryce Harrington 2012-04-18 14:58:55 UTC
This is another bug that we think has the same root cause:

  https://bugs.launchpad.net/ubuntu/+source/libdrm/+bug/982889

In this one, X comes up before the drm device is ready, and so trips on a different chunk of code.

You can see from comparing timestamps in Xorg.0.log and dmesg when drm is accessed vs. when it is reporting itself ready.


We've got a couple of ideas on how to fix this in the distro.  One is to put a loop around the code paths where the failures occur, to continue retrying for some number of seconds.  But that feels like a big hack.  The other idea is to have an event that indicates the driver is ready for use, which we could listen for, delaying plymouth, X, etc. until it's received.  But we don't know the feasibility of that.
Comment 8 Bryce Harrington 2012-04-18 15:01:51 UTC
We suspect that this happens because of an Ubuntu kernel patch, which was added to work around other boot crashing problems:

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=commitdiff;h=6d74feca6235b463ade4ecddd1dfdb73d30a2ff7;hp=e29a4668d7441aa88d8015da51674a7e8159312b

"When a drm driver is initialised we first allocate and initialise the
drm minor numbers including creating the sysfs files, then we trigger
the driver load method.  The act of creating the sysfs files triggers the
uevent.  This means udev may start programs which open /dev/dri/card0 and
other interfaces, this can occur before the load method has even started
and thus before the driver has fully initialised its data structures.
In the case of plymouthd this leads to it opening and closing (in disgust)
the interface, which in turn leads to a kernel panic as the mutexes are
yet to be initialised.

"This patch delays the linking up of the drm devices minor numbers until
the driver is fully initialised.  As it is possible for consumers of
these interfaces to reach them before they are fully initialised we
arrange for opens of these devices to return EAGAIN until the device is
fully initialised."
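
For reference, the retry idea from comment 7 combined with the EAGAIN behaviour described in the patch text above could look roughly like the following on the userspace side. This is only a sketch under assumptions: open_drm_with_retry is a hypothetical helper, not something in libdrm or plymouth, and the attempt count and delay are arbitrary values chosen by the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open a device node, retrying while the kernel
 * reports EAGAIN (device not fully initialised yet). Returns a file
 * descriptor on success, or -1 with errno set on failure. */
static int open_drm_with_retry(const char *path, int max_attempts,
                               long delay_ms)
{
        struct timespec delay;
        int attempt;

        delay.tv_sec = delay_ms / 1000;
        delay.tv_nsec = (delay_ms % 1000) * 1000000L;

        for (attempt = 0; attempt < max_attempts; attempt++) {
                int fd = open(path, O_RDWR);
                if (fd >= 0)
                        return fd;
                if (errno != EAGAIN)
                        return -1; /* a real error, do not retry */
                nanosleep(&delay, NULL); /* wait before the next attempt */
        }

        errno = EAGAIN; /* still not ready after all attempts */
        return -1;
}
```

A caller might use open_drm_with_retry("/dev/dri/card0", 50, 100) to wait up to about five seconds. As noted in comment 7, this is still a workaround; it does not address the missing locking around initialisation.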
Comment 9 Bryce Harrington 2012-04-18 15:25:25 UTC
<jbarnes> so for 48894 I'd open a separate bug against drm for the core issue: if you access the device too early you get a crash
<jbarnes> there's a similar bug with accessing the dpms status files in sysfs
<jbarnes> if the module is unloading at the time, you can panic the kernel
<jbarnes> also a kernel bug

I'll move this bug to drm, as I think the core issue is what we're really looking for advice on here.
Comment 10 Bryce Harrington 2012-04-18 15:34:25 UTC
<jbarnes> ok looks like a core drm kernel bug
<jbarnes> we don't lock properly around initialization
Comment 11 GitLab Migration User 2019-09-24 17:08:36 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/drm/issues/8.
