Bug 86043

Summary: Optimus issue with libdrm 2.4.58
Product: DRI
Reporter: Jordan Justen <jljusten>
Component: libdrm
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: chris, niekbergman, tomi, tvrtko.ursulin, uzytkownik2, vrodic
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description: valgrind steam run, error should near the end of the file
Flags: none

Description Jordan Justen 2014-11-08 20:43:31 UTC
Users with Optimus systems are reporting that
many games fail to run if libdrm-intel is upgraded
from 2.4.56 to 2.4.58. (And downgrading to 2.4.56
fixes the issue.)

Steam bug:
https://github.com/ValveSoftware/steam-for-linux/issues/3506

Debian bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768045

I don't have hardware to confirm this issue.

I notice that between 2.4.56 and 2.4.58 libdrm changed
some symbol visibility settings.
Comment 1 Tobias Klausmann 2014-11-10 00:38:22 UTC
Works fine on my Optimus machine with libdrm 2.4.58. Dota2 runs fine on both GPUs!
I am not using bumblebee or primus, so I guess the problem lies there!

Anyway, keeping this bug open for a while.
Comment 2 Gavin Graham 2014-11-12 20:34:44 UTC
(In reply to Tobias Klausmann from comment #1)
> Works fine on my Optimus machine with libdrm 2.4.58. Dota2 runs fine on both
> GPUs!
> I am not using bumblebee or primus, so i guess the problem lies there!
> 
> Anyway, keeping this bug open for a while.

Out of curiosity, which version of the nVidia driver are you using?
Comment 3 Tobias Klausmann 2014-11-15 16:59:37 UTC
I don't use the NVIDIA blob, but the nouveau driver. Anyhow, with libdrm 2.4.58 the Intel driver works on PRIME (Optimus) systems as long as bumblebee/primus does not interfere with it. Go report a bug against these apps.
Comment 4 Niek Bergman 2015-01-27 11:30:01 UTC
Can confirm this issue locally on my current system.

Running Steam through the "primusrun" Bumblebee wrapper to start games on the nVidia card leads to the following error after starting a game:

malloc: unknown:0: assertion botched
free: called with unallocated block argument
last command: (null)
Aborting...Aborted (core dumped)

I can also confirm that downgrading libdrm-intel1 to 2.4.56 fixes the issue and allows the game to run on the nVidia card just fine. Furthermore, starting Steam without "primusrun" (thereby causing the game to start on the Intel card) also does not cause this issue.

Should any testing be desired, I'd be happy to oblige.
Comment 5 Vedran Rodic 2015-02-28 11:24:31 UTC
I confirm this is still present in 2.4.59+git20150225.1f73578d-0ubuntu0ricotz~utopic.
Comment 6 Emil Velikov 2015-02-28 12:59:21 UTC
Guys, can you bisect this? Afaics it will take ~6 steps to track down the issue.
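
For readers unfamiliar with bisecting, here is a minimal sketch of where an estimate like "~6 steps" comes from and what the session might look like. Each bisect step roughly halves the candidate range, so about six steps corresponds to a range on the order of 64 commits. The helper names `bisect_steps` and `bisect_libdrm` are hypothetical, and the tag names `libdrm-2.4.56` and `libdrm-2.4.58` are assumptions that should be checked with `git tag` in the libdrm checkout.

```shell
# Hypothetical helper: count how many halvings are needed to narrow
# N candidate commits down to a single one.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))      # keep the larger half (worst case)
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Sketch of the bisect session itself. Defined here but not invoked;
# it must be run from inside a libdrm git checkout.
# ASSUMPTION: the release tags are named libdrm-2.4.56 / libdrm-2.4.58.
bisect_libdrm() {
    git bisect start
    git bisect bad libdrm-2.4.58
    git bisect good libdrm-2.4.56
    # For every revision git checks out: build, install, test the game,
    # then mark it with `git bisect good` or `git bisect bad` until git
    # prints the first bad commit. Finish with `git bisect reset`.
}
```

For example, `bisect_steps 64` prints `6`. The real number of steps git needs can differ slightly, since history is not always linear.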
Comment 7 Vedran Rodic 2015-03-01 20:29:46 UTC
This is what I came up with when bisecting

ae8edc7544e566084f7b958eb93c9109b471ca30 is the first bad commit
commit ae8edc7544e566084f7b958eb93c9109b471ca30
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Thu Jun 19 15:52:03 2014 +0100

    intel: Add support for userptr objects
    
    Allow userptr objects to be created and used via libdrm_intel.
    
    At the moment tiling and mapping to GTT aperture is not supported
    due hardware limitations across different generations and uncertainty
    about its usefulness.
    
    v2: Improved error handling in feature detection per review comments.
    
    v3: Rebase on top of the drm_public addition, minor whitespace addition.
    
    Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
    Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v1,v2)

:040000 040000 5b1f4eecbdd1cf2c57d7da9388384c17e8448bee b41b6afa071dc408dcaf7a4c6dcfecc2d7413c73 M	intel
Comment 8 Emil Velikov 2015-03-01 23:11:43 UTC
Adding the author to the CC list.

Vedran thanks for bisecting this :-)
I'm not sure how this patch can cause such an issue. Either something funny has happened during bisect or there is another underlying problem.


Summary of the problem so far:

Optimus system running Intel mesa and the proprietary Nvidia drivers. Running Steam (games) with either bumblebee, optirun or primusrun seems to cause a crash.

Steam Log:
Game update: AppID 550 "Left 4 Dead 2", ProcID 9399, IP 0.0.0.0:0
...
malloc: unknown:0: assertion botched
free: called with unallocated block argument
last command: (null)
Aborting...Game removed: AppID 550 "Left 4 Dead 2", ProcID 9399

Bisecting libdrm shows ae8edc7544e as the offending commit.


If an Intel dev has a similar machine they can try to reproduce it locally. Otherwise a back-trace should help at least a bit.

A while back I was able to use gdb with steam. Not sure if it still works:
* Close all steam games and the steam client
* $ export GAME_DEBUGGER=gdb
* $ steam

Note I'm not looking into the issue, just trying to get some information for the Intel devs.
Comment 9 Tvrtko Ursulin 2015-03-02 09:50:27 UTC
Thanks Emil! Unfortunately I am not familiar with Optimus nor do I have appropriate hardware.

Could someone provide a backtrace (with symbols) of the binary which ends up saying "malloc: unknown:0: assertion botched", etc? Or maybe even Valgrind it? Although that may be pretty advanced.
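
One possible way to collect such a backtrace without driving gdb interactively is sketched below. It uses gdb's batch mode to run a program and dump the stacks of all threads once it stops (for instance on the abort shown above). The wrapper name `backtrace_of` is hypothetical, and whether the Steam launcher can be started this way is an assumption; the backtrace is only useful if debug symbols for the crashing binary are installed.

```shell
# Hypothetical wrapper: run a command under gdb non-interactively and
# print a full backtrace of every thread when the program stops.
backtrace_of() {
    if [ "$#" -eq 0 ]; then
        echo "usage: backtrace_of PROGRAM [ARGS...]" >&2
        return 1
    fi
    gdb -batch \
        -ex run \
        -ex 'thread apply all bt full' \
        --args "$@"
}
```

Usage would look like `backtrace_of ./some-binary --its-args 2>&1 | tee backtrace.txt`, with the binary name being a placeholder. `thread apply all bt full` also helps when the process spawns many threads, because it prints every stack in one go.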

I don't think this commit is the actual culprit, but it is possible that the failing binary uses UserPtr if it detects it, and has a bug in those optional code paths.

Another interesting test would be to try the non-working libdrm on a kernel without UserPtr support.
Comment 10 Vedran Rodic 2015-03-02 19:39:57 UTC
I don't have a pre-3.16 kernel easily available.

With export GAME_DEBUGGER=gdb the game still crashes before it gets run under the debugger.

I've tried various things to make it work (modifying the game startup shell script) but without success. 

When I install the older version of libdrm (86b37c61c78edd1353a3f76f678c39e2ec168771, before Tvrtko's change), gdb runs the game normally. So it is the steam binary that actually crashes, not the game itself.

It's funny because steam binary is 32bit, Dota 2 binary is 64 bit, and only changing the 64bit version of libdrm_intel.so affects this bug. 

I've tried debugging steam with export DEBUGGER=gdb, but it seems to spawn multiple threads and I'm not experienced with multithreaded debugging.

I did manage to run steam with valgrind, I'll add an attachment now.
Comment 11 Vedran Rodic 2015-03-02 19:40:48 UTC
Created attachment 113924
valgrind steam run, error should be near the end of the file
Comment 12 Tvrtko Ursulin 2015-03-03 10:17:38 UTC
I think we'll need to pull in someone from Steam since Valgrind reports some mismatched frees in vgui2_s.so, steamclient.so and steamui.so. 

There are no debug symbols, so it is impossible to guess which area of the code that is.

Then this also looks pretty bad:

ERROR: ld.so: object '/home/vedran/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/vedran/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored

If someone knows any Steam devs, please CC them here.
Comment 13 Emil Velikov 2015-03-03 11:55:44 UTC
(In reply to Tvrtko Ursulin from comment #12)
> I think we'll need to pull in someone from Steam since Valgrind reports some
> mismatched frees in vgui2_s.so, steamclient.so and steamui.so. 
> 
> There are not debug symbols so impossible to guess what area of the code is
> that.
> 
I'm assuming Jordan can coordinate that with Valve. Afaics he is the one assigned for the issue over at github.

> Then this also looks pretty bad:
> 
> ERROR: ld.so: object
> '/home/vedran/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from
> LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
> ERROR: ld.so: object
> '/home/vedran/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from
> LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored
> 
Those should be harmless, as steam LD_PRELOADs a bunch of libraries. Afaik steam is 32bit, while Vedran was running a 64bit system - thus the messages. 
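
To see why the loader prints "wrong ELF class", it can help to check the class of each object directly. The sketch below reads the ELF identification bytes: the first four bytes are the magic `\x7fELF`, and the byte at offset 4 (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64. A 64-bit process cannot preload an ELFCLASS32 object, so ld.so skips it and prints the message. The function name `elf_class` is hypothetical.

```shell
# Hypothetical helper: print the ELF class of a file by inspecting its header.
elf_class() {
    # First four bytes must be the ELF magic: 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo ELFCLASS32 ;;
        2) echo ELFCLASS64 ;;
        *) echo unknown ;;
    esac
}
```

Running `elf_class` on the gameoverlayrenderer.so path from the log and on the game binary would be expected to show the mismatch described above; the `file` utility reports the same information where it is available.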

There are a couple of other interesting bits in the log:
 - Alarming heap stats (allocated ~770 MiB, in-use at exit ~220MiB)
 - Invalid memory access in various modules - invalid reads in various steam runtime components, alarming number of invalid writes in i965_dri.so.

Perhaps the latter should be looked into more closely?

> Please if someone knows any Steam devs CC them here?
Afaict unless a person is known to bugzilla one cannot CC them. For example one cannot CC your @intel email, so I've opted for the known @linux.intel one.
Comment 14 Chris Wilson 2015-03-03 11:59:36 UTC
(In reply to Emil Velikov from comment #13)
>  - Invalid memory access in various modules - invalid reads in various steam
> runtime components, alarming number of invalid writes in i965_dri.so.
> 
> The latter one perhaps should be looked into closer ?

Probably not, that is just valgrind not understanding GEM and its ioctls. You need to use a libdrm compiled with --enable-valgrind to suppress the false positives.
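
A minimal sketch of such a build follows, assuming an autotools-based libdrm checkout that ships `autogen.sh` and that the Valgrind development headers are installed. The helper name `build_libdrm_valgrind`, the source directory argument, and the default install prefix are all hypothetical placeholders.

```shell
# Hypothetical helper: configure, build, and install a libdrm checkout
# with Valgrind annotations enabled, into a private prefix.
build_libdrm_valgrind() {
    if [ "$#" -lt 1 ]; then
        echo "usage: build_libdrm_valgrind SRC_DIR [PREFIX]" >&2
        return 1
    fi
    src=$1
    prefix=${2:-$HOME/libdrm-valgrind}   # ASSUMPTION: placeholder prefix
    (
        cd "$src" &&
        ./autogen.sh --enable-valgrind --prefix="$prefix" &&
        make &&
        make install
    )
}
```

The resulting library could then be picked up for a Valgrind run by pointing `LD_LIBRARY_PATH` at `$prefix/lib`. The exact library directory depends on the distribution and on whether the 32-bit or 64-bit build is needed.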
Comment 15 GitLab Migration User 2019-09-24 17:08:59 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/drm/issues/14.
