Bug 90925 - "high fidelity": Segfault in _mesa_program_resource_find_name
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Tapani Pälli
QA Contact: Intel 3D Bugs Mailing List
Reported: 2015-06-10 11:41 UTC by Christoph Haag
Modified: 2015-09-02 04:55 UTC

Attachments
full backtrace (73.76 KB, text/plain)
2015-06-10 11:41 UTC, Christoph Haag
apitrace trace (100.83 KB, text/plain)
2015-06-10 17:22 UTC, Christoph Haag

Description Christoph Haag 2015-06-10 11:41:36 UTC
Created attachment 116421
full backtrace

I tried this project: https://github.com/highfidelity/hifi
The build instructions aren't very clear, but once you have all the dependencies (including the Oculus Rift SDK, I believe), building and running should work with simply:
cmake /path/to/hifi; make; ./interface/interface

Anyway, with Mesa git on Intel Ivy Bridge it segfaults inside Mesa; see the attached gdb.txt.

With radeonsi (with PRIME) it doesn't segfault. It still has the "Error: [Context] Unable to obtain x11 visual from context" problem, but that's a different issue that I think comes from the Oculus Rift SDK.

There was a short discussion on IRC a few days ago, without much result:
https://secure.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2015-06-02
Comment 1 Tapani Pälli 2015-06-10 16:21:28 UTC
Would it be possible to get apitrace dump of the application?
Comment 2 Christoph Haag 2015-06-10 17:22:25 UTC
Created attachment 116423
apitrace trace

Hm. More or less.

apitrace doesn't seem to like that much.
Comment 3 Tapani Pälli 2015-06-11 04:52:48 UTC
The program seems to segfault even if I revert 222e5b8, which changes the glGetAttribLocation implementation. This makes it painful to know whether my fix for this bug works correctly :/

Program received signal SIGSEGV, Segmentation fault.
0x00000000004c03a4 in retrace_glGetAttribLocation(trace::Call&) ()

This is a strange bug: the ir_variable in question seems to have NULL as its name. I need to find out whether this is a valid condition; the code currently assumes that everything has at least some name.
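
To illustrate the failure mode described here, the following is a minimal sketch and not Mesa's actual code. The struct `resource` and the function `find_resource_by_name` are hypothetical names made up for this example. It shows a name lookup that skips entries whose name is NULL instead of handing them to strcmp(), which is where a lookup that assumes every entry has a name would crash.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a program resource entry; not Mesa's real struct. */
struct resource {
    const char *name;   /* assumed: may be NULL for an unnamed or removed variable */
    int location;
};

/* Returns the matching entry, or NULL if no entry has the given name.
 * Entries with a NULL name are skipped rather than passed to strcmp(),
 * which would be undefined behaviour (in practice usually a segfault). */
static const struct resource *
find_resource_by_name(const struct resource *list, size_t count,
                      const char *name)
{
    if (name == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        if (list[i].name == NULL)
            continue;
        if (strcmp(list[i].name, name) == 0)
            return &list[i];
    }
    return NULL;
}
```

A guard like this only hides the symptom; whether a nameless entry should be in the list at all is the question raised above.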
Comment 4 Tapani Pälli 2015-06-11 08:08:01 UTC
I think what happens is that during linking we add a resource, and only later does the backend delete it (after deciding it is not used/active). This is a somewhat strange pattern; typically unused variables get dropped early. I will investigate what should be done here.
Comment 5 Tapani Pälli 2015-06-12 12:11:26 UTC
OK, now I know what is going on. This is actually a major issue with the current implementation. As said in comment #4, we are creating ProgramResourceList too early. The backend driver still runs optimizations afterwards and can remove dead code, thereby removing variables that are already in the resource list.

The solution is to move resource list creation in the linking so that it happens only after the backend LinkShader hook, and to remove symbol table usage from the list creation (the symbol table was only an optimization to avoid iterating over the whole IR to find variables).
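
The ordering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Mesa change: `struct variable`, `backend_link`, `build_resource_list` and `link_program` are all hypothetical names, and the backend pass is reduced to flagging unused variables as removed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are Mesa's real types or functions. */
struct variable {
    const char *name;
    bool used;      /* whether anything still references the variable */
    bool removed;   /* set once dead-code elimination has dropped it */
};

/* Stand-in for the backend hook: marks every unused variable as removed. */
static void backend_link(struct variable *vars, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (!vars[i].used)
            vars[i].removed = true;
}

/* Builds the resource list by walking the variables that are still live.
 * Writes at most `cap` names to `out` and returns how many were written. */
static size_t build_resource_list(const struct variable *vars, size_t count,
                                  const char **out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < count && n < cap; i++)
        if (!vars[i].removed)
            out[n++] = vars[i].name;
    return n;
}

/* Link order: run the backend first, then build the list, so that the
 * list never contains a variable the backend has eliminated. */
static size_t link_program(struct variable *vars, size_t count,
                           const char **out, size_t cap)
{
    backend_link(vars, count);
    return build_resource_list(vars, count, out, cap);
}
```

Calling build_resource_list() before backend_link() would include every variable, which corresponds to the stale entries described in this comment.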
Comment 6 Tapani Pälli 2015-06-29 11:24:48 UTC
Ah, fortunately it seems dead code removal works just fine after all! It's just me building the list using a symbol table that still contains removed variables. I will change the list creation to not use the symbol table.
Comment 7 Tapani Pälli 2015-06-29 11:55:52 UTC
(In reply to Tapani Pälli from comment #6)
> Ah, fortunately it seems dead code removal works just fine after all! It's
> just me building the list using a symbol table that still contains removed
> variables. I will change the list creation to not use the symbol table.

Correction: I will change list creation to not use the symbol table, and also move the program resource list creation to happen after the backend LinkShader hook.
Comment 8 Tapani Pälli 2015-07-01 12:22:30 UTC
Fixed in master.
Comment 9 Christoph Haag 2015-07-01 13:16:05 UTC
Can confirm, High Fidelity works fine now. Thanks a lot.
Comment 10 Matt Turner 2015-07-01 21:10:58 UTC
Please leave a comment noting the commit that fixed it.
Comment 11 Tapani Pälli 2015-07-03 08:58:23 UTC
commit f045b8b2ff5ac75da3e092f482fd1717571d8462
Author: Tapani Pälli <tapani.palli@intel.com>
Date:   Mon Jun 29 15:23:45 2015 +0300

    glsl: create program resource list after LinkShader
    
    Resource list can be created properly  only after LinkShader hook
    has been called to make sure all dead variables have been removed.
    
    Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
    Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90925
Comment 12 Mark Janes 2015-09-02 00:16:01 UTC
I have been bisecting dEQP regressions for 10.6.  It appears that several 10.6 dEQP regressions are fixed by the commit that fixes this bug.

Tapani, is it possible to backport this fix to 10.6?
Comment 13 Tapani Pälli 2015-09-02 04:55:11 UTC
(In reply to Mark Janes from comment #12)
> I have been bisecting dEQP regressions for 10.6.  It appears that several
> 10.6 dEQP regressions are fixed by the commit that fixes this bug.
> 
> Tapani, is it possible to backport this fix to 10.6?

Sure, will do.

