Bug 94291 - llvmpipe tests fail if built on skylake i7-6700k
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Other
Version: 11.2
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
Reported: 2016-02-25 15:14 UTC by Timo Aaltonen
Modified: 2016-05-10 15:36 UTC
Description Timo Aaltonen 2016-02-25 15:14:52 UTC
Building on Skylake fails in the llvmpipe tests:

make  check-TESTS
make[5]: Entering directory '/«PKGBUILDDIR»/build/src/gallium/drivers/llvmpipe'
make[6]: Entering directory '/«PKGBUILDDIR»/build/src/gallium/drivers/llvmpipe'
PASS: lp_test_printf
../../../../../bin/test-driver: line 107: 32508 Illegal instruction     (core dumped) "$@" > $log_file 2>&1
FAIL: lp_test_conv
../../../../../bin/test-driver: line 107: 32509 Illegal instruction     (core dumped) "$@" > $log_file 2>&1
FAIL: lp_test_blend
../../../../../bin/test-driver: line 107: 32513 Illegal instruction     (core dumped) "$@" > $log_file 2>&1
../../../../../bin/test-driver: line 107: 32511 Illegal instruction     (core dumped) "$@" > $log_file 2>&1
FAIL: lp_test_arit
FAIL: lp_test_format

Broadwell is fine, and the same llvm version (3.8-rc) works fine with 11.1.x, so this is a regression in the 11.2 branch.
Comment 1 Roland Scheidegger 2016-02-26 15:17:25 UTC
Could you show the instruction where it crashed (and the disassembly)?
Comment 2 Timo Aaltonen 2016-04-11 14:32:21 UTC
how exactly? I've tried gdb:

(gdb) run
Starting program: /home/tjaalton/src/pkg-xorg/lib/mesa.git/build/src/gallium/drivers/llvmpipe/lp_test_format
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Testing PIPE_FORMAT_B8G8R8A8_UNORM (float) ...

Program received signal SIGILL, Illegal instruction.
0x00007ffff7ff5004 in ?? ()
(gdb) bt
#0 0x00007ffff7ff5004 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) bt full
#0 0x00007ffff7ff5004 in ?? ()
No symbol table info available.
#1 0x0000000000000000 in ?? ()
No symbol table info available.
Comment 3 Timo Aaltonen 2016-04-11 14:33:42 UTC
Also, compiz doesn't run on SKL/KBL when using llvmpipe; it keeps restarting with 'trap: invalid opcode'. I guess these are related.
Comment 4 Roland Scheidegger 2016-04-11 15:06:24 UTC
(In reply to Timo Aaltonen from comment #2)
> how exactly? I've tried gdb:

Usually you could use x/i <address> if it's in jit code when gdb can't figure out the function (or just follow up to the caller and disassemble from there). But it looks like the stack got smashed, so I don't know if that would really provide much insight.
Is that a debug build?
Comment 5 Timo Aaltonen 2016-04-20 08:41:04 UTC
llvm-3.8 misdetects skylake features; this is fixed in 3.9-snapshot.
Comment 6 Jose Fonseca 2016-04-20 11:38:25 UTC
It's not the first time LLVM misidentifies modern CPUs.

I thought that all the logic in src/gallium/auxiliary/gallivm/lp_bld_misc.cpp for setting +/-foo mattrs would save us from this sort of grief.

On the other hand, I suppose that actually knowing the exact CPU model allows it to better model instruction latency/throughput.
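As a rough illustration of the "+/-foo mattrs" idea mentioned in this comment, here is a minimal sketch in C++. It is not the code from lp_bld_misc.cpp: the `CpuCaps` struct and `build_mattrs` function are hypothetical names made up for this example, and only three features are shown. The point is that every feature the driver knows about gets an explicit "+name" or "-name" entry, so the code generator does not have to infer it from the CPU model.

```cpp
#include <string>
#include <vector>

// Hypothetical capability flags, assumed to be filled in by the
// driver's own CPU detection (not shown here).
struct CpuCaps {
    bool has_sse4_1;
    bool has_avx;
    bool has_avx2;
};

// Build one explicit "+name" or "-name" attribute per known feature.
std::vector<std::string> build_mattrs(const CpuCaps &caps) {
    std::vector<std::string> mattrs;
    auto add = [&mattrs](const char *name, bool enabled) {
        mattrs.push_back(std::string(enabled ? "+" : "-") + name);
    };
    add("sse4.1", caps.has_sse4_1);
    add("avx", caps.has_avx);
    add("avx2", caps.has_avx2);
    return mattrs;
}
```

A feature that is missing from such a list is left to the code generator's own detection, which is the gap this bug appears to fall into.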
Comment 7 Roland Scheidegger 2016-04-20 13:35:18 UTC
(In reply to Jose Fonseca from comment #6)
> It's not the first time LLVM misidentifies modern CPUs.
> 
> I thought that all the logic in
> src/gallium/auxiliary/gallivm/lp_bld_misc.cpp for setting +/-foo mattrs
> would save us from this sort of grief.

For features we already know about (I think I even mentioned that back then, hoping it wouldn't be a problem)...
If I look at the list of skylake features, I'd nearly bet the winner is avx512 (and/or any subvariant).
Comment 8 Timo Aaltonen 2016-04-20 15:20:27 UTC
Actually it wasn't avx512, that was the first one I tried :) It's enabled also on 3.7 and that version works fine. Only one that was added in 3.8 is PKU, but dropping just that didn't help.

I did try dropping all non-client features (AVX512, CDI, DQI, BWI, VLX, PKU) and that worked. Maybe one of CDI/DQI/BWI/VLX is somewhat broken on 3.8?
Comment 9 Roland Scheidegger 2016-04-20 15:56:31 UTC
(In reply to Timo Aaltonen from comment #8)
> Actually it wasn't avx512, that was the first one I tried :) It's enabled
> also on 3.7 and that version works fine. Only one that was added in 3.8 is
> PKU, but dropping just that didn't help.
> 
> I did try dropping all non-client features (AVX512, CDI, DQI, BWI, VLX, PKU)
> and that worked. Maybe one of CDI/DQI/BWI/VLX is somewhat broken on 3.8?

Which is why I said "or any subvariant" ;-).
ERI, CDI, PFI, DQI, BWI, VLX are all avx512 variants (omg naming???), though skylake in the llvm 3.8 list doesn't support ERI and PFI. I'm not sure, but dropping avx512 manually while an enhanced variant still gets enabled probably won't do anything. I don't think PKU would matter (but no guarantee...). I suppose we should explicitly disable all of them via mattrs too (not that it's a battle we can win, there will be some extensions at some point...).
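The suggestion to explicitly disable all of the avx512 variants via mattrs could look roughly like the C++ sketch below. This is an assumption-laden illustration, not the actual workaround commit: `disable_avx512_variants` and `join_mattrs` are hypothetical helper names, and the feature strings are the LLVM attribute names assumed to correspond to ERI, CDI, PFI, DQI, BWI and VLX.

```cpp
#include <string>
#include <vector>

// Append a "-feature" entry for each AVX-512 sub-feature named in the
// thread (ERI, CDI, PFI, DQI, BWI, VLX), using the assumed LLVM names.
void disable_avx512_variants(std::vector<std::string> &mattrs) {
    static const char *const variants[] = {
        "avx512er", "avx512cd", "avx512pf",
        "avx512dq", "avx512bw", "avx512vl",
    };
    for (const char *name : variants) {
        mattrs.push_back(std::string("-") + name);
    }
}

// Join the attributes into the comma-separated form accepted by
// command-line options such as llc's -mattr=.
std::string join_mattrs(const std::vector<std::string> &mattrs) {
    std::string joined;
    for (const std::string &attr : mattrs) {
        if (!joined.empty()) {
            joined += ',';
        }
        joined += attr;
    }
    return joined;
}
```

As the comment notes, a fixed list like this only covers extensions known today; any sub-feature added to LLVM later would need its own entry.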
Comment 10 Timo Aaltonen 2016-04-20 18:15:01 UTC
Oh, I didn't know they were subvariants :)

I've dropped them from our llvm-3.8, for now at least.
Comment 11 Roland Scheidegger 2016-05-10 15:36:57 UTC
Worked around by 8b66e2647d5e36e318177a460e6e586d6ca8c36b.

