Bug 99994 - Unreal Tournament 2016 segfaults with sisched
Summary: Unreal Tournament 2016 segfaults with sisched
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Axel Davy
QA Contact: Default DRI bug account
Reported: 2017-02-27 22:25 UTC by Ernst Sjöstrand
Modified: 2019-09-25 17:57 UTC

Description Ernst Sjöstrand 2017-02-27 22:25:52 UTC
The free to play Unreal Tournament under development segfaults with R600_DEBUG=sisched.
Fully playable, with rendering issues, without it.
It segfaults very early, before the first graphics.

FIJI Fury card.

LLVM: 5.0~svn295460-0~y~padoka0
Mesa: 17.1~git170226151200.5b5ffb7~y~padoka0

(gdb) bt
#0  llvm::SIInstrInfo::isLowLatencyInstruction(llvm::MachineInstr const&) const () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/include/llvm/CodeGen/MachineInstr.h:273
warning: (Internal error: pc 0x7fffdc4858cf in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x7fffdc4858cf in read in psymtab, but not in symtab.)

#1  0x00007fffdc4858d0 in llvm::SIScheduleDAGMI::moveLowLatencies() () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/Target/AMDGPU/SIMachineScheduler.cpp:1727
warning: (Internal error: pc 0x7fffdc48fc85 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x7fffdc48fc85 in read in psymtab, but not in symtab.)

#2  0x00007fffdc48fc86 in llvm::SIScheduleDAGMI::schedule() () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/Target/AMDGPU/SIMachineScheduler.cpp:1871
#3  0x00007fffdb5f50ef in (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/CodeGen/MachineScheduler.cpp:480
#4  0x00007fffdb5fe789 in (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/CodeGen/MachineScheduler.cpp:344
#5  0x00007fffdb5a86a1 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/CodeGen/MachineFunctionPass.cpp:62
#6  0x00007fffdb430b22 in llvm::FPPassManager::runOnFunction(llvm::Function&) () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/IR/LegacyPassManager.cpp:1513
#7  0x00007fffdb430bc3 in llvm::FPPassManager::runOnModule(llvm::Module&) () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/IR/LegacyPassManager.cpp:1534
#8  0x00007fffdb431567 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/IR/LegacyPassManager.cpp:1590
#9  0x00007fffdc2b79c8 in LLVMTargetMachineEmit(LLVMOpaqueTargetMachine*, LLVMOpaqueModule*, llvm::raw_pwrite_stream&, LLVMCodeGenFileType, char**) () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/Target/TargetMachineC.cpp:204
#10 0x00007fffdc2b7bc9 in LLVMTargetMachineEmitToMemoryBuffer () at /build/llvm-toolchain-snapshot-jHu56_/llvm-toolchain-snapshot-5.0~svn295460/lib/Target/TargetMachineC.cpp:228
#11 0x00007fffdee3d8d5 in si_llvm_compile (M=0x7fffbc013db0, binary=0x7fffbc009460, tm=0x75e140, debug=0x0) at ../../../../../../src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c:224
#12 0x00007fffdee38528 in si_compile_llvm (sscreen=<optimized out>, binary=0x7fffbc009460, conf=0x7fffbc0094c0, tm=0x75e140, mod=0x7fffbc013db0, debug=0x0, processor=1, name=0x7fffdf001231 "TGSI shader")
    at ../../../../../../src/gallium/drivers/radeonsi/si_shader.c:6159
#13 0x00007fffdee39389 in si_compile_tgsi_shader (sscreen=0x757480, tm=0x75e140, shader=0x7fffbc009330, is_monolithic=<optimized out>, debug=0x0) at ../../../../../../src/gallium/drivers/radeonsi/si_shader.c:7366
#14 0x00007fffdee4d39c in si_init_shader_selector_async (job=0x7fffa8b84d10, thread_index=1) at ../../../../../../src/gallium/drivers/radeonsi/si_state_shaders.c:1416
#15 0x00007fffdeb77754 in util_queue_thread_func (input=<optimized out>) at ../../../../../src/gallium/auxiliary/util/u_queue.c:175
#16 0x00007fffdeb773d8 in impl_thrd_routine (p=<optimized out>) at ../../../../../include/c11/threads_posix.h:87
#17 0x00007ffff7bc16ca in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007ffff0f240af in clone () from /lib/x86_64-linux-gnu/libc.so.6
Comment 1 Ernst Sjöstrand 2017-02-27 22:28:36 UTC
I downloaded it from here:
https://www.epicgames.com/unrealtournament/forums/showthread.php?12011-Unreal-Tournament-Pre-Alpha-Playable-Build

And followed instructions here:
https://www.gamingonlinux.com/articles/unreal-tournament-on-linux-checking-up-on-the-progress-by-epic-games.7241/

I'm running
4.15.0-3315666+++UT+Release-Next 510 0
which seems to be
"UPDATE: New build posted 1/26/2016 - 0.1.7.1"
Comment 2 Timothy Arceri 2018-04-11 04:20:27 UTC
This continues to crash with LLVM 7-devel and Mesa 18.1-devel as of today. Tested with the latest Linux build [1].

[1] https://www.epicgames.com/unrealtournament/forums/unreal-tournament-discussion/announcements/387158-0-1-11-update-posted-5-16-2017
Comment 3 Andy Furniss 2018-04-11 08:50:02 UTC
(In reply to Timothy Arceri from comment #2)
> This continues to crash with LLVM 7-devel and Mesa 18.1-devel as of today.
> Tested with the latest Linux build [1].
> 
> [1] https://www.epicgames.com/unrealtournament/forums/unreal-tournament-discussion/announcements/387158-0-1-11-update-posted-5-16-2017

FWIW that's not the latest.

https://www.epicgames.com/unrealtournament/forums/unreal-tournament-discussion/announcements
Comment 4 Andy Furniss 2018-04-11 08:54:12 UTC
Build 0.1.12.1 on Tonga with current LLVM/Mesa does segfault on start for me with
R600_DEBUG=sisched.
Comment 5 Axel Davy 2018-04-16 20:11:54 UTC
My guess is that, for some unknown reason, getInstr() returns NULL for one of the instructions being scheduled, and that causes the crash in isLowLatencyInstruction. I'm not sure why it happens specifically with this game; I will have to look at the intermediate compiler results.
Comment 6 ilia 2018-05-28 08:01:54 UTC
I get a similar error when trying to run "Deus Ex Mankind Divided": first on 16.04, and now on 18.04 with the same error.
HW: Xeon 2665 in an HP Z420 motherboard
SW: mesa 18.04, llvm 6.0, linux 4.16.12
Thread 6 "si_shader:1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffcf7fe700 (LWP 30129)]
0x00007fffd7f54c00 in llvm::SIInstrInfo::isLowLatencyInstruction(llvm::MachineInstr const&) const () from /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
(gdb) bt
#0  0x00007fffd7f54c00 in llvm::SIInstrInfo::isLowLatencyInstruction(llvm::MachineInstr const&) const () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#1  0x00007fffd7f842ac in llvm::SIScheduleDAGMI::moveLowLatencies() () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#2  0x00007fffd7f8f088 in llvm::SIScheduleDAGMI::schedule() () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#3  0x00007fffd6e2487f in  () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#4  0x00007fffd6e2e3f7 in  () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#5  0x00007fffd6dbbfe0 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#6  0x00007fffd6bf17f8 in llvm::FPPassManager::runOnFunction(llvm::Function&) () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#7  0x00007fffd78ab1b8 in  () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#8  0x00007fffd6bf108f in llvm::legacy::PassManagerImpl::run(llvm::Module&) () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#9  0x00007fffd7d435b5 in  () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#10 0x00007fffd7d43799 in LLVMTargetMachineEmitToMemoryBuffer () at /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
#11 0x00007fffdaeff264 in  () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#12 0x00007fffdaef4cdc in  () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#13 0x00007fffdaef654c in  () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#14 0x00007fffdaf12a50 in  () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#15 0x00007fffdab1bc0e in  () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#16 0x00007fffdab1b8a7 in  () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#17 0x00007ffff476f6db in start_thread (arg=0x7fffcf7fe700) at pthread_create.c:463
#18 0x00007fffede5e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Comment 7 ilia 2018-05-28 14:34:45 UTC
I modified isLowLatencyInstruction as follows:

bool SIInstrInfo::isLowLatencyInstruction(const MachineInstr &MI) const {
  if(&MI != nullptr){
    std::thread::id this_id = std::this_thread::get_id();
    std::cout << getpid() << this_id << " -- Got valid pointer &MI=" << &MI << "\n";
    unsigned Opc = MI.getOpcode();
    printf("Got Opc = %u\n", Opc);
    return isSMRD(Opc);
  } else {
    printf("Got NULL pointer MI\n");
    return false;
  }
}

No nullptr was ever passed, but I found the same pointer being accessed from two different threads at the same time.

0x2ce50x7f54094a9700 -- Got valid pointer &MI=0x7f53f03c0d70
Got Opc = 8188
0x2ce50x7f5403fff700 -- Got valid pointer &MI=0x7f53f03c0d70
Got Opc = 8188
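
A side note on the check itself, as a minimal sketch rather than a diagnosis: in C++, binding a reference by dereferencing a null pointer is undefined behaviour, so a compiler is allowed to assume that `&MI != nullptr` is always true and drop the else branch. A null check is therefore only dependable on the pointer, before it is dereferenced. The names `ToyInstr`, `opcodeOf` and `tryOpcodeOf` below are hypothetical and are not LLVM API. This does not explain the valid pointer logged above; it only shows where such a check would have to live.

```cpp
// Toy stand-in for illustration only; this is NOT llvm::MachineInstr.
struct ToyInstr {
  unsigned Opcode;
};

// Takes a reference: by the time we are here it is too late to test for null.
unsigned opcodeOf(const ToyInstr &MI) { return MI.Opcode; }

// Tests the pointer first and only then forms the reference.
// Returns false (and leaves Out untouched) when there is no instruction.
bool tryOpcodeOf(const ToyInstr *MI, unsigned &Out) {
  if (MI == nullptr)
    return false;
  Out = opcodeOf(*MI);
  return true;
}
```

With this shape, a caller that may hold a null instruction pointer calls `tryOpcodeOf(Ptr, Opc)` and handles the false result, instead of testing the address of a reference inside the callee.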
Comment 8 ilia 2018-06-02 06:41:48 UTC
All I have found so far: the error happens when accessing the member Desc of class MCInstrInfo.
isLowLatencyInstruction -> isSMRD(Opcode) -> MCInstrInfo::get(Opcode) -> return Desc[Opcode]

The Opcode value does not matter: it fails even if I replace Desc[Opcode] with Desc[0]. But "return *(new MCInstrDesc())" instead of "return Desc[Opcode]" makes it work (nuget is used only for the isSMRD call):

  const MCInstrDesc &get(unsigned Opcode) const {
    assert(Opcode < NumOpcodes && "Invalid opcode!");
    return Desc[Opcode];
  }
  MCInstrDesc &nuget(unsigned Opcode) const {
    assert(Opcode < NumOpcodes && "Invalid opcode!");
    assert(Desc != NULL && "Desc is NULL!");
    assert(Desc != nullptr && "Desc is nullptr!");
    return *(new MCInstrDesc());
  }

Of course, this is not a solution, and ambient occlusion doesn't work correctly with it.
Declaring Desc as std::vector<MCInstrDesc>, initializing it with new and filling it with Desc->assign(D, D+NO) doesn't help either; the game fails the same way.
Comment 9 Axel Davy 2018-06-02 09:18:15 UTC
Have you by any chance already extracted the LLVM code for the shader that triggers the issue?
Comment 10 Axel Davy 2018-06-04 22:53:11 UTC
I extracted the shader and looked at the debug info when it compiles.

My understanding is that the bug is triggered when the LLVM shader contains a COPY instruction as its last operation.
I'm not sure why this behaviour is triggered, but a fix is to skip the successor in this case (it is ExitSU, which is not valid):

Replace in SIScheduleDAGMI::moveLowLatencies:
    } else if (SU->getInstr()->getOpcode() == AMDGPU::COPY) {
      bool CopyForLowLat = false;
      for (SDep& SuccDep : SU->Succs) {
        SUnit *Succ = SuccDep.getSUnit();
        if (SITII->isLowLatencyInstruction(*Succ->getInstr())) {
          CopyForLowLat = true;
        }
      }


with

    } else if (SU->getInstr()->getOpcode() == AMDGPU::COPY) {
      bool CopyForLowLat = false;
      for (SDep& SuccDep : SU->Succs) {
        SUnit *Succ = SuccDep.getSUnit();
        if (Succ->NodeNum >= DAGSize)
          continue;
        if (SITII->isLowLatencyInstruction(*Succ->getInstr())) {
          CopyForLowLat = true;
        }
      }

The affected shader seems to compile fine with the change.
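
For readers without an LLVM tree at hand, the guard above can be modelled in a few self-contained lines. This is only a sketch under stated assumptions: `ToyInstr`, `ToyUnit`, `isLowLatency` and `copyFeedsLowLatency` are hypothetical stand-ins for the real classes, and the exit node is modelled as a unit with no instruction and a node number outside the DAG (`~0u` is just the value chosen for this sketch).

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the real LLVM classes.
struct ToyInstr {
  bool LowLatency; // stands in for what isLowLatencyInstruction() reports
};

struct ToyUnit {
  unsigned NodeNum;                   // index in the DAG, or a sentinel value
  const ToyInstr *Instr;              // may be null for the exit sentinel
  std::vector<const ToyUnit *> Succs; // successor edges
};

// Takes a reference, so callers must never pass a dereferenced null pointer.
bool isLowLatency(const ToyInstr &MI) { return MI.LowLatency; }

// Returns true if any real successor of SU is a low-latency instruction.
bool copyFeedsLowLatency(const ToyUnit &SU, unsigned DAGSize) {
  for (const ToyUnit *Succ : SU.Succs) {
    if (Succ->NodeNum >= DAGSize)
      continue; // sentinel node outside the DAG: skip it, as in the fix
    assert(Succ->Instr && "real nodes are expected to carry an instruction");
    if (isLowLatency(*Succ->Instr))
      return true;
  }
  return false;
}
```

Without the `NodeNum >= DAGSize` test, a unit whose only successor is the exit sentinel would reach `isLowLatency(*Succ->Instr)` with a null `Instr`, which is the pattern the change is meant to avoid.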
Comment 11 ilia 2018-06-06 11:20:08 UTC
(In reply to Axel Davy from comment #10)

Thank you, this fix made DX:MD work for me.
Comment 12 Axel Davy 2018-06-20 18:55:20 UTC
I posted the patch for review last week:
https://reviews.llvm.org/D47984
Comment 13 GitLab Migration User 2019-09-25 17:57:21 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1256.

