Bug 104462

Summary: [SKL] GPU HANG: ecode 9:0:0x85dffffb, in BattleBlockThea [2678], reason: Hang on rcs0, action: reset
Product: Mesa
Component: Drivers/DRI/i965
Status: RESOLVED DUPLICATE
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Reporter: Tobe To <ebotot>
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
CC: intel-gfx-bugs
Attachments: GPU crash dump, dmesg output

Description Tobe To 2018-01-02 19:54:47 UTC
Created attachment 136502 [details]
GPU crash dump

Bug description:
Running the [linuxtest] build of BattleBlock Theater works normally up until reaching Chapter 3 of story mode. Upon entering that room, the game freezes and crashes to desktop.

System environment:
-- mesa: 17.3.1
-- system architecture: 64-bit
-- kernel: 4.14.10-1-ARCH
-- Linux distribution: Archlinux
-- Machine: Dell E7270
-- Display connector: DP

Reproducing steps:
Start up BattleBlock Theater
Start a local game
Enter story mode
Choose normal mode
Proceed to chapter 3
Enter the room

Additional info:
Certain arena modes such as Muckle, as well as the corresponding online modes, produce the same hang.
Comment 1 Tobe To 2018-01-02 19:55:22 UTC
Created attachment 136503 [details]
dmesg output
Comment 2 Elizabeth 2018-01-05 23:15:22 UTC
Hello ebotot, 
Do you know if this worked in any previous mesa version? If possible, could you try this branch https://cgit.freedesktop.org/mesa/mesa/??
Thanks.
Comment 3 Tobe To 2018-01-07 03:42:19 UTC
Hi Elizabeth,
The crash still happens on the linked branch.
I hadn't reached Chapter 3 on previous versions of mesa so I'm not sure. However, I noticed that there are arena modes that I had successfully played before on either mesa 17.2.6 or 17.3.0 that now crash on 17.3.1. I'm having some trouble downgrading my mesa drivers, so I can't confirm this right now for sure.
Thanks for the help!
Comment 4 vadym 2018-01-25 10:35:00 UTC
Issue is reproducible on my laptop:

OS: Ubuntu 17.10 64-bit
CPU: Intel® Core™ i7-7500U CPU @ 2.70GHz × 4
GPU: Intel® HD Graphics 620 (Kaby Lake GT2)
mesa: OpenGL ES 3.2 Mesa 17.2.4
kernel: 4.13.0-31-generic

Bisected to bad commit:

ea0d2e98ecb369ab84e78c84709c0930ea8c293a is the first bad commit
commit ea0d2e98ecb369ab84e78c84709c0930ea8c293a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu Oct 5 20:31:01 2017 -0700

    i965: Disable auxiliary buffers when there are self-dependencies.
    
    Jason and I investigated several OpenGL CTS failures where the tests
    bind the same texture for rendering and texturing, at the same time.
    This has defined results as long as the reads happen before writes,
    or the regions are non-overlapping.  Normally, this just works out.
    
    However, CCS can cause problems.  If the shader is reading one set of
    pixels, and writing to different pixels that are adjacent, they may end
    up being covered by the same CCS block.  So rendering may be writing a
    CCS block, while the sampler is trying to read it.  Corruption ensues.
    
    Disabling CCS is unfortunate, but safe.
    
    Fixes several KHR-GL45.texture_barrier.* subtests.
    
    Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>


With the latest development version of mesa, 18.1.0-devel (513c2263), the issue is not reproducible. I also found that it was fixed by the following series of commits:

df13588d21 i965: Stop disabling aux during texture preparation
20f70ae385 i965/draw: Set NEW_AUX_STATE when draw aux changes
e52a9f18d6 i965: Replace draw_aux_buffer_disabled with draw_aux_usage
468ea3cc45 i965/surface_state: Drop brw_aux_surface_disabled
d38ec24f53 i965/miptree: Add an aux_disabled parameter to render_aux_usage

Bug can be closed now.
Comment 5 vadym 2018-01-25 10:36:12 UTC
BTW, on Ubuntu 17.10 and Ubuntu 16.04 this game crashes on startup with the following error:

DEBUG: InitGame started
BattleBlockTheater: /media/BGBS/BBT_Linux/Core/MemorySystem.cpp:161: void* MemoryBlock::Alloc(unsigned int): Assertion `(!"Got request for zero bytes!")' failed.
Aborted (core dumped)

I noticed that the crash happens when a zero-length array is allocated in the GLSL linker:

src/compiler/glsl/linker.cpp:1209
       InterfaceBlockStageIndex[i] = new int[max_num_buffer_blocks];

It looks like the game replaces the memory allocator with its own, and that allocator asserts when asked for zero bytes, which is what a zero-length array allocation requests.

With the following hack the game starts properly:

--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -1148,7 +1148,12 @@ interstage_cross_validate_uniform_blocks(struct gl_shader_program *prog,
    for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
       struct gl_linked_shader *sh = prog->_LinkedShaders[i];
 
-      InterfaceBlockStageIndex[i] = new int[max_num_buffer_blocks];
+      if (max_num_buffer_blocks != 0) {
+           InterfaceBlockStageIndex[i] = new int[max_num_buffer_blocks];
+      }

My question is: should we provide a fix for this in mesa, or should it be fixed in the BattleBlock Theater sources?
Comment 6 Mark Janes 2018-01-25 16:51:07 UTC
The crash is a bug in the game:

https://stackoverflow.com/questions/1087042/c-new-int0-will-it-allocate-memory

The original hang has been fixed, as noted.  Thanks for the bug report!

*** This bug has been marked as a duplicate of bug 104411 ***
