Bug 37871 - [bisected i965] Bus error (core dumped) on oglc texdecaltile
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: All Linux (All)
Importance: medium major
Assigned To: Ian Romanick
Duplicates: 44436
Reported: 2011-06-02 22:26 UTC by fangxun
Modified: 2014-07-11 05:31 UTC (History)

Attachments
Limit texture size to fit in GTT (2.94 KB, patch)
2011-06-04 05:38 UTC, Chris Wilson
Check the buffer is mappable at the time of creation (1.44 KB, patch)
2011-06-06 16:05 UTC, Chris Wilson
Catch SIGBUS and propagate GL_OUT_OF_MEMORY (3.92 KB, patch)
2011-06-10 16:01 UTC, Chris Wilson

Description fangxun 2011-06-02 22:26:21 UTC
System Environment:
--------------------------
Arch:           i386
Platform:       Huronriver
Libdrm:         (master)2.4.25-1-g61be94018ae9c403517d53f69357719224fa6ff3
Mesa:           (master)f61d1deac7d19dcec38b7852a635d92680624a32
Xf86_video_intel: (master)2.15.0-18-g340cfb7f5271fd1df4c8948e5c9336f5b69a6e6c
Xserver:(master)xorg-server-1.10.0-397-g4621bb270a36d35d4ab67f1d7fb47674683dfc5b
Kernel: (drm-intel-next)caee6066332b83e7f8bdd6f2f40ce46d4836d69c


Bug detailed description:
-------------------------
It happens on Sandybridge and Ironlake. Bisect shows f61d1deac7d19dcec38b7852a635d92680624a32 is the first bad commit. Three other oglc cases also fail because of the same commit:
fbo(scenario.shadowMaps) 
max_values(advanced.textureSize.texture2D)
max_values(negative.textureSize.texture2D)

commit f61d1deac7d19dcec38b7852a635d92680624a32
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Thu Jun 2 08:27:09 2011 +0100
Commit:     Chris Wilson <chris@chris-wilson.co.uk>
CommitDate: Thu Jun 2 08:30:21 2011 +0100

    i965: Raise const.MaxTextureLevels to 14 (8192)

    Mesa now limits, by default, the max number of texture levels to 15 so we
    can now support the architectural maximum for gen4-6 of 14.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


Reproduce steps:
-------------------------
1. start X
2. ./oglconform -z -s -suite all -v 2 -D 33 -test texdecaltile
Comment 1 Chris Wilson 2011-06-03 09:57:10 UTC
=0 sandybridge:/usr/src/oglconform_31/dump/linux/debug64/OGLconform (master)$ ./oglconform64 -z -s -suite all -v 2 -D 33 -test texdecaltile
Intel OpenGL Conformance Test
Version ENG (Feb  1 2011 11:48:42)

CLI options echo:
oglconform64 -z -s -suite all -v 2 -D 33 -test texdecaltile 

WARNING: OpenCL is not supported.
Pixel format 33
GLX_USE_GL:          Yes
GLX_BUFFER_SIZE:     32
GLX_LEVEL:           0
GLX_RGBA:            Yes
GLX_DOUBLEBUFFER:    Yes
GLX_STEREO:          No
GLX_AUX_BUFFERS      0
GLX_RED_SIZE         8
GLX_GREEN_SIZE       8
GLX_BLUE_SIZE        8
GLX_ALPHA_SIZE       8
GLX_DEPTH_SIZE       24
GLX_STENCIL_SIZE     8
GLX_ACCUM_RED_SIZE   0
GLX_ACCUM_GREEN_SIZE 0
GLX_ACCUM_BLUE_SIZE  0
GLX_ACCUM_ALPHA_SIZE 0

Setup Report.
    Verbose level = 2.
    Path inactive.

Visual Report.
    Display ID = 33. 
    Double Buffered.
    RGBA (8, 8, 8, 8).
    Stencil (8).
    Depth (24).
    Accumulation (0, 0, 0, 0).
    0 Auxilary Buffers.

OpenGL Report.
    Vendor - 'Tungsten Graphics, Inc'
    Renderer - 'Mesa DRI Intel(R) Sandybridge Desktop '
    Version - '2.1 Mesa 7.11-devel (git-3aeb596)'
    GLSL Version - '1.20'

WARNING: Extension GL_ARB_draw_elements_base_vertex is reported but its API is NOT COMPLETE.

>> Texture Decal Tiling (texdecaltile)  test:
<< Texture Decal Tiling (texdecaltile)  test passed.

Intel Conformance passed.

Total Passed : 1
Total Failed : 0
Total Not run: 0

Puzzled.
Comment 2 Ian Romanick 2011-06-03 14:28:31 UTC
Increasing the maximum texture size may lead to odd out-of-memory-like problems.  How much memory do the failing and non-failing configurations have?

Are you both running the same kernel?

Chris, is your SNB a Huronriver?  We have seen some differences between the platforms, and this could be another one.
Comment 3 Chris Wilson 2011-06-04 03:12:55 UTC
Ah, I see the error on my HuronRiver. And actually thinking about the issue:

8192*8192*4 = 256MiB, which exceeds the available mappable aperture size on my laptop but still fits comfortably within the 512MiB aperture on my SugarBay.

So i915_gem_fault() is detecting an E2BIG when trying to bind the bo into the mappable region, which gets translated to a SIGBUS in the fault handler.

The easiest approach is to ratchet down the maximum texture size until we can safely mmap it through the GTT.
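
The arithmetic in this comment can be checked with a small C sketch. The helper names are made up for illustration and are not Mesa functions; 4 bytes per texel assumes a 32-bit RGBA format, and only the base mipmap level is counted.

```c
#include <stdint.h>

/* Largest 2D texture dimension allowed by a MaxTextureLevels value.
 * Level 0 is the full-size image, so N levels allow 2^(N-1) texels per side. */
uint64_t max_dimension_for_levels(unsigned levels)
{
    return levels == 0 ? 0 : (uint64_t)1 << (levels - 1);
}

/* Bytes needed for the base level of a width x height texture.
 * (No overflow check; fine for the sizes discussed here.) */
uint64_t base_level_bytes(uint64_t width, uint64_t height,
                          uint64_t bytes_per_texel)
{
    return width * height * bytes_per_texel;
}
```

With 14 levels the maximum dimension is 8192, and an 8192x8192 texture at 4 bytes per texel needs 268435456 bytes, i.e. 256MiB.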
Comment 4 Chris Wilson 2011-06-04 05:38:52 UTC
Created attachment 47518
Limit texture size to fit in GTT
Comment 5 Ian Romanick 2011-06-06 15:23:56 UTC
(In reply to comment #4)
> Created an attachment (id=47518)
> Limit texture size to fit in GTT

This isn't generally the right way to do this, but this should fix the issue for now.  We should fail at texture creation time instead.  After all, an 8192x1 texture will fit in the GTT.  It's only when both dimensions are too big that we should fail.
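
A rough sketch of what such a creation-time check could look like, in C. This is not the attached patch: texture_fits_mappable is a hypothetical helper, the mappable size is passed in as a parameter because it differs per machine, and only the base level is considered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical creation-time check: true if the base level of a
 * width x height texture fits in mappable_bytes of aperture.
 * Uses divisions so the size computation cannot overflow. */
bool texture_fits_mappable(uint64_t width, uint64_t height,
                           uint64_t bytes_per_texel, uint64_t mappable_bytes)
{
    if (width == 0 || height == 0 || bytes_per_texel == 0)
        return true; /* an empty texture needs no space */

    uint64_t max_texels = mappable_bytes / bytes_per_texel;
    if (width > max_texels)
        return false; /* a single row is already too big */

    return height <= max_texels / width;
}
```

The point is that the decision depends on the product of the dimensions, not on either dimension alone, so 8192x1 passes while 8192x8192 can be rejected with an error at creation time.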
Comment 6 Chris Wilson 2011-06-06 16:05:03 UTC
Created attachment 47626
Check the buffer is mappable at the time of creation

Ian, something along the lines of this then?

I'm not sure if this will make the testcase happy though; I guess it will complain about the GL_OUT_OF_MEMORY.
Comment 7 Chris Wilson 2011-06-06 16:05:59 UTC
Proof-of-concept patch only. (Obviously doesn't even compile ;-)
Comment 8 Chris Wilson 2011-06-10 16:01:15 UTC
Created attachment 47832
Catch SIGBUS and propagate GL_OUT_OF_MEMORY

The third scheme is to trap the SIGBUS and convert it to a GL_OUT_OF_MEMORY. Insert the horror of signal handling and multithreaded applications here...

(Again, another proof-of-concept patch: it prevents the SIGBUS in the test case but cannot prevent the test from failing, although this time it fails gracefully.)
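
For readers unfamiliar with the technique, here is a minimal standalone C sketch of trapping SIGBUS around a write and reporting failure instead of crashing. It is not the attached patch. guarded_copy and map_past_eof are hypothetical names; the demo mapping relies on POSIX behaviour where touching a shared file mapping beyond the end of the file raises SIGBUS. A driver would turn the false return into GL_OUT_OF_MEMORY.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* One global jump target: this is NOT thread-safe, which is exactly
 * the multithreading horror mentioned above. */
static sigjmp_buf bus_jmp;

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_jmp, 1); /* unwind back into guarded_copy() */
}

/* Copy len bytes to dst. Returns false, instead of dying, if the
 * write raises SIGBUS. */
bool guarded_copy(void *dst, const void *src, size_t len)
{
    struct sigaction sa, old;
    volatile bool ok = false;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return false;

    /* savemask=1 so the signal mask is restored after the jump */
    if (sigsetjmp(bus_jmp, 1) == 0) {
        memcpy(dst, src, len);
        ok = true;
    }

    sigaction(SIGBUS, &old, NULL); /* put the previous handler back */
    return ok;
}

/* Demo helper: map len bytes of an empty temporary file. The mmap()
 * succeeds, but the first access faults with SIGBUS. */
void *map_past_eof(size_t len)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fileno(f), 0);
    fclose(f); /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}
```

Note that the mapping call itself succeeds and the failure only shows up on first touch, which is the same shape as the problem in this bug.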
Comment 9 Dan McCabe 2011-07-21 17:58:35 UTC
When I run the test on my SNB system, the conformance test app seg faults in the file:
   src/CONFSHEL/windowing.cpp
inside the function:
   FbConfig::get_config_with_id()
because glXChooseFBConfig() returns a NULL configs[] array and the code tries to access the first element of that array, dereferencing NULL. The test does not seg fault in the underlying graphics system, which appears to be well behaved.

When I modify the test to check whether configs is NULL and just return NULL from the function in that situation, the test no longer seg faults. Instead, the test emits the following error message:
   Error encountered during test scheduling:
   Error: no test in schedule is compatibile with selected pixel format

This appears to me to be a test application failure, not a failure of the underlying system.
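
The kind of guard I added looks roughly like the C sketch below. It is not the oglconform source: fb_config and choose_configs_fn are hypothetical stand-ins for GLXFBConfig and a glXChooseFBConfig-style call, so that the snippet builds without X11, and the two stub functions exist only to exercise it.

```c
#include <stddef.h>

typedef void *fb_config; /* hypothetical stand-in for GLXFBConfig */

/* Hypothetical chooser with a glXChooseFBConfig-like contract:
 * returns an array and stores its length, or returns NULL on no match. */
typedef fb_config *(*choose_configs_fn)(const int *attribs, int *count);

/* Return the first matching config, or NULL if nothing matched.
 * (With real GLX the returned array would also need XFree().) */
fb_config first_config_or_null(choose_configs_fn choose, const int *attribs)
{
    int count = 0;
    fb_config *configs = choose(attribs, &count);

    if (configs == NULL || count <= 0)
        return NULL; /* no match: do not touch configs[0] */
    return configs[0];
}

/* Stubs for demonstration only. */
static int dummy_config;
static fb_config one_config[1] = { &dummy_config };

fb_config *stub_no_match(const int *attribs, int *count)
{
    (void)attribs;
    *count = 0;
    return NULL;
}

fb_config *stub_one_match(const int *attribs, int *count)
{
    (void)attribs;
    *count = 1;
    return one_config;
}
```

The caller then has to handle the NULL return, which is where the "no test in schedule is compatible" message comes from.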
Comment 10 zhao jian 2011-07-21 19:31:06 UTC
(In reply to comment #9)
> When I run the test on my SNB system, the conformance test app seg faults in in
> the file:
>    src/CONFSHEL/windowing.cpp
> inside the function:
>    FbConfig::get_config_with_id()
> because glXChooseFBConfig() returns a NULL configs[] array and the code tries
> to access the first element of that array, dereferencing NULL. The test does
> not seg fault in the underlying graphics system, which appears to be well
> behaved.
> When I modify the test to check whether configs is NULL and just return NULL
> from the function in that situation, the test no longer seg faults. Instead,
> the test emits the following error message:
>    Error encountered during test scheduling:
>    Error: no test in schedule is compatibile with selected pixel format
> This appears to me to be a test application failure, not a failure of the
> underlying system.

This error is in the oglc code, which now uses glXChooseFBConfig to select the fbconfig and its visual. So we should now test it using the command: oglconform -z -s -suite all -v 2 -test texdecaltile basic.allCases -D 115 (without the -D option it will test all the available visuals).

And I find it passes with some visuals while failing with others. So I guess it is related to how our driver handles some visuals.
It passes with this visual:
ID  |ACCELERA|DB |REND_T |SURF_T |C_BUF_T |BUF_S |RED_S |
115 |       1|  1|     gl| wipbpx|    rgba|    32|     8|

GREEN_S |BLUE_S |ALPHA_S |DEPTH_S |STENC_S |ACCUM_S |SPL_BUF |SAMPLES |
       8|      8|       8|      24|       8|      64|       0|       0|

SRGB |TEX_RGB |TEX_RGBA|CAVEAT |SWAP |M_PBUF_W|M_PBUF_H|M_PBUF_P
   -1|       0|       0|   slow|undef|       0|       0|       0

It fails with this visual:
ID  |ACCELERA|DB |REND_T |SURF_T |C_BUF_T |BUF_S |RED_S |
115 |       1|  1|     gl| wipbpx|    rgba|    24|     8|

GREEN_S |BLUE_S |ALPHA_S |DEPTH_S |STENC_S |ACCUM_S |SPL_BUF |SAMPLES |
       8|      8|       0|      24|       8|      48|       0|       0|

SRGB |TEX_RGB |TEX_RGBA|CAVEAT |SWAP |M_PBUF_W|M_PBUF_H|M_PBUF_P
   -1|       0|       0|   slow|undef|       0|       0|       0

Another issue in Mesa now is that the same fbconfig ID can have different formats on different platforms. Might this also be a driver bug?
Comment 11 Chad Versace 2011-07-26 12:53:26 UTC
This bug also causes piglit/fbo-maxsize to occasionally segfault on my ILK machine.
Comment 12 Chad Versace 2011-07-26 12:56:55 UTC
See also
    Bug 38423 - i965/gen5: fbo-maxsize fails on master 
    https://bugs.freedesktop.org/show_bug.cgi?id=38423
Comment 13 Gordon Jin 2011-07-28 19:05:41 UTC
demote to unblock the release.
Comment 14 Dan McCabe 2011-07-29 13:26:40 UTC
The root cause of this bug appears to have been in the code base for quite a while.

The issue is that texture map allocation fails silently if too large a texture map or too many small texture maps are allocated. 

As I understand it (after discussions with Eric Anholt), in DRM (file: intel_bufmgr_gem.c proc: drm_intel_gem_bo_map_gtt()), the memory allocation for a texture map occurs in 2 stages: the first stage creates a mapped file and its virtual memory, and the second stage passes that mapped file to the kernel to be placed into the GTT (via drmIoctl(..., DRM_IOCTL_I915_GEM_SET_DOMAIN, ...)).

If the memory allocated by mmap() is fragmented (and the likelihood of that happening gets higher the larger the texture map is), DRM_IOCTL_I915_GEM_SET_DOMAIN will not be able to use that memory. However, no indication is returned if that memory is not usable due to fragmentation (as I understand the code, which is only partially).

In that event, on the first attempt to write to that memory (such as during the initialization of a texture map using an image supplied by the application), the system segfaults in texstore() (which is located in mesa/src/mesa/main/texstore.c).
Comment 15 Chris Wilson 2012-01-04 01:35:17 UTC
*** Bug 44436 has been marked as a duplicate of this bug. ***
Comment 16 Eric Anholt 2013-04-08 20:23:00 UTC
Fixed by the series ending with:

commit ca9a7d975af228cabb79c3040ec67f26f94f90a2
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Apr 2 17:28:41 2013 -0700

    intel: Avoid making tiled miptrees we won't be able to blit.
Comment 17 fangxun 2014-07-11 05:31:05 UTC
Verified it on latest mesa master and 10.2 branch.