Bug 107892

Summary: [Debug mesa only]. crash happens when blit framebuffer
Product: Mesa Reporter: xinghua <xinghua.cao>
Component: Drivers/DRI/i965 Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: vadym.shovkoplias, yang.gu, yunchao.he
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Patch with the workaround
attachment-4807-0.html

Description xinghua 2018-09-11 02:55:20 UTC
Steps:
1. Download chrome, and open the link "https://www.khronos.org/registry/webgl/sdk/tests/conformance2/rendering/blitframebuffer-size-overflow.html?webglVersion=2&quiet=0"
2. Crash happens.

Notes:
I am not sure whether this is a mesa bug or a bug in the test case, but mesa throws the ASSERT. If it is a test case bug, please give some information about why it crashes, thank you.
Comment 1 Tapani Pälli 2018-09-11 05:55:54 UTC
Which device are you running this on? For me on HSW machine and Chrome 69.0.3497.81 (Official Build) (64-bit) I get TEST_COMPLETE (passes)
Comment 2 xinghua 2018-09-11 06:03:00 UTC
(In reply to Tapani Pälli from comment #1)
> Which device are you running this on? For me on HSW machine and Chrome
> 69.0.3497.81 (Official Build) (64-bit) I get TEST_COMPLETE (passes)

My machine is coffee lake and Ubuntu 18.04. I could also reproduce this issue by using firefox and ubuntu system default graphics driver.
Comment 3 Tapani Pälli 2018-09-11 07:45:57 UTC
Please attach a backtrace where the assert happens.
Comment 4 xinghua 2018-09-11 08:06:21 UTC
(In reply to Tapani Pälli from comment #3)
> Please attach a backtrace where the assert happens.

The backtrace is below, but the mesa source frames cannot be resolved in the stack.

chrome --single-process: ../../../../../src/intel/genxml/gen9_pack.h:72: __gen_uint: Assertion `v <= max' failed.

Thread 24 "Chrome_InProcGp" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffaee01700 (LWP 3948)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) c
Continuing.

Thread 24 "Chrome_InProcGp" received signal SIGTRAP, Trace/breakpoint trap.
base::debug::(anonymous namespace)::DebugBreak () at ../../base/debug/debugger_posix.cc:240
240	}
(gdb) c
Continuing.
Received signal 6
#0 0x7ffff7ce9bcd base::debug::StackTrace::StackTrace()
#1 0x7ffff79f59ac base::debug::StackTrace::StackTrace()
#2 0x7ffff7ce9624 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fffd1743890 <unknown>
#4 0x7fffcd410e97 gsignal
#5 0x7fffcd412801 abort
#6 0x7fffcd40239a <unknown>
#7 0x7fffcd402412 __assert_fail
#8 0x7fffa9a66f2c <unknown>
#9 0x7fffaa11b9e8 <unknown>
#10 0x7fffaa0d1cb0 <unknown>
#11 0x7fffaa0d4b82 <unknown>
#12 0x7fffa9e707e7 <unknown>
#13 0x7fffa9e70ac2 <unknown>
#14 0x7fffa9e71737 <unknown>
#15 0x7fffa9ea81e7 <unknown>
#16 0x7fffa9a88616 <unknown>
#17 0x7fffe585eeaa gl::GLApiBase::glBlitFramebufferFn()
#18 0x7fffe110a727 gpu::gles2::GLES2DecoderImpl::DoBlitFramebufferCHROMIUM()
#19 0x7fffe10cb434 gpu::gles2::GLES2DecoderImpl::HandleBlitFramebufferCHROMIUM()
#20 0x7fffe113b88c gpu::gles2::GLES2DecoderImpl::DoCommandsImpl<>()
#21 0x7fffe10fb695 gpu::gles2::GLES2DecoderImpl::DoCommands()
#22 0x7fffedb398dd gpu::CommandBufferService::Flush()
#23 0x7fffcbfdbfad gpu::CommandBufferStub::OnAsyncFlush()
#24 0x7fffcbfeaba4 _ZN4base20DispatchToMethodImplIPN3gpu17CommandBufferStubEMS2_FvijENSt3__15tupleIJijEEEJLm0ELm1EEEEvRKT_T0_OT1_NS6_16integer_sequenceImJXspT2_EEEE
#25 0x7fffcbfeaad8 _ZN4base16DispatchToMethodIPN3gpu17CommandBufferStubEMS2_FvijENSt3__15tupleIJijEEEEEvRKT_T0_OT1_
#26 0x7fffcbfeaa64 _ZN3IPC16DispatchToMethodIN3gpu17CommandBufferStubEMS2_FvijEvNSt3__15tupleIJijEEEEEvPT_T0_PT1_OT2_
#27 0x7fffcbfe4b73 _ZN3IPC8MessageTI35GpuCommandBufferMsg_AsyncFlush_MetaNSt3__15tupleIJijEEEvE8DispatchIN3gpu17CommandBufferStubES8_vMS8_FvijEEEbPKNS_7MessageEPT_PT0_PT1_T2_
#28 0x7fffcbfd9c4d gpu::CommandBufferStub::OnMessageReceived()
#29 0x7ffff444b998 IPC::MessageRouter::RouteMessage()
#30 0x7fffcbffe220 gpu::GpuChannel::HandleMessageHelper()
#31 0x7fffcbff9d2b gpu::GpuChannel::HandleMessage()
#32 0x7fffcbfeedef _ZN4base8internal13FunctorTraitsIMN3gpu17CommandBufferStubEFvRKNS2_9SyncTokenEEvE6InvokeIS8_RKNS_7WeakPtrIS3_EEJS6_EEEvT_OT0_DpOT1_
#33 0x7fffcbfeed55 _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMN3gpu17CommandBufferStubEFvRKNS4_9SyncTokenEERKNS_7WeakPtrIS5_EEJS8_EEEvOT_OT0_DpOT1_
#34 0x7fffcbfeeccd _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu17CommandBufferStubEFvRKNS3_9SyncTokenEEJNS_7WeakPtrIS4_EES5_EEEFvvEE7RunImplIRKS9_RKNSt3__15tupleIJSB_S5_EEEJLm0ELm1EEEEvOT_OT0_NSI_16integer_sequenceImJXspT1_EEEE
#35 0x7fffcc0087e9 _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu10GpuChannelEFvRKN3IPC7MessageEEJNS_7WeakPtrIS4_EES6_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#36 0x7fffedb37e9e _ZNO4base12OnceCallbackIFvvEE3RunEv
#37 0x7fffedb4cfcc gpu::Scheduler::RunNextTask()
#38 0x7fffedb5dc6f _ZN4base8internal13FunctorTraitsIMN3gpu9SchedulerEFvvEvE6InvokeIS5_RKNS_7WeakPtrIS3_EEJEEEvT_OT0_DpOT1_
#39 0x7fffedb5dbea _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMN3gpu9SchedulerEFvvERKNS_7WeakPtrIS5_EEJEEEvOT_OT0_DpOT1_
#40 0x7fffedb5db80 _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu9SchedulerEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE
#41 0x7fffedb5dabc _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu9SchedulerEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#42 0x7ffff79a43ee _ZNO4base12OnceCallbackIFvvEE3RunEv
#43 0x7ffff79f6e72 base::debug::TaskAnnotator::RunTask()
#44 0x7ffff7a88d26 base::MessageLoop::RunTask()
#45 0x7ffff7a890ae base::MessageLoop::DeferOrRunPendingTask()
#46 0x7ffff7a89539 base::MessageLoop::DoWork()
#47 0x7ffff7a8fd07 base::MessagePumpDefault::Run()
#48 0x7ffff7a8841b base::MessageLoop::Run()
#49 0x7ffff7b3092d base::RunLoop::Run()
#50 0x7ffff7c24428 base::Thread::Run()
#51 0x7ffff7c250f0 base::Thread::ThreadMain()
#52 0x7ffff7d1f2fd base::(anonymous namespace)::ThreadFunc()
#53 0x7fffd17386db start_thread
#54 0x7fffcd4f388f clone
  r8: 0000000000000000  r9: 00007fffaedf9e70 r10: 0000000000000008 r11: 0000000000000246
 r12: 00007fffaa284490 r13: 00007fffaa258cc0 r14: 0000000000000048 r15: 00007fffaedfa8c0
  di: 0000000000000002  si: 00007fffaedf9e70  bp: 00007fffcd5897d8  bx: 0000000000000000
  dx: 0000000000000000  ax: 0000000000000000  cx: 00007fffcd410e97  sp: 00007fffaedf9e70
  ip: 00007fffcd410e97 efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
 trp: 0000000000000001 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Comment 5 Denis 2018-09-11 11:19:53 UTC
Hi, looks like that's not an ordinary issue. I couldn't reproduce the crash on this configuration:


Graphics: Card: Intel Device 3e91 bus-ID: 00:02.0
Display Server: x11 (X.Org 1.19.6 ) driver: i915
Resolution: 1920x1080@60.00hz
OpenGL: renderer: Mesa DRI Intel UHD Graphics 630 (Coffeelake 3x8 GT2)
version: 4.5 Mesa 18.3.0-devel (git-3d08631fe5) Direct Render: Yes
CPU~Quad core Intel Core i3-8100 (-MCP-) speed/max~1118/3600 MHz Kernel~4.15.0-34-generic x86_64


The only thing I see is this message in the terminal:

[14531:14531:0911/141344.539273:ERROR:gles2_cmd_decoder.cc(8641)] [.WebGL-0x2ad78b82ee00]GL ERROR :GL_INVALID_VALUE : glBlitFramebufferCHROMIUM: the width or height of src or dst region overflowed
Comment 6 asimiklit 2018-09-11 13:52:17 UTC
Hi,

Looks like a duplicate of:
https://bugs.freedesktop.org/show_bug.cgi?id=103241

In both these bugs the programs crash on the same assertion: gen9_pack.h:72

Regards,
Andrii.
Comment 7 Jason Ekstrand 2018-09-11 15:59:11 UTC
(In reply to asimiklit from comment #6) 
> Looks like as a duplicate for:
> https://bugs.freedesktop.org/show_bug.cgi?id=103241

Those two bugs are completely unrelated.  That assert is something of a catch-all that just means you overflowed a field in a hardware packet somewhere.
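
To illustrate why such an assert is a catch-all, here is a minimal sketch of a bit-field packer that checks its input against the field width. The names fits_in_field and pack_uint are hypothetical and only modeled loosely on the `v <= max' check reported from gen9_pack.h; this is not the mesa source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when v fits in the bit range [start, end]. */
static bool fits_in_field(uint64_t v, uint32_t start, uint32_t end)
{
    const uint32_t width = end - start + 1;

    /* A 64-bit wide field can hold any uint64_t value. */
    if (width >= 64)
        return true;

    const uint64_t max = (UINT64_C(1) << width) - 1;
    return v <= max;
}

/* Hypothetical packer: shifts v into place and asserts in debug builds
 * that it does not overflow the field. Any caller that passes an
 * oversized value trips the same assert, whatever the root cause. */
static uint64_t pack_uint(uint64_t v, uint32_t start, uint32_t end)
{
    assert(fits_in_field(v, start, end));
    return v << start;
}
```

Because every packed field goes through the same check, two crashes on this assert only share the symptom (some value was too large for its field), not necessarily the cause.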
Comment 8 Mark Janes 2018-09-11 16:22:07 UTC
Can we make a simple reproducing test case in crucible for this?
Comment 9 Mark Janes 2018-09-11 16:23:54 UTC
oops -- we need a piglit test, not crucible.  i was confused by the reference to 103241
Comment 10 Mark Janes 2018-09-22 19:54:43 UTC
Denis: can you try to repro with the 18.04 configuration reported in the description?
Comment 11 Denis 2018-09-24 10:24:58 UTC
Mark, my fault, I forgot to mention OS. That is exactly Ubuntu 18.04

System:    Host: ubuntu-MS-7B49 Kernel: 4.15.0-34-generic x86_64 bits: 64
           Desktop: Gnome 3.28.3 Distro: Ubuntu 18.04.1 LTS

Btw, after that I updated xserver to 1.20, and still - can't reproduce the issue

Display Server: x11 (X.Org 1.20.1 ).


The only difference I see between Chrome and FF is that in FF the test has several failures, but nothing crashes.



PASS WebGL context exists

Begin to run blitFramebuffer. The computed width/height of src and/or dst region might overflow during blitting.
PASS getError was expected value: NO_ERROR : Using max 32-bit integer as blitFramebuffer parameter should succeed.
PASS getError was expected value: NO_ERROR : Using blitFramebuffer parameters where calculated width/height matches max 32-bit integer should succeed.
FAIL getError expected: INVALID_VALUE. Was NO_ERROR : Using source width/height greater than max 32-bit integer should fail.
FAIL getError expected: INVALID_VALUE. Was NO_ERROR : Using source width/height greater than max 32-bit integer should fail.
FAIL getError expected: INVALID_VALUE. Was NO_ERROR : Using destination width/height greater than max 32-bit integer should fail.
FAIL getError expected: INVALID_VALUE. Was NO_ERROR : Using destination width/height greater than max 32-bit integer should fail.
FAIL getError expected: INVALID_VALUE. Was NO_ERROR : Using both source and destination width/height greater than max 32-bit integer should fail.
FAIL getError expected: INVALID_VALUE. Was NO_ERROR : Using minimum and maximum integers for all boundaries should fail.
PASS successfullyParsed is true

TEST COMPLETE
Comment 12 Denis 2018-09-24 11:14:21 UTC
Oh, reproduced. Even with mesa 18.0.0 (trying to find version without the issue)

But only with a debug build. Every time I load the test, chrome generates an additional crash report:

google-chrome
/chrome/chrome --type=gpu-process --field-trial-handle=8950006329711317741,10367114233502893736,131072 --enable-crash-reporter=af797d65-acae-45fa-b15d-77ebb7842225, --gpu-preferences=KAAAAAAAAACAAABAAQAAAAAAAAAAAGAAAAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAA --enable-crash-reporter=af797d65-acae-45fa-b15d-77ebb7842225, --service-request-channel-token=11429883429524802042: ../../../../../src/intel/genxml/gen9_pack.h:66: __gen_uint: Assertion `v <= max' failed.
--2018-09-24 14:10:59--  https://clients2.google.com/cr/report
Resolving clients2.google.com (clients2.google.com)... 216.58.205.238, 2a00:1450:4001:820::200e
Connecting to clients2.google.com (clients2.google.com)|216.58.205.238|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘/dev/fd/4’

Crash dump id: 5abd635e59165a3f



Pretty sure it is exactly that one, caught by Xinghua
Comment 13 Denis 2018-09-24 11:36:54 UTC
Reproduced this issue on debug mesa versions:
18.3.0-git
18.2.1 (tag)
18.0.0 (tag)
17.3.6 (tag)

I also found some interesting behavior. Even on the latest mesa, after running the test several times (3-5), it passes and the crashes disappear. I thought it might be some cached data, but the cache is empty for the page.

The same behavior occurs with the test from this bug: https://bugs.freedesktop.org/show_bug.cgi?id=107987
Comment 14 vadym 2018-09-28 13:58:29 UTC
(In reply to Mark Janes from comment #9)
> oops -- we need a piglit test, not crucible.  i was confused by the
> reference to 103241

Hi Mark,

This issue happens during following gl call:

glBlitFramebufferEXT(0, 0, INT_MAX, INT_MAX,
	             0, 0, 160, 160,
		     GL_COLOR_BUFFER_BIT, GL_NEAREST);

I already added a new piglit test which tests some limits of the BlitFramebuffer function, and I think it can be extended to test this case. Link to the test: https://patchwork.freedesktop.org/patch/253442/
Comment 15 vadym 2018-09-28 13:58:42 UTC
The bug is in the brw_meta_mirror_clip_and_scissor() function. For the above case it calculates the coordinates for the destination buffer as:

   dstX0: 0.000000, dstY0: 159.999988, dstX1: 0.000012, dstY1: 160.000000

Then, in the try_blorp_blit() function, these coordinates are rounded to the nearest integer:

   params->x0 = params->wm_inputs.discard_rect.x0 = round(coords->x.dst0);
   params->y0 = params->wm_inputs.discard_rect.y0 = round(coords->y.dst0);
   params->x1 = params->wm_inputs.discard_rect.x1 = round(coords->x.dst1);
   params->y1 = params->wm_inputs.discard_rect.y1 = round(coords->y.dst1);

The result of rounding is:

   params->x0 = 0.0
   params->y0 = 160.0
   params->x1 = 0.0
   params->y1 = 160.0

Then these coordinates go to gen9_blorp_exec():

   blorp_emit(batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) {
      rect.ClippedDrawingRectangleXMax = MAX2(params->x1, params->x0) - 1;
      rect.ClippedDrawingRectangleYMax = MAX2(params->y1, params->y0) - 1;
   }

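Here MAX2(params->x1, params->x0) equals 0, and 0 - 1 wraps around to UINT_MAX for unsigned integers. That value is too large for the ClippedDrawingRectangleXMax field, which is what trips the `v <= max' assertion in __gen_uint.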
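
The arithmetic above can be reproduced in isolation. This is only a sketch: drawing_rect_max is a hypothetical helper, the coordinates are assumed to be stored as uint32_t, and a round-half-up cast stands in for round() (valid for non-negative inputs only) so that the snippet does not need libm.

```c
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: rounds two non-negative destination coordinates,
 * stores them as unsigned integers, and computes "max - 1" the same way
 * the quoted 3DSTATE_DRAWING_RECTANGLE setup does. */
static uint32_t drawing_rect_max(float dst0, float dst1)
{
    /* Round half up; stands in for round() on non-negative values. */
    const uint32_t p0 = (uint32_t)(dst0 + 0.5f);
    const uint32_t p1 = (uint32_t)(dst1 + 0.5f);

    /* Unsigned subtraction: when both coordinates round to 0,
     * this wraps around instead of producing -1. */
    return MAX2(p1, p0) - 1;
}
```

With the X coordinates from this comment (0.000000 and 0.000012) both values round to 0 and the result wraps to UINT32_MAX, while an ordinary pair such as 0 and 8 gives 7.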
Comment 16 vadym 2018-09-28 14:00:33 UTC
Created attachment 141776 [details] [review]
Patch with the workaround
Comment 17 vadym 2018-10-01 12:37:53 UTC
Didn't notice that Jason already sent a fix for this bug https://patchwork.freedesktop.org/patch/248561/

But with this fix applied Firefox still crashes on the following call:

glBlitFramebuffer(srcX0 = -1, srcY0 = -1, srcX1 = 2147483646, srcY1 = 2147483646, dstX0 = 0, dstY0 = 0, dstX1 = 8, dstY1 = 8, mask = GL_COLOR_BUFFER_BIT, filter = GL_NEAREST)
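
For reference, the source extent in this call is 2147483646 - (-1) = 2147483647, which is exactly the maximum 32-bit signed integer, so per the WebGL test output quoted earlier this call is expected to succeed rather than return INVALID_VALUE. A minimal sketch of such an extent check, using a hypothetical helper extent_fits_int32 that does the subtraction in 64 bits to avoid signed overflow:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when |hi - lo| is representable as a
 * 32-bit signed integer. The subtraction is done in 64 bits because
 * hi - lo can overflow int32_t. */
static bool extent_fits_int32(int32_t lo, int32_t hi)
{
    int64_t extent = (int64_t)hi - (int64_t)lo;

    if (extent < 0)
        extent = -extent;

    return extent <= INT32_MAX;
}
```

Under this check the coordinates above pass, while moving srcX1 to 2147483647 would make the extent one larger than INT32_MAX and fail.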
Comment 18 Yang Gu 2018-10-01 12:38:11 UTC
Created attachment 141822 [details]
attachment-4807-0.html

Yang is OOO from Oct 1 to 7 for National Day holiday. Please expect slow response.
Comment 19 vadym 2018-10-29 16:32:15 UTC
Hi Jason,

Looks like there is an issue with the float comparison. It works perfectly fine for me if it is compared with some precision:

if ((fabsf(*dstX1 - *dstX0) < 1e-8F) || (fabsf(*dstY1 - *dstY0) < 1e-8F)) {
      return true;
}
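
As a standalone illustration of the tolerance-based comparison, here is a small sketch. The helper names abs_diff and rect_is_degenerate are hypothetical, and the tolerance is passed in as a parameter rather than fixed; the 1e-8F value above is the one that worked in my testing, not a derived bound.

```c
#include <stdbool.h>

/* Absolute difference without math.h, so the sketch has no libm dependency. */
static float abs_diff(float a, float b)
{
    return a > b ? a - b : b - a;
}

/* Hypothetical helper: true when the rectangle is narrower or shorter
 * than eps in either dimension, i.e. there is effectively nothing to blit. */
static bool rect_is_degenerate(float x0, float y0, float x1, float y1,
                               float eps)
{
    return abs_diff(x1, x0) < eps || abs_diff(y1, y0) < eps;
}
```

A call such as rect_is_degenerate(0.0f, 0.0f, 0.0f, 8.0f, 1e-8F) reports a degenerate rectangle, while a regular 8x8 rectangle does not. Which tolerance is appropriate depends on how the coordinates are rounded later, so the value should be treated as a tunable assumption.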
Comment 20 Denis 2019-04-18 11:30:21 UTC
hi guys. As I see, this issue was fixed by this commit:

commit 72a921e12ac1828998d2a32966e1dd0123eabfdf
Author: Sergii Romantsov <sergii.romantsov@globallogic.com>
Date:   Thu Feb 28 13:35:54 2019 +0200

    i965,iris/blorp: do not blit 0-sizes
    
    Seems there is no sense in blitting 0-sized sources
    or destinations.
    Additionaly it may cause segfaults for i965.
    
    v2: Function call replaced with inline check
    
    v3: Added check to avoid devision by zero (L. Landwerlin)
    
    v4: Added simillar check for Iris (L. Landwerlin)
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110239
    Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>


I tested it: the test passes now and does not crash Chromium (the previous commit still crashed). So closing as fixed.
