Bug 94100 - [HSW] compute indirect dispatch with 0 work groups causes gpu hang
Summary: [HSW] compute indirect dispatch with 0 work groups causes gpu hang
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-11 19:47 UTC by Ilia Mirkin
Modified: 2016-02-17 18:13 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
dirty patch to conditionally terminate batch on hsw (1.26 KB, text/plain)
2016-02-13 01:51 UTC, Ilia Mirkin
Details

Description Ilia Mirkin 2016-02-11 19:47:59 UTC
From the ARB_compute_shader spec:

        The command

        void DispatchComputeIndirect(intptr indirect);

    is equivalent (assuming no errors are generated) to calling
    DispatchCompute with <num_groups_x>, <num_groups_y> and <num_groups_z>
    initialized with the three uint values contained in the buffer currently
    bound to the DISPATCH_INDIRECT_BUFFER binding at an offset, in basic
    machine units, specified by <indirect>.  
    ...
    If any of <num_groups_x>, <num_groups_y> or <num_groups_z>
    is greater than MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding
    dimension then the results are undefined.

And earlier there:

"If the work group count in any dimension is zero, no work groups are dispatched."

I suspect that GLES 3.1 says something similar, as there is a dEQP test that exercises this:

dEQP-GLES31.functional.compute.indirect_dispatch.upload_buffer.empty_command

Running it on a HSW with kernel 4.4.1 results in a GPU hang:

[17904.184491] [drm] stuck on render ring
[17904.185787] [drm] GPU HANG: ecode 7:0:0x8fd0ffff, in deqp-gles31 [17164], reason: Ring hung, action: reset
[17904.185861] ------------[ cut here ]------------
[17904.185880] WARNING: CPU: 2 PID: 15876 at drivers/gpu/drm/i915/intel_display.c:11289 intel_mmio_flip_work_func+0x37d/0x3c0()
[17904.185883] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
[17904.185886] Modules linked in:
[17904.185890]  cfg80211 bcma x86_pkg_temp_thermal
[17904.185900] CPU: 2 PID: 15876 Comm: kworker/2:0 Tainted: G        W       4.4.1 #6
[17904.185903] Hardware name: Dell Inc. XPS 8700/0KWVT8, BIOS A08 04/16/2014
[17904.185910] Workqueue: events intel_mmio_flip_work_func
[17904.185914]  ffffffff81c1b398 ffff8801cb457d30 ffffffff813d684f ffff8801cb457d78
[17904.185920]  ffff8801cb457d68 ffffffff810fd966 ffff8801fb7bfa80 ffff8801ac5fa240
[17904.185925]  ffff88022fa94d00 0000000000000080 ffff88022fa99400 ffff8801cb457dc8
[17904.185931] Call Trace:
[17904.185942]  [<ffffffff813d684f>] dump_stack+0x44/0x55
[17904.185951]  [<ffffffff810fd966>] warn_slowpath_common+0x86/0xc0
[17904.185957]  [<ffffffff810fd9ec>] warn_slowpath_fmt+0x4c/0x50
[17904.185963]  [<ffffffff81566d1d>] intel_mmio_flip_work_func+0x37d/0x3c0
[17904.185973]  [<ffffffff81113dcc>] process_one_work+0x14c/0x3d0
[17904.185979]  [<ffffffff8111436b>] worker_thread+0x4b/0x440
[17904.185986]  [<ffffffff81114320>] ? rescuer_thread+0x2d0/0x2d0
[17904.185991]  [<ffffffff81118f09>] kthread+0xc9/0xe0
[17904.185996]  [<ffffffff81118e40>] ? kthread_park+0x60/0x60
[17904.186002]  [<ffffffff818df89f>] ret_from_fork+0x3f/0x70
[17904.186006]  [<ffffffff81118e40>] ? kthread_park+0x60/0x60
[17904.186010] ---[ end trace 1c4e38c670f10909 ]---
[17904.187422] drm/i915: Resetting chip after gpu hang

Please let me know if any additional information is required.
Comment 1 Ilia Mirkin 2016-02-13 01:51:19 UTC
Created attachment 121729 [details]
dirty patch to conditionally terminate batch on hsw

This very very dirty patch looks like does the trick. 0x36 is MI_CONDITIONAL_BATCH_BUFFER_END. This will need a helper to deal with RELOC64 on gen8+ of course. Or perhaps gen8 can deal with 0-sized workgroups directly.
Comment 2 Jordan Justen 2016-02-17 18:13:25 UTC
commit 9a939ebb47a0d37a6b29e3dbb1b20bdc9538a721

    i965/gen7: Use predicated rendering for indirect compute


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.