From the ARB_compute_shader spec:

    The command

        void DispatchComputeIndirect(intptr indirect);

    is equivalent (assuming no errors are generated) to calling DispatchCompute with <num_groups_x>, <num_groups_y> and <num_groups_z> initialized with the three uint values contained in the buffer currently bound to the DISPATCH_INDIRECT_BUFFER binding at an offset, in basic machine units, specified by <indirect>. ... If any of <num_groups_x>, <num_groups_y> or <num_groups_z> is greater than MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding dimension then the results are undefined.

And earlier there:

    "If the work group count in any dimension is zero, no work groups are dispatched."

I suspect that GLES 3.1 says something similar, as there is a dEQP test that exercises this:

    dEQP-GLES31.functional.compute.indirect_dispatch.upload_buffer.empty_command

Running it on a HSW with kernel 4.4.1 results in a GPU hang:

[17904.184491] [drm] stuck on render ring
[17904.185787] [drm] GPU HANG: ecode 7:0:0x8fd0ffff, in deqp-gles31 [17164], reason: Ring hung, action: reset
[17904.185861] ------------[ cut here ]------------
[17904.185880] WARNING: CPU: 2 PID: 15876 at drivers/gpu/drm/i915/intel_display.c:11289 intel_mmio_flip_work_func+0x37d/0x3c0()
[17904.185883] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
[17904.185886] Modules linked in:
[17904.185890] cfg80211 bcma x86_pkg_temp_thermal
[17904.185900] CPU: 2 PID: 15876 Comm: kworker/2:0 Tainted: G W 4.4.1 #6
[17904.185903] Hardware name: Dell Inc. XPS 8700/0KWVT8, BIOS A08 04/16/2014
[17904.185910] Workqueue: events intel_mmio_flip_work_func
[17904.185914] ffffffff81c1b398 ffff8801cb457d30 ffffffff813d684f ffff8801cb457d78
[17904.185920] ffff8801cb457d68 ffffffff810fd966 ffff8801fb7bfa80 ffff8801ac5fa240
[17904.185925] ffff88022fa94d00 0000000000000080 ffff88022fa99400 ffff8801cb457dc8
[17904.185931] Call Trace:
[17904.185942] [<ffffffff813d684f>] dump_stack+0x44/0x55
[17904.185951] [<ffffffff810fd966>] warn_slowpath_common+0x86/0xc0
[17904.185957] [<ffffffff810fd9ec>] warn_slowpath_fmt+0x4c/0x50
[17904.185963] [<ffffffff81566d1d>] intel_mmio_flip_work_func+0x37d/0x3c0
[17904.185973] [<ffffffff81113dcc>] process_one_work+0x14c/0x3d0
[17904.185979] [<ffffffff8111436b>] worker_thread+0x4b/0x440
[17904.185986] [<ffffffff81114320>] ? rescuer_thread+0x2d0/0x2d0
[17904.185991] [<ffffffff81118f09>] kthread+0xc9/0xe0
[17904.185996] [<ffffffff81118e40>] ? kthread_park+0x60/0x60
[17904.186002] [<ffffffff818df89f>] ret_from_fork+0x3f/0x70
[17904.186006] [<ffffffff81118e40>] ? kthread_park+0x60/0x60
[17904.186010] ---[ end trace 1c4e38c670f10909 ]---
[17904.187422] drm/i915: Resetting chip after gpu hang

Please let me know if any additional information is required.
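For reference, the spec wording quoted in this report boils down to a simple rule: read three uints from the indirect buffer at the given byte offset, and dispatch nothing if any of them is zero. A minimal CPU-side sketch of that rule follows. It is only an illustration and not driver code; the struct and the helper names (dispatch_args, read_dispatch_args, dispatch_is_empty) are hypothetical, and the bounds check is an assumption added so the sketch is safe to run.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the three uints consumed by DispatchComputeIndirect. */
struct dispatch_args {
    uint32_t num_groups[3]; /* x, y, z work group counts */
};

/* Copy the three group counts from `buf` at byte offset `indirect`.
 * Returns false if the read would run past the end of the buffer
 * (an assumption of this sketch, not a statement about GL errors). */
static bool read_dispatch_args(const void *buf, size_t buf_size,
                               size_t indirect, struct dispatch_args *out)
{
    if (indirect > buf_size ||
        buf_size - indirect < sizeof(out->num_groups))
        return false;
    memcpy(out->num_groups, (const uint8_t *)buf + indirect,
           sizeof(out->num_groups));
    return true;
}

/* True when any dimension is zero, i.e. no work groups are dispatched. */
static bool dispatch_is_empty(const struct dispatch_args *args)
{
    return args->num_groups[0] == 0 ||
           args->num_groups[1] == 0 ||
           args->num_groups[2] == 0;
}
```

The empty_command test corresponds to the case where dispatch_is_empty() would return true; on the GPU the equivalent check has to happen without a CPU readback of the buffer.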
Created attachment 121729 [details]
dirty patch to conditionally terminate batch on hsw

This very, very dirty patch looks like it does the trick. 0x36 is MI_CONDITIONAL_BATCH_BUFFER_END. This will need a helper to deal with RELOC64 on gen8+, of course. Or perhaps gen8 can deal with 0-sized workgroups directly.
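To make the 0x36 value a bit less magic: a small sketch of how that opcode would sit in the first dword of an MI command, assuming the usual layout where the MI opcode occupies bits 28:23 of the header. The macro and function names here are hypothetical, and the sketch deliberately leaves out the remaining fields and dwords of the command (compare value, address, and the relocation mentioned above).

```c
#include <stdint.h>

/* Assumption: MI command opcodes live in bits 28:23 of the header dword. */
#define MI_OPCODE_SHIFT 23
#define MI_OPCODE_MASK  0x3fu

/* Opcode value taken from the comment above. */
#define MI_OPCODE_CONDITIONAL_BATCH_BUFFER_END 0x36u

/* Build the header dword for an MI command with only the opcode set. */
static uint32_t mi_command_header(uint32_t opcode)
{
    return (opcode & MI_OPCODE_MASK) << MI_OPCODE_SHIFT;
}

/* Recover the opcode from a header dword, e.g. when reading a batch dump. */
static uint32_t mi_command_opcode(uint32_t header)
{
    return (header >> MI_OPCODE_SHIFT) & MI_OPCODE_MASK;
}
```

With that layout, mi_command_header(MI_OPCODE_CONDITIONAL_BATCH_BUFFER_END) gives 0x1b000000, which is the pattern to look for in a decoded batch.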
commit 9a939ebb47a0d37a6b29e3dbb1b20bdc9538a721

    i965/gen7: Use predicated rendering for indirect compute