Bug 73418

Summary: OpenCL hangs graphics on CAYMAN
Product: Mesa Reporter: Thomas Rohloff <v10lator>
Component: Drivers/Gallium/r600    Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: 10.0   
Hardware: Other   
OS: All   
Attachments: dmesg snippet from mesa 10.0.1
dmesg snippet from mesa 10.1
Possible Fix

Description Thomas Rohloff 2014-01-09 00:40:49 UTC
Created attachment 91722
dmesg snippet from mesa 10.0.1

When I try to run some OpenCL code from within X, the screen freezes, goes black, and comes back on while still frozen. This repeats a few times until it doesn't come back at all; all I'm left with is my monitor in standby and a keyboard that doesn't react to any input.

When I run some OpenCL code from a console (Ctrl + Alt + F1) and switch back to X while the OpenCL code is still running, the result is the same as above.

When I run some OpenCL code from a console and switch back to X after the OpenCL code has finished, I get a black screen with my mouse cursor (I can even move it) for a few seconds, then my desktop comes back. While the black screen (with mouse cursor) is shown, a GPU hang is also logged to dmesg.

Reproducible with bfgminer or the tests from http://cgit.freedesktop.org/~tstellar/opencl-example/ with Mesa 10.0.1 as well as 10.1 (git-646c16a).
Comment 1 Thomas Rohloff 2014-01-09 00:41:16 UTC
Created attachment 91723
dmesg snippet from mesa 10.1
Comment 2 Tom Stellard 2014-01-13 17:53:48 UTC
Do you run into the same issues if you run OpenCL programs while X isn't running?
Comment 3 Chris 2014-01-14 07:04:56 UTC
- If I run the OpenCL hello_world without starting the X server, it runs fine. No hangs, even if I run it multiple times, and X starts properly afterwards.

- If I run hello_world within an X session, it hangs every second try.

- If I start the X server and run it from a console, I get a black screen/hang when returning to the X session.

With ddd, I have tracked the issue down to the point where it hangs:
(gdb) bt
#0  radeon_drm_ws_queue_cs (ws=0x60b2e0, cs=0x7ffff7f6a010) at radeon_drm_winsys.c:555
#1  0x00007ffff293a7e8 in radeon_drm_cs_flush (rcs=0x7ffff7f6a010, flags=2, cs_trace_id=0) at radeon_drm_cs.c:567
#2  0x00007ffff2950f6d in r600_context_flush (ctx=0x6251d0, flags=2) at r600_hw_context.c:356
#3  0x00007ffff2951f3b in r600_flush (ctx=0x6251d0, flags=0) at r600_pipe.c:88
#4  0x00007ffff2952056 in r600_flush_gfx_ring (ctx=0x6251d0, flags=0) at r600_pipe.c:120
#5  0x00007ffff2952034 in r600_flush_from_st (ctx=0x6251d0, fence=0x7fffffffb390, flags=0) at r600_pipe.c:115
#6  0x00007ffff6c0718a in clover::command_queue::flush (this=0x642000) at core/queue.cpp:48
#7  0x00007ffff6c182f2 in clover::hard_event::wait (this=0xaca350) at core/event.cpp:124
#8  0x00007ffff6c60811 in clFinish (d_q=0x642008) at api/event.cpp:268
#9  0x000000000040136c in main (argc=1, argv=0x7fffffffdd58) at hello_world.c:193

If I execute line 563, pipe_semaphore_signal(&ws->cs_queued), the system hangs.

The call parameters are in a post on the mesa-dev mailing list.
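Some context on why execution stops at the semaphore signal: assuming the radeon winsys in this Mesa version hands each flushed command stream to a separate submission thread (which is what a `cs_queued` semaphore suggests), `pipe_semaphore_signal()` only wakes that thread, and the actual submission to the kernel, where the hang is presumably triggered, happens there. The following is a minimal pthreads sketch of that producer/consumer pattern, a counting semaphore built from a mutex and condition variable plus a worker thread. All names (`sketch_semaphore`, `sketch_cs_queue`, `queue_cs`, `submit_thread`, `run_demo`) are hypothetical and are not Mesa's actual API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counting semaphore: mutex + condition variable + counter. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int counter;
} sketch_semaphore;

static void sketch_semaphore_init(sketch_semaphore *s, int value)
{
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->counter = value;
}

/* "Signal": increment the counter and wake one waiting thread. */
static void sketch_semaphore_signal(sketch_semaphore *s)
{
    pthread_mutex_lock(&s->mutex);
    s->counter++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

/* "Wait": block until the counter is positive, then decrement it. */
static void sketch_semaphore_wait(sketch_semaphore *s)
{
    pthread_mutex_lock(&s->mutex);
    while (s->counter <= 0)
        pthread_cond_wait(&s->cond, &s->mutex);
    s->counter--;
    pthread_mutex_unlock(&s->mutex);
}

/* Hypothetical submission queue: the flushing thread enqueues a command
 * stream id and signals; a worker thread waits and "submits" it. */
#define QUEUE_CAP 8
typedef struct {
    sketch_semaphore queued;
    pthread_mutex_t lock;
    int items[QUEUE_CAP];
    int head, tail;
    int submitted[QUEUE_CAP];
    int num_submitted;
} sketch_cs_queue;

static void *submit_thread(void *arg)
{
    sketch_cs_queue *q = arg;
    for (;;) {
        sketch_semaphore_wait(&q->queued);   /* sleeps until queue_cs() signals */
        pthread_mutex_lock(&q->lock);
        int cs = q->items[q->head++ % QUEUE_CAP];
        pthread_mutex_unlock(&q->lock);
        if (cs < 0)                          /* negative id: shutdown request */
            return NULL;
        /* A real driver would hand the command stream to the kernel here;
         * this sketch only records that it was "submitted". */
        pthread_mutex_lock(&q->lock);
        q->submitted[q->num_submitted++] = cs;
        pthread_mutex_unlock(&q->lock);
    }
}

static void queue_cs(sketch_cs_queue *q, int cs)
{
    pthread_mutex_lock(&q->lock);
    q->items[q->tail++ % QUEUE_CAP] = cs;
    pthread_mutex_unlock(&q->lock);
    sketch_semaphore_signal(&q->queued);     /* wakes the submission thread */
}

/* Queue n command streams (n < QUEUE_CAP so nothing is overwritten),
 * shut the worker down, and return how many were submitted. */
static int run_demo(int n)
{
    assert(n >= 0 && n < QUEUE_CAP);
    sketch_cs_queue q = {0};
    pthread_t worker;
    sketch_semaphore_init(&q.queued, 0);
    pthread_mutex_init(&q.lock, NULL);
    pthread_create(&worker, NULL, submit_thread, &q);
    for (int i = 0; i < n; i++)
        queue_cs(&q, i);
    queue_cs(&q, -1);
    pthread_join(worker, NULL);
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.queued.cond);
    pthread_mutex_destroy(&q.queued.mutex);
    return q.num_submitted;
}
```

In a model like this, stepping over the signal call in a debugger is the moment the other thread starts the submission, which would explain why the hang appears to coincide with that one line even though the semaphore itself does nothing GPU-specific.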
Comment 4 Tom Stellard 2014-01-14 15:45:00 UTC
Thanks for the information, this is a well-known bug on Cayman. I will try to investigate further.
Comment 5 Tom Stellard 2014-01-24 20:58:29 UTC
Created attachment 92744
Possible Fix

Does this patch help?
Comment 6 Thomas Rohloff 2014-01-24 23:13:51 UTC
(In reply to comment #5)
> Does this patch help?

Yes, this seems to do the trick. But as it's a work-around I'm unsure about closing this as fixed.
Comment 7 Alex Deucher 2014-01-24 23:15:55 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Does this patch help?
> 
> Yes, this seems to do the trick. But as it's a work-around I'm unsure about
> closing this as fixed.

I don't think it's a workaround, I think it's actually the way the hw is supposed to be programmed for compute.
Comment 8 Thomas Rohloff 2014-01-25 11:19:00 UTC
(In reply to comment #7)
> I don't think it's a workaround, I think it's actually the way the hw is
> supposed to be programmed for compute.

Oh, it was a bit late yesterday. I thought the patch added the lines containing "Work-around for flushing problems with compute shaders on Cayman", but it removes them. My fault, sorry.
Comment 9 Tom Stellard 2014-01-27 13:26:20 UTC
Let's leave this bug open until the fix is committed upstream.
Comment 10 Tom Stellard 2014-01-27 16:25:46 UTC
Fix committed as d51dbe048afd2131eb3675e9cd868ce73325a61d
Comment 11 chris 2014-02-03 06:34:59 UTC
Worked for me too with the dev branch.
Thanks!
