Created attachment 91722
dmesg snippet from mesa 10.0.1
When I try to run some OpenCL code from within X, the screen freezes, goes black, and comes back on while still frozen. This repeats a few times until it no longer comes back, and all I have left is my monitor in standby and a keyboard that doesn't react to any input.
When I run some OpenCL code from a console (Ctrl + Alt + F1) and switch back to X while the OpenCL code is still running, the result is the same as above.
When I run some OpenCL code from a console and switch back to X after the OpenCL code has finished, I get a black screen with my mouse cursor (I'm even able to move it) for a few seconds, then my desktop is back. While the black screen (with mouse cursor) is shown, a GPU hang is also logged to dmesg.
Reproducible with bfgminer or the tests from http://cgit.freedesktop.org/~tstellar/opencl-example/ with mesa 10.0.1 as well as 10.1 (git-646c16a).
Created attachment 91723
dmesg snippet from mesa 10.1
Do you run into the same issues if you run OpenCL programs while X isn't running?
- If I run the OpenCL hello_world without starting the X server, it runs fine. No hangs, even if I run it multiple times. X starts properly afterwards.
- If I run hello_world within an X session, it hangs every second try.
- If I start the X server and run it in a console, I get a black screen/hang when returning to the X session.
With ddd, I have tracked the issue down to the point where it hangs:
#0 radeon_drm_ws_queue_cs (ws=0x60b2e0, cs=0x7ffff7f6a010) at radeon_drm_winsys.c:555
#1 0x00007ffff293a7e8 in radeon_drm_cs_flush (rcs=0x7ffff7f6a010, flags=2, cs_trace_id=0) at radeon_drm_cs.c:567
#2 0x00007ffff2950f6d in r600_context_flush (ctx=0x6251d0, flags=2) at r600_hw_context.c:356
#3 0x00007ffff2951f3b in r600_flush (ctx=0x6251d0, flags=0) at r600_pipe.c:88
#4 0x00007ffff2952056 in r600_flush_gfx_ring (ctx=0x6251d0, flags=0) at r600_pipe.c:120
#5 0x00007ffff2952034 in r600_flush_from_st (ctx=0x6251d0, fence=0x7fffffffb390, flags=0) at r600_pipe.c:115
#6 0x00007ffff6c0718a in clover::command_queue::flush (this=0x642000) at core/queue.cpp:48
#7 0x00007ffff6c182f2 in clover::hard_event::wait (this=0xaca350) at core/event.cpp:124
#8 0x00007ffff6c60811 in clFinish (d_q=0x642008) at api/event.cpp:268
#9 0x000000000040136c in main (argc=1, argv=0x7fffffffdd58) at hello_world.c:193
If I run line 563, pipe_semaphore_signal(&ws->cs_queued), the system hangs.
The calling parameters are in a mesa dev list post.
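For readers unfamiliar with the pattern in frame #0: judging only by its name, cs_queued looks like a counting semaphore that wakes a thread consuming queued command streams. That reading is an assumption and is not confirmed anywhere in this report. The sketch below is not the Mesa source; it is a minimal stand-alone illustration of a semaphore-based handoff using POSIX semaphores and pthreads, and every name in it (demo_ws, demo_queue_job, demo_worker, demo_submit_and_drain) is hypothetical. It assumes Linux and building with -pthread.

```c
/* Illustrative sketch only: NOT the Mesa source. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define QUEUE_SLOTS 8

struct demo_ws {
    sem_t has_space;        /* counts free slots in the queue */
    sem_t queued;           /* counts jobs waiting for the worker */
    pthread_mutex_t lock;   /* protects slots[], head and tail */
    int slots[QUEUE_SLOTS];
    size_t head, tail;
    int processed;          /* written only by the worker thread */
};

/* Producer side: wait for room, enqueue the job, then wake the worker. */
static void demo_queue_job(struct demo_ws *ws, int job)
{
    sem_wait(&ws->has_space);
    pthread_mutex_lock(&ws->lock);
    ws->slots[ws->tail] = job;
    ws->tail = (ws->tail + 1) % QUEUE_SLOTS;
    pthread_mutex_unlock(&ws->lock);
    sem_post(&ws->queued);  /* the "signal" step of the handoff */
}

/* Worker side: sleep until a job is queued; a negative job means "stop". */
static void *demo_worker(void *arg)
{
    struct demo_ws *ws = arg;
    for (;;) {
        sem_wait(&ws->queued);
        pthread_mutex_lock(&ws->lock);
        int job = ws->slots[ws->head];
        ws->head = (ws->head + 1) % QUEUE_SLOTS;
        pthread_mutex_unlock(&ws->lock);
        sem_post(&ws->has_space);
        if (job < 0)
            break;
        ws->processed++;    /* a real driver would submit work to the kernel here */
    }
    return NULL;
}

/* Queue n jobs plus a stop marker, join the worker, return the processed count. */
int demo_submit_and_drain(int n)
{
    struct demo_ws ws = { .head = 0, .tail = 0, .processed = 0 };
    pthread_t thread;

    sem_init(&ws.has_space, 0, QUEUE_SLOTS);
    sem_init(&ws.queued, 0, 0);
    pthread_mutex_init(&ws.lock, NULL);

    int rc = pthread_create(&thread, NULL, demo_worker, &ws);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < n; i++)
        demo_queue_job(&ws, i);
    demo_queue_job(&ws, -1);    /* stop marker */
    pthread_join(thread, NULL);

    pthread_mutex_destroy(&ws.lock);
    sem_destroy(&ws.queued);
    sem_destroy(&ws.has_space);
    return ws.processed;
}
```

Calling demo_submit_and_drain(n) from a main() queues n jobs and returns how many the worker handled. The point of the sketch is only that the signal itself does no hardware work: in a design like this, the thread that is woken by the signal is the one that goes on to do the actual submission.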
Thanks for the information. This is a well-known bug on Cayman. I will try to investigate further.
Created attachment 92744
Does this patch help?
(In reply to comment #5)
> Does this patch help?
Yes, this seems to do the trick. But as it's a work-around I'm unsure about closing this as fixed.
(In reply to comment #6)
> (In reply to comment #5)
> > Does this patch help?
> Yes, this seems to do the trick. But as it's a work-around I'm unsure about
> closing this as fixed.
I don't think it's a workaround, I think it's actually the way the hw is supposed to be programmed for compute.
(In reply to comment #7)
> I don't think it's a workaround, I think it's actually the way the hw is
> supposed to be programmed for compute.
Oh, it was a bit late yesterday. I thought the patch added the lines containing "Work-around for flushing problems with compute shaders on Cayman", but it removes them. My fault, sorry.
Let's leave this bug open until the fix is committed upstream.
Fix committed as d51dbe048afd2131eb3675e9cd868ce73325a61d
Worked for me too with the dev branch.