Bug 70123 - Freeze caused by 'winsys/radeon: remove cs_queue_empty' commit
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: mesa-dev
Reported: 2013-10-04 12:30 UTC by Jeff Blake
Modified: 2013-10-16 13:50 UTC

Attachments
Possible fix (2.68 KB, patch), 2013-10-07 10:15 UTC, Christian König
Backtrace of compton deadlock (1.35 KB, text/plain), 2013-10-09 14:05 UTC, Jeff Blake
'thread apply all bt' with the commit reverted instead of the patch (1.34 KB, text/plain), 2013-10-12 02:58 UTC, Jeff Blake
Additional workaround. (1.01 KB, patch), 2013-10-12 16:49 UTC, Christian König
thread apply all bt (1.96 KB, text/plain), 2013-10-14 09:36 UTC, Jeff Blake
Additional workaround. (3.59 KB, text/plain), 2013-10-15 09:56 UTC, Christian König

Description Jeff Blake 2013-10-04 12:30:55 UTC
The whole WM was freezing up immediately after I logged into my Openbox systems, and I had to switch consoles to roll back to a previously working state.

There was nothing in any of the system logfiles and disabling things like conky and compton (the compositor) didn't resolve anything.

I was able to log in via the slim display manager and could see the wallpaper, but there was no openbox menu (not available via a keyboard shortcut either). Ctrl-Alt-Bksp put me back to slim, but an attempt at using slim's console login produced a responsive but unreadable xterm display (white blocks instead of text).

Through trial and error I traced the regression to the 'winsys/radeon: remove cs_queue_empty' commit from Sept 22. Running git revert -n 0653c66ef40ac553f91b29bbda7f59f7ce6948fa and recompiling fixed the issue for me.

I'm running debian unstable: 32-bit on a laptop with a Mobility Radeon HD 4530/4570; 64-bit on a desktop with an X850 XT.

I haven't yet had a chance to see if the desktop machine works with the above solution, but both machines were previously in sync and went wrong with the same update.
Comment 1 Jeff Blake 2013-10-06 14:39:43 UTC
I've just tested the workaround on my 64-bit Debian desktop (X850 XT), and reverting 0653c66ef40ac553f91b29bbda7f59f7ce6948fa fixes this too.

I've changed the category from 'Other' to 'Drivers/Gallium/r600' so that the right people see this.
Comment 2 Christian König 2013-10-07 10:15:33 UTC
Created attachment 87229
Possible fix

Please try the attached patch, it might fix your issue.
Comment 3 Jeff Blake 2013-10-08 10:14:44 UTC
Unfortunately this doesn't fix the issue. When I get a spare moment I'll try the patch on the desktop pc (which has a different card).
Comment 4 Christian König 2013-10-08 12:00:24 UTC
(In reply to comment #3)
> Unfortunately this doesn't fix the issue. When I get a spare moment I'll try
> the patch on the desktop pc (which has a different card).

Please try to attach gdb to the deadlocked process and provide the output of the following command: "thread apply all bt"

Thanks in advance,
Christian.
Comment 5 Jeff Blake 2013-10-09 14:05:35 UTC
Created attachment 87341
Backtrace of compton deadlock

(gdb) attach <pid of compton>
(gdb) thread apply all bt
Comment 6 Jeff Blake 2013-10-09 14:11:46 UTC
I should have re-disabled compton after applying your patch, as I can start up without it and crash when I run it. Before the patch disabling compton had no effect and things froze anyway.
Comment 7 Christian König 2013-10-09 14:13:57 UTC
(In reply to comment #5)
> Created attachment 87341
> Backtrace of compton deadlock

Are you sure that this is the whole output of "thread apply all bt"?

There is only one thread shown and that's the locked up one, but there should also be at least the command submission thread.
Comment 8 Jeff Blake 2013-10-12 02:58:10 UTC
Created attachment 87498
'thread apply all bt' with the commit reverted instead of the patch

Sorry for the delay. I've recompiled, reinstalled and double checked everything and there's definitely just one thread showing on 'thread apply all bt'.

If I revert the 'winsys/radeon: remove cs_queue_empty' commit instead of applying the patch then I get 2 threads as per the gdb_compton_good file I've just attached.
Comment 9 Christian König 2013-10-12 16:49:50 UTC
Created attachment 87517
Additional workaround.

That looks like we are accessing the CS after the winsys has already been destroyed.

Please try the attached workaround on top of the latest mesa code, compile the driver with debugging symbols (pass --enable-debug to configure), and then do the backtrace again.

Thanks,
Christian.
Comment 10 Jeff Blake 2013-10-14 09:36:41 UTC
Created attachment 87588
thread apply all bt

With that patch things are still freezing up, see the attachment for the gdb output (which still has just one thread).

The command which causes the freeze is :-

compton -b --backend glx --config /dev/null

(The '--config /dev/null' is suggested by compton's maintainer to force everything to its defaults when troubleshooting; using my own config file and changing the glx-related settings within it seems to have no effect on whatever is at fault.)

In my openbox autostart script the following line invokes compton in the background without any problems at all :-

compton --backend glx --config /dev/null &

If the backend is changed to xrender then things run fine whether the -b switch is used or not.

So compton crashes when using the -b switch to daemonise in conjunction with the glx backend, and omitting the switch works around the problem.

Perhaps not unexpectedly, if I revert the following commits then things run fine.

8bc7673ef874faa95d43c255c7fc631c2d2160c0 radeon/winsys: fix handling in radeon_drm_cs_flush v2
0653c66ef40ac553f91b29bbda7f59f7ce6948fa winsys/radeon: remove cs_queue_empty

I'm starting to wonder if this is a bug in compton.
Comment 11 Christian König 2013-10-14 09:47:01 UTC
(In reply to comment #10)
> The command which causes the freeze is :-
> 
> compton -b --backend glx --config /dev/null

Thanks for this, I can reproduce the problem now.

Not sure if that's an issue in compton or not, but it's definitely a bit odd.
Comment 12 Christian König 2013-10-15 09:56:38 UTC
Created attachment 87660
Additional workaround.

It's indeed a specific problem that only happens with compton's "-b" option.

Compton initializes the X connection and GLX context (loads and starts the driver) and then forks into the background. This creates all kinds of problems with our driver infrastructure and I'm not 100% sure if it's allowed or not.

Previously it just worked by pure coincidence. The attached patch is a workaround for at least this problem, but there might be others as well.

Going to discuss this with the mailing list.
Comment 13 Christian König 2013-10-16 13:50:40 UTC
Please file a bug report with compton. I'm still not 100% sure, but it indeed looks like forking into the background with a current GLX context is not allowed.

Previously it just worked by coincidence.

Thanks,
Christian.
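
The fork behaviour behind comments 7, 12 and 13 can be sketched outside the driver. On POSIX systems fork() duplicates only the calling thread, so a helper thread started before the fork does not exist in the child. The following is a minimal illustration in Python rather than Mesa's C, and it rests on assumptions: the worker thread is a hypothetical stand-in for the driver's command submission thread, and the function name is made up for this example. It is not taken from the attached patches, and the thread does not confirm that this is the exact cause of the deadlock, although it is consistent with the single-thread backtraces.

```python
import os
import threading


def worker_survives_fork() -> bool:
    """Start a helper thread, fork, and report whether the child still has it."""
    stop = threading.Event()
    # Hypothetical stand-in for the driver's command submission thread.
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()  # comparable to daemonising after the GL context exists
    if pid == 0:
        # Child: fork() copied only the calling thread, not the worker.
        os.close(read_fd)
        os.write(write_fd, b"1" if worker.is_alive() else b"0")
        os._exit(0)

    # Parent: read the child's answer, then shut the worker down cleanly.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return answer == b"1"


if __name__ == "__main__":
    print("worker alive in child:", worker_survives_fork())
```

On Linux the child is expected to report that the worker is not alive. Any code in the child that waits for such a thread to make progress would block forever, which is one plausible reading of why only the locked-up thread appears in the backtraces when compton is started with -b.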

