Bug 89734 - GL_AMD_pinned_memory extension causing a kernel hardlock
Summary: GL_AMD_pinned_memory extension causing a kernel hardlock
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-23 21:55 UTC by jdruel
Modified: 2015-04-08 09:44 UTC
CC List: 2 users

See Also:


Attachments
Debugging patch. (1018 bytes, patch)
2015-03-24 14:56 UTC, Christian König

first requested dmesg (79.80 KB, text/plain)
2015-03-25 06:34 UTC, jdruel

dmesg with patch (77.03 KB, text/plain)
2015-03-27 21:17 UTC, jdruel

Possible fix (1.14 KB, patch)
2015-03-30 13:12 UTC, Christian König

screen flickers with patch (635.83 KB, video/mp4)
2015-04-01 18:39 UTC, jdruel

dmesg with patch (72.51 KB, text/plain)
2015-04-01 18:42 UTC, jdruel

xorg with patch (565.62 KB, text/plain)
2015-04-03 05:36 UTC, jdruel

dmesg with patch (70.29 KB, text/plain)
2015-04-03 05:39 UTC, jdruel

Description jdruel 2015-03-23 21:55:59 UTC
Kubuntu 15.04, kernel 4.0-rc3, AMD HD 7850 with recent open-source drivers (from the oibaf PPA). The Dolphin emulator runs fine, except that the whole system freezes when I exit it. I have to reboot the computer (although it seems to still be running, since I can still log in remotely with ssh). I tried under KDE, then under IceWM, to see if it was the compositor's fault. No change. Windowed or fullscreen: same freeze. I tried to stop Dolphin with the interface buttons: same freeze.

Kubuntu 15.04, i7-4770K (not overclocked), AMD HD 7850 on open-source drivers.

dmesg log is here:
http://pastebin.ca/2962286

I reported the bug to the Dolphin team. They told me that "Basically it's AMD's GL_AMD_pinned_memory extension causing a kernel hardlock." and that I should report it here.
Comment 1 Michel Dänzer 2015-03-24 02:25:52 UTC
Please attach the dmesg output here directly.

Christian, any ideas?
Comment 2 Christian König 2015-03-24 09:04:03 UTC
(In reply to Michel Dänzer from comment #1)
> Please attach the dmesg output here directly.
> 
> Christian, any ideas?

Not off hand. It looks like we don't handle something correctly when tearing this down, but I'm not sure what it is.

Going to take a look at it.
Comment 3 Christian König 2015-03-24 14:56:52 UTC
Created attachment 114584
Debugging patch.

The only possible cause I can come up with is that we try to free the SG table twice.

Please apply the attached debugging patch and report back with the resulting dmesg.

Thanks in advance,
Christian.
Comment 4 jdruel 2015-03-25 06:34:55 UTC
Created attachment 114609
first requested dmesg

Here's the first requested dmesg output.
Applying the attached debugging patch is beyond my skills, unfortunately.
I don't know if you can create a binary so that I can test it (or give me instructions on how to apply the patch).
Sorry for my limited technical skills, guys.
The Dolphin version tested is 5921 (but the same problem occurs on older revisions). The freeze happens on exit from Mario Kart and Donkey Kong Country Wii (I didn't test more, as it seems independent of the game).
Comment 5 Christian König 2015-03-25 10:43:29 UTC
(In reply to poub365-bugzilla from comment #4)
> Created attachment 114609
> first requested dmesg
> 
> Here's the first requested dmesg output.
> Applying the attached debugging patch is beyond my skills, unfortunately.
> I don't know if you can create a binary so that I can test it (or give me
> instructions on how to apply the patch).
> Sorry for my limited technical skills, guys.
> The Dolphin version tested is 5921 (but the same problem occurs on older
> revisions). The freeze happens on exit from Mario Kart and Donkey Kong
> Country Wii (I didn't test more, as it seems independent of the game).

Unfortunately we usually don't have time to provide binary packages for every small patch we make, sorry.

You could try following one of the online tutorials on how to compile your own kernel on Kubuntu. Once you have managed that, applying the patch on top of it is trivial.
Comment 6 jdruel 2015-03-27 21:17:25 UTC
Created attachment 114670
dmesg with patch
Comment 7 jdruel 2015-03-27 21:22:06 UTC
This is what I did (main commands):
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git

patch -p1 </home/jo/internet/Downloads/mesa/0001-WIP-userptr-debuging-patch.patch 

fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic

I installed the new kernel (3.19) and booted into it. Same freeze on exit from Dolphin. Here's the new dmesg log. I hope I did it right.
Comment 8 Christian König 2015-03-30 10:12:33 UTC
(In reply to poub365-bugzilla from comment #7)
> I installed the new kernel (3.19) and booted into it. Same freeze on exit
> from Dolphin. Here's the new dmesg log. I hope I did it right.

Yeah, thanks! That's exactly what I needed.

And my initial suspicion was right: for some reason we try to free the SG table twice:

[  141.833287] [drm:radeon_ttm_backend_unbind [radeon]] *ERROR* ttm->sg = ffff880222930e00, ttm->sg->sgl = ffff8802238bd200
...
[  141.898159] [drm:radeon_ttm_backend_unbind [radeon]] *ERROR* ttm->sg = ffff880222930e00, ttm->sg->sgl =           (null)

That then obviously causes problems.

As a quick fix we could check whether we have already freed the table, but I think the underlying problem is that we also try to free the BO twice.

@Michel any ideas?
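The quick fix mentioned above amounts to making the unpin path idempotent: when the table's scatterlist pointer is already NULL (the state the second log line reports), skip the teardown instead of freeing again. Below is a minimal userspace sketch of that guard pattern under simplified assumptions. `struct fake_sg_table`, `fake_unpin_userptr()`, `simulate_double_unbind()` and `free_count` are hypothetical stand-ins, not radeon's actual code and not the contents of the attached patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct sg_table (simplified). */
struct fake_sg_table {
	void *sgl;           /* scatterlist storage, NULL once released */
	unsigned int nents;  /* number of entries */
};

/* Number of times the scatterlist storage was actually released. */
int free_count;

/* Hypothetical unpin step: releases the table's storage at most once. */
void fake_unpin_userptr(struct fake_sg_table *sg)
{
	/* Guard: a repeated unbind finds sgl == NULL and returns
	 * instead of freeing the same storage a second time. */
	if (sg == NULL || sg->sgl == NULL)
		return;

	free(sg->sgl);
	free_count++;

	/* Mark the table as released so later calls can detect it. */
	sg->sgl = NULL;
	sg->nents = 0;
}

/* Runs the teardown twice on one table, as happens in this bug,
 * and returns how many real releases took place. */
int simulate_double_unbind(void)
{
	struct fake_sg_table sg = { malloc(64), 4 };

	free_count = 0;
	fake_unpin_userptr(&sg);  /* first unbind releases the storage */
	fake_unpin_userptr(&sg);  /* second unbind is a guarded no-op */
	assert(sg.sgl == NULL);
	return free_count;
}
```

As noted in the comment, a guard like this only hides the symptom: if the BO itself is torn down twice, that still needs a proper fix of its own.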
Comment 9 Christian König 2015-03-30 13:12:12 UTC
Created attachment 114727
Possible fix

Please test the attached patch; it should fix your issue for now.
Comment 10 Michel Dänzer 2015-03-31 01:52:11 UTC
(In reply to Christian König from comment #8)
> @Michel any ideas?

I'm afraid not. :(
Comment 11 jdruel 2015-04-01 18:39:33 UTC
Created attachment 114820
screen flickers with patch

I tried the new patch.
I can now exit without freezing, but I get a lot of artifacts (see video) in the game and also on the desktop (KDE with OpenGL): the screen flickers a lot.
The dmesg will follow.
Comment 12 jdruel 2015-04-01 18:42:26 UTC
Created attachment 114821
dmesg with patch

No idea if that helps, but just in case.
Comment 13 Michel Dänzer 2015-04-02 01:03:42 UTC
(In reply to poub365-bugzilla from comment #11)
> I can now exit without freezing, but I get a lot of artifacts (see video) in
> the game and also on the desktop (KDE with OpenGL): the screen flickers a lot.

Unless setting the environment variable

 MESA_EXTENSION_OVERRIDE='-GL_AMD_pinned_memory'

avoids that problem, it's probably a separate issue that needs to be tracked in its own report, including the corresponding Xorg.0.log file.
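The override works by listing extension names in the environment variable; per Mesa's documentation, a name prefixed with `-` is disabled and a plain name is enabled. The sketch below is a hypothetical re-implementation of that matching rule, only to show why `-GL_AMD_pinned_memory` hides the extension from the application. `extension_disabled()` is an illustrative helper, not Mesa's actual parser, and the `+` prefix and "last entry wins" handling are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `override` (a space-separated list in
 * the style of MESA_EXTENSION_OVERRIDE) disables extension `name`.
 * Assumed rules: "-NAME" disables, "NAME" or "+NAME" enables, last entry wins. */
bool extension_disabled(const char *override, const char *name)
{
	bool disabled = false;
	size_t name_len = strlen(name);
	const char *p = override;

	while (p != NULL && *p != '\0') {
		/* Skip separating spaces. */
		while (*p == ' ')
			p++;
		if (*p == '\0')
			break;

		/* Read an optional +/- prefix. */
		bool disable = false;
		if (*p == '-' || *p == '+') {
			disable = (*p == '-');
			p++;
		}

		/* Find the end of this token. */
		const char *end = strchr(p, ' ');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* Only an exact name match changes the result. */
		if (len == name_len && strncmp(p, name, len) == 0)
			disabled = disable;

		p += len;
	}
	return disabled;
}
```

In practice the variable is simply set when starting the program, e.g. `MESA_EXTENSION_OVERRIDE='-GL_AMD_pinned_memory' dolphin-emu`, so the application never sees the extension and takes its non-pinned-memory path.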
Comment 14 jdruel 2015-04-03 05:36:44 UTC
Created attachment 114841
xorg with patch
Comment 15 jdruel 2015-04-03 05:39:17 UTC
Created attachment 114842
dmesg with patch

Very strange: I booted the patched kernel today to get an Xorg log. No flickering, neither on the desktop nor in a fullscreen game. I haven't upgraded any component since last time. Dolphin works fine in my quick test.

So I can now exit Dolphin without freezing. I get this message on the console:

The program 'dolphin-emu' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadWindow (invalid Window parameter)'.
  (Details: serial 8835 error_code 3 request_code 38 minor_code 0)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
terminate called without an active exception
Abandon (core dumped)

I've attached my dmesg and Xorg log just in case.
Comment 16 Christian König 2015-04-06 13:54:19 UTC
(In reply to poub365-bugzilla from comment #15)
> Created attachment 114842
> dmesg with patch
> 
> Very strange: I booted the patched kernel today to get an Xorg log. No
> flickering, neither on the desktop nor in a fullscreen game. I haven't
> upgraded any component since last time. Dolphin works fine in my quick test.

Hmm, it could be an uninitialized variable or something like that; I'm not sure off hand. But if it works fine now, it's unlikely to be a driver problem.

> 
> So I can now exit Dolphin without freezing. I get this message on the
> console:
> 
> The program 'dolphin-emu' received an X Window System error.
> This probably reflects a bug in the program.
> The error was 'BadWindow (invalid Window parameter)'.
>   (Details: serial 8835 error_code 3 request_code 38 minor_code 0)
>   (Note to programmers: normally, X errors are reported asynchronously;
>    that is, you will receive the error a while after causing it.
>    To debug your program, run it with the --sync command line
>    option to change this behavior. You can then get a meaningful
>    backtrace from your debugger if you break on the gdk_x_error() function.)
> terminate called without an active exception
> Abandon (core dumped)
> 
> I've attached my dmesg and Xorg log just in case.

That's a completely different issue; I'm not even sure it's a driver problem at all.
Comment 17 Daniel Stone 2015-04-06 14:07:16 UTC
(In reply to Christian König from comment #16)
> (In reply to poub365-bugzilla from comment #15)
> > The program 'dolphin-emu' received an X Window System error.
> > This probably reflects a bug in the program.
> > The error was 'BadWindow (invalid Window parameter)'.
> >   (Details: serial 8835 error_code 3 request_code 38 minor_code 0)
> >   (Note to programmers: normally, X errors are reported asynchronously;
> >    that is, you will receive the error a while after causing it.
> >    To debug your program, run it with the --sync command line
> >    option to change this behavior. You can then get a meaningful
> >    backtrace from your debugger if you break on the gdk_x_error() function.)
> > terminate called without an active exception
> > Abandon (core dumped)
> > 
> > I've attached my dmesg and Xorg log just in case.
> 
> That's a completely different issue; I'm not even sure it's a driver problem
> at all.

It's not: request #38 is XQueryPointer, which GL drivers don't issue.
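This reasoning relies on the fact that the `request_code` in Xlib's error message is the core X11 protocol opcode and `error_code` is the core error number. A small lookup table makes that decoding explicit. The table is a hand-written excerpt of the core protocol numbering (the same values as `X_QueryPointer`, `BadWindow` etc. in `X11/Xproto.h` and `X11/X.h`), listing only a few entries; `x_request_name()` and `x_error_name()` are hypothetical helpers, not Xlib API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Excerpt of core X11 numbering; unlisted codes map to "unknown". */
struct code_name {
	int code;
	const char *name;
};

static const struct code_name core_requests[] = {
	{ 1, "CreateWindow" },
	{ 4, "DestroyWindow" },
	{ 8, "MapWindow" },
	{ 10, "UnmapWindow" },
	{ 38, "QueryPointer" },
};

static const struct code_name core_errors[] = {
	{ 1, "BadRequest" },
	{ 2, "BadValue" },
	{ 3, "BadWindow" },
	{ 8, "BadMatch" },
	{ 9, "BadDrawable" },
};

/* Linear search in a small table; returns "unknown" when absent. */
static const char *lookup(const struct code_name *table, size_t n, int code)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].code == code)
			return table[i].name;
	return "unknown";
}

/* Hypothetical helper: name of a core request opcode. */
const char *x_request_name(int request_code)
{
	return lookup(core_requests,
		      sizeof core_requests / sizeof core_requests[0],
		      request_code);
}

/* Hypothetical helper: name of a core protocol error code. */
const char *x_error_name(int error_code)
{
	return lookup(core_errors,
		      sizeof core_errors / sizeof core_errors[0],
		      error_code);
}
```

Feeding it the values from the console output (`error_code 3 request_code 38`) gives BadWindow on QueryPointer: the application asked for the pointer position relative to a window ID the server no longer recognized (likely one already destroyed during shutdown), which points at Dolphin's own X11 code rather than the GL driver.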
Comment 18 Christian König 2015-04-08 09:44:45 UTC
In that case we can probably close this bug report, since the original issue is fixed.

If you find another issue that could be caused by the driver stack, feel free to open a new bug.

Thanks for the help,
Christian.