Summary: | System (seems to) completely freeze when interacting with java swing applications. | ||
---|---|---|---|
Product: | Mesa | Reporter: | Vitaly Ostrosablin <tmp6154> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | major | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Attachments: | dmesg output from faulty session; Xorg.0.log from faulty session; Java Swing reproducer application; Comparison of reproducer app under java 8 and java 7 with git mesa | ||
Description
Vitaly Ostrosablin
2016-12-31 13:46:37 UTC
Please attach your xorg log and dmesg output.
Created attachment 128721
dmesg output from faulty session
Here's my dmesg. From these lines:
[ 675.891897] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
[ 675.891899] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 675.891900] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048002
[ 675.891902] VM fault (0x02, vmid 1) at page 0, read from 'TC4' (0x54433400) (72)
[ 675.892003] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
[ 675.892004] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 675.892006] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048002
It's obvious that something went terribly wrong inside AMD GPU.
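For anyone triaging a similar log, the fault lines quoted above can be pulled out of a saved dmesg dump with a small filter. This is only a sketch: the function names `gpu_faults` and `count_gpu_faults` are made up for illustration, and the patterns are copied from the message text in this particular log, so other kernel versions may word them differently.

```shell
# Print only the amdgpu fault lines from a saved dmesg log.
# $1: path to the log file (hypothetical; use wherever the dump was saved).
gpu_faults() {
    grep -E 'GPU fault detected|VM_CONTEXT[0-9]+_PROTECTION_FAULT|VM fault' "$1"
}

# Count fault events: each event starts with one "GPU fault detected" line.
count_gpu_faults() {
    grep -c 'GPU fault detected' "$1"
}
```

With the log saved as, say, `dmesg.txt`, `gpu_faults dmesg.txt` lists the fault lines and `count_gpu_faults dmesg.txt` prints how many fault events were logged.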
Created attachment 128722
Xorg.0.log from faulty session
This is Xorg log. It doesn't seem that X noticed the fault at all. Moreover, with `ps -e` over ssh I can see X process and all other GUI programs still running just fine. So it seems that only GPU driver has failed, other stuff doesn't seem to be affected.
Created attachment 128723
Java Swing reproducer application
Here's Java sources of a reproducer mini-application for the issue. Running it and pressing the button results for me in AMD GPU fault, similar to ones I've already attached logs for. Works in 100% cases (2/2) if run under JRE8.
It's most likely a Mesa driver issue. Can you try running Xephyr something like this:
GALLIUM_DDEBUG="pipelined 2000" Xephyr :99 -glamor -screen 1024x768
and then run the reproducer app with DISPLAY=:99 . After the hang, a file should appear in ~/ddebug_dumps/. Please attach that file here.

Yes, no problem. However, my XOrg was compiled without Xephyr, so I rebuilt it. Unfortunately, I decided to update mesa to the latest commit as well, but it seems that one of the recent commits breaks everything (I get an unusable desktop which looks white, and I can see only gray outlines of the KDE taskbar, the mouse cursor and the login password box cursor). So I had to temporarily revert to mesa 13.0.3. But there is some useful info. First, on 13.0.3 I cannot reproduce the fault with the reproducer app. Second, the reproducer app looks the same under 13.0.3 on both Java 7 and Java 8. I found it strange that under Java 7 the swing app looks like it should (Metal look & feel) while under Java 8 it looked different (white buttons instead of the default metallic ones). But it appears that this was just a rendering artefact. I will try to get back to a working mesa commit and reproduce the problem with Xephyr now.

Created attachment 128804
Comparison of reproducer app under java 8 and java 7 with git mesa
Checked out two days old revision of mesa. Attached screenshot of what I meant about reproducer app.
Will try running it in Xephyr.
Attempted to run reproducer app in Xephyr. It appears exactly like on host with Java 8 in attached screenshot, i.e. with white button. However, clicking the button just adds the text into textarea, as programmed, while doing this directly on host's Xorg hangs the system. I think it's possible that the app alone is not enough to reproduce the issue; KDE and its window manager might be at play here, too. But it's strange, because this seems to occur on adding text to JTextArea, which is rendered by Swing and should be least affected by WM and DE (except for window border, which is absent in Xephyr, since no WM runs there).
But if that's mesa-only bug, shouldn't Ctrl+Alt+F1 work? Here GPU appears to have stopped output completely (most likely, fullscreen tty opens, but GPU shows same picture as on moment of freeze).

Any chance you can bisect Mesa?

(In reply to Vitaly Ostrosablin from comment #8)
> But if that's mesa-only bug, shouldn't Ctrl+Alt+F1 work? Here GPU appears to
> have stopped output completely (most likely, fullscreen tty opens, but GPU
> shows same picture as on moment of freeze).

A GPU hang tends to cause the Xorg process to hang as well, which prevents VT switching from working.

Yes, will try to bisect mesa. Unfortunately, it looks like I'll have to do that manually, since Gentoo doesn't seem to have bisect tools for portage. So far I can give the following initial info:
1) Bug wasn't introduced at least until November 30, 2016.
2) White button artefact doesn't seem to be related to hang. In Nov 30 commit button is white, but pressing it doesn't hang the system.
3) On Dec 20, 2016, hang was already introduced.

Further narrowed date range: between Dec 6 and Dec 12.

Looks like it broke on Dec 07. There were a lot of radeonsi-related commits, but I had difficulty compiling a working mesa out of them. On Dec 6, there was no bug. No commits on Dec 7 seem to work, they're segfaulting. Then later on Dec 8, mesa can be compiled and started, but issue is already present.
Vitaly - commit id's please. Dates are largely meaningless - the default date shown by git has little to do with when the commit made it into a particular tree, even with mesa's rebase policy.

85a3057f651a1c56348f1af18343d9cc0a5c93f3 used to work fine. After that, in at most 3 commits to future from this point something was broken and mesa didn't run (checked on 4c8c13b3568c82e503a10ddcb846b4c96261ec4c). One of commits further in history I tried was 132b69c4edb824c70c98f8937c63e49b04f3adff, which didn't work as well. After it, there was a huge batch of radeonsi commits. c7dc1b010ae581f532240b661cb3d1c82e117e7e is not runnable, too. bd56de88dfb192310f3432a3c0e0ddc3469c6d55 is runnable (probably, was fixed somewhere earlier) and java reproducer app hangs system there.

For any commits that you can't test, run
git bisect skip
Eventually, git bisect will either show the commit which introduced the problem, or the minimal set of candidates.

Have successfully updated to latest mesa. Seems like issue was fixed recently.

(In reply to Vitaly Ostrosablin from comment #16)
> Have successfully updated to latest mesa. Seems like issue was fixed
> recently.
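The bisect procedure suggested in this thread can be sketched roughly as follows. The good and bad commit ids are the ones reported above; the helper names `start_bisect` and `mark` and the checkout path passed to them are assumptions for illustration, not part of Mesa or git.

```shell
# Commit ids reported in this thread.
GOOD=85a3057f651a1c56348f1af18343d9cc0a5c93f3   # last revision reported to work
BAD=bd56de88dfb192310f3432a3c0e0ddc3469c6d55    # runnable, reproducer hangs the system

# Begin bisecting between the known bad and known good revisions.
# $1: path to a Mesa git checkout (assumed location).
start_bisect() {
    git -C "$1" bisect start "$BAD" "$GOOD"
}

# Record the result for the revision git has currently checked out.
# $1: checkout path, $2: good | bad | skip
# "skip" is for revisions that cannot be tested, e.g. ones that do not run.
mark() {
    case "$2" in
        good|bad|skip) git -C "$1" bisect "$2" ;;
        *) echo "usage: mark <dir> good|bad|skip" >&2; return 2 ;;
    esac
}
```

Typical use would be `start_bisect ~/src/mesa` (a hypothetical path), then building and testing each revision git checks out and calling `mark ~/src/mesa good`, `bad` or `skip` accordingly, finishing with `git bisect reset` once git reports the first bad commit or the remaining candidates.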