Summary: | [bisected: 3d02b7] VM fault with kernel 4.7-rc1 on Alien: Isolation | |
---|---|---|---
Product: | DRI | Reporter: | Alexandre Demers <alexandre.f.demers>
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED FIXED | QA Contact: |
Severity: | normal | |
Priority: | medium | CC: | alexdeucher, bas
Version: | DRI git | |
Hardware: | Other | |
OS: | All | |
So this is the commit that exposes the bug:

3d02b7fee9c3ece1746f5b06c4143b511383fc6b is the first bad commit
commit 3d02b7fee9c3ece1746f5b06c4143b511383fc6b
Author: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Date: Fri Apr 15 02:47:49 2016 +0200

drm/radeon: Allow setting shader registers using DMA/COPY packet3 on SI.

Mesa uses a COPY_DATA packet to copy the grid size for indirect dispatches into COMPUTE_USER_DATA_*. Setting those registers with a SET_SH_REG packet is allowed, not allowing them with other packets seems like an oversight.

v2: Clarify commit message.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 ec4d58f4c4e5bd746474776c852a7c70763183a7 8ba67a00f061b454d1600838bf620e1e937e9461 M drivers

Still a problem with today's latest Mesa and kernel 4.7-rc2.

Can you provide an apitrace that reproduces the VM faults?

(In reply to Nicolai Hähnle from comment #3)
> Can you provide an apitrace that reproduces the VM faults?

I'll do that later. However, it may end up being a big apitrace, since it takes a long time before I can actually play the game, which is when the VM faults appear. I'll figure out a way to share a link.

Thanks, that would be much appreciated. Most people tend to use Google Drive; I've downloaded GB-sized traces from there.

And here is the trace, which you can download from Google Drive (I hope it works correctly): https://drive.google.com/open?id=0Bw_tZdWsNa4BV0VKMGVaeFBDaEE

Thanks! No VM faults here on Tonga, so this may be specific to SI. Do you get VM faults in dmesg when you play the trace back on your system?

(In reply to Nicolai Hähnle from comment #7)
> Thanks! No VM faults here on Tonga, so this may be specific to SI. Do you
> get VM faults in dmesg when you play the trace back on your system?

Yes, I do. I just tested it with the trace I shared with you.

(In reply to Nicolai Hähnle from comment #7)
> Thanks! No VM faults here on Tonga, so this may be specific to SI. Do you
> get VM faults in dmesg when you play the trace back on your system?

Anything I can provide you with? Any specific test or steps?

No, thank you. I can reproduce this on a Verde, so it does seem to be SI-specific (perhaps also CI), or perhaps a radeon vs. amdgpu issue. I've seen a VM fault happen even with GALLIUM_DDEBUG=800 (i.e. frequent flushes), at:

901138 @3 glDispatchCompute(num_groups_x = 128, num_groups_y = 2, num_groups_z = 1)

Created attachment 124514: preliminary patch

Problem understood: we're generating bad shader code, though I still need to double-check all the possible corner cases. In the meantime, the attached patch for LLVM should fix Alien: Isolation.

Fixed in LLVM r272761.

(In reply to Nicolai Hähnle from comment #12)
> Fixed in LLVM r272761.

Thank you, I'll test it later today.
Created attachment 124312: VM faults in dmesg triggered by Alien: Isolation

Hi,

I've recently begun playing Alien: Isolation now that compute shaders are available when combining the latest Mesa with a 4.7-git kernel. However, my computer freezes after some time. Checking dmesg revealed a repeating VM fault that continues as long as I'm in the game (while playing, not in the menu or while loading).

Setup:
GPU -> R9 280X
Kernel -> 4.7-rc1
Distribution -> Archlinux 64
Mesa, drm, ddx -> latest code from the git repositories

I'll try with a 4.6 kernel to see whether this bug was introduced in or exposed by the 4.7 branch.

Attaching dmesg.