Bug 100620

Summary: [SKL] 48-bit addresses break DOOM
Product: DRI Reporter: Grazvydas Ignotas <notasas>
Component: DRM/IntelAssignee: Jason Ekstrand <jason>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs, jason
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
screenshot
none
good
none
bad
none
radv screenshot
none
Patch to restrict VkDeviceMemory object size.
none
Patch to restrict VkDeviceMemory object size.
none
Patch to restrict VkDeviceMemory object size.
none
Patch to restrict VkDeviceMemory object size.
none
Patch to fix 48-bit issues with scratch buffers none

Description Grazvydas Ignotas 2017-04-08 17:11:37 UTC
Created attachment 130758 [details]
screenshot

Since "anv: Add support for 48-bit addresses" doom has started misrendering, see the attached screenshot. It's enough to comment out EXEC_OBJECT_SUPPORTS_48B_ADDRESS in anv_bo_init_new() to get it back to normal.

HW: SKL GT2
Comment 1 Jason Ekstrand 2017-04-08 18:00:46 UTC
With no reference as to what it's supposed to look like, that image doesn't help me much I'm afraid.  Mind posting a "good" screenshot so that I can see the difference?
Comment 2 Grazvydas Ignotas 2017-04-08 19:43:37 UTC
Created attachment 130760 [details]
good

The scene choice was rather poor because of dynamic light that adds confusion. So here is a different scene while it's good
Comment 3 Grazvydas Ignotas 2017-04-08 19:45:22 UTC
Created attachment 130761 [details]
bad

..and when it's regressed.

The artifacts flicker all over the place, so it's much more distracting ingame than in this screenshot.
Comment 4 Jason Ekstrand 2017-04-08 22:55:59 UTC
It appears to be some sort of caching or compression issue.  Could you give it a try with INTEL_DEBUG=norbc and see if that makes a difference.  I've requested a copy of the game and will take a look once I get it.

Also, even your "good" image also appears to have corruption near that halo effect it's just less noticable.
Comment 5 Grazvydas Ignotas 2017-04-09 01:07:08 UTC
Created attachment 130764 [details]
radv screenshot

norbc doesn't seem to help.

(In reply to Jason Ekstrand from comment #4)
> I've requested a copy of the game and will take a look once I get it.
Just to save you some time, you'll need wine-staging and to add
"+r_renderapi 1"
to DOOM's launch options, otherwise you can't even get to the options screen to enable Vulkan because of it wanting 4.5 compat GL context. Other useful options are "+com_skipIntroVideo 1 +com_skipKeyPressOnLoadScreens 1". It also needs Dave's "[rfc] spirv: work around doom shaders having load/store to sampler types" from the ML, and then other than hitting some debug asserts it worked fine before the 48-bit patch.

> Also, even your "good" image also appears to have corruption near that halo
> effect it's just less noticable.
I think that one is intended, see the screenshot from radv (same settings, just higher resolution).
Comment 6 Jason Ekstrand 2017-04-11 15:54:24 UTC
Created attachment 130801 [details] [review]
Patch to restrict VkDeviceMemory object size.

Could you please try the attached patch.  It's a bit of a stab in the dark but I think it definitely is a bug that's related to 48-bit and heap sizes.
Comment 7 Jason Ekstrand 2017-04-11 16:04:51 UTC
Created attachment 130802 [details] [review]
Patch to restrict VkDeviceMemory object size.
Comment 8 Jason Ekstrand 2017-04-11 16:07:10 UTC
Created attachment 130803 [details] [review]
Patch to restrict VkDeviceMemory object size.
Comment 9 Jason Ekstrand 2017-04-11 18:13:51 UTC
Created attachment 130805 [details] [review]
Patch to restrict VkDeviceMemory object size.
Comment 10 Grazvydas Ignotas 2017-04-12 22:40:28 UTC
Doesn't help at all, unfortunately.
Comment 11 Grazvydas Ignotas 2017-04-17 00:08:20 UTC
I've been experimenting with this a bit and found some things.

Making anv_AllocateMemory() not set EXEC_OBJECT_SUPPORTS_48B_ADDRESS removes the corruption.  The game calls it exactly 100 times from the start to getting ingame, so adding a counter and only clearing the flag sometimes also works. It seems if the flag is set for 85 or more allocations the corruption appears. It doesn't matter if the flag is cleared for the first 16 or the last 16 calls, both cases avoid the glitch, so there is no specific "bad" allocation, it seems to be related to count (or maybe total size?). Another way to "fix" it is to not set the flag for all other anv_bo_init_new() callers, then anv_AllocateMemory can set the flag for all 100 calls without causing the glitch.

I've also tried setting the flag when the requested size passes certain threshold, but concluded it's not related to the size of some single allocation (for example there is no glitch is the flag is not set for all allocations below 100MB, also no glitch if that comparison is inverted).

The allocations seem to range from 1024 to 134217728 bytes (no larger alloc attempts), it allocates ~1.8GB of memory in total.
Comment 12 Jason Ekstrand 2017-04-22 04:58:08 UTC
Could you please describe your graphical settings?  I'm having trouble reproducing it.  Screenshots of the two graphics settings pages would be the most effective way to describe it completely.
Comment 13 Grazvydas Ignotas 2017-04-22 16:54:55 UTC
It doesn't look like it depends on graphic settings, but I have everything on off/lowest.

I've tested some kernels today, and it looks like it's a kernel issue:
good (no corruption): 4.9.0 4.10.0-rc8+ 
bad: 4.11.0-rc5+ 4.11.0-rc7+ drm-next (6b146270)

If there are no ideas I can bisect it sometime later.
Comment 14 Jason Ekstrand 2017-04-22 22:54:25 UTC
Created attachment 130985 [details] [review]
Patch to fix 48-bit issues with scratch buffers

I was finally able to reproduce it but I needed your same screen resolution and medium settings.  Also, it didn't reproduce 100% reliably.  That said, I think I've found the issue and have attached a patch.  Mind giving it a try?
Comment 15 Grazvydas Ignotas 2017-04-23 00:58:25 UTC
That fixes it, thanks.

For me it's reproducing 100% and not only on that resolution. Only a different kernel made it go away, but I guess it's just luck of how buffers got arranged in memory.
Comment 16 Jason Ekstrand 2017-04-27 09:05:54 UTC
Fixed by the following commit in master:

commit c43b4bc85eddba8bc31665cfee5928bed8343516
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Sat Apr 22 15:51:01 2017 -0700

    anv: Don't place scratch buffers above the 32-bit boundary
    
    This fixes rendering corruptions in DOOM.  Hopefully, it will also make
    Jenkins a bit more stable as we've been seeing some random failures and
    GPU hangs ever since turning on 48bit.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620
    Fixes: 651ec926fc1 "anv: Add support for 48-bit addresses"
    Tested-by: Grazvydas Ignotas <notasas@gmail.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Cc: "17.1" <mesa-stable@lists.freedesktop.org>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.