Bug 97840

Summary: [regression] [tonga] Freeze since new memory manager enabled
Product: DRI
Reporter: Mike Lothian <mike>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: ckoenig.leichtzumerken, mike, monk.liu, vedran
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description             Flags
Glxinfo                 none
Dmesg                   none
Working glxinfo         none
Error                   none
Glxgears corrupted      none
Dmesg with issues       none
Part #1 of the fix.     none

Description Mike Lothian 2016-09-17 11:12:35 UTC
Created attachment 126589 [details]
Glxinfo

Regression since:

drm/amdgpu: add a custom GTT memory manager v2
5b03b7a0f28d68ca4015b1b5ad27cb70641b7ebd

Nothing shows up in the dmesg. 3D games don't start; glxgears kind of starts, but the window just shows white.

The system then becomes partially unresponsive and slowly becomes totally unresponsive.

I managed to capture the freeze in glxinfo, if that helps.
Comment 1 Mike Lothian 2016-09-17 11:12:58 UTC
Created attachment 126590 [details]
Dmesg
Comment 2 Mike Lothian 2016-09-17 11:14:02 UTC
Reverting 5b03b7a0f28d68ca4015b1b5ad27cb70641b7ebd fixes this for me. I should also make clear that this is a Skylake/Tonga laptop.
Comment 3 Mike Lothian 2016-09-17 12:01:43 UTC
The amount of video memory on the card is being reported differently by the two kernels:

-    Device: AMD TONGA (DRM 3.6.0 / 4.8.0-rc1-agd5f+, LLVM 4.0.0) (0x6921)
+    Device: AMD TONGA (DRM 3.3.0 / 4.8.0-rc6-tip+, LLVM 4.0.0) (0x6921)
     Version: 12.1.0
     Accelerated: yes
-    Video memory: 4062MB
+    Video memory: 4085MB
     Unified memory: no
     Preferred profile: core (0x1)
     Max core profile version: 4.3
@@ -57,18 +57,18 @@
     Max GLES1 profile version: 1.1
     Max GLES[23] profile version: 3.1
 Memory info (GL_ATI_meminfo):
-    VBO free memory - total: 4061 MB, largest block: 4061 MB
-    VBO free aux. memory - total: 16047 MB, largest block: 16047 MB
-    Texture free memory - total: 4061 MB, largest block: 4061 MB
-    Texture free aux. memory - total: 16047 MB, largest block: 16047 MB
-    Renderbuffer free memory - total: 4061 MB, largest block: 4061 MB
-    Renderbuffer free aux. memory - total: 16047 MB, largest block: 16047 MB
+    VBO free memory - total: 4085 MB, largest block: 4085 MB
+    VBO free aux. memory - total: 4093 MB, largest block: 4093 MB
+    Texture free memory - total: 4085 MB, largest block: 4085 MB
+    Texture free aux. memory - total: 4093 MB, largest block: 4093 MB
+    Renderbuffer free memory - total: 4085 MB, largest block: 4085 MB
+    Renderbuffer free aux. memory - total: 4093 MB, largest block: 4093 MB
 Memory info (GL_NVX_gpu_memory_info):
-    Dedicated video memory: 4062 MB
-    Total available memory: 20109 MB
-    Currently available dedicated video memory: 4061 MB
+    Dedicated video memory: 4085 MB
+    Total available memory: 8178 MB
+    Currently available dedicated video memory: 4085 MB

Also, it seems glxinfo froze while printing this line:

GL_ARB_texture_cube_map_array, GL_ARB_texture_env_add,
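
The figures in the diff above can be cross-checked with a short script. This is only a sketch: the two excerpts are copied from the diff, the field names (vram, aux, total) are labels made up for illustration, and reading "aux. memory" as GTT is an assumption. For both reports the "Total available memory" figure equals video memory plus aux. memory, so the drop from 20109 MB to 8178 MB comes almost entirely from the aux. figure.

```python
import re

# Lines copied from the glxinfo diff above ('-' side and '+' side).
REPORT_A = """\
Video memory: 4062MB
VBO free aux. memory - total: 16047 MB, largest block: 16047 MB
Total available memory: 20109 MB
"""
REPORT_B = """\
Video memory: 4085MB
VBO free aux. memory - total: 4093 MB, largest block: 4093 MB
Total available memory: 8178 MB
"""

# Illustrative labels; "aux" is assumed (not confirmed here) to be GTT.
PATTERNS = {
    "vram": re.compile(r"Video memory:\s*(\d+)\s*MB"),
    "aux": re.compile(r"VBO free aux\. memory - total:\s*(\d+)\s*MB"),
    "total": re.compile(r"Total available memory:\s*(\d+)\s*MB"),
}


def parse_memory(text):
    """Return the memory figures (in MB) found in a glxinfo excerpt."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[name] = int(match.group(1))
    return result


def compare(a, b):
    """Return {field: (a_value, b_value)} for every field that differs."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


a = parse_memory(REPORT_A)
b = parse_memory(REPORT_B)
print(compare(a, b))
# Check that total == video memory + aux. memory in each report.
print(a["vram"] + a["aux"] == a["total"], b["vram"] + b["aux"] == b["total"])
```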
Comment 4 Mike Lothian 2016-09-17 12:02:02 UTC
Created attachment 126591 [details]
Working glxinfo
Comment 5 Michel Dänzer 2016-09-20 03:41:03 UTC
Mike, have you tested https://patchwork.freedesktop.org/patch/110851/ and possibly the more recent patches on https://patchwork.freedesktop.org/project/amd-xorg-ddx/patches/?submitter=11066 ?
Comment 6 Mike Lothian 2016-09-20 06:02:35 UTC
Created attachment 126643 [details]
Error

I've just tested the latest wip-4.9 branch and now X won't even start
Comment 7 Michel Dänzer 2016-09-20 06:04:34 UTC
(In reply to Mike Lothian from comment #6)
> I've just tested the latest wip-4.9 branch and now X won't even start

Can you bisect which change caused that?
Comment 8 Mike Lothian 2016-09-20 06:11:21 UTC
I'm headed off to work now, will test tonight
Comment 9 Mike Lothian 2016-09-20 17:51:17 UTC
The first bad commit is:

abd0a5ee7a25d108e9d709a4f61ef58754b60919 is the first bad commit
commit abd0a5ee7a25d108e9d709a4f61ef58754b60919
Author: Monk Liu <Monk.Liu@amd.com>
Date:   Wed Sep 14 19:10:33 2016 +0800

    drm/amdgpu:correct smc fw version error
    
    original method get wrong smc fw version.
    
    Signed-off-by: Monk Liu <Monk.Liu@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Reverting this one allows X to start again
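
For readers unfamiliar with bisecting, the search that leads to a "first bad commit" result like the one above can be sketched as a binary search. This is a simplified illustration, not how git itself is implemented: it assumes a linear history ordered oldest to newest, and the commit names, the broken_from index and the x_fails_to_start() check are hypothetical placeholders rather than anything taken from this bug.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad]


# Hypothetical history: placeholder names, not real commits from this bug.
history = [f"commit-{i}" for i in range(16)]
broken_from = 11  # hypothetical index where the regression was introduced
tested = []       # records which commits had to be built and booted


def x_fails_to_start(commit):
    """Stand-in for the manual test (build the kernel, boot, try to start X)."""
    tested.append(commit)
    return int(commit.split("-")[1]) >= broken_from


print(first_bad(history, x_fails_to_start))
print(len(tested), "commits tested out of", len(history))
```

Each step halves the range between the last known-good and first known-bad commit, so only a handful of kernels need to be built and tested even for a long history.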
Comment 10 Mike Lothian 2016-09-20 17:53:37 UTC
Even with this reverted, things aren't working correctly: if I run DRI_PRIME=1 glxgears, the image is badly corrupted.
Comment 11 Mike Lothian 2016-09-20 17:55:14 UTC
Created attachment 126667 [details]
Glxgears corrupted
Comment 12 Mike Lothian 2016-09-20 17:57:06 UTC
Created attachment 126668 [details]
Dmesg with issues

This is the dmesg, captured while the system was still semi-responsive.
Comment 13 Michel Dänzer 2016-09-21 02:07:33 UTC
Monk, any ideas? (See comment 6 and comment 9)
Comment 14 Christian König 2016-09-21 07:41:39 UTC
Created attachment 126692 [details] [review]
Part #1 of the fix.

First part of the fix. Might help with the white window, but there are clearly more bugs to fix.

Still working on this.
Comment 15 Mike Lothian 2016-11-03 13:12:55 UTC
Might as well close this now
