Bug 101976 - glmark2 random blank or background only screen freeze over amdgpu rx550 AMD POLARIS12 due to dpm
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
 
Reported: 2017-07-30 23:43 UTC by Pablo Estigarribia
Modified: 2019-11-19 08:20 UTC (History)

Attachments
journal -ab log (576.40 KB, text/x-log)
2017-07-30 23:43 UTC, Pablo Estigarribia
glmark2 debug 1 (901 bytes, text/plain)
2017-07-30 23:44 UTC, Pablo Estigarribia
glxinfo (66.53 KB, text/x-log)
2017-07-30 23:44 UTC, Pablo Estigarribia

Description Pablo Estigarribia 2017-07-30 23:43:24 UTC
Created attachment 133139
journal -ab log
Comment 1 Pablo Estigarribia 2017-07-30 23:44:16 UTC
Created attachment 133140
glmark2 debug 1
Comment 2 Pablo Estigarribia 2017-07-30 23:44:59 UTC
Created attachment 133141
glxinfo
Comment 3 Pablo Estigarribia 2017-07-30 23:47:42 UTC
Fedora 26.
uname -a
Linux bpowerhome 4.12.4-300.fc26.x86_64 #1 SMP Thu Jul 27 23:09:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Tried with Mesa from the repositories (17.1.5) and from the che/mesa and che/llvm Copr repositories (17.3). The latter is the one in the logs.

Running glmark2 produces random blank screens, sometimes grey (like the lock screen).

This seems to happen in both Wayland and Xorg sessions.
Comment 4 Pablo Estigarribia 2017-07-30 23:50:00 UTC
Maybe this is the error (seen in the journal -ab log):

jul 30 20:35:22 bpowerhome kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0c984802
jul 30 20:35:22 bpowerhome kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00154D93
jul 30 20:35:22 bpowerhome kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08024001
jul 30 20:35:22 bpowerhome kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4) at page 1396115, read from 'TC0' (0x54433000) (36)
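
A rough way to pull just these lines out of a long journal is sketched below. The helper names filter_amdgpu_faults and fault_addr_to_page are made up for illustration, and the pattern only covers the message formats quoted above, so other kernel versions may need a different expression. The second helper shows that the page number in the "VM fault" line is the VM_CONTEXT1_PROTECTION_FAULT_ADDR value written in decimal.

```shell
# Read kernel log lines on stdin and print only amdgpu GPU/VM fault lines.
# The pattern is an assumption based on the four lines quoted in this comment.
filter_amdgpu_faults() {
    grep -E 'amdgpu .*(GPU fault detected|VM_CONTEXT1_PROTECTION_FAULT_(ADDR|STATUS)|VM fault)'
}

# Convert the hex FAULT_ADDR register value to the decimal page number
# that appears in the "VM fault ... at page N" line.
fault_addr_to_page() {
    printf '%d\n' "$1"
}

# Typical use on a live system (needs permission to read the kernel journal):
#   journalctl -k -b | filter_amdgpu_faults
#   fault_addr_to_page 0x00154D93
```

For the log above, fault_addr_to_page 0x00154D93 prints 1396115, which matches the page reported in the VM fault line.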
Comment 5 Pablo Estigarribia 2017-08-03 03:07:11 UTC
After reading a great many pages around the web and trying everything I could, I found a configuration that does not crash with a white/blank screen or a background-only screen.

Command tested:

MESA_GL_VERSION_OVERRIDE=2.1 MESA_GLES_VERSION_OVERRIDE=2.0 MESA_NO_ERROR=1 MESA_EXTENSION_MAX_YEAR=2014 glmark2 

There seems to be no guide for people new to debugging Mesa 3D, so I spent a lot of time searching the web for tips on what to try here.

The debugging tips page does not help newcomers to Mesa 3D debugging: https://www.mesa3d.org/debugging.html

I first tried some of the variables shown in:

https://launchpad.net/~paulo-miguel-dias/+archive/ubuntu/mesa

I also looked at the information here:

https://www.x.org/wiki/RadeonFeature/#index12h2

Then I finally ran the first test using:

https://www.mesa3d.org/envvars.html

The command above is the first one I tried, but I still do not know what I should really use.

Probably the issue is that MESA_GL or MESA_GLES is not fully implemented for the version the driver detects automatically (from glxinfo, without the variables that force the version):

    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1

Is there any documentation that could help me determine which GL version is bug-free and should be used?

I will keep trying different versions until I find the highest one that works, and will comment again.

I also got a blank screen while watching YouTube videos. It seems to be a random bug that I have not been able to identify yet, but I hope this information helps someone more familiar with Mesa 3D.
Comment 6 Pablo Estigarribia 2017-08-03 03:08:17 UTC
I forgot to add some info for newcomers:

You can also put these variables in /etc/environment so that other applications avoid the issue too.
Comment 7 Pablo Estigarribia 2017-08-04 02:05:53 UTC
I have now tried two more settings (in /etc/environment, to stabilize the GNOME session):

MESA_GL_VERSION_OVERRIDE=3.0
MESA_GL_VERSION_OVERRIDE=3.1

Up to GL 3.0, everything works and is stable, both glmark2 and the GNOME session.

With 3.1 or anything higher, glmark2 shows only black content in its window when I run MESA_GL_VERSION_OVERRIDE=3.1 glmark2 in the same session that I started with the /etc/environment setting.

If I instead set MESA_GL_VERSION_OVERRIDE=3.1 in /etc/environment, the GNOME session (after logging in through GDM) also gets stuck on the GDM background with a frozen screen.
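
Trying several override values by hand can be scripted roughly as follows. This is only a sketch: run_with_gl_override and try_gl_versions are hypothetical helper names, and the version list is an arbitrary choice for illustration. The variable is set for the launched process only, so nothing in /etc/environment has to change between runs.

```shell
# Run a command with MESA_GL_VERSION_OVERRIDE set only for that process.
run_with_gl_override() {
    version=$1
    shift
    env MESA_GL_VERSION_OVERRIDE="$version" "$@"
}

# Launch the given command once per version and report its exit status.
# A hang or blank screen does not show up as an exit code, so this only
# automates launching; the result still has to be judged by eye.
try_gl_versions() {
    for v in 2.1 3.0 3.1 3.3; do
        if run_with_gl_override "$v" "$@" >/dev/null 2>&1; then
            echo "$v: exited 0"
        else
            echo "$v: exited non-zero"
        fi
    done
}

# Example: try_gl_versions glmark2
```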
Comment 8 Pablo Estigarribia 2017-08-04 02:20:54 UTC
(In reply to Pablo Estigarribia from comment #7)

After new tests I got the error with 3.0 as well; still doing more tests.
Comment 9 Pablo Estigarribia 2017-08-25 02:44:13 UTC
In the latest test glmark2 passed, but only after changing dpm from auto to high by running, as root:

echo "high" > /sys/class/drm/card0/device/power_dpm_force_performance_level

glmark2
=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     X.Org
    GL_RENDERER:   Gallium 0.4 on AMD POLARIS12 (DRM 3.15.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
    GL_VERSION:    3.0 Mesa 17.1.7
=======================================================
[build] use-vbo=false: FPS: 3486 FrameTime: 0.287 ms
[build] use-vbo=true: FPS: 5095 FrameTime: 0.196 ms
[texture] texture-filter=nearest: FPS: 4927 FrameTime: 0.203 ms
[texture] texture-filter=linear: FPS: 5014 FrameTime: 0.199 ms
[texture] texture-filter=mipmap: FPS: 4899 FrameTime: 0.204 ms
[shading] shading=gouraud: FPS: 4960 FrameTime: 0.202 ms
[shading] shading=blinn-phong-inf: FPS: 4979 FrameTime: 0.201 ms
[shading] shading=phong: FPS: 4945 FrameTime: 0.202 ms
[shading] shading=cel: FPS: 4880 FrameTime: 0.205 ms
[bump] bump-render=high-poly: FPS: 5064 FrameTime: 0.197 ms
[bump] bump-render=normals: FPS: 4824 FrameTime: 0.207 ms
[bump] bump-render=height: FPS: 4688 FrameTime: 0.213 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 5140 FrameTime: 0.195 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 3688 FrameTime: 0.271 ms
[pulsar] light=false:quads=5:texture=false: FPS: 4306 FrameTime: 0.232 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 2557 FrameTime: 0.391 ms
[desktop] effect=shadow:windows=4: FPS: 2273 FrameTime: 0.440 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 557 FrameTime: 1.795 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 779 FrameTime: 1.284 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 636 FrameTime: 1.572 ms
[ideas] speed=duration: FPS: 1468 FrameTime: 0.681 ms
[jellyfish] <default>: FPS: 3937 FrameTime: 0.254 ms
[terrain] <default>: FPS: 636 FrameTime: 1.572 ms
[shadow] <default>: FPS: 3587 FrameTime: 0.279 ms
[refract] <default>: FPS: 1310 FrameTime: 0.763 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 4893 FrameTime: 0.204 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 4779 FrameTime: 0.209 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 4807 FrameTime: 0.208 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 4830 FrameTime: 0.207 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 4475 FrameTime: 0.223 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4424 FrameTime: 0.226 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4343 FrameTime: 0.230 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 4441 FrameTime: 0.225 ms
=======================================================
                                  glmark2 Score: 3806 
=======================================================


OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD POLARIS12 (DRM 3.15.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.1.7
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
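
The level switch at the top of this comment can be wrapped in a small helper, sketched below. The function names and the DPM_LEVEL_FILE variable are hypothetical; the sysfs path is the one quoted in the comment, and the accepted values are limited to the three that appear in this thread (auto, low, high), although the driver may accept others. Note that "sudo echo high > file" does not work from a normal user shell, because the redirection is performed by the unprivileged shell; either run the whole script as root or pipe through "sudo tee".

```shell
# Default sysfs path from the report; card0 may differ on multi-GPU systems.
DPM_LEVEL_FILE=${DPM_LEVEL_FILE:-/sys/class/drm/card0/device/power_dpm_force_performance_level}

# Write one of the levels used in this thread to the DPM control file.
# Usage: set_dpm_level LEVEL [FILE]
set_dpm_level() {
    level=$1
    file=${2:-$DPM_LEVEL_FILE}
    case $level in
        auto|low|high) ;;
        *) echo "unsupported level: $level" >&2; return 2 ;;
    esac
    if [ ! -w "$file" ]; then
        echo "cannot write $file (not root, or the file does not exist)" >&2
        return 1
    fi
    printf '%s\n' "$level" > "$file"
}

# Print the current level. Usage: get_dpm_level [FILE]
get_dpm_level() {
    cat "${1:-$DPM_LEVEL_FILE}"
}
```

Run as root, set_dpm_level high followed by get_dpm_level would be expected to print high. The setting is not persistent across reboots.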
Comment 10 Pablo Estigarribia 2017-08-25 02:48:38 UTC
Also tested the game Insurgency, which had the same problem as glmark2, and it works now!

It seems that dpm is buggy on this Radeon card. It would probably be better to disable it by default, or to set it to high performance, so that users get a more stable system.
Comment 11 Pablo Estigarribia 2017-08-27 13:05:23 UTC
Everything works fine with dpm disabled on amdgpu.

Workaround steps:

Edit /etc/default/grub.

Add amdgpu.dpm=0 to GRUB_CMDLINE_LINUX=.

Generate the new config:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Reboot (to check that the new permanent config is applied).

You will notice that /sys/class/drm/card0/device/power_dpm_force_performance_level no longer exists.

For months I thought the problem was somewhere else and tried many combinations without success. Without this workaround I had a very annoying experience working, playing videos, whatever, because the display would randomly freeze and I had to restart. I could not play any game before; I have now tested many games and everything works perfectly.

Would it be good to apply this workaround by default, so that users do not experience this annoying bug?
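
The edit to /etc/default/grub can also be done non-interactively. The sketch below assumes that GRUB_CMDLINE_LINUX is on a single line and uses double quotes; add_kernel_arg is a hypothetical helper name. It appends the argument only if it is not already present, so running it twice is harmless. Back up the file first and review the result before regenerating the GRUB config.

```shell
# Append a kernel argument to GRUB_CMDLINE_LINUX="..." in the given file.
# Usage: add_kernel_arg FILE ARG
# Assumes a single-line, double-quoted value, and an ARG without '|' or '&'.
add_kernel_arg() {
    file=$1
    arg=$2
    # Nothing to do if the argument is already on the GRUB_CMDLINE_LINUX line.
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$arg"; then
        return 0
    fi
    sed "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 $arg\"|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# On the Fedora system from this report, as root:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_arg /etc/default/grub amdgpu.dpm=0
#   grub2-mkconfig -o /boot/grub2/grub.cfg
# After rebooting, the argument should appear in the output of:
#   cat /proc/cmdline
```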
Comment 12 Marcelo "Marc" Ranolfi 2018-04-14 00:22:24 UTC
Kudos for your findings, Pablo Estigarribia.

I made a workaround available in the form of setting DPM to "Low" on system start and calling a script when you need to set it to "High". They can be found at https://gitlab.com/snippets/1709853 and https://gitlab.com/snippets/1709854.

I'm against disabling DPM altogether from the kernel driver code. That'd be far from ideal.
Comment 13 Pablo Estigarribia 2018-04-14 01:05:50 UTC
(In reply to Marcelo "Marc" Ranolfi from comment #12)

Thanks Marc!

It is great to have some feedback now.

After looking again at the files under card0, I noticed that dpm is enabled by default! (I was confused because the file changed its location.)

Now it is here:

cat /sys/class/drm/card0/power/control 
auto

uname -a
Linux powers11 4.16.1-300.fc28.x86_64 #1 SMP Mon Apr 9 15:29:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Also, the card0 directory has these entries:

/sys/class/drm/card0/
card0-DP-1/     card0-HDMI-A-1/ device/         subsystem/
card0-DVI-D-1/  dev             power/          uevent

So it is really fixed in Fedora 28!! (kernel 4.16.x)

I have been using this for weeks and have not noticed this bug!

I also tested glmark2 and a game, just to see whether something crashes.
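
To avoid mixing up the two sysfs files mentioned in this thread, a check along the following lines may help. It is a sketch with a hypothetical helper name, describe_card_power. As far as the kernel's sysfs layout goes, power/control is the generic runtime power-management attribute present on device directories, while the DPM level file quoted in the earlier comments lives under device/, so an "auto" in power/control does not by itself say anything about DPM.

```shell
# Print the DPM level and the runtime PM control value for a DRM card.
# Usage: describe_card_power [CARD_DIR]   (default: /sys/class/drm/card0)
describe_card_power() {
    card=${1:-/sys/class/drm/card0}
    dpm_file=$card/device/power_dpm_force_performance_level
    rpm_file=$card/power/control
    if [ -r "$dpm_file" ]; then
        echo "dpm level: $(cat "$dpm_file")"
    else
        echo "dpm level: not available"
    fi
    if [ -r "$rpm_file" ]; then
        echo "runtime pm control: $(cat "$rpm_file")"
    fi
}
```

If the first line reports "not available", the DPM file is missing, which is what comment 11 describes after booting with amdgpu.dpm=0.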
Comment 14 Pablo Estigarribia 2018-05-18 03:00:54 UTC
Today I reinstalled Fedora 28 and did an upgrade.

I got the 'blank' screen three times while just working in the GNOME session, same behaviour as before.

So it looks like this bug is still open.
Comment 15 Martin Peres 2019-11-19 08:20:51 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/214.