Bug 102276 - System randomly freezes, only fixed by power off and boot.
Summary: System randomly freezes, only fixed by power off and boot.
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2017-08-18 02:41 UTC by laurie
Modified: 2019-09-25 17:59 UTC

Attachments
output of dmesg (69.43 KB, text/plain)
2017-08-18 04:04 UTC, laurie
glxinfo output after installing Mesa-demo-x (78.87 KB, text/plain)
2017-08-18 04:07 UTC, laurie
glxinfo - 2nd try (101.25 KB, text/plain)
2017-08-18 05:36 UTC, laurie

Description laurie 2017-08-18 02:41:17 UTC
The system randomly freezes or locks up, sometimes completely, but more often the mouse pointer can still be moved while nothing else works. This bug appears similar to bug 100306, except that the computer cannot even be shut down: power has to be removed at the plug or the main power switch before rebooting. The freezes occur randomly, at least once per week, with the odd exception of 4 to 5 within a 24-hour period, then back to one per week.
My operating system is openSUSE Tumbleweed 64-bit, running KDE Plasma 5.10.4, KDE Frameworks 5.36.0, Qt 5.9.1 and kernel 4.11.8.2, with the following hardware:
CPU - AMD Ryzen 5 1400
Motherboard - MSI X370 Gaming Pro Carbon
RAM - 16GB Crucial Premium Memory (2 x 8GB ddr4 2133 UDIMM)
Video - Sapphire Nitro Radeon RX 460
HDD - Samsung Spinpoint 1TB HD103SJ & Western Digital 2TB WD2003FZEX
Monitors - BenQ 27" M2700HD & Dell 22" P2210t.
Power supply - Antec TP750C
There are four fans running and temperatures never exceed 35 °C (video card in this case), even when running 3D modeling or video editing programs.

To rule out hardware problems, the motherboard, RAM, power supply and CPU now installed replaced an earlier setup (Asus M5A99FX-PRO motherboard, AMD AM3 CPU, 16 GB Kingston HyperX Fury RAM, ATI Radeon HD5770 video card and Thermaltake 500W power supply), with no change in the screen freezing situation.
Comment 1 Michel Dänzer 2017-08-18 03:37:30 UTC
Please attach the Xorg log file and the output of dmesg and glxinfo.

This is more likely a Mesa or kernel issue than an Xorg driver one.
Comment 2 laurie 2017-08-18 04:04:43 UTC
Created attachment 133589 [details]
output of dmesg
Comment 3 laurie 2017-08-18 04:07:03 UTC
Created attachment 133590 [details]
glxinfo output after installing Mesa-demo-x
Comment 4 Michel Dänzer 2017-08-18 05:06:10 UTC
(In reply to laurie from comment #3)
> Created attachment 133590 [details]
> glxinfo output after installing Mesa-demo-x

This is truncated, missing the beginning of the glxinfo output.
Comment 5 laurie 2017-08-18 05:36:58 UTC
Created attachment 133591 [details]
glxinfo - 2nd try

glxinfo using gnome terminal instead of Konsole
Comment 6 Mathieu Belanger 2017-09-11 20:56:38 UTC
I might have the same bug.

It does not happen with Mesa git from mid-August, but a version from the beginning of September does it.

The system boots, the splash screen comes up, and I log in to KDE; sometimes that fails and returns to the login screen (X crash).

When it works, doing anything that requires the 3D card will sometimes work, or it will produce graphics corruption and crash the system. When games work, they usually crash before long. Even opening the KDE menu can trigger the crash; when that happens, the mouse usually works but nothing else does (I can't toggle Num Lock, for example). Using the magic keys usually works to reboot the system.

Kernel 4.12
DDX GIT
Mesa GIT
libdrm from 22 Jul (might be old, but Mesa doesn't complain when I build it).

Video card is a Polaris 10 card (RX 480)


/var/log/message gets flooded with these when I open an OpenGL context:
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b02b714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00106360
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074016, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b02b714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00106362
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074018, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b02b714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00106365
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074021, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b02b714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00106367
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074023, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b0ab714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00106369
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074025, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b0ab714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010636B
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074027, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b0ab714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010636D
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074029, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b0ab714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010636F
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074031, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b12b714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00106372
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074034, write from 'SDM0' (0x53444d30) (183)
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: GPU fault detected: 146 0x0b12b714
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00106374
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) at page 1074036, write from 'SDM0' (0x53444d30) (183)
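
Editorial note: the fault lines above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original report, that parses one "VM fault" line and shows that the decimal page number is the same value as the logged VM_CONTEXT1_PROTECTION_FAULT_ADDR, and that the hex client ID is simply the ASCII tag 'SDM0'. The helper name decode_vm_fault is hypothetical, and the 4 KiB page size used for the byte address is an assumption.

```python
import re

# Matches the "VM fault" summary line in the format shown in the log above.
VM_FAULT_RE = re.compile(
    r"VM fault \((?P<status>0x[0-9a-fA-F]+), vmid (?P<vmid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<client>[^']*)' \((?P<client_hex>0x[0-9a-fA-F]+)\)"
)

PAGE_SIZE = 4096  # assumption: the page number counts 4 KiB GPU pages


def decode_vm_fault(line):
    """Return the decoded fields of one VM fault line, or None if no match."""
    m = VM_FAULT_RE.search(line)
    if m is None:
        return None
    page = int(m.group("page"))
    client_hex = int(m.group("client_hex"), 16)
    return {
        "vmid": int(m.group("vmid")),
        "access": m.group("access"),
        "page": page,
        # Same number in hex, for comparison with the FAULT_ADDR register dump.
        "page_hex": hex(page),
        "byte_address": hex(page * PAGE_SIZE),
        "client": m.group("client"),
        # The 32-bit client ID is four ASCII bytes, most significant first.
        "client_from_hex": client_hex.to_bytes(4, "big").decode("ascii"),
    }


line = (
    "Sep 11 15:26:44 tanith kernel: amdgpu 0000:0c:00.0: VM fault (0x14, vmid 6) "
    "at page 1074016, write from 'SDM0' (0x53444d30) (183)"
)
info = decode_vm_fault(line)
print(info)
```

For the first fault in the log, page 1074016 is 0x106360, which matches the VM_CONTEXT1_PROTECTION_FAULT_ADDR value printed two lines earlier.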
Comment 7 Michel Dänzer 2017-09-12 07:43:44 UTC
(In reply to Mathieu Belanger from comment #6)
> I might have the same bug
> 
> It does not happen with Mesa git from mid-August, but a version from the
> beginning of September does it.

[...]

> Kernel 4.12

If that means amd-staging-4.12, you're probably running into bug 102500.
Comment 8 Mathieu Belanger 2017-09-14 19:41:10 UTC
(In reply to Michel Dänzer from comment #7)
> (In reply to Mathieu Belanger from comment #6)
> > I might have the same bug
> > 
> > It does not happen with Mesa git from mid-August, but a version from the
> > beginning of September does it.
> 
> [...]
> 
> > Kernel 4.12
> 
> If that means amd-staging-4.12, you're probably running into bug 102500.

Yes, I fixed the one I described by switching to 4.13-rc5 (amd-staging-next, I think).

Now I might have hit something similar to the OP: a "random" crash where the mouse was still alive. After about 5 minutes X killed itself, and I was able to get the message log, where I saw a ton of:

[drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (131072, 6, 131072, -12)

That bug was also not present with 4.12 and mid-August Mesa.
Comment 9 Mathieu Belanger 2017-09-14 19:45:40 UTC
I forgot to mention: while the crash occurred, the mouse was working but everything else was hung, and even the magic keys did nothing.

After X fell over, I tried to log in and got an error from KDE about missing OpenGL 2, so I did a full reboot. Here is the end of my message log.

[loop]
Sep 14 13:57:56 uk-dev kernel: [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (33554432, 6, 131072, -12)
Sep 14 13:57:56 uk-dev kernel: [TTM] Out of kernel memory
Sep 14 13:57:56 uk-dev kernel: [TTM] Out of kernel memory
[/loop]
Sep 14 13:57:56 uk-dev kernel: [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (33554432, 6, 131072, -12)
Sep 14 13:57:56 uk-dev kernel: [drm] Atomic commit: RESET. crtc id 0:[ffff880405ebb800]
Sep 14 13:57:56 uk-dev kernel: [drm] dc_commit_state: 0 streams
Sep 14 13:57:56 uk-dev polkitd[3633]: Unregistered Authentication Agent for unix-session:/org/freedesktop/ConsoleKit/Session1 (system bus name :1.18, object path /org/kde/PolicyKit1/AuthenticationAgent, locale en_US.utf8) (disconnected from bus)
Sep 14 13:57:56 uk-dev kernel: [drm] Atomic commit: SET crtc id 0: [ffff880405ebb800]
Sep 14 13:57:56 uk-dev kernel: [drm] dc_commit_state: 1 streams
Sep 14 13:57:56 uk-dev kernel: [drm] core_stream 0x4b321c00: src: 0, 0, 3840, 2160; dst: 0, 0, 3840, 2160, colorSpace:1
Sep 14 13:57:56 uk-dev kernel: [drm] \x09pix_clk_khz: 533250, h_total: 4000, v_total: 2222, pixelencoder:1, displaycolorDepth:2
Sep 14 13:57:56 uk-dev kernel: [drm] \x09sink name: U28E590, serial: 810373197
Sep 14 13:57:56 uk-dev kernel: [drm] \x09link: 0
Sep 14 13:57:56 uk-dev kernel: [drm] dce_get_required_clocks_state: clocks unsupported
Sep 14 13:57:56 uk-dev kernel: [drm] Link: 0 eDP panel mode supported: 0 eDP panel mode enabled: 0 
Sep 14 13:57:56 uk-dev kernel: [drm] [LKTN]\x09[DP][ConnIdx:0] HBR2x4 pass VS=2, PE=0^
Sep 14 13:57:56 uk-dev kernel: [drm] [Mode]\x09[DP][ConnIdx:0] {3840x2160, 4000x2222@533250Khz}^
Sep 14 13:58:57 uk-dev slim[21180]: pam_unix(slim:session): session opened for user destroyfx by destroyfx(uid=0)
[loop]
Sep 14 13:58:57 uk-dev kernel: [TTM] Out of kernel memory
Sep 14 13:58:57 uk-dev kernel: [TTM] Out of kernel memory
Sep 14 13:58:57 uk-dev kernel: [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (131072, 6, 131072, -12)
[/loop]
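
Editorial note: the four numbers in the "Failed to allocate GEM object" messages above can be unpacked with a short script. This is a sketch only, not part of the original report. It assumes the fields are (size in bytes, memory domain bitmask, alignment in bytes, return code) and that the domain bits are CPU = 0x1, GTT = 0x2, VRAM = 0x4; both are assumptions recalled from the amdgpu driver and are not stated anywhere in this thread. The helper name decode_gem_error is hypothetical.

```python
import errno
import re

# Matches the four comma-separated integers in the GEM allocation error.
GEM_ERR_RE = re.compile(
    r"Failed to allocate GEM object \((\d+), (\d+), (\d+), (-?\d+)\)"
)

# Assumed amdgpu memory domain bits (not confirmed by this bug report).
DOMAINS = {0x1: "CPU", 0x2: "GTT", 0x4: "VRAM"}


def decode_gem_error(line):
    """Return a readable view of one GEM allocation error, or None."""
    m = GEM_ERR_RE.search(line)
    if m is None:
        return None
    size, domain, alignment, ret = (int(g) for g in m.groups())
    return {
        "size_kib": size // 1024,
        "domains": [name for bit, name in DOMAINS.items() if domain & bit],
        "alignment_kib": alignment // 1024,
        # Kernel functions return negative errno values, so negate for lookup.
        "errno_name": errno.errorcode.get(-ret, "unknown"),
    }


big = decode_gem_error(
    "[drm:amdgpu_gem_object_create [amdgpu]] *ERROR* "
    "Failed to allocate GEM object (33554432, 6, 131072, -12)"
)
small = decode_gem_error(
    "[drm:amdgpu_gem_object_create [amdgpu]] *ERROR* "
    "Failed to allocate GEM object (131072, 6, 131072, -12)"
)
print(big)
print(small)
```

Under those assumptions, the failing requests are 32 MiB and 128 KiB buffers allowed in either GTT or VRAM, and -12 is -ENOMEM on Linux, which is consistent with the "[TTM] Out of kernel memory" lines in the same log.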
Comment 10 GitLab Migration User 2019-09-25 17:59:50 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1276.

