Summary: | amdgpu couldn't resume after suspend
---|---
Product: | DRI
Component: | DRM/AMDgpu
Status: | RESOLVED MOVED
Severity: | normal
Priority: | medium
Version: | XOrg git
Hardware: | Other
OS: | All
Reporter: | mikhail.v.gavrilov
Assignee: | Default DRI bug account <dri-devel>
CC: | andrey.grodzovsky, dor.askayo, FD, leio

Created attachment 141226 [details]
system log
From looking at the log, it seems your system was out of memory at the time suspend was called. I see a few user-mode apps such as Steam crashing before that, which could be related. That in turn caused GPU buffer eviction to fail during suspend, and hence the failures after resume. Please check your memory status before suspending and try to figure out when the memory exhaustion starts and under what use case. The commands described here can be used to check memory status: https://www.binarytides.com/linux-command-check-memory-usage/

Yep, you're right. But suspend mode would be totally useless if it only worked on a computer with no programs running. The point of suspend mode is to put the computer to sleep with all programs running, then wake it up and have everything continue to work. Anyway, I see that swap had enough free space to hold the full contents of RAM.

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          32158       27500        1054        1193        3603        3007
Swap:         65535        7912       57623

$ cat /proc/meminfo
MemTotal: 32930572 kB
MemFree: 1149372 kB
MemAvailable: 3127012 kB
Buffers: 28 kB
Cached: 3366532 kB
SwapCached: 1007320 kB
Active: 20999764 kB
Inactive: 3531864 kB
Active(anon): 19666712 kB
Inactive(anon): 2725324 kB
Active(file): 1333052 kB
Inactive(file): 806540 kB
Unevictable: 31468 kB
Mlocked: 31468 kB
SwapTotal: 67108860 kB
SwapFree: 59004668 kB
Dirty: 2008 kB
Writeback: 0 kB
AnonPages: 21151436 kB
Mapped: 1888740 kB
Shmem: 1222624 kB
Slab: 894752 kB
SReclaimable: 301996 kB
SUnreclaim: 592756 kB
KernelStack: 77072 kB
PageTables: 405340 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 83574144 kB
Committed_AS: 347269980 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 12864 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2207744 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 29582752 kB
DirectMap2M: 3901440 kB
DirectMap1G: 1048576 kB

$ vmstat -s
32930572 K total memory
28110600 K used memory
21000912 K active memory
3542156 K inactive memory
1140784 K free memory
28 K buffer memory
3679160 K swap cache
67108860 K total swap
8103680 K used swap
59005180 K free swap
21926506 non-nice user cpu ticks
1867047 nice user cpu ticks
4336923 system cpu ticks
101407781 idle cpu ticks
547470 IO-wait cpu ticks
452621 IRQ cpu ticks
266687 softirq cpu ticks
0 stolen cpu ticks
40223592 pages paged in
62917184 pages paged out
2325269 pages swapped in
4803989 pages swapped out
2369356089 interrupts
4293312571 CPU context switches
1535402349 boot time
398972 forks

Created attachment 141325 [details]
system log (4.19.0-0.rc1.git0.1)
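
For anyone who wants to repeat the memory check before suspending, the two questions raised above (how much RAM is still available, and whether free swap could hold all of RAM) can be sketched in Python. This is only an illustration and not part of the driver or of any tool mentioned in this report: the function names `parse_meminfo` and `suspend_memory_report` are hypothetical, and the only assumption about the input is the usual `Name: value kB` layout of /proc/meminfo.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. Lines without a numeric value are skipped.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def suspend_memory_report(fields: Dict[str, int]) -> Dict[str, object]:
    """Summarize the two figures discussed in this report (hypothetical helper)."""
    total = fields["MemTotal"]
    return {
        # Share of RAM the kernel estimates is available without swapping.
        "available_pct": round(100.0 * fields["MemAvailable"] / total, 1),
        # True if free swap alone is at least as large as all of RAM.
        "swap_free_covers_ram": fields["SwapFree"] >= total,
    }
```

On a live system the input would come from reading /proc/meminfo, for example `suspend_memory_report(parse_meminfo(open("/proc/meminfo").read()))`, run once before suspend and once after resume.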
(In reply to mikhail.v.gavrilov from comment #4)
> Created attachment 141325 [details]
> system log (4.19.0-0.rc1.git0.1)

Can you now show the memory status after suspend happened and failed? Can you also repeat the test with minimal graphics enabled (switch to the FB console, sudo xinit) and then repeat the steps to see if this still happens?

Created attachment 141331 [details]
memory status before
Created attachment 141332 [details]
memory status after
Created attachment 141333 [details]
system log
> Can you now show memory status after suspend happened and failed ?
> Can you also try repeat the test with minimal graphics enabled(switch to FB
> console, sudo xinit) and then repeat the steps to see if this still happens

(In reply to mikhail.v.gavrilov from comment #6)
> Created attachment 141331 [details]
> memory status before

# systemctl suspend

(In reply to mikhail.v.gavrilov from comment #7)
> Created attachment 141332 [details]
> memory status after

I did this in the FB console, but the result is the same:

(In reply to mikhail.v.gavrilov from comment #8)
> Created attachment 141333 [details]
> system log

Created attachment 141348 [details]
dmesg (4.19.0-0.rc1.git0.1)
I see from the log that your failure was on an order-0 allocation (1 page) in zone NORMAL, but this zone still had enough 1-page blocks, and even larger blocks, to fulfill your request, so that is strange. The only problem I see in the logs is that your free memory in zone NORMAL was lower than the min watermark, which AFAIK should have triggered kswapd to start swapping out memory. I do see you already have some pages in swap, so maybe that's it. Anyway, I can't understand from the logs why exactly that failed. Possibly some memory leaks.

Please add the output of cat /sys/kernel/debug/dri/0/amdgpu_gem_info taken immediately before and after the suspend operation, to see how much memory the driver allocated. I will try to ask people from #mm about your log.

Created attachment 141390 [details]
amdgpu_gem_info before
Created attachment 141391 [details]
amdgpu_gem_info after
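
The watermark comparison described above (free pages in zone NORMAL versus the min watermark) can also be checked directly on a running system. The following Python sketch is an illustration only, not something used in this report: the names `zone_watermarks` and `below_min` are hypothetical, and it assumes the usual /proc/zoneinfo layout, where each `Node N, zone NAME` header is followed by `pages free`, `min`, `low` and `high` lines with values counted in pages.

```python
import re
from typing import Dict, List


def zone_watermarks(zoneinfo_text: str, zone: str = "Normal") -> List[Dict[str, int]]:
    """Return one dict per node with free/min/low/high page counts for `zone`."""
    results: List[Dict[str, int]] = []
    current = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            # A new zone section starts; track it only if it is the one we want.
            current = None
            if header.group(2) == zone:
                current = {"node": int(header.group(1))}
                results.append(current)
            continue
        if current is None:
            continue
        # Matches "pages free N", "min N", "low N", "high N". Per-CPU pageset
        # lines such as "high:  378" carry a colon and are not matched.
        m = re.match(r"\s+(?:pages\s+)?(free|min|low|high)\s+(\d+)\s*$", line)
        if m and m.group(1) not in current:
            current[m.group(1)] = int(m.group(2))
    return results


def below_min(zone: Dict[str, int]) -> bool:
    """True if the zone's free page count is under its min watermark."""
    return zone["free"] < zone["min"]
```

Reading /proc/zoneinfo just before `systemctl suspend` and passing its contents to `zone_watermarks` would show whether zone NORMAL is already under its min watermark at that point.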
Created attachment 141663 [details] [review]
0001-drm-amdgpu-Allocate-UVD-FW-BO-backup-RAM-space-on-in.patch

This is just a shot in the dark, but please give it a try and see if it helps with the suspend/resume issue.

Created attachment 141666 [details]
dmesg after patch 0001
(In reply to Andrey Grodzovsky from comment #14)
> Created attachment 141663 [details] [review]
> 0001-drm-amdgpu-Allocate-UVD-FW-BO-backup-RAM-space-on-in.patch
>
> This is just a shot in the dark but please give a try - see if it helps with
> suspend resume issue.

The patch didn't help. New dmesg attached here:

(In reply to mikhail.v.gavrilov from comment #15)
> Created attachment 141666 [details]
> dmesg after patch 0001

Created attachment 143042 [details]
kernel log pre and post suspend
I still hit something like this with a 4.20 kernel. Perhaps it gives additional data points to figure it out.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/484.
Created attachment 141225 [details]
dmesg

Steps to reproduce:
1) Put the computer into suspend mode.
2) Wake up the computer with the Power button.

$ inxi -bM
System:    Host: localhost.localdomain Kernel: 4.19.0-0.rc0.git3.1.fc30.x86_64 x86_64 bits: 64 Desktop: Gnome 3.29.90 Distro: Fedora release 29 (Rawhide)
Machine:   Type: Desktop Mobo: ASUSTeK model: ROG STRIX X470-I GAMING v: Rev 1.xx serial: <root required> UEFI: American Megatrends v: 0901 date: 07/23/2018
CPU:       8-Core: AMD Ryzen 7 2700X type: MT MCP speed: 3381 MHz min/max: 2200/3700 MHz
Graphics:  Card-1: Advanced Micro Devices [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] driver: amdgpu v: kernel
           Display: wayland server: Fedora Project X.org 11.0 driver: amdgpu resolution: 3840x2160~60Hz
           OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.27.0 4.19.0-0.rc0.git3.1.fc30.x86_64 LLVM 6.0.1) v: 4.5 Mesa 18.1.5
Network:   Card-1: Intel I211 Gigabit Network driver: igb
           Card-2: Realtek RTL8822BE 802.11a/b/g/n/ac WiFi adapter driver: r8822be
Drives:    Local Storage: total: 11.35 TiB used: 4.39 TiB (38.6%)
Info:      Processes: 575 Uptime: 2h 00m Memory: 31.36 GiB used: 28.11 GiB (89.6%) Shell: bash inxi: 3.0.20