Bisect took me to this change, which certainly matches the behavior I am seeing:

5.1.0-rc5

commit 81da87f63a1edebcf8cbb811d387e353d9f89c7a (refs/bisect/bad)
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Tue May 21 13:08:29 2019 +0200

    drm: Replace drm_gem_vram_push_to_system() with kunmap + unpin

    The push-to-system function forces a buffer out of video RAM. This decision
    should rather be made by the memory manager. By replacing the function with
    calls to the kunmap and unpin functions, the buffer's memory becomes
    available, but the buffer remains in VRAM until it's evicted by a pin
    operation.

    This patch replaces the remaining instances of drm_gem_vram_push_to_system()
    in ast and mgag200, and removes the function from DRM.

My first impression is that we need a method that restores the previous behavior that pushes the content to the device.

I found this issue using gnome-desktop3-3.28.2-1.el8.x86_64. If there is a more specific RPM I can look at for guidance, I will.
Created attachment 145949 [details]
dmesg and message file on bisected kernel

Starting GNOME: see messages for "starting gnome" and "Stopping gnome".
debugfs content, with GNOME running:

# for f in `find . -type f ` ; do
> echo "$f : `cat $f` "
> done
./VGA-1/edid_override :
./VGA-1/force : unspecified
./internal_clients :
./framebuffer : framebuffer[35]:
    allocated by = Xorg
    refcount=2
    format=XR24 little-endian (0x34325258)
    modifier=0x0
    size=1024x768
    layers:
        size[0]=1024x768
        pitch[0]=4096
        offset[0]=0
        obj[0]:(null)
framebuffer[34]:
    allocated by = [fbcon]
    refcount=1
    format=XR24 little-endian (0x34325258)
    modifier=0xb7e2c74500000010
    size=1024x768
    layers:
        size[0]=1024x768
        pitch[0]=4096
        offset[0]=4294967295
        obj[0]:(null)
./gem_names : name size handles refcount
./clients : command pid dev master a uid magic
    systemd-logind 1563 0 y y 0 0
./name : mgag200 dev=0000:3d:00.0 unique=0000:3d:00.0
Created attachment 145950 [details]
Running startx on the console

This likely doesn't help much. On a 4.18 kernel, when I run "startx" on the console, it eventually starts GNOME. On the bad kernel, I just see X11 noise, then nothing.
FTR, the affected machine has 8 MiB of video ram.
The only fishy thing I'm seeing is that the fbcon framebuffer seems to be decent nonsense in the debugfs file:

modifier=0xb7e2c74500000010 <- this should be 0
offset[0]=4294967295 <- this is (uint_t)-1, should be 0
Can you pls attach the full boot log for the previous kernel (the one that worked, i.e. 982c0500fd1a ("dt-bindings: gpu: add #cooling-cells property to the ARM Mali Midgard GPU binding"))? I'm trying to spot anything that's different.

The only thing I can think of is that the offset programming is botched, and implicitly relied on the previous buffer getting thrown out. Now that we don't do that anymore (both buffers for fbcon and Xorg fit together), we still scan out whatever is at offset 0 in vram, which happens to be fbcon.

Thomas, does mgag200 work for you if you pick a resolution at boot (with video=) so that 2 buffers fit?
Booted: 982c0500fd1a ("dt-bindings: gpu: add #cooling-cells property to the ARM Mali Midgard GPU binding")

With GNOME running:

for f in `find . -type f ` ; do echo "$f : `cat $f` " ; done
./0/VGA-1/edid_override :
./0/VGA-1/force : unspecified
./0/internal_clients :
./0/framebuffer : framebuffer[35]:
    allocated by = Xorg
    refcount=2
    format=XR24 little-endian (0x34325258)
    modifier=0x0
    size=1024x768
    layers:
        size[0]=1024x768
        pitch[0]=4096
        offset[0]=0
        obj[0]:(null)
framebuffer[34]:
    allocated by = [fbcon]
    refcount=1
    format=XR24 little-endian (0x34325258)
    modifier=0xffff8fff00000010
    size=1024x768
    layers:
        size[0]=1024x768
        pitch[0]=4096
        offset[0]=4294967295
        obj[0]:(null)
./0/gem_names : name size handles refcount
./0/clients : command pid dev master a uid magic
    systemd-logind 1569 0 y y 0 0
./0/name : mgag200 dev=0000:3d:00.0 unique=0000:3d:00.0

dmesg.2 and message.2 will be attached shortly.
Created attachment 145956 [details] dmesg and message for comment 7 For comment 7; booted : 982c0500fd1a ("dt-bindings: gpu: add #cooling-cells property to the ARM Mali Midgard GPU binding"))
Looking at the changes for 81da87f63a1edebcf8cbb811d387e353d9f89c7a in drivers/gpu/drm/mgag200/mgag200_mode.c, there are explicit changes for the console in two places:

mga_crtc_do_set_base()

+               /* unmap if console */
+               if (&mdev->mfbdev->mfb == mga_fb)
+                       drm_gem_vram_kunmap(gbo);
+               drm_gem_vram_unpin(gbo);
        }

That looks suspicious.

What is the difference between going from text mode, where the screen is a 24x80 ASCII terminal (I believe it was referred to as "vga" mode), to graphics mode? It appears the "frame buffers" may not be getting set up right after the switch, or the lower-level mgag200 driver is not properly detecting where to retrieve the data to display from.
I added to mga_crtc_do_set_base():

    DRM_DEBUG_KMS("jpd - setting start addr %p \n",(u32)gpu_addr );
    mga_set_start_address(crtc, (u32)gpu_addr);

And in the trace I see:

[ 629.004322] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 629.004330] [drm:drm_mode_debug_printmodeline [drm]] Modeline "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[ 629.004333] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [CRTC:31:crtc-0]
[ 629.078168] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr (null)

gpu_addr == 0 ; ( null )

In text mode I see:

[ 595.057604] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CRTC:31:crtc-0] [FB:35] #connectors=1 (x y) (0 0)
[ 595.057609] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CONNECTOR:33:VGA-1] to [CRTC:31:crtc-0]
[ 595.080068] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 00000000c79c76db
wrt comment 10:

Using the "functional" kernel that works, I see gpu_addr is always zero:

    DRM_DEBUG_KMS("jpd - setting start addr 0x%x \n",(u32)gpu_addr );
    mga_set_start_address(crtc, (u32)gpu_addr);

[ 229.249797] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CONNECTOR:33:VGA-1] to [CRTC:31:crtc-0]
[ 229.566570] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0

and starting GNOME:

[ 364.268009] [drm:drm_mode_debug_printmodeline [drm]] Modeline "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[ 364.268012] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [CRTC:31:crtc-0]
[ 364.268504] [drm:drm_ioctl [drm]] pid=1570, dev=0xe200, auth=1, DRM_IOCTL_DROP_MASTER
[ 364.376192] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
(In reply to John.p.donnelly from comment #9)
> What it is the difference between going from text mode where the screen is
> 24x80 ascii terminal mode - I believe if was referred to as "vga" mode, to
> graphics mode ? It appears the "frame buffers" may not be getting setup
> right after the switch, or the lower-level mgag200 driver is not properly
> detecting where to retrieve the data to display from.

You're not running in classic vga mode when in text mode, e.g. from your dmesg (there's more stuff in there that shows the vga -> mgag200 transition):

[ 5.144662] fbcon: mgag200drmfb (fb0) is primary device
[ 5.144716] Console: switching to colour frame buffer device 128x48

Your "text" mode is actually the fbcon console on top of the mgag200drmfb fbdev emulation on top of the mgag200 drm driver. So in "text mode" the drm driver is already running, and clearly it seems to work (somewhat at least). But when X boots and allocates its own framebuffer memory, somehow the switch to that new buffer is broken.

Now with your little experiment there are two strange things:

- I'd expect the graphical start address to be non-zero (for the broken kernel, working kernel has both 0), but per your description it's the other way round?
- The address looks corrupted. You need to print it as %u (it's a u32, not a pointer), right now it looks way too big.

Another experiment: On the working kernel, can you try to program an offset start address like this:

    mga_set_start_address(crtc, (u32)gpu_addr + 1024*1024);

That should result in the entire console/gnome being moved up about 1/3rd of the screen, with possibly garbage at the bottom third.

Finally can you pls attach the output of lspci -nn and what's in /proc/iomem? The address you have suspiciously looks like a cpu address, not a gpu address for the framebuffer ...
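The "about 1/3rd of the screen" estimate for the 1024*1024 offset can be checked with a little arithmetic, using the pitch that debugfs reports for the 1024x768 XR24 framebuffer (pitch[0]=4096). The sketch below is plain user-space C, not driver code, and the helper name scanlines_skipped() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of mgag200): number of whole scanlines
 * skipped when the scanout start address is advanced by `offset` bytes.
 * `pitch` is the number of bytes per scanline.
 */
static uint32_t scanlines_skipped(uint32_t offset, uint32_t pitch)
{
	assert(pitch != 0);
	return offset / pitch;
}
```

With offset = 1024*1024 and pitch = 4096 this gives 256 scanlines, and 256 of 768 visible lines is exactly one third of the screen.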
Created attachment 145958 [details] lspci -nn -vv -l ; and /proc/iomem lspci and iomem summary for comment 12;
(In reply to John.p.donnelly from comment #13)
> Created attachment 145958 [details]
> lspci -nn -vv -l ; and /proc/iomem
>
> lspci and iomem summary for comment 12;

Huh, 0x799c76db is nowhere to be found. Can you pls try to re-grab the gpu addresses, but with the 0x%x modifier, not %p, on the broken kernel?
Sorry -- my bad.

wrt comment 13:

1. Using 0x%x or 0x%u, I get 0 as the gpu_addr on the "working" kernel:

[ 13.337980] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 1005.166675] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0

So %u or %x is 0.

2. On the "bad" kernel:

    DRM_DEBUG_KMS("jpd - setting start addr %u \n",(u32)gpu_addr );

in text mode:

[ 11.687192] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0

Switching to graphics:

[ 96.193135] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 3145728

3. With

    DRM_DEBUG_KMS("jpd - setting start addr 0x%x \n",(u32)gpu_addr );

text:

[ 5.249018] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0

graphics:

[ 67.078407] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x300000

3145728 == 0x300000, i.e. 3 MB.

QUESTIONS:

1. It appears the gpu_addr of 0x300000 (3 MB) is the offset into the adapter. I see in mga_set_start_address() that it is being used to set registers, so I assume it is an offset into the video RAM of the adapter.

2. "But when X boots and allocates its own framebuffer memory, somehow the switch to that new buffer is broken." Where / how can I track that address down? Is there something in the DRM tracing that will show that?

3. I feel our best bet to track this down is at the breakage point with commit 81da87f63a1edebcf, not at the tip, because debugging at the initial breakage is the lowest common denominator, even though the DRM framework has changed since.

--
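The 0x300000 start address happens to equal the size of one 1024x768 XR24 framebuffer as reported in debugfs (pitch[0]=4096, 768 lines). That would be consistent with the Xorg buffer sitting directly behind another buffer of that size at offset 0, and with two such buffers fitting into the 8 MiB of video RAM. The thread does not confirm how the memory manager actually lays buffers out, so the sketch below is only a plausibility check; both helper names are hypothetical and this is user-space C, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for one framebuffer with `height`
 * scanlines of `pitch` bytes each. Assumes the product fits in 32 bits,
 * which holds for the sizes discussed here.
 */
static uint32_t fb_size_bytes(uint32_t pitch, uint32_t height)
{
	assert(pitch != 0 && height != 0);
	return pitch * height;
}

/* Hypothetical helper: how many whole buffers of `fb_bytes` fit in VRAM. */
static uint32_t fb_count_in_vram(uint32_t vram_bytes, uint32_t fb_bytes)
{
	assert(fb_bytes != 0);
	return vram_bytes / fb_bytes;
}
```

For pitch 4096 and height 768, fb_size_bytes() returns 3145728 (0x300000), and fb_count_in_vram(8 * 1024 * 1024, 3145728) returns 2.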
On a good kernel, mode switch to graphics:

[ 4898.928861] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 4898.928869] [drm:drm_mode_debug_printmodeline [drm]] Modeline "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[ 4898.928873] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [CRTC:31:crtc-0]
[ 4899.036466] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 4899.040425] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [ENCODER:32:DAC-32] set [MODE:1024x768]
[ 4899.145209] [drm:drm_crtc_helper_set_config [drm_kms_helper]] Setting connector DPMS state to on
[ 4899.145213] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CONNECTOR:33:VGA-1] set DPMS on

I added a backtrace at the point where I set the address:

[ 129.268844] [drm:drm_ioctl [drm]] pid=2311, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
[ 129.268850] [drm:drm_mode_setcrtc [drm]] [CRTC:31:crtc-0]
[ 129.268859] [drm:drm_mode_setcrtc [drm]] [CONNECTOR:33:VGA-1]
[ 129.268863] [drm:drm_crtc_helper_set_config [drm_kms_helper]]
[ 129.268877] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CRTC:31:crtc-0] [FB:35] #connectors=1 (x y) (0 0)
[ 129.268881] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CONNECTOR:33:VGA-1] to [CRTC:31:crtc-0]
[ 129.290732] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x300000
[ 129.296487] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x16
[ 129.296488] CPU: 2 PID: 2311 Comm: Xorg Not tainted 5.1.0-rc5-g81da87f63a1e-dirty #32
[ 129.296489] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018
[ 129.296489] Call Trace:
[ 129.296492] dump_stack+0x63/0x8a
[ 129.296494] mga_crtc_do_set_base.isra.6.constprop.16+0x21c/0x290 [mgag200]
[ 129.296495] mga_crtc_mode_set_base+0x11/0x20 [mgag200]
[ 129.296499] drm_crtc_helper_set_config+0x50c/0x960 [drm_kms_helper]
[ 129.296507] __drm_mode_set_config_internal+0x83/0x150 [drm]
[ 129.296514] drm_mode_setcrtc+0x57a/0x780 [drm]
[ 129.296520] ? drm_ioctl+0x177/0x410 [drm]
[ 129.296527] ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
[ 129.296533] drm_ioctl_kernel+0xb0/0x100 [drm]
[ 129.296539] drm_ioctl+0x233/0x410 [drm]
[ 129.296545] ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
[ 129.296547] do_vfs_ioctl+0xa9/0x640
[ 129.296548] ? __audit_syscall_entry+0xdd/0x130
[ 129.296550] ? handle_mm_fault+0xe1/0x210
[ 129.296552] ksys_ioctl+0x67/0x90

(init 5 starts graphics mode)

1. On a GOOD KERNEL, booting to init 3, then init 5, then back to init 3, I see 3 "set mode from user space" events:

# egrep "attempt|jpd - setting" good
[ 13.459004] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 13.554237] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 3357.030214] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 3371.276997] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 3371.383755] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 4872.079795] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 4898.928861] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 4899.036466] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[root@ca-dev55 ~]#

On the BAD kernel I am missing one of the set mode events. I see ONLY 2 "set mode from user space" events:

egrep "attempt|jpd - setting" bad
[ 13.449488] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 13.545231] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 13.547980] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x10
[ 129.290732] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x300000
[ 129.296487] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x16
[ 164.129553] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 164.203222] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 164.207498] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x10

Specifically, the missing one is the event before we set the address to 0x300000, which is the switch to graphics mode.

I will debug this more tomorrow.
Ok all the stuff you've done looks correct now and as expected.

(In reply to Daniel Vetter from comment #12)
> mga_set_start_address(crtc, (u32)gpu_addr + 1024*1024);
>
> That should result in the entire console/gnome being moved up about 1/3rd of
> the screen, with possibly garbage at the bottom third.
>
> Finally can you pls attach the output of lspci -nn and what's in
> /proc/iomem? The address you have suspiciously looks like a cpu address, not
> a gpu address for the framebuffer ...

This one still needs to be done. I'm suspecting that something with the base address doesn't work.

Also question on your setup: Are you showing the screen through the management console of the machine itself, or does this go through some external connector?
Hi Daniel,

Ignore my notes in comment 16 regarding "set mode from user space"; false alarm. I've been slowly looking at the DRM debug logs, trying to learn the behavior.

---

wrt comment 17:

1. The contents of lspci and /proc/iomem are in attachment #4 [details] [review]. Since iomem is not that large, it is shown below this comment.

2. "This one still needs to be done":

    mga_set_start_address(crtc, (u32)gpu_addr + 1024*1024);

No difference in display. The vga/text mode appearance was fine with the offset being 0x100000, which is kind of puzzling. When graphics mode was started, the offset used was 0x400000; no GNOME splash screen was seen.

3. The mgag200 video device is embedded on the motherboard of a variety of server-class machines as a remote console, without a physical video output on an edge connector to attach a monitor to, so I guess the answer is: "remote management".

4. As noted below, I see the PCI space used for the device is:

c5000000-c68fffff : PCI Bus 0000:3d
  c5000000-c5ffffff : 0000:3d:00.0
    c5000000-c5ffffff : mgadrmfb_vram
  c6000000-c67fffff : 0000:3d:00.0
  c6810000-c6813fff : 0000:3d:00.0
    c6810000-c6813fff : mgadrmfb_mmio

How is that reflected in the frame-buffer usage?
=========================== /proc/iomem : cat /proc/iomem 00000000-00000fff : Reserved 00001000-00099bff : System RAM 00099c00-0009ffff : Reserved 000a0000-000bffff : PCI Bus 0000:00 000c0000-000c7fff : Video ROM 000c8000-000cf9ff : Adapter ROM 000d0000-000d0fff : Adapter ROM 000d1000-000d1fff : Adapter ROM 000d2000-000d2fff : Adapter ROM 000d3000-000d3fff : Adapter ROM 000d4000-000d4fff : Adapter ROM 000e0000-000fffff : Reserved 000f0000-000fffff : System ROM 00100000-778c3fff : System RAM 778c4000-792f1fff : Reserved 78e57018-78e57018 : APEI ERST 78e5701c-78e57021 : APEI ERST 78e57028-78e57039 : APEI ERST 78e57040-78e5704c : APEI ERST 78e57050-78e5904f : APEI ERST 792f2000-7932cfff : ACPI Tables 7932d000-798fffff : ACPI Non-volatile Storage 79900000-7bd4cfff : Reserved 7bd4d000-7bd57fff : System RAM 7bd58000-7bd58fff : Reserved 7bd59000-7bd5bfff : System RAM 7bd5c000-7bd5cfff : Reserved 7bd5d000-7bd5dfff : System RAM 7bd5e000-7bde3fff : Reserved 7bde4000-7bffffff : System RAM 80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff] 80000000-8fffffff : Reserved 90000000-c7ffbfff : PCI Bus 0000:00 c4400000-c48fffff : PCI Bus 0000:3a c4400000-c45fffff : 0000:3a:00.1 c4400000-c45fffff : ixgbe c4600000-c47fffff : 0000:3a:00.0 c4600000-c47fffff : ixgbe c4800000-c4803fff : 0000:3a:00.1 c4800000-c4803fff : ixgbe c4804000-c4807fff : 0000:3a:00.0 c4804000-c4807fff : ixgbe c4a00000-c4efffff : PCI Bus 0000:03 c4a00000-c4bfffff : 0000:03:00.1 c4a00000-c4bfffff : ixgbe c4c00000-c4dfffff : 0000:03:00.0 c4c00000-c4dfffff : ixgbe c4e00000-c4e03fff : 0000:03:00.1 c4e00000-c4e03fff : ixgbe c4e04000-c4e07fff : 0000:03:00.0 c4e04000-c4e07fff : ixgbe c5000000-c68fffff : PCI Bus 0000:3d c5000000-c5ffffff : 0000:3d:00.0 c5000000-c5ffffff : mgadrmfb_vram c6000000-c67fffff : 0000:3d:00.0 c6810000-c6813fff : 0000:3d:00.0 c6810000-c6813fff : mgadrmfb_mmio c6900000-c6dfffff : PCI Bus 0000:03 c6900000-c697ffff : 0000:03:00.1 c6980000-c69fffff : 0000:03:00.0 c6a00000-c6afffff : 0000:03:00.1 
c6b00000-c6bfffff : 0000:03:00.1 c6c00000-c6cfffff : 0000:03:00.0 c6d00000-c6dfffff : 0000:03:00.0 c6e00000-c71fffff : PCI Bus 0000:3a c6e00000-c6efffff : 0000:3a:00.1 c6f00000-c6ffffff : 0000:3a:00.1 c7000000-c70fffff : 0000:3a:00.0 c7100000-c71fffff : 0000:3a:00.0 c7200000-c74fffff : PCI Bus 0000:23 c7200000-c72fffff : 0000:23:00.0 c7300000-c73fffff : 0000:23:00.0 c7400000-c740ffff : 0000:23:00.0 c7400000-c740ffff : megasas: LSI c7500000-c75007ff : 0000:00:1f.2 c7500000-c75007ff : ahci c7501000-c75013ff : 0000:00:1d.0 c7501000-c75013ff : ehci_hcd c7502000-c75023ff : 0000:00:1a.0 c7502000-c75023ff : ehci_hcd c7504000-c7504fff : 0000:00:05.4 c7ffc000-c7ffcfff : dmar1 c8000000-fbffbfff : PCI Bus 0000:80 f2000000-f5ffffff : PCI Bus 0000:90 f2000000-f5ffffff : PCI Bus 0000:91 f2000000-f2ffffff : PCI Bus 0000:98 f3000000-f3ffffff : PCI Bus 0000:96 f4000000-f4ffffff : PCI Bus 0000:94 f5000000-f5ffffff : PCI Bus 0000:92 f6000000-f64fffff : PCI Bus 0000:82 f6000000-f61fffff : 0000:82:00.1 f6000000-f61fffff : ixgbe f6200000-f63fffff : 0000:82:00.0 f6200000-f63fffff : ixgbe f6400000-f6403fff : 0000:82:00.1 f6400000-f6403fff : ixgbe f6404000-f6407fff : 0000:82:00.0 f6404000-f6407fff : ixgbe f7000000-faffffff : PCI Bus 0000:90 f7000000-faffffff : PCI Bus 0000:91 f7000000-f7ffffff : PCI Bus 0000:98 f8000000-f8ffffff : PCI Bus 0000:96 f9000000-f9ffffff : PCI Bus 0000:94 fa000000-faffffff : PCI Bus 0000:92 fb000000-fb3fffff : PCI Bus 0000:82 fb000000-fb0fffff : 0000:82:00.1 fb100000-fb1fffff : 0000:82:00.1 fb200000-fb2fffff : 0000:82:00.0 fb300000-fb3fffff : 0000:82:00.0 fb400000-fb400fff : 0000:80:05.4 fbffc000-fbffcfff : dmar0 fec00000-fecfffff : PNP0003:00 fec00000-fec003ff : IOAPIC 0 fec01000-fec013ff : IOAPIC 1 fec40000-fec403ff : IOAPIC 2 fed00000-fed003ff : HPET 0 fed00000-fed003ff : PNP0103:00 fed12000-fed1200f : pnp 00:01 fed12010-fed1201f : pnp 00:01 fed1b000-fed1bfff : pnp 00:01 fed1c000-fed1ffff : Reserved fed1f410-fed1f414 : iTCO_wdt.0.auto fed45000-fed8bfff : pnp 
00:01 fee00000-feefffff : pnp 00:01 fee00000-fee00fff : Local APIC ff000000-ffffffff : Reserved ff000000-ffffffff : pnp 00:01 100000000-607fffffff : System RAM 2c0000000-2c0c00e10 : Kernel code 2c0c00e11-2c141683f : Kernel data 2c169e000-2c23fffff : Kernel bss 380000000000-383fffffffff : PCI Bus 0000:00 383ffff00000-383ffff03fff : 0000:00:04.7 383ffff00000-383ffff03fff : ioatdma 383ffff04000-383ffff07fff : 0000:00:04.6 383ffff04000-383ffff07fff : ioatdma 383ffff08000-383ffff0bfff : 0000:00:04.5 383ffff08000-383ffff0bfff : ioatdma 383ffff0c000-383ffff0ffff : 0000:00:04.4 383ffff0c000-383ffff0ffff : ioatdma 383ffff10000-383ffff13fff : 0000:00:04.3 383ffff10000-383ffff13fff : ioatdma 383ffff14000-383ffff17fff : 0000:00:04.2 383ffff14000-383ffff17fff : ioatdma 383ffff18000-383ffff1bfff : 0000:00:04.1 383ffff18000-383ffff1bfff : ioatdma 383ffff1c000-383ffff1ffff : 0000:00:04.0 383ffff1c000-383ffff1ffff : ioatdma 383ffff20000-383ffff200ff : 0000:00:1f.3 383ffff21000-383ffff2100f : 0000:00:16.1 383ffff22000-383ffff2200f : 0000:00:16.0 384000000000-387fffffffff : PCI Bus 0000:80 387ffff00000-387ffff03fff : 0000:80:04.7 387ffff00000-387ffff03fff : ioatdma 387ffff04000-387ffff07fff : 0000:80:04.6 387ffff04000-387ffff07fff : ioatdma 387ffff08000-387ffff0bfff : 0000:80:04.5 387ffff08000-387ffff0bfff : ioatdma 387ffff0c000-387ffff0ffff : 0000:80:04.4 387ffff0c000-387ffff0ffff : ioatdma 387ffff10000-387ffff13fff : 0000:80:04.3 387ffff10000-387ffff13fff : ioatdma 387ffff14000-387ffff17fff : 0000:80:04.2 387ffff14000-387ffff17fff : ioatdma 387ffff18000-387ffff1bfff : 0000:80:04.1 387ffff18000-387ffff1bfff : ioatdma 387ffff1c000-387ffff1ffff : 0000:80:04.0 387ffff1c000-387ffff1ffff : ioatdma ======= lspci -s 3d:00.0 -vvv -k 3d:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. 
MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller]) Subsystem: Oracle/SUN Device 4852 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 NUMA node: 0 Region 0: Memory at c5000000 (32-bit, non-prefetchable) [size=16M] Region 1: Memory at c6810000 (32-bit, non-prefetchable) [size=16K] Region 2: Memory at c6000000 (32-bit, non-prefetchable) [size=8M] Expansion ROM at 000c0000 [disabled] [size=128K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset- DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit- Address: 00000000 Data: 0000 Kernel driver in use: mgag200 Kernel modules: mgag200
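On the question of how the PCI ranges relate to the start address: assuming gpu_addr really is an offset into the mgadrmfb_vram aperture (region 0 at c5000000, size 16M in the lspci output above), which is how mga_set_start_address() is read earlier in this thread, the matching bus address would simply be the aperture base plus the offset. The helper below is a hypothetical user-space sketch of that arithmetic, not something taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate an offset into a VRAM aperture to a bus
 * address. Returns 0 if the offset lies outside the aperture. The base and
 * size are whatever lspci or /proc/iomem report for the VRAM region.
 */
static uint64_t vram_bus_addr(uint64_t bar_base, uint64_t bar_size,
			      uint64_t offset)
{
	assert(bar_size != 0);
	if (offset >= bar_size)
		return 0;
	return bar_base + offset;
}
```

Under that assumption, the 0x300000 offset seen on the bad kernel would correspond to bus address 0xc5300000, inside the c5000000-c5ffffff range, while an offset of 0 maps to the start of the aperture.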
wrt comment 17: the mgag device is integrated into an Aspeed BMC device.
It appears to me that the drm_gem_vram_pin() call in mga_crtc_do_set_base() in drivers/gpu/drm/mgag200/mgag200_mode.c is not working as expected. Would it be worthwhile to restore this to the previous operation?

The BMC component that contains this video device is used on a large variety of server systems with remote console / remote management. I am concerned that I discovered this early and that other vendors have not yet used 5.1-rc5 (this change was made in May 2019).
I reverted the offending 81da87f63a1edebcf8cbb811d387e353d9f89c7a changes only in the mgag200 file mgag200_mode.c, and added the removed function drm_gem_vram_push_to_system() to the same file, and the graphics work. Minimal change.

The "offset" address that is passed to mga_crtc_do_set_base() is 0 again. That seems suspicious. As noted in comment 16, why is the offset 3 MB (0x300000) in the failing mode, and why does simply reverting minor modifications in mode.c change it?

Looking at the DRM logs using the tip, I get the same 0x300000 (3 MB) offset.
If I replace drm_gem_vram_unpin(gbo) with the older drm_gem_vram_push_to_system(gbo) at the tip (v5.4.0-rc6), I get a GNOME login.

 static int mga_crtc_do_set_base(struct drm_crtc *crtc,
                                 struct drm_framebuffer *fb,
                                 int x, int y, int atomic)
@@ -866,7 +954,8 @@ static int mga_crtc_do_set_base(struct drm_crtc *crtc,

        if (!atomic && fb) {
                gbo = drm_gem_vram_of_gem(fb->obj[0]);
-               drm_gem_vram_unpin(gbo);
+               // drm_gem_vram_unpin(gbo);
+               drm_gem_vram_push_to_system(gbo);
        }
Hi John,

thank you so much for debugging this problem. I was OoO on Friday and now I have to set up my mgag200 machine anew, so I'm somewhat slow to respond ATM.

(In reply to John.p.donnelly from comment #22)
> If I replace drm_gem_vram_unpin(gbo) with the older
> drm_gem_vram_push_to_system(gbo)
> at the tip. ( v5.4.0-rc6) I get a GNOME login.
>
>
> static int mga_crtc_do_set_base(struct drm_crtc *crtc,
>                                 struct drm_framebuffer *fb,
>                                 int x, int y, int atomic)
> @@ -866,7 +954,8 @@ static int mga_crtc_do_set_base(struct drm_crtc *crtc,
>
>         if (!atomic && fb) {
>                 gbo = drm_gem_vram_of_gem(fb->obj[0]);
> -               drm_gem_vram_unpin(gbo);
> +               // drm_gem_vram_unpin(gbo);
> +               drm_gem_vram_push_to_system(gbo);
>         }

Some context to this code: push_to_system() explicitly kicked the buffer out of video memory (into system memory). But we don't want to do this in the driver. Evicting buffers is a decision that should be made by the memory manager. Therefore, we only unpin the buffer and leave evicting it to the memory manager when the memory is actually required.

After reviewing the code for unpin(), I think this doesn't work. Buffer objects are never marked for being located in system memory.
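The difference between the two release paths can be illustrated with a toy model. Everything in it is an assumption made for illustration: the toy_* names, the first-fit placement and the evict-on-demand rule are not the TTM or GEM VRAM helper implementation. It only mirrors the behavior described in this thread: with push_to_system() the old scanout buffer leaves video memory at once, while with a plain unpin it stays resident, so the next buffer is placed behind it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy buffer object: just enough state to show residency and pinning. */
struct toy_bo {
	uint32_t size;    /* bytes */
	uint32_t offset;  /* offset in toy VRAM, meaningful while in_vram */
	bool in_vram;
	int pin_count;
};

/* Toy VRAM: a fixed-size range tracking up to two buffers. */
struct toy_vram {
	uint32_t size;
	struct toy_bo *bos[2];
};

/* First-fit: lowest offset that overlaps no other resident buffer. */
static bool toy_place(struct toy_vram *v, struct toy_bo *bo)
{
	uint32_t start = 0;
	bool moved;

	if (bo->size > v->size)
		return false;
	do {
		moved = false;
		for (int i = 0; i < 2; i++) {
			const struct toy_bo *o = v->bos[i];

			if (!o || o == bo || !o->in_vram)
				continue;
			if (start < o->offset + o->size &&
			    o->offset < start + bo->size) {
				start = o->offset + o->size; /* skip past it */
				moved = true;
			}
		}
	} while (moved && start <= v->size - bo->size);
	if (start > v->size - bo->size)
		return false;
	bo->offset = start;
	bo->in_vram = true;
	return true;
}

/* Pin: make the buffer resident, evicting unpinned buffers only if needed. */
static int toy_pin(struct toy_vram *v, struct toy_bo *bo)
{
	if (!bo->in_vram && !toy_place(v, bo)) {
		for (int i = 0; i < 2; i++)
			if (v->bos[i] && v->bos[i]->pin_count == 0)
				v->bos[i]->in_vram = false; /* evict on demand */
		if (!toy_place(v, bo))
			return -1;
	}
	bo->pin_count++;
	return 0;
}

/* Unpin only: the buffer stays where it is until the space is needed. */
static void toy_unpin(struct toy_bo *bo)
{
	if (bo->pin_count > 0)
		bo->pin_count--;
}

/* Push to system: unpin and give up the VRAM range immediately. */
static void toy_push_to_system(struct toy_bo *bo)
{
	toy_unpin(bo);
	if (bo->pin_count == 0)
		bo->in_vram = false;
}

/* Returns the toy VRAM offset at which the second (Xorg) buffer is pinned. */
static uint32_t toy_xorg_offset(bool use_push_to_system)
{
	struct toy_bo fbcon = { .size = 0x300000 };
	struct toy_bo xorg = { .size = 0x300000 };
	struct toy_vram vram = { .size = 8u << 20, .bos = { &fbcon, &xorg } };
	int ret;

	ret = toy_pin(&vram, &fbcon);   /* console scanout buffer */
	assert(ret == 0);
	if (use_push_to_system)
		toy_push_to_system(&fbcon);
	else
		toy_unpin(&fbcon);
	ret = toy_pin(&vram, &xorg);    /* new scanout buffer */
	assert(ret == 0);
	(void)ret;
	return xorg.offset;
}
```

In this model toy_xorg_offset(true) yields 0 and toy_xorg_offset(false) yields 0x300000, the same two start addresses that show up in the logs of the working and the broken kernel. Whether the hardware then scans out from the non-zero offset correctly is a separate question that the model says nothing about.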
Created attachment 145986 [details] [review] drm/vram: Mark BO for VRAM and SYSTEM placement if pin count is zero John, could you please remove the push_to_system() call, restore the unpin() call, apply the attached patch, and report back about the results? After the final unpin, the buffer now gets marked for being located in video or system memory.
Hello Thomas.

Thank you for helping out.

wrt comment 24:

I manually applied your patch to a fresh, clean 5.4.0-rc7 tip and I am still seeing the same behavior: no graphics are seen when GNOME starts.

31f4f5b495a6 2019-11-10 | Linux 5.4-rc7

# git diff
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index fd751078bae1..7263614ca8f4 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -265,6 +265,7 @@ int drm_gem_vram_unpin(struct drm_gem_vram_object *gbo)
        if (gbo->pin_count)
                goto out;

+       drm_gem_vram_placement(gbo, TTM_PL_FLAG_VRAM | TTM_PL_FLAG_SYSTEM);
        for (i = 0; i < gbo->placement.num_placement ; ++i)
                gbo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;

Since it appears the drm_gem_vram_unpin_locked() function has been removed in 5.4, I assumed the same behavior would apply to drm_gem_vram_unpin()?

I can try this test on the commit that I isolated the regression to, if that helps.

I am still seeing the offset applied in the 0x300000 (3 MB) range when I add an additional debug statement in mga_crtc_do_set_base():

[ 272.169421] [drm:mga_crtc_do_set_base.isra.6.constprop.17 [mgag200]] jpd - setting start addr for 0x300000
Hi

(In reply to John.p.donnelly from comment #25)
> Hello Thomas.
>
> Thank you for helping out.
>
> wrt comment 24.
>
> I manually applied your patch to a fresh, clean 5.4.0-rc7 tip and I am
> still seeing the same behavior that no graphics is seen when GNOME starts:

Thanks for testing.

>
> 31f4f5b495a6 2019-11-10 | Linux 5.4-rc7
>
> # git diff
> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c
> b/drivers/gpu/drm/drm_gem_vram_helper.c
> index fd751078bae1..7263614ca8f4 100644
> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> @@ -265,6 +265,7 @@ int drm_gem_vram_unpin(struct drm_gem_vram_object *gbo)
>         if (gbo->pin_count)
>                 goto out;
>
> +       drm_gem_vram_placement(gbo, TTM_PL_FLAG_VRAM | TTM_PL_FLAG_SYSTEM);
>         for (i = 0; i < gbo->placement.num_placement ; ++i)
>                 gbo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;
>
>
> Since it appears the drm_gem_vram_unpin_locked() function has been removed
> in 5.4, I assumed the same behavior would apply to drm_gem_vram_unpin() ?

drm_gem_vram_unpin_locked() function has been removed? It's an internal static interface, so it may not show up in stack traces. But, yeah, the behavior applies to drm_gem_vram_unpin().

>
> I can try this test the commit that I isolated the regression in if that
> helps.
>
> I am still seeing the offset applied in the 0x300000 ( 3MB ) range when I
> add an additional debug statement in mga_crtc_do_set_base() :
>
>
> 272.169421] [drm:mga_crtc_do_set_base.isra.6.constprop.17 [mgag200]] jpd -
> setting start addr for 0x300000

Daniel suspected that the controller doesn't respect the offset value, but expects an offset of zero. I'll provide patches to work around that.
They were removed by you :-)

commit 57c84d5c9348bda5e9129bc4e4e567546915ad8c
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Thu Jun 13 09:30:40 2019 +0200

    drm: Remove lock interfaces from GEM VRAM helpers

    The lock functions and the locked-pin/unpin functions of GEM VRAM are not
    required any longer. Remove them.

-----
(In reply to John.p.donnelly from comment #27)
> They were removed by you :-)
>
>
> commit 57c84d5c9348bda5e9129bc4e4e567546915ad8c
> Author: Thomas Zimmermann <tzimmermann@suse.de>
> Date: Thu Jun 13 09:30:40 2019 +0200
>
>
> drm: Remove lock interfaces from GEM VRAM helpers
>
> The lock functions and the locked-pin/unpin functions of GEM VRAM are not
> required any longer. Remove them.
>
> -----

Oh I see. I later introduced functions of the same name but for a different purpose.

commit bc25bb9192c0438d84bf69ab72de02d3a4c3f827
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Fri Sep 6 14:20:54 2019 +0200

    drm/vram: Acquire lock only once per call to vmap()/vunmap()

    The implementation of vmap() is a combined pin() and kmap(). As both
    functions share the same lock, we can make vmap() slightly faster by
    acquiring the lock only once for both operations. Same for the inverse,
    vunmap().
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/misc/issues/7.