Bug 112265 - Drm: mgag200. Video adapter issue with 5.4.0-rc3 ; no graphics
Summary: Drm: mgag200. Video adapter issue with 5.4.0-rc3 ; no graphics
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/other
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
: not set major
Assignee: Thomas Zimmermann
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-13 21:30 UTC by John.p.donnelly
Modified: 2019-11-19 08:54 UTC
2 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg and message file on bi-sected kernel (60.98 KB, application/x-gzip)
2019-11-13 21:33 UTC, John.p.donnelly
Running startx on the console (255.71 KB, image/png)
2019-11-13 22:08 UTC, John.p.donnelly
dmesg and message for comment 7 (61.00 KB, application/x-gzip)
2019-11-14 15:35 UTC, John.p.donnelly
lspci -nn -vv -l ; and /proc/iomem (16.22 KB, application/x-gzip)
2019-11-14 20:20 UTC, John.p.donnelly
drm/vram: Mark BO for VRAM and SYSTEM placement if pin count is zero (1.04 KB, patch)
2019-11-18 09:29 UTC, Thomas Zimmermann

Description John.p.donnelly 2019-11-13 21:30:11 UTC
Bisecting led me to this change, which matches the behavior I am seeing:

Kernel: 5.1.0-rc5


commit 81da87f63a1edebcf8cbb811d387e353d9f89c7a (refs/bisect/bad)
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Tue May 21 13:08:29 2019 +0200

   drm: Replace drm_gem_vram_push_to_system() with kunmap + unpin

   The push-to-system function forces a buffer out of video RAM. This decision
   should rather be made by the memory manager. By replacing the function with
   calls to the kunmap and unpin functions, the buffer's memory becomes available,
   but the buffer remains in VRAM until it's evicted by a pin operation.

   This patch replaces the remaining instances of drm_gem_vram_push_to_system()
   in ast and mgag200, and removes the function from DRM.


My first impression is that we need a method that restores the previous push-to-system behavior.



I found this issue using:

gnome-desktop3-3.28.2-1.el8.x86_64

If there is a more specific RPM I should look at for guidance, please let me know.
Comment 1 John.p.donnelly 2019-11-13 21:33:38 UTC
Created attachment 145949
dmesg and message file on bi-sected kernel

Starting GNOME.

In the messages file, see the "starting gnome" and "Stopping gnome" markers.
Comment 2 John.p.donnelly 2019-11-13 21:48:49 UTC
debugfs contents, with GNOME running:


 # for f in `find . -type f ` ; do 
> echo "$f :  `cat $f` " 
> done
./VGA-1/edid_override :   
./VGA-1/force :  unspecified 
./internal_clients :   
./framebuffer :  framebuffer[35]:
	allocated by = Xorg
	refcount=2
	format=XR24 little-endian (0x34325258)
	modifier=0x0
	size=1024x768
	layers:
		size[0]=1024x768
		pitch[0]=4096
		offset[0]=0
		obj[0]:(null)
framebuffer[34]:
	allocated by = [fbcon]
	refcount=1
	format=XR24 little-endian (0x34325258)
	modifier=0xb7e2c74500000010
	size=1024x768
	layers:
		size[0]=1024x768
		pitch[0]=4096
		offset[0]=4294967295
		obj[0]:(null) 
./gem_names :    name     size handles refcount 
./clients :               command   pid dev master a   uid      magic
      systemd-logind  1563   0   y    y     0          0 
./name :  mgag200 dev=0000:3d:00.0 unique=0000:3d:00.0
Comment 3 John.p.donnelly 2019-11-13 22:08:42 UTC
Created attachment 145950
Running startx on the console

This likely doesn't help much. On a 4.18 kernel, when I run "startx" on the console, it eventually starts GNOME.

On the bad kernel, I just see X11 noise, then nothing.
Comment 4 Thomas Zimmermann 2019-11-14 09:40:10 UTC
FTR, the affected machine has 8 MiB of video ram.
Comment 5 Daniel Vetter 2019-11-14 14:32:24 UTC
The only fishy thing I'm seeing is that the fbcon framebuffer entry in the debugfs file contains some nonsense:

	modifier=0xb7e2c74500000010 <- this should be 0
		offset[0]=4294967295 <- this is (u32)-1; it should be 0
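A quick userspace check of the second annotation: 4294967295 is the all-ones 32-bit value, i.e. -1 stored in an unsigned 32-bit field. The helper names below are illustrative only and are not part of DRM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative helper, not a DRM function: print a 32-bit field in decimal and hex. */
void show_u32(const char *name, uint32_t value)
{
    printf("%s = %u (0x%08x)\n", name, (unsigned int)value, (unsigned int)value);
}

/* A -1 stored in an unsigned 32-bit field wraps to the largest u32 value. */
uint32_t minus_one_as_u32(void)
{
    return (uint32_t)-1;
}
```

For example, show_u32("offset[0]", minus_one_as_u32()) prints "offset[0] = 4294967295 (0xffffffff)", the same number debugfs shows for the fbcon framebuffer.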
Comment 6 Daniel Vetter 2019-11-14 14:45:28 UTC
Can you please attach the full boot log for the previous kernel (the one that worked, i.e. 982c0500fd1a ("dt-bindings: gpu: add #cooling-cells property to the ARM Mali Midgard GPU binding"))?

I'm trying to spot anything that's different. The only thing I can think of is that the offset programming is botched and implicitly relied on the previous buffer being thrown out of VRAM. Now that we no longer do that (the fbcon and Xorg buffers fit in VRAM together), we still scan out whatever is at offset 0 in VRAM, which happens to be the fbcon buffer.

Thomas, does mgag200 work for you if you pick a resolution at boot (with video=) so that two buffers fit?
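Whether two buffers fit can be estimated from the debugfs output: XR24 uses 4 bytes per pixel, so 1024x768 has a pitch of 4096 bytes and needs 3 MiB, and two such buffers fit in the machine's 8 MiB of VRAM. The sketch below is a rough estimate only. It ignores alignment, page rounding, and any other buffers (such as a cursor) the driver may also keep in VRAM, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for one linear framebuffer: pitch (bytes per scanline) times scanlines. */
uint64_t fb_size(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    uint64_t pitch = (uint64_t)width * bytes_per_pixel;
    return pitch * height;
}

/* How many such framebuffers fit in VRAM, ignoring alignment and other BOs. */
uint64_t fbs_that_fit(uint32_t width, uint32_t height, uint32_t bytes_per_pixel,
                      uint64_t vram_bytes)
{
    uint64_t size = fb_size(width, height, bytes_per_pixel);

    assert(size > 0);
    return vram_bytes / size;
}
```

With 8 MiB, fbs_that_fit(1024, 768, 4, 8 << 20) gives 2, while 1280x1024 (5 MiB per buffer) gives 1, so at that resolution the old buffer would have to leave VRAM before the new one could be placed.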
Comment 7 John.p.donnelly 2019-11-14 15:28:52 UTC

Booted:

982c0500fd1a ("dt-bindings: gpu: add #cooling-cells property to the ARM Mali Midgard GPU binding")

With GNOME running:

 for f in `find . -type f ` ; do  echo "$f :  `cat $f` " ; done
./0/VGA-1/edid_override :   
./0/VGA-1/force :  unspecified 
./0/internal_clients :   
./0/framebuffer :  framebuffer[35]:
	allocated by = Xorg
	refcount=2
	format=XR24 little-endian (0x34325258)
	modifier=0x0
	size=1024x768
	layers:
		size[0]=1024x768
		pitch[0]=4096
		offset[0]=0
		obj[0]:(null)
framebuffer[34]:
	allocated by = [fbcon]
	refcount=1
	format=XR24 little-endian (0x34325258)
	modifier=0xffff8fff00000010
	size=1024x768
	layers:
		size[0]=1024x768
		pitch[0]=4096
		offset[0]=4294967295
		obj[0]:(null) 
./0/gem_names :    name     size handles refcount 
./0/clients :               command   pid dev master a   uid      magic
      systemd-logind  1569   0   y    y     0          0 
./0/name :  mgag200 dev=0000:3d:00.0 unique=0000:3d:00.0 


dmesg.2 and message.2 will be attached shortly.
Comment 8 John.p.donnelly 2019-11-14 15:35:36 UTC
Created attachment 145956
dmesg and message for comment 7


For comment 7, booted:

982c0500fd1a ("dt-bindings: gpu: add #cooling-cells property to the ARM Mali Midgard GPU binding")
Comment 9 John.p.donnelly 2019-11-14 16:04:02 UTC
Looking at the changes made by 81da87f63a1edebcf8cbb811d387e353d9f89c7a

in drivers/gpu/drm/mgag200/mgag200_mode.c, there are explicit changes for the console in two places:

mga_crtc_do_set_base() 



+               /* unmap if console */
+               if (&mdev->mfbdev->mfb == mga_fb)
+                       drm_gem_vram_kunmap(gbo);
+               drm_gem_vram_unpin(gbo);
        }


That looks suspicious.

What is the difference between going from text mode, where the screen is a 24x80 ASCII terminal (I believe it was referred to as "VGA" mode), to graphics mode? It appears the framebuffers may not be set up correctly after the switch, or the lower-level mgag200 driver is not correctly determining where to fetch the data to display.
Comment 10 John.p.donnelly 2019-11-14 18:09:08 UTC
I added the following to mga_crtc_do_set_base():


  DRM_DEBUG_KMS("jpd - setting start addr  %p \n",(u32)gpu_addr );
   mga_set_start_address(crtc, (u32)gpu_addr);


And in the trace I see:

[  629.004322] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[  629.004330] [drm:drm_mode_debug_printmodeline [drm]] Modeline "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[  629.004333] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [CRTC:31:crtc-0]
[  629.078168] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr            (null)


gpu_addr == 0 (printed as "(null)").

In text mode I see:



[  595.057604] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CRTC:31:crtc-0] [FB:35] #connectors=1 (x y) (0 0)
[  595.057609] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CONNECTOR:33:VGA-1] to [CRTC:31:crtc-0]
[  595.080068] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr  00000000c79c76db
Comment 11 John.p.donnelly 2019-11-14 18:57:28 UTC

Regarding comment 10:

With the "functional" kernel that works, I see gpu_addr is always zero:


  DRM_DEBUG_KMS("jpd - setting start addr  0x%x \n",(u32)gpu_addr );
   mga_set_start_address(crtc, (u32)gpu_addr);



[  229.249797] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CONNECTOR:33:VGA-1] to [CRTC:31:crtc-0]
[  229.566570] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0


And when starting GNOME:

[  364.268009] [drm:drm_mode_debug_printmodeline [drm]] Modeline "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[  364.268012] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [CRTC:31:crtc-0]
[  364.268504] [drm:drm_ioctl [drm]] pid=1570, dev=0xe200, auth=1, DRM_IOCTL_DROP_MASTER
[  364.376192] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
Comment 12 Daniel Vetter 2019-11-14 19:21:56 UTC
(In reply to John.p.donnelly from comment #9)
> What it is the  difference between going from text mode where the screen is
> 24x80 ascii terminal mode - I believe if was referred to as  "vga" mode,  to
> graphics mode ? It appears the  "frame buffers" may not be getting setup
> right after the switch, or the lower-level mgag200 driver is not properly
> detecting where to retrieve the data to display from.

You're not running in classic vga mode when in text mode, e.g. from your dmesg (there's more stuff in there that shows the vga -> mgag200 transition):

[    5.144662] fbcon: mgag200drmfb (fb0) is primary device
[    5.144716] Console: switching to colour frame buffer device 128x48

Your "text" mode is actually the fbcon console on top of the mgag200drmfb fbdev emulation on top of the mgag200 drm driver. So in "text mode" the drm driver is already running, and clearly it seems to work (somewhat at least).

But when X boots and allocates its own framebuffer memory, somehow the switch to that new buffer is broken.

Now with your little experiment there are two strange things:
- I'd expect the graphical start address to be non-zero on the broken kernel (the working kernel has both at 0), but per your description it's the other way round?

- The address looks corrupted. You need to print it as %u (it's a u32, not a pointer); right now it looks way too big.
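On the format specifiers: since Linux 4.15, the kernel's %p prints a hashed value rather than the raw pointer, and a zero value shows up as "(null)" (as in the trace above), so neither output reveals the offset. A userspace sketch of printing the value as an unsigned 32-bit number instead; format_offset() is hypothetical, and its int64_t parameter is a stand-in for however gpu_addr is declared in the driver.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a VRAM offset in hex and decimal,
 * truncating to 32 bits as the driver's (u32) cast does. */
int format_offset(char *buf, size_t len, int64_t gpu_addr)
{
    uint32_t off = (uint32_t)gpu_addr;

    assert(gpu_addr >= 0); /* a negative value is not a valid offset */
    return snprintf(buf, len, "0x%" PRIx32 " (%" PRIu32 " bytes)", off, off);
}
```

Formatting 0x300000 this way yields "0x300000 (3145728 bytes)". In the kernel, the equivalent is DRM_DEBUG_KMS() with "0x%x" or "%u" and (u32)gpu_addr.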

Another experiment: on the working kernel, can you try to program an offset start address like this:

mga_set_start_address(crtc, (u32)gpu_addr + 1024*1024);

That should result in the entire console/GNOME screen being moved up by about a third of the screen, possibly with garbage in the bottom third.
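The expected shift follows from the pitch in the debugfs output: with 4096 bytes per scanline, starting scanout 1 MiB later skips 256 lines, which is one third of the 768 visible lines. A minimal sketch; lines_shifted() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Scanlines skipped when the scanout start moves forward by extra_bytes. */
uint32_t lines_shifted(uint32_t extra_bytes, uint32_t pitch)
{
    assert(pitch != 0);
    return extra_bytes / pitch;
}
```

lines_shifted(1024 * 1024, 4096) gives 256, so the bottom 256 lines would show whatever lies past the end of the buffer in VRAM.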

Finally, can you please attach the output of lspci -nn and the contents of /proc/iomem? The address you have looks suspiciously like a CPU address, not a GPU address for the framebuffer ...
Comment 13 John.p.donnelly 2019-11-14 20:20:55 UTC
Created attachment 145958
lspci -nn -vv -l ; and /proc/iomem

lspci and iomem summary for comment 12.
Comment 14 Daniel Vetter 2019-11-14 20:54:49 UTC
(In reply to John.p.donnelly from comment #13)
> Created attachment 145958
> lspci -nn -vv -l ; and /proc/iomem
> 
> lspci and iomem summary for  comment 12;

Huh, 0xc79c76db is nowhere to be found. Can you please re-capture the GPU addresses on the broken kernel, but with the 0x%x format specifier instead of %p?
Comment 15 John.p.donnelly 2019-11-14 21:40:34 UTC
  

Sorry -- my bad.   


Regarding comment 13:

1. Using 0x%x or %u, I get 0 as the gpu_addr on the "working" kernel.

[   13.337980] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[ 1005.166675] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 


So both %u and %x give 0.


2. On the "bad" kernel:

  DRM_DEBUG_KMS("jpd - setting start addr %u \n",(u32)gpu_addr );

in text mode:

[   11.687192] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr  0 


Switching to graphics:

[   96.193135] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 3145728 


3. 

  DRM_DEBUG_KMS("jpd - setting start addr 0x%x \n",(u32)gpu_addr );

  text:


[    5.249018] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 


graphics:


[   67.078407] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x300000


3145728 == 0x300000, i.e. 3 MiB.
 

QUESTIONS:


1. It appears the gpu_addr of 0x300000 (3 MiB) is an offset into the adapter's memory.

    In mga_set_start_address() it is used to program registers, so I assume it is an offset into the adapter's video RAM.

2. "But when X boots and allocates its own framebuffer memory, somehow the switch to that new buffer is broken."

     Where and how can I track that address down?

     Is there something in the DRM tracing that will show it?


3. I feel our best bet is to debug at the point of breakage, commit 81da87f63a1edebcf, rather than at the tip, because it is the lowest common denominator for debugging the initial breakage, even though the DRM framework has changed since.
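On question 1: one reading of mga_set_start_address() is that the value is a byte offset into VRAM, divided by 8 and spread across the standard CRTC start-address registers (0x0D low byte, 0x0C middle byte) plus CRTCEXT0 (bits 16-19 in its low nibble, bit 20 in bit 6). The userspace sketch below models that split. The struct and function names are hypothetical, and the divide-by-8 and bit layout are assumptions to verify against drivers/gpu/drm/mgag200/mgag200_mode.c. Under this model, 0x300000 leaves both CRTC bytes at 0 and turns a CRTCEXT0 of 0x10 into 0x16, which looks consistent with the "0x10:0x0:0x0:0x16" debug lines in comment 16 (that line's format string is not shown, so this is an inference).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the start-address register values; names are ours. */
struct mga_start_regs {
    uint8_t crtc0d;   /* CRTC 0x0D: address bits 7..0 */
    uint8_t crtc0c;   /* CRTC 0x0C: address bits 15..8 */
    uint8_t crtcext0; /* CRTCEXT0: bits 19..16 in [3:0], bit 20 in [6] */
};

/* Split a byte offset into VRAM; the hardware address is assumed
 * to count 8-byte units. */
struct mga_start_regs split_start_address(uint32_t offset, uint8_t crtcext0_old)
{
    struct mga_start_regs r;
    uint32_t addr = offset / 8;
    uint8_t ext = crtcext0_old & 0xB0; /* keep bits unrelated to the address */

    assert(addr <= 0x1fffff);          /* only 21 address bits are available */
    ext |= (uint8_t)(((addr >> 20) & 1u) << 6);
    r.crtc0d = (uint8_t)(addr & 0xff);
    r.crtc0c = (uint8_t)((addr >> 8) & 0xff);
    r.crtcext0 = (uint8_t)(((addr >> 16) & 0xf) | ext);
    return r;
}
```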

  


--
Comment 16 John.p.donnelly 2019-11-15 01:05:18 UTC
On a good kernel, mode switch to graphics:


[ 4898.928861] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 4898.928869] [drm:drm_mode_debug_printmodeline [drm]] Modeline "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[ 4898.928873] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [CRTC:31:crtc-0]
[ 4899.036466] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0
[ 4899.040425] [drm:drm_crtc_helper_set_mode [drm_kms_helper]] [ENCODER:32:DAC-32] set [MODE:1024x768]
[ 4899.145209] [drm:drm_crtc_helper_set_config [drm_kms_helper]] Setting connector DPMS state to on
[ 4899.145213] [drm:drm_crtc_helper_set_config [drm_kms_helper]]        [CONNECTOR:33:VGA-1] set DPMS on


I added a backtrace at the point where I set the address:


[  129.268844] [drm:drm_ioctl [drm]] pid=2311, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
[  129.268850] [drm:drm_mode_setcrtc [drm]] [CRTC:31:crtc-0]
[  129.268859] [drm:drm_mode_setcrtc [drm]] [CONNECTOR:33:VGA-1]
[  129.268863] [drm:drm_crtc_helper_set_config [drm_kms_helper]]
[  129.268877] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CRTC:31:crtc-0] [FB:35] #connectors=1 (x y) (0 0)
[  129.268881] [drm:drm_crtc_helper_set_config [drm_kms_helper]] [CONNECTOR:33:VGA-1] to [CRTC:31:crtc-0]
[  129.290732] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x300000
[  129.296487] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x16
[  129.296488] CPU: 2 PID: 2311 Comm: Xorg Not tainted 5.1.0-rc5-g81da87f63a1e-dirty #32
[  129.296489] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018
[  129.296489] Call Trace:
[  129.296492]  dump_stack+0x63/0x8a
[  129.296494]  mga_crtc_do_set_base.isra.6.constprop.16+0x21c/0x290 [mgag200]
[  129.296495]  mga_crtc_mode_set_base+0x11/0x20 [mgag200]
[  129.296499]  drm_crtc_helper_set_config+0x50c/0x960 [drm_kms_helper]
[  129.296507]  __drm_mode_set_config_internal+0x83/0x150 [drm]
[  129.296514]  drm_mode_setcrtc+0x57a/0x780 [drm]
[  129.296520]  ? drm_ioctl+0x177/0x410 [drm]
[  129.296527]  ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
[  129.296533]  drm_ioctl_kernel+0xb0/0x100 [drm]
[  129.296539]  drm_ioctl+0x233/0x410 [drm]
[  129.296545]  ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
[  129.296547]  do_vfs_ioctl+0xa9/0x640
[  129.296548]  ? __audit_syscall_entry+0xdd/0x130
[  129.296550]  ? handle_mm_fault+0xe1/0x210
[  129.296552]  ksys_ioctl+0x67/0x90



(init 5 starts graphical mode)


1. On a GOOD kernel, booting to init 3, then init 5, then back to init 3,

   I see three "set mode from userspace" events:

# egrep "attempt|jpd - setting" good

[   13.459004] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[   13.554237] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[ 3357.030214] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[ 3371.276997] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 3371.383755] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[ 4872.079795] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[ 4898.928861] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[ 4899.036466] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[root@ca-dev55 ~]# 


On the BAD kernel, I am missing one of the set mode events:

 I see ONLY two "set mode from userspace" events:

egrep "attempt|jpd - setting" bad
[   13.449488] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[   13.545231] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[   13.547980] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x10
[  129.290732] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x300000 
[  129.296487] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x16
[  164.129553] [drm:drm_crtc_helper_set_config [drm_kms_helper]] attempting to set mode from userspace
[  164.203222] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x0 
[  164.207498] [drm:mga_crtc_do_set_base.isra.6.constprop.16 [mgag200]] jpd - setting start addr 0x10:0x0:0x0:0x10


Specifically, the missing one is the event before the address is set to 0x300000, i.e. the switch to graphics mode.


I will debug this more tomorrow.
Comment 17 Daniel Vetter 2019-11-15 19:52:08 UTC
OK, everything you've done looks correct now and as expected.

(In reply to Daniel Vetter from comment #12)
> mga_set_start_address(crtc, (u32)gpu_addr + 1024*1024);
> 
> That should result in the entire console/gnome being moved up about 1/3rd of
> the screen, with possibly garbage at the bottom third.
> 
> Finally can you pls attach the output of lspci -nn and what's in
> /proc/iomem? The address you have suspiciously looks like a cpu address, not
> a gpu address for the framebuffer ...

This one still needs to be done. I suspect that something with the base address doesn't work.

Also, a question about your setup: are you viewing the screen through the machine's own management console, or does this go through some external connector?
Comment 18 John.p.donnelly 2019-11-15 20:47:37 UTC
Hi Daniel,

Ignore my notes in comment 16 regarding "set mode from userspace"; false alarm.
I've been slowly looking at the DRM debug logs trying to learn the behavior.  


---


Regarding comment 17:

 1. The lspci and /proc/iomem output is in attachment #4.

    Since iomem is not that large, it is also shown below this comment.

 2. "This one still needs to be done":

     mga_set_start_address(crtc, (u32)gpu_addr + 1024*1024); 

     
 No difference in the display.

 The text-mode console looked fine with the offset at 0x100000, which is somewhat puzzling.

 When graphics mode started, the offset used was 0x400000; no GNOME splash screen appeared.


 3. The mgag200 video device is embedded on the motherboard of a variety of server-class machines and serves as a remote console, with no physical video output connector for attaching a monitor. So I guess the answer is: "remote management".

4. As shown below, the PCI address space used by the device is:

  c5000000-c68fffff : PCI Bus 0000:3d
    c5000000-c5ffffff : 0000:3d:00.0
      c5000000-c5ffffff : mgadrmfb_vram
    c6000000-c67fffff : 0000:3d:00.0
    c6810000-c6813fff : 0000:3d:00.0
      c6810000-c6813fff : mgadrmfb_mmio


How is that reflected in the framebuffer usage?
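One way to relate the two address spaces, assuming VRAM is exposed linearly through BAR 0 (the c5000000-c5ffffff range labelled mgadrmfb_vram): the CPU-visible physical address of a buffer is the BAR base plus its GPU offset, so the 0x300000 offset corresponds to 0xc5300000. BAR 0 is 16 MiB while the device has 8 MiB of VRAM (comment 4), so presumably only the first 8 MiB of that window are backed. vram_cpu_address() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate a VRAM offset into the CPU physical address
 * seen through the framebuffer BAR. Returns false if the offset is out of range. */
bool vram_cpu_address(uint64_t bar_base, uint64_t bar_size, uint64_t vram_size,
                      uint64_t gpu_offset, uint64_t *cpu_addr)
{
    uint64_t usable = vram_size < bar_size ? vram_size : bar_size;

    assert(cpu_addr != NULL);
    if (gpu_offset >= usable)
        return false;
    *cpu_addr = bar_base + gpu_offset;
    return true;
}
```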



 



===========================

/proc/iomem :



cat /proc/iomem 
00000000-00000fff : Reserved
00001000-00099bff : System RAM
00099c00-0009ffff : Reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000c8000-000cf9ff : Adapter ROM
000d0000-000d0fff : Adapter ROM
000d1000-000d1fff : Adapter ROM
000d2000-000d2fff : Adapter ROM
000d3000-000d3fff : Adapter ROM
000d4000-000d4fff : Adapter ROM
000e0000-000fffff : Reserved
  000f0000-000fffff : System ROM
00100000-778c3fff : System RAM
778c4000-792f1fff : Reserved
  78e57018-78e57018 : APEI ERST
  78e5701c-78e57021 : APEI ERST
  78e57028-78e57039 : APEI ERST
  78e57040-78e5704c : APEI ERST
  78e57050-78e5904f : APEI ERST
792f2000-7932cfff : ACPI Tables
7932d000-798fffff : ACPI Non-volatile Storage
79900000-7bd4cfff : Reserved
7bd4d000-7bd57fff : System RAM
7bd58000-7bd58fff : Reserved
7bd59000-7bd5bfff : System RAM
7bd5c000-7bd5cfff : Reserved
7bd5d000-7bd5dfff : System RAM
7bd5e000-7bde3fff : Reserved
7bde4000-7bffffff : System RAM
80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
  80000000-8fffffff : Reserved
90000000-c7ffbfff : PCI Bus 0000:00
  c4400000-c48fffff : PCI Bus 0000:3a
    c4400000-c45fffff : 0000:3a:00.1
      c4400000-c45fffff : ixgbe
    c4600000-c47fffff : 0000:3a:00.0
      c4600000-c47fffff : ixgbe
    c4800000-c4803fff : 0000:3a:00.1
      c4800000-c4803fff : ixgbe
    c4804000-c4807fff : 0000:3a:00.0
      c4804000-c4807fff : ixgbe
  c4a00000-c4efffff : PCI Bus 0000:03
    c4a00000-c4bfffff : 0000:03:00.1
      c4a00000-c4bfffff : ixgbe
    c4c00000-c4dfffff : 0000:03:00.0
      c4c00000-c4dfffff : ixgbe
    c4e00000-c4e03fff : 0000:03:00.1
      c4e00000-c4e03fff : ixgbe
    c4e04000-c4e07fff : 0000:03:00.0
      c4e04000-c4e07fff : ixgbe
  c5000000-c68fffff : PCI Bus 0000:3d
    c5000000-c5ffffff : 0000:3d:00.0
      c5000000-c5ffffff : mgadrmfb_vram
    c6000000-c67fffff : 0000:3d:00.0
    c6810000-c6813fff : 0000:3d:00.0
      c6810000-c6813fff : mgadrmfb_mmio
  c6900000-c6dfffff : PCI Bus 0000:03
    c6900000-c697ffff : 0000:03:00.1
    c6980000-c69fffff : 0000:03:00.0
    c6a00000-c6afffff : 0000:03:00.1
    c6b00000-c6bfffff : 0000:03:00.1
    c6c00000-c6cfffff : 0000:03:00.0
    c6d00000-c6dfffff : 0000:03:00.0
  c6e00000-c71fffff : PCI Bus 0000:3a
    c6e00000-c6efffff : 0000:3a:00.1
    c6f00000-c6ffffff : 0000:3a:00.1
    c7000000-c70fffff : 0000:3a:00.0
    c7100000-c71fffff : 0000:3a:00.0
  c7200000-c74fffff : PCI Bus 0000:23
    c7200000-c72fffff : 0000:23:00.0
    c7300000-c73fffff : 0000:23:00.0
    c7400000-c740ffff : 0000:23:00.0
      c7400000-c740ffff : megasas: LSI
  c7500000-c75007ff : 0000:00:1f.2
    c7500000-c75007ff : ahci
  c7501000-c75013ff : 0000:00:1d.0
    c7501000-c75013ff : ehci_hcd
  c7502000-c75023ff : 0000:00:1a.0
    c7502000-c75023ff : ehci_hcd
  c7504000-c7504fff : 0000:00:05.4
c7ffc000-c7ffcfff : dmar1
c8000000-fbffbfff : PCI Bus 0000:80
  f2000000-f5ffffff : PCI Bus 0000:90
    f2000000-f5ffffff : PCI Bus 0000:91
      f2000000-f2ffffff : PCI Bus 0000:98
      f3000000-f3ffffff : PCI Bus 0000:96
      f4000000-f4ffffff : PCI Bus 0000:94
      f5000000-f5ffffff : PCI Bus 0000:92
  f6000000-f64fffff : PCI Bus 0000:82
    f6000000-f61fffff : 0000:82:00.1
      f6000000-f61fffff : ixgbe
    f6200000-f63fffff : 0000:82:00.0
      f6200000-f63fffff : ixgbe
    f6400000-f6403fff : 0000:82:00.1
      f6400000-f6403fff : ixgbe
    f6404000-f6407fff : 0000:82:00.0
      f6404000-f6407fff : ixgbe
  f7000000-faffffff : PCI Bus 0000:90
    f7000000-faffffff : PCI Bus 0000:91
      f7000000-f7ffffff : PCI Bus 0000:98
      f8000000-f8ffffff : PCI Bus 0000:96
      f9000000-f9ffffff : PCI Bus 0000:94
      fa000000-faffffff : PCI Bus 0000:92
  fb000000-fb3fffff : PCI Bus 0000:82
    fb000000-fb0fffff : 0000:82:00.1
    fb100000-fb1fffff : 0000:82:00.1
    fb200000-fb2fffff : 0000:82:00.0
    fb300000-fb3fffff : 0000:82:00.0
  fb400000-fb400fff : 0000:80:05.4
fbffc000-fbffcfff : dmar0
fec00000-fecfffff : PNP0003:00
  fec00000-fec003ff : IOAPIC 0
  fec01000-fec013ff : IOAPIC 1
  fec40000-fec403ff : IOAPIC 2
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : PNP0103:00
fed12000-fed1200f : pnp 00:01
fed12010-fed1201f : pnp 00:01
fed1b000-fed1bfff : pnp 00:01
fed1c000-fed1ffff : Reserved
  fed1f410-fed1f414 : iTCO_wdt.0.auto
fed45000-fed8bfff : pnp 00:01
fee00000-feefffff : pnp 00:01
  fee00000-fee00fff : Local APIC
ff000000-ffffffff : Reserved
  ff000000-ffffffff : pnp 00:01
100000000-607fffffff : System RAM
  2c0000000-2c0c00e10 : Kernel code
  2c0c00e11-2c141683f : Kernel data
  2c169e000-2c23fffff : Kernel bss
380000000000-383fffffffff : PCI Bus 0000:00
  383ffff00000-383ffff03fff : 0000:00:04.7
    383ffff00000-383ffff03fff : ioatdma
  383ffff04000-383ffff07fff : 0000:00:04.6
    383ffff04000-383ffff07fff : ioatdma
  383ffff08000-383ffff0bfff : 0000:00:04.5
    383ffff08000-383ffff0bfff : ioatdma
  383ffff0c000-383ffff0ffff : 0000:00:04.4
    383ffff0c000-383ffff0ffff : ioatdma
  383ffff10000-383ffff13fff : 0000:00:04.3
    383ffff10000-383ffff13fff : ioatdma
  383ffff14000-383ffff17fff : 0000:00:04.2
    383ffff14000-383ffff17fff : ioatdma
  383ffff18000-383ffff1bfff : 0000:00:04.1
    383ffff18000-383ffff1bfff : ioatdma
  383ffff1c000-383ffff1ffff : 0000:00:04.0
    383ffff1c000-383ffff1ffff : ioatdma
  383ffff20000-383ffff200ff : 0000:00:1f.3
  383ffff21000-383ffff2100f : 0000:00:16.1
  383ffff22000-383ffff2200f : 0000:00:16.0
384000000000-387fffffffff : PCI Bus 0000:80
  387ffff00000-387ffff03fff : 0000:80:04.7
    387ffff00000-387ffff03fff : ioatdma
  387ffff04000-387ffff07fff : 0000:80:04.6
    387ffff04000-387ffff07fff : ioatdma
  387ffff08000-387ffff0bfff : 0000:80:04.5
    387ffff08000-387ffff0bfff : ioatdma
  387ffff0c000-387ffff0ffff : 0000:80:04.4
    387ffff0c000-387ffff0ffff : ioatdma
  387ffff10000-387ffff13fff : 0000:80:04.3
    387ffff10000-387ffff13fff : ioatdma
  387ffff14000-387ffff17fff : 0000:80:04.2
    387ffff14000-387ffff17fff : ioatdma
  387ffff18000-387ffff1bfff : 0000:80:04.1
    387ffff18000-387ffff1bfff : ioatdma
  387ffff1c000-387ffff1ffff : 0000:80:04.0
    387ffff1c000-387ffff1ffff : ioatdma


=======





lspci -s 3d:00.0 -vvv -k 
3d:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
	Subsystem: Oracle/SUN Device 4852
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	NUMA node: 0
	Region 0: Memory at c5000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at c6810000 (32-bit, non-prefetchable) [size=16K]
	Region 2: Memory at c6000000 (32-bit, non-prefetchable) [size=8M]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Kernel driver in use: mgag200
	Kernel modules: mgag200
Comment 19 John.p.donnelly 2019-11-15 22:11:03 UTC
wrt to comment 17.


The mgag200 device is integrated into an Aspeed BMC device.
Comment 20 John.p.donnelly 2019-11-16 16:34:12 UTC
It appears to me that drm_gem_vram_pin() in mga_crtc_do_set_base() in drivers/gpu/drm/mgag200/mgag200_mode.c is not working as expected.


Would it be worthwhile to restore the previous operations?


The BMC component that contains this video device is used on a large variety of server systems with remote console/remote management. I am concerned that I discovered this early and that other vendors have not yet run kernels containing this change (made in May 2019).
Comment 21 John.p.donnelly 2019-11-17 01:07:21 UTC
I reverted the changes from the offending commit 81da87f63a1edebcf8cbb811d387e353d9f89c7a only in mgag200_mode.c, added the removed function drm_gem_vram_push_to_system() back to the same file, and the graphics work. It is a minimal change.

The "offset" address in mga_crtc_do_set_base() is 0 again.

That seems suspicious.

As noted in comment 16, why is the offset 3 MiB (0x300000) in the failing case, when simply reverting minor modifications in mgag200_mode.c brings it back to 0?

Looking at the DRM logs on the tip, I get the same 0x300000 (3 MiB) offset.
Comment 22 John.p.donnelly 2019-11-17 02:01:33 UTC
If I replace drm_gem_vram_unpin(gbo) with the older drm_gem_vram_push_to_system(gbo)
at the tip (v5.4.0-rc6), I get a GNOME login.


 static int mga_crtc_do_set_base(struct drm_crtc *crtc,
                                struct drm_framebuffer *fb,
                                int x, int y, int atomic)
@@ -866,7 +954,8 @@ static int mga_crtc_do_set_base(struct drm_crtc *crtc,
 
        if (!atomic && fb) {
                gbo = drm_gem_vram_of_gem(fb->obj[0]);
-               drm_gem_vram_unpin(gbo);
+               // drm_gem_vram_unpin(gbo);
+               drm_gem_vram_push_to_system(gbo);
        }
Comment 23 Thomas Zimmermann 2019-11-18 09:26:03 UTC
Hi John,

thank you so much for debugging this problem. I was out of office on Friday and now have to set up my mgag200 machine anew, so I'm somewhat slow to respond at the moment.

(In reply to John.p.donnelly from comment #22)
> If I replace  drm_gem_vram_unpin(gbo) with the older
> drm_gem_vram_push_to_system(gbo)
> at the tip. ( v5.4.0-rc6) I get a  GNOME login.
> 
> 
>  static int mga_crtc_do_set_base(struct drm_crtc *crtc,
>                                 struct drm_framebuffer *fb,
>                                 int x, int y, int atomic)
> @@ -866,7 +954,8 @@ static int mga_crtc_do_set_base(struct drm_crtc *crtc,
>  
>         if (!atomic && fb) {
>                 gbo = drm_gem_vram_of_gem(fb->obj[0]);
> -               drm_gem_vram_unpin(gbo);
> +               // drm_gem_vram_unpin(gbo);
> +               drm_gem_vram_push_to_system(gbo);
>         }

Some context to this code: push_to_system() explicitly kicked the buffer out of video memory (into system memory). But we don't want to do this in the driver. Evicting buffers is a decision that should be made by the memory manager. Therefore, we only unpin the buffer and leave evicting the buffer to the memory manager when the memory is actually required.

After reviewing the code for unpin(), I think this doesn't work. Buffer objects are never marked for being located in system memory.
Comment 24 Thomas Zimmermann 2019-11-18 09:29:46 UTC
Created attachment 145986
drm/vram: Mark BO for VRAM and SYSTEM placement if pin count is zero

John, could you please remove the push_to_system() call, restore the unpin() call, apply the attached patch, and report back about the results?

After the final unpin, the buffer now gets marked for being located in video or system memory.
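The reasoning in comments 23 and 24 can be illustrated with a heavily simplified toy model. The flag names only echo TTM's TTM_PL_FLAG_VRAM, TTM_PL_FLAG_SYSTEM and TTM_PL_FLAG_NO_EVICT; they are not the kernel definitions, and real TTM tracks per-placement flags and much more state. In the model, pinning restricts the buffer to VRAM; without the patch, the final unpin only clears the no-evict bit, so system memory is never an allowed location. With the patch, the final unpin also re-allows system memory. The model only illustrates that reasoning and says nothing about whether the patch fixes this bug.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flags that only echo TTM's names; NOT the kernel's definitions. */
enum {
    TOY_PL_VRAM     = 1 << 0,
    TOY_PL_SYSTEM   = 1 << 1,
    TOY_PL_NO_EVICT = 1 << 2,
};

struct toy_bo {
    unsigned int pin_count;
    unsigned int placement; /* allowed locations plus the no-evict bit */
};

/* Pin into VRAM: restrict placement to VRAM and forbid eviction. */
void toy_pin_vram(struct toy_bo *bo)
{
    if (bo->pin_count++ == 0)
        bo->placement = TOY_PL_VRAM | TOY_PL_NO_EVICT;
}

/* Unpin; with_patch mimics the attached patch by allowing VRAM and SYSTEM
 * once the pin count drops to zero. */
void toy_unpin(struct toy_bo *bo, bool with_patch)
{
    assert(bo->pin_count > 0);
    if (--bo->pin_count > 0)
        return;
    if (with_patch)
        bo->placement = TOY_PL_VRAM | TOY_PL_SYSTEM;
    bo->placement &= ~(unsigned int)TOY_PL_NO_EVICT;
}

/* In this model, a BO may be moved to system memory only if it is
 * evictable and system memory is among its allowed placements. */
bool toy_can_move_to_system(const struct toy_bo *bo)
{
    return !(bo->placement & TOY_PL_NO_EVICT) && (bo->placement & TOY_PL_SYSTEM);
}
```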
Comment 25 John.p.donnelly 2019-11-18 15:14:56 UTC
Hello Thomas. 

Thank you for helping out.

wrt comment 24.

I manually applied your patch to a fresh, clean 5.4.0-rc7 tip, and I still see the same behavior: no graphics appear when GNOME starts:

31f4f5b495a6 2019-11-10 | Linux 5.4-rc7

# git diff
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index fd751078bae1..7263614ca8f4 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -265,6 +265,7 @@ int drm_gem_vram_unpin(struct drm_gem_vram_object *gbo)
        if (gbo->pin_count)
                goto out;
 
+       drm_gem_vram_placement(gbo, TTM_PL_FLAG_VRAM | TTM_PL_FLAG_SYSTEM);
        for (i = 0; i < gbo->placement.num_placement ; ++i)
                gbo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;




Since it appears that drm_gem_vram_unpin_locked() has been removed in 5.4, I assumed the same behavior would apply to drm_gem_vram_unpin()?

I can also try this test on the commit where I isolated the regression, if that helps.

When I add an additional debug statement in mga_crtc_do_set_base(), I still see the start offset being set in the 0x300000 (3 MiB) range:


[  272.169421] [drm:mga_crtc_do_set_base.isra.6.constprop.17 [mgag200]] jpd - setting start addr for 0x300000
Comment 26 Thomas Zimmermann 2019-11-18 16:07:01 UTC
Hi

(In reply to John.p.donnelly from comment #25)
> Hello Thomas. 
> 
> Thank you for helping out.
> 
> Regarding comment 24:
> 
> I manually applied your patch to a fresh, clean 5.4.0-rc7 tip, and I am
> still seeing the same behavior: no graphics appear when GNOME starts:

Thanks for testing.

> 
> 31f4f5b495a6 2019-11-10 | Linux 5.4-rc7
> 
> # git diff
> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
> index fd751078bae1..7263614ca8f4 100644
> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> @@ -265,6 +265,7 @@ int drm_gem_vram_unpin(struct drm_gem_vram_object *gbo)
>         if (gbo->pin_count)
>                 goto out;
>  
> +       drm_gem_vram_placement(gbo, TTM_PL_FLAG_VRAM | TTM_PL_FLAG_SYSTEM);
>         for (i = 0; i < gbo->placement.num_placement ; ++i)
>                 gbo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;
> 
> 
> 
> 
> Since it appears that drm_gem_vram_unpin_locked() has been removed in 5.4,
> I assumed the same behavior would apply to drm_gem_vram_unpin()?

Has drm_gem_vram_unpin_locked() been removed? It's an internal static function, so it may not show up in stack traces. But yes, the behavior applies to drm_gem_vram_unpin().

> 
> I can also try this test on the commit where I isolated the regression, if
> that helps.
> 
> When I add an additional debug statement in mga_crtc_do_set_base(), I still
> see the start offset being set in the 0x300000 (3 MiB) range:
> 
> 
> [  272.169421] [drm:mga_crtc_do_set_base.isra.6.constprop.17 [mgag200]] jpd - setting start addr for 0x300000

Daniel suspects that the controller ignores the programmed offset value and instead expects an offset of zero. I'll provide patches to work around that.
Comment 27 John.p.donnelly 2019-11-18 16:53:26 UTC
They were removed by you :-)


commit 57c84d5c9348bda5e9129bc4e4e567546915ad8c
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Thu Jun 13 09:30:40 2019 +0200


    drm: Remove lock interfaces from GEM VRAM helpers
    
    The lock functions and the locked-pin/unpin functions of GEM VRAM are not
    required any longer. Remove them.

-----
Comment 28 Thomas Zimmermann 2019-11-19 05:32:13 UTC
(In reply to John.p.donnelly from comment #27)
> They were removed by you :-)
> 
> 
> commit 57c84d5c9348bda5e9129bc4e4e567546915ad8c
> Author: Thomas Zimmermann <tzimmermann@suse.de>
> Date:   Thu Jun 13 09:30:40 2019 +0200
> 
> 
>     drm: Remove lock interfaces from GEM VRAM helpers
>     
>     The lock functions and the locked-pin/unpin functions of GEM VRAM are not
>     required any longer. Remove them.
> 
> -----

Oh, I see. I later reintroduced functions with the same names, but for a different purpose.

commit bc25bb9192c0438d84bf69ab72de02d3a4c3f827
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Fri Sep 6 14:20:54 2019 +0200

    drm/vram: Acquire lock only once per call to vmap()/vunmap()
    
    The implementation of vmap() is a combined pin() and kmap(). As both
    functions share the same lock, we can make vmap() slightly faster by
    acquiring the lock only once for both operations. Same for the inverse,
    vunmap().
Comment 29 Martin Peres 2019-11-19 08:54:34 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/drm/misc/issues/7.

