Bug 99521

Summary: xrandr makes Xorg memory leak on Ubuntu 16.04
Product: xorg Reporter: Xiaogang Chen <chenxiaogang_888>
Component: Server/DDX/XorgAssignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: high CC: jorge_monteagudo
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Xorg running under Valgrind log with a xrandr custom program running all night
none
Xorg running under Valgrind log with Michel patch
none
Xorg running under Valgrind log with Xiaogang patch
none
Xorg running under Valgrind log with Xiaogang patch after one night
none
Xorg running under Valgrind log with Xiaogang patch after one night with debug package installed
none
Sources related to memleak in 1.16.4 none

Description Xiaogang Chen 2017-01-24 17:48:07 UTC
Use Xserver shipped with Ubuntu 16.04. The version is 1.18.3. The gpu kernel driver is amdgpu.

When ran xrandr found memory leak from xorg. It is easy to see if keep running xrandr(such as in loop) and just use htop to monitor xorg virtual memory space or system memory consumption. The memory used will be increased fast.

Have debug, found the root cause is that Xorg allocates some memory for video modes defined by itself, but not release these memory. Have a fix locally. Should I submit the fix patches?

Xiaogang Chen
Comment 1 Michel Dänzer 2017-01-25 01:19:37 UTC
Yes, please submit the fixes to the xorg-devel@lists.x.org mailing list for review.
Comment 2 Xiaogang Chen 2017-02-01 15:21:48 UTC
Hi:

I am preparing the patches. This issue was found at server-1.18-branch, I think it also exists on master branch. Do I need to submit patches for both branches? or Xorg does not add fix for server-1.18-branch? 

Thanks.
Xiaogang
Comment 3 Michel Dänzer 2017-02-02 06:33:37 UTC
I don't think there will be any more upstream releases from the 1.18 branch, and fixes have to land on master first anyway, so just submit patches for master for now.
Comment 4 Xiaogang Chen 2017-02-20 18:59:03 UTC
Hi:

I have prepared a patch for master branch and sent to xorg-devel@lists.x.org. Please verify and let me know your opinion. Thanks.

Xiaogang
Comment 5 jorge_monteagudo 2017-02-22 06:19:51 UTC
Hi,

Could you attach it here too?

Thanks
Comment 6 Michel Dänzer 2017-02-22 07:08:25 UTC
*** Bug 99866 has been marked as a duplicate of this bug. ***
Comment 7 Michel Dänzer 2017-02-22 07:09:38 UTC
See https://patchwork.freedesktop.org/patch/140031/ for the patch and review feedback.
Comment 8 jorge_monteagudo 2017-02-22 08:50:12 UTC
Created attachment 129819 [details]
Xorg running under Valgrind log with a xrandr custom program running all night
Comment 9 jorge_monteagudo 2017-02-22 08:52:24 UTC
This log is from bug 99866 marked as duplicated from this one... Hope it helps!
Comment 10 Michel Dänzer 2017-02-22 09:36:45 UTC
Please test https://patchwork.freedesktop.org/patch/140197/
Comment 11 Xiaogang Chen 2017-02-22 21:52:42 UTC
Hi Michel: I tested your patch with same script I used before(while true; do xrandr --verbose; done), the memory leak is still there. Actually the xf86PruneDuplicateModes() is called before at xf86DDCGetModes() which is called by xf86EdidMonitorSet().

I think the point here is: why we need to have these mode? I do not find these mode got used afterword. Does anyone know the purpose that xf86DDCGetModes() generates these modes?

Jorge: have you tested this patch or my patch?

Thanks
Xiaogang
Comment 12 jorge_monteagudo 2017-02-23 06:17:01 UTC
Hi, I've tested Michel patch. After all night long there are still memory leaks but a little one. Attached is the valgrind log

Regards
Comment 13 jorge_monteagudo 2017-02-23 06:17:38 UTC
Created attachment 129858 [details]
Xorg running under Valgrind log with Michel patch
Comment 14 jorge_monteagudo 2017-02-23 06:20:31 UTC
Basically with the patch I see:

==3429== 1,816,448 bytes in 14,191 blocks are definitely lost in loss record 462 of 462
==3429==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==3429==    by 0x1EAFD4: xf86SetDDCproperties (in /usr/bin/Xorg.valgrind-testing)
==3429==    by 0x9728257: amd_xserver116_xf86OutputSetEDID (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==3429==    by 0x8FA0EEC: amd_xf86OutputSetEDID (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==3429==    by 0x91E2638: atiddxDisplayMonitorCallbackDetect (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==3429==    by 0x972677D: amd_xserver116_xf86ProbeOutputModes (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==3429==    by 0x97307D2: xf86RandR12GetInfo12 (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==3429==    by 0x224D51: RRGetInfo (in /usr/bin/Xorg.valgrind-testing)
==3429==    by 0x22C11E: ??? (in /usr/bin/Xorg.valgrind-testing)
==3429==    by 0x15F3F6: ??? (in /usr/bin/Xorg.valgrind-testing)
==3429==    by 0x163595: ??? (in /usr/bin/Xorg.valgrind-testing)
==3429==    by 0x6B96B44: (below main) (libc-start.c:287)
==3429== 
==3429== LEAK SUMMARY:
==3429==    definitely lost: 1,829,989 bytes in 14,379 blocks
==3429==    indirectly lost: 174,612 bytes in 201 blocks
==3429==      possibly lost: 50,248 bytes in 749 blocks
==3429==    still reachable: 1,491,229 bytes in 5,051 blocks
==3429==         suppressed: 0 bytes in 0 blocks


And in the Xorg.0.log the endless notification of new mode detected:

[ 54258.380] (II) fglrx(0): EDID vendor "TOV", prod id 21
[ 54258.381] (II) fglrx(0): Printing DDC gathered Modelines:
[ 54258.382] (II) fglrx(0): Modeline "1920x1080"x0.0  138.50  1920 1968 2000 2080  1080 1083 1088 1111 +hsync -vsync (66.6 kHz eP)
[ 54258.382] (II) fglrx(0): Modeline "1366x768"x0.0   72.01  1366 1414 1446 1528  768 770 775 790 -hsync -vsync (47.1 kHz e)
[ 54258.382] (II) fglrx(0): Modeline "1280x800"x0.0   83.48  1280 1352 1480 1680  800 803 809 831 -hsync +vsync (49.7 kHz e)
[ 54258.383] (II) fglrx(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz e)
[ 54258.383] (II) fglrx(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz e)
[ 54258.383] (II) fglrx(0): Modeline "720x400"x0.0   28.32  720 738 846 900  400 412 414 449 -hsync +vsync (31.5 kHz e)
[ 54258.384] (II) fglrx(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)
[ 54258.384] (II) fglrx(0): Modeline "1920x1080"x60.0  172.80  1920 2040 2248 2576  1080 1081 1084 1118 -hsync +vsync (67.1 kHz e)
[ 54258.384] (II) fglrx(0): Modeline "1600x900"x60.0  119.00  1600 1696 1864 2128  900 901 904 932 -hsync +vsync (55.9 kHz e)
[ 54258.385] (II) fglrx(0): Modeline "1280x1024"x0.0  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
[ 54258.385] (II) fglrx(0): Modeline "1280x720"x60.0   74.48  1280 1336 1472 1664  720 721 724 746 -hsync +vsync (44.8 kHz e)
Comment 15 Xiaogang Chen 2017-02-23 06:41:15 UTC
Hi Jorge:

in your valgrind log I still see:

==3429==    by 0x1DD1A0: xf86DDCGetModes (in /usr/bin/Xorg.valgrind-testing)
==3429==    by 0x1DD2D4: xf86EdidMonitorSet (in /usr/bin/Xorg.valgrind-testing

This means the memory leak from the modes generated by xf86DDCGetModes still there.

Can you do same test with patch: https://patchwork.freedesktop.org/patch/140031/ ?

Thanks
Xiaogang
Comment 16 jorge_monteagudo 2017-02-23 07:13:20 UTC
Hi Xiaogang,

Ok, I'll do the test with your patch and I'll upload the valgrind log...

Regards
Comment 17 jorge_monteagudo 2017-02-23 11:05:01 UTC
With the Xiaogang patch I see no memory leak after one hour running the custom xrandr... See the valgrind log attached. I'll try to leave the test running all night to verify it.

Regards
Comment 18 jorge_monteagudo 2017-02-23 11:05:29 UTC
Created attachment 129863 [details]
Xorg running under Valgrind log with Xiaogang patch
Comment 19 jorge_monteagudo 2017-02-24 06:36:29 UTC
Bad news... after one night long running the test with Xorg patched with Xiaogang patch I have still memory leaks... Basically:

==677== 1,583,232 bytes in 12,369 blocks are definitely lost in loss record 455 of 455
==677==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==677==    by 0x1EAFA4: xf86SetDDCproperties (in /usr/bin/Xorg.valgrind-testing)
==677==    by 0x9728257: amd_xserver116_xf86OutputSetEDID (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==677==    by 0x8FA0EEC: amd_xf86OutputSetEDID (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==677==    by 0x91E2638: atiddxDisplayMonitorCallbackDetect (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==677==    by 0x972677D: amd_xserver116_xf86ProbeOutputModes (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==677==    by 0x97307D2: xf86RandR12GetInfo12 (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==677==    by 0x224D21: RRGetInfo (in /usr/bin/Xorg.valgrind-testing)
==677==    by 0x22C0EE: ??? (in /usr/bin/Xorg.valgrind-testing)
==677==    by 0x15F3F6: ??? (in /usr/bin/Xorg.valgrind-testing)
==677==    by 0x163595: ??? (in /usr/bin/Xorg.valgrind-testing)
==677==    by 0x6B96B44: (below main) (libc-start.c:287)
==677== 
==677== LEAK SUMMARY:
==677==    definitely lost: 1,596,773 bytes in 12,557 blocks
==677==    indirectly lost: 174,484 bytes in 200 blocks
==677==      possibly lost: 50,376 bytes in 750 blocks
==677==    still reachable: 1,489,934 bytes in 5,037 blocks
==677==         suppressed: 0 bytes in 0 blocks

Attached is the complete log

regards
Comment 20 jorge_monteagudo 2017-02-24 06:37:09 UTC
Created attachment 129892 [details]
Xorg running under Valgrind log with Xiaogang patch after one night
Comment 21 Xiaogang Chen 2017-02-26 00:15:32 UTC
Hi Jorge:

Thank you for testing. I read the valgrind log, there is NO these functions:

xf86DDCGetModes 
xf86EdidMonitorSet

My patch targets the memory leak introduced by this function call chain when xrand runs.

I use htop to monitor Xorg memory usage in overnight test, and memleax https://github.com/WuBingzheng/memleax, to find the location of memory leak.

Regards
Xiaogang
Comment 22 jorge_monteagudo 2017-02-26 06:52:37 UTC
Hi Xiaogang,

Maybe it's a memleak from the Xorg version I'm using. My debian 8 system has
X.Org X Server 1.16.4 and yours is 1.18.3 as I read in your first post... 

Regards
Jorge
Comment 23 jorge_monteagudo 2017-02-28 06:45:47 UTC
Hi,

A new overnight valgrind log with the debug package installed to get a more concise output. Basically

==758== 2,395,648 bytes in 18,716 blocks are definitely lost in loss record 460 of 460
==758==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==758==    by 0x1EAFA4: edidMakeAtom (ddcProperty.c:43)
==758==    by 0x1EAFA4: addRootWindowProperties (ddcProperty.c:64)
==758==    by 0x1EAFA4: xf86SetDDCproperties (ddcProperty.c:83)
==758==    by 0x9728257: amd_xserver116_xf86OutputSetEDID (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==758==    by 0x8FA0EEC: amd_xf86OutputSetEDID (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==758==    by 0x91E2638: atiddxDisplayMonitorCallbackDetect (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==758==    by 0x972677D: amd_xserver116_xf86ProbeOutputModes (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==758==    by 0x97307D2: xf86RandR12GetInfo12 (in /usr/lib/xorg/modules/drivers/fglrx_drv.so)
==758==    by 0x224D21: RRGetInfo (rrinfo.c:196)
==758==    by 0x22C0EE: rrGetScreenResources (rrscreen.c:501)
==758==    by 0x15F3F6: Dispatch (dispatch.c:432)
==758==    by 0x163595: dix_main (main.c:296)
==758==    by 0x6B96B44: (below main) (libc-start.c:287)

Michel, Xiaogang Chen, how can I patch this memleak?? I don't understand very well the code involved and I don't know where to begin...

Regards

Note: I've added the related source code files from the xorg-server-1.16.4 package
Comment 24 jorge_monteagudo 2017-02-28 06:47:06 UTC
Created attachment 129971 [details]
Xorg running under Valgrind log with Xiaogang patch after one night with debug package installed
Comment 25 jorge_monteagudo 2017-02-28 06:50:38 UTC
Created attachment 129972 [details]
Sources related to memleak in 1.16.4
Comment 26 Michel Dänzer 2017-03-09 08:23:54 UTC
(In reply to jorge_monteagudo from comment #23)
> A new overnight valgrind log with the debug package installed to get a more
> concise output. Basically
> 
> ==758== 2,395,648 bytes in 18,716 blocks are definitely lost in loss record
> 460 of 460
> ==758==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
> ==758==    by 0x1EAFA4: edidMakeAtom (ddcProperty.c:43)
> ==758==    by 0x1EAFA4: addRootWindowProperties (ddcProperty.c:64)
> ==758==    by 0x1EAFA4: xf86SetDDCproperties (ddcProperty.c:83)

I only see something like this on the first invocation of xrandr, not on consequent invocations, so I suspect this leak has been fixed in 1.19, e.g. by Adam Jackson's commits surrounding 7961377567f15dfad9d96c5c0a0992b38013d973.
Comment 27 Michel Dänzer 2017-03-22 02:27:16 UTC
Thanks for the report. The issue in xf86EdidMonitorSet is fixed with the commit below in Git master. If you're still seeing growing memory usage with Git master, please file another report with details about that, preferably from valgrind vgdb as described in https://lists.x.org/archives/xorg-devel/2017-March/053027.html .

commit fdc79fe72bc0b97776df2c3a664076c60e08a87c
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Thu Mar 9 17:34:55 2017 +0900

    edid: Prune duplicates after adding modes from DDC

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.