Bug 24119

Summary: [G45] Possible memory leak with Blender use
Product: Mesa
Reporter: Sven Arvidsson <sa>
Component: Drivers/DRI/i965
Assignee: Eric Anholt <eric>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: jan.steffens, wolfgang.kufner
Version: git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments: blender model
meminfo before blender use
meminfo after blender use
periodic_swap_hills.png
vmstat output including the time of the periodic_swap_hills.png screenshot

Description Sven Arvidsson 2009-09-23 10:56:41 UTC
Created attachment 29808
blender model

I think I have stumbled upon a memory leak somewhere which happens during a certain operation in Blender.

When an armature is moved around in pose mode in Blender, the system quickly becomes unresponsive, wants to swap, and the OOM killer kicks in.

Whatever is using all the memory does not seem to show up in top output, but I very much doubt it's Blender, since this is a very simple model and it's well behaved if software rendering is forced.

Examining /proc/meminfo does show a huge spike in Active memory (see the attachments for full output).

I'm not sure how to go about investigating this, so I'm attaching the .blend file of the model and hope it's easy to reproduce. Simply load the model, press G on the keyboard and move the selected leg around with the mouse. On my system (4GB of RAM) it takes about 15-20 seconds of movement before the system becomes unresponsive.
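Because the spike shows up in /proc/meminfo rather than in top, it can help to poll that file while dragging the leg. A minimal Python sketch, assuming the usual `Name:   value kB` layout of /proc/meminfo; parse_meminfo and watch_active are hypothetical helper names, not part of any existing tool:

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def watch_active(path="/proc/meminfo", interval=1.0, samples=30):
    """Print the Active field every `interval` seconds, `samples` times."""
    for _ in range(samples):
        with open(path) as f:
            active_kb = parse_meminfo(f.read()).get("Active", 0)
        print(f"Active: {active_kb / 1024:.1f} MiB", flush=True)
        time.sleep(interval)
```

Calling watch_active() in a terminal and then starting the G-drag in Blender shows whether Active keeps climbing during the movement.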

System environment:
-- chipset: G45 / ICH10R
-- system architecture: 32-bit
-- Linux distribution: Debian unstable
-- Machine or mobo model: Asus P5Q-EM
-- Display connector: DVI
-- KMS: enabled
-- xf86-video-intel: efbcf29dd1a1ca058b7a2a93f0685102c06c9369
-- xserver: 1.6.3.901
-- mesa: de25f82067bca5231fb968190f6c12cb517d62ff
-- drm: 67e4172394a88d4922fb8d9c7c3d96ce7e02c5a6
-- kernel: 2.6.31
Comment 1 Sven Arvidsson 2009-09-23 10:57:15 UTC
Created attachment 29809
meminfo before blender use
Comment 2 Sven Arvidsson 2009-09-23 10:57:33 UTC
Created attachment 29810
meminfo after blender use
Comment 3 Wolfgang Kufner 2009-09-29 09:20:05 UTC
Now that is a funny one.

The presence or absence of swap is the key factor in the behavior of this bug.

I have been seeing the following on Mobile 4 [8086:2a42] (rev 07) with mesa 7.6.0~git20090928.6829ef74-0ubuntu1~xup~1:

with 2.4 GByte of swap:
Continuously wiggling the leg of the little fellow as per your instructions, I saw periodic bursts of swap use. See attachment periodic_swap_hills.png. The two hills in the screenshot start less than a minute apart. I am also attaching the vmstat output, which was running the whole time. (The vmstat output ends just after the screenshot was taken but begins several minutes earlier and shows signs of many more such "hills".)

without swap:
I reproduced your bug exactly. 

I will next try to reproduce the bug on the other notebook (same graphics, but mesa master from xorg-edgers). I had started out there and had seen one of those active memory bursts that go with the swap hills. I did not know what to make of it then, because I was looking for the drastic behavior you were describing. I did not see that behavior because I had 4.5 GByte of swap there. What I saw instead was active memory rising by about 600 MB and then falling again.
Comment 4 Wolfgang Kufner 2009-09-29 09:20:59 UTC
Created attachment 29945
periodic_swap_hills.png
Comment 5 Wolfgang Kufner 2009-09-29 09:22:37 UTC
Created attachment 29946
vmstat output including the time of the periodic_swap_hills.png screenshot
Comment 6 Sven Arvidsson 2009-09-29 09:41:35 UTC
That's an interesting observation. My system does not have any swap, which I should have mentioned before.
Comment 7 Wolfgang Kufner 2009-09-29 10:10:01 UTC
I just reproduced your bug successfully with mesa master on first try after I switched swap off. Everything as you described.

As for mentioning that you have no swap: I saw that in your meminfo attachment. What really made me notice swap was finding that I had inadvertently left it off on the second machine I tried to replicate your bug on, where I reproduced it right away after having failed on the other notebook.
Comment 8 Sven Arvidsson 2009-09-29 12:45:18 UTC
This is also reproducible with linux 2.6.32-rc1.

Output of cat /sys/kernel/debug/dri/0/gem_objects when this happens:

2253 objects
-566599680 object bytes
5 pinned
15597568 pin bytes
36925440 gtt bytes
134217728 gtt total

Does the negative value for object bytes mean that it wraps around? That would mean that it does use quite a lot of memory...
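If the "object bytes" counter is a signed 32-bit integer that wrapped around exactly once (an assumption; the debugfs output does not state the counter's type), the real total is 2^32 minus 566599680 bytes. A small Python sketch that parses the output above and applies that two's-complement reinterpretation; parse_gem_objects and as_unsigned32 are hypothetical helper names:

```python
def parse_gem_objects(text):
    """Parse lines of the form '<number> <label>' into {label: number}."""
    stats = {}
    for line in text.strip().splitlines():
        value, label = line.split(maxsplit=1)
        stats[label] = int(value)
    return stats


def as_unsigned32(value):
    """Reinterpret a signed 32-bit value as unsigned (assumes one wraparound)."""
    return value & 0xFFFFFFFF


report = """\
2253 objects
-566599680 object bytes
5 pinned
15597568 pin bytes
36925440 gtt bytes
134217728 gtt total
"""

stats = parse_gem_objects(report)
object_bytes = as_unsigned32(stats["object bytes"])
print(f"{object_bytes} bytes = {object_bytes / 2**20:.1f} MiB")  # prints: 3728367616 bytes = 3555.6 MiB
```

About 3.5 GiB would be consistent with a 4 GB machine running out of memory, although a counter that wrapped more than once cannot be ruled out from this output alone.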
Comment 9 Wolfgang Kufner 2009-09-29 13:11:17 UTC
I just replicated this bug with the even older 7.6.0~git20090817.7c422387-0ubuntu6 that is currently in Ubuntu karmic.
Basically the same behaviour: I did not see OOM killer messages, but the desktop became unresponsive with swap off.
With swap on I saw those hills again, though it took me about 3 minutes to trigger the bug. That went faster last time, so the buggy behaviour might not always be easy to trigger.
Comment 10 Eric Anholt 2009-09-30 11:30:00 UTC
Could you retest with this in mesa:

commit 49fbdd18ed738feaf73b7faba4d3577cd9cc3e59
Author: Eric Anholt <eric@anholt.net>
Date:   Thu Feb 12 03:54:58 2009 -0800

    i965: Fix massive memory allocation for streaming texture usage.
    
    Once we've freed a miptree, we won't see any more state cache requests
    that would hit the things that pointed at it until we've let the miptree
    get released back into the BO cache to be reused.  By leaving those
    surface state and binding table pointers that pointed at it around, we
    would end up with up to (500 * texture size) in memory uselessly consumed
    by the state cache.
    
    Bug #20057
    Bug #23530
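The commit message's bound of "up to (500 * texture size)" can be made concrete with a toy model. A Python sketch, not Mesa's C code, assuming (as the message suggests but does not spell out) that each cached surface-state entry keeps its buffer alive and that the cache holds roughly 500 entries before being flushed; StateCache, evict_buffer and stream_textures are hypothetical names. For a 1024x1024 RGBA8 texture (4 MiB, ignoring mipmaps), that is roughly 2000 MiB held versus 4 MiB:

```python
class StateCache:
    """Toy state cache whose entries keep the buffer they point at alive.

    Hypothetical simplification: the cache is only flushed wholesale once it
    reaches max_entries (500, an assumption read off the commit message).
    """

    def __init__(self, max_entries=500):
        self.max_entries = max_entries
        self.entries = {}  # key -> (buffer_id, buffer_size)

    def insert(self, key, buffer_id, size):
        if len(self.entries) >= self.max_entries:
            self.entries.clear()  # wholesale flush when the cache is full
        self.entries[key] = (buffer_id, size)

    def evict_buffer(self, buffer_id):
        """The fix, in spirit: forget every entry pointing at a freed buffer."""
        self.entries = {k: v for k, v in self.entries.items() if v[0] != buffer_id}

    def retained_bytes(self):
        # Count each distinct buffer once, even if several entries share it.
        live = {buf: size for buf, size in self.entries.values()}
        return sum(live.values())


def stream_textures(frames, tex_size, evict_on_free):
    """Upload a new texture every frame, freeing the previous frame's miptree."""
    cache = StateCache()
    for frame in range(frames):
        if evict_on_free and frame > 0:
            cache.evict_buffer(frame - 1)  # previous miptree was just released
        cache.insert(("surface", frame), buffer_id=frame, size=tex_size)
    return cache.retained_bytes()


MiB = 1024 * 1024
leaky = stream_textures(500, 4 * MiB, evict_on_free=False)
fixed = stream_textures(500, 4 * MiB, evict_on_free=True)
print(leaky // MiB, "MiB vs", fixed // MiB, "MiB")  # prints: 2000 MiB vs 4 MiB
```

Without eviction, memory only comes back when the cache fills and is flushed; evicting on free lets each old buffer go back to the BO cache immediately.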
Comment 11 Sven Arvidsson 2009-09-30 13:15:45 UTC
Unfortunately no, I can still reproduce it with 49fbdd18ed738feaf73b7faba4d3577cd9cc3e59
Comment 12 Eric Anholt 2010-03-12 17:14:40 UTC
*** Bug 26461 has been marked as a duplicate of this bug. ***
Comment 13 Eric Anholt 2010-05-04 21:43:21 UTC
What it's not:
- bo reuse
- leaking of texture objects or images
- texture tiling overallocation

INTEL_DEBUG=buf suggests that we've got a ton of SS_SURFACE/SS_SURF_BIND lying around.  Maybe that broke.
Comment 14 Eric Anholt 2010-05-04 22:07:05 UTC
commit ce914fff0817cb3c25a2d715f8435c6b6d6fbcdd
Author: Eric Anholt <eric@anholt.net>
Date:   Tue May 4 22:02:18 2010 -0700

    i965: When an RB gets a new region, clear the old from the state cache.
    
    This prevents memory usage explosion in blender due to the state cache
    hanging on to old fake frontbuffer regions.  Sigh at blender still
    using frontbuffer rendering.
    
    Bug #24119.
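As the commit message describes, the fix hooks the point where a renderbuffer's region is replaced. The Python model below only illustrates that idea and is not Mesa's C code: StateCache, Region, Renderbuffer, set_region and clear_region are hypothetical names. Without the hook, every replaced region (such as the old fake front buffer regions the message mentions) stays referenced by a cache entry; with it, only the current region remains cached:

```python
import itertools


class StateCache:
    """Hypothetical cache: key -> the region that cached state refers to."""

    def __init__(self):
        self.entries = {}

    def add(self, key, region):
        self.entries[key] = region

    def clear_region(self, region):
        """Drop every cached entry that still references `region`."""
        stale = [key for key, r in self.entries.items() if r is region]
        for key in stale:
            del self.entries[key]


class Region:
    """Stand-in for a buffer region, e.g. a fake front buffer."""


class Renderbuffer:
    _keys = itertools.count()  # unique cache keys for this toy model

    def __init__(self, cache, clear_old_on_swap):
        self.cache = cache
        self.clear_old_on_swap = clear_old_on_swap
        self.region = None

    def set_region(self, new_region):
        # Idea of the fix: before adopting a new region, remove cached state
        # for the old one so nothing keeps the old region alive.
        if self.clear_old_on_swap and self.region is not None:
            self.cache.clear_region(self.region)
        self.region = new_region
        self.cache.add(("surface", next(self._keys)), new_region)


def cached_regions_after(frames, clear_old_on_swap):
    cache = StateCache()
    rb = Renderbuffer(cache, clear_old_on_swap)
    for _ in range(frames):
        rb.set_region(Region())  # a new region each frame
    return len(cache.entries)


print(cached_regions_after(100, False), cached_regions_after(100, True))  # prints: 100 1
```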
Comment 15 Sven Arvidsson 2010-05-05 05:54:57 UTC
Awesome! You rock! :)