Unable to free memory for geometry object

Hi everyone.
This question has been discussed several times on this forum, but it doesn’t seem to be completely resolved, so I’d like to ask it again.

I understand that when I create a geometry object in my Dynamo Python node, I need to call Dispose to release it.
However, when running in multiple threads, the memory is not fully released even though I call Dispose.

Check the script below.
The code repeatedly creates a Cuboid and then disposes it.

If I run this code in threads, it keeps consuming virtual memory even though I dispose every object.
Dispose does have an effect, because without it even more memory is consumed.
However, the effect is not enough.

In my production scripts, large numbers of Curve, Surface, etc. objects are processed in addition to Cuboid.
As a result, the memory used can exceed 50 GB without being released as expected, and Revit pops up a warning.

Is there any workaround?

This behavior occurs in both Dynamo for Revit and Dynamo Sandbox.
It happens with both the IronPython and CPython engines.
Also, the more parallel tasks there are, the larger the memory leak.

# Load the Python Standard and DesignScript Libraries
import sys
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
from multiprocessing.pool import ThreadPool as Pool
import multiprocessing

def test():
	# create and immediately dispose 1000 cuboids
	for i in range(1000):
		a = Cuboid.ByLengths(5, 5, 5)
		a.Dispose()

def mapRun():
	# run test() 64 times on a thread pool sized to the CPU count
	cpuCount = multiprocessing.cpu_count()
	pool = Pool(cpuCount)
	for i in range(64):
		pool.apply_async(test)
	pool.close()
	pool.join()
	
def singleRun():
	# same workload, run sequentially on the main thread for comparison
	for i in range(64):
		test()

# run the parallel (or single-threaded) version 100 times
for i in range(0, 100):
	mapRun()
	#singleRun()
	
OUT = 0

Thanks.


Hi @at_yan - what version of Dynamo are you testing in?

Also @at_yan are you able to reproduce the leak in a single threaded workflow?

Thank you for your interest.

The problem does not occur when everything runs as a single task on the main thread.
If I use singleRun() instead of mapRun() in the test program above, it’s fine.

It happens both in Dynamo for Revit and in Dynamo Sandbox.
I tried Revit 2021, 2022, and 2023.

Hi @at_yan, this took a while to figure out, but I believe the issue is actually in the way you create multiple pools instead of a single large pool.

I’ve made two changes here that reduce the leak from about 50 MB to about 0.1 MB.

  1. I’ve used map instead of apply_async, though if order is unimportant you can try imap_unordered (see the sketch after this list). This means we don’t need to retrieve multiple result objects and there is no looping over the pool.
  2. I’ve created only one pool, not 100 of them; this is the more important change. It’s unclear why creating many pools leads to a leak, but there are multiple questions about it on Stack Overflow. I can only blame something in the CPython implementation, because when I isolate libG use in parallel in C# alone I see much smaller leaks than when using this multiprocessing module.
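
For reference, here is a minimal sketch of the imap_unordered variant mentioned in point 1. It assumes the same imports and test function as the script below, and I haven’t benchmarked it:

# sketch: imap_unordered yields results as workers finish, order is not preserved
pool = Pool(multiprocessing.cpu_count())
for _ in pool.imap_unordered(test, range(6400)):
	pass	# nothing to collect, the loop just drains the iterator
pool.close()
pool.join()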

My final advice is not to use ThreadPool from multiprocessing (which is really made for multiprocessing, not multithreading); it’s weird and seems untested in many versions of CPython. I would switch to concurrent.futures (“concurrent.futures — Launching parallel tasks” in the Python 3.11.1 documentation), as it’s more modern and idiomatic.

# Load the Python Standard and DesignScript Libraries
import sys
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
from multiprocessing.pool import ThreadPool as Pool
import multiprocessing

def test(a):
	# 'a' is the item passed in by pool.map and is not used
	for i in range(1000):
		a = Cuboid.ByLengths(5, 5, 5)
		a.Dispose()

def mapRun():
	cpuCount = multiprocessing.cpu_count()
	pool = Pool(cpuCount)
	pool.map(test, range(6400))
	pool.close()
	pool.join()
	

mapRun()
	
OUT = 0
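
For completeness, a rough sketch of the same test with concurrent.futures (CPython3 engine only); this only shows the shape of the API, I haven’t benchmarked it against the ThreadPool version:

# Load the Python Standard and DesignScript Libraries
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import Cuboid
from concurrent.futures import ThreadPoolExecutor
import multiprocessing

def test(i):
	# create and immediately dispose 1000 cuboids; 'i' comes from executor.map
	for _ in range(1000):
		c = Cuboid.ByLengths(5, 5, 5)
		c.Dispose()

# one executor, sized to the CPU count, reused for all 6400 tasks
with ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
	list(executor.map(test, range(6400)))

OUT = 0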

Hello @Michael_Kirschner2

Thank you for investigating.
Indeed, I have confirmed that creating a single large pool has the effect of greatly reducing memory consumption.
It is a strange phenomenon, and it is hard to understand why it happens.
However, the structure of the original test script mimics the structure of my real business script.
In other words, it is difficult to consolidate everything into one large pool.
But you have given me a hint, so I will continue verifying a little more.
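
One direction I will try is creating a single pool once and letting every stage of the script reuse it, roughly like this (the names are only illustrative, not my actual script):

from multiprocessing.pool import ThreadPool as Pool
import multiprocessing

# created once per run and shared by all stages, instead of one Pool per function call
shared_pool = Pool(multiprocessing.cpu_count())

def run_stage(task, items):
	# each stage of the workflow submits its work to the same shared pool
	return shared_pool.map(task, items)

# ... run_stage(cuboid_task, cuboid_inputs), run_stage(curve_task, curve_inputs), etc. ...

shared_pool.close()
shared_pool.join()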

Thanks again.

Well, another solution is to move to parallel constructs like Parallel.ForEach in C# directly. I saw MUCH better performance there, with full CPU utilization instead of the roughly 40% utilization in Python.
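
If you want to stay in a Python node, the same Parallel.ForEach API can also be reached through the .NET System.Threading.Tasks namespace. A rough, untested sketch (I have not measured its memory behavior):

import clr
import System
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import Cuboid
from System import Action, Int32
from System.Collections.Generic import List
from System.Threading.Tasks import Parallel

def work(i):
	# create and immediately dispose one cuboid per item
	c = Cuboid.ByLengths(5, 5, 5)
	c.Dispose()

items = List[Int32](list(range(1000)))
# Parallel.ForEach(IEnumerable<T>, Action<T>) with an explicit delegate
Parallel.ForEach(items, Action[Int32](work))

OUT = 0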

Out of curiosity, I made some tests.

With multiprocessing.pool on CPython3
import sys
import clr
import System
from System.Collections.Generic import List
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *

my_path = System.Environment.GetFolderPath(System.Environment.SpecialFolder.MyDocuments)
pf_path = System.Environment.GetFolderPath(System.Environment.SpecialFolder.ProgramFilesX86)

from multiprocessing.pool import ThreadPool as Pool
import multiprocessing
import logging


## Start create logger Object ##
logger = logging.getLogger("MemoryLoggerCpyPool")
# set to  DEBUG
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s :: %(levelname)s :: %(funcName)s :: %(process)d :: %(message)s')
# create handler 
file_handler = logging.FileHandler(my_path + '\\MemoryLoggerCpyPool.log', mode='w')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.disabled = False

def test(a):
	global logger
	for i in range(1000):
		a = Cuboid.ByLengths(5, 5, 5)
		a.Dispose()
	#totalBytesOfMemoryUsed = currentProcess.WorkingSet64
	logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)
	

def mapRun():
	cpuCount = multiprocessing.cpu_count()
	pool = Pool(cpuCount)
	pool.map(test, range(2000),)
	pool.close()
	pool.join()

logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)
#
mapRun()
#				
logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)	
OUT = 0

Result

With Linq AsParallel() on IronPython2
import sys
import clr
import System
from System.Collections.Generic import List
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
from System.Threading.Tasks import *
clr.AddReference("System.Core")
clr.ImportExtensions(System.Linq)
my_path = System.Environment.GetFolderPath(System.Environment.SpecialFolder.MyDocuments)
pf_path = System.Environment.GetFolderPath(System.Environment.SpecialFolder.ProgramFilesX86)
sys.path.append(pf_path + '\\IronPython 2.7\\Lib')
import logging

## Start create logger Object ##
logger = logging.getLogger("MemoryLoggerIpy")
# set to  DEBUG
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s :: %(levelname)s :: %(funcName)s :: %(process)d :: %(message)s')
# create handler 
file_handler = logging.FileHandler(my_path + '\\MemoryLoggerIpy.log', mode='w')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.disabled = False

def test(a):
	global logger
	for i in range(1000):
		a = Cuboid.ByLengths(5, 5, 5)
		a.Dispose()
	#totalBytesOfMemoryUsed = currentProcess.WorkingSet64
	logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)
	
	
logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)
#
process_list = List[System.Int32](list(range(2000)))
threadResult = process_list.AsParallel()\
				.WithDegreeOfParallelism(System.Environment.ProcessorCount)\
				.Select(lambda l: test(l))\
				.ToList()
#				
logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)	
OUT = 0

Result

With concurrent.futures on CPython3
import sys
import clr
import System
from System.Collections.Generic import List
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
import logging
from concurrent.futures import ThreadPoolExecutor


my_path = System.Environment.GetFolderPath(System.Environment.SpecialFolder.MyDocuments)
pf_path = System.Environment.GetFolderPath(System.Environment.SpecialFolder.ProgramFilesX86)

## Start create logger Object ##
logger = logging.getLogger("MemoryLoggerCpyConcurrentFutures")
# set to  DEBUG
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s :: %(levelname)s :: %(funcName)s :: %(process)d :: %(message)s')
# create handler 
file_handler = logging.FileHandler(my_path + '\\MemoryLoggerCpyConcurrentFutures.log', mode='w')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.disabled = False

def test(*args):
	global logger
	for i in range(1000):
		a = Cuboid.ByLengths(5, 5, 5)
		a.Dispose()
	#totalBytesOfMemoryUsed = currentProcess.WorkingSet64
	logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)

logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)
inputlst = range(2000)
#
with ThreadPoolExecutor(len(inputlst)) as executor:
    results = executor.map(test, inputlst)
    #out = [r for r in results]
logger.debug(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64)

OUT = 0

Result

Note:

  • running on Dynamo Sandbox
  • the results may not be representative; there may be improvements/fixes to be made in the above scripts