Exporting large datasets to CSV/SQLite

csv
solar-analysis

#1

I have created a graph that performs solar analysis on 5 surfaces and outputs the results to a CSV file. I need to repeat this process for every sunlight hour of the year which obviously will result in a very large dataset.

So far, when I run the process for a short time span, say one or two days, the time necessary to complete the process remains reasonable, around 10 minutes, or less. But if I increase the timeframe of the study, the time needed to process doesn’t scale proportionally. I understand that this is a limitation of Dynamo as it has to keep all the data in memory. (I have turned off preview in both Dynamo and Revit)

Initially, I was outputting separate columns for x, y, and z but that necessitated list manipulation. So I have decided to trim down the process and tried to output to SQLite. One day, that is 12,000 lines of data took 5:40 (minutes, seconds) while going for two days with 24,000 lines of data took 33 minutes.

The difference with outputting to SQLite is that the data is exported sequentially, so monitoring the database in DB Browser for SQLite shows the table filling up gradually.

Clearly, the best way (at least for me) is to process in chunks as the more data that is generated, the slower Dynamo will become till it is no more reasonable. I also think that resetting the solar analysis node to clear its data is the way to go. Resetting the whole graph will force dynamo to load all geometries which will take additional time to reload them for each run.

I’m attaching the original graph that writes to CSV and the trimmed down version that writes to SQLite.

(Note that in the second graph, the lines that go off the screen go to frozen nodes).
Any hints on how to proceed from here will be appreciated.

Thanks.


#2

Have you considered using Ladybug instead of the built in nodes?


#3

I have but since I’m new to the whole Revit/Dynamo thing, that makes a lot of things to learn in a relatively short time span. Besides, I reached a point where the output with the built-in nodes is what I need, it’s just that it would take me some time to get all the data out. Ladybug is more difficult to learn as I have read around and it’s the impression I got when I looked into it, and I’m not even sure if I will get to the same spot where I need to automate a repeated task.

Right now, I’m outputting the data to SQLite, one day at a time, since it’s much faster than writing to CSV and the time needed to process seems to be consistent. I experimented with a mouse recorder and it can work, although I would prefer an integrated Dynamo solution as it requires less interaction.

So to break down my problem, my idea is to introduce a switch in the graph that would reset the solar analysis node when the date changes, or when the number of items in the list reaches a certain threshold. There are 1035 data points for each hour interval, so that is an easy number to calculate.

I have looked into “if” nodes and “while loop” nodes but couldn’t wrap my head around using those in my graph. I don’t know, maybe the switch could provide some blank data to the solar analysis node to force it to reset and clear the memory downstream then resume with the proper data. At least that’s one idea I had but couldn’t implement.

Anyway, as a last note, I intentionally omitted the solar-analysis tag when I first posted the question as I consider this issue more about dynamo processing than a specific solar analysis issue and I fear that people who might have an idea of a solution won’t look at the question thinking it’s pure solar-analysis related.


#4

An idea:
Run the script via dynamo player instead of via dynamo directly. This will ensure the data is clear at each run.
Set up a “Hour of the Year” value in an excel file, text file, or other file type, with the initial value at 0. Set up a read file or node to give you time to run the graph, let it drive the solar analysis, and then add 2 hours to it, and then use a write method to over write the value in the file. Assuming you do 2 hour intervals, you will only need to run the script… 4380 times… ok maybe a mouse recorder to hit that button every 7 minutes…

A word of caution: even at 7 minute intervals, you’re looking at a LOT of run time - the cocktail napkin math is ugly here. 2 hours per run, 24 hours a day, 365 days a year, 7 minutes per run (cant be as narrow as 5.5 or 6 incase dynamo doesn’t finish), 60 minutes per hour, 24 hours a day… well that’s an ugly number in any case.

I can’t say for certain, but I think that reviewing the application of the Ladybug nodes available for Dynamo could make things run a good clip faster. Worth a review as you’ll likely have enough time to learn it while your system runs the calculations anyway.


#5

Thanks for suggesting Dyanmo player. It’s what I chose to use and it’s working rather fine. I’m processing one day at a time and using a mouse recorder to increment the date and it is working. You might have misunderstood regarding the run numbers but it’s actually one day taking an average of 8 to 9 minutes. And a cool feature of the mouse recorder (http://www.mouserecorder.com) I’m using is that it detects pixel color change so I don’t have to set a timer to wait to repeat the action. I have to note though that the memory management is not optimal still. I can see the used memory increasing after several processing runs (more than 20 I think). So have to shut down Dynamo player and Revit and restart but that is a small cost so at the moment I’m nearly done.

Anyway, I will look into Ladybug if I need to do this again!