Export data to Excel using Python/Microsoft Interop

I’ve been trying to pick apart the methods and logic used to export a basic 2D matrix of data to a worksheet in an existing file today.

I managed to work with some code from Bumblebee to get importing to work consistently, but have been hitting a lot of errors with exporting of data. I know part of this relates to how COM objects are released and the XlApp is quit, as files are sometimes left in a readonly state until I restart, but generally the warnings indicate I’m also getting something wrong with my writing methodology as well. They’re not clear in their message to me, mostly related to structure of internalized methods in the interop tools I think. I did get the data to export once somehow, but couldn’t identify the condition which allowed this to happen.

I’m aware Bumblebee is a thing, and there are XML excel nodes in 2022 and beyond, but am trying to make a super simple export node for Crumple to help people stuck back in 2020 that want something which works and can be copied out as a Python script if needed without the need for python libraries and packages. If anyone can see what might be the issue in the below script your help is much appreciated.

I’m guessing this may be a wicked problem as I couldn’t find any Python based exporters beyond Bumblebee and threads related to them. The internet hasn’t been much good either as most people just use custom Python libraries to export Excel as well (xlrd etc.).

Export to excel script.dyn (14.1 KB)

# Some function code borrowed and adjusted from Bumblebee
# Free for use under lisence on github
# https://github.com/ksobon/Bumblebee

# Boilerplate text
import clr
import System

# import excel for reading
clr.AddReference("Microsoft.Office.Interop.Excel")
from Microsoft.Office.Interop import Excel
System.Threading.Thread.CurrentThread.CurrentCulture = System.Globalization.CultureInfo("en-US")
from System.Runtime.InteropServices import Marshal

# sets up Excel
def SetUp(xlApp):
	# supress updates and warning pop ups
	xlApp.Visible = False
	xlApp.DisplayAlerts = False
	xlApp.ScreenUpdating = False
	return xlApp

# exit excel once worksheet is read
def ExitExcel(xlApp, wb, ws):
	# clean up before exiting excel, if any COM object remains
	# unreleased then excel crashes on open following time
	def CleanUp(_list):
		if isinstance(_list, list):
			for i in _list:
				Marshal.ReleaseComObject(i)
		else:
			Marshal.ReleaseComObject(_list)
		return None
		
	wb.SaveAs(unicode(filePath))
	xlApp.ActiveWorkbook.Close(False)
	xlApp.ScreenUpdating = True
	CleanUp([ws,wb,xlApp])
	return None

# write a basic 2D matrix to Excel
# yes I could use enumerate but want to work with it expanded for now
def WriteData(ws,data):
	rowCounter = 0
	for row in data:
		colCounter = 0
		for col in row:
			ws.Cells[rowCounter,colCounter] = col
			colCounter += 1
		rowCounter +=1
	return None

# inputs
filePath = IN[0]
sheetName = IN[1]
dataIn = IN[2]

# try to get Excel data
try:
	xlApp = SetUp(Excel.ApplicationClass())
	xlApp.Workbooks.open(unicode(filePath))
	wb = xlApp.ActiveWorkbook
	ws = xlApp.Sheets(sheetName)
	WriteData(ws,dataIn)
	ExitExcel(xlApp, wb, ws)
	report = "Data retrieved successfully."
except:
	xlApp.Quit()
	Marshal.ReleaseComObject(xlApp)
	report = "Data not retrieved successfully. Make sure sheet name is correct"

# return result
OUT = report
1 Like

It is a rather difficult issue, and I don’t know that using the MS interop library removes the install issues, just shifts to a different version. There is an Excel interop package(think that is the name actually) which is pretty simple already which more or less does this, and if tends to work when the repair tool is needed but not always when it isn’t needed… which is annoying.

You might want to look at the code base for the XML based exporter, as I am told that can work even when excel isn’t installed, so MS can’t break it. The relevant link may help in such an effort.

1 Like

There must be various degrees of brokenness to the repair i think. I’ve had luck with interop on all machines that had at least one repair in the past, but each windows update seems to fry the ootb node. I’ll look into that library in the meantime though, cheers!

Is the openxml sdk available outside of dynamo typically, or is it part of the 2022 onwards limitation?

1 Like

It’s an external dependency, but a stable one for pre-2022 builds as MS won’t alter that during any windows or office update (where the office updates occur). Just three years to go, we are 25% through the mess…

You may want to keep the input order to match the OOTB xml nodes so in a few years it can be mapped to auto swap the node out programmatically.

1 Like

Try using GitHub - ExcelDataReader/ExcelDataReader: Lightweight and fast library written in C# for reading Microsoft Excel files instead. I’ve not used it in Python before so not sure whether it will work or not, but its a lot simpler to use than the Office.Interop library.

2 Likes

Thanks Thomas! I guess I was hoping there’d be a local way for machines which likely wont have the necessary files on them. The more I research the more it looks like having an appendable library to support excel read/write (from the package itself as an extra) is the go, although at that point I’d just use Bumblebee probably.

No problem - also that library I mentioned is only a reader! I just noticed you wanted to export so it wont be very useful for that.

1 Like

No worries - still helpful as it indicates to me that most people with experience in this area ended up looking outside of the standard interop tools. Might begin playing with this one…

1 Like

with Excel Interrop it is preferable to work with ‘Range’
I wrote some example in my Github
Only works with IronPython (2 or 3).

With PythonNet (CPython3 Engine) does not support COM objects very well , it will be preferable to work with XML modifications with libraries such as openpyxml or Open XML SDK (integrate with Dynamo since version 2.10)

5 Likes

Thanks Cyril, this will be very helpful for both learning and potentially integrating into a set of nodes. I’ll look into alternatives such as openpyxml in the short-mid term as well for sure.

Appreciate your tips and speedy replies everyone!

I have tested it on a machine with repair issues and pleased to report it works. The stitched code from the Python script in Dynamo is below if anyone needs to use it. Beautifully written and easy to use/work with as always Cyril - have just begun making classes myself and really appreciate how you put these examples together:

# Copied and pasted/modified by Gavin Crump
# Based mostly on code by Cyril Poupin
# https://github.com/Cyril-Pop/IronPython-Samples/blob/master/Excel%20Interop/Import_Export_Xls_v2.0.py

# Boilerplate text
import clr

import System
from System import Array
from System.Collections.Generic import *

clr.AddReference('System.Drawing')
import System.Drawing
from System.Drawing import *

try:
	clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=11.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
except:
	clr.AddReference('Microsoft.Office.Interop.Excel')

from Microsoft.Office.Interop import Excel
from System.Runtime.InteropServices import Marshal

xlDirecDown = System.Enum.Parse(Excel.XlDirection, "xlDown")
xlDirecRight = System.Enum.Parse(Excel.XlDirection, "xlToRight")
xlDirecUp = System.Enum.Parse(Excel.XlDirection, "xlUp")
xlDirecLeft = System.Enum.Parse(Excel.XlDirection, "xlToLeft")

class ExcelUtils():
	def __init__(self, lstData, filepath):
		self.lstData = lstData
		self.filepath = filepath

	def exportXls(self, wsName):
		
		ex = Excel.ApplicationClass()
		ex.Visible = True
		ex.DisplayAlerts = False
		workbook = ex.Workbooks.Add()
		workbook.SaveAs(self.filepath)
		ws = workbook.Worksheets[1]
		ws.Name = wsName
		nbr_row = len(self.lstData)
		nbr_colum = len(self.lstData[0])
		xlrange  = ws.Range[ws.Cells(1, 1), ws.Cells(nbr_row, nbr_colum)]
		a = Array.CreateInstance(object, nbr_row, nbr_colum)
		for indexR, row in enumerate(self.lstData):
			for indexC , value in  enumerate(row):
				a[indexR,indexC] = value
				
		#copy Array in range			
		xlrange.Value2 = a		
		used_range = ws.UsedRange	
		for column in used_range.Columns:
			column.AutoFit()
		# apply style
		missing = System.Type.Missing
		try:
			currentWs.ListObjects.Add(Excel.XlListObjectSourceType.xlSrcRange, xlrange, missing, Excel.XlYesNoGuess.xlGuess, missing).Name = "WFTableStyle"
			currentWs.ListObjects["WFTableStyle"].TableStyle = "TableStyleMedium6"
		except:pass

# Preparing input from dynamo to revit
filepath = IN[0]
sheetName = IN[1]
dataMatrix = IN[2]

# Try export the excel file
try:
	classTry = ExcelUtils(dataMatrix,filepath)
	classTry.exportXls(sheetName)
	OUT = [filepath,"Export successful"]
except:
	OUT = [None,"Export unsuccessful"]
5 Likes

do not use the return key word in __init__ function, it’s a instance initializer :wink:

3 Likes

I tried but I get wrong something.
image

Is possible to use the OOTB node in Python instead? I am trying to do this nosense:

clr.AddReference("DSOffice")
import DSOffice
from DSOffice import *

filepath=IN[0]
sheetName=IN[1]
data=IN[2]
overWrite=True
starts=0

OUT = DSOffice.Data.ExportExcel(filepath,sheetName,starts,starts,data,overWrite)

You’ll get the same issues out of using the dynamo excel reader via Python as you will via the node. As Jacob pointed out your data isn’t an array.

If you want to pull apart my Excel code I landed on after this thread using Cyril’s code you can find it in the latest version of Crumple under Application > Excel. For the most part it’s not like the default Excel node and much more complex in Python, but it works well in my experience in lieu of the default Excel nodes as long as the user has online repaired windows at least once (whereas the ootb node seems to break every time a windows update is installed). Unconfirmed if that is a reliable fix though.

I saw apparently a trick to make it an array but I do not understand it well as I am newbie in python, but it works

filepath=IN[0]
sheetName="lolol"
dataIn=[[2,1,3],[2,1,4],[4,4,3]]
overWrite=True
starts=0

list = []
for l in dataIn:
	list.append(Array[object](l))

listDataIN = Array[Array[object]](Array[object](list))

exportoExcel = DSOffice.Data.ExportExcel(filepath,sheetName,starts,starts,listDataIN,overWrite)
OUT = exportoExcel

In this case we just mean the data should typically be a list of lists. I’m not sure if the excel node literally expects an array so maybe it does, but in Dynamo at least a list of lists is typically the go.

Hi @GavinCrump,
Could you please help me with one issue?
I have one existing Excel file with multiple sheets, and I want to delete one sheet from that excel by using Revit dynamo python script. for example, the sheet name is “Sheet1”. I am using Revit 2023.

I believe this is the method you will need to apply to a worksheet object to delete it, using the same API that microsoft provides via the interop namespace:

1 Like

Thanks for your response @GavinCrump ,
Could you please write me the python code to do the above-described operation?
I am very new to API and you are one of them I follow to improve my knowledge.

@GavinCrump
Also, I was trying to use “ExportAsFile” node from Crumple package, but it’s not producing multiple worksheets in a workbook. I have tried to modify the python script but still having issues.
Please check the below code for your review:

# Copied and pasted/modified by Gavin Crump
# Free for use (besides any terms by author below)
# BIM Guru, www.bimguru.com.au

# Based mostly on code by Cyril Poupin
# https://github.com/Cyril-Pop/IronPython-Samples/blob/master/Excel%20Interop/Import_Export_Xls_v2.0.py
# Thankyou for your ongoing contributions to the community Cyril :)

# Boilerplate text
import clr

import System
from System import Array
from System.Collections.Generic import *

clr.AddReference('System.Drawing')
import System.Drawing
from System.Drawing import *

# Try and get specific interop, if not just the general one
try:
	clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=11.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
except:
	clr.AddReference('Microsoft.Office.Interop.Excel')

from Microsoft.Office.Interop import Excel
from System.Runtime.InteropServices import Marshal

## CYRIL'S CODE FROM HERE (minor edits) ##

# Excel range parsers
xlDirecDown = System.Enum.Parse(Excel.XlDirection, "xlDown")
xlDirecRight = System.Enum.Parse(Excel.XlDirection, "xlToRight")
xlDirecUp = System.Enum.Parse(Excel.XlDirection, "xlUp")
xlDirecLeft = System.Enum.Parse(Excel.XlDirection, "xlToLeft")

# Excel class and exporter
class ExcelUtils():
	def __init__(self, lstData, filepath):
		self.lstData = lstData
		self.filepath = filepath

	def exportXls(self, wsName, index):
		ex = Excel.ApplicationClass()
		ex.Visible = True
		ex.DisplayAlerts = False
		workbook = ex.Workbooks.Add()
		ws = workbook.Worksheets[index+1]
		ws.Name = wsName
		nbr_row = len(self.lstData)
		nbr_colum = len(self.lstData[0])
		xlrange  = ws.Range[ws.Cells(1, 1), ws.Cells(nbr_row, nbr_colum)]
		a = Array.CreateInstance(object, nbr_row, nbr_colum)
		for indexR, row in enumerate(self.lstData):
			for indexC , value in  enumerate(row):
				a[indexR,indexC] = value
				
		#copy Array in range			
		xlrange.Value2 = a		
		used_range = ws.UsedRange	
		for column in used_range.Columns:
			column.AutoFit()
		# apply style
		missing = System.Type.Missing
		try:
			currentWs.ListObjects.Add(Excel.XlListObjectSourceType.xlSrcRange, xlrange, missing, Excel.XlYesNoGuess.xlGuess, missing).Name = "WFTableStyle"
			currentWs.ListObjects["WFTableStyle"].TableStyle = "TableStyleMedium6"
		except:
			pass

		# close Excel
		workbook.SaveAs(self.filepath)
		ex.Workbooks.Close()
		ex.Quit()
    	#other proper way to make sure that you really closed andreleased all COM objects 
		if workbook is not None:
			Marshal.ReleaseComObject(workbook)
		if ex is not None:
			Marshal.ReleaseComObject(ex)     
		workbook = None        
		ex = None

## GAVIN'S PACKAGING FROM HERE ##

# Preparing input from dynamo to revit
filepath = IN[0] + "\\" + IN[1] + ".xlsx"
sheetName = IN[2]
dataMatrix = IN[3]

for i in range(len(sheetName)):
	sn=sheetName[i]
	dm=dataMatrix[i]
	classTry = ExcelUtils(dm,filepath)
	classTry.exportXls(sn,i)
	OUT = [filepath,"Export successful"]

Here 2 examples for multiple worksheets in one workbook (export datas)

  • with Interrop (Ironpython2 or Ironpython3 engines)
Code Python with Interrop
import clr
import sys
import System

from System import Array
from System.Collections.Generic import *
from System.Runtime.InteropServices import Marshal

def get_shape(lst, shape=()):
	"""
	returns the shape of nested lists 
	"""
	if not hasattr(lst, "__iter__") or isinstance(lst, (str, System.String)):
		# base case
		return shape
	# peek ahead and assure all lists in the next depth
	# have the same length
	if hasattr(lst[0], "__iter__") and not isinstance(lst[0], (str, System.String)):
		l = len(lst[0])
		if not all(len(item) == l for item in lst):
			msg = 'not all lists have the same length'
			raise ValueError(msg)
	shape += (len(lst), )
	# recurse
	shape = get_shape(lst[0], shape)

	return shape

class ExcelEnum:
	"""
	values Enum from API Interop Mircrosoft Doc 
	"""
	XlListObjectSourceType_xlSrcRange = 1
	XlYesNoGuess_xlGuess = 0 
	XlYesNoGuess_xlNo = 2 
	XlYesNoGuess_xlYes = 1 


class ExcelUtils():	
							
	@staticmethod
	def SetData(currentWs, data_array):	
		nbr_row = len(data_array)
		nbr_colum = len(data_array[0])
		xlrange  = currentWs.Range[currentWs.Cells(1, 1), currentWs.Cells(nbr_row, nbr_colum)]
		xlrange.NumberFormat = "@"
		a = Array.CreateInstance(object, nbr_row, nbr_colum)
		for indexR, row in enumerate(data_array):
			for indexC , value in  enumerate(row):
				a[indexR,indexC] = value
				
		#copy Array in range			
		xlrange.Value2 = a		
		used_range = currentWs.UsedRange	
		for idx, column in enumerate(used_range.Columns):
			column.AutoFit()
		# apply style
		missing = System.Type.Missing
		try:
			currentWs.Activate()
			#
			xlrangeTable  = currentWs.Range[currentWs.Cells(1, 1), currentWs.Cells(nbr_row, nbr_colum)]
			currentWs.ListObjects.Add(ExcelEnum.XlListObjectSourceType_xlSrcRange, xlrangeTable, missing, ExcelEnum.XlYesNoGuess_xlYes, missing).Name = "WFTableStyle"
			currentWs.ListObjects["WFTableStyle"].TableStyle = "TableStyleMedium7"
			#
		except Exception as ex:
			print(ex)
		
	@staticmethod
	def ExportXls(filepath, datas_array_array = [] , name_sheet_array = []):	
		ex = System.Activator.CreateInstance(System.Type.GetTypeFromProgID("Excel.Application", True))
		ex.Visible = True
		ex.DisplayAlerts = False
		workbook = ex.Workbooks.Add()
		workbook.SaveAs(filepath)
		for idx, (data_array, sheet_name) in enumerate(zip(datas_array_array, name_sheet_array)):
			if idx == 0:
				ws = workbook.Worksheets[1]
			else:
				ws = workbook.Sheets.Add(After=workbook.Sheets[workbook.Sheets.Count])
			ws.Name = sheet_name
			ExcelUtils.SetData(ws, data_array)
		workbook.Save()	

filepath = IN[0]
datas_array_array = IN[1]
if len(get_shape(datas_array_array)) != 3 :
	raise ValueError("datas_array_array must be a 3D array : \n[[[0,0],[0,0]],[[0,0],[0,0]]]")
name_sheet_array = IN[2]
if len(get_shape(name_sheet_array)) != 1 :
	raise ValueError("datas_array_array must be a 1D array : \n[name_sheetA,name_sheetB]")
ExcelUtils.ExportXls(filepath, datas_array_array, name_sheet_array)
  • with pandas and openpyxl (CPython3 engine)
Code Python with pandas and openpyxl
import sys
import clr
import System

clr.AddReference('Python.Included')
import Python.Included as pyInc
path_py3_lib = pyInc.Installer.EmbeddedPythonHome
sys.path.append(path_py3_lib + r'\Lib\site-packages')
import os
import numpy as np
import pandas as pd
from openpyxl.worksheet.table import Table, TableStyleInfo
from openpyxl.utils import get_column_letter

def autosize_columns(df, ws):
    """
    A function that sets the width of each column in a given worksheet (ws) based on the maximum length of data in each column in the Pandas DataFrame (df).

    Args:
    - df: Pandas DataFrame containing the data to be written to the worksheet.
    - ws: Worksheet object representing the worksheet where the data will be written.
    Returns: None
    """
    margin = 7
    lstvalues = df.values.tolist()
    lstColvalues = df.columns.values.tolist()
    column_widths = []
    for row in [lstColvalues] + lstvalues:
        for i, cell in enumerate(row):
            current_lenCell = len(cell) if cell is not None else 1
            if len(column_widths) > i:
                if current_lenCell > column_widths[i]:
                    column_widths[i] = current_lenCell
            else:
                column_widths += [current_lenCell]
    for i, column_width in enumerate(column_widths, 1):  # ,1 to start at 1
        ws.column_dimensions[get_column_letter(i)].width = column_width + margin
    
filepath = IN[0]
datas_array_array = np.array(IN[1])
if datas_array_array.ndim != 3 :
	raise ValueError("datas_array_array must be a 3D array : \n[[[0,0],[0,0]],[[0,0],[0,0]]]")
name_sheet_array = np.array(IN[2])
if name_sheet_array.ndim != 1 :
	raise ValueError("datas_array_array must be a 1D array : \n[name_sheetA,name_sheetB]")

# write excel
# Create a Pandas Excel writer using XlsxWriter as the engine.
with pd.ExcelWriter(filepath, mode="w", engine="openpyxl") as xlsx:
    for data_array, sheet_name in zip(datas_array_array, name_sheet_array):
        df = pd.DataFrame(data=data_array[1:, :],  # values
                        columns=data_array[0, :])  # 1st row as the column names
        #
        df.to_excel(xlsx, sheet_name=sheet_name, index=False)
        # get worksheet
        ws = xlsx.sheets[sheet_name]
        table = Table(displayName="Table_" + sheet_name, ref="A1:" + get_column_letter(ws.max_column) + str(ws.max_row))
        # Add a default style with striped rows and banded columns
        style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=False)
        table.tableStyleInfo = style
        ws.add_table(table)
        autosize_columns(df, ws)

OUT = 0
10 Likes