Extract All Content from an Excel File (including text boxes)

Hi all,

I have an excel room data sheet and would like to extract all the information.

I have tried using the conventional method “Data.ImportExcel”, but there are some text box fields where the input data is not being extracted.

I have tried the solution from this post, but the python script is not running: Excel file separate each worksheet into a separate external excel file - Developers - Dynamo

Excel Extract Data.dyn (29.9 KB) - Dynamo v.2.19.3

Excel Template.xlsx (584.7 KB)

(Note: I have tried to export the data to a blank excel as a means of checking if all the data can extract)

If anyone has any pointers of how I may extract all the data content, I’d really appreciate the help.

Hi

this syntax
clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=11.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' )

is specific to IronPython, swicth the engine to IronPython2 or IronPython3

If you want to use Excel Interrop with PythonNet (CPython3 or PythonNet3), you’ll need to use Reflection or cast COM objects.

Example

Thank you, really appreciate the help. Changing to IronPython made the python code run, however I misunderstood what the code would do, as it essentially creates a duplicate of the file, including the locked formatting which prevents me from easily extracting the data.

Would you have any pointers for how I might grab the data into Dynamo?

The final goal would be to use the data to push into Revit rooms, or into another excel template that has a different formatting.

The method which Cyril linked is to help you find some of the techniques you will need to use to work with this type of data. As noted he noted you are using ItonPython techniques in a CPython environment, which means you either need to change to a CPython technique or change your environment to an IronPython one.

This is not a simple problem to solve. I have seen a few code solutions which indicate this is possible, but they are not simple solutions. This makes sense as you had to stop using pure excel and leverage VBA just to get that text box - so the code you have to write now has to use language 1 to read a result created by language 2…

I think you should consider changing the workflow instead of fighting a very difficult coding problem which seems to be a decent bit beyond your capability to author and maintain - I mean no offense by that - this is a tough task which I haven’t seen someone attempt at scale yet. Some options:

  • Enter the data into sheet A, and setup the nice formatted print view on sheet B by referencing the data from sheet A into B and thereby get the ‘final look’ of a text box when printed.
  • Export from excel to CSV when you are done entering it.
  • Do one big ctrl+c and paste values into a new workbook and read those.
  • Swap the text box for the use of data validation thereby retaining the dropdown feature circumventing then use of VBA

If you really need to keep the text box, you could try implementing this: How to get content from excel textbox using python - Stack Overflow

Note that you will first have to customize your Python environment in Dynamo to get the prerequisite library, which will have to be done on every system and maintained on each update. Instructions on how to do that can be found on the Dynamo GitHub and elsewhere in the forum.

1 Like

here is an example to extract values from controls (OLE Objects)

python code (compatible all engines)

import clr
import sys
import System
from System import Array
from System.Collections.Generic import List, Dictionary, IDictionary

clr.AddReference("System.Reflection")
from System.Reflection import BindingFlags

from System.Runtime.InteropServices import  Marshal

clr.AddReference("System.Core")
clr.ImportExtensions(System.Linq)

xls_filePath = IN[0]
xls_SheetName = IN[1]
dict_values = {}

systemType = System.Type.GetTypeFromProgID("Excel.Application", True)
try:
    ex = System.Activator.CreateInstance(systemType)
except:
    methodCreate = next((m for m in clr.GetClrType(System.Activator)\
                    .GetMethods() if "CreateInstance(System.Type)" in m.ToString()), None)
    ex = methodCreate.Invoke(None, (systemType, ))
    
ex.Visible = False
workbooks = ex.GetType().InvokeMember("Workbooks", BindingFlags.GetProperty , None, ex, None)
workbook = workbooks.GetType().InvokeMember("Open", BindingFlags.InvokeMethod , None, workbooks, (xls_filePath, )) 
worksheets = workbook.GetType().InvokeMember("Worksheets", BindingFlags.GetProperty , None, workbook, None)
#
ws = worksheets.GetType().InvokeMember("Item", BindingFlags.GetProperty , None, worksheets, (xls_SheetName,))
#
# get all ole ActiveX objects
ole_objects = ws.GetType().InvokeMember("OLEObjects", BindingFlags.InvokeMethod , None, ws, None)  #sheet.OLEObjects
ole_objects_count = ole_objects.GetType().InvokeMember("Count", BindingFlags.GetProperty , None, ole_objects, None)
print(f"{ole_objects_count=}")
for i in range(ole_objects_count):
    #print(i)
    ole_object = ole_objects.GetType().InvokeMember("Item", BindingFlags.GetProperty , None, ole_objects, (i + 1,))
    ole_name = ole_object.GetType().InvokeMember("Name", BindingFlags.GetProperty , None, ole_object, None)
    prog_id = ole_object.GetType().InvokeMember("progID", BindingFlags.GetProperty , None, ole_object, None) # progID 
    embedded = ole_object.GetType().InvokeMember("Object", BindingFlags.GetProperty , None, ole_object, None) # ole.Object
    name = embedded.GetType().InvokeMember("Name", BindingFlags.GetProperty , None, embedded, None)
    value = embedded.GetType().InvokeMember("Value", BindingFlags.GetProperty , None, embedded, None) # ole.Value
    dict_values[name]={"ole_name" : ole_name, "value" : value}
#
# get all ole Shape objects
ole_shapes = ws.GetType().InvokeMember("Shapes", BindingFlags.GetProperty , None, ws, None)  #sheet.GetProperty
ole_shapes_count = ole_shapes.GetType().InvokeMember("Count", BindingFlags.GetProperty , None, ole_shapes, None)
print(f"{ole_shapes_count=}")
for i in range(ole_shapes_count):
    #print(i)
    ole_shape = ole_shapes.GetType().InvokeMember("Item", BindingFlags.InvokeMethod , None, ole_shapes, (i + 1,))
    ole_name = ole_shape.GetType().InvokeMember("Name", BindingFlags.GetProperty , None, ole_shape, None)
    ole_format = ole_shape.GetType().InvokeMember("OLEFormat", BindingFlags.GetProperty , None, ole_shape, None)
    embedded = ole_format.GetType().InvokeMember("Object", BindingFlags.GetProperty , None, ole_format, None)
    name = embedded.GetType().InvokeMember("Name", BindingFlags.GetProperty , None, embedded, None)
    try:
        value = embedded.GetType().InvokeMember("Value", BindingFlags.GetProperty , None, embedded, None) # ole.Value
        dict_values[name]={"ole_name" : ole_name, "value" : value}
    except :
        pass
        
    
# close and dispose
workbooks.GetType().InvokeMember("Close", BindingFlags.InvokeMethod , None, workbooks, None)
Marshal.ReleaseComObject(workbooks)
ex.GetType().InvokeMember("Quit", BindingFlags.InvokeMethod , None, ex, None)
Marshal.ReleaseComObject(ex)

    
OUT = dict_values

But I agree with @jacob.small that it would be preferable to write the values of the controls/forms in another Excel sheet (via a macro), and then read the results with OpenXML.

1 Like

You can use xlwings library which runs excel in the background and gives access to the Excel VBA API

Code extracts the text from all text boxes

import xlwings as xw


def get_text(shape):
    text = []
    if shape.Type == 6:
        for item in shape.GroupItems:
            text.extend(get_text(item))
    elif shape.Type == 17:
        text.append(shape.TextFrame2.TextRange.Text)

    return text


excel_file = IN[0]
with xw.App(visible=False) as app:
    wb = xw.Book(excel_file)
    output = {}
    for sheet in wb.api.Sheets:
        for shp in sheet.Shapes:
            output.setdefault(sheet.Name, []).append(get_text(shp))

OUT = output

My post on installing python libraries to Dynamo

2 Likes

Just want to say thank you for all the responses and advice! Jacob, you’re absolutely right, I’m lost when one has to depart from the simplicity provided by dynamo’s visual programming and venture into real coding.

I will take some time to properly explore and try out all the advice provided over the coming week, but a big thank you for now.

2 Likes

I’ve finally had a chance to study your responses. A big thank you once again, the script now works with your inputs.

2 Likes