Using Dynamo to select element IDs from an HTML file

Hi all,

I am working with a client who produces Revit coordination reports via Copy/Monitor; these are exported as HTML files.

Our job is to find the reported objects and make the necessary changes. The reports cover a central file and multiple linked files: at least 15 different links spanning over 20 floors.

Does anyone know how to read an HTML file into Dynamo so that I can extract the element IDs from it and then select those elements in our central model via Dynamo?

At the moment his reports list both the linked file's element ID and our central file's element ID for the floors.

Sorry if this sounds confusing, but I think there should be a very straightforward way of doing this, rather than having to use Select by ID element by element every time we get a coordination report.

Kind regards,

Jean-Luc

Hi Jean-Luc

I’d guess you would have to scrub the data first. HTML contains a lot of clutter, and I don’t believe Dynamo can read it directly.

So there would probably need to be an intermediate step, i.e. cleaning up the data so that it is in a tidy format Dynamo can deal with.
This might be something like the ‘web query’ tool in Excel, or something more sophisticated. If the data is in an HTML table, it might not be too difficult.
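
If the report does turn out to be an HTML table, the clean-up step could look something like the sketch below. It uses only Python's standard-library html.parser, so it assumes Python 3 (for example Dynamo's CPython3 engine; under IronPython 2.7 the module is named HTMLParser and the code would need adjusting). The TableRows class, the table_rows helper and the sample row are all made up for illustration; the real report layout may well differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return the table content of an HTML string as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample in an assumed layout, not taken from a real report
sample = """
<table>
  <tr><th>Description</th><th>Status</th></tr>
  <tr><td>Floors : Floor : Slab A : id 111111</td><td>New</td></tr>
</table>
"""
rows = table_rows(sample)
```

Each row comes back as a plain list of strings, which is a shape Dynamo list nodes (or a CSV/Excel export) can work with.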

Can you post a sample of the HTML report?

Andrew

Python has a few handy libraries for writing HTML crawlers. One that I have used previously is called BeautifulSoup, and it gets the job done. Have a look at this post: www.archi-lab.net

This should be exactly what you are looking for. The post is a tad dated, but I bet you can figure it out.

You could also have a look at Spring nodes’ “ErrorReport.Parse” node. You might be able to adapt it to coordination reports.

Why not just save the html to text from your browser and read/parse the resulting file from Dynamo?
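
If you would rather skip the manual "save as text" step, roughly the same result can be had in Python. This is only a sketch: it assumes Python 3 (standard-library html.parser), and TextOnly and html_to_text are hypothetical names, not part of Dynamo or any package.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep only the text found between tags and drop the markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Note: the content of <script> and <style> blocks also arrives here
        self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Join the fragments and collapse runs of whitespace
    return " ".join(" ".join(parser.parts).split())


# Made-up sample, not taken from a real report
text = html_to_text("<html><body><h1>Report</h1><p>Floor : id 111111</p></body></html>")
```

The resulting string can then be searched or split the same way as a text file saved from the browser.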

Python’s regex module (re) should be able to handle this pretty well.

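import re

# Dynamo Python node: IN[0] is the report's HTML as a single string
dataEnteringNode = IN
html = IN[0]

# Match the digits that follow "id" plus one whitespace character, e.g. "id 123456".
# [0-9]+ (not [0-9]*) so that an "id " with no number after it is not matched as ''.
exp = r'(?<=id\s)([0-9]+)'
match = re.findall(exp, html)

if match:
    # set() drops duplicate IDs
    OUT = set(match)
else:
    OUT = 'Not Found'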

Gui,

I like re! Regular expressions are definitely an option here. The reason I suggested BeautifulSoup is that it was written specifically with parsing HTML documents in mind, while regex is a general string-matching tool. Of course, an HTML document is nothing more than a string, but I still think BeautifulSoup is the more elegant solution.

The only thing I would add to make your suggestion even better is to read the file in from Python (the built-in open(), with the os module for path handling if needed) rather than copy-pasting the contents manually. Then you can use re to parse it, and it will be quite an elegant solution.
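
Something along these lines is what I mean: a minimal sketch that reads the report from disk and applies the same "id <number>" pattern. The function name ids_from_report and the utf-8 default are my assumptions; check how your report file is actually encoded and change the encoding argument if it differs.

```python
import io
import re

# Digits that follow "id" plus one whitespace character, e.g. "id 123456"
ID_PATTERN = re.compile(r"(?<=id\s)[0-9]+")


def ids_from_report(path, encoding="utf-8"):
    """Return the unique element IDs found in the file, in order of appearance."""
    # io.open accepts an encoding argument on both Python 2 and Python 3
    with io.open(path, "r", encoding=encoding) as f:
        html = f.read()

    ids = []
    for text in ID_PATTERN.findall(html):
        value = int(text)
        if value not in ids:   # skip duplicates but keep the original order
            ids.append(value)
    return ids


# In a Dynamo Python node, with the file path wired into IN[0]:
# OUT = ids_from_report(IN[0])
```

This returns the IDs as integers instead of strings, which should make them easier to hand on to whatever node does the selecting.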

Cheers!