IFCNodes — AI-Powered IFC Toolkit for Dynamo

I’ve been developing an open-source IFC toolkit as part of my Master’s thesis at AUC, and it’s ready for community testing.

The highlight: a multimodal LLM chat interface that can see your model, query elements, analyze relationships, visualize results, and search the web — all through natural language.

I had clear objectives when I started. Now? I don’t know the limits.

Every prompt surprises me. It generates cost estimates, creates schedules, summarizes clash reports semantically, checks design intent — things I didn’t explicitly build.

Demo Video

Two AI chat examples:

  • Automated Passive House compliance check — it searched standards and color-coded failures
  • Visual Question Answering — found a windowless room by analyzing images, not metadata

:television: https://youtu.be/6M4WR-W9kT8

Getting Started

  1. Install IFCNodes from Dynamo Package Manager
  2. Get a free API key: https://aistudio.google.com/api-keys
  3. Load an IFC model and start chatting

:open_book: Full documentation: Welcome to IFCNodes | IFCNodes - Dynamo Package for IFC Processing


I need real-world feedback — what works, what breaks, what’s missing.

Download it. Test it. Tell me what you find.

4 Likes

Looks nice in the video. Great exploration.

My one caution is that processing full IFC files directly with an LLM can be VERY expensive in terms of token consumption. Your demo for a small example (I think it was a 3-room building?) cost 21,000 tokens. There isn't much text in either the prompts or the responses, so I'm guessing most of that comes from the IFC processing, which for some LLMs can be expensive. As such, your next step should be reducing the token count by preemptively processing the IFC into a more readily identifiable format, and providing a caching mechanism where you pick the model you want to query and load that into the LLM, thereby cutting the token cost by 90%.

Curious to see how well it would hold up against a larger project though - it may be that some queries work well and don't break the bank as is. Try exporting the Snowdon Towers models to IFC and processing those to see if things still work well.

Don't let this scaling concern steer you away, but it is something you should consider. The unpredictable cost of using such LLMs has kept a few larger companies from going 'all in' on these types of tools. Building your own LLM and using it is the alternative, but that sort of shifts the burden from 'using' an LLM service such as Google's to maintaining the LLM and its infrastructure, which is equally unpredictable in nature.

3 Likes

Thanks for the thoughtful feedback Jacob. You’re raising a really important point.

You’re right about the token count. The main reason it’s high is that I build a model context that gets sent with the system prompt. This context includes a summary of all IFC element classes with counts, all properties with their value ranges and usage stats, whether they’re standard IFC properties or custom, plus spaces, zones, systems, classifications. Basically everything that varies between IFC files depending on the authoring software.

This upfront context helps the LLM avoid asking unnecessary clarifying questions or searching for properties that don’t exist in that specific file. A lot of queries can actually be answered directly from this context without executing additional tool calls. So it’s a tradeoff. Higher input tokens but fewer failed attempts and less back-and-forth.
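As a rough illustration of what such a pre-built context might look like (a hypothetical sketch, not the package's actual code, with plain dicts standing in for a parsed IFC file):

```python
from collections import Counter, defaultdict

def build_model_context(elements):
    """Summarize element classes and property usage so the summary
    can be sent once with the system prompt instead of the raw IFC."""
    class_counts = Counter(e["class"] for e in elements)
    prop_usage = defaultdict(set)  # "Pset.Property" -> classes that carry it
    for e in elements:
        for pset, props in e.get("psets", {}).items():
            for name in props:
                prop_usage[f"{pset}.{name}"].add(e["class"])
    return {
        "classes": dict(class_counts),
        "properties": {k: sorted(v) for k, v in prop_usage.items()},
    }

# Toy stand-in for elements parsed from an IFC file
elements = [
    {"class": "IfcWall", "psets": {"Pset_WallCommon": {"FireRating": "EI60"}}},
    {"class": "IfcWall", "psets": {"Pset_WallCommon": {"IsExternal": True}}},
    {"class": "IfcDoor", "psets": {"Pset_DoorCommon": {"FireRating": "EI30"}}},
]
ctx = build_model_context(elements)
print(ctx["classes"])  # → {'IfcWall': 2, 'IfcDoor': 1}
```

The summary is a few hundred tokens regardless of model size, which is the whole point: the raw file stays local and only the digest travels with the prompt.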

Worth noting that these are input tokens, which are significantly cheaper than output. For example Gemini 2.5 Flash charges $0.30 per million tokens vs $2.50 for output. So front-loading context is actually cost-effective compared to letting the model guess and retry.
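Using those published rates, the asymmetry is easy to quantify (rates as quoted above; actual pricing may change):

```python
# Gemini 2.5 Flash rates quoted above, in USD per million tokens
INPUT_RATE, OUTPUT_RATE = 0.30, 2.50

def cost_usd(input_tokens, output_tokens):
    """Blended request cost given the input/output token split."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# The 21,000-token demo, assuming it was almost all input context
print(f"${cost_usd(21_000, 500):.4f}")
```

In other words, a front-loaded 21k-token context costs well under a cent per request, while one avoided retry loop of a few thousand output tokens saves several times that.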

I’ve considered caching this context or storing it in a vector database to reuse across sessions for the same IFC file. Didn’t have time to implement it yet because of thesis deadline pressure. But it’s top priority and I plan to add it within the next few weeks after my defense.
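A minimal version of that caching idea, keyed on a hash of the IFC file so the context is rebuilt only when the file actually changes (a sketch under assumed names, not the planned implementation):

```python
import hashlib
import json
import pathlib

CACHE_DIR = pathlib.Path("context_cache")

def cached_context(ifc_path, build_fn):
    """Return the model context for ifc_path, rebuilding only on file change."""
    CACHE_DIR.mkdir(exist_ok=True)
    digest = hashlib.sha256(pathlib.Path(ifc_path).read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{digest}.json"
    if cache_file.exists():                # cache hit: reuse the stored summary
        return json.loads(cache_file.read_text())
    context = build_fn(ifc_path)           # cache miss: build once, store as JSON
    cache_file.write_text(json.dumps(context))
    return context
```

Hashing the file contents (rather than the path or timestamp) means re-exports from the authoring tool invalidate the cache automatically, while repeat sessions on the same export are free.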

Also worth considering that input/output pricing keeps dropping as models get more efficient. What feels expensive today probably won’t be in a year. But that doesn’t mean we ignore token efficiency now. Just that the barrier will keep lowering.

I haven’t tested it on larger projects yet. I’ll try the Snowdon towers export and see how it holds up.

Have you had time to try it yourself? Would love to hear your thoughts after testing it on some larger files.

Thanks again Jacob. Exactly the kind of feedback I was hoping for.

1 Like

My experience is that the rate of failed attempts is also related to the size of the dataset and the question. The larger the input vector field (in any format), the less predictable the result and the greater the likelihood of a hallucination. I have seen/helped people work around this by doing significant preprocessing on the local system, which it sounds like you are doing already.

Also worth noting that NONE of the LLM services turn a profit - they're all floated by investor funding, which will eventually dry up and push prices higher. The last analysis I saw (old at this point, so worth reviewing on your own should you decide to push this further post-graduation, or if anyone else is looking into this at scale) suggested token prices would have to be an order of magnitude higher to turn a profit. While costs may have come down, and 'using a more static, older model' might offset those costs some, those routes come with trade-offs (i.e. no code updates).

This is the 'right path' IMO, as you've proven the concept; now the question is how you make that context scale. A LOT of technologists and researchers have skipped this step and paid dearly for it, while those who haven't skipped it wind up with a lot more runway. In fact (unless you have the time to spare) I would recommend you hold off on the Snowdon test if your defense is soon - it won't likely help you defend the concept, only poke holes in it. You can already speak to whatever you might learn from that without running it just yet.

We (I work for Autodesk) have pretty strict AI requirements to prevent exposing customer data, company IP, or innovation, which means I cannot use my work tools with external AI tools without about a month of review. I also have an ongoing attempt at reconfiguring my PC post-Windows 11 update, which means no Dynamo or Revit on that side of my life right now. If I get it running again after my end-of-work-year madness (10 days of on-site workshops/travel coming up next month), I'll give it a look.

1 Like

Totally understand the AI restrictions on your end. Makes sense given the data sensitivity involved. And sounds like you’ve got a busy month ahead. No pressure at all. If you do get your setup running again and find time to try it, I’d genuinely appreciate hearing your thoughts. But no rush.

Worth mentioning that the package isn’t only about the LLM features. There are other nodes for querying, clash detection, filtering, geometry extraction and more that work independently without any AI. The full list is in the documentation if you’re curious.

Thanks again for taking the time to share all this. Really valuable perspective.

1 Like

It sounds really interesting @bassel.harby & I’ll certainly give it a try.

I can see the benefits of using natural language queries (you obviously have to ask the right questions) but wondering if many couldn’t be answered by a far simpler SQL-like query, spatial query or local vector database query ?

As you say, the challenge with IFC models is the variability in data structure, such as properties & classes - this is where traditional (SQL-like) queries fall down.

Do you think some kind of local vector database representation of the IFC model would be feasible, rather than relying on token-based LLMs ?

It’s great to see you working on this kind of thing - this is exactly what Autodesk should be doing, rather than the Forma smoke & mirrors stuff, or other “we do AI” marketing blurb

Andrew

2 Likes

These are (in my opinion) more impactful up front - easier for me to test as well. I’ve got my own solutions for the most part, but I’ll try to review from the hotel one evening (assuming I don’t have to scramble to deal with an issue for one of the workshops or something).

1 Like

FWIW the Forma stuff isn’t smoke and mirrors, but pretty game changing from a project lifecycle point of view. The ability to manage BIM data in one platform from concept to occupancy hasn’t really existed before (sure, we can force the round peg into the square hole sometimes, but to date that’s been far harder than it should have been). It’s also worth noting that there is a need to rebuild the data structures we all use so they’re friendly to new ways of working, such as using an LLM to get data out of the project (what if the data were already a vector database you could query without having to post-process anything?).

Make no purchasing decisions or investments based on what I say here or elsewhere; what I know is only as informed as what those outside can be (a quick search on most large companies’ job portals will often yield a lot of insight if you’re willing to read tea leaves), but I suspect that over the next five years the constant in AEC tech will continue to be ‘changing the way we do things’.

Thanks Andrew, appreciate the kind words and looking forward to hearing your thoughts after you try it.

You’re right that SQL-like queries can handle a lot. But the challenge is that they require a seasoned user who already knows what they’re looking for and where to find it.

IFC is complicated. The class hierarchy, property sets, quantity sets, relationships. It’s a lot to navigate even for experienced users. Then the authoring software adds another layer. Revit exports properties differently than ArchiCAD or Tekla. Each tool adds its own custom property sets and naming conventions. So the person validating the file needs to understand both the IFC schema itself and the quirks of whatever software created it.

A few real world examples where this becomes a problem:

Someone asks “show me all the external walls.” Simple enough. But what if some walls are classified as IfcBuildingElementProxy because the authoring software didn’t map them correctly? A property filter won’t catch those. The LLM can look at the properties, materials, and context and flag them anyway. It can even capture images and identify them visually even if the element metadata doesn’t include anything to identify them as walls.

Or imagine asking “which doors don’t meet fire rating requirements?” The fire rating might be stored under FireRating in one file, FireResistance in another, or buried in a custom property set like Pset_DoorCommon or a vendor specific one. A SQL query needs to know exactly where to look. The LLM can reason across all of them.
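To make that concrete, here's a hypothetical sketch of the alias-aware lookup the LLM effectively performs, with plain dicts standing in for parsed property sets (the alias names are illustrative, not an exhaustive list):

```python
# Candidate names under which a fire rating might hide, depending on authoring tool
FIRE_RATING_ALIASES = {"FireRating", "FireResistance", "Fire_Rating"}

def find_fire_rating(psets):
    """Search every property set for any known fire-rating alias."""
    for pset_name, props in psets.items():
        for prop, value in props.items():
            if prop in FIRE_RATING_ALIASES:
                return pset_name, prop, value
    return None  # genuinely missing, not just stored under another name

door_from_revit = {"Pset_DoorCommon": {"FireRating": "EI30"}}
door_from_other = {"Vendor_DoorData": {"FireResistance": "EI60"}}

print(find_fire_rating(door_from_revit))  # → ('Pset_DoorCommon', 'FireRating', 'EI30')
print(find_fire_rating(door_from_other))  # → ('Vendor_DoorData', 'FireResistance', 'EI60')
```

A fixed SQL query would have to hard-code one of those paths; the LLM's advantage is that it can extend this alias set on the fly from context.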

Or something like “find all rooms with inadequate ventilation.” That might require combining room volumes, window areas, mechanical equipment, and external standards. Not a single query. A chain of lookups plus external context.

So the way I see it, IFCNodes works as a two layer system.

Layer one is the deterministic nodes. Filtering, spatial queries, clash detection, geometry extraction. For the experienced user who knows the schema and knows what they want. Fast, predictable, no tokens involved.
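For instance, the clash-detection side of that deterministic layer typically starts with a cheap axis-aligned bounding-box overlap test before any exact geometry work (an illustrative sketch, not IFCNodes' actual node):

```python
def boxes_clash(a, b):
    """Axis-aligned bounding-box test: overlap must hold on all three axes."""
    return all(
        a["min"][i] <= b["max"][i] and b["min"][i] <= a["max"][i]
        for i in range(3)
    )

# Toy boxes in metres: a duct running through a beam's envelope
duct = {"min": (0.0, 0.0, 2.4), "max": (4.0, 0.4, 2.8)}
beam = {"min": (1.0, -1.0, 2.6), "max": (1.3, 3.0, 3.0)}
print(boxes_clash(duct, beam))  # → True
```

This kind of check is exact, instant, and token-free, which is why it belongs in layer one rather than behind the LLM.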

Layer two is the LLM assistant. For the less experienced user, or when information is scattered, or when elements are misclassified, or when you need to combine multiple sources including external standards. It figures out which tools to use and in what order.

Both layers have their place depending on the user and the problem.

On the vector database idea, yes it’s definitely feasible and something I’m planning to explore. But I don’t see it as a replacement for the LLM. More like a complement. The vector database would handle retrieval. Finding relevant elements or properties based on similarity. But you still need something to reason over the results, decide what to do next, chain multiple operations together, or explain findings in plain language. That’s where the LLM stays valuable.

So the ideal setup is probably both working together. Vector database for efficient retrieval, LLM for reasoning and orchestration.
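A toy version of that split, with hand-rolled cosine similarity over made-up embedding vectors (a real setup would use an embedding model and a proper vector store):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Pretend embeddings for element descriptions (a real pipeline would compute these)
index = {
    "external brick wall, ground floor": [0.9, 0.1, 0.0],
    "interior partition, plasterboard":  [0.2, 0.9, 0.1],
    "aluminium window, north facade":    [0.1, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Vector-database layer: return the k most similar element descriptions."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, index[d]), reverse=True)
    return ranked[:k]

# The LLM layer would then reason over just these candidates, not the whole model
hits = retrieve([0.85, 0.15, 0.05])
print(hits[0])  # → external brick wall, ground floor
```

Retrieval narrows thousands of elements to a handful of candidates; only that shortlist needs to reach the LLM, which is where the token savings come from.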

1 Like

thanks @jacob.small

I guess everyone has their own opinion- this is mine:

I’ve filed Forma under “Looks promising, but nowhere near production-ready; let’s wait & see if ADSK properly invests in it or if it gets dropped”.

So I’d partially agree- Forma could be game-changing, but I don’t think it is right now.

Of course- everything is branded as ‘AI’ these days including Forma (refer AI toothbrushes & AI toilets)
In the case of Forma, I’d describe it as procedural modelling, conceptual design & analysis. The AI part is extremely small.

Andrew

thanks @bassel.harby for the detailed reply.

I’ve heard quite a lot recently about design & construction tech, computational design, machine learning/AI for design & construction etc from people in the MENA region. Is that just a coincidence ?

I think it’s a global direction rather than MENA specific. AI and computational design are gaining traction everywhere across all industries. Construction just happens to be catching up after being one of the least digitized sectors for decades. The tools are more accessible now, so everyone is jumping in at the same time regardless of geography.

1 Like