Python Code for Getting Directory File Info

Hello,

I am trying to get information for files within a directory, namely the creator of the files.

Is anyone interested in providing me with a nice python snippet to copy-paste? I know I should “learn” how to do this…I have tried in the past, but it usually leads me to a state of absolute anxiety.

Any help is very much appreciated! :slight_smile:

Cheers,
Matt

This is an interesting query, more from a ‘why’ standpoint than a ‘how’ standpoint. What is your intended use?

If I get my desk and laptop set up I’ll see if I can put something together tonight - not likely though as I’m mid-move.

What files? All files? Revit project files? Revit family files?
Network drive?
Cloud storage?

Hi Jacob,

Why? Chaos.
Basically, I need the ability to pick through a project folder and extract data regarding the contents. We have a structure in place, but it doesn’t cover everything, and older projects may have a different structure. I need the ability to filter through the contents of each project quickly.

I can give you a few examples of how I use this:

  1. I want to find documents related to a specific subject throughout a series of projects and folders. If I know who has created the file, when it was last edited, etc. I can more quickly filter-through a short-list of candidates. (This is what I am specifically looking to do now.)

  2. I create a lot of dynamo scripts for large projects. Some of them are scalable, some of them are one-off. In any case, I want the ability to quickly see which projects have which scripts.

  3. I want to collect file- and network-use data for multiple projects and analyze that data in Power BI.

  4. I want to quickly identify files (mainly back-ups, logs, etc.) which can safely be deleted when archiving projects.
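
For the fourth use case, here is a rough sketch of flagging clean-up candidates by file name. The patterns are my assumptions, not something from this thread: Revit backups are assumed to follow the usual "Name.0001.rvt" numbering, and ".bak" / ".log" are just generic examples. `is_cleanup_candidate` is a made-up helper name. It only flags files for review; it does not delete anything.

```python
import re
from pathlib import PurePath

# Assumed patterns, adjust them to your own project conventions.
# Revit backups are assumed to look like "Name.0001.rvt" (also .rfa / .rte).
BACKUP_NAME = re.compile(r"\.\d{4}\.(rvt|rfa|rte)$", re.IGNORECASE)
# Generic example extensions for logs and backups.
DISPOSABLE_SUFFIXES = {".bak", ".log"}


def is_cleanup_candidate(path):
    """Return True if the file name looks like a backup or log file."""
    pure = PurePath(path)
    return bool(BACKUP_NAME.search(pure.name)) or pure.suffix.lower() in DISPOSABLE_SUFFIXES


def cleanup_candidates(paths):
    """Filter a list of file paths down to the ones worth reviewing."""
    return [p for p in paths if is_cleanup_candidate(p)]
```

The result is a short-list to check by hand before archiving, which fits the "export to Excel and filter" workflow.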

My current workflow is to collect data, export to excel and work on the data there for each step downstream in the process.

I’ve also found a PDF-Package, which could also be very interesting for building a keyword-repository…not sure dynamo is the best tool for doing this at scale, but for sketching-out the concept it should work well enough.

I could carry on…but I think it may be a bit of information overload…

hi aaron,

At the moment, it is mostly non-revit files. A lot of office files, pdfs and graphics-files…Revit files not yet, but I would like to collect basic file info outside of Revit and in the OS / Server-Environments.

Network drive…but we will likely be moving some projects or services to the cloud within the next five years. Servers are good and fine, but not always ideal for multiple offices spread-out over long distances.

Moving day…hope it is / continues to go well. :slight_smile:

I’m off the plane and all my stuff made it so things are going well. :slight_smile:

On the original question, seeing what you’re after, learning more about how to fish for the data you might want is advisable. The os module would be where I would start. You can read about the module here: os — Miscellaneous operating system interfaces — Python 3.11.3 documentation.

The os module has a stat function which I think will work for most of your needs on file specifics. This stack exchange solution should give you everything you need to get started: linux - how to find the owner of a file or directory in python - Stack Overflow

You’d also want to use the os module for getting the directory contents - either the listdir method or the walk method. More info on that can be found on the stack exchange thread here: python - How do I list all files of a directory? - Stack Overflow
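
To make that a bit more concrete, here is a minimal sketch that combines os.walk and os.stat. The function name `collect_file_info` and the choice of columns are just my own for illustration. Note that st_uid is a numeric owner id and is typically 0 on Windows, so it will not give you a user name there.

```python
import os
from datetime import datetime


def collect_file_info(root):
    """Walk root recursively and return one row per file found."""
    rows = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                stats = os.stat(path)
            except OSError:
                # Skip files that disappear or cannot be read
                continue
            rows.append([
                path,
                stats.st_size,                           # size in bytes
                datetime.fromtimestamp(stats.st_mtime),  # last modified
                stats.st_uid,                            # numeric owner id (typically 0 on Windows)
            ])
    return rows
```

Calling `collect_file_info(r"C:\some\project")` (the path is only a placeholder) would give a list of rows you can pass on for filtering.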

Thanks for the response. I have a well-versed Python user in the office.
I’ll start bothering him… :wink:

So, if I’m hearing correctly, this is a utility that isn’t directly associated with Revit. And I assume the output would be some sort of report. Maybe in Excel, a database or web page?
In that case - Dynamo is the wrong tool.

I’d suggest just jumping into VB .NET or C#. You’ll have better access to low-level OS functions, and once compiled it will run way faster. (A lot of files take a lot of time to parse.)

Note that there are a lot of things that can change the rights and ownership of a file, so you really can’t depend on the basic file attribute information to tell you who was the originator of the file. Some files, like Word and Excel, will have metadata which is going to be more reliable than just the file properties such as Owner. Most files won’t have this depth of information.
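
As a rough illustration of the metadata point: the newer Office formats (.docx, .xlsx, .pptx) are zip packages that normally contain a docProps/core.xml part with the creator and last-modified-by names. A sketch using only the standard library is below; `office_authors` is a made-up helper name, and it assumes the file really is an Office Open XML package (older .doc / .xls files, or packages without that part, will raise an error).

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside docProps/core.xml
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def office_authors(path):
    """Return (creator, last_modified_by) stored inside a .docx/.xlsx/.pptx file."""
    with zipfile.ZipFile(path) as package:
        # Raises KeyError if the package has no core properties part
        root = ET.fromstring(package.read("docProps/core.xml"))
    creator = root.findtext("dc:creator", default="", namespaces=NS)
    modifier = root.findtext("cp:lastModifiedBy", default="", namespaces=NS)
    return creator, modifier
```

These values are whatever the authoring application wrote into the file, so they can be empty or generic if the user name was never set in Office.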

And you may find it faster to just buy something that does what you want.

This should be a good starting point to go through and get everything, then just plug in the additional parts that you require

I’m able to collect all the files from the directory; I even managed to find a python snippet for the date and time. But I can’t properly program, so I was hoping a python-god would just send me a goodie from heaven. I have a colleague who can probably figure it out in 15-20 Minutes…

Yes, just a utility which is writing information into excel for easier filtering and sorting.
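
If the end goal is only a table to filter and sort in Excel, one option is to write a CSV file from Python and open that in Excel. A small sketch is below; `write_report` and the column names in the usage line are my own placeholders, not part of any existing script in this thread.

```python
import csv


def write_report(rows, csv_path, header=None):
    """Write rows (lists of values) to a CSV file that Excel can open."""
    # utf-8-sig adds a BOM so Excel detects the encoding;
    # newline="" stops the csv module from producing blank lines on Windows.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        if header:
            writer.writerow(header)
        writer.writerows(rows)
    return csv_path
```

Usage would be something like `write_report(data, "report.csv", header=["File", "Owner", "Modified"])`, where `data` is the list of rows you already collect.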

I’ve also used this for batch-exporting families from multiple projects to grab families which needed to become standardized content, and also for deleting back-up files on the network.

I recommend giving it a shot and posting where you get stuck. As a start you can try to build off of this (written from my phone without an editor or any way to test stuff):

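import os

# Assumption: the list of full file paths is wired into the first node input
fileslist = IN[0]

data = []
for file in fileslist:
    filedata = []
    filedata.append(file)
    stats = os.stat(file)
    # st_uid is the numeric owner id; on Windows it is typically 0
    user = stats.st_uid
    filedata.append(user)
    data.append(filedata)
OUT = data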

Hi,
here is an example with .NET

import sys
import clr
import System

mydir = IN[0]
datafiles = []
filesInfo = [System.IO.FileInfo(f) for f in System.IO.Directory.GetFiles(mydir)]
for fi in filesInfo:
    user_owner = fi.GetAccessControl().GetOwner(System.Security.Principal.NTAccount).ToString()
    datafiles.append([fi.Name, user_owner, fi.CreationTime, fi.LastWriteTime])

OUT = datafiles

That worked beautifully for one directory. Unfortunately, I want to collect the files for multiple directories.
I will bother my colleague with the issue now, and will post the results, if any.

Thanks for all the help so far!

You mean collecting the subdirectories in a folder?

I am using this at the moment:

This node returns all files under this folder, including files in sub-directories (i.e. sub-folders)

I have a list of 300+ files, which include the full file-path. I want to turn that into a list of users who have created and who last modified the files.

All the other info I could grab using os.stat.
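
For a ready-made list of full paths, a plain Python sketch could look like the one below. `describe_files` is a hypothetical name. Two caveats: as far as I know the file system only records an owner and timestamps, not who last modified a file, and Path.owner() is not available on Windows in older Python versions, so the sketch falls back to None there.

```python
from datetime import datetime
from pathlib import Path


def describe_files(paths):
    """Return [path, owner name or None, last-modified text] for each path."""
    rows = []
    for raw in paths:
        path = Path(raw)
        stats = path.stat()
        try:
            owner = path.owner()
        except (NotImplementedError, KeyError):
            # Not supported on this platform, or the user id has no name
            owner = None
        modified = datetime.fromtimestamp(stats.st_mtime).isoformat(sep=" ", timespec="seconds")
        rows.append([str(path), owner, modified])
    return rows
```

On a Windows network drive the .NET GetAccessControl route is probably the more dependable way to get the owner name.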

Here is a variant to search in subdirectories

import sys
import clr
import System

mydir = IN[0]
datafiles = []

dirInfo = System.IO.DirectoryInfo(mydir)
allfiles = dirInfo.GetFiles("*.*", System.IO.SearchOption.AllDirectories)
for fi in allfiles:
    user_owner = fi.GetAccessControl().GetOwner(System.Security.Principal.NTAccount).ToString()
    datafiles.append([fi.DirectoryName, fi.Name, user_owner, fi.CreationTime, fi.LastWriteTime])

OUT = datafiles

Interesting. Did you try using similar Python code to retrieve the metadata of files in Autodesk Desktop Connector? What I am interested in is knowing whether the files have been updated since they were last synchronised, i.e. whether the modified time of the file in BIM 360 differs from the file date on the local computer.

Desktop Connector data cannot be accessed by third-party (i.e. Windows, system, etc.) tools in a reliable way. You would have to identify the internal workings of the tool, which would be against the terms of service.