Extracting page sizes from PDF

How can I extract the page sizes of different PDF files (A0, A1, A2, …)?


This node does not read all the files. What is the problem? Please help.

Can you show the whole graph in a screenshot?

What I’m trying to get is the page size of the PDF files, to know what format they are (A4, A3, etc.). However, one PDF returns null for its dimensions.


Did you find a way to get the PDF size with the DynamoPDF package? I am trying the pdfrw library (PyPI) with IronPython instead.

Hi! It looks like the node might struggle with some PDFs. Check for compatibility issues, make sure the file paths are correct, and verify that it supports the sizes you need (A0, A1, A2, etc.). If issues persist, consider trying alternative nodes or methods for page size extraction.
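
For the file path check, a minimal sketch in plain Python could look like the following. The helper name `split_pdf_paths` is made up for illustration, and it only checks that each path exists and ends in .pdf, not that the file is a valid PDF.

```python
import os

def split_pdf_paths(paths):
    """Separate usable PDF paths from ones that are missing or not .pdf files.

    Hypothetical helper: it checks the extension and that the file exists,
    nothing more.
    """
    valid, rejected = [], []
    for path in paths:
        is_pdf = str(path).lower().endswith(".pdf")
        if is_pdf and os.path.isfile(path):
            valid.append(path)
        else:
            rejected.append(path)
    return valid, rejected
```

In a Dynamo Python node you could pass the list from IN[0] to it and output both lists, so the rejected paths are visible in the graph.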

Hi,

Here is a solution using iTextSharp or iText:

import sys
import clr
import System
sys.path.append(r"D:\REVIT\Scripts Dynamo\lib_dll\net4")
clr.AddReference('iTextSharp')
from iTextSharp.text.pdf  import PdfReader

out  = []
reader = PdfReader(IN[0])
for i in range(1, reader.NumberOfPages + 1):
    pagesize = reader.GetPageSize(i)
    height = pagesize.Height
    width = pagesize.Width
    out.append(f"page {i} : format Size {height * 0.352777778 :.1f} x {width * 0.352777778 :.1f} mm")
reader.Close()
OUT = out
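
For anyone wondering about the 0.352777778 factor: PDF page sizes are given in points, and one point is 1/72 inch, so the factor is 25.4 / 72. A small sketch of the same conversion with named constants (the function and constant names are mine, not from iTextSharp):

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # default PDF user-space unit: 1 pt = 1/72 inch

def points_to_mm(points):
    """Convert a length in PDF points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH

# A4 is commonly rounded to 595 x 842 points
print(round(points_to_mm(595), 1), round(points_to_mm(842), 1))  # 209.9 297.0
```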

And if I have several separate PDF files, I get null.

It looks like the Python script that Cyril provided only takes one PDF at a time. I think you will need to modify it to handle a list of inputs.


Here’s the code for a list of input paths

import sys
import clr
import System
sys.path.append(r"D:\REVIT\Scripts Dynamo\lib_dll\net4")
clr.AddReference('iTextSharp')
from iTextSharp.text.pdf  import PdfReader

def toList(x):
    if isinstance(x, list):
        return x
    elif isinstance(x, (str, System.String)):
        return [x]
    elif hasattr(x, "GetType") and x.GetType().GetInterface("IEnumerable") is not None :
        return x
    else :
        return [x]
        
lst_pdf_path = toList(IN[0])
print(lst_pdf_path)
out  = []
for pdf_path in lst_pdf_path:
    temp = []
    reader = PdfReader(pdf_path)
    for i in range(1, reader.NumberOfPages + 1):
        pagesize = reader.GetPageSize(i)
        height = pagesize.Height
        width = pagesize.Width
        temp.append(f"page {i} : format Size {height * 0.352777778 :.1f} x {width * 0.352777778 :.1f} mm")
    reader.Close()
    out.append(temp)
OUT = out
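
One more idea, since a single unreadable PDF was mentioned earlier: when a Python node raises an exception, Dynamo typically returns null for the whole output, so it may help to catch errors per file. This is only a sketch; `collect_page_sizes` and its `read_sizes` argument are hypothetical names, where `read_sizes` stands for whatever function wraps the PdfReader loop for one path.

```python
def collect_page_sizes(paths, read_sizes):
    """Call read_sizes(path) for each path and keep going if one file fails.

    read_sizes is a placeholder for a function that returns the page sizes
    of a single PDF. A failing file produces a message instead of an exception.
    """
    results = []
    for path in paths:
        try:
            results.append(read_sizes(path))
        except Exception as exc:
            results.append("{} : could not be read ({})".format(path, exc))
    return results
```

That way the readable files still produce their sizes, and the problem file shows up with its error message.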


That error occurs when I use Dynamo Sandbox.

You need to add the reference as per Cyril’s previous instructions.

@Angel76

This type of error usually occurs when the DLL has a TargetFramework that is not compatible with the .NET runtime executing it.

Which version of Dynamo are you using?

I have tried several versions and I still get an error. It only runs without problems from Dynamo for Civil 3D.

  • Check whether the DLLs are locked.
  • For Dynamo 3+, iTextSharp is not compatible with .NET Core. You need to use another library such as itext7 or iTextSharp.LGPL (not compatible with CPython3/PythonNet 2.5.x, which cannot read some attributes and methods).
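
To see which Python engine and .NET runtime a given Dynamo host is actually running, something like this rough sketch may help. The function name `describe_runtime` is made up; `platform.python_implementation()` and `System.Environment.Version` are standard, but I am assuming `clr` is importable in the node, and the code falls back to None when it is not.

```python
import platform

def describe_runtime():
    """Report the Python engine and, when reachable, the .NET runtime version."""
    info = {
        "engine": platform.python_implementation(),  # e.g. 'CPython' or 'IronPython'
        "python": platform.python_version(),
    }
    try:
        import clr  # assumption: provided by IronPython or Python.NET inside Dynamo
        import System
        info["dotnet"] = str(System.Environment.Version)
    except Exception:
        # Outside Dynamo (or without Python.NET) there is no .NET runtime to query
        info["dotnet"] = None
    return info

OUT = describe_runtime()
```

Comparing the reported .NET version with the framework the DLL targets should show quickly whether the two can work together.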

Alternatively, you can use the PyPDF2 or PyPDF3 library with CPython3 (pip install PyPDF2)


import sys
import clr
import System
clr.AddReference('Python.Included')
import Python.Included as pyInc
path_py3_lib = pyInc.Installer.EmbeddedPythonHome
sys.path.append(path_py3_lib + r'\Lib\site-packages')

from PyPDF2 import PdfReader

def toList(x):
    if isinstance(x, list):
        return x
    elif isinstance(x, (str, System.String)):
        return [x]
    elif hasattr(x, "GetType") and x.GetType().GetInterface("IEnumerable") is not None :
        return x
    else :
        return [x]
        
lst_pdf_path = toList(IN[0])
print(lst_pdf_path)
out  = []
for pdf_path in lst_pdf_path:
    temp = []
    reader = PdfReader(pdf_path)
    for i, p in enumerate(reader.pages):
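        box = p.mediabox
        # mediabox values can be Decimal in some PyPDF2 versions; cast before multiplying by a float
        height = float(box.height)
        width = float(box.width)
        # 1 pt = 25.4 / 72 mm
        temp.append("page {} : format Size {:.1f} x {:.1f} mm".format(
            i + 1, height * 0.352777778, width * 0.352777778))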
    out.append(temp)
OUT = out
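
Since the original goal was to know the format name (A4, A3, etc.) and not only the millimetres, here is a possible follow-up step. It is a sketch under my own assumptions: the names `A_SERIES_MM` and `classify_format` are invented, the sizes are the ISO 216 A-series values, and the 2 mm tolerance is an arbitrary choice to absorb rounding in the exported PDF.

```python
# ISO 216 A-series sizes in millimetres as (short side, long side)
A_SERIES_MM = {
    "A0": (841, 1189),
    "A1": (594, 841),
    "A2": (420, 594),
    "A3": (297, 420),
    "A4": (210, 297),
}

def classify_format(width_mm, height_mm, tolerance_mm=2.0):
    """Return e.g. 'A3 landscape', or 'custom' when no A-series size matches."""
    short, long_ = sorted((width_mm, height_mm))
    for name, (ref_short, ref_long) in A_SERIES_MM.items():
        if (abs(short - ref_short) <= tolerance_mm
                and abs(long_ - ref_long) <= tolerance_mm):
            orientation = "portrait" if height_mm >= width_mm else "landscape"
            return "{} {}".format(name, orientation)
    return "custom"
```

You would call it with the converted width and height of each page, for example `classify_format(width * 0.352777778, height * 0.352777778)`.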

Thanks for your time, solved