Extracting page sizes from PDF

Angel76 · February 9, 2023, 1:22am

How can I extract the page sizes of different PDF files (A0, A1, A2, …)?

Angel76 · February 9, 2023, 5:26am

This node does not read all the files. What is the problem? Please help.

cv2HPX4 · February 9, 2023, 6:35am

Can you show the whole graph in a screenshot?

Angel76 · February 9, 2023, 1:24pm

What I’m trying to get is the page size of the PDF files, to know what format they are (A4, A3, etc.). However, one of the PDFs returns null for its dimensions.

RubenVivancos · December 16, 2023, 10:15pm

Did you find a solution for getting the PDF size with the DynamoPDF package? I am trying this library with IronPython instead: pdfrw · PyPI

kimnancy · December 18, 2023, 7:49am

Hi! It looks like the node might struggle with some PDFs. Check for compatibility issues, ensure the file paths are correct, and verify that it supports the sizes you need (A0, A1, A2, etc.). If issues persist, consider trying alternative nodes or methods for page size extraction.

c.poupin · December 18, 2023, 7:40pm

Hi,

a solution using iTextSharp or iText

import sys
import clr
import System
# Folder that contains iTextSharp.dll on the author's machine
sys.path.append(r"D:\REVIT\Scripts Dynamo\lib_dll\net4")
clr.AddReference('iTextSharp')
from iTextSharp.text.pdf import PdfReader

out = []
reader = PdfReader(IN[0])  # IN[0] : path of a single PDF file
# iTextSharp page numbers start at 1
for i in range(1, reader.NumberOfPages + 1):
    pagesize = reader.GetPageSize(i)
    height = pagesize.Height
    width = pagesize.Width
    # 0.352777778 = 25.4 / 72 converts points (1/72 inch) to mm
    out.append(f"page {i} : format Size {height * 0.352777778:.1f} x {width * 0.352777778:.1f} mm")
reader.Close()
OUT = out
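
The factor 0.352777778 used above is 25.4 / 72: PDF page sizes are expressed in points, and by default one point is 1/72 inch. A minimal sketch of that conversion on its own, where `MM_PER_POINT` and `points_to_mm` are hypothetical names for illustration and not part of iTextSharp:

```python
# By default a PDF unit is one point = 1/72 inch, and 1 inch = 25.4 mm.
MM_PER_POINT = 25.4 / 72.0

def points_to_mm(points):
    """Convert a length in PDF points to millimetres."""
    return float(points) * MM_PER_POINT

# An A4 page is commonly stored as 595 x 842 points.
print(round(points_to_mm(595), 1), round(points_to_mm(842), 1))
```

Naming the constant makes it easier to see that the result is in millimetres and not some other unit.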

Angel76 · January 4, 2024, 5:37am

And if I have several separate PDF files, I get null.

Joe.Charpentier · January 4, 2024, 5:37pm

It looks like the Python script that Cyril provided is set up to take only one PDF at a time. I think you will need to modify the Python to handle a list of inputs.

c.poupin · January 4, 2024, 8:45pm

Here’s the code for a list of input paths

import sys
import clr
import System
# Folder that contains iTextSharp.dll on the author's machine
sys.path.append(r"D:\REVIT\Scripts Dynamo\lib_dll\net4")
clr.AddReference('iTextSharp')
from iTextSharp.text.pdf import PdfReader

def toList(x):
    # Wrap a single path in a list; pass Python lists and .NET collections through
    if isinstance(x, list):
        return x
    elif isinstance(x, (str, System.String)):
        return [x]
    elif hasattr(x, "GetType") and x.GetType().GetInterface("IEnumerable") is not None:
        return x
    else:
        return [x]

lst_pdf_path = toList(IN[0])
print(lst_pdf_path)
out = []
for pdf_path in lst_pdf_path:
    temp = []  # page sizes of the current file
    reader = PdfReader(pdf_path)
    # iTextSharp page numbers start at 1
    for i in range(1, reader.NumberOfPages + 1):
        pagesize = reader.GetPageSize(i)
        height = pagesize.Height
        width = pagesize.Width
        # 0.352777778 = 25.4 / 72 converts points (1/72 inch) to mm
        temp.append(f"page {i} : format Size {height * 0.352777778:.1f} x {width * 0.352777778:.1f} mm")
    reader.Close()
    out.append(temp)
OUT = out
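
When a Python node raises an exception, Dynamo typically returns null for the whole output, so one unreadable PDF can hide the results for all the others. A minimal sketch of handling each file separately, assuming it is acceptable to return an error string in place of the page list; `read_each` and `fake_reader` are hypothetical names, and `fake_reader` only stands in for the real PDF reading code:

```python
def read_each(paths, read_one):
    """Apply read_one to every path; record the error text instead of failing."""
    results = []
    for path in paths:
        try:
            results.append(read_one(path))
        except Exception as exc:
            # Keep one entry per input so the results line up with the paths.
            results.append("ERROR {}: {}".format(path, exc))
    return results

# Stand-in reader used only for this demonstration.
def fake_reader(path):
    if not path.lower().endswith(".pdf"):
        raise ValueError("not a PDF file")
    return ["page 1 : ok"]

print(read_each(["a.pdf", "notes.txt"], fake_reader))
```

With this pattern the output list shows which file failed and why, instead of a single null.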

Angel76 · January 5, 2024, 12:34am

That error occurs when I use Dynamo Sandbox.

GavinCrump · January 5, 2024, 4:41am

You need to add the reference as per Cyril’s previous instructions.

c.poupin · January 5, 2024, 7:47am

@Angel76

This type of error usually occurs when the DLL has a TargetFramework that is not compatible with the .NET runtime it is executed on.

Which version of Dynamo are you using?

Angel76 · January 5, 2024, 10:45pm

I have tried several versions and I still get an error. I can only run it without problems from Dynamo for Civil 3D.

c.poupin · January 6, 2024, 11:22pm

Check whether the DLLs are locked.
For Dynamo 3+, iTextSharp is not compatible with .NET Core; you need to use another library such as itext7 or ITextSharp.LGPL (not compatible with CPython3/PythonNet 2.5.x, which cannot read some attributes and methods).

Alternatively, you can use the PyPDF2 or PyPDF3 library with CPython3 (pip install PyPDF2)


import sys
import clr
import System
# Make the packages installed for Dynamo's embedded CPython3 importable
clr.AddReference('Python.Included')
import Python.Included as pyInc
path_py3_lib = pyInc.Installer.EmbeddedPythonHome
sys.path.append(path_py3_lib + r'\Lib\site-packages')

from PyPDF2 import PdfReader

def toList(x):
    # Wrap a single path in a list; pass Python lists and .NET collections through
    if isinstance(x, list):
        return x
    elif isinstance(x, (str, System.String)):
        return [x]
    elif hasattr(x, "GetType") and x.GetType().GetInterface("IEnumerable") is not None:
        return x
    else:
        return [x]

lst_pdf_path = toList(IN[0])
print(lst_pdf_path)
out = []
for pdf_path in lst_pdf_path:
    temp = []  # page sizes of the current file
    reader = PdfReader(pdf_path)
    for i, p in enumerate(reader.pages):
        box = p.mediabox
        # float() guards against Decimal values, which cannot be multiplied by a float
        height = float(box.height)
        width = float(box.width)
        # 0.352777778 = 25.4 / 72 converts points (1/72 inch) to mm
        temp.append("page {} : format Size {:.1f} x {:.1f} mm".format(
            i + 1, height * 0.352777778, width * 0.352777778))
    out.append(temp)
OUT = out
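
The scripts in this thread return dimensions in millimetres, while the original question asks for the format name (A0, A1, A2, …). A minimal sketch of that last step, assuming the ISO 216 A-series sizes and a tolerance of 2 mm; `ISO_A_SIZES_MM`, `iso_format_name` and `tolerance_mm` are hypothetical names and not part of PyPDF2 or iTextSharp:

```python
# ISO 216 A-series sizes in millimetres, as (short side, long side).
ISO_A_SIZES_MM = {
    "A0": (841, 1189),
    "A1": (594, 841),
    "A2": (420, 594),
    "A3": (297, 420),
    "A4": (210, 297),
}

def iso_format_name(width_mm, height_mm, tolerance_mm=2.0):
    """Return the matching A-series name, or "custom" if nothing matches."""
    # Sort the sides so portrait and landscape pages are treated alike.
    short, long_ = sorted((float(width_mm), float(height_mm)))
    for name, (ref_short, ref_long) in ISO_A_SIZES_MM.items():
        if (abs(short - ref_short) <= tolerance_mm
                and abs(long_ - ref_long) <= tolerance_mm):
            return name
    return "custom"

print(iso_format_name(297.0, 209.9))
print(iso_format_name(500, 500))
```

The tolerance absorbs rounding in the point-to-millimetre conversion; the value of 2 mm is an arbitrary choice that can be tightened or relaxed.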

Angel76 · January 7, 2024, 4:41am

Thanks for your time, solved
