Python loop slow to find closest points to lines

Hi,

I have big lists and feed in a python script (indexbydistance) but takes extremely long time to run. (Since it will run 37919 x 322 = 12,209,918 times)
Is it possible to simplify the loop, so that it can run faster?

Give this a try.

38000 Pts , 322 Lines = less than 20 seconds, using math formulas to find distances. :sunglasses:

# Load the Python Standard and DesignScript Libraries
import sys
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *

import math 

def closept(p1, p2, p3):
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    dx, dy = x2-x1, y2-y1
    det = dx*dx + dy*dy
    a = (dy*(y3-y1)+dx*(x3-x1))/det
    
    # Allow for line length limits
    if a < 0.0:
        a = 0.0
    elif a > 1.0:
        a = 1.0    
        
    return x1+a*dx, y1+a*dy
    
def dist(p1,p2):
    x1,y1 = p1
    x2,y2 = p2
    return math.sqrt( (x2 - x1)**2 + (y2 - y1)**2 )
    
pts = IN[0]

lines = IN[1]

ptList = []
for p in pts:
    ptList.append([p.X,p.Y])

lineList = []
for l in lines:
    lineList.append([[l.StartPoint.X,l.StartPoint.Y],[l.EndPoint.X,l.EndPoint.Y]])

closePt = []
for line in lineList:
    p1 = line[0]
    p2 = line[1]
    dists = []
    for pt in ptList:
        p3 = pt
        p4 = closept(p1, p2, p3)
        d = dist(p3,p4)
        dists.append(d)
    mini = dists.index(min(dists))
    closePt.append(pts[mini])
    
    
OUT = closePt

Hope this helps :grinning:

thanks! I haven’t tested it yet, but may I ask the theory behind?
It seems running the same number of loops as I do? 38000 x 322 = 12,236,000 loops
or it is faster to call a function?

for line in lineList:
    p1 = line[0]
    p2 = line[1]
    dists = []
    for pt in ptList:

Yes, it is iterating the same amount.
The only way to reduce this would be to maintain some pre-sorted groupings of data points/lines prior to the comparison of distance calculation, which might be a good thing to try also.

It is faster to call the functions I have shown as they are only dealing with numerical values, not geometric objects.

Calculations are just faster, but you have to know which one to use to substitute for the representative geometry.

I am unsure of the exact code behind the x.DistanceTo(y) function, so I thought to try a purely mathematical approach.

let me test and reply you soon, thank you so much for your reply! :slight_smile:

Use a quad tree an it will execute in seconds even with millions of points:

8 Likes

wow! so many things to learn! thanks! let me check.

You could also just use the node Geometry.DistanceTo, List.MinimumItem and List.RemoveItemAtIndex to get at first the closest, in a second run the second closest and in another run the third closest:

It may not be the fastest way, but it’s an easy way.

What is the target of getting closest points in your work?

If you use Dynamo 2.8 and upper, look into Scipy: scipy.spatial.KDTree — SciPy v1.9.0 Manual

1 Like

+1 for k-dimensional tree

import sys
import clr
import System
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *

clr.AddReference('Python.Included')
import Python.Included as pyInc
path_py3_lib = pyInc.Installer.EmbeddedPythonHome
sys.path.append(path_py3_lib + r'\Lib\site-packages')
import numpy as np
from scipy import spatial


def get_points_onCurve(curve, step = 0.2):
    out = []
    i = step
    while i <= curve.Length :
        try:
            po = curve.PointAtSegmentLength(i)
            i += step
            out.append([po.X, po.Y, po.Z])
        except:
            break
    return out
    
def get_groupIdx_by_flatIdx(idx):
    curr = 0
    for i, grp in enumerate(lstgroup_pts):
        for j , pt in enumerate(grp):
            if curr == idx:
                return i
            else:
                curr += 1

lstgroup_pts = IN[0]
lstLines = IN[1]

a_arr = np.array([[p.X, p.Y, p.Z] for group in lstgroup_pts for p in group])
ktree = spatial.KDTree(a_arr)

out = []
for l in lstLines:
    pts_arr = get_points_onCurve(l)
    distances, lstindex = ktree.query(pts_arr, k=1)
    counts = np.bincount(lstindex)
    idx = np.argmax(counts)
    # search the group index 
    gpIdx = get_groupIdx_by_flatIdx(idx)
    out.append(gpIdx)
OUT = out
6 Likes

:star_struck: very slick!

1 Like

superb!!! :scream:

I just tested it, the KD tree binary search is super fast! :scream:
if using old searching method by comparing each value, it takes me more than 30mins to 1hr to finish
if use KDTree, I just need within 1 min!

2 Likes