Merge rows / add column value if two other column values equal

Hey everyone,

I’m trying to figure out the most efficient way to get the data I need, using dynamo. I’m guessing Python is the way to go and have some ideas on that (multiple nested for loops), but kinda stuck with how to do it without taking forever to process. Basically I have a list with 2 sublists. In each sublist, if the item number and task number are the same between rows, merge the rows and add quantity together.

So for example…

Item Task Rev Job # Qty Site
34333 2755 A 3-17017 10 Warehouse
37555 2765 A 3-17017 20 Warehouse
37555 2755 A 3-17017 15 Warehouse
37555 2755 A 3-17017 15 Warehouse
34333 2755 A 3-17017 25 Warehouse
34333 2755 A 3-17017 100 Warehouse

Would become…

Item Task Rev Job # Qty Site
34333 2755 A 3-17017 135 Warehouse
37555 2765 A 3-17017 20 Warehouse
37555 2755 A 3-17017 30 Warehouse

(The only column that would change during the merge is Quantity gets added together)

This is a very simplified example, as my live data is 18 columns and 10k+ rows. (hence the concern on how long it will take)

I was thinking I’d get all the unique item numbers and unique task numbers into their own lists, then use for loops against the main list to determine if there is a match. Then write all columns to a new line, adding the quantity. Tried to attach my .dyn that I’m using to test a solution, but it won’t allow me since I’m relatively new here. :frowning: Any advice is greatly appreciated.

Thank you!
Amber

You shouldn’t need Python for this! Unless I’m grossly oversimplifying the problem, you should be able to concat your item & task values into a single string that you can use as a key to group your sublists (rows) by key (item + task) with List.GroupByKey. Then sum the quantities in each sublist and reconstruct your list with 1 sublist per unique key, replacing the quantity index with the sum.

You can always share files for the time being by using Dropbox or another file sharing service!

If you upload/share a link to a sample Excel file, I’m happy to clarify the above graphically in Dynamo for you

3 Likes

Can you show a picture of what your data actually looks like before merging? So we know at least how to set up a similar example to test on.

2 Likes

Like @awilliams suggests: you probably won’t be needing Python for this.
What I would like to know what the data actually looks like at this point.
Are these all straight up strings or are these split up as well?
If these are strings they’d need to be split, which is very doable (also without Python) but it’d require a bit more work.

2 Likes

i don’t know what youre input is but is it a string then this could work:

4 Likes

Thank you! I will try this out and if I need further help I will let you know. Not sure why I didn’t think of using Dropbox.

@apanz

I haven’t tried @Nico_Stegeman’s graph to see if it works.

It may be possible to do in Dynamo/Python, but possibly is not the best tool for the job.
A very simple SQL query will give you exactly what you want:

SELECT Item, Task, Rev, Job, Sum(Qty) AS SumOfQty, Site
FROM TBL
GROUP BY Item, Task, Rev, Job, Site;

The key thing is to group by all the fields, except the quantity which is summed for each group

I would use Dynamo to export all of the data- to a format such as Excel, CSV, or a database format such as Access or SQLite. Then create a query (as above)

The below example is using Access, which would not raise a sweat with 10k rows.
I’ve attached the file (remove the .txt extension) Database1.accdb.txt (444 KB)

Andrew

@apanz This is interesting :grinning:.

Try this:

and the code:

import sys
pyt_path = r'C:\Program Files (x86)\IronPython 2.7\Lib'
sys.path.append(pyt_path)

from collections import defaultdict

import re

Numbers=[int(_string) for _string in [re.search('17017 (.*) ', _string).group(1) for _string in IN[0]]]

Strings=[_string[0:end] for _string,end in zip(IN[0], [re.search("17017", _string).end() for _string in IN[0]])]

data_dict= defaultdict(list)

for k, v in zip(Strings,Numbers):
        data_dict[k].append(v)

OUT=[item[0]+ " " + str(sum(item[1])) + " Warehouse" for item in list(data_dict.items())]

This is what my data looks like.