How to find strings with specific regular expressions (regex)?

Hello, I would like to define a regex to find:

  1. a date in the format DD/MM/YY or DD/MM/YYYY
  2. a code made of a prefix letter (C, P, T, R or I) followed by a two-digit sequence number, for example C03

I do not have a clue how a regex is defined, but I am looking to achieve something like that.

Many thanks for your support

  1. [0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]

  2. [CPTRI][0-9][0-9]

You can get all sorts of funky with dates; Google search ‘date regex’. The date pattern above accepts invalid dates like 45/00/3098, but assuming your data input is relatively rigid, that’s okay. It just depends on what exceptions you might expect. As written it also only covers the four-digit year; for DD/MM/YY, use two digits for the year instead.

The “C03” pattern above can also give unwanted results if your data set has something like “BOC030” that you don’t want to pick up. Regex can be as simple or as complex as you like, depending on what you expect your data set to look like. The patterns above are very basic and assume your data is ‘good’.
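To make those two caveats concrete, here is a minimal Python sketch using the standard `re` module. The variable names and sample strings are only for illustration, and the prefix class is written here as `[CPTRI]`.

```python
import re

# The two basic patterns, compiled once.
date_pattern = re.compile(r'[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]')
code_pattern = re.compile(r'[CPTRI][0-9][0-9]')

# An impossible date still matches, because the pattern only checks for digits.
invalid_date_matches = bool(date_pattern.search('45/00/3098'))

# search() finds the code anywhere in the string, even inside a longer token.
embedded = code_pattern.search('BOC030')
embedded_text = embedded.group(0) if embedded else None

print(invalid_date_matches)  # True
print(embedded_text)         # C03
```

Whether either behaviour is a problem depends entirely on how clean the input data is.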

Have a look at something like Regexr and the Python Library.

Edit: to be clear, I tend to just use the Regex nodes (from the Clockwork package, I think), where you just input a regex string. If you want to explore how to do it in Python, you could pull apart those nodes or look into Python regex guides.


Something like this:

import re  # the standard library module is enough; the third-party 'regex' package is not required
# pattern matching a whole string in dd/mm/yyyy format (years 1900-2099)
pattern = re.compile(r'^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/((19|20)\d\d)$')
# looser alternative: pattern = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{2,4})')
# list of test dates
test_dates = ['01/01/2019', '01/01/19', '21/11/22', '01/01/2019', '01/01/2019 01:01:01', '01/01/19 01:01:01', '1/1/2019 01:01:01', '1/1/19 01:01:01']

print("Test for correct dates with dd/mm/yyyy format")
# check if match
for date in test_dates:
    if pattern.match(date):
        print('Match: {}'.format(date))
    else:
        print('No Match: {}'.format(date))
# pattern matching a whole string in dd/mm/yy format
pattern2 = re.compile(r'^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(\d\d)$')
# check if match
print("test for correct dates with dd/mm/yy format")
for date in test_dates:
    if pattern2.match(date):
        print('Match: {}'.format(date))
    else:
        print('No Match: {}'.format(date))
print("Test match with dd/mm/yyyy and dd/mm/yy format")
for date in test_dates:
    if pattern.match(date) or pattern2.match(date):
        print('Match: {}'.format(date))
    else:
        print('No Match: {}'.format(date))
# pattern for a prefix letter C, P, T, R or I followed by a two-digit sequence number
# (no commas inside the brackets: a comma in a character class matches a literal ',')
pattern = re.compile(r'[CPTRI]\d{2}')
# test list
test = ['C01','P02','TS3','RNB4','I05']
# check if match
print("Test with sequence starting by prefix C,P,T,R,I")
for t in test:
    if pattern.match(t):
        print('Match: {}'.format(t))
    else:
        print('No Match: {}'.format(t))

Result

Test for correct dates with dd/mm/yyyy format
Match: 01/01/2019
No Match: 01/01/19
No Match: 21/11/22
Match: 01/01/2019
No Match: 01/01/2019 01:01:01
No Match: 01/01/19 01:01:01
No Match: 1/1/2019 01:01:01
No Match: 1/1/19 01:01:01
test for correct dates with dd/mm/yy format
No Match: 01/01/2019
Match: 01/01/19
Match: 21/11/22
No Match: 01/01/2019
No Match: 01/01/2019 01:01:01
No Match: 01/01/19 01:01:01
No Match: 1/1/2019 01:01:01
No Match: 1/1/19 01:01:01
Test match with dd/mm/yyyy and dd/mm/yy format
Match: 01/01/2019
Match: 01/01/19
Match: 21/11/22
Match: 01/01/2019
No Match: 01/01/2019 01:01:01
No Match: 01/01/19 01:01:01
No Match: 1/1/2019 01:01:01
No Match: 1/1/19 01:01:01
Test with sequence starting by prefix C,P,T,R,I
Match: C01
Match: P02
No Match: TS3
No Match: RNB4
Match: I05
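The patterns above check the digit ranges but still accept dates that do not exist, such as 31/02/2019. One possible way to close that gap, sketched here as an assumption rather than as part of the original answer, is to use the regex only for the shape and let `datetime.strptime` decide whether the date is real. The helper name `is_valid_date` is hypothetical.

```python
import re
from datetime import datetime

# One pattern for both DD/MM/YY and DD/MM/YYYY (shape check only).
date_shape = re.compile(r'^(\d{2})/(\d{2})/(\d{2}|\d{4})$')

def is_valid_date(text):
    """Return True if text is a real calendar date in DD/MM/YY or DD/MM/YYYY form."""
    m = date_shape.match(text)
    if not m:
        return False
    # Pick the strptime format from the length of the year group.
    fmt = '%d/%m/%Y' if len(m.group(3)) == 4 else '%d/%m/%y'
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        # Right shape, but not an existing date (e.g. 31/02/2019).
        return False

for candidate in ['01/01/2019', '21/11/22', '31/02/2019', '01/01/2019 01:01:01']:
    print(candidate, is_valid_date(candidate))
```

This keeps the regex short and leaves month lengths and leap years to the standard library.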

Hello, I tried it and it works, although I got some wrong results: the regular expressions also matched inside longer text, where the match was preceded by other characters that are not what I am looking for.

Do you know how to add the condition that the match must come at the start of a line, or be a separate word delimited by spaces?

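\s matches a whitespace character, so you can add it to the start and/or end of the pattern. Keep in mind that \s needs an actual whitespace character to be present, so it will not match at the very start or end of the text; a word boundary \b covers those cases. The Regexr and Python Lib links above detail all the different possibilities.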
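As a rough illustration of the "separate word" condition in Python, the sketch below uses lookarounds instead of a literal \s, so that a code at the very start or end of the text is still found and the surrounding spaces are not part of the match. The sample string is invented for the example.

```python
import re

# (?<!\S) : not preceded by a non-whitespace character (start of text or after whitespace)
# (?!\S)  : not followed by a non-whitespace character (end of text or before whitespace)
code_pattern = re.compile(r'(?<!\S)[CPTRI]\d{2}(?!\S)')

sample = 'C03 BOC030 see P12 and XT45 I05'
found = code_pattern.findall(sample)

print(found)  # ['C03', 'P12', 'I05']
```

Codes embedded in longer tokens such as BOC030 or XT45 are skipped because they are not delimited by whitespace on both sides.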


I understand, very helpful, many thanks