Skip to main content
Question

Update Invalid Dates


Forum|alt.badge.img

We often get data with invalid dates in date fields. What is the best way to find invalid dates and replace the invalid dates with a common valid date. There are multiple date fields and there could be invalid dates for multiple date fields for the same record. This is what we need to accomplish:

1. find the invalid dates (could be multiple fields for the same record)

2. update the invalid dates with a generic valid date (for instance, all the invalid dates would become 1/1/2001)

3. combine all the updated records with updated invalid dates with all the records that did not have invalid dates so that the outcome is a table with all the original records with just the updated invalid dates

 

Thanks,

David

7 replies

chrisatsafe
Contributor
Forum|alt.badge.img+2
  • Contributor
  • June 18, 2019

Hi @dkirouac,

You can accomplish this using an AttributeValidator to validate the format of the date attribute value. In the attached example, the source dataset contains a few different date formats like: 20180101, 01-01-2019, and 01/01/2019 (the desired format in this case). Using the AttributeValidator and a regular expression, you can filter out the dates that do not match the desired date format (%d/%m/%Y). Note: If you are not familiar with creating regular expressions, there are a lot of great sites like Regular Expression Library that have pre-built expressions like the Date with Slashes expression used in this template:

^\d{1,2}\/\d{1,2}\/\d{4}$

Once the dates are filtered, you can use a DateTimeStamper set the constant/generic date - in this case, I used today's date and set the format using the DateTimeConverter. Of course, you don't have to use today's date, you could use any other date or something like the DateTimeCalculator to subtract a certain number of years, months, or days from today's date or hard code a date in using something like an AttributeManager.

Lastly, use an AttributeRemover to remove the unwanted attributes that were created by the AttributeValidator and write the output to your desired format.

Hope this helps!

Template: Validate_Date_Format.fmwt


david_r
Celebrity
  • June 19, 2019

The tip from @chrisatsafe is good, but be aware that validating dates using regex risks accepting clearly invalid dates such as "32/46/9234" but also more subtle errors like "29/02/2019".

My take would be to leverage the exising datetime module in Python, it's very powerful and has a lot of testing behind it.

Sample code to validate the datetime attribute "my_datetime_string", the resulting attribute "my_datetime_isvalid" will be true or false depending.

import fmeobjects
import datetime

def validate(date_text):
    try:
        datetime.datetime.strptime(date_text, '%d/%m/%Y')
    except ValueError:
        raise ValueError("Incorrect data format, should be DD/MM/YYYY")
        
def validate_datetime(feature):
    dt = feature.getAttribute('my_datetime_string')
    try:
        validate(dt)
    except ValueError:
        feature.setAttribute('my_datetime_isvalid''false')
    else:
        feature.setAttribute('my_datetime_isvalid''true')

The PythonCaller needs to point to the method "validate_datetime".

Maybe FME needs dedicated functionality to validate dates?


ebygomm
Influencer
Forum|alt.badge.img+38
  • Influencer
  • June 19, 2019

Note that regex isn't actually validating for dates though, it will happily pass a date such as 30/02/2019 which is not a valid date.

The DateTimeConverter will let you validate dates, by sending invalid dates out the rejected port. The fme_expression_warning list identifies which values were found to be invalid and these are set to null, so can be replaced with a default value in the NullAttributeMapper


ebygomm
Influencer
Forum|alt.badge.img+38
  • Influencer
  • June 19, 2019
david_r wrote:

The tip from @chrisatsafe is good, but be aware that validating dates using regex risks accepting clearly invalid dates such as "32/46/9234" but also more subtle errors like "29/02/2019".

My take would be to leverage the exising datetime module in Python, it's very powerful and has a lot of testing behind it.

Sample code to validate the datetime attribute "my_datetime_string", the resulting attribute "my_datetime_isvalid" will be true or false depending.

import fmeobjects
import datetime

def validate(date_text):
    try:
        datetime.datetime.strptime(date_text, '%d/%m/%Y')
    except ValueError:
        raise ValueError("Incorrect data format, should be DD/MM/YYYY")
        
def validate_datetime(feature):
    dt = feature.getAttribute('my_datetime_string')
    try:
        validate(dt)
    except ValueError:
        feature.setAttribute('my_datetime_isvalid''false')
    else:
        feature.setAttribute('my_datetime_isvalid''true')

The PythonCaller needs to point to the method "validate_datetime".

Maybe FME needs dedicated functionality to validate dates?

Yes, in reality I use a python caller wrapped up in a custom transformer to validate multiple dates at once and set to a default value. 

The DateTimeConverter method produces the same results but does put lots of warnings in the log file


chrisatsafe
Contributor
Forum|alt.badge.img+2
  • Contributor
  • June 19, 2019
david_r wrote:

The tip from @chrisatsafe is good, but be aware that validating dates using regex risks accepting clearly invalid dates such as "32/46/9234" but also more subtle errors like "29/02/2019".

My take would be to leverage the exising datetime module in Python, it's very powerful and has a lot of testing behind it.

Sample code to validate the datetime attribute "my_datetime_string", the resulting attribute "my_datetime_isvalid" will be true or false depending.

import fmeobjects
import datetime

def validate(date_text):
    try:
        datetime.datetime.strptime(date_text, '%d/%m/%Y')
    except ValueError:
        raise ValueError("Incorrect data format, should be DD/MM/YYYY")
        
def validate_datetime(feature):
    dt = feature.getAttribute('my_datetime_string')
    try:
        validate(dt)
    except ValueError:
        feature.setAttribute('my_datetime_isvalid''false')
    else:
        feature.setAttribute('my_datetime_isvalid''true')

The PythonCaller needs to point to the method "validate_datetime".

Maybe FME needs dedicated functionality to validate dates?

That is a great point @egomm & @david_r! 

I did a little searching in our system and found out that validation for dates was actually added to the AttributeValidator in FME 2019.0! 

0684Q00000ArMkpQAF.png


david_r
Celebrity
  • June 19, 2019
chrisatsafe wrote:

That is a great point @egomm & @david_r!

I did a little searching in our system and found out that validation for dates was actually added to the AttributeValidator in FME 2019.0!

Fantastic! :-D


Forum|alt.badge.img
  • Author
  • June 19, 2019

Sounds promising. It sounds like we should upgrade to 2019 so that we can test the updated AttributeValidator.

Thank you all for your responses and help.


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings