Question

Different date formats in the same date column


Hello,

I have multiple invoice CSVs I pull in every month with about 300 records each. It eventually goes into a database table that is used as a lookup.

The CSV has an invoice_date column and at first I thought the invoice dates were all the same format (M/D/YYYY >> 2/1/2020), however, some invoices have dates coming in the following format: (YYYY/MM/DD >> 2020/02/01).

I was wondering if there was a way to standardize this so I don't get NULL date values in the translation when the date format I'm looking for isn't recognized. I'm having a hard time doing a condition because both the date type use the "/" character as a separator. If it was a "-" or something then I can use a condition to search out the special character and parse those dates using a particular format. Any help on this matter would be very helpful!

Thank you.


10 replies

Userlevel 5
Badge +26

Ah, long live consistent data :)

 

You can use a StringSearcher to do the parsing. No FME handy right now, so I have to work from memory, but if you use the regex ^\\d{4}/ it should filter out the YYYY/MM/DD ones.

Ah, long live consistent data :)

 

You can use a StringSearcher to do the parsing. No FME handy right now, so I have to work from memory, but if you use the regex ^\\d{4}/ it should filter out the YYYY/MM/DD ones.

Ah, that did it! Thank you very much.

Userlevel 1
Badge +21

You can use an attribute as an input format in the datetimeconverter, so you can use an attributecreator with a conditional statement to create an attribute containing the format

Then use this in the datetimeconverter

Userlevel 1
Badge +21

Actually, you don't really need the date time converter at all

You can use an attribute as an input format in the datetimeconverter, so you can use an attributecreator with a conditional statement to create an attribute containing the format

Then use this in the datetimeconverter

Ah this is a great work around as well! THank you for your help!

Userlevel 5

For my own education I thought I'd see if it was possible to chain two DateTimeConverters, the second one connected to the <Rejected> port of the first one. The idea being that I'd try to parse the date according to the first format, then retry with the second format if the first failed.

However I notice that the DateTimeConvert sets the date attribute to NULL before it exits the <Rejected> port, even when playing around with the "Repair overflow" setting.

Is this by design, @mark2atsafe? Wouldn't it make more sense to leave the input attribute untouched?

Userlevel 5
Badge +26

For my own education I thought I'd see if it was possible to chain two DateTimeConverters, the second one connected to the <Rejected> port of the first one. The idea being that I'd try to parse the date according to the first format, then retry with the second format if the first failed.

However I notice that the DateTimeConvert sets the date attribute to NULL before it exits the <Rejected> port, even when playing around with the "Repair overflow" setting.

Is this by design, @mark2atsafe? Wouldn't it make more sense to leave the input attribute untouched?

I agree, it would make more sense to keep the original attribute intact if it's rejected. Little point in inspecting the rejected feature if the reason for rejection is stripped...

For my own education I thought I'd see if it was possible to chain two DateTimeConverters, the second one connected to the <Rejected> port of the first one. The idea being that I'd try to parse the date according to the first format, then retry with the second format if the first failed.

However I notice that the DateTimeConvert sets the date attribute to NULL before it exits the <Rejected> port, even when playing around with the "Repair overflow" setting.

Is this by design, @mark2atsafe? Wouldn't it make more sense to leave the input attribute untouched?

I attempted this first but that's what I had a hard time figuring out. It was setting all the rejected date records to null which seemed a bit counter intuitive to me. I was able to use the regex method as mentioned in @redgeographics's post and isolate the different date format. I parse the normal date format in the attribute manager and the anomalies in the datetimeconverter.

Userlevel 5

I attempted this first but that's what I had a hard time figuring out. It was setting all the rejected date records to null which seemed a bit counter intuitive to me. I was able to use the regex method as mentioned in @redgeographics's post and isolate the different date format. I parse the normal date format in the attribute manager and the anomalies in the datetimeconverter.

Thanks for confirming. I've flagged it with Safe for analysis.

Possibly the same issue: https://knowledge.safe.com/questions/91296/datetimeconverter-rejected-port-null.html

Userlevel 4
Badge +26

For my own education I thought I'd see if it was possible to chain two DateTimeConverters, the second one connected to the <Rejected> port of the first one. The idea being that I'd try to parse the date according to the first format, then retry with the second format if the first failed.

However I notice that the DateTimeConvert sets the date attribute to NULL before it exits the <Rejected> port, even when playing around with the "Repair overflow" setting.

Is this by design, @mark2atsafe? Wouldn't it make more sense to leave the input attribute untouched?

That's a very good point. I'll check into that.

Reply