Solved

How to convert ambiguous dates to unique dates with time zone information?


Badge

I have a time series with ambiguous dates due to the daylight savings time change that I would like to convert to unique datetimes with time zone information:

 

  • 2021-10-31 02:00 > 2021-10-31 02:00:00+02:00
  • 2021-10-31 02:15 > 2021-10-31 02:15:00+02:00
  • 2021-10-31 02:30 > 2021-10-31 02:30:00+02:00
  • 2021-10-31 02:45 > 2021-10-31 02:45:00+02:00
  • 2021-10-31 02:00 > 2021-10-31 02:00:00+01:00
  • 2021-10-31 02:15 > 2021-10-31 02:15:00+01:00
  • 2021-10-31 02:30 > 2021-10-31 02:30:00+01:00
  • 2021-10-31 02:45 > 2021-10-31 02:45:00+01:00

I tried to do this in FME with the "TimeZoneSet"-function as described in https://community.safe.com/s/article/converting-time-and-date-fields-to-local-timezones. This works for all datetimes except the ambigous ones above (I get null values).

 

In Python this can easily be done by using pandas and the option "ambigous='infer'":

import pandas as pd
df = pd.read_csv(file, sep=';', parse_dates=['dt'])
df['dt'] = df['dt'].dt.tz_localize('CET', ambiguous='infer')

For this reason I wanted to use this code in a PythonCaller. But unfortunately this does not work. I assume this is because FME can not handle pandas DataFrame objects, see https://community.safe.com/s/question/0D54Q000080hYEoSAM/displaying-data-frame-in-python-caller.  I got errors like:

  • AttributeError: Can only use .dt accessor with datetimelike values

 

Any tips on how I could solve this problem?

icon

Best answer by ebygomm 8 June 2022, 18:05

View original

7 replies

I was able to solve this problem with ordered and sampled input and then process the datetime values manually in the AttributeManager. This works well but is not really a robust solution (e.g. different timestamp resolution, more than one timezone switch in the dataset etc.)

Badge

Isn't it possible to use a pandas DataFrame to transform datetime values? Any other recommendations?

Userlevel 1
Badge +21

Isn't it possible to use a pandas DataFrame to transform datetime values? Any other recommendations?

It is possible to transform the datetime values using pandas, but you'll need to create the DataFrame within the pythoncaller. How is your source data structured?

Badge

It is possible to transform the datetime values using pandas, but you'll need to create the DataFrame within the pythoncaller. How is your source data structured?

Thank you for your answer! My source is a csv-file with a "Date Time" column containing values like

  • 2021-10-31 02:00
  • 2021-10-31 02:15
  • ...

 

screenshot 

My solution so far:

I use a PythonCreator to import a csv file, do my "datetime transformations" with pandas (as described above), and then export a new csv file that I can then import into FME with a CSV-reader and do further transformations. Like this I can solve my problem. However I doubt that this is really best practice how to work with pandas and fme. What do you think of this solution?

Userlevel 1
Badge +21

It is possible to transform the datetime values using pandas, but you'll need to create the DataFrame within the pythoncaller. How is your source data structured?

I suspect a better way would be to import the csv file, do the transformation with pandas and then process the csv file directly into features from there.

 

Something like this

import fme
import fmeobjects
import pandas as pd
 
 
class FeatureProcessor(object):
    def __init__(self):
 
        self.df = pd.read_csv(FME_MacroValues['SourceDataset_CSV2'], sep=';', parse_dates=['dt'])
        self.df['dt'] = self.df['dt'].dt.tz_localize('CET', ambiguous='infer')
 
 
    def input(self, feature):
        for index, row in self.df.iterrows():
            f = fmeobjects.FMEFeature()
            f.setAttribute('dt', str(row['dt']))
            f.setAttribute('attr_value_1', row['attr_value_1'])
            f.setAttribute('attr_value_2', row['attr_value_2'])
            self.pyoutput(f)
 
    def close(self):
        pass

 

Badge

It is possible to transform the datetime values using pandas, but you'll need to create the DataFrame within the pythoncaller. How is your source data structured?

Good idea! This has worked like a charm. Thank you!

The ‘.dt’ accessor only works with datetimelike values, and if you try to use it on non-date-time-like objects, it will throw an error.

 

The '.dt' accessor can extract the year, month, day, hour, minute, and second information from a datetime value.

 

You can easily fix the AttributeError: can only use .dt accessor with datetimelike values by using the pd.to_datetime(df['column'], errors='coerce') function to transform missing values into NaN and then use the .dt accessor to get the date values.

Reply