Question

text file to spatial geometry

  • 16 March 2018
  • 8 replies
  • 24 views

Badge

Hi all,

I have a check-in text data file with long/lat information, ,example as below:

[user] [check-in time] [latitude] [longitude] [location id]

196514 2010-07-24T13:45:06Z 53.3648119 -2.2723465833 145064

How can I convert it into point features? I tried several times but unsuccessful

The data file looks like this:

0 2010-10-19T23:55:27Z 30.2359091167 -97.7951395833 22847 0 2010-10-18T22:17:43Z 30.2691029532 -97.7493953705 420315 0 2010-10-17T23:42:03Z 30.2557309927 -97.7633857727 316637 0 2010-10-17T19:26:05Z 30.2634181234 -97.7575966669 16516 0 2010-10-16T18:50:42Z 30.2742918584 -97.7405226231 5535878 0 2010-10-12T23:58:03Z 30.261599404 -97.7585805953 15372 0 2010-10-12T22:02:11Z 30.2679095833 -97.7493124167 21714 0 2010-10-12T19:44:40Z 30.2691029532 -97.7493953705 420315 0 2010-10-12T15:57:20Z 30.2811204101 -97.7452111244 153505 0 2010-10-12T15:19:03Z 30.2691029532 -97.7493953705 420315 0 2010-10-12T00:21:28Z 40.6438845363 -73.7828063965 23261 0 2010-10-11T20:21:20Z 40.74137425 -73.9881052167 16907 0 2010-10-11T20:20:42Z 40.741388197 -73.9894545078 12973 0 2010-10-11T00:06:30Z .....

Thanks for your help?


8 replies

Userlevel 2
Badge +17

Hi @maxbuiaus, looks like the text data is a single line without newlines, and every item is delimited by space. If so, a possible way is:

  1. Read the whole text at once.
  2. Find every check-in entry from the text and store them into a list attribute.
  3. Decompose the feature into multiple features, each of which contains a check-in entry (i.e. list element).
  4. Split a check-in entry into each item (user, check-in time, latitude, longitude, and location id).
  5. Create Point from the coordinates (longitude, latitude).

In regular expressions, /s (backslash and lowercase s) indicates a whitespace character, /S (backslash and uppercase S) indicates a non-whitespace character, and + means "one or more of preceding character".

Userlevel 2
Badge +17

Hi @maxbuiaus, looks like the text data is a single line without newlines, and every item is delimited by space.
If so, a possible way is:

  1. Read the whole text at once.
  2. Find every check-in entry from the text and store them into a list attribute.
  3. Decompose the feature into multiple features, each of which contains a check-in entry (i.e. list element).
  4. Split a check-in entry into each item (user, check-in time, latitude, longitude, and location id).
  5. Create Point from the coordinates (longitude, latitude).

0684Q00000ArLTLQA3.png

In regular expressions, /s (backslash and lowercase s) indicates a whitespace character, /S (backslash and uppercase S) indicates a non-whitespace character, and + means "one or more of preceding character".

For this kind of translation, sometimes JSON operations could also be effective.

 

JSON Template Expression (XQuery expression):

 

[
    let $data := fn:tokenize(fme:get-attribute("text_line_data"), '\s+')
    for $i in (1 to fn:count($data)) where ($i - 1) mod 5 eq 0
    return {
        "type" : "Feature",
        "geometry" : {
            "type" : "Point",
            "coordinates" : [
                xs:double($data[$i + 3]),
                xs:double($data[$i + 2])
            ]
        },
        "properties" : {
            "user" : $data[$i],
            "check-in time" : $data[$i + 1],
            "latitude" : $data[$i + 2],
            "longitude" : $data[$i + 3],
            "location id" : $data[$i + 4]
        }
    }
]
0684Q00000ArMjDQAV.png

 

 

Badge

Thx you for your great support @takashi

Badge

Hi @maxbuiaus, looks like the text data is a single line without newlines, and every item is delimited by space. If so, a possible way is:

  1. Read the whole text at once.
  2. Find every check-in entry from the text and store them into a list attribute.
  3. Decompose the feature into multiple features, each of which contains a check-in entry (i.e. list element).
  4. Split a check-in entry into each item (user, check-in time, latitude, longitude, and location id).
  5. Create Point from the coordinates (longitude, latitude).

In regular expressions, /s (backslash and lowercase s) indicates a whitespace character, /S (backslash and uppercase S) indicates a non-whitespace character, and + means "one or more of preceding character".

Thx you for your great support @takashi

 

 

I have tried your workflow, however it is not successful. I have attached a small part of the data bellow. Can you pls have look into this case

 

test.txt
Userlevel 2
Badge +17
The sample text you have posted at first is just a single string line containing space separated values, but the new one you have attached is a table with tab-separated values. Since its format is completely different from the first sample, naturally my solution which is based on the first sample doesn't work.

 

 

If the format of the new sample data was correct, you could just read it with the CSV reader simply. When adding the reader to the workspace, select "tab" as the Delimiter Character, and leave the Field Names Line blank (or select "None" in FME 2018).

 

Note: The 617th line in the sample data seems to be broken.

 

 

In this kind of question, it's highly recommended to post a minimal and valid sample data created in the same format as the actual data at first, in order to get a quick and valid answer.

 

 

 

 

 

Userlevel 2
Badge +17

Hi @maxbuiaus, looks like the text data is a single line without newlines, and every item is delimited by space. If so, a possible way is:

  1. Read the whole text at once.
  2. Find every check-in entry from the text and store them into a list attribute.
  3. Decompose the feature into multiple features, each of which contains a check-in entry (i.e. list element).
  4. Split a check-in entry into each item (user, check-in time, latitude, longitude, and location id).
  5. Create Point from the coordinates (longitude, latitude).

In regular expressions, /s (backslash and lowercase s) indicates a whitespace character, /S (backslash and uppercase S) indicates a non-whitespace character, and + means "one or more of preceding character".

The sample text you have posted at first is just a single string line containing space separated values, but the new one you have attached is a table with tab-separated values. Since its format is completely different from the first sample, naturally my solution which is based on the first sample doesn't work.

 

 

If the format of the new sample data was correct, you could just read it with the CSV reader simply. When adding the reader to the workspace, select "tab" as the Delimiter Character, and leave the Field Names Line blank (or select "None" in FME 2018).

 

Note: The 617th line in the sample data seems to be broken.

 

 

In this kind of question, it's highly recommended to post a minimal and valid sample data created in the same format as the actual data at first, in order to get a quick and valid answer.

 

 

 

 

 

Badge

Thank you for your great support, @takashi. I now can make it

Btw, with the UTC time in column 7, how can I convert to Local datetime (YYYYMMDDHHMMSS). Do you have any suggestion? I am totally new to FME

Userlevel 2
Badge +17

Thank you for your great support, @takashi.  I now can make it

Btw, with the UTC time in column 7, how can I convert to Local datetime (YYYYMMDDHHMMSS). Do you have any suggestion? I am totally new to FME

Assuming you are using FME 2017.1 or later.

 

Firstly convert format of the input date/time value to the standard FME data/time format with the DateTimeConverter.

 

  • Input Format: "%Y-%m-%dT%H:%M:%S%Ez$" (ISO format with offset)
  • Output Format: "FME" (auto), or "%Y%m%d%H%M%S%z" (FME format with offset)
You now have the UTC date/time value with FME format with timezone offset: YYYYmmddHHMMSS+00:00

 

You can then convert it to local date/time with @TimeZoneSet function, and remove the timezone offset suffix with the @TimeZoneRemove function. Try setting this expression to the Attribute Value column in an AttributeCreator or an AttributeManager. Assume an attribute called "_datetime" stores the UTC date/time value formatted in standard FME format with timezone offset.

 

@TimeZoneRemove(@TimeZoneSet(@Value(_datetime),local))
See these links to learn more about FME Date/Time functions and Standard FME Date/Time Format.

 

Addition: In advanced use of FME Date/Time functions, this single expression does the entire conversion above at once. 

 

@TimeZoneRemove(@TimeZoneSet(@DateTimeParse(@Value(_datetime),%Y-%m-%dT%H:%M:%S%Ez$),local))

Reply