Solved

csv and control-Z

5 years ago
August 22, 2019
2 replies
22 views

+10

kimo
Contributor
96 replies

I am suddenly receiving a pipe-separated text file with embedded pseudo-Unicode characters that are supposed to be a macronated 'o'. Unfortunately, each one arrives as a ^Z (hex 1A) in the ASCII file.

All the vowels can have macrons in Māori, and there is a new policy that all government departments must add macrons. If all software were Unicode-aware, this might work.

Some programs handle this and read the whole file regardless of the ^Z, but many stop at it. The FME CSV2 reader stops. Oddly, the FME Textfile reader does handle them with the encoding set to DOS-Latin-1 (ibm-850).

What can I do?

The simplest idea is to translate the pair of characters ^Zo back to a plain ASCII o.

Surely tr could just strip the ^Z...nope.

I have tried the UTF-8 encoding parameter on the CSV reader, and other tricks with tr, without success.

I have attached a test sample.asc of two records.

Unix wc -l returns a count of 1, which is not a good start, since cat shows two records.
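A quick way to see what is actually in a file like this is to look at the raw bytes. The sketch below is only an illustration: the sample bytes are made up (they are not the attached sample.asc), and `summarize_bytes` is a hypothetical helper name. It relies on the fact that wc -l counts newline characters, so a final record without a trailing newline is not counted.

```python
def summarize_bytes(data: bytes) -> dict:
    """Count newline and SUB (^Z, 0x1A) bytes in raw file content."""
    return {
        "newlines": data.count(b"\n"),        # this is what `wc -l` reports
        "sub_chars": data.count(b"\x1a"),     # embedded ^Z characters
        "ends_with_newline": data.endswith(b"\n"),
    }


# Hypothetical two-record sample; the last record has no trailing newline.
sample = b"1|K\x1aowhai|A\n2|T\x1aotara|B"
summary = summarize_bytes(sample)
print(summary)
# {'newlines': 1, 'sub_chars': 2, 'ends_with_newline': False}
```

For a real file, the same function could be called on `open(path, "rb").read()`; binary mode matters here so that no byte gets special treatment.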

Best answer by kimo

I have created a workaround based on this suggestion on Stack Overflow:

https://stackoverflow.com/questions/20078816/replace-non-ascii-characters-with-a-single-space

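def processFeature(feature):
    """Extract the record type from field 8 and strip control characters such as ^Z."""
    buffer = feature.getAttribute('text_line_data')
    # Keep only characters from the space (code 32) upwards; this drops ^Z (code 26).
    fixed = ''.join(ch for ch in buffer if ord(ch) > 31)
    # Field 8 of the pipe-separated record is at index 7.
    feature.setAttribute('rec', fixed.split('|')[7])
    feature.setAttribute('text_line_data', fixed)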

I used the Textfile reader to read in the whole file (it ignores ^Z, hooray!), then a PythonCaller to strip out the ^Z and write each line to another text file. After that I was able to use the CSV reader to read the data, splitting successfully at the pipe separators. Perhaps I could have joined the two processes with a WorkspaceRunner, but I just wanted my original workspace to run again.

I was not able to use the original startup Python script because Python also halts on an embedded ^Z.
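If the halt in the startup script comes from reading the file in text mode (an assumption; the post does not say how the file was opened), a pre-processing step that works on raw bytes may sidestep it. The following is a minimal sketch, not the workaround described above: `strip_sub` and the output file name are hypothetical, and the demo data is made up.

```python
import os
import tempfile

SUB = b"\x1a"  # the ^Z (substitute) control character


def strip_sub(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path without SUB bytes; return how many were removed."""
    # Binary mode hands back every byte as-is, so 0x1A is just data here.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data.replace(SUB, b""))
    return data.count(SUB)


# Demo on a temporary file with hypothetical content.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.asc")
    dst = os.path.join(tmp, "sample_clean.txt")
    with open(src, "wb") as f:
        f.write(b"1|K\x1aowhai|A\n2|T\x1aotara|B\n")
    removed = strip_sub(src, dst)
    with open(dst, "rb") as f:
        cleaned = f.read()

print(removed, cleaned)
```

This reads the whole file into memory, which is fine for small extracts; a very large file would be better processed in chunks.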


+20

debbiatsafe
Safer
648 replies
5 years ago
August 22, 2019

Hi @kimo

I was able to use the CSV2 reader with a UTF-8 encoding parameter to read your sample file successfully.

In the Data Inspector, the macronated characters were prefixed with a substitute character (hex code \x1a). It is possible to match hex codes in the StringReplacer in Replace Regular Expression mode, as mentioned in this Q&A post, so I would recommend using this method to replace the substitute character in your text file. In addition, I have found it is also possible to replace this character by pasting the substitute character into the text editor in the Replace Text mode of the StringReplacer.

I have attached a sample workspace demonstrating these two approaches. I hope it helps.

kimo_StringReplacer_MacronatedCharacters.fmw
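For readers who want to try the hex-code match outside FME, the same idea can be sketched with Python's `re` module. This is not the content of the attached workspace, and the StringReplacer's regular expression syntax may differ in detail; `remove_sub` and the sample string are hypothetical.

```python
import re

# \x1a is the hex escape for the substitute character (^Z).
SUB_PATTERN = re.compile(r"\x1a")


def remove_sub(text: str) -> str:
    """Delete every substitute character from the given text."""
    return SUB_PATTERN.sub("", text)


line = "1|K\x1aowhai|T\x1aotara"
clean_line = remove_sub(line)
print(clean_line)  # 1|Kowhai|Totara
```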
