Solved

Best way to parse mixed variable and fixed width text file


Badge

Hi Guys,

I have a text file that contains both header information, with the necessary attribute fields, and a fixed-width set of sequential coordinates with bearings and distances that make up the polygons of a cadastre. I need to create the polygons with the attributes attached.

If I read in the file with the CAT (Column Aligned Text) reader, I can't find a way to retrieve the attribute information at the top.

If I read it as a text file and then split on the ----, I can sort of retrieve the header information as attributes, but then I can't figure out how to extract the fixed-width data into lists.
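Here is a minimal plain-Python sketch of the "split on the ----" step, run outside FME. It assumes a separator is any line made only of four or more dashes. The pattern and the name `split_on_dashes` are illustrative and are not taken from the attached file.

```python
import re

# Assumed separator rule: a line consisting only of 4+ dashes (plus optional spaces).
SEPARATOR = re.compile(r"^\s*-{4,}\s*$")

def split_on_dashes(text):
    """Split text into chunks (lists of lines) at every dashed separator line."""
    chunks, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line):
            if current:              # don't emit empty chunks
                chunks.append(current)
            current = []
        else:
            current.append(line)
    if current:                      # keep the last chunk if the file doesn't end with dashes
        chunks.append(current)
    return chunks
```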

Also, all the blank spaces are causing me issues in creating a set of headings, especially as the first line of the definition is blank until you get to the radial or the E/N.
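One way to tolerate the blank leading columns is to cut each field by character position and turn an all-space field into None, so that blanks never shift the headings. The column names below come from the question. The offsets in `COLUMNS` are hypothetical placeholders that would have to be measured from the sample file.

```python
# Hypothetical layout: (field name, start offset, end offset) in characters.
COLUMNS = [
    ("BEARING", 0, 10),
    ("DISTANCE", 10, 20),
    ("RADIUS", 20, 30),
]

def parse_fixed_width(line, columns=COLUMNS):
    """Slice one line into named fields; blank (all-space) fields become None.

    Slicing past the end of a short line yields '', so lines whose trailing
    blanks were trimmed are handled the same way.
    """
    record = {}
    for name, start, end in columns:
        value = line[start:end].strip()
        record[name] = value if value else None
    return record
```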

I was going to try to read these in as a sequential list, i.e., each feature also references the one before and the one after... it should be relatively easy to do!
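You can model reading the rows as a sequential list with a helper that hands each row its neighbours. `with_neighbours` is a hypothetical name, and the rows can be any parsed records.

```python
def with_neighbours(rows):
    """Yield (previous, current, next) for each row, using None at both ends."""
    for i, row in enumerate(rows):
        prev_row = rows[i - 1] if i > 0 else None
        next_row = rows[i + 1] if i + 1 < len(rows) else None
        yield prev_row, row, next_row
```

For polygon building, consecutive pairs from `zip(rows, rows[1:])` give the (start, end) vertices of each boundary segment directly.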

Any thoughts on the best way to do this? Sample text file attached.

(Definitions: I believe that bearing is the bearing of the surveyed line, distance is the chord distance, arc is the arc distance, radius is the radius of the arc's circle, radial is the angle at the start/end of the arc, and TP stands for Tangent Point, though I'm not sure why TP is inconsistently applied.)
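If you compute coordinates from the bearings and distances instead of reading the E/N columns, the usual coordinate-geometry relations apply. For a bearing θ measured clockwise from north, ΔE = d·sin(θ) and ΔN = d·cos(θ). A chord c relates to arc length s and radius R by c = 2R·sin(s / 2R). The sketch below assumes bearings in decimal degrees, but survey files often store degrees-minutes-seconds, which would need converting first. If the bearing on an arc row is the chord bearing (an assumption), `traverse` with the chord distance also locates the end of an arc.

```python
import math

def traverse(easting, northing, bearing_deg, distance):
    """End point of a line from (easting, northing).

    Assumes a whole-circle bearing in decimal degrees, clockwise from north.
    """
    theta = math.radians(bearing_deg)
    return (easting + distance * math.sin(theta),
            northing + distance * math.cos(theta))

def chord_from_arc(arc_length, radius):
    """Chord length subtending an arc of length arc_length on a circle of this radius."""
    r = abs(radius)
    return 2.0 * r * math.sin(arc_length / (2.0 * r))
```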

Thank you!!


Best answer by takashi 19 June 2018, 11:50


16 replies

Badge +6

My solution is as follows:

1. Remove blank lines.
2. Separate the fixed attribute part (Obj Accuracy, Update dtg, etc.) from the list data part (BEARING, DISTANCE, etc.).
3. Divide the list data.
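Here is a plain-Python sketch of steps 1 and 2. It assumes the table starts at the first line containing BEARING and that header lines look like `Label: value`. Both are guesses about the file layout, and the function names are hypothetical. Step 3 would then slice each remaining table line into fields by character position.

```python
def split_header_and_table(lines, table_marker="BEARING"):
    """Step 1: drop blank lines. Step 2: split at the table heading row.

    Returns (header_lines, table_lines); the heading row itself is dropped.
    """
    lines = [ln for ln in lines if ln.strip()]          # step 1
    for i, ln in enumerate(lines):                       # step 2
        if table_marker in ln:
            return lines[:i], lines[i + 1:]
    return lines, []                                     # no table found

def header_attributes(header_lines):
    """Turn 'Label: value' lines into a dict (hypothetical header format)."""
    attrs = {}
    for ln in header_lines:
        label, sep, value = ln.partition(":")            # split on the first colon only
        if sep:
            attrs[label.strip()] = value.strip()
    return attrs
```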

For details, please refer to the attached workspace.

Result

Badge


Wow, thank you @taojunabc! That's a great start. I forgot to mention my build: I'm on 2017.0. Could you possibly send me a screenshot of where and how you used the ListBuilder in the workspace? I can't open it, because that transformer version isn't available in my build.

 

Badge +6

I added a screenshot. I also found some errors in the regular expression in StringSearcher_2 and fixed them.

Because the screenshot was taken on a 4K monitor, you should be able to see the annotations clearly after zooming in.

Badge


Thank you @taojunabc! That was very helpful.

The regex in step 2 didn't work, by the way, but I substituted an AttributeManager and split out the attributes of the exploded data lists using substrings.

Now I just have to recreate the geometry! :)

Userlevel 2
Badge +17

Hi @katrinaopperman, to parse this sort of text data, I often use global variables (VariableSetters/Retrievers) and a TestFilter to classify the text lines, then translate each group (Header, Table Body, Footer=AREA) separately. The major part of the workflow looks like this:

[Workflow screenshots: 0684Q00000ArJcEQAV.png, 0684Q00000ArJgaQAF.png]
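Here is a plain-Python sketch of the same classification idea. One variable carried from line to line plays the role of the global variable, and each line is tagged with the section it belongs to. The rules are assumptions, not a description of the attached workspace: a line starting with BEARING opens the table body, and a line starting with AREA is the footer and closes the block.

```python
def classify(lines):
    """Tag each non-blank line as HEADER, BODY or FOOTER (hypothetical rules)."""
    section = "HEADER"                  # stands in for the global variable
    tagged = []
    for line in lines:
        text = line.strip()
        if not text:
            continue                    # ignore blank lines
        if text.startswith("BEARING"):
            section = "BODY"            # column-heading row opens the table
            continue
        if text.startswith("AREA"):
            tagged.append(("FOOTER", line))
            section = "HEADER"          # the next line starts a new block
            continue
        tagged.append((section, line))
    return tagged
```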

Badge +6

Great, using VariableSetters/Retrievers and a TestFilter simplifies the workspace and is relatively easy to understand.

Badge

Hi @takashi, thank you for that. Unfortunately I'm not really familiar with how to use the VariableRetriever and VariableSetter transformers. I looked at the help, but I'm afraid it didn't explain much, and I can't seem to get them to work in my workspace. Do you know of an article that walks someone through their use? I've got the parsing working as per taojunabc's answer with some modifications, but it would be nice to understand how to apply your method.

On a side note, do you know how to set the geometry of a part of a feature using a list item? I'm trying to create my polygons, and I have a list that contains all the right attributes, but I can only work out how to use setGeometry on the whole feature, not on a part of it. It must have something to do with segments, boundaries, or paths...?

This is what I've got so far:

[Screenshot: pythoncaller-creategeometry.jpg]

And this is what I'm trying to create:

[Screenshot: ifmepolygon-attributes.jpg]

Is there a way to do this through Python?

Thank you!

Katrina

Userlevel 2
Badge +17

Once you keep in mind that FME basically processes features one by one, I think you can leverage the VariableSetter/VariableRetriever effectively in any context. For example, if a feature sets the value 1 to a global variable named "A" through a VariableSetter, the next feature can fetch the value 1 from the variable "A" through a VariableRetriever. Observing how a global variable actually works in simple workspaces would be a quick way to understand it.
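You can mimic the variable "A" example in a few lines of Python. Features are processed strictly in order, some of them store a value, and every later feature reads the most recently stored value. The dict-based features and the `set` key are illustrative stand-ins for a VariableSetter/VariableRetriever pair, not FME objects.

```python
def propagate_variable(features):
    """Process features in order; a 'set' key stores a value that later features read."""
    variables = {}                                   # shared store, like a global variable
    results = []
    for feature in features:
        if "set" in feature:
            variables["A"] = feature["set"]          # VariableSetter
        results.append(variables.get("A"))           # VariableRetriever (None if never set)
    return results
```

In the parsing workflow, header lines play the role of the setting features and table lines the reading ones, so every table row picks up the attributes of the header above it.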

 

----------

If the boundary of the destination polygon can consist of lines and arcs, you will have to create the boundary as an FMEPath object containing FMELine and FMEArc objects as its parts.

 

Details of script could depend on what values are stored in the list attribute. If you share your workspace that creates the list with us, we can think of a script example.
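The structure such a script needs can be sketched without FME. Walk the vertices in order and emit one part per segment: an arc when the row carries a radius, and a straight line otherwise. Here the parts are plain tuples standing in for FMELine/FMEArc objects. Reading the radius from the segment's end row is an assumption about the list layout.

```python
def _radius(value):
    """Radius as float; None or blank text means a straight segment (0.0)."""
    if value is None or not str(value).strip():
        return 0.0
    return float(value)

def build_parts(vertices, radii):
    """Return ('line', start, end) or ('arc', start, end, r) for each segment.

    vertices: [(x, y), ...] in boundary order; radii[i] belongs to vertex i.
    """
    parts = []
    for i in range(1, len(vertices)):
        start, end = vertices[i - 1], vertices[i]
        r = _radius(radii[i])            # radius read from the segment's end row
        if r:
            parts.append(("arc", start, end, r))
        else:
            parts.append(("line", start, end))
    return parts
```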

 

Badge +6

Based on the screenshots of your PythonCaller, I can guess what you want to achieve. But as takashi said, it's best to provide your workspace when you ask questions.

In addition, I noticed several errors in your Python code.

  1. The current Python API does not support fetching an entire list, so feature.getAttribute('_final_list{}.EASTING') is incorrect. It must fetch a single element, such as feature.getAttribute('_final_list{1}.EASTING').
  2. A line should be built as FMELine([(x0,y0),(x1,y1)]); its argument must be a list.
In addition, you can use FMEGeometryTools's appendCurve() to connect lines and arcs into the curve needed to create the polygon.

Sample code:

from fmeobjects import *

def CreatePolygon(feature):
    fmeGeoTool = FMEGeometryTools()
    length = int(feature.getAttributeAsType('_element_count', FME_ATTR_INT16))
    # At least two vertices are needed to build one segment
    if length < 2:
        return

    for i in range(length - 1):
        # Start and end vertex of segment i (read as strings; may be blank)
        x0 = feature.getAttributeAsType('_list{%d}.EASTING' % i, FME_ATTR_STRING)
        y0 = feature.getAttributeAsType('_list{%d}.NORTHING' % i, FME_ATTR_STRING)

        x1 = feature.getAttributeAsType('_list{%d}.EASTING' % (i + 1), FME_ATTR_STRING)
        y1 = feature.getAttributeAsType('_list{%d}.NORTHING' % (i + 1), FME_ATTR_STRING)

        # Radius is read from the end vertex's list element
        r = feature.getAttributeAsType('_list{%d}.RADIUS' % (i + 1), FME_ATTR_STRING)

        # Convert to float, treating missing or blank values as 0.0
        values = [x0, y0, x1, y1, r]
        for idx, v in enumerate(values):
            if v is None or v.strip() == '':
                values[idx] = 0.0
            else:
                values[idx] = float(v)

        x0, y0, x1, y1, r = values

        if r:
            # Non-zero radius: arc; r < 0 is passed as the counter-clockwise flag
            seg = FMEArc((FMEPoint(x0, y0), FMEPoint(x1, y1)), abs(r), r < 0)
        else:
            seg = FMELine([(x0, y0), (x1, y1)])

        # Chain the segments into a single curve
        if i == 0:
            curve = seg
        else:
            curve = fmeGeoTool.appendCurve(curve, seg)

    pg = FMEPolygon(curve)
    feature.setGeometry(pg)

 

Result:

[Screenshot: 0684Q00000ArM9cQAF.png]

Userlevel 2
Badge +17
Are you sure? I think that the FMEFeature.getAttribute method can convert a list attribute to a Python list, as in:

x_coords = feature.getAttribute('_list{}.EASTING')
y_coords = feature.getAttribute('_list{}.NORTHING') 
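FME stores a list attribute as flattened names such as `_list{0}.EASTING`, `_list{1}.EASTING`, and so on. The sketch below models that with a plain dict and gathers the values in index order, which is roughly what the call above returns. `collect_list` is hypothetical and is not part of fmeobjects.

```python
import re

def collect_list(attributes, list_name, field):
    """Gather list_name{i}.field values from a flat attribute dict, ordered by i."""
    pattern = re.compile(re.escape(list_name) + r"\{(\d+)\}\." + re.escape(field))
    found = []
    for name, value in attributes.items():
        m = pattern.fullmatch(name)
        if m:
            found.append((int(m.group(1)), value))
    # Sort numerically so that {10} comes after {2}, not after {1}
    found.sort(key=lambda item: item[0])
    return [value for _, value in found]
```

The numeric sort matters. Sorting the attribute names as text would put `_list{10}` before `_list{2}`.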
Badge +6
@takashi, thank you very much for the correction. It was indeed my mistake: feature.getAttribute('_list{}') returns None, but feature.getAttribute('_list{}.EASTING') correctly returns the values of the list.
Badge

Hello @takashi and @taojunabc,

Thank you for your explanation about variables, @takashi. I will try it in a simple workspace.

I appreciate the time both of you have taken to help me. I still can't get it to work, even with FMEGeometryTools, so I'd really appreciate you looking at my PythonCaller. My build is 2017.0, and I've attached the workspace and sample data again as requested. I will keep the advice in mind for future help requests! I'm new to FME (this is my third month of using it, I think), but this forum and the people in it are so very helpful!!

Thank you!

Katrina

lis-parser-v1.fmw

blocks-section-054-kaleen.txt
Userlevel 2
Badge +17

OK, it's now clear what elements "_final_list" contains. I think this script will generate your desired result in your workspace.

import fmeobjects
def createPolygon(feature):
    x_coords = [float(v) for v in feature.getAttribute('_final_list{}.EASTING')]
    y_coords = [float(v) for v in feature.getAttribute('_final_list{}.NORTHING')]
    radius = feature.getAttribute('_final_list{}.RADIUS')
    
    boundary = fmeobjects.FMEPath()
    x0, y0 = x_coords[0], y_coords[0]
    for i in range(1, len(x_coords)):
        x1, y1, r = x_coords[i], y_coords[i], radius[i]
        if r:
            twoPoints = (fmeobjects.FMEPoint(x0, y0), fmeobjects.FMEPoint(x1, y1))
            r = float(r)
            boundary.appendPart(fmeobjects.FMEArc(twoPoints, abs(r), r < 0))
        else:
            boundary.appendPart(fmeobjects.FMELine([(x0, y0), (x1, y1)]))
        x0, y0 = x1, y1
        
    feature.setGeometry(fmeobjects.FMEPolygon(boundary))
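To sanity-check the arcs that FMEArc(twoPoints, radius, counterClockwise) builds, you can compute the arc centre from the two end points and the radius. It lies on the perpendicular bisector of the chord, at distance √(r² − (c/2)²) from the chord midpoint. The sketch assumes the minor arc (central angle at most 180°), with the centre to the left of the chord for a counter-clockwise arc and to the right for a clockwise one. `arc_centre` is a hypothetical helper, not an fmeobjects call.

```python
import math

def arc_centre(p0, p1, radius, counter_clockwise, tol=1e-9):
    """Centre of the minor circular arc from p0 to p1 (minor arc is an assumption)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    chord = math.hypot(dx, dy)
    r = abs(radius)
    if chord == 0 or chord > 2 * r + tol:
        raise ValueError("no circle of this radius passes through both points")
    # Distance from the chord midpoint to the centre (clamped against rounding error)
    h = math.sqrt(max(0.0, r * r - (chord / 2) ** 2))
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    # Unit normal pointing to the left of the direction p0 -> p1
    nx, ny = -dy / chord, dx / chord
    side = 1.0 if counter_clockwise else -1.0
    return mx + side * h * nx, my + side * h * ny
```

Both scripts in this thread pass `r < 0` as the counter-clockwise flag. Under that convention, the matching call would be `arc_centre(p0, p1, r, r < 0)`.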
And FYI, this is a workspace that contains the same Python script and the workflow using global variables:

[Updated] b17291-parse-block-text.fmwt (FME 2017.0.1)
Badge

Thank you @takashi, that worked perfectly! And so compact too! Thank you for the variable example as well; I will investigate it and see how it works.

I am in awe!

Badge +6

Building on @takashi's workspace, I found that implementing the entire process in Python is simpler and clearer.

b17725-create-polygon.fmw

Badge

Thank you @taojunabc, that is certainly much more streamlined... it kind of defeats the idea of using FME, but I guess what matters is what's fit for purpose. Cheers