Hi

I would like to create two new attributes from a linear WKT text file: one for the first pair of coordinates and one for the last pair of coordinates. (It's critical that the number of decimal places is retained.)

I have managed to get this to work after a fashion by using the AttributeSplitter and then the AttributeCreator, as the first pair is always at _list{0} in the list.

I have more of a problem identifying the last pair of coordinates, as there could be over 200 pairs of coordinates in the linear WKT field.

I've managed to create a new column that counts the coordinates in the string, and thought about using a TestFilter to split the data by number of coordinate pairs and then assign the correct _list{x} value. However, this isn't practical when I have over 200 variations in the count of coordinate pairs.

Is there a simpler way to extract the first and last pair of coordinates from a string in the format below:

LINESTRING (440720.53879545844 427232.77141405316, 440720.1293468303 427234.32703442615, 440720.08303986554 427234.27175864682, 440720.07874009717 427234.3278727259, 440720.1293468303 427234.32703442615, 440720.02814472251 427234.32869629288, 440720.07874009717 427234.3278727259)LINESTRING (465319.45713291148 413289.30000000005, 465317.43352968543 413283.25000000006, 465300.54984052165 413279.67591929191, 465264.70442175335 413273.616345648, 465265.53433199646 413266.54320040863, 465271.71414449363 413234.88681315148, 465286.80490784609 413179.86635444534, 465291.83005261811 413152.19758878386, 465302.8353721585 413112.61909653118, 465246.47810086748 413084.9533264137, 465184.78443391796 413050.409171295, 465082.76269732963 412991.76648624783, 465012.21409438667 412944.98036378925, 465063.48041194724 412871.36282405793, 465150.21443737263 412745.03416126739, 465190.36491619179 412688.43223224522, 465123.16946062754 412661.95187000179, 465155.62572572008 412578.59155002335, 465139.81304090458 412576.98770357895, 465076.44333494129 412585.55487726221, 465040.93668560189 412587.77767700778, 465014.19340087625 412584.7395535307, 465005.65864260681 412582.47149080411, 465015.61610099266 412543.55565560854, 465014.63840964978 412528.18072367407, 465012.87565928011 412517.69737987971, 465013.86512939405 412514.14015637676, 465018.39940455841 412508.10360412207)LINESTRING (429708.793362774 419479.58544436248, 429628.70000000007 419501.30000000005, 429517.65000000008 419436.10000000003, 429379.40000000008 419392.2, 429249.50000000006 419337.65, 429220.40000000008 419318, 429220.35 419318, 429041.8000000001 419197.9, 428931.40000000008 419127.7, 428900.25000000006 419116.35000000003, 428494.20000000007 418975.55000000005, 428445.80000000005 419032.95000000007, 428407.75000000006 419032.25000000006, 428389.85000000003 419076.10000000003, 428384.00000000006 419079.4, 428314.95000000007 419088.9, 428294.10000000009 419092.2, 428292.3000000001 419089.65, 428287.15000000008 
419065.85000000003, 428125.25000000006 419117.60000000003, 428106.10000000003 419114.35000000003, 428082.30000000005 419115.35000000003, 428077.65 419123.4, 428079.35000000003 419137.30000000005, 428073.00000000006 419144.70000000007, 428058.65000000008 419143.85000000009, 428015.60000000009 419137.75000000012, 428015.4 419137.75, 427973.3000000001 419128.65000000014, 427941.70000000013 419137.05000000016, 427877.40000000014 419138.25000000017)LINESTRING (496502.69972204685 452146.35332261573, 496495.41285323462 452165.91575673624, 496498.1098790851 452179.41171024676, 496498.46486248093 452181.42757951561, 496499.0143071292 452187.41610325349, 496499.19927722536 452188.33492752619, 496500.75257650367 452197.88355210249, 496502.55530077172 452208.08638199524, 496505.14729518926 452222.00231441652, 496506.63611961796 452228.80934655166, 496509.21177928493 452243.42270046246, 496511.92434894189 452261.74389994337, 496513.89031667658 452267.05518550286, 496514.4998035492 452269.60732820758, 496514.70533512562 452271.10753988457, 496513.8697829927 452276.31369055458, 496513.3230288215 452278.0152537502, 496512.428430649 452280.71692462533, 496511.68333211186 452282.41867075057, 496504.61077293433 452289.79602180794, 496478.36120082228 452321.76398074505, 496476.66836714436 452324.06982424331, 496476.02391115681 452326.02369472955, 496466.35855476442 452382.16006774688, 496461.99149802589 452411.83355659322, 496455.52382950473 452436.38206611504, 496453.61091749312 452445.08102870907, 496450.38500363013 452457.68819934089, 496442.20123868005 452487.91359945096, 496427.13660564885 452531.48664969642, 496423.05727719271 452542.77307273663, 496420.51245759 452549.40034222265, 496419.69642201712 452552.18742140383, 496415.06539593945 452567.56761452468)LINESTRING (389053.43163046654 442116.687073448, 389044.13987374725 442115.73981239245)LINESTRING (390255.65 426201.28, 390254.65 426203.28)LINESTRING (390269.64999999997 426195.27999999997, 390270.64999999997 
426196.27999999997, 390269.64999999997 426200.28, 390273.64999999997 426202.28)

 

I hope someone out there can help.

I think in this scenario I'd rebuild the geometry and then extract first and last coordinates rather than manipulating the text string

 

Edit: although it looks like this might not work for you if the decimal places are absolutely critical. In that case, if you already have a list, you can use a ListIndexer with a list index of -1 to retrieve the last value.
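
For anyone who wants to see the list idea outside FME, here is a minimal Python sketch of it, offered as an analogy only. The function name first_last_pairs is made up for illustration, and Python's number handling is not necessarily the same as FME's. It treats the coordinates purely as text, so the decimal places cannot change, and it uses index -1 to count back from the end of the list.

```python
from typing import Tuple


def first_last_pairs(wkt: str) -> Tuple[str, str]:
    """Return the first and last coordinate pairs of a LINESTRING as text."""
    # Keep only the text between the opening and closing brackets.
    body = wkt[wkt.index("(") + 1 : wkt.rindex(")")]
    # Split on commas (as an AttributeSplitter would) and trim the spaces.
    pairs = [pair.strip() for pair in body.split(",")]
    # Index 0 is the first pair; index -1 is the last, whatever the length.
    return pairs[0], pairs[-1]


wkt = "LINESTRING (429220.40000000008 419318, 429220.35 419318, 429041.8000000001 419197.9)"
first, last = first_last_pairs(wkt)
print(first)
print(last)

# Parsing a value as a number and writing it back can alter the text,
# which is why the values are left as strings above.
reparsed = str(float("419318"))  # "419318.0", not the original "419318"
```

Because nothing is ever converted to a number, a value such as 429220.40000000008 comes out character for character as it went in.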


Hi @jez,

Another take is pure string manipulation:

 

Hope this helps.

Extracting the first and last coordinate pair from a string.fmw


You could use RegEx to get the precise characters.

Use the following expression in AttributeCreator or ExpressionEvaluator

@ReplaceRegEx(@Value(AttributeName),"\((\d+\.?\d* \d+\.?\d*),.* (\d+\.?\d* \d+\.?\d*)\)","\1,\2")

This will return, e.g., "440720.53879545844 427232.77141405316,440720.07874009717 427234.3278727259", being the first and last pair of coordinates in the WKT, separated by a comma.

Otherwise, the AttributeSplitter approach is fine as well; just use the ListIndexer in the way that @ebygomm suggests, setting the index to -1 to get the last value in the list.
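
If you want to try the same pattern outside FME, a rough Python equivalent might look like the sketch below. The function name first_last_regex is hypothetical, and returning the two pairs separately (rather than as one comma-joined string) is my own choice, since the question asks for two attributes. Note that the pattern assumes plain unsigned decimals, as in the sample data; a leading minus sign would not match.

```python
import re
from typing import Optional, Tuple

# Same pattern as the FME expression: group 1 is the first pair,
# group 2 is the last pair, and ",.* " skips everything in between.
PATTERN = re.compile(r"\((\d+\.?\d* \d+\.?\d*),.* (\d+\.?\d* \d+\.?\d*)\)")


def first_last_regex(wkt: str) -> Optional[Tuple[str, str]]:
    """Return (first pair, last pair) as text, or None if nothing matches."""
    match = PATTERN.search(wkt)
    if match is None:
        return None
    return match.group(1), match.group(2)


two_points = "LINESTRING (389053.43163046654 442116.687073448, 389044.13987374725 442115.73981239245)"
result = first_last_regex(two_points)
print(result)

# In Python, a substitution rewrites only the matched span, so the
# text before the opening bracket is left in place.
substituted = PATTERN.sub(r"\1,\2", two_points)
print(substituted)
```

In Python at least, the substitution keeps the leading "LINESTRING " because it lies outside the match, so it is worth checking your own output for the same prefix.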


If only I knew how to write RegEx! This is so simple and works perfectly! Thank you

 

The ListIndexer works perfectly too; I never thought to put a value of -1 in there to identify the last entry.

 

If you have time, could you write out in English what the RegEx is doing? It all looks like smoke and mirrors to me.


OK @jez, it looks complex, but really this is a simple one: just a few core RegEx building blocks put together into a longer expression.

So really we start with the pattern we are looking for, which is:

  • A Left bracket "("
  • + A number that may or may not have a decimal point and decimal component
  • + A space character
  • + Another number that may or may not have a decimal point and decimal component

...this is the first coordinate.

The last coordinate is similar, just with a right bracket, and at the end rather than at the start, so the pattern it obeys is:

  • A number that may or may not have a decimal point and decimal component
  • + A space character
  • + Another number that may or may not have a decimal point and decimal component
  • + A Right Bracket ")"

So putting this together, we have the first coordinate as:

\( = Left bracket. The "\" is an escaping prefix we use because "(" is otherwise reserved in RegEx for special functions, so we need to tell RegEx that we literally mean the "(" character.

\d+ = One or more digits. "\d" means a digit, and "+" as a suffix means "1 or more".

\.? = A decimal point that may occur 0 or 1 times; this is what "?" as a suffix means. Again, "." is a reserved wildcard character in RegEx, so we have to escape it with "\" to tell it we literally mean the "." character.

\d* = The decimal part following the decimal point ".", which may or may not exist: 0 or more digits. This is what "*" as a suffix means.

,.* = The middle part of the string. This is a literal comma "," plus any run of intermediate characters. "." is a reserved expression that means "any character", and ".*" means any run of characters that is 0 or more characters long.

So, putting it all together, it just becomes the following. (I'll just do the first half, for the first coordinate. The last coordinate part is really just the same as below, but with \) at the end instead.)

\( ----- \d+ ----- \.? ----- \d* ----- {Space} ----- \d+ ----- \.? ----- \d* matches

( ----- 440720 ----- . ----- 53879545844 ----- {Space} ----- 427232 ----- . ----- 77141405316
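
The same breakdown can be written out piece by piece in Python, purely as an illustration (the names FIRST_COORD, text and matched are made up for this sketch, and the sample string is a shortened line from the question). Each fragment below is one of the pieces above, and Python joins the adjacent strings into a single pattern.

```python
import re

# One fragment per piece of the pattern; adjacent string literals are
# concatenated, so this is the same as r"\(\d+\.?\d* \d+\.?\d*".
FIRST_COORD = re.compile(
    r"\("   # literal left bracket
    r"\d+"  # one or more digits
    r"\.?"  # optional decimal point
    r"\d*"  # zero or more digits after the point
    r" "    # a single space
    r"\d+"  # one or more digits
    r"\.?"  # optional decimal point
    r"\d*"  # zero or more digits after the point
)

text = "LINESTRING (440720.53879545844 427232.77141405316, 440720.1293468303 427234.32703442615)"
match = FIRST_COORD.search(text)
matched = match.group(0) if match else None
print(matched)
```

Because the decimal point and the digits after it are optional, whole-number values such as 419318 match as well.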

OK, now we introduce the special function of the "(" character into the mix. You put "(" and ")" around the parts of the pattern that you want to "Group" together as a substring. RegEx will "store" these Groups in memory as it parses through the string.

So we put (...) around the part of the pattern which corresponds to the numeric parts of the first coordinate, e.g.

(-----(440720.53879545844 427232.77141405316) ----, = Group 1

(440720.07874009717 427234.3278727259) ----- ) = Group 2

With most Transformers in FME, you can "retrieve" those Groups of substrings and call them as part of the expression of a new string, e.g. by using:

  • \1 to refer to Group 1
  • \2 to refer to Group 2

etc.

So you'll see in the RegEx expression the carefully placed extra brackets that create Group 1 and Group 2, and the Replacement string value we've chosen is just:

  • \1 (Group 1, the bracketed first coordinate substring) + a comma character "," + \2 (Group 2, the bracketed last coordinate substring)
  • or "\1,\2" as the 3rd argument in the @ReplaceRegEx() function
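
To see the Groups and the \1 and \2 references at work outside FME, here is a small Python sketch, again only an analogy. The variable names are made up, and Match.expand is Python's way of filling a replacement template from the stored groups.

```python
import re

# The extra brackets create Group 1 (first pair) and Group 2 (last pair).
pattern = re.compile(r"\((\d+\.?\d* \d+\.?\d*),.* (\d+\.?\d* \d+\.?\d*)\)")

wkt = "LINESTRING (390269.64999999997 426195.27999999997, 390270.64999999997 426196.27999999997, 390269.64999999997 426200.28, 390273.64999999997 426202.28)"
match = pattern.search(wkt)

# The stored groups, in order.
group_1, group_2 = match.groups()

# Fill the template "\1,\2" from the stored groups.
joined = match.expand(r"\1,\2")

print(group_1)
print(group_2)
print(joined)
```

The ".*" in the middle is greedy, so it swallows as much as it can and only gives back enough for the final pair and the closing bracket to match; that is why Group 2 ends up as the last pair rather than the second.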

You can try out and learn about RegEx on quite a few online testers, such as https://regex101.com/, which I find really useful for learning as it shows you how it applies each part of your RegEx expression to test strings.


I really appreciate your time with this. I will be copying the description you provided, as I'm certain I will be able to utilise some of the functions in other tasks. I'll be visiting the website too.

You have been a great help. Thank you

