Question

Substring operation incorrect in several cases

11 months ago
August 9, 2024
6 replies
70 views

axcel.fme
Contributor
12 replies

Dear all,

I’m spending a lot of time trying to sort out an issue with a substring operation but I cannot see my results are not correct in several places.

In FME workbench 2024.0, I use the the AttributeManager to read a substring from the attribute _response_body and to write it to the attribute info. The response body is the result from a reverse geo-coding with OpenStreetMap and I want to extract the location for the given coordinates. To extract the relevant part, I use the following sequence of String operators:

1) Take everything left of </result>
2) From this, take everything right of address_rank
3) From this, take everything right of >

The actual (unfortunately complicated) condition statement which I’m using is:

@Right(@Right(@Left(@Value(_response_body),@FindString(@Value(_response_body),</result>)-1),@StringLength(@Left(@Value(_response_body),@FindString(@Value(_response_body),</result>)-1))-@FindString(@Value(_response_body),address_rank)),@StringLength(@Right(@Left(@Value(_response_body),@FindString(@Value(_response_body),</result>)-1),@StringLength(@Left(@Value(_response_body),@FindString(@Value(_response_body),</result>)-1))-@FindString(@Value(_response_body),address_rank)))-@FindString(@Right(@Left(@Value(_response_body),@FindString(@Value(_response_body),</result>)-1),@StringLength(@Left(@Value(_response_body),@FindString(@Value(_response_body),</result>)-1))-@FindString(@Value(_response_body),address_rank)+1),>))

This seems to work fine for a number of datasets but it gives me a few extra characters in some cases:

The syntax and structure of the source text seems to be consistent:

670:
<?xml version="1.0" encoding="UTF-8" ?>
<reversegeocode timestamp="Thu, 08 Aug 2024 09:13:46 +00:00" attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright" querystring="lat=58.152879104460816&lon=8.126058832869633&format=xml&zoom=12"><result place_id="172781434" osm_type="node" osm_id="31264100" ref="Kristiansand" lat="58.1461500" lon="7.9957333" boundingbox="57.9861500,58.3061500,7.8357333,8.1557333" place_rank="16" address_rank="16">Kristiansand, Agder, 4611, Norway</result><addressparts><city>Kristiansand</city><municipality>Kristiansand</municipality><county>Agder</county><ISO3166-2-lvl4>NO-42</ISO3166-2-lvl4><postcode>4611</postcode><country>Norway</country><country_code>no</country_code></addressparts></reversegeocode>

671:
<?xml version="1.0" encoding="UTF-8" ?>
<reversegeocode timestamp="Thu, 08 Aug 2024 09:13:46 +00:00" attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright" querystring="lat=58.72402439827633&lon=9.257224322307422&format=xml&zoom=12"><result place_id="177641629" osm_type="node" osm_id="2380166198" ref="Risør" lat="58.7207522" lon="9.2343838" boundingbox="58.6807522,58.7607522,9.1943838,9.2743838" place_rank="18" address_rank="16">Risør, Agder, 4956, Norway</result><addressparts><town>Risør</town><municipality>Risør</municipality><county>Agder</county><ISO3166-2-lvl4>NO-42</ISO3166-2-lvl4><postcode>4956</postcode><country>Norway</country><country_code>no</country_code></addressparts></reversegeocode>

672:
<?xml version="1.0" encoding="UTF-8" ?>
<reversegeocode timestamp="Thu, 08 Aug 2024 09:13:47 +00:00" attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright" querystring="lat=60.51981765734847&lon=4.907608675437386&format=xml&zoom=12"><result place_id="173934469" osm_type="relation" osm_id="10150629" ref="4626" lat="60.4267846" lon="4.7265289" boundingbox="60.1040492,60.7409047,4.2661374,5.2380227" place_rank="14" address_rank="14">Øygarden, Vestland, Norway</result><addressparts><municipality>Øygarden</municipality><county>Vestland</county><ISO3166-2-lvl4>NO-46</ISO3166-2-lvl4><country>Norway</country><country_code>no</country_code></addressparts></reversegeocode>

I attach the source data in an Excel file and the relevant part of my workbench.
It would be wonderful if somebody could help me understand why my substring operation fails is numerous cases. Is there a better way to extract the relevant data from the result string?

Thanks
Axel

+50

geomancer
Evangelist
887 replies
11 months ago
August 9, 2024

There is something wrong with the XML for 670, this should read

<?xml version="1.0" encoding="UTF-8" ?>
<reversegeocode timestamp="Thu, 08 Aug 2024 09:13:46 +00:00" attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright" querystring="lat=58.152879104460816&amp;lon=8.126058832869633&format=xml&zoom=12"><result place_id="172781434" osm_type="node" osm_id="31264100" ref="Kristiansand" lat="58.1461500" lon="7.9957333" boundingbox="57.9861500,58.3061500,7.8357333,8.1557333" place_rank="16" address_rank="16">Kristiansand, Agder, 4611, Norway</result><addressparts><city>Kristiansand</city><municipality>Kristiansand</municipality><county>Agder</county><ISO3166-2-lvl4>NO-42</ISO3166-2-lvl4><postcode>4611</postcode><country>Norway</country><country_code>no</country_code></addressparts></reversegeocode>

+50

geomancer
Evangelist
887 replies
11 months ago
August 9, 2024

You can get the data you want with an XMLFlattener.

Afterwards you can combine the individual attributes.

1 Attachments

XMLFlattener.zip

+50

geomancer
Evangelist
887 replies
11 months ago
August 9, 2024

Oh, and the problem in your complex statement is caused by the Ø and ø in the text. These probably make the @StringLength go awry.

+50

geomancer
Evangelist
887 replies
11 months ago
August 9, 2024

But of course this really asks for a solution with a regular expression using subexpressions in a StringSearcher.

And the best part: this even works with the erroneous XML for 670.

1 Attachments

Get_XML_by_RegEx.zip

axcel.fme
Author
Contributor
12 replies
11 months ago
August 9, 2024

Wow! Thank you very much. This was really helpful!
In fact, I did not consider the StringSearcher, but it is actually a really helpful transformer, and thanks to you, I now know how it works... 😃

Thank you so much!

+50

geomancer
Evangelist
887 replies
10 months ago
September 2, 2024

I think I should elaborate a bit on the regular expression I used:

(address_rank=\"\d+\"\>)(.+)(</result>)

The parentheses () indicate subexpressions (or capturing groups in RexEx lingo). The results of the subexpressions are added to the list named after Subexpression Matches List Name.

First subexpression address_rank=\"\d+\"\> : find the literal text address_rank, followed by an equal sign, a double quote, one or more digits, another double quote, and a closing angle bracket.

Second subexpression .+ : One or more characters (letter, digit, whitespace, almost everything)

Third subexpression </result> : the literal text </result>

This way the second subexpression contains everything that is between the > after address_rank and </result>.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Substring operation incorrect in several cases

1 Attachments

6 replies

1 Attachments

1 Attachments

Reply

Helpful Members This Week

Recently Solved Questions

How to get a list of Asana tasks with their corresponding custom field values?

Using one AttributeRounder for different accuracies

Create date segments of two table with overlap of times

Automate Fanout of columns/splitting attributes to different output by attribute name

Tracing Multiple Networks from Sources to Valves Without Python

Community Stats

Latest FME

Cookie policy

Cookie settings

1 Attachments

1 Attachments

1 Attachments

Reply

Related Topics

internet stability issuesicon

My wifi fails to connect to Internet Every Morning - SB8200icon

S33v2 Keeps Dropping Internet Connectionicon

Comcast sent email saying issue with G36 modemicon

G34/G36 Intermittent Connectivity On Xfinity Network

Helpful Members This Week

Recently Solved Questions

How to get a list of Asana tasks with their corresponding custom field values?

Using one AttributeRounder for different accuracies

Create date segments of two table with overlap of times

Automate Fanout of columns/splitting attributes to different output by attribute name

Tracing Multiple Networks from Sources to Valves Without Python

Popular Tags

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings