Hello,

What we do is take KMZ files and run the ArcGIS tool to convert them to layers in a file GDB. When we use the layer in ArcMap, there is a PopupInfo field containing HTML data that we need to extract. I have tried several recommendations for handling the HTML, but nothing gets me to the point where I can flatten the data into a usable table. Please let me know if you have any suggestions.

Hi @dbklingdom

I would recommend using the method in the following article: https://knowledge.safe.com/articles/19918/how-to-expose-feature-attributes-from-kml-tag.html

You may need to modify the XQuery Expression in the XMLXQueryExtractor transformer based on the structure of your XHTML attribute. Try replacing the second line of the XQuery Expression in the article with:

for $x in /html/body/table/tr
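To see what that row-by-row expression does outside FME, here is a minimal Python sketch of the same idea using the standard-library xml.etree.ElementTree rather than the XMLXQueryExtractor: for each /html/body/table/tr row, the first cell becomes the attribute name and the second cell its value. The SAMPLE markup and the rows_to_attributes name are hypothetical, and the sketch assumes the converted markup carries no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical PopupInfo value after HTML-to-XHTML conversion (no namespace).
SAMPLE = """<html><body><table>
<tr><td>NAME</td><td>Well 12</td></tr>
<tr><td>STATUS</td><td>Active</td></tr>
</table></body></html>"""


def rows_to_attributes(xhtml):
    """Use td[1] of each /html/body/table/tr row as a name and td[2] as its value."""
    root = ET.fromstring(xhtml)  # root is the <html> element itself
    attrs = {}
    for tr in root.findall("body/table/tr"):  # path relative to <html>
        cells = tr.findall("td")
        if len(cells) >= 2 and cells[0].text:
            attrs[cells[0].text.strip()] = (cells[1].text or "").strip()
    return attrs


print(rows_to_attributes(SAMPLE))  # {'NAME': 'Well 12', 'STATUS': 'Active'}
```

Each row yields one name/value pair, which is the same shape as calling fme:set-attribute once per tr.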

@dbklingdom The trick is converting the HTML into XML so that you can use the FME XMLFlattener or similar XML tools. So the HTMLToXHTMLConverter and the XMLFlattener are probably what you need. There is still a bit of clean-up and renaming to do. I've attached an example workspace (2018.1). htmltoxmlexample.zip

There is probably a slightly more elegant way to flatten the XML, but this works with the sample data you sent us.
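As a rough illustration of what "flattening" the converted XML means, the Python sketch below turns an element tree into one dotted name per text value, adding {i} indices where sibling elements repeat. The naming scheme is an assumption chosen for illustration and will not match the XMLFlattener's output names exactly; flatten and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XHTML produced from a PopupInfo value (no namespace, for brevity).
SAMPLE = """<html><body><table>
<tr><td>NAME</td><td>Well 12</td></tr>
<tr><td>STATUS</td><td>Active</td></tr>
</table></body></html>"""


def flatten(elem, path=None, out=None):
    """Map dotted element paths to text, indexing repeated siblings as tag{i}."""
    if path is None:
        path = elem.tag
    if out is None:
        out = {}
    text = (elem.text or "").strip()
    if text:
        out[path] = text
    totals = Counter(child.tag for child in elem)  # how often each child tag occurs
    seen = Counter()
    for child in elem:
        name = child.tag
        if totals[child.tag] > 1:
            name = f"{child.tag}{{{seen[child.tag]}}}"  # e.g. tr{0}, tr{1}
            seen[child.tag] += 1
        flatten(child, f"{path}.{name}", out)
    return out


for key, value in flatten(ET.fromstring(SAMPLE)).items():
    print(key, "=", value)
```

The result (for example html.body.table.tr{1}.td{0} = STATUS) still needs the clean-up and renaming step mentioned above before it reads as a usable table.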

Thanks Mark, this helped a lot. For some reason we are getting gaps in the attributes from the AttributeCreator. I have attached my workspace and some sample data (FME_TODAY.zip) if you get a chance to give it a try. Many thanks!


I think you may have replied to the wrong answer 🙂 The 'gaps' in the attributes are caused by your AttributeCreator referring to list values that do not exist on the feature (e.g. td{13}).

I have modified your workspace to work with the data attached as well as added an alternative workflow using the steps outlined in the linked article in my answer above.

Working NO fields_safeSupport.fmw
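A minimal Python sketch of why hard-coded list positions leave gaps: asking for position 13 of a shorter list returns nothing, whereas pairing up only the elements that actually exist does not. The function names and the assumption that the cells alternate name, value, name, value are hypothetical and are not taken from the attached workspace.

```python
def read_fixed_positions(cells, positions):
    """Mimic hard-coded list references such as td{13}: missing positions give None."""
    return {f"td{{{i}}}": (cells[i] if i < len(cells) else None) for i in positions}


def read_existing_pairs(cells):
    """Pair cells as name/value using only the elements that exist.

    Hypothetical assumption: cells alternate name, value, name, value, ...
    """
    return dict(zip(cells[0::2], cells[1::2]))


cells = ["NAME", "Well 12", "STATUS", "Active"]
print(read_fixed_positions(cells, [0, 1, 13]))  # {'td{0}': 'NAME', 'td{1}': 'Well 12', 'td{13}': None}
print(read_existing_pairs(cells))  # {'NAME': 'Well 12', 'STATUS': 'Active'}
```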


@debbiatsafe Hey, thanks for all the help! Looks like I keep running into ever-changing HTML. With the latest XQuery (latestxquery.txt) I get either no data or an error '... Last line repeated 124 times ...'. I have been tweaking the query but am not getting anywhere. I think I need to skip the first tr; is that possible? Thanks again, Brian. PS: See somebody next week.

Hi @dbklingdom

I personally find it very helpful to view the output from the HTMLToXHTMLConverter transformer in a text editor like Notepad++, because you can collapse nodes to see the structure. This makes it easier to adjust the XPath expression (/html/body/table/...) within the XQuery expression as required.
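If a text editor is not handy, a short Python sketch can list every distinct element path in the converted XHTML, which serves the same purpose of revealing the structure before writing the XPath. element_paths and the sample markup are hypothetical; namespace prefixes are stripped so the paths read like the XPath used in the XQuery expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted PopupInfo with a table nested inside a table cell.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><td><table><tr><td>NAME</td><td>Well 12</td></tr></table></td></tr>
</table></body></html>"""


def element_paths(xml_text):
    """Return each distinct element path (e.g. /html/body/table) in document order."""
    root = ET.fromstring(xml_text)
    paths = {}  # dict keys keep insertion order and drop duplicates

    def walk(elem, parent):
        local = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        path = f"{parent}/{local}"
        paths.setdefault(path, None)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return list(paths)


for p in element_paths(SAMPLE):
    print(p)
```

The deepest row path printed here, /html/body/table/tr/td/table/tr, is the kind of path that then goes into the for clause of the XQuery expression.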

For example, for the text file linked above, the XQuery expression would be:

declare default element namespace "http://www.w3.org/1999/xhtml";
for $x in /html/body/table/tr/td/table/tr
return fme:set-attribute($x/td[1]/text(),$x/td[2]/text())
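The declare default element namespace line matters when, as assumed here, the converted XHTML puts its elements in the http://www.w3.org/1999/xhtml namespace: unprefixed names in a path then only match if that namespace is declared. The Python sketch below shows the same effect with xml.etree.ElementTree, using an explicit "x" prefix in place of XQuery's default namespace; the sample markup and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"x": XHTML}  # explicit prefix standing in for the XQuery default namespace

# Hypothetical converted PopupInfo: the key/value table sits inside an outer table cell.
SAMPLE = f"""<html xmlns="{XHTML}"><body><table>
<tr><td><table>
<tr><td>NAME</td><td>Well 12</td></tr>
<tr><td>STATUS</td><td>Active</td></tr>
</table></td></tr>
</table></body></html>"""


def nested_rows_to_attributes(xhtml):
    """Follow /html/body/table/tr/td/table/tr and pair td[1] with td[2]."""
    root = ET.fromstring(xhtml)
    attrs = {}
    for tr in root.findall("x:body/x:table/x:tr/x:td/x:table/x:tr", NS):
        cells = tr.findall("x:td", NS)
        if len(cells) >= 2 and cells[0].text:
            attrs[cells[0].text.strip()] = (cells[1].text or "").strip()
    return attrs


print(nested_rows_to_attributes(SAMPLE))  # {'NAME': 'Well 12', 'STATUS': 'Active'}
# Without the namespace mapping, the same kind of path matches nothing:
print(ET.fromstring(SAMPLE).findall("body/table/tr"))  # []
```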

If you are looking to make your workspace more dynamic, I would recommend looking at marcp's very helpful comment on the Exposing Feature Attributes from KML tag article. He suggested using the following XQuery expression which does not require an XPath expression to be specified.

declare default element namespace "http://www.w3.org/1999/xhtml";
for $x in //tr where (exists($x/td[1]) and compare($x/td[2]/text(),"<Null>"))
return fme:set-attribute($x/td[1]/text(),$x/td[2]/text())

As the user mentions in their comment:

The //tr extracts rows, regardless of what comes before, which is very handy so that you don't really need to figure out the structure.

This should reduce the need to change the XQuery expression within the XMLXQueryExtractor.
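For comparison, here is a minimal Python sketch of the same "any row at any depth" idea with xml.etree.ElementTree: iterate over every tr, keep rows that have both a first and second cell, and skip values equal to <Null>. It approximates the XQuery filter rather than reproducing FME behaviour exactly, and the sample markup and function name are hypothetical. Note how the outer wrapper row, which has only one td, drops out on its own, just as it does in the //tr expression.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
TR = f"{{{XHTML}}}tr"  # Clark notation: {namespace}localname
TD = f"{{{XHTML}}}td"

# Hypothetical converted PopupInfo with a nested table and a <Null> value.
SAMPLE = f"""<html xmlns="{XHTML}"><body><table>
<tr><td><table>
<tr><td>NAME</td><td>Well 12</td></tr>
<tr><td>DEPTH</td><td>&lt;Null&gt;</td></tr>
<tr><td>STATUS</td><td>Active</td></tr>
</table></td></tr>
</table></body></html>"""


def any_depth_rows_to_attributes(xhtml):
    """Pair td[1]/td[2] for every <tr> at any depth, skipping '<Null>' values."""
    root = ET.fromstring(xhtml)
    attrs = {}
    for tr in root.iter(TR):  # every <tr> in the document, like //tr
        cells = tr.findall(TD)  # direct <td> children only, like $x/td
        if len(cells) < 2:  # e.g. the outer wrapper row with a single <td>
            continue
        name, value = cells[0].text, cells[1].text
        if name and value and value != "<Null>":
            attrs[name.strip()] = value.strip()
    return attrs


print(any_depth_rows_to_attributes(SAMPLE))  # {'NAME': 'Well 12', 'STATUS': 'Active'}
```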


@debbiatsafe @takashi Thanks, debbiatsafe! I had managed to get html/body/table/tr/td/table/tr working prior to your reply, but the new query runs very nicely as well. I have been trying to find a way to automagically expose the attributes instead of typing them all in. I have been trying Exploders and Creators, but it looks like I need a way to iterate over _aggList{x}.html_Value. Any recommendations? Thanks