Hi @ingalla, how are you configuring the HTMLExtractor? Can you post a screenshot of the settings?
I tried the CSS selector .attributeList p and can't get a handle on this markup:
<div class="attributeList">
<p><span>Name:</span> Basingstoke War Memorial</p>
<p><span>List entry Number:</span> 1435084</p>
</div>
However, if I edit the HTML file and change the class value to all lower case ("attributelist"), it finds it with the selector .attributeList p.
Maybe I'm missing something simple, or there is an issue with case sensitivity.
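For reference, the CSS Selectors specification treats class selectors as case-sensitive (except in quirks-mode documents, where they match ASCII case-insensitively), so in a standards-mode document .attributeList p should match the original class="attributeList" markup but not the lowercased "attributelist" version. The sketch below uses Python's standard-library html.parser, not FME, to emulate that selector for simple markup like the snippet above. The AttributeListParser class and select_attribute_list_p helper are hypothetical names for illustration only.

```python
from html.parser import HTMLParser


class AttributeListParser(HTMLParser):
    """Hypothetical helper: collect the text of <p> elements inside a
    <div> whose class list contains class_name, i.e. a rough emulation of
    the CSS selector `.attributeList p`. The class token is compared
    case-sensitively (standards-mode behaviour); quirks mode is ignored."""

    def __init__(self, class_name="attributeList"):
        super().__init__()
        self.class_name = class_name
        self.div_depth = 0   # > 0 while inside a matching <div>
        self.in_p = False
        self.current = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # html.parser lowercases tag names but keeps attribute values as-is.
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or self.class_name in classes:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_p = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p" and self.in_p:
            self.results.append("".join(self.current).strip())
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.current.append(data)


def select_attribute_list_p(html, class_name="attributeList"):
    parser = AttributeListParser(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


SAMPLE = """<div class="attributeList">
<p><span>Name:</span> Basingstoke War Memorial</p>
<p><span>List entry Number:</span> 1435084</p>
</div>"""

print(select_attribute_list_p(SAMPLE))
# ['Name: Basingstoke War Memorial', 'List entry Number: 1435084']
print(select_attribute_list_p(SAMPLE.replace("attributeList", "attributelist")))
# []
```

The original camel-case markup matches and the lowercased copy does not, which is the opposite of what the HTMLExtractor did in the report above.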
@mark_1spatial, I also found that the HTMLExtractor doesn't work as expected if you set the class name ".attributeList" as the CSS selector in FME 2017.0.0.1 build 17271. I don't think you are missing anything; I'm afraid there could be a bug here.
@ingalla, in the interim (and in FME 2016 or earlier), you can use the StringSearcher to extract the <div class="attributeList"> elements from the entire HTML document, and then extract the strings you want from the <p> elements inside each <div>.
Regular Expression Example
<div class="attributeList">.+?</div>
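The two-step StringSearcher approach can be sketched outside FME with Python's re module. One caveat: in Python, "." does not match a newline unless re.DOTALL is set, and the <div> in the sample spans several lines; whether the StringSearcher needs an equivalent setting is not covered in this thread, so check it against your own data. The second pattern (P_PATTERN) and the extract_attributes function are hypothetical illustrations, not part of FME.

```python
import re

# Step 1: the regular expression from the example, with re.DOTALL so "."
# also matches the newlines inside the multi-line <div>.
DIV_PATTERN = re.compile(r'<div class="attributeList">.+?</div>', re.DOTALL)

# Step 2 (hypothetical): capture the label inside <span> and the value that
# follows it, for each <p> element in the extracted <div>.
P_PATTERN = re.compile(r"<p><span>(.*?):</span>\s*(.*?)</p>", re.DOTALL)


def extract_attributes(html):
    """Return {label: value} pairs from every attributeList <div>."""
    attributes = {}
    for div in DIV_PATTERN.findall(html):
        for label, value in P_PATTERN.findall(div):
            attributes[label.strip()] = value.strip()
    return attributes


SAMPLE = """<html><body>
<div class="attributeList">
<p><span>Name:</span> Basingstoke War Memorial</p>
<p><span>List entry Number:</span> 1435084</p>
</div>
</body></html>"""

print(extract_attributes(SAMPLE))
# {'Name': 'Basingstoke War Memorial', 'List entry Number': '1435084'}
```

Because .+? is lazy, it stops at the first </div>, so this will not handle a <div> nested inside the attributeList <div>; an HTML parser is the safer choice if the markup gets more complex.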
Yes, same build number as me. You posted about case-sensitive tags here:
https://knowledge.safe.com/questions/34058/how-to-parse-html-file.html
Hi @ingalla, I was able to use this method. I put the workspace together quickly and I'm sure there are other ways to do it, but this works. I hope it works for you too.
Good luck.
Lyes.
extractvaluesfromhtml.fmw
Thanks for the workspace example. This is very useful in my case.