I have some HTML data with the following structure:
<h2>Status</h2>
<h3>Place</h3>
<p><a name="1">Name</a></p> <p><a name="2">Name<blockquote><p>Description</p></blockquote></a></p> <h3>Place2</h3>
<p><a name="3">Name<blockquote><p>Description</p></blockquote></a></p>
However, the line feeds are entirely erratic.
I need one feature per name anchor, which is easy enough with the HTMLExtractor, but each feature also needs the text of its corresponding h2 and h3 headings stored as attributes.
Normally I would read the data in line by line and use a TestFilter and variables to track the headings, but since the line breaks don't match the data structure in any way, I'm not sure how best to proceed.
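One way around the line-break problem is to stop thinking in lines at all and walk the HTML as a stream of tags, remembering the most recent h2 and h3 text and stamping it onto each `<a name="...">` as it closes. Below is a rough sketch in plain Python using the standard library's `html.parser.HTMLParser`; this is a swapped-in technique, not an FME transformer, and running it inside something like a PythonCaller is an assumption. The class name `AnchorHeadingParser`, the helper `extract`, and the output keys (`anchor_name`, `h2`, `h3`, `text`) are all hypothetical. It also assumes that a new h2 resets the current h3.

```python
from html.parser import HTMLParser


class AnchorHeadingParser(HTMLParser):
    """Hypothetical helper: walks the HTML as a stream of tags, so line breaks are irrelevant.

    Emits one record per <a name="..."> anchor, tagged with the text of the
    most recent <h2> and <h3> seen before it in document order.
    """

    def __init__(self):
        super().__init__()
        self.current = {"h2": None, "h3": None}  # latest heading text per level
        self.heading_tag = None   # "h2" or "h3" while inside a heading
        self.heading_parts = []
        self.anchor = None        # record under construction inside <a name=...>
        self.records = []

    @staticmethod
    def _normalize(parts):
        # Collapse all whitespace (including the erratic line feeds) to single spaces.
        return " ".join("".join(parts).split())

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self.heading_tag = tag
            self.heading_parts = []
        elif tag == "a" and self.anchor is None:
            name = dict(attrs).get("name")
            if name is not None:
                self.anchor = {"anchor_name": name,
                               "h2": self.current["h2"],
                               "h3": self.current["h3"],
                               "parts": []}
        elif self.anchor is not None:
            # A nested tag (e.g. <blockquote>) separates words inside the anchor.
            self.anchor["parts"].append(" ")

    def handle_endtag(self, tag):
        if tag == self.heading_tag:
            self.current[tag] = self._normalize(self.heading_parts)
            if tag == "h2":
                # Assumption: a new h2 starts a new section, so the old h3 no longer applies.
                self.current["h3"] = None
            self.heading_tag = None
        elif tag == "a" and self.anchor is not None:
            record = self.anchor
            record["text"] = self._normalize(record.pop("parts"))
            self.records.append(record)
            self.anchor = None
        elif self.anchor is not None:
            self.anchor["parts"].append(" ")

    def handle_data(self, data):
        if self.heading_tag is not None:
            self.heading_parts.append(data)
        if self.anchor is not None:
            self.anchor["parts"].append(data)


def extract(html_text):
    """Hypothetical entry point: returns one dict per named anchor."""
    parser = AnchorHeadingParser()
    parser.feed(html_text)
    parser.close()
    return parser.records


sample = """<h2>Status</h2>
<h3>Place</h3>
<p><a name="1">Name</a></p> <p><a name="2">Name<blockquote><p>Description</p></blockquote></a></p> <h3>Place2</h3>
<p><a name="3">Name<blockquote><p>Description</p></blockquote></a></p>"""

for rec in extract(sample):
    print(rec)
```

Because the parser reacts only to tag events, removing or reshuffling the line feeds in the sample should yield the same three records (anchors 1 and 2 under Status/Place, anchor 3 under Status/Place2). Each dict could then be turned into a feature with the headings as attributes.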