Hi all,

Can the AttributeManager perform operations like the StringSearcher?

I would like to extract certain strings from an attribute called RAW_DATA. I'm looking for the value in value="xxx",

 

e.g.

<span class="store-card__value">Mon-Fri 09:00 AM  - 09:00 PM
</span>
<span class="store-card__value">Sat 09:00 AM  - 06:00 PM
</span>
<span class="store-card__value">Sun 10:00 AM  - 05:00 PM
</span>
</p>
</div>
<div>
</div>
<div class="store-card__section">
<a class="store-card__action" href="/teststore.com">Store Details</a>
<button class="btn btn-tool store-card__action storeSelectButton selectCategoryStore " id="selectCategoryStore_699" name="699" value="699">Store</button>
</div>
</section>
</li>

In a StringReplacer I would use RegEx like value=\"\d{1,3}\" to get what I need.
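To show what that pattern is meant to pull out, here is a minimal sketch in plain Python, outside FME. The sample string is a trimmed copy of the button element above, and the variable names are made up for the example. Wrapping `\d{1,3}` in a capture group returns just the digits rather than the whole `value="699"` text.

```python
import re

# Trimmed copy of the button element from RAW_DATA (sample input for this sketch).
raw_data = (
    '<button class="btn btn-tool store-card__action" '
    'id="selectCategoryStore_699" name="699" value="699">Store</button>'
)

# Same pattern as in the post, with a capture group around the digits.
pattern = re.compile(r'value="(\d{1,3})"')

match = pattern.search(raw_data)
store_id = match.group(1) if match else None
print(store_id)  # 699
```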

 

I have to perform a bunch of operations like the above one so I'd rather keep everything in one AttributeManager and not use 20 separate StringReplacers.

 

 

Can the AttributeManager perform these operations?

 

I have been looking at the help file but it is not clear to me. Stuff like this is just Chinese to me: FindString(str, findStr, [startIdx], [caseSensitive])

Returns the index in string str starting at startIdx that matches findStr, or -1 if the string is not found. If startIdx is a negative integer, FindString() returns the index in str starting at startIdx from the end of str, then matching findStr going forward (from left to right). If startIdx is not specified, the search starts at index 0. If caseSensitive is FALSE, the search is case-insensitive. Otherwise, the search is case-sensitive.
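Read literally, that help text could be sketched in Python as below. This is only an interpretation of the quoted description, not FME's actual implementation: `find_string` and its parameter names are hypothetical, and the handling of a negative start index is an assumption based on the wording "from the end of str".

```python
def find_string(s, find_str, start_idx=0, case_sensitive=True):
    """Hypothetical emulation of the FindString() behaviour quoted above."""
    if not case_sensitive:
        # Compare lowercase copies so that case is ignored.
        s, find_str = s.lower(), find_str.lower()
    if start_idx < 0:
        # Assumption: a negative start counts back from the end of the string.
        start_idx = max(len(s) + start_idx, 0)
    # str.find returns -1 when nothing is found, as the description says.
    return s.find(find_str, start_idx)


text = 'name="699" value="699"'
print(find_string(text, "value"))            # 11
print(find_string(text, "VALUE", 0, False))  # 11
print(find_string(text, "699", -5))          # 18
print(find_string(text, "missing"))          # -1
```

In other words, the function only tells you where the text starts; it does not hand back the text itself.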

 

 

Cheers,

 

Ed

There is no function in the AttributeManager that returns a regex match. You can return the start position of a match, or replace a regex match, but you cannot return the match itself. There are sometimes ways and means to achieve the same thing, but they're not straightforward.

 


It seems that you are working with HTML data. Have you tried the HTMLExtractor transformer to retrieve the information needed?
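As a rough illustration of treating this as an HTML problem rather than a string problem, here is a small Python sketch using the standard library's `html.parser`. It is not how the HTMLExtractor works internally; the `ValueCollector` class and the trimmed sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class ValueCollector(HTMLParser):
    """Collects the value attribute of every <button> element it sees."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "button":
            attr_map = dict(attrs)
            if "value" in attr_map:
                self.values.append(attr_map["value"])


# Trimmed sample based on the markup in the question.
sample = (
    '<div class="store-card__section">'
    '<button name="699" value="699">Store</button>'
    '</div>'
)

collector = ValueCollector()
collector.feed(sample)
print(collector.values)  # ['699']
```

The parser picks the attribute by element and name, so it does not depend on the exact spacing or quoting that a regex would have to match.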


@edhere Your choices are probably:

  • StringSearcher: as you say, you might need lots of them, but the regex is much easier to formulate.
  • AttributeCreator/Manager with FindRegEx and Substring. FindRegEx gives you the location of your string and Substring extracts it. This is pretty complex and probably harder to maintain than several StringSearchers.
  • Treat it as an HTML problem, as suggested by @gabriel_hirsch, but convert the HTML to XML (HTMLToXHTMLConverter) and then use the XMLFlattener to extract the attributes you need.
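The second option boils down to "locate, then cut". A minimal Python sketch of that two-step idea is below; `locate` and `substring` are hypothetical helpers that stand in for the two steps, not the FME functions themselves, and the sample string is trimmed from the question.

```python
import re


def locate(text, pattern):
    """Step 1: return (start, end) of the first regex match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.end()) if m else None


def substring(text, start, end):
    """Step 2: cut the located span out of the text."""
    return text[start:end]


raw = '<button name="699" value="699">Store</button>'
span = locate(raw, r'value="\d{1,3}"')
if span is not None:
    start, end = span
    whole = substring(raw, start, end)
    # Skip the leading value=" and the trailing quote to keep only the digits.
    digits = substring(raw, start + len('value="'), end - 1)
    print(whole)   # value="699"
    print(digits)  # 699
```

The offset arithmetic in the last step is the part that tends to become hard to maintain once there are many such extractions.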

I've attached an example workspace with all three options (2019.2): xmlextract.fmw


Hi all,

Thanks for all the replies. Apparently one does not simply use StringReplacer inside an AttributeManager ;-)

I'll have a look at the examples. I might still choose a bunch of StringReplacers and tuck them away in a bookmark!

