Question

How to use StringSearcher function in AttributeManager?

  • 23 January 2020
  • 4 replies

Hi all,

Can the AttributeManager perform operations like the StringSearcher?

I would like to extract certain strings from an attribute called RAW_DATA. I'm looking for the value in value="xxx", e.g.:

<span class="store-card__value">Mon-Fri 09:00 AM  - 09:00 PM
</span>
<span class="store-card__value">Sat 09:00 AM  - 06:00 PM
</span>
<span class="store-card__value">Sun 10:00 AM  - 05:00 PM
</span>
</p>
</div>
<div>
</div>
<div class="store-card__section">
<a class="store-card__action" href="/teststore.com">Store Details</a>
<button class="btn btn-tool store-card__action storeSelectButton selectCategoryStore " id="selectCategoryStore_699" name="699" value="699">Store</button>
</div>
</section>
</li>

In a StringReplacer I would use a regex like value=\"\d{1,3}\" to get what I need.
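To make the regex concrete, here is a minimal Python sketch (plain Python, not FME) of what value="\d{1,3}" matches against a shortened fragment of the sample data. The pattern as written matches the whole text value="699"; adding a capture group is one way to keep only the number. The variable names are illustrative only.

```python
import re

# Shortened fragment of the RAW_DATA attribute from the post.
RAW_DATA = (
    '<a class="store-card__action" href="/teststore.com">Store Details</a>\n'
    '<button class="btn btn-tool store-card__action storeSelectButton '
    'selectCategoryStore " id="selectCategoryStore_699" name="699" '
    'value="699">Store</button>\n'
)

# The pattern from the post matches the attribute text including value= and the quotes.
whole = re.search(r'value="\d{1,3}"', RAW_DATA)
print(whole.group(0))   # value="699"

# A capture group isolates just the digits.
digits = re.search(r'value="(\d{1,3})"', RAW_DATA)
print(digits.group(1))  # 699
```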

I have to perform a bunch of operations like the one above, so I'd rather keep everything in one AttributeManager than use 20 separate StringReplacers.
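Outside FME, the wish to keep many extractions together can be sketched as a single table of named patterns applied in one pass. This is plain Python, not AttributeManager syntax; the output names (store_id, details_link, sunday_hours) and the two extra patterns are hypothetical examples based on the sample HTML.

```python
import re
from typing import Dict, Optional

# Hypothetical extraction table: output name -> regex with one capture group.
PATTERNS = {
    "store_id": r'value="(\d{1,3})"',
    "details_link": r'href="([^"]+)"',
    "sunday_hours": r'Sun (\d{2}:\d{2} [AP]M\s+-\s+\d{2}:\d{2} [AP]M)',
}


def extract_all(raw: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply every pattern to raw; keep capture group 1, or None when there is no match."""
    results: Dict[str, Optional[str]] = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, raw)
        results[name] = match.group(1) if match else None
    return results


raw_data = (
    '<span class="store-card__value">Sun 10:00 AM  - 05:00 PM\n</span>\n'
    '<a class="store-card__action" href="/teststore.com">Store Details</a>\n'
    '<button class="btn" name="699" value="699">Store</button>\n'
)
print(extract_all(raw_data, PATTERNS))
```

Adding another extraction then means adding one line to the table rather than another transformer.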

Can the AttributeManager perform these operations?

I have been looking at the help file but it is not clear to me. Stuff like this is just Chinese to me:

FindString(str, findStr, [startIdx], [caseSensitive])

Returns the index in string str starting at startIdx that matches findStr, or -1 if the string is not found. If startIdx is a negative integer, FindString() returns the index in str starting at startIdx from the end of str, then matching findStr going forward (from left to right). If startIdx is not specified, the search starts at index 0. If caseSensitive is FALSE, the search is case-insensitive. Otherwise, the search is case-sensitive.
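Read as code, that help text describes an ordinary substring search. Below is a minimal Python approximation of those rules, offered only to make them concrete; find_string is a hypothetical name, and one detail is an assumption not stated in the help text: the returned index counts from the start of str even when startIdx is negative.

```python
from typing import Optional


def find_string(s: str, find_str: str, start_idx: Optional[int] = None,
                case_sensitive: bool = True) -> int:
    """Rough Python approximation of FindString() as described in the help text.

    Assumption: the returned index is counted from the start of s (0-based),
    even when start_idx is negative.
    """
    if start_idx is None:
        start_idx = 0                           # default: search from index 0
    elif start_idx < 0:
        start_idx = max(len(s) + start_idx, 0)  # negative: count back from the end
    if not case_sensitive:
        s, find_str = s.lower(), find_str.lower()
    return s.find(find_str, start_idx)          # -1 when not found


sample = 'name="699" value="699"'
print(find_string(sample, 'value='))                        # 11
print(find_string(sample, 'VALUE=', case_sensitive=False))  # 11
print(find_string(sample, '699', -5))                       # 18
print(find_string(sample, 'xyz'))                           # -1
```

In other words, FindString() only tells you where something is; it does not hand back the text itself.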

Cheers,

Ed

4 replies

There is no function in the AttributeManager that returns a regex match. You can return the start position of a match, or replace a regex match, but not return the match itself. There are sometimes ways to achieve the same thing, but they're not straightforward.
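One possible workaround of the "not straightforward" kind is to use a replace as an extract: match the entire value and replace it with just the captured group. A minimal plain-Python sketch of the idea (not AttributeManager syntax; extract_by_replace is a hypothetical helper):

```python
import re


def extract_by_replace(raw: str, pattern: str) -> str:
    """Extract a match using only a regex replace.

    The whole input is matched and replaced by capture group 1 of pattern.
    Caveat: if the pattern does not match, re.sub returns the input unchanged.
    """
    return re.sub(r'(?s)^.*?' + pattern + r'.*$', r'\1', raw)


raw = '<button id="selectCategoryStore_699" name="699" value="699">Store</button>'
print(extract_by_replace(raw, r'value="(\d{1,3})"'))              # 699
print(extract_by_replace('no match here', r'value="(\d{1,3})"'))  # no match here
```

The caveat in the docstring is the main trap: a failed match silently leaves the original text in place, so the result needs checking.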

It seems that you are working with HTML data. Have you tried the HTMLExtractor transformer to retrieve the information needed?
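The HTML route can be illustrated outside FME with Python's standard-library html.parser: parse the markup once and pick values out by tag and attribute instead of by regex. This is only an analogy for the idea behind an HTML-aware transformer, not how HTMLExtractor works; the class name StoreCardParser is hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class StoreCardParser(HTMLParser):
    """Collects button value attributes and the text of store-card__value spans."""

    def __init__(self) -> None:
        super().__init__()
        self.button_values: List[str] = []
        self.opening_hours: List[str] = []
        self._in_value_span = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "button" and "value" in attr:
            self.button_values.append(attr["value"])
        elif tag == "span" and attr.get("class") == "store-card__value":
            self._in_value_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_value_span = False

    def handle_data(self, data):
        if self._in_value_span and data.strip():
            # Collapse runs of whitespace, e.g. "09:00 AM  - 09:00 PM".
            self.opening_hours.append(" ".join(data.split()))


html = (
    '<span class="store-card__value">Mon-Fri 09:00 AM  - 09:00 PM\n</span>\n'
    '<span class="store-card__value">Sun 10:00 AM  - 05:00 PM\n</span>\n'
    '<button class="btn" id="selectCategoryStore_699" name="699" value="699">Store</button>\n'
)
parser = StoreCardParser()
parser.feed(html)
parser.close()
print(parser.button_values)  # ['699']
print(parser.opening_hours)  # ['Mon-Fri 09:00 AM - 09:00 PM', 'Sun 10:00 AM - 05:00 PM']
```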

@edhere Your choices are probably:

  • StringSearcher: as you say, you might need lots of them, but the regex is much easier to formulate.
  • AttributeCreator/Manager with FindRegEx and Substring: FindRegEx gives you the location of your string and Substring extracts it. Pretty complex, and probably harder to maintain than several StringSearchers.
  • Treat it as an HTML problem, as suggested by @gabriel_hirsch, but convert the HTML to XML (HTMLToXHTMLConverter) and then use XMLFlattener to extract the attributes you need.

I've attached an example workspace with all three options (2019.2): xmlextract.fmw
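The FindRegEx plus Substring option boils down to two steps: locate where the match starts, then cut the wanted part out by position. A rough plain-Python analogy of those two steps (not FME expression syntax; find_then_substring is a hypothetical name, and the offsets assume 0-based indexing):

```python
import re


def find_then_substring(raw: str) -> str:
    """Two-step extraction: find the match position, then slice by index."""
    # Step 1 (analogous to FindRegEx): position where the match starts, -1 if none.
    m = re.search(r'value="\d{1,3}"', raw)
    start = m.start() if m else -1
    if start == -1:
        return ""
    # Step 2 (analogous to Substring): slice out the digits between the quotes.
    value_start = start + len('value="')
    value_end = raw.index('"', value_start)  # position of the closing quote
    return raw[value_start:value_end]


print(find_then_substring('<button name="699" value="699">Store</button>'))  # 699
```

Every extraction needs its own start and end arithmetic like this, which is why this option tends to be harder to maintain than a few StringSearchers.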

Hi all,

Thanks for all the replies. Apparently one does not simply use StringReplacer inside an AttributeManager ;-)

I'll have a look at the examples. I might still choose a bunch of StringReplacers and tuck them away in a bookmark!