
Aggregate string value by common base words


slustado
Contributor

Hello! I’m trying to figure out the best way to aggregate values by the most common words. I’ve found a few threads and documents but not quite what I was looking for.

For example, I have a list of building names and numbers where each entry can be a variation of a building name and number:

 

“1000 The Coolest Building Ever”

“1000 Coolest Building”

“Coolest Building”

“100 Coolest Building Dr.”

 

I would like the output to be “Coolest Building”, since those are the base words common to all features. Is this possible?
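One way to read “common base words across all features” is a plain word intersection: keep only the words that appear in every variant. Here is a minimal Python sketch of that idea. The function name `common_base_words` is just for illustration, and this is a simple set intersection, not an FME transformer:

```python
import re

def common_base_words(names):
    """Hypothetical helper: return the words that appear in every name,
    in the order and casing of the first name (comparison ignores case)."""
    if not names:
        return ""
    # Letters-only tokens, lowercased; this drops numbers like "1000" and dots like "Dr."
    token_sets = [set(re.findall(r"[a-z]+", n.lower())) for n in names]
    shared = set.intersection(*token_sets)
    # Rebuild the result from the first name so original casing and order survive
    first_words = re.findall(r"[A-Za-z]+", names[0])
    return " ".join(w for w in first_words if w.lower() in shared)

names = [
    "1000 The Coolest Building Ever",
    "1000 Coolest Building",
    "Coolest Building",
    "100 Coolest Building Dr.",
]
print(common_base_words(names))  # Coolest Building
```

This only works when every variant literally shares the words, so “Bldg” in one entry and “Building” in another would drop that word from the result. That is where the normalization suggested in the replies comes in.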

 

Bonus points if variations such as “Bldg.” and “Bldg” can be handled as well. Any advice or guidance is appreciated!

 

4 replies

alexbiz
Enthusiast
  • July 6, 2025

Hm, I think you could use AI to resolve this kind of fuzzy matching, as long as the number of features/distinct values isn't too large.
 


crutledge
Influencer
  • July 6, 2025

Yeah. There is no easy way to do this one. You would first have to build your list using something like the “Normalize Data Using FME Desktop” video on YouTube, or, as @alexbiz said, maybe AI.

Then use an attribute mapper for abbreviations like Bldg = Building or St = Street.
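To illustrate the lookup-table idea outside FME, here is a minimal Python sketch where a dictionary plays the role of the attribute mapper. The table `ABBREVIATIONS` and the function `expand_abbreviations` are hypothetical, and the table only holds the two shortforms mentioned above:

```python
import re

# Hypothetical lookup table standing in for an attribute mapper:
# lowercase abbreviation (without trailing period) -> full word.
# Add more entries as you discover them in your data.
ABBREVIATIONS = {
    "bldg": "Building",
    "st": "Street",
}

def expand_abbreviations(text):
    """Hypothetical helper: replace whole-word abbreviations (with or without
    a trailing '.') by their full form; other words are returned unchanged."""
    def replace(match):
        word = match.group(1)
        # Unknown words keep their original text, including any trailing period
        return ABBREVIATIONS.get(word.lower(), match.group(0))
    # \b...\b matches a whole word of letters; \.? also consumes a trailing period
    return re.sub(r"\b([A-Za-z]+)\b\.?", replace, text)

print(expand_abbreviations("100 Coolest Bldg."))  # 100 Coolest Building
```

Because both “Bldg” and “Bldg.” collapse to “Building”, this also covers the bonus case from the question.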

The first thing I would do is get the “extras” out of the attributes, like the numbers, and get each value down to plain text: no #s, no special characters. Then remove generic words like Building, Bldg, St, and Street to get down to a “Name”, then normalize. Keep track of this “list of values” for an attribute mapper.
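The cleanup steps above could look roughly like this in Python. This is a sketch under assumptions: `DROP_WORDS` is a hypothetical list that adds “the” to the words named above, `clean_name` is an illustrative helper, and title case is just one possible normalization:

```python
import re

# Hypothetical list of generic words to strip: the ones named above plus "the".
DROP_WORDS = {"building", "bldg", "st", "street", "the"}

def clean_name(raw):
    """Hypothetical helper: remove digits and special characters, drop
    generic words, and return the remaining words in title case."""
    # Replace anything that is not a letter or whitespace ("1000", "#", ".") with a space
    letters_only = re.sub(r"[^A-Za-z\s]", " ", raw)
    words = [w for w in letters_only.split() if w.lower() not in DROP_WORDS]
    return " ".join(w.capitalize() for w in words)

raw_values = [
    "1000 The Coolest Building Ever",
    "1000 Coolest Building",
    "Coolest Building",
    "100 Coolest Building Dr.",
]

# The distinct cleaned names form the "list of values" to review and map by hand
distinct = sorted({clean_name(v) for v in raw_values})
print(distinct)  # ['Coolest', 'Coolest Dr', 'Coolest Ever']
```

The leftover variants (“Coolest Dr”, “Coolest Ever”) are exactly the kind of values you would then map to “Coolest” in the attribute mapper.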

This is a challenge. Let us know how it goes. Hope that helps.


takashi
Evangelist
  • July 7, 2025

Hi @slustado,

If your examples cover every string pattern that could appear, I think you can use a StringSearcher with this expression to extract the part representing the building name ("Coolest" in the examples).
(\d*\s+)?(The\s+)?(.+)\s+(Building|Bldg)
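To check what the expression captures, here is a quick test using Python's `re` module instead of StringSearcher. The syntax should behave the same for this simple expression, but treat it as a sanity check rather than FME's exact behavior. `building_name` is a hypothetical helper that returns the third capture group:

```python
import re

# The expression from the reply above; group 3 holds the building name
PATTERN = re.compile(r"(\d*\s+)?(The\s+)?(.+)\s+(Building|Bldg)")

def building_name(text):
    """Hypothetical helper: return the text captured before 'Building'/'Bldg',
    or None when the expression does not match."""
    m = PATTERN.search(text)
    return m.group(3) if m else None

for s in ["1000 The Coolest Building Ever", "1000 Coolest Building",
          "Coolest Building", "100 Coolest Building Dr.", "Coolest Bldg."]:
    print(repr(s), "->", building_name(s))
```

All five variants come back as “Coolest”. Note the match is case-sensitive, so “BUILDING” or “the” in lowercase would need extra handling.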


virtualcitymatt
Celebrity

I also think this is an easy problem for an AI to solve.

