Solved

Split list by keyword to individual lists. Or something??

  • January 17, 2025
  • 2 replies
  • 54 views

sworkman
Contributor

Hello, 

I have a PDF report that I need to parse. I can read it and create a line-by-line list. It has a line with the employee code/name, followed by lines with codes and the values associated with that employee. Here is a sample of the report and then what it looks like when it is read in. 

I can get it split nicely into rows/columns/attributes no problem. My issue is that I don't know how to break it up into each employee's portion and add the employee number to each row.

There would be a few hundred employees. 

So in the end I want to be able to spit out a table that has the columns:

Employee num, Code, Description, Units, Extensions, etc. 

I am very new to FME and just struggling to get it figured out. I don't know if I should be using some kind of loop, Python code, an Aggregator, etc. 

Any help would be greatly appreciated! 

 

Best answer by bwn

To break it down into the main challenges:

The first step is to give every text line row an Attribute holding the Employee Number. This can be done with AttributeCreator run in Adjacent Features mode.

AttributeCreator steps through the text lines in order. Whenever a line starts with “Employee:”, that row holds the Employee Code, so we extract it with a RegEx. We then keep assigning that code to every subsequent row until we reach the next line that starts with “Employee:”, which is where the next employee’s group of data begins.

 

@ReplaceRegularExpression(@Value(text_line_data),"^Employee:\s+(\d+).+$",\1)

The RegEx captures the run of digits “\d+” as Group 1 by wrapping it in brackets “(...)”. The replacement “\1” then returns just that captured group, and this becomes the value of employee_number.
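For anyone who wants to see the capture group in isolation, here is a minimal Python sketch using the same pattern. The sample lines are invented for illustration (the real report layout is only shown in the screenshots), and `extract_employee_number` is a hypothetical helper name, not an FME function.

```python
import re

# Same pattern as the FME expression above.
EMPLOYEE_RE = re.compile(r"^Employee:\s+(\d+).+$")


def extract_employee_number(text_line):
    """Return the digits captured by Group 1, or None for a non-header line."""
    match = EMPLOYEE_RE.match(text_line)
    return match.group(1) if match else None


# Hypothetical sample lines, not taken from the actual report.
print(extract_employee_number("Employee: 10234 Jane Doe"))
print(extract_employee_number("REG\tRegular Hours\t80.00"))
```

One thing worth checking against the real data: because the pattern ends in `.+`, at least one character must follow the digits. In this sketch a header line with nothing after the number gives its last digit to `.+`, so the captured value comes back one digit short.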

With Adjacent Features enabled, each row that does not start with “Employee:” takes the employee_number of the row before it, so the value carries forward until a new “Employee:” row replaces it.
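The carry-forward logic can also be sketched outside FME. This is a rough Python equivalent of the idea, not what AttributeCreator does internally; the function name and sample lines are assumptions made up for the example.

```python
import re

EMPLOYEE_RE = re.compile(r"^Employee:\s+(\d+).+$")


def tag_lines_with_employee(lines):
    """Pair every line with the most recently seen employee number."""
    tagged = []
    current_employee = None  # stays None until the first header line
    for line in lines:
        match = EMPLOYEE_RE.match(line)
        if match:
            # A new header row starts the next employee's group.
            current_employee = match.group(1)
        tagged.append((current_employee, line))
    return tagged


# Hypothetical report lines for illustration only.
sample = [
    "Employee: 10234 Jane Doe",
    "REG\tRegular Hours\t80.00",
    "OT\tOvertime\t4.50",
    "Employee: 10877 John Roe",
    "REG\tRegular Hours\t72.00",
]
for employee_number, line in tag_lines_with_employee(sample):
    print(employee_number, repr(line))
```

The header rows are tagged too, which matches the FME approach: they are filtered out in a later step rather than here.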

Output is:

 

 

From here, a Tester keeps only the Code vs Value rows and discards the rest, e.g.
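The actual Tester clauses are in the screenshot, so the following is only a guess at the kind of rule involved. This Python sketch assumes a data row is any line that is not an “Employee:” header and has at least three tab-separated fields; both conditions and the name `is_code_row` are hypothetical.

```python
def is_code_row(text_line, delimiter="\t", min_fields=3):
    """Assumed filter: keep delimited data rows, drop headers and other text."""
    if text_line.startswith("Employee:"):
        return False
    return len(text_line.split(delimiter)) >= min_fields


# Hypothetical lines for illustration only.
lines = [
    "Employee: 10234 Jane Doe",
    "REG\tRegular Hours\t80.00",
    "Page 1 of 12",
]
print([line for line in lines if is_code_row(line)])
```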
 

 

Then we can use AttributeSplitter to split each row into a List per Employee per Code, which captures the Unit values. Assuming the row is Tab delimited:
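As a rough picture of what the split produces, here is a small Python sketch that breaks one row on tabs. The sample row and its field order are assumptions, and trimming whitespace from each part is a choice made for this example rather than something the thread specifies.

```python
def split_row(text_line, delimiter="\t"):
    """Split one report row into its fields, trimming stray whitespace."""
    return [part.strip() for part in text_line.split(delimiter)]


# Hypothetical row for illustration only.
print(split_row("REG\tRegular Hours\t80.00"))
```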

 

 

 

A useful thing to know in FME is that “Lists” are really just normal Attributes whose Attribute Names are formatted like _list{0}, _list{1}, etc.

Knowing this, we can use AttributeCreator on each row to create a New Attribute from each List Attribute, assigning each List Attribute value to a “Normal” Attribute.
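To make the naming idea concrete, here is a Python sketch that treats a feature as a plain dictionary whose list elements are stored under keys like `_list{0}`. The column names, their order, and the helper `flatten_list_attributes` are assumptions for illustration; in the workspace this mapping is done in AttributeCreator.

```python
def flatten_list_attributes(feature, column_names, list_name="_list"):
    """Copy list elements such as _list{0} into ordinary named attributes."""
    # Keep every attribute that is not an element of the list.
    flat = {
        name: value
        for name, value in feature.items()
        if not name.startswith(list_name + "{")
    }
    for index, column in enumerate(column_names):
        key = f"{list_name}{{{index}}}"  # builds "_list{0}", "_list{1}", ...
        if key in feature:
            flat[column] = feature[key]
    return flat


# Hypothetical feature for illustration only.
feature = {
    "employee_number": "10234",
    "_list{0}": "REG",
    "_list{1}": "Regular Hours",
    "_list{2}": "80.00",
}
print(flatten_list_attributes(feature, ["code", "description", "units"]))
```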

 


2 replies

bwn
Evangelist
  • Evangelist
  • Best Answer
  • January 17, 2025



sworkman
Contributor
  • Author
  • Contributor
  • January 21, 2025

THANK YOU SO MUCH!! This is a mind saver. I was able to get it up and running. I knew that this had to be a fairly common situation but was struggling to find anything on it. 

