Solved

Split list by keyword to individual lists. Or something??

5 months ago
January 17, 2025
2 replies
54 views

+4

sworkman
Contributor
2 replies

Hello,

I have a PDF report that I need to parse. I can read it and create a line-by-line list. It has a line with the employee code/name, then subsequent lines with codes and the values associated with that employee. Here is a sample of the report and then what it looks like when it is read in.

I can get it split nicely into rows/columns/attributes no problem. My issue is that I don't know how to break it up into each employee's portion and add the employee number to each row.

There would be a few hundred employees.

So in the end I want to be able to spit out a table that has the columns:

Employee num, Code, Description, Units, Extensions, etc.

I am very new to FME and just struggling to get it figured out. I don't know if I should be using some kind of loop, Python code, an Aggregator, etc.

Any help would be greatly appreciated!

Best answer by bwn

To break it down into the main challenges:

The first step would be to assign all text line rows an Attribute value with the Employee Number. This can be done with AttributeCreator run in Adjacent Features mode.

What we can do is get AttributeCreator to step through each text line. Whenever a text line starts with “Employee:”, that is the row that has the Employee Code value in it, and we can use RegEx to extract it. We then keep assigning this code value to all subsequent rows until we reach the next line that starts with “Employee:”, which is where the next employee’s group of data starts.

@ReplaceRegularExpression(@Value(text_line_data),"^Employee:\s+(\d+).+$",\1)

The RegEx captures the employee code by putting brackets “(...)” around the group of digits “\d+”, which makes it Group 1 of the pattern. The replacement string \1 then returns that bracketed group, and this becomes the value for employee_number.
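For readers who want to try the pattern outside FME, here is a minimal Python sketch of the same capture group. The function name and the sample lines are invented for illustration; only the pattern itself comes from the expression above.

```python
import re

# Same pattern as the FME expression; group 1 is the run of digits.
EMPLOYEE_PATTERN = re.compile(r"^Employee:\s+(\d+).+$")

def extract_employee_number(text_line):
    """Return the digits captured by group 1, or None for non-header lines."""
    match = EMPLOYEE_PATTERN.match(text_line)
    return match.group(1) if match else None

# Hypothetical sample lines, not taken from the real report.
header_result = extract_employee_number("Employee:   1042 DOE, JANE")
other_result = extract_employee_number("REG\tRegular Hours\t80.00")
print(header_result, other_result)
```

One thing to watch: because the pattern ends in “.+”, it needs at least one character after the digits. On a line that contains only the number, the RegEx engine gives the last digit to “.+”, so the captured group comes back one digit short. That should not matter if the name always follows the code on the same line.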

Through Adjacent Features, we then assign this value to all subsequent rows, except where we encounter a new row that again starts with “Employee:”.
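The carry-forward logic can be sketched in plain Python as a single pass that remembers the last employee number seen. This is only an illustration of the idea, not what AttributeCreator does internally; the function name and the sample lines are assumptions.

```python
import re

# Header lines start with "Employee:" followed by the numeric code.
EMPLOYEE_PATTERN = re.compile(r"^Employee:\s+(\d+)")

def tag_lines_with_employee(text_lines):
    """Pair every line with the most recently seen employee number."""
    tagged = []
    current_employee = None  # stays None until the first header line
    for line in text_lines:
        match = EMPLOYEE_PATTERN.match(line)
        if match:
            # New employee block starts here; switch to its number.
            current_employee = match.group(1)
        tagged.append((current_employee, line))
    return tagged

# Hypothetical report lines, invented for this sketch.
sample_lines = [
    "Employee:  1042 DOE, JANE",
    "REG\tRegular Hours\t80.00",
    "OT\tOvertime\t4.50",
    "Employee:  1043 ROE, RICHARD",
    "REG\tRegular Hours\t72.00",
]
tagged_lines = tag_lines_with_employee(sample_lines)
for employee_number, line in tagged_lines:
    print(employee_number, "|", line)
```

Each row only needs the value from the row before it, which is why a look-back at the previous feature is enough and no explicit loop transformer is required.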

Output is:

From here, a Tester keeps just the Code vs Value rows and gets rid of the rest.

Then we can use AttributeSplitter to create Lists per Employee per Code to capture the Unit values, assuming the data is Tab delimited.
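As a rough Python equivalent of the filter and split steps, the sketch below keeps only rows that look like data rows and splits them on tabs. The test used to recognise a data row (contains a tab and is not an “Employee:” header) is an assumption, since the real criteria depend on the report layout; the function names and sample rows are also invented.

```python
def is_data_row(text_line, delimiter="\t"):
    """Assumed filter: a data row has the delimiter and is not a header line."""
    return delimiter in text_line and not text_line.startswith("Employee:")

def split_row(text_line, delimiter="\t"):
    """Split one row into a list of trimmed values."""
    return [part.strip() for part in text_line.split(delimiter)]

# Hypothetical (employee_number, text_line) pairs from the previous step.
tagged_rows = [
    ("1042", "Employee:  1042 DOE, JANE"),
    ("1042", "REG\tRegular Hours\t80.00"),
    ("1042", "Totals for employee"),
    ("1043", "OT\tOvertime \t4.50"),
]

split_rows = [
    (employee_number, split_row(line))
    for employee_number, line in tagged_rows
    if is_data_row(line)
]
print(split_rows)
```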

An important thing to know in FME is that “Lists” are actually just normal Attributes, but with an Attribute Name that is formatted like _list{0}, _list{1}, etc.

We can use this knowledge for each row: AttributeCreator can create New Attributes by assigning each List Attribute value to a “Normal” Attribute.
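To make the naming idea concrete, here is a small Python sketch that treats a feature as a dictionary whose list elements are stored under keys such as _list{0}, and copies them to ordinary column names. The column names and their order are assumptions based on the columns the question asks for, and the function name is hypothetical.

```python
# Assumed column order for the split values; adjust to the real report layout.
COLUMN_NAMES = ["code", "description", "units", "extension"]

def promote_list_attributes(feature, column_names=COLUMN_NAMES, list_name="_list"):
    """Copy list-style attributes (_list{0}, _list{1}, ...) to named attributes."""
    promoted = dict(feature)  # leave the input feature untouched
    for index, column in enumerate(column_names):
        key = f"{list_name}{{{index}}}"  # builds "_list{0}", "_list{1}", ...
        if key in promoted:
            promoted[column] = promoted[key]
    return promoted

# Hypothetical feature after the split step.
feature = {
    "employee_number": "1042",
    "_list{0}": "REG",
    "_list{1}": "Regular Hours",
    "_list{2}": "80.00",
}
row = promote_list_attributes(feature)
print(row["employee_number"], row["code"], row["description"], row["units"])
```

Missing list elements are simply skipped, so a row with fewer values than expected ends up without the later columns rather than raising an error.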


bwn
Evangelist
562 replies
Best Answer
5 months ago
January 17, 2025

+4

sworkman
Author
Contributor
2 replies
5 months ago
January 21, 2025

THANK YOU SO MUCH!! This is a mind saver. I was able to get it up and running. I knew that this had to be a fairly common situation but was struggling to find anything on it.
