Solved

PDF to Table

25 days ago
May 25, 2025
3 replies
61 views

mohamedalsobh
Contributor
6 replies

I’ve reached the workflow stage, but I can’t proceed further. I’m trying to extract multiple tables from a PDF file. I’ve been following https://support.safe.com/hc/en-us/articles/25407564475277-Extracting-Text-and-Tabular-Data-from-PDF#h_01HW3Z9Z37Q33NQQ7R0XBGEVB0, but the structure of my PDF is quite different.

I only want to extract the tables—nothing else. In my case, the table has titles on the right and content on the left. I used the StringSearcher to extract the titles, since they are easier to identify (usually one word).

The challenge now is that I can’t extract or separate the content, as it’s made up of long sentences that are mixed with the titles. I'm looking for a solution to:

Separate the content from the titles
Structure the extracted data into a table that matches the source data
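
To make the two goals above concrete, here is a minimal sketch in plain Python (not an FME workspace). It assumes each extracted line ends with one known title word (titles on the right) and that any line without a title continues the previous row's content. The title list, sample lines, and function names are hypothetical; in practice the titles would be the values already found with the StringSearcher.

```python
import re

# Hypothetical title words (assumption, not taken from the actual PDF).
TITLES = ["Name", "Location", "Description"]

# Match one known title at the end of a line, either alone or after whitespace.
TITLE_AT_END = re.compile(
    r"(?:^|\s+)(" + "|".join(map(re.escape, TITLES)) + r")\s*$"
)


def split_line(line):
    """Return (title, content); title is None when the line has no title."""
    match = TITLE_AT_END.search(line)
    if match is None:
        return None, line.strip()
    return match.group(1), line[:match.start()].strip()


def build_rows(lines):
    """Group lines into rows; untitled lines extend the previous row's content."""
    rows = []
    for line in lines:
        title, content = split_line(line)
        if title is not None:
            rows.append({"title": title, "content": content})
        elif rows and content:
            rows[-1]["content"] += " " + content
    return rows


sample = [
    "A long sentence describing the site. Description",
    "It continues on a second line.",
    "Vancouver Location",
]
rows = build_rows(sample)
```

A limitation of this sketch: a content sentence that happens to end with a title word would be split incorrectly, so the title list should be as specific as possible.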


j.botterill
Influencer
308 replies
24 days ago
May 26, 2025

The issue here is after TestFilter_2: you have 47 features incoming and you filter out 14, but after this you branch those 14 out four times. This is not necessary. Use only one AttributeManager.

Did you know that what the StringSearcher does can also be done in an AttributeManager? Open the text editor on a value and use a RegEx test. Look up advanced attribute management.

mohamedalsobh
Author
Contributor
6 replies
19 days ago
May 31, 2025


Thanks for your assistance. I was using the StringReplacer to correct some grammar mistakes in many sentences. That’s why I’ve used it more than four times to adjust the wording. I believe some of the text was written incorrectly because the PDF files were exported from Adobe InDesign or Illustrator.

Anyway, do you know a transformer that can help me combine words based on a text pattern?

For example, if the transformer detects the words 'Sub' and 'title', it should combine them into 'Subtitle'.


crystalatsafe
Safer
119 replies
Best Answer
15 days ago
June 3, 2025

Hi @mohamedalsobh

You can use a StringReplacer that can search for “sub” and “title” and combine them.

Here is how you would set that up in the parameters
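
As a rough illustration of the pattern-based replacement described above, here is the same idea in plain Python rather than the StringReplacer parameters themselves. A regular expression finds the two fragments separated by whitespace and replaces them with the joined word. The `JOIN_RULES` table, the second rule in it, and the `join_split_words` name are hypothetical.

```python
import re

# Hypothetical rules: (first fragment, second fragment) pairs to join.
JOIN_RULES = [("Sub", "title"), ("Over", "view")]


def join_split_words(text, rules=JOIN_RULES):
    """Join word fragments that were split by whitespace, e.g. 'Sub title'."""
    for first, second in rules:
        # \b keeps the match on whole fragments; \s+ covers spaces and line breaks.
        pattern = re.compile(
            r"\b(%s)\s+%s\b" % (re.escape(first), re.escape(second)),
            re.IGNORECASE,
        )
        # Keep the first fragment as it appears in the text, then append the second.
        text = pattern.sub(lambda m: m.group(1) + second, text)
    return text


fixed = join_split_words("The Sub title is missing")
```

In the StringReplacer, the equivalent should be a regular-expression replace with a pattern along the lines of `Sub\s+title` and `Subtitle` as the replacement text; treat the exact parameter values as an assumption to verify in your own workspace.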
