Solved

PDF to Table

  • May 25, 2025
  • 3 replies
  • 116 views

mohamedalsobh
Contributor

I’ve reached the workflow stage, but I can’t proceed further. I’m trying to extract multiple tables from a PDF file. I’ve been following https://support.safe.com/hc/en-us/articles/25407564475277-Extracting-Text-and-Tabular-Data-from-PDF#h_01HW3Z9Z37Q33NQQ7R0XBGEVB0, but the structure of my PDF is quite different.

I only want to extract the tables—nothing else. In my case, the table has titles on the right and content on the left. I used the StringSearcher to extract the titles, since they are easier to identify (usually one word).

The challenge now is that I can’t extract or separate the content, as it’s made up of long sentences that are mixed with the titles. I'm looking for a solution to:

  • Separate the content from the titles

  • Structure the extracted data into a table that matches the source data



3 replies

j.botterill
Influencer
  • May 26, 2025

The issue here is after TestFilter_2: you have 47 features coming in and you filter out 14. After that, you branch those 14 out four times, which is not necessary. Use a single AttributeManager instead.

Did you know that what the StringSearcher does can also be done in an AttributeManager, using the text editor on a value with a RegEx test? Look up advanced attribute management.
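
To illustrate the idea of one regular-expression test in a single place rather than several branches, here is a minimal Python sketch. It is not FME workspace code. The sample lines, the function name, and the rule that a title is a single word on its own line are all assumptions based on the description in the question.

```python
import re

# Assumption: a title is a single alphabetic word on its own line;
# every other non-empty line is treated as content.
TITLE_RE = re.compile(r"^\s*[A-Za-z]+\s*$")

def split_titles_and_content(lines):
    """Group content lines under the most recent title line."""
    rows = []
    for line in lines:
        if TITLE_RE.match(line):
            # Start a new table row for each title.
            rows.append({"title": line.strip(), "content": ""})
        elif rows and line.strip():
            # Append content to the current row, separated by a space.
            sep = " " if rows[-1]["content"] else ""
            rows[-1]["content"] += sep + line.strip()
        # Content that appears before the first title is ignored.
    return rows

# Hypothetical sample text, not taken from the actual PDF.
sample = [
    "Location",
    "The site lies north of the river.",
    "It covers two hectares.",
    "Owner",
    "Held by the municipality.",
]
rows = split_titles_and_content(sample)
```

Each entry in `rows` is one title/content pair, which is roughly the table shape the question asks for. A real PDF will probably need a stricter title pattern, for example a fixed list of known title words.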


mohamedalsobh
Contributor
  • Author
  • May 31, 2025

Thanks for your assistance. I was using the StringReplacer to correct some grammar mistakes in many sentences. That’s why I’ve used it more than four times to adjust the wording. I believe some of the text was written incorrectly because the PDF files were exported from Adobe InDesign or Illustrator.

Anyway, do you know a transformer that can help me combine words based on a text pattern?

For example, if the transformer detects the words 'Sub' and 'title' next to each other, it should combine them into 'Subtitle'.


crystalatsafe
Safer
  • Best Answer
  • June 3, 2025

Hi ​@mohamedalsobh 

You can use a StringReplacer that can search for “sub” and “title” and combine them. 

Here is how you would set that up in the parameters
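
The parameter screenshot is not reproduced here. As a rough stand-in, the same search-and-combine logic can be sketched as a regular expression in Python. This is not the FME dialog itself: the pattern, the function name, and the assumption that the two fragments are separated only by whitespace are illustrative, and it presumes the StringReplacer is used in a regular-expression mode.

```python
import re

# Assumption: the word was split into "Sub" and "title" with only
# whitespace between the two fragments.
SPLIT_WORD_RE = re.compile(r"\b(sub)\s+(title)\b", re.IGNORECASE)

def merge_split_word(text):
    """Join 'Sub title' into 'Subtitle', keeping the original casing."""
    # \1 and \2 re-insert the two captured fragments without the gap.
    return SPLIT_WORD_RE.sub(r"\1\2", text)

merged = merge_split_word("The Sub title of the report")
```

The two capture groups keep whatever casing the source text had, so 'Sub title' becomes 'Subtitle' and 'sub title' becomes 'subtitle'. Text that already reads 'Subtitle' is left unchanged.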