Solved

PDF to Table


mohamedalsobh
Contributor

I’ve built my workflow up to a certain stage, but I can’t get any further. I’m trying to extract multiple tables from a PDF file. I’ve been following https://support.safe.com/hc/en-us/articles/25407564475277-Extracting-Text-and-Tabular-Data-from-PDF#h_01HW3Z9Z37Q33NQQ7R0XBGEVB0, but the structure of my PDF is quite different.

I only want to extract the tables, nothing else. In my case, each table has titles on the right and content on the left. I used the StringSearcher to extract the titles, since they are easy to identify (usually a single word).

The challenge now is that I can’t extract or separate the content, because it is made up of long sentences that are mixed in with the titles. I'm looking for a way to:

  • Separate the content from the titles

  • Structure the extracted data into a table that matches the layout of the source
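A minimal sketch of the pairing logic in Python (for example, as a starting point for a PythonCaller), assuming the PDF text has already been extracted one line per feature in reading order, that a title is a single short word, and that every line after a title belongs to that title's content until the next title appears. The function names and the sample lines are hypothetical, not taken from the original workspace.

```python
from typing import List, Tuple


def is_title(line: str, max_words: int = 1) -> bool:
    """Heuristic (assumption): a title has at most `max_words` words."""
    words = line.split()
    return 0 < len(words) <= max_words


def pair_titles_with_content(lines: List[str]) -> List[Tuple[str, str]]:
    """Group extracted lines into (title, content) table rows.

    Non-title lines that appear before the first title are ignored.
    """
    rows: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from extraction
        if is_title(line):
            rows.append((line, []))   # a title starts a new table row
        elif rows:
            rows[-1][1].append(line)  # long sentences belong to the current row
    # Join multi-line content into a single cell per row
    return [(title, " ".join(parts)) for title, parts in rows]


if __name__ == "__main__":
    sample = [
        "Scope",
        "This document covers the extraction",
        "of tables from exported PDF files.",
        "Owner",
        "GIS team",
    ]
    for title, content in pair_titles_with_content(sample):
        print(f"{title} | {content}")
```

The word-count heuristic will misclassify a one-word content cell as a title, and if the extraction order puts content before its title the grouping would need to run the other way; using the x-coordinates of the text blocks (titles on the right, content on the left) would be a more robust test where they are available.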


Best answer by crystalatsafe


3 replies

j.botterill
Influencer
  • May 26, 2025

The issue here is after TestFilter_2: you have 47 features coming in and you filter out 14, but after that you branch those 14 features out four times. This is not necessary; use only one AttributeManager.

Did you know that what the StringSearcher does can also be done in an AttributeManager, using the text editor on a value with a regular-expression (RegEx) test? Look up advanced attribute management.
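The idea of replacing several parallel branches with one attribute-handling step can be sketched in Python as a single function that applies an ordered list of regular-expression fixes to each value in one pass, plus a regex test for spotting one-word titles. This is plain Python `re`, not FME's own expression syntax, and the correction patterns are hypothetical placeholders rather than the fixes used in the original workspace.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical correction table: (compiled pattern, replacement), applied in order.
CORRECTIONS: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"\s{2,}"), " "),                         # collapse runs of whitespace
    (re.compile(r"\s+([,.;:])"), r"\1"),                  # remove space before punctuation
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),  # drop doubled words ("The the")
]

# Regex test: a value counts as a title if it is exactly one word.
TITLE_TEST = re.compile(r"^\w+$")


def clean_text(value: str) -> str:
    """Apply every correction to one value, in order, in a single step."""
    for pattern, replacement in CORRECTIONS:
        value = pattern.sub(replacement, value)
    return value.strip()


def is_title(value: str) -> bool:
    """Return True when the stripped value is a single word."""
    return bool(TITLE_TEST.match(value.strip()))


if __name__ == "__main__":
    print(clean_text("The  the map ,  shows data ."))
    print(is_title("Scope"), is_title("GIS team"))
```

Keeping the corrections in one ordered table means each feature passes through one step, and adding another fix is one more row rather than another branch.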


mohamedalsobh
Contributor

Thanks for your help. I was using the StringReplacer to correct grammar mistakes in many sentences; that’s why I used it more than four times to adjust the wording. I believe some of the text came out wrong because the PDF files were exported from Adobe InDesign or Illustrator.

Anyway, do you know of a transformer that can combine words based on a text pattern?

For example, if the transformer detects the words 'Sub' and 'title', it should combine them into 'Subtitle'.


crystalatsafe
Safer
  • Best Answer
  • June 3, 2025

Hi @mohamedalsobh

You can use a StringReplacer that searches for “sub” and “title” and combines them into one word.

Here is how you would set that up in the parameters:
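Since the parameter screenshot did not survive, here is a hedged sketch of the kind of pattern involved. Assuming the StringReplacer is set to regular-expression mode and accepts `\1`-style back-references in the replacement (check the transformer documentation for your FME version, as parameter names differ between releases), something like `(Sub)\s+(title)` replaced with `\1\2` would join the two halves. The Python below only demonstrates that pattern's behaviour with `re.sub`; the helper name `join_split_words` and the extra word pair in the test are hypothetical.

```python
import re
from typing import Iterable, Tuple


def join_split_words(text: str, pairs: Iterable[Tuple[str, str]]) -> str:
    """Rejoin words that extraction split in two, e.g. 'Sub title' -> 'Subtitle'.

    Matching is case-insensitive; the original casing of each half is kept.
    """
    for first, second in pairs:
        # \b keeps matches from starting or ending inside a longer word;
        # \s+ tolerates extra spaces or line breaks between the halves.
        pattern = re.compile(
            rf"\b({re.escape(first)})\s+({re.escape(second)})\b", re.IGNORECASE
        )
        text = pattern.sub(r"\1\2", text)
    return text


print(join_split_words("Sub title and sub  title", [("sub", "title")]))
# Subtitle and subtitle
```

Because of the trailing `\b`, "Sub titles" is left unchanged; drop that anchor if plural or suffixed forms should be joined too.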

 

