
I’ve reached the workflow stage, but I can’t proceed further. I’m trying to extract multiple tables from a PDF file. I’ve been following https://support.safe.com/hc/en-us/articles/25407564475277-Extracting-Text-and-Tabular-Data-from-PDF#h_01HW3Z9Z37Q33NQQ7R0XBGEVB0, but the structure of my PDF is quite different.

I only want to extract the tables—nothing else. In my case, the table has titles on the right and content on the left. I used the StringSearcher to extract the titles, since they are easier to identify (usually one word).

The challenge now is that I can’t extract or separate the content, as it’s made up of long sentences that are mixed with the titles. I'm looking for a solution to:

  • Separate the content from the titles

  • Structure the extracted data into a table that matches the source data

The issue is what happens after TestFilter_2: you have 47 features coming in and you filter out 14, but after that you branch those 14 out four times. That isn't necessary; a single AttributeManager is enough.

Did you know that what the StringSearcher does can also be done in an AttributeManager? Open the text editor on a value and use a RegEx test there. Look up advanced attribute management.
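For anyone who wants to see the regex-test idea outside FME, here is a minimal Python sketch of the logic. It assumes that a title is a single capitalised word on its own line and that everything up to the next title is its content. The sample lines, the `TITLE_RE` pattern and the `rows_from_lines` helper are all made up for illustration and would need adjusting to the real PDF text.

```python
import re

# Assumption: a title is one capitalised word alone on a line.
# A one-word content line such as "Yes" would be misread as a title.
TITLE_RE = re.compile(r"^[A-Z][A-Za-z]*$")

def rows_from_lines(lines):
    """Group text lines into (title, content) rows."""
    rows = []
    title, content = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if TITLE_RE.match(line):
            # A new title closes the previous row, if there was one.
            if title is not None:
                rows.append((title, " ".join(content)))
            title, content = line, []
        elif title is not None:
            content.append(line)
        # Lines that appear before the first title are dropped.
    if title is not None:
        rows.append((title, " ".join(content)))
    return rows

# Invented sample text, only to show the shape of the result.
sample = [
    "Scope",
    "The project covers the northern district",
    "and its access roads.",
    "Budget",
    "Funding is approved for two years.",
]
rows = rows_from_lines(sample)
for title, content in rows:
    print(f"{title} | {content}")
```

Each tuple is one table row, with the title in the first column and the joined sentences in the second. In a workspace the same test would sit in a single place rather than in four parallel branches.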



Thanks for your assistance. I was using the StringReplacer to correct some grammar mistakes in many sentences. That’s why I’ve used it more than four times to adjust the wording. I believe some of the text was written incorrectly because the PDF files were exported from Adobe InDesign or Illustrator.

Anyway, do you know a transformer that can help me combine words based on a text pattern?

For example, if the transformer detects the words 'Sub' and 'title' next to each other, it should combine them into 'Subtitle'.


Hi @mohamedalsobh

You can use a StringReplacer that searches for “sub” and “title” and combines them.

Here is how you would set that up in the parameters
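As a rough illustration of the pattern itself (not the transformer dialog), here is a small Python sketch. It assumes the two halves are separated by whitespace or a hyphen. The `SPLIT_WORD_RE` pattern and the `join_split_word` helper are hypothetical names. A similar expression should work in the StringReplacer's regular-expression mode, but check the transformer documentation for the exact replacement syntax.

```python
import re

# Assumption: "Sub" and "title" are split by spaces and/or a hyphen.
# The two capture groups keep the original casing of each half.
SPLIT_WORD_RE = re.compile(r"\b(Sub)[\s-]+(title)\b", re.IGNORECASE)

def join_split_word(text):
    """Rejoin 'Sub title' / 'Sub-title' into a single word."""
    return SPLIT_WORD_RE.sub(r"\1\2", text)

fixed = join_split_word("Sub title of the report")
print(fixed)
```

Text that already reads "Subtitle" is left alone, because the pattern requires at least one separator between the two halves.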