Question

Extract the Data from PDF

2 years ago
July 13, 2023
10 replies
295 views

asadamjad
170 replies

Can anyone help me extract the data from this PDF, please?

asadamjad
Author
170 replies
2 years ago
July 13, 2023

???

redgeographics
Celebrity
3643 replies
2 years ago
July 13, 2023

What have you tried so far?

asadamjad
Author
170 replies
2 years ago
July 13, 2023

This

1 Attachments

redgeographics
Celebrity
3643 replies
2 years ago
July 13, 2023

Looks good, although it is not very useful yet: when FME reads from a PDF it treats every text object as a separate object rather than as actual lines of data. However, they do all have x and y coordinates (in an arbitrary coordinate system).

You can group the text objects that belong to the same line (i.e. a 'record') based on their x value and identify the columns based on their y value. I'd try a CoordinateExtractor to get the x and y coordinates of all the texts, then an Aggregator, grouping by x coordinate and creating a list. Sort that list on y value and then you should have it, assuming there are no empty cells.
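Outside FME, the group-then-sort idea can be sketched in plain Python. This is only an illustration, not an FME workspace: the function name, the (text, x, y) tuple layout, and the tolerance value are assumptions. Which coordinate identifies a record depends on how the page is laid out (on an upright page, texts on the same printed line usually share a y value), so the sketch takes the grouping axis as a parameter.

```python
from collections import defaultdict


def group_into_records(texts, group_axis="y", tolerance=1.0):
    """Group (text, x, y) tuples into records.

    Texts whose coordinate on `group_axis` falls into the same
    tolerance bucket form one record; inside a record the texts
    are sorted on the other axis. Hypothetical helper.
    """
    axis_index = {"x": 1, "y": 2}
    g = axis_index[group_axis]
    s = axis_index["x" if group_axis == "y" else "y"]

    groups = defaultdict(list)
    for item in texts:
        # Snap the grouping coordinate to a grid so that small
        # differences (e.g. 700.2 vs 699.9) end up in one bucket.
        # Values close to a bucket boundary can still be split.
        groups[round(item[g] / tolerance)].append(item)

    records = []
    for key in sorted(groups):
        members = sorted(groups[key], key=lambda it: it[s])
        records.append([it[0] for it in members])
    return records
```

Like the Aggregator approach, this assumes there are no empty cells: a missing text simply makes the record shorter, so the remaining values shift into the wrong column.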

This is also the point where I would turn towards whoever gave me the assignment, tell them I'd rather not make any guarantees about the quality of the output and ask them if they have the original data (spreadsheets).

asadamjad
Author
170 replies
2 years ago
July 13, 2023

There is no X and Y in the data.

redgeographics
Celebrity
3643 replies
2 years ago
July 13, 2023

asadamjad wrote:

There is no X and Y in the data.

Have you added a CoordinateExtractor?

asadamjad
Author
170 replies
2 years ago
July 13, 2023

Yes, I did.

asadamjad
Author
170 replies
2 years ago
July 13, 2023

This is spatial data.

redgeographics
Celebrity
3643 replies
2 years ago
July 13, 2023

asadamjad wrote:

Yes, I did.

And how did you configure it?

Sometimes a transformer does what you want it to do with its defaults, but sometimes it doesn't. Ultimately you are the best person to decide what you want. So if somebody tells you to use a certain transformer and it doesn't appear to work right away, it's generally a good idea to check the documentation.

In the case of the CoordinateExtractor, you can set it to extract a specific coordinate. Use index 0 there to indicate the first coordinate (as they're points, there is only one coordinate anyway). By default it then stores the values in the attributes _x and _y. You can then use those in your Aggregator.

geomancer
Evangelist
899 replies
2 years ago
July 14, 2023

I would try to recreate the rows and columns from the PDF file, and carry on from there.

Read the PDF, expose fme_text_string and pdf_page_number, and extract the coordinates.

As the coordinates are calculated per page, lower the Y value for the objects on page 2 (of 2).

Sort on Y (descending) and X (ascending).

Now calculate a row and column number for each feature, looking at the X and Y of this feature and the previous feature, using the row and column values of the previous feature.

(Refinement: as the first cell under the month has to stay empty, make a provision for this).

Now you can further process the data according to your wishes and needs (Aggregator, InlineQuerier, write to a temporary Excel file, or something else). Good luck!
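The steps above can be sketched in plain Python, as a rough illustration rather than the actual workspace. The function name, the dictionary keys, the page_height offset, and the row tolerance are all assumptions made for this example; in FME the same logic would be spread over several transformers.

```python
def assign_rows_and_columns(features, page_height, row_tol=2.0):
    """Assign a row and column number to each text feature.

    `features` is a list of dicts with the hypothetical keys
    'text', 'page' (1-based), 'x' and 'y'.
    """
    placed = []
    for f in features:
        # Coordinates restart on every page, so lower the Y of
        # later pages to make page 2 continue below page 1.
        y = f["y"] - (f["page"] - 1) * page_height
        # Snap Y so texts on the same printed line compare equal.
        y_snapped = round(y / row_tol) * row_tol
        placed.append({"text": f["text"], "x": f["x"], "y": y_snapped})

    # Sort on Y (descending), then X (ascending).
    placed.sort(key=lambda p: (-p["y"], p["x"]))

    row = col = 0
    prev_y = None
    for p in placed:
        if prev_y is not None and p["y"] == prev_y:
            col += 1          # same line as the previous feature
        else:
            row += 1          # Y changed: start a new row
            col = 1
        p["row"], p["col"] = row, col
        prev_y = p["y"]
    return placed
```

This sketch numbers columns purely by counting along each row, so it does not cover the refinement for the empty first cell under the month; that would need an extra check on the X value of the first feature in a row.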

Read_PDF
