Can anyone help me extract the data from this PDF, please?
???
What have you tried so far?
This
Looks good, although it is not very useful yet: when FME reads a PDF, it treats every text object as a separate feature rather than as part of a line of data. However, every text does have x and y coordinates (in an arbitrary coordinate system).
You can group the texts that belong to the same line (i.e. 'record') based on their y value and identify the columns based on their x value. I'd try a CoordinateExtractor to get the x and y coordinates of all the texts, then an Aggregator, grouping by y coordinate and creating a list. Sort that list on x value and then you should have it, assuming there are no empty cells.
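To make the grouping idea concrete, here is a rough sketch in plain Python rather than FME. It assumes texts on the same line share (almost) the same y value, that y increases upward so the top row has the largest y, and that rounding to whole units is enough to absorb small differences. The function name and the sample values are made up for illustration.

```python
from collections import defaultdict


def group_rows(texts, precision=0):
    """Group (text, x, y) tuples into rows, with the cells ordered left to right."""
    rows = defaultdict(list)
    for text, x, y in texts:
        # Texts whose rounded y matches are treated as one row (like a group-by).
        rows[round(y, precision)].append((x, text))
    # Largest y first (top of the page), then sort each row's cells by x.
    ordered = sorted(rows.items(), reverse=True)
    return [[text for _, text in sorted(cells)] for _, cells in ordered]


# Made-up sample: two rows of two cells, deliberately out of order.
sample = [
    ("B", 120.0, 700.2),
    ("A", 50.0, 700.1),
    ("D", 120.0, 680.0),
    ("C", 50.0, 680.0),
]
print(group_rows(sample))
```

Like the Aggregator approach, this only lines up the columns correctly when no cells are empty.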
This is also the point where I would turn towards whoever gave me the assignment, tell them I'd rather not make any guarantees about the quality of the output and ask them if they have the original data (spreadsheets).
There is no X and Y in the data.
Have you added a CoordinateExtractor?
Yes, I did.
This is spatial data.
And how did you configure it?
Sometimes a transformer does what you want it to do with its defaults, but sometimes it doesn't. Ultimately you are the best person to decide what you want. So if somebody tells you to use a certain transformer and it doesn't appear to work right away, it's generally a good idea to check the documentation.
In the case of the CoordinateExtractor, you can set it to extract a specific coordinate. You'll want to use index 0 there to indicate the first coordinate (as the features are points, there is only one coordinate anyway). By default it stores the values in the attributes _x and _y, which you can then use in your Aggregator.
I would try to recreate the rows and columns from the PDF file, and carry on from there.
Read the PDF, expose fme_text_string and pdf_page_number, and extract the coordinates.
As the coordinates are calculated per page, lower the Y value for the objects on page 2 (of 2).
Sort on Y (descending) and X (ascending).
Now calculate a row and column number for each feature, looking at the X and Y of this feature and the previous feature, using the row and column values of the previous feature.
(Refinement: as the first cell under the month has to stay empty, make a provision for this).
Now you can further process the data according to your wishes and needs (Aggregator, InlineQuerier, write to a temporary Excel file, or something else). Good luck!
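For anyone who wants to see the logic of the steps above outside FME, here is a minimal sketch in plain Python. It is not an FME workspace: the function name, the dictionary keys, the page height of 842 and the sample values are all assumptions for illustration. It also assumes that rounding Y to whole units puts all texts of one row on the same value, and it does not handle the empty first cell under the month.

```python
def assign_rows_and_columns(features, page_height):
    """Give each text a row and column number based on its page, X and Y."""
    placed = []
    for f in features:
        # Coordinates restart on every page, so shift page n down by (n - 1) page heights.
        y = round(f["y"] - (f["page"] - 1) * page_height)
        placed.append({"text": f["text"], "x": f["x"], "y": y})

    # Sort on Y (descending) and X (ascending).
    placed.sort(key=lambda f: (-f["y"], f["x"]))

    row, col, prev_y = 0, 0, None
    for f in placed:
        if f["y"] != prev_y:
            # Y changed compared to the previous feature: start a new row.
            row, col = row + 1, 1
        else:
            # Same Y as the previous feature: next column in the same row.
            col += 1
        f["row"], f["col"] = row, col
        prev_y = f["y"]
    return placed


# Made-up sample: one row on page 1 and one row on page 2, in random order.
sample = [
    {"text": "Feb", "page": 2, "x": 50.0, "y": 780.0},
    {"text": "12", "page": 1, "x": 120.0, "y": 700.1},
    {"text": "Jan", "page": 1, "x": 50.0, "y": 700.2},
    {"text": "7", "page": 2, "x": 120.0, "y": 780.0},
]
cells = assign_rows_and_columns(sample, page_height=842.0)
for f in cells:
    print(f["row"], f["col"], f["text"])
```

Because the column number is a simple counter, an empty cell shifts the remaining values of that row one column to the left. Comparing X against known column positions would be one way to deal with that.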