Question

Spatial PDF to Shapefile/Geopackage


Badge +1

Hi everyone, I'm trying to convert a PDF file containing geographic data into shapefile format using FME. A lot of work has gone into this problem recently; I have tried a few online tools and QGIS plugins, and am now experimenting with FME using a few transformers.

My spatial PDF shows a transmission network: transmission lines, substations and the geographic region, with text labels for the points and lines and a descriptive legend.

Since I couldn't work efficiently with that file (it is 15 MB and has around 800k entities), I am working with a sample PDF.

Can someone guide me on what approach can be applied to do this?

Thank you.

Attached:

wallawalla.pdf: sample file

ERCOT: Practical file


5 replies

Userlevel 3
Badge +17

What happens when you read the pdf and write to shape?

 

What is the desired result?

 

Please explain what you tried, what worked, what didn't work and what your end goal is.

Badge +1

Hey @jkr_da​ 

So, I tried the wallawalla PDF first. I set the reader to 'individual feature types', used TextPropertyExtractor to get the labels into my attribute table and fetch lat/lon, and used FeatureColorSetter at the end. Below is the output in the GeoPackage file.

Now, the problems I am facing with my file (ERCOT) are:

  • It's huge, around 15 MB with 800k entities in it.
  • It has multiple maps inside it.
  • It takes a good 4 hours to read the PDF file and around 4 hours to run the translation, though that's not a problem.
  • I am not able to split this file into point, line and polygon geometries and read its text values.

[image]

The desired result would be a georeferenced (EPSG:3857) GeoPackage file with geometry and corresponding attribute values.

 

Thank you.
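
For anyone following along, the 3857 target mentioned above is Web Mercator. In an FME workspace a reprojection transformer would normally handle this step; the snippet below is only a minimal sketch of the underlying formula in plain Python, assuming the extracted lat/lon values are WGS84 degrees. The function name is made up for illustration.

```python
import math

# WGS84 semi-major axis in metres, the sphere radius used by EPSG:3857.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 lon/lat (degrees) to EPSG:3857 x/y (metres).

    Hypothetical helper for illustration; valid for latitudes
    strictly between -90 and 90 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Approximate location of Walla Walla, WA (illustrative values only).
x, y = lonlat_to_web_mercator(-118.34, 46.06)
print(round(x, 1), round(y, 1))
```

This only helps once the features carry real lon/lat coordinates; it does nothing for a PDF whose geometry is still in page units.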

Userlevel 5
Badge +25

Are you sure the ERCOT map is georeferenced? When I opened it in MAPublisher it appeared to be "just a PDF". It's also not layered, which makes it complicated to get the right information out.

Badge +1

@Hans van der Maarel​ That's what I suspected. I now believe ERCOT is neither georeferenced nor layered, but this is the only data the service provider supplies.

Is there a workaround that would still let us map these? I suspect the files from other providers won't be georeferenced either.

Userlevel 5
Badge +25

You really need to have a georeference. If you don't have one, there are ways to get one, but I personally wouldn't use FME for that. Also, you'd have to georeference each of the inset maps separately, and since the file is not layered you really have to do that manually.

 

Manual georeferencing is possible, but there are no guarantees about the success rate and accuracy.
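
To give an idea of what manual georeferencing involves: you pick control points whose page position and real-world position are both known, then fit a transform between them. Below is a minimal sketch in plain Python (not FME), assuming at least three non-collinear control points per map or inset and world coordinates already in the target coordinate system. The function names and the control point values are made up for illustration. A plain affine fit only corrects shift, scale, rotation and shear, which is one reason accuracy is not guaranteed.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x


def fit_affine(page_pts, world_pts):
    """Least-squares affine fit from page (px, py) to world (X, Y).

    Returns (a, b, c, d, e, f) such that
    X = a*px + b*py + c and Y = d*px + e*py + f.
    """
    if len(page_pts) < 3 or len(page_pts) != len(world_pts):
        raise ValueError("need at least 3 matching control point pairs")
    rows = [(px, py, 1.0) for px, py in page_pts]
    # Normal equations: (A^T A) p = A^T t, solved once for X and once for Y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in (0, 1):
        atb = [sum(r[i] * w[k] for r, w in zip(rows, world_pts)) for i in range(3)]
        coeffs.extend(solve3(ata, atb))
    return tuple(coeffs)


def apply_affine(coeffs, pt):
    """Map one page coordinate to a world coordinate."""
    a, b, c, d, e, f = coeffs
    px, py = pt
    return a * px + b * py + c, d * px + e * py + f


# Made-up control points: page units on the left, world units on the right.
page = [(0, 0), (100, 0), (0, 100), (100, 100)]
world = [(10, 500), (210, 500), (10, 200), (210, 200)]
coeffs = fit_affine(page, world)
print(apply_affine(coeffs, (50, 50)))
```

With more than three control points the fit is a least-squares compromise, so comparing each control point's transformed position against its known world position gives a rough feel for the error. Each inset map would need its own set of control points and its own fit.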
