
Any ideas on how the power of FME could be used to convert a 'decapitated' CAD file back to text?

I have a PDF file which used to be a CAD drawing (not sure if Autodesk or Bentley) and it contains only a 'dropped/exploded' line segment geometry. I have managed to read it into FME as line geometries, rotate the page so that it's 'level' and use LineCombiner to join segments back to 'character shapes'.

Now I have geometries as shown below (some are perfect characters, some are in 2-3 parts, like 'd' or 'b') and have no idea how to turn them back into text. I tried exporting to DGN, but Microstation doesn't seem to have a function like that either (i.e. once you drop/explode text to lines, there's only 'Undo' to help; no function to 'characterise' it again, that I could find).
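For anyone attempting the same segment-joining step outside FME, here's a rough pure-Python sketch of the idea. LineCombiner does this for you inside FME; the tolerance snapping and union-find below are just my own illustration, not how the transformer is actually implemented:

```python
from collections import defaultdict

def group_segments(segments, tol=1e-6):
    """Group line segments into connected 'character shapes' by shared endpoints.

    segments: list of ((x1, y1), (x2, y2)) tuples.
    Returns a list of groups, each a list of the original segments.
    """
    def key(pt):
        # Snap coordinates to a tolerance grid so nearly-equal endpoints match.
        return (round(pt[0] / tol), round(pt[1] / tol))

    # Union-find over segment indices.
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Index segments by their (snapped) endpoints, then union everything
    # that touches the same point.
    by_endpoint = defaultdict(list)
    for i, (a, b) in enumerate(segments):
        by_endpoint[key(a)].append(i)
        by_endpoint[key(b)].append(i)
    for idxs in by_endpoint.values():
        for i in idxs[1:]:
            union(idxs[0], i)

    groups = defaultdict(list)
    for i, seg in enumerate(segments):
        groups[find(i)].append(seg)
    return list(groups.values())
```

Two segments meeting at (0, 0) end up in one group; an isolated stroke stays in its own.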

Also tried rasterising it with ImageRasterizer and then calling Tesseract to attempt OCR, but it stubbornly returns 'No text found' regardless of resolution, colour, pixel size, background colour or language.

So any further ideas how to resurrect this 'almost there' file?

Is it all the same font?

If so, you might be able to distinguish the characters by properties such as:

  • number of vertices
  • angle between first and last vertex
  • bounding box height/width ratio
  • etc.

It would be a slow, painstaking process to get it working. Might do the trick though
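To illustrate, a rough sketch of extracting such properties from one joined shape. This is pure Python and the feature names are my own invention, not anything built into FME; you'd then match these feature vectors against ones computed from a reference font:

```python
import math

def shape_features(points):
    """Compute simple features for one joined 'character shape'.

    points: list of (x, y) vertices in drawing order.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    (x1, y1), (x2, y2) = points[0], points[-1]
    return {
        "n_vertices": len(points),
        # Height/width ratio is scale-invariant, so one rule can
        # cover every font size in the drawing.
        "aspect": height / width if width else float("inf"),
        # Angle of the chord from first to last vertex, in degrees.
        "chord_angle": math.degrees(math.atan2(y2 - y1, x2 - x1)),
        # Whether the outline closes on itself (e.g. 'O' vs 'C').
        "closed": points[0] == points[-1],
    }
```

Because the aspect ratio and chord angle are independent of absolute size, the rule set wouldn't need separate entries per font size, only per character (and per rotation, for the vertical text).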



@jakemolnar yes it is, but as with any CAD schematic, some text is vertical, some horizontal, and the sizes differ. I'm not sure I'm determined enough to create a set of rules for every character and scale them by font size. See below.


Do you get any improvement if you buffer the lines prior to rasterisation?

And I'm not sure if the GoogleVisionConnector is an option for the OCR. I saw someone mention using it (albeit unsuccessfully) for a word-find problem with last week's quiz...
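For what it's worth, here's a pure-Python sketch of why buffering can help: one-pixel hairline strokes often defeat OCR engines, and thickening them before rasterisation yields solid glyphs. In FME you'd put a Bufferer before the ImageRasterizer; the grid-and-dilation code below is only an illustration of the effect, not what those transformers actually do:

```python
def rasterize(segments, size):
    """Rasterise line segments onto a size x size binary grid (1 = ink)."""
    grid = [[0] * size for _ in range(size)]
    for (x1, y1), (x2, y2) in segments:
        # Simple parametric stepping; fine for an illustration.
        steps = max(abs(x2 - x1), abs(y2 - y1), 1)
        for s in range(steps + 1):
            x = round(x1 + (x2 - x1) * s / steps)
            y = round(y1 + (y2 - y1) * s / steps)
            grid[y][x] = 1
    return grid

def dilate(grid):
    """One-pixel dilation: thickens hairline strokes into solid glyphs."""
    size = len(grid)
    out = [row[:] for row in grid]
    for y in range(size):
        for x in range(size):
            if grid[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < size and 0 <= nx < size:
                            out[ny][nx] = 1
    return out
```

A horizontal hairline of 5 pixels becomes a 7x3 solid bar after one dilation pass, which is much closer to what OCR training data looks like.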



@lindsay I have failed at attempting to set up a Cloud authentication. The built-in one does not work for me. @gerhardatsafe ??



Hi @kkrajewski,

We just released a new version of the Google AI package including the GoogleVisionConnector that allows you to use a Service Account key directly in the transformer (https://cloud.google.com/docs/authentication/end-user#creating_your_client_credentials).

You can still use your own OAuth 2.0 credentials as well. Here's a good resource on how to create and use your own OAuth 2.0 client: https://cdn.safe.com/resources/ebook/Creating-Web-Connections.pdf

You can upgrade to the new package in FME Desktop under FME Options -> Packages or download the new version via FME Hub and drag & drop it onto your canvas.

Let us know how this goes!


In that case, I'd say @lindsay has the best idea: try buffering before raster OCR. Otherwise you are dealing with the hard problem of rolling your own character recognition. https://imgs.xkcd.com/comics/tasks.png

