
Anyone know of a good way to analyze attribute values, and determine if a given value is a word in English?

Or maybe even check it against a custom dictionary of names? I'm trying to clean up a bunch of values that have spaces from when the PDFs were output to data via OCR, so it looks like this:

 

Attribute value: Thi s is a sent e nc e.
What I want the attribute to be corrected to: This is a sentence.

Hi,

I'm not 100% sure, but I'd suggest using the AttributeSplitter to split the value on " " (space). If a list element is a single letter other than "a", merge it with the list{x} elements to its left and right so that a word is formed.

The example above would become:

Thi s is a sent e nc e --> Thisis a sentence

Based on the results, you can then add further rules.
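For anyone who would rather prototype this rule in Python first, here is a minimal sketch of the merge step described above. The function name merge_single_letters is made up for illustration, and ignoring punctuation when testing for a single letter is my own assumption, not part of the original suggestion.

```python
import string


def merge_single_letters(text):
    """Glue any single-letter token other than "a" to both of its neighbours."""
    merged = []
    glue_next = False  # True when the previous token was a stray single letter
    for token in text.split(" "):
        if not token:
            continue  # skip empty strings produced by repeated spaces
        core = token.strip(string.punctuation)  # assumption: ignore punctuation
        is_fragment = len(core) == 1 and core.lower() != "a"
        if merged and (is_fragment or glue_next):
            merged[-1] += token  # join onto the word being built
        else:
            merged.append(token)
        glue_next = is_fragment
    return " ".join(merged)


print(merge_single_letters("Thi s is a sent e nc e"))  # Thisis a sentence
```

As the output shows, the rule over-merges "This" and "is", so it is only a first pass that needs extra rules or a dictionary check afterwards.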

Pratap


https://knowledge.safe.com/questions/62428/spell-check-attribute-values.html

This might help if you have knowledge of Python
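Not taken from the linked thread, but as a rough idea of what a Python check against a custom dictionary could look like, here is a sketch that uses only the standard library's difflib. The word list, the suggest name, and the 0.8 cutoff are placeholders I picked for illustration; in practice you would load your own list of words or names.

```python
import difflib

# Hypothetical custom dictionary; replace with your own words or names.
KNOWN_WORDS = ["this", "is", "a", "sentence"]


def suggest(value, known_words=KNOWN_WORDS, cutoff=0.8):
    """Return the closest dictionary entry for value, or None if nothing is close."""
    matches = difflib.get_close_matches(value.lower(), known_words, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("sentance"))  # sentence
print(suggest("xyz"))       # None
```

A value that is already in the list comes back unchanged, so the same function doubles as an "is this a known word" test.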


That's a pretty cool, but potentially difficult, issue to deal with.

How do you plan on dealing with ambiguities, e.g. "a void" vs "avoid"?


Good question @david_r

In the end, English is a language, and the meaning of a sentence changes depending on the situation. In this context, we are joining fragments into words and trying to form a sentence, so "a void" and "avoid" are both correct English :)

So it's up to @dmatranga to decide...
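To make the ambiguity visible rather than resolve it silently, one option is to list every way the letters can be split into dictionary words and let a person (or a later rule) pick. The sketch below assumes a tiny made-up word set and a hypothetical function name segmentations; it also lowercases the input and drops spaces and punctuation, which is a simplification.

```python
from functools import lru_cache

# Hypothetical word set; a real run would use a full English or custom list.
WORDS = frozenset({"a", "avoid", "void", "this", "is", "sentence"})


def segmentations(text, words=WORDS):
    """Return every way to split the letters of text into known words."""
    letters = "".join(ch for ch in text.lower() if ch.isalpha())

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(letters):
            return ((),)  # exactly one way to split nothing: no words
        results = []
        for end in range(start + 1, len(letters) + 1):
            piece = letters[start:end]
            if piece in words:
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [" ".join(parts) for parts in split_from(0)]


print(segmentations("a void"))                   # ['a void', 'avoid']
print(segmentations("Thi s is a sent e nc e."))  # ['this is a sentence']
```

When the list has more than one entry, the value is ambiguous and can be flagged for manual review; an empty list means some part of the value is not in the dictionary.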

Pratap

