Question

Analyze attribute values for whole words

  • 21 April 2018
  • 4 replies
  • 0 views

Badge +4

Anyone know of a good way to analyze attribute values, and determine if a given value is a word in English?

Or maybe even check it against a custom dictionary of names? I'm trying to clean up a bunch of values that contain stray spaces, introduced when the PDFs were converted to data via OCR, so it looks like this:

 

Attribute Value          | What I want attribute to be corrected to
Thi s is a sent e nc e.  | This is a sentence.


4 replies

Badge +2

Hi,

I'm not 100% sure, but I would suggest using an AttributeSplitter with " " (space) as the delimiter. If a list element is a single letter other than "a", merge it with the neighbouring list{x} element on its left or right so that a word is formed.

The example above would become:

Thi s is a sent e nc e --> Thisis a sentence

Based on the results, you can add further rules.

Pratap
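
For anyone who wants to try the split-and-merge idea in Python, here is a minimal sketch. It is not a tested FME workflow: the `WORDS` set, the `is_word` and `merge_fragments` helpers, and the `max_parts` limit are all made up for illustration, and a real word list (an English dictionary or a custom list of names) would have to be loaded from a file. Instead of only merging single letters, it joins neighbouring fragments and keeps the longest run that forms a known word.

```python
import string

# Hypothetical tiny word list; a real one would be loaded from a file
# (an English dictionary or a custom list of names).
WORDS = {"this", "is", "a", "sentence"}

def is_word(text, words=WORDS):
    """Case-insensitive lookup that ignores surrounding punctuation."""
    return text.strip(string.punctuation).lower() in words

def merge_fragments(value, words=WORDS, max_parts=6):
    """Greedily join neighbouring fragments into the longest known word."""
    parts = value.split()
    result = []
    i = 0
    while i < len(parts):
        # Try the longest run of fragments first, then shorter ones.
        for n in range(min(max_parts, len(parts) - i), 0, -1):
            candidate = "".join(parts[i:i + n])
            # A single fragment is kept as-is even if it is not a known word.
            if n == 1 or is_word(candidate, words):
                result.append(candidate)
                i += n
                break
    return " ".join(result)

print(merge_fragments("Thi s is a sent e nc e."))
```

Because the merge is greedy, it always prefers the longer word when two readings are possible, so the results still need a manual check.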

Badge +2

https://knowledge.safe.com/questions/62428/spell-check-attribute-values.html

This might help if you have knowledge of Python

Userlevel 4

That's a pretty cool, but potentially difficult, issue to deal with.

How do you plan on dealing with ambiguities, e.g. "a void" vs "avoid"?
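
One way to at least detect such cases is to list every valid reading of a value and flag it when there is more than one. A rough Python sketch, assuming a hypothetical `WORDS` set and a made-up `readings` helper (neither comes from FME or any library):

```python
# Hypothetical word list for illustration only.
WORDS = {"a", "void", "avoid", "the"}

def readings(parts, words=WORDS):
    """Return every way to join neighbouring fragments into known words."""
    if not parts:
        return [[]]
    found = []
    for n in range(1, len(parts) + 1):
        head = "".join(parts[:n])
        if head.lower() in words:
            # Keep this word and recurse on the remaining fragments.
            for rest in readings(parts[n:], words):
                found.append([head] + rest)
    return found

options = readings("a void the".split())
for option in options:
    print(" ".join(option))
if len(options) > 1:
    print("ambiguous: needs a manual decision")
```

Values with exactly one reading could be corrected automatically, while the ambiguous ones are routed to manual review.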

Badge +2

Good question @david_r

In the end, English is a natural language, and the meaning of a sentence changes with the context. Here we are joining fragments to rebuild a sentence, so "a void" and "avoid" are both correct English :)

So it depends on @dmatranga to decide...

Pratap
