Question

Analyze attribute values for whole words

7 years ago
April 21, 2018
4 replies
8 views

dmatranga
30 replies

Does anyone know of a good way to analyze attribute values and determine whether a given value is an English word?

Or maybe even check it against a custom dictionary of names? I'm trying to clean up a bunch of values that picked up stray spaces when the PDFs were converted to data via OCR, so they look like this:

| Attribute value | What I want the attribute corrected to |
|---|---|
| Thi s is a sent e nc e. | This is a sentence. |
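A minimal sketch of the dictionary idea in plain Python, outside FME. It assumes you can load your word list (English words plus any custom names) into a lower-cased set; the tiny `WORDS` set and the helpers `is_word` and `rejoin` are hypothetical and only for illustration. `rejoin` greedily glues adjacent fragments together, preferring the longest run that forms a dictionary word:

```python
import string

# Hypothetical sample dictionary; in practice load an English word list
# and/or a custom list of names here, lower-cased.
WORDS = {"this", "is", "a", "sentence"}


def is_word(text, words=WORDS):
    """True if text, ignoring case and surrounding punctuation, is in the dictionary."""
    return text.strip(string.punctuation).lower() in words


def rejoin(value, words=WORDS):
    """Greedily rejoin OCR fragments, preferring the longest run of
    adjacent fragments that forms a dictionary word."""
    tokens = value.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, dropping one fragment at a time.
        for j in range(len(tokens), i, -1):
            candidate = "".join(tokens[i:j])
            if is_word(candidate, words):
                out.append(candidate)
                i = j
                break
        else:
            # No dictionary match: keep the fragment unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(rejoin("Thi s is a sent e nc e."))  # This is a sentence.
```

Because it always prefers the longest match, this sketch will turn "a void" into "avoid" whenever "avoid" is in the dictionary, so its output is only as good as the word list behind it.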


pratap
Contributor
600 replies
7 years ago
April 21, 2018

Hi,

I'm not 100% sure, but I would suggest using the AttributeSplitter to split the value on " " (space). If the resulting list contains a single letter other than "a", merge that list{x} element with its left and right neighbours so that a word forms.

The example above would become:

Thi s is a sent e nc e --> Thisis a sentence

Based on the results, you can add further rules.
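The single-letter rule described in this reply can be sketched in plain Python, for example to adapt inside a PythonCaller. This is an illustration only; the function name `merge_single_letters` is hypothetical, and it splits on whitespace instead of using the AttributeSplitter's list attribute:

```python
def merge_single_letters(value):
    """Hypothetical helper: split on spaces, then glue every single letter
    other than "a" onto the fragment before it and the fragment after it."""
    out = []
    join_next = False  # True when the previous token was a stray single letter
    for part in value.split():
        is_stray = len(part) == 1 and part.isalpha() and part.lower() != "a"
        if is_stray:
            if out:
                out[-1] += part   # merge with the left neighbour
            else:
                out.append(part)  # nothing on the left yet
            join_next = True      # also merge with the right neighbour
        elif join_next:
            out[-1] += part
            join_next = False
        else:
            out.append(part)
    return " ".join(out)


print(merge_single_letters("Thi s is a sent e nc e"))  # Thisis a sentence
```

It reproduces the "Thisis a sentence" result above, which shows why further rules are needed. Note that it would also glue a genuine "I" onto its neighbours.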

Pratap


pratap
Contributor
600 replies
7 years ago
April 21, 2018

https://knowledge.safe.com/questions/62428/spell-check-attribute-values.html

This might help if you know Python.
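For the spell-check side, one standard-library option (not necessarily what the linked thread uses) is `difflib.get_close_matches`, which snaps a misspelled OCR word to its nearest dictionary entry. The word list and the `correct` helper are hypothetical, and the `cutoff` of 0.8 is an arbitrary starting point to tune:

```python
from difflib import get_close_matches

# Hypothetical word list; replace with a real dictionary and/or custom names.
WORDS = ["this", "is", "a", "sentence"]


def correct(word, words=WORDS, cutoff=0.8):
    """Return word unchanged if it is already known, otherwise the closest
    dictionary entry scoring at least `cutoff`, or the word itself if none does."""
    if word.lower() in words:
        return word
    matches = get_close_matches(word.lower(), words, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(correct("sentnce"))  # sentence
```

A fuzzy match like this fixes misspellings but not the split words themselves, and a low cutoff can "correct" fragments into the wrong word.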

david_r
8359 replies
7 years ago
April 23, 2018

That's a pretty cool but potentially difficult problem to deal with.

How do you plan on dealing with ambiguities, e.g. "a void" vs "avoid"?


pratap
Contributor
600 replies
7 years ago
April 23, 2018

Good question @david_r

In the end, English is a language, and the meaning of a sentence changes with the situation. Here we are joining fragments into words and trying to form a sentence, so "a void" and "avoid" are both correct English :)

So it's up to @dmatranga to decide...
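To make the "a void" vs "avoid" ambiguity visible rather than resolving it silently, a sketch can list every way the spaceless text splits into dictionary words. The dictionary and the `segmentations` helper below are hypothetical; this is a standard memoized word-break search, not an FME feature:

```python
from functools import lru_cache

# Hypothetical dictionary containing both readings.
WORDS = frozenset({"a", "void", "avoid"})


def segmentations(text, words=WORDS):
    """Return every split of text (spaces removed, lower-cased) into dictionary words."""
    text = text.replace(" ", "").lower()

    @lru_cache(maxsize=None)
    def helper(start):
        # All segmentations of text[start:] as lists of words.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in words:
                for rest in helper(end):
                    results.append([piece] + rest)
        return results

    return [" ".join(seg) for seg in helper(0)]


print(segmentations("a void"))  # ['a void', 'avoid']
```

When more than one segmentation comes back, the value can be flagged for review, or a policy such as "prefer the fewest words" can pick one. Which policy fits is the judgment call discussed above.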

Pratap
