
Hi,

 

I encountered something peculiar last week. I had to compare two datasets about chemical substances measured in groundwater.

To compare the datasets, I used the ChangeDetector. For one substance it reported an update/change in the value of a CASnumber. However, when I look at the originalValue and the revisedValue, I don't see any difference. A compare plugin for Notepad++ also detects a difference between the two CASnumbers. So there seems to be a difference, I just can't seem to spot it ;)

 

Maybe some people here can help take a look and provide an explanation?

 

Any feedback is appreciated.

 

The CASnumbers are:

CASnumber_1 = '‎483-63-6'

CASnumber_2 = '483-63-6'

 

See also the screenshots below for a dummy workspace and the comparison using (a plugin of) Notepad++;

[Screenshots: dummy workspace and Notepad++ comparison]

Because they're different 😁 : CASnumber_1 contains some invisible character in front of the 4.

Saving the value of CASnumber_1 to a text file gives a file of 11 bytes. CASnumber_2 results in a text file of 8 bytes.

Comparing both files in PowerShell gives:

[Screenshot: CASnumber_compare]

In the AttributeCreator, you can go to the start of CASnumber_1 and press Delete once. This will not visually change the value, but afterwards the TestFilter will indicate both values are equal.
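The byte counts above can be reproduced outside FME. The following is a minimal Python sketch, assuming the invisible character is U+200E (as identified further down in this thread) and that the files were saved as UTF-8 without a byte order mark; the variable names are only illustrative.

```python
# Assumption: CASnumber_1 starts with U+200E (Left-to-Right Mark),
# and both values are written as UTF-8 without a BOM.
cas_1 = "\u200e483-63-6"
cas_2 = "483-63-6"


def describe(value):
    """Return the code point of every character, so invisible ones show up."""
    return [f"U+{ord(ch):04X}" for ch in value]


size_1 = len(cas_1.encode("utf-8"))  # U+200E takes 3 bytes in UTF-8
size_2 = len(cas_2.encode("utf-8"))  # 8 ASCII characters, 1 byte each

print(size_1, size_2)
print(describe(cas_1)[:2])
print(cas_1 == cas_2)
```

The three extra bytes account for the difference between the two file sizes, even though both strings look identical on screen.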


Thanks!

That indeed explains it. I guess invisible characters are not always easy to spot ;)

I opened my sample workspace in Notepad++ and found the same thing there:

<XFORM_PARM PARM_NAME="ATTR_TABLE" PARM_VALUE="&quot;&quot; CASnumber_1 SET_TO &lt;u200e&gt;483-63-6 CASnumber_2 SET_TO 483-63-6"/>
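To look at that parameter value without the XML escaping, the line can be parsed with Python's standard XML parser. This is only a sketch: treating the `<u200e>` token as a textual escape for a single code point is an assumption based on how it appears in this workspace line, and `expand_u_tokens` is a hypothetical helper, not an FME function.

```python
import re
import xml.etree.ElementTree as ET

# The line copied from the workspace file in this thread.
line = (
    '<XFORM_PARM PARM_NAME="ATTR_TABLE" PARM_VALUE="&quot;&quot; '
    'CASnumber_1 SET_TO &lt;u200e&gt;483-63-6 CASnumber_2 SET_TO 483-63-6"/>'
)

# The XML parser turns &quot;, &lt; and &gt; back into ", < and >.
parm_value = ET.fromstring(line).get("PARM_VALUE")


def expand_u_tokens(text):
    """Hypothetical helper: replace <uXXXX> tokens with the code point they name."""
    return re.sub(
        r"<u([0-9a-fA-F]{4})>",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


expanded = expand_u_tokens(parm_value)
print(parm_value)
print(len(parm_value), len(expanded))
```

Under that assumption the seven visible characters of `<u200e>` stand for one invisible character in the actual attribute value.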


Nice find!

U+200E is the Left-to-Right Mark, which of course is an invisible character.

Makes one wonder why it's there...
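For anyone who wants to check a value for characters like this, here is a small Python sketch using the standard `unicodedata` module. It reports every character in the Unicode "Cf" (format) category, which is where U+200E lives; `find_format_chars` is just an illustrative name.

```python
import unicodedata


def find_format_chars(value):
    """List (position, code point, name) for every Unicode format character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(value)
        if unicodedata.category(ch) == "Cf"
    ]


hits_1 = find_format_chars("\u200e483-63-6")  # value with a leading U+200E
hits_2 = find_format_chars("483-63-6")        # clean value

print(hits_1)
print(hits_2)
```

Note that this only catches format characters; other look-alike problems (for example non-breaking spaces) fall in different categories and would need their own check.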


Yeah, I also found that. &lt; is the XML encoding of '<' (less than), and &gt; is the XML encoding of '>' (greater than). So to me it looks like three characters, but maybe the '<' and '>' are stored/read as a kind of header/declaration, I guess as a kind of wrapper around the Left-to-Right Mark.

 

I also don't know why it's there in my dataset, but that's a different question :|

Thanks again for helping me spot it 🙂

First thing is knowing what's up, second thing is how to deal with it ;)
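One possible way to deal with it, sketched in Python rather than in FME: strip Unicode format characters (category "Cf") from the value before comparing. This is an assumption about what is acceptable for this data. It seems safe for CAS numbers, which consist only of digits and hyphens, but the same category also contains characters that carry meaning in some scripts, so it should not be applied blindly to free text. `strip_format_chars` is an illustrative name.

```python
import unicodedata


def strip_format_chars(value):
    """Drop Unicode format characters (category Cf), such as U+200E."""
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cf")


cas_1 = "\u200e483-63-6"  # assumed: value with the leading Left-to-Right Mark
cas_2 = "483-63-6"

cleaned = strip_format_chars(cas_1)
print(cas_1 == cas_2)    # raw values differ
print(cleaned == cas_2)  # cleaned value matches
```

Doing this kind of cleanup on both datasets before the comparison should keep the invisible character from being reported as a change.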


You're most welcome!

