Question

How to filter attribute on Bytes vs String Data type


thijsknapen
Contributor

Say I have the following data;

image

for which I can see the following information in the Feature Information window;

feature 1; text (bytes): 48656C6C6F
feature 2; text (string: UTF-8): World

Is there a way that I can filter the features on the basis of the indicated data type (i.e. 'bytes' vs 'string (UTF-8)')?

10 replies

lgrie
Contributor
  • Contributor
  • July 17, 2023

You could use a TestFilter.

If you select "Type Is" as an operator you can filter all strings.

image


lgrie
Contributor
  • Contributor
  • July 17, 2023

You could also use "Encodable In" and select "UTF-8" in the options.


thijsknapen
Contributor
  • Author
  • Contributor
  • July 17, 2023

Hi @lgrie​ 

Thanks for the response. I already tried those two options. Unfortunately this doesn't work (both features pass these tests);

image


lgrie
Contributor
  • Contributor
  • July 17, 2023

Can you provide a sample dataset?

Does the string attribute contain any numbers? If not, you could use a regex to filter.
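A minimal sketch of the regex idea in Python, assuming the bytes attribute reaches the test as its hexadecimal text rendering (as shown in the Feature Information window) and that genuine strings never consist solely of hex digit pairs. The helper name `looks_like_hex` is hypothetical.

```python
import re

# Matches text made up entirely of hex digit pairs, e.g. "48656C6C6F".
HEX_PAIRS = re.compile(r"(?:[0-9A-Fa-f]{2})+")


def looks_like_hex(value: str) -> bool:
    """Return True if value consists only of whole hex byte pairs."""
    return HEX_PAIRS.fullmatch(value) is not None


print(looks_like_hex("48656C6C6F"))  # True
print(looks_like_hex("World"))       # False
```

Note that a real string such as "CAFE" would also match, so this only helps when the text values are known not to look like hex.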

 


thijsknapen
Contributor
  • Author
  • Contributor
  • July 17, 2023

Sure, it's now added to the main ticket/question.


lgrie
Contributor
  • Contributor
  • July 17, 2023

I cannot open it, sorry.


thijsknapen
Contributor
  • Author
  • Contributor
  • July 17, 2023

Hmm, strange. Why not?

Did you use the 'FME Feature Store (FFS)' reader?

If I re-download the file (zipped FFS), I can successfully read/inspect it. (on FME 2022.1.0.0 - Build 22618 - WIN64) 


ebygomm
Influencer
  • Influencer
  • July 17, 2023

Not sure how reliable this method is, but it works for your test data.

In FME, copy the attribute to a new attribute, run the copy through an AttributeEncoder with the Incoming Attribute parameter set to "Use Bytes", then use a Tester to check whether the encoded attribute differs from the original attribute.

image 

Python

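import fme
import fmeobjects

def FeatureProcessor(feature):
    data = feature.getAttribute("text")
    try:
        # Only bytes-like values have decode(); a str raises AttributeError.
        data.decode()
        feature.setAttribute("datatype", "bytes")
    except UnicodeDecodeError:
        # Still bytes, just not valid UTF-8.
        feature.setAttribute("datatype", "bytes")
    except AttributeError:
        feature.setAttribute("datatype", "string")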
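A variant of the same idea that checks the Python type directly instead of decoding, which avoids decoding or copying large values. This is a sketch that assumes binary attributes arrive in Python as bytes or bytearray and text attributes as str. The `StubFeature` class and the `classify` function are hypothetical; the stub only stands in for an FME feature so the logic can be tried outside FME.

```python
class StubFeature:
    """Hypothetical stand-in for an FME feature, for testing outside FME."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def getAttribute(self, name):
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


def classify(feature, attribute_name="text"):
    """Tag the feature with 'bytes' or 'string' based on the value's Python type."""
    data = feature.getAttribute(attribute_name)
    # Assumption: binary values are bytes-like; everything else is treated as string.
    if isinstance(data, (bytes, bytearray)):
        feature.setAttribute("datatype", "bytes")
    else:
        feature.setAttribute("datatype", "string")
    return feature


binary = classify(StubFeature({"text": bytearray(b"Hello")}))
text = classify(StubFeature({"text": "World"}))
print(binary.getAttribute("datatype"))  # bytes
print(text.getAttribute("datatype"))    # string
```

Inside a PythonCaller, the body of `classify` could be used in place of the try/except block, with the real feature passed in.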

 


thijsknapen
Contributor
  • Author
  • Contributor
  • August 2, 2023

Hi @ebygomm​ ,

 

A bit late, but thanks for the reply! That's a creative solution that will definitely work in most cases.

That said, in my use case I am a bit hesitant to clone the attribute, as the encoded attributes (the bytes) can be quite sizeable (your Python solution may help there).

Nothing to do with your solution, but I still find it quite odd that the Feature Information window shows the data type of the attributes, whereas it's not possible to fetch or use that information in Workbench.

If, for instance, I had the same value '48656C6C6F' once as 'bytes' and once as 'string: UTF-8', the two would seem to be indistinguishable for transformers/functions in Workbench, whereas in the Feature Information window you can see which is which. I admit this is probably a theoretical case, but wouldn't it be much easier to be able to leverage the information that is seemingly stored on some level by FME?
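The theoretical case can be illustrated in plain Python, outside FME: the hex digits shown for feature 1 are only a rendering of the underlying bytes, while a string with the same characters is a different value of a different type. This is a sketch of the distinction, not of how FME stores attributes internally.

```python
# The bytes value that the Feature Information window renders as 48656C6C6F.
as_bytes = bytes.fromhex("48656C6C6F")
# A text value that merely contains the same ten characters.
as_text = "48656C6C6F"

print(as_bytes)                 # b'Hello'
print(as_bytes.hex().upper())   # 48656C6C6F
print(type(as_bytes).__name__)  # bytes
print(type(as_text).__name__)   # str

# Same rendering, different type.
print(as_bytes.hex().upper() == as_text)  # True
```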


thijsknapen
Contributor
  • Author
  • Contributor
  • August 2, 2023

Update: I created the following idea: AC Idea: Formalize 'bytes' as a Data Type (safe.com)

