
Say I have the following data:

image

for which I can see the following information in the Feature Information window:

feature 1; text (bytes): 48656C6C6F
feature 2; text (string: UTF-8): World

Is there a way to filter the features on the basis of the indicated data type (i.e. 'bytes' vs. 'string (UTF-8)')?

You could use a TestFilter.

If you select "Type Is" as an operator you can filter all strings.

image


You could also use "Encodable In" and select "UTF-8" in the options.


Hi @lgrie​ 

Thanks for the response. I already tried those two options. Unfortunately, this doesn't work (both features pass these tests):

image


Can you provide a sample dataset?

Does the string attribute contain any numbers? If not, you could use a RegEx to filter.

 


Sure, it's now added to the main ticket/question.


I cannot open it, sorry.


Hmm, strange. Why not?

Did you use the 'FME Feature Store (FFS)' reader?

If I re-download the file (zipped FFS), I can successfully read/inspect it. (on FME 2022.1.0.0 - Build 22618 - WIN64) 


I'm not sure how reliable this method is, but it works for your test data.

In FME, copy the attribute to a new attribute, use the AttributeEncoder with the Incoming Attribute parameter set to "Use Bytes", then use a Tester to check whether the encoded attribute differs from the original attribute.

image 

Python

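import fme
import fmeobjects

def FeatureProcessor(feature):
    data = feature.getAttribute("text")
    try:
        # Only bytes-like values have decode(); a str raises AttributeError
        data.decode()
        feature.setAttribute("datatype", "bytes")
    except UnicodeDecodeError:
        # Still bytes, just not valid UTF-8
        feature.setAttribute("datatype", "bytes")
    except AttributeError:
        feature.setAttribute("datatype", "string")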
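A variant of the check above that looks at the Python type instead of decoding, so no decoded copy of a large value is created. This is only a sketch: it assumes the PythonCaller hands binary attributes to Python as a bytes-like object (bytes, bytearray or memoryview), and `classify_value` is a hypothetical helper name, not part of the FME API.

```python
def classify_value(value):
    """Return "bytes" for bytes-like values, "string" for str, else "other"."""
    # Assumption: binary attributes arrive as a bytes-like object
    if isinstance(value, (bytes, bytearray, memoryview)):
        return "bytes"
    if isinstance(value, str):
        return "string"
    # Missing attributes (None) and numeric values end up here
    return "other"


def FeatureProcessor(feature):
    # Store the detected type in a new attribute so a Tester can filter on it
    value = feature.getAttribute("text")
    feature.setAttribute("datatype", classify_value(value))
```

The "other" branch keeps missing or numeric attributes from being reported as strings.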

 


Hi @ebygomm​ ,

 

Bit late, but thanks for the reply! That's a creative solution that will definitely work in most cases.

That said, in my use case I am a bit hesitant to clone the attribute, as the encoded attributes (the bytes) can be quite sizeable (your Python solution may help there).

 

Nothing to do with your solution, but I still find it quite odd that the Feature Information window shows the data type of the attributes, whereas it's not possible to fetch or use that information in Workbench.

If, for instance, I had the same value '48656C6C6F', once as 'bytes' and once as 'string: UTF-8', it seems they would be indistinguishable to transformers/functions in Workbench, whereas in the Feature Information window you can see which is which. I admit this is probably a theoretical case, but wouldn't it be much easier to be able to leverage the information that FME seemingly stores at some level?
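To illustrate that theoretical case outside FME, here is a small plain-Python sketch. It assumes the 'bytes' value is displayed as its hexadecimal representation, which matches the sample data (48656C6C6F is the hex form of "Hello"); `shown_as` is a hypothetical helper that mimics such a display, not an FME function.

```python
as_bytes = bytes.fromhex("48656C6C6F")  # the five bytes of "Hello"
as_text = "48656C6C6F"                  # a ten-character string


def shown_as(value):
    """Hypothetical display helper: hex for bytes-like values, text as-is."""
    if isinstance(value, (bytes, bytearray)):
        return value.hex().upper()
    return value


# Both values look the same when displayed...
print(shown_as(as_bytes) == shown_as(as_text))
# ...but their types still tell them apart
print(type(as_bytes).__name__, type(as_text).__name__)
```

The displayed text is identical, so only the stored type can distinguish the two values.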


Update: I created the following idea: AC Idea: Formalize 'bytes' as a Data Type (safe.com)

