You could use a TestFilter.
If you select "Type Is" as an operator you can filter all strings.
You could also use "Encodable In" and select "UTF-8" in the options.
Hi @lgrie
Thanks for the response. I already tried those two options. Unfortunately, neither works: both features pass these tests.
Can you provide a sample dataset?
Does the string attribute contain any numbers? If not, you could use a regex to filter.
Sure, it's now added to the main ticket/question.
Hmm, strange. Why not?
Did you use the 'FME Feature Store (FFS)' reader?
If I re-download the file (zipped FFS), I can successfully read/inspect it. (on FME 2022.1.0.0 - Build 22618 - WIN64)
I'm not sure how reliable this method is, but it works for your test data.
In FME: copy the attribute to a new attribute, use the AttributeEncoder with the Incoming Attribute parameter set to "Use Bytes", then use a Tester to check whether the encoded attribute differs from the original attribute.
Python
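import fme
import fmeobjects

def FeatureProcessor(feature):
    data = feature.getAttribute("text")
    try:
        # bytes/bytearray values have decode(); str values do not
        data = data.decode()
        feature.setAttribute("datatype", "bytes")
    except (UnicodeDecodeError, AttributeError):
        # str raises AttributeError; bytes that are not valid UTF-8 also land here
        feature.setAttribute("datatype", "string")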
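A possible variation on the snippet above checks the Python type directly instead of relying on decode() failing. This is only a sketch: it assumes getAttribute hands binary values back as bytes or bytearray (which the decode() call above implies), and classify_value is a hypothetical helper name, not part of the FME API.

```python
def classify_value(value):
    """Label a value by its Python type rather than by whether it decodes."""
    if isinstance(value, (bytes, bytearray)):
        return "bytes"
    if isinstance(value, str):
        return "string"
    # Missing attributes (None), numbers, etc.
    return "other"


def FeatureProcessor(feature):
    # Same attribute names as in the snippet above
    value = feature.getAttribute("text")
    feature.setAttribute("datatype", classify_value(value))
```

With this approach, binary values that are not valid UTF-8 are still labelled "bytes", and nothing is copied or decoded, which may matter when the binary attributes are large.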
Hi @ebygomm,
Bit late, but thanks for the reply! That's a creative solution that will definitely work in most cases.
That said, in my use case I am a bit hesitant to clone the attribute, as the encoded attributes (the bytes) can be quite sizeable (your Python solution may help there).
Nothing to do with your solution, but I still find it quite odd that the Feature Information window shows the data type of the attributes, whereas it's not possible to fetch or use that information in Workbench.
If, for instance, I had the same value '48656C6C6F', once as 'bytes' and once as 'string: UTF-8', it seems they would be indistinguishable to transformers/functions in Workbench, whereas in the Feature Information window you can see which is which. I admit this is probably a theoretical case, but wouldn't it be much easier to be able to leverage the information that FME seemingly stores at some level?
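To make the theoretical case concrete, here is a small plain-Python sketch (no FME involved) in which the same hex text '48656C6C6F' exists once as a string and once as raw bytes. The describe helper is a hypothetical name used for illustration only, and this says nothing about how FME stores or displays the values internally.

```python
HEX_TEXT = "48656C6C6F"

as_string = HEX_TEXT                 # text that happens to look like hex
as_bytes = bytes.fromhex(HEX_TEXT)   # the five raw bytes behind that hex


def describe(value):
    """Return (type label, display text) for a value."""
    if isinstance(value, (bytes, bytearray)):
        # Render binary data as upper-case hex
        return "bytes", bytes(value).hex().upper()
    return "string", str(value)
```

Both values produce the same display text, but the type label still differs, which is the piece of information that would be useful to have available in Workbench.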
Update: I created the following idea: AC Idea: Formalize 'bytes' as a Data Type (safe.com)