How to identify leading and trailing spaces

I’ve been needing to identify leading and trailing spaces across many attributes, with a significant number of features. I don’t need to trim them, but rather identify their existence.

The only solution I’ve been able to come up with so far is to use AttributeValidators - this way, I can set it to run against the 50 or so fields that need to be validated. I’m using Regex and the rule “^ +” for leading spaces, and a second AttributeValidator with the rule “ +$” (double quotation marks only for readability here, not used in the Rule Configuration) for trailing spaces.
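For anyone who wants to sanity-check the two rules outside of FME, here is a minimal sketch using Python's `re` module. It assumes that FME's regex engine treats these two simple patterns the same way Python does, and the helper names `has_leading_space` and `has_trailing_space` are made up for illustration.

```python
import re

# Same patterns as the two AttributeValidator rules described above.
LEADING = re.compile(r"^ +")   # one or more spaces at the start of the value
TRAILING = re.compile(r" +$")  # one or more spaces at the end of the value


def has_leading_space(value: str) -> bool:
    """Return True if the value starts with at least one space."""
    return LEADING.search(value) is not None


def has_trailing_space(value: str) -> bool:
    """Return True if the value ends with at least one space."""
    return TRAILING.search(value) is not None
```

Both patterns only look for the space character, so tabs or line breaks at either end of a value would not be flagged by them.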

This works, but it is very slow. Is there a better way to do this within FME, without resorting to some Python?

Thanks in advance for any thoughts on alternative approaches!


I’m not sure about speed, but if you use an AttributeExploder to turn it all into key/value pairs you only have to do a regex check on _attr_value and it’ll give you exactly which attributes of which features (if you have a unique identifier) have leading or trailing spaces.

This is a good option because you can then easily get the attribute name and table name too.

I think this is a great use case for Python, actually. The AttributeExploder would work, but can be very slow if you have a large number of features/attributes, since you’ll end up creating a significant number of new features leading to a lot of overhead.

Try something like this in the input() method:

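# Collect the names of attributes whose values have leading or trailing whitespace
flagged_attrs = []
for attr in feature.getAllAttributeNames():
    # Treat missing or null values as empty strings
    value = str(feature.getAttribute(attr) or "")
    # strip() removes whitespace at both ends, so any difference means a hit
    if value and value.strip() != value:
        flagged_attrs.append(attr)
feature.setAttribute("_leading_trailing{}", flagged_attrs)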

This will output a list “_leading_trailing{}” that contains any attribute names with a value that contains either leading or trailing whitespace.

This does not create any new features and keeps the memory consumption to a minimum.
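If it helps to test the logic outside of FME first, here is a minimal sketch of the same check as a plain Python function. It takes a dict of attribute names to values instead of an FME feature. The function name `find_edge_whitespace` and the `leading`/`trailing` keys are illustrative names, not part of the FME API. Unlike the snippet above, it reports leading and trailing whitespace separately. Note that `lstrip()` and `rstrip()` treat tabs and line breaks as whitespace too, not just spaces.

```python
from typing import Any, Dict, List


def find_edge_whitespace(attributes: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return the attribute names whose values start or end with whitespace."""
    result: Dict[str, List[str]] = {"leading": [], "trailing": []}
    for name, raw in attributes.items():
        # Missing values are treated as empty strings and skipped.
        value = "" if raw is None else str(raw)
        if not value:
            continue
        if value.lstrip() != value:
            result["leading"].append(name)
        if value.rstrip() != value:
            result["trailing"].append(name)
    return result
```

In a PythonCaller, the dict could be built from feature.getAllAttributeNames() and feature.getAttribute(), and each list written back with feature.setAttribute().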

Should be feature.getAllAttributeNames() not feature.getAllAttributes()

You’re right, thanks for the keen eye! Fixed.

Thank you @redgeographics and @david_r for the suggestions, and others for further input.

On using the AttributeExploder: By setting Keep Attributes to Yes, I’m able to retain the information that I need, and having not used this transformer before, it is key to the solution below.

This took me from ~180k features to ~9.8M features, but setting Ignore Attributes Containing to drop a few unnecessary attributes brought that down to ~8M features.

For my purposes, I don’t want single space characters, which qualify as leading and trailing spaces, so a Tester drops those out.

It turns out that the AttributeValidators that I was originally using are apparently malfunctioning - the same transformers return far more results, correctly, on the exploded features, than they do on the intact ones (zero). Same RegEx, same fields. Most unfortunate.

The Trailing Space Attribute Validator is still very, very slow working on the single exploded attribute.

I took @david_r’s logic from the Python and applied it in my FME workflow. I may still test a Pythonic method later, but for now, the working FME-centric solution is:

AttributeExploder set to retain attributes, excluding a few unnecessary attributes
Tester where _attr_value != “ “
AttributeManager with a new Output Attribute (_attr_value_trim) with Value @Trim(@Value(_attr_value)). An AttributeTrimmer transformer would work equally well, but the untrimmed attribute needs to be retained for comparison.
Tester where _attr_value_trim != _attr_value, Comparison Mode: Case Insensitive - when this is set to the default Automatic, it incorrectly excludes dozens of features.
TestFilter outputting All Passing Ports:
1. @Left(@Value(_attr_value),1) = “ “ → Leading Space
2. @Right(@Value(_attr_value),1) = “ “ → Trailing Space
3. Unfiltered → Carriage returns/line breaks
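For anyone who would rather prototype this outside of FME, the decision logic of the steps above can be sketched in plain Python. This is only an approximation under a few assumptions: `str.strip()` stands in for @Trim, the function name `classify_value` and the label strings are made up for illustration, and the "Other whitespace" label corresponds to the Unfiltered port.

```python
from typing import List


def classify_value(value: str) -> List[str]:
    """Mimic the Tester / TestFilter chain for a single exploded value."""
    # First Tester: values that are exactly one space are dropped.
    if value == " ":
        return []
    # Second Tester: only values that change when trimmed continue.
    if value.strip() == value:
        return []
    labels = []
    # TestFilter with All Passing Ports: a value may match both tests.
    if value[:1] == " ":
        labels.append("Leading Space")
    if value[-1:] == " ":
        labels.append("Trailing Space")
    # Unfiltered: trimming changed the value, but not because of a space
    # at either end (for example a trailing line break).
    if not labels:
        labels.append("Other whitespace")
    return labels
```

A value such as " abc " gets both labels, which matches the All Passing Ports setting, while "abc" followed by a line break falls through to the last category.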

The above is far more performant, returns ~1400 issues when the original AttributeValidators incorrectly returned none, and further identifies the line breaks, which I hadn’t considered. This is a valuable addition to our QC processes.

In hindsight, this doesn’t seem so complicated, but it took your inputs and numerous iterations to end up with a working process. Hopefully this can help someone else later, as well.

Thank you all!
