
Not really a question, but feel free to add comments or thoughts below.

We are implementing pre-commit hooks and centralized Python linters in our git workflows, and for this we need to extract all Python code from workspace files pushed to git. Since we do not want to require a full FME installation for this, we’re parsing the .fmw files as text files using Python and extracting all the code blocks within, before passing them on to the linter as if they were stand-alone Python scripts.
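A rough sketch of the extraction step follows. It assumes that encoded code blocks contain no literal whitespace, because spaces and line breaks appear as `<space>` and `<lf>` tags. Under that assumption, each block can be found as a run of non-whitespace characters that contains at least one `<lf>` tag. The heuristic and the names `find_encoded_blocks` and `syntax_errors` are hypothetical. `ast.parse` only stands in for a real linter.

```python
import ast
import re

# Hypothetical heuristic: an FME-encoded multi-line string contains no literal
# whitespace, so look for whitespace-free runs with at least one <lf> tag.
_ENCODED_BLOCK_RE = re.compile(r"\S*<lf>\S*")


def find_encoded_blocks(fmw_text: str) -> list[str]:
    """Return candidate FME-encoded code blocks found in raw .fmw text."""
    return _ENCODED_BLOCK_RE.findall(fmw_text)


def syntax_errors(source: str, name: str = "<fmw block>") -> list[str]:
    """Stand-in for a real linter: report syntax errors found by ast.parse."""
    try:
        ast.parse(source, filename=name)
    except SyntaxError as exc:
        return [f"{name}:{exc.lineno}: {exc.msg}"]
    return []
```

Each block returned by `find_encoded_blocks` would go through `decode_from_fme_parsable_text` before linting. In a real hook, the decoded text could be written to a temporary `.py` file and handed to whatever linter the team already uses.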

Because code blocks are encoded in a somewhat non-standard way, I'm sharing a Python function that converts FME-encoded strings to regular text. It needs no dependencies: neither an FME installation (which would allow using FMESession.decodeFromFMEParsableText) nor any third-party libraries.

It can decode FME-encoded strings in either of these formats:

import<space>fme<lf>import<space>fmeobjects<lf>import<space>json<lf><lf><lf>class<space>

or

<opencurly><quote>automation.id<quote>:<quote>839b0966-82ed-4fdf-8539-d95e72edf52e<quote><comma><quote>job.timeSubmitted<quote>...

Here's the code. It can either be imported as a module or run as a script from the command line:

from xml.sax.saxutils import unescape
import re


def decode_from_fme_parsable_text(encoded: str | None) -> str:
    """
    Decodes strings encoded in FME internal format using
    proprietary XML-like tags, and possibly also using
    encoded tag characters.

    :param encoded: encoded input string
    :return: decoded input string, empty if input is not a string
    """

    if not isinstance(encoded, str):
        return ""
    else:
        decoded = (
            unescape(encoded)
            .replace("<lt>", "<")
            .replace("<gt>", ">")
            .replace("<quote>", '"')
            .replace("<amp>", "&")
            .replace("<backslash>", "\\")
            .replace("<solidus>", "/")
            .replace("<apos>", "'")
            .replace("<dollar>", "$")
            .replace("<at>", "@")
            .replace("<space>", " ")
            .replace("<comma>", ",")
            .replace("<openparen>", "(")
            .replace("<closeparen>", ")")
            .replace("<openbracket>", "[")
            .replace("<closebracket>", "]")
            .replace("<opencurly>", "{")
            .replace("<closecurly>", "}")
            .replace("<semicolon>", ";")
            .replace("<cr>", "\r")
            .replace("<lf>", "\n")
            .replace("<tab>", "\t")
            .replace("<bell>", "\a")
            .replace("<backspace>", "\b")
            .replace("<verttab>", "\v")
            .replace("<formfeed>", "\f")
        )

        # Decode extended characters beyond 7-bit ASCII, e.g. <u00e9> -> é
        specials_re = r"<u([0-9a-f]{4})>"
        specials = re.findall(specials_re, decoded, flags=re.I)
        for special in set(specials):
            decoded_chr = chr(int(special, 16))
            decoded = decoded.replace(f"<u{special}>", decoded_chr)

        return decoded


if __name__ == "__main__":
    str_encoded = input("FME encoded string: ")
    print("Result:")
    print(decode_from_fme_parsable_text(str_encoded))
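The following is one possible refinement and was not part of the original post. Chained `str.replace` calls can decode the same text twice. A literal `<quote>` in the Python source would presumably be stored as `<lt>quote<gt>`. The `<lt>`/`<gt>` replacements turn that back into `<quote>`, and the later `<quote>` replacement then turns it into `"`.

A single regex pass decodes each tag exactly once. The sketch below reuses the tag table from the function above and leaves unknown tags unchanged. `FME_TAGS` and `decode_single_pass` are hypothetical names. If the string comes from an XML context, the `unescape` step from the post would still be applied first.

```python
import re

# Tag table copied from decode_from_fme_parsable_text above.
FME_TAGS = {
    "lt": "<", "gt": ">", "quote": '"', "amp": "&", "backslash": "\\",
    "solidus": "/", "apos": "'", "dollar": "$", "at": "@", "space": " ",
    "comma": ",", "openparen": "(", "closeparen": ")", "openbracket": "[",
    "closebracket": "]", "opencurly": "{", "closecurly": "}",
    "semicolon": ";", "cr": "\r", "lf": "\n", "tab": "\t", "bell": "\a",
    "backspace": "\b", "verttab": "\v", "formfeed": "\f",
}

# Group 1: hex digits of a <uXXXX> code point; group 2: a named tag.
_TAG_RE = re.compile(r"<(?:[uU]([0-9a-fA-F]{4})|([a-z]+))>")


def decode_single_pass(encoded: str) -> str:
    """Decode FME tags in one left-to-right pass, so output is never re-decoded."""

    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        # Unknown named tags are returned unchanged.
        return FME_TAGS.get(match.group(2), match.group(0))

    return _TAG_RE.sub(replace, encoded)
```

For example, `decode_single_pass("<lt>quote<gt>")` returns `<quote>`, while the chained version returns `"`. `re.sub` never rescans its own replacements, which is what makes the single pass safe.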

Source: http://docs.safe.com/fme/2013sp1/pdf/FMEQuickTranslator.pdf, pages 48-49.

I’m assuming this hasn’t changed much since 2013, but feel free to correct me :-)
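A hook could test the 2013 assumption by flagging any tag in the encoded input that the table does not cover. This relies on the further assumption that every literal `<` is itself encoded as `<lt>`. If that holds, every `<...>` in the raw encoded string is a tag, and anything unrecognized points to a missing entry. `KNOWN_TAGS` and `unknown_tags` are hypothetical names, and the known set is the one used in the function above.

```python
import re

# Tag names handled by decode_from_fme_parsable_text above.
KNOWN_TAGS = frozenset({
    "lt", "gt", "quote", "amp", "backslash", "solidus", "apos", "dollar",
    "at", "space", "comma", "openparen", "closeparen", "openbracket",
    "closebracket", "opencurly", "closecurly", "semicolon", "cr", "lf",
    "tab", "bell", "backspace", "verttab", "formfeed",
})

# Assumes the raw encoded string contains no literal "<" except inside tags.
_ANY_TAG_RE = re.compile(r"<([^<>\s]+)>")
_UNICODE_TAG_RE = re.compile(r"u[0-9a-f]{4}", re.IGNORECASE)


def unknown_tags(encoded: str) -> set[str]:
    """Return tag names in an encoded string that the decoder would not handle."""
    found = set()
    for name in _ANY_TAG_RE.findall(encoded):
        if name not in KNOWN_TAGS and not _UNICODE_TAG_RE.fullmatch(name):
            found.add(name)
    return found
```

An empty result means every tag is covered. A non-empty result could fail the commit, so that a newly introduced tag gets noticed instead of silently passing through undecoded.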

If this is useful to anyone, I’d love your feedback below.

Very useful to document this. I know I looked up my notes because I needed this right after the User Conf in Bonn 😀


Fantastic! Thanks, David

