Question

recursively concatenate json attributes

5 years ago
14 June 2018
4 replies
23 views

ottadini

I have a response from Azure's OCR service, which looks like this (full response attached):

{
    "language": "en",
    "orientation": "Up",
    "textAngle": 0,
    "regions": [
        {
            "boundingBox": "316,555,1597,123",
            "lines": [
                {
                    "boundingBox": "1515,555,398,29",
                    "words": [
                        {
                            "boundingBox": "1515,555,82,23",
                            "text": "CRMA"
                        },
                        {
                            "boundingBox": "1608,555,154,29",
                            "text": "Completion"
                        },
                        {
                            "boundingBox": "1775,555,138,23",
                            "text": "Document"
                        }
                    ]
                },
                {
                    "boundingBox": "316,632,556,46",
                    "words": [
                        {
                            "boundingBox": "316,632,233,46",
                            "text": "As-built"
                        },
                        {
                            "boundingBox": "570,632,302,46",
                            "text": "Certificate"
                        }
                    ]
                }
            ]
        },

and so on.

I wish to concatenate the "text" values of the "words" with spaces, join the "lines" within each region with a single newline, and separate the "regions" with two newlines, all into a single attribute. The above snippet would then look like this:

CRMA Completion Document
As-built Certificate

I'm new to the JSON transformers and seem to be going around in circles with this. How can I do it?

At the moment I have three JSONFragmenters chained together, followed by three Aggregators chained together (see attached). It seems a bit awkward.

daveatsafe
Safer
14 June 2018

Hi @ottadini,

I have modified your workspace to use a single JSONFragmenter that flattens the fragment into a list attribute. We can then use FME's list manipulation transformers to rebuild your document.

The modified workspace actually has a few more transformers than yours, but it preserves the order of the regions and lines when rebuilding.

m-azure-json-to-text.fmw
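For readers who want the idea outside FME, the flatten-then-rebuild approach described above can be sketched in plain Python. This is not the workspace itself: the (region, line, word, text) tuples are a hypothetical stand-in for the flattened list attribute, and sorting on those indices plays the role of preserving the region and line order.

```python
def flatten(response):
    """Flatten the nested OCR response into (region, line, word, text) tuples.

    The tuple layout is a hypothetical stand-in for a flattened list attribute.
    """
    records = []
    for r_idx, region in enumerate(response.get("regions", [])):
        for l_idx, line in enumerate(region.get("lines", [])):
            for w_idx, word in enumerate(line.get("words", [])):
                records.append((r_idx, l_idx, w_idx, word["text"]))
    return records


def rebuild(records):
    """Rebuild the text: words joined by spaces, lines by one newline,
    regions by two newlines. Sorting on the indices restores the order."""
    regions = {}
    for r_idx, l_idx, _w_idx, text in sorted(records):
        regions.setdefault(r_idx, {}).setdefault(l_idx, []).append(text)
    # Dicts keep insertion order (Python 3.7+), and insertion followed the sort.
    return "\n\n".join(
        "\n".join(" ".join(words) for words in lines.values())
        for lines in regions.values()
    )


# The words from the snippet in the question, as a parsed JSON object.
sample = {"regions": [{"lines": [
    {"words": [{"text": "CRMA"}, {"text": "Completion"}, {"text": "Document"}]},
    {"words": [{"text": "As-built"}, {"text": "Certificate"}]},
]}]}

records = flatten(sample)
records.reverse()  # simulate records arriving out of order
print(rebuild(records))
```

Even with the records reversed, the output is the two lines from the question, because the indices carry the original order independently of the order in which records arrive.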

ottadini
Author
15 June 2018

This is much nicer, thanks Dave! Sorting was something I was wrestling with; at one stage I ended up with three Sorters.

takashi
Contributor
15 June 2018

You can probably remove the Counter and the Sorter from @DaveAtSafe's solution if, in the Aggregator, you set the "Group By" parameter to "json_index" (which is provided by the JSONFragmenter) and set the "Input is Ordered by Group" parameter to "Yes".
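The reason the Sorter may become unnecessary is that grouping input which is already ordered by group only requires comparing each record with the one before it. Python's itertools.groupby behaves the same way (it merges only consecutive items with equal keys), so it serves as a rough analogy; the (json_index, text) pairs below are a hypothetical record shape, not the exact attributes the JSONFragmenter emits.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_ordered(records, separator=" "):
    """Join texts per group, assuming records are already ordered by group key.

    groupby only merges *consecutive* items with the same key, so no sort is
    performed; this mirrors the idea of an "input is ordered by group" option.
    """
    return [
        (key, separator.join(text for _, text in group))
        for key, group in groupby(records, key=itemgetter(0))
    ]


# Hypothetical (json_index, text) pairs in the order they were fragmented.
words = [(0, "CRMA"), (0, "Completion"), (0, "Document"),
         (1, "As-built"), (1, "Certificate")]
print(aggregate_ordered(words))
# [(0, 'CRMA Completion Document'), (1, 'As-built Certificate')]
```

With groupby, unordered input produces split groups (for example, keys 0, 1, 0 yield three groups instead of two), so the ordering assumption genuinely matters.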

---------

FME bundles Zorba to execute XQuery expressions, and Zorba supports the JSONiq extension, which allows you to manipulate JSON documents through XQuery expressions. So, in an FME workspace, you can use the XMLXQueryExtractor to execute XQuery expressions that include JSONiq syntax.

Your question can also be solved with a short XQuery expression.

----------

XQuery Expression:

[Edit] "&#10;" is the character reference for a newline. See also: XQuery/Special Characters

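let $doc := fme:get-json-attribute("azure_response")
(: for each region, build one string from its lines :)
let $regions := {
    for $r in jn:members($doc("regions"))
    (: for each line, join the "text" values of its words with spaces :)
    let $lines := {
        for $ln in jn:members($r("lines"))
        return fn:string-join(jn:members($ln("words"))("text"), " ")
    }
    (: lines within a region are separated by a single newline :)
    return fn:string-join($lines, "&#10;")
}
(: regions are separated by two newlines :)
return fn:string-join($regions, "&#10;&#10;")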
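Since the same nested join can be done outside FME, here is a rough standalone Python translation of the expression above using only the standard json module. It is a sketch, not something FME or Zorba provides; the function name ocr_to_text is hypothetical, and the sample string contains only the words from the snippet in the question.

```python
import json


def ocr_to_text(response_json):
    """Join words with spaces, lines with one newline, regions with two."""
    doc = json.loads(response_json)
    regions = []
    for region in doc.get("regions", []):
        # One string per line: the words' "text" values joined by spaces.
        lines = [" ".join(word["text"] for word in line.get("words", []))
                 for line in region.get("lines", [])]
        # Lines within a region are separated by a single newline.
        regions.append("\n".join(lines))
    # Regions are separated by two newlines (a blank line).
    return "\n\n".join(regions)


response = '''{"regions": [{"lines": [
    {"words": [{"text": "CRMA"}, {"text": "Completion"}, {"text": "Document"}]},
    {"words": [{"text": "As-built"}, {"text": "Certificate"}]}
]}]}'''
print(ocr_to_text(response))
```

The structure mirrors the XQuery one level at a time: the inner comprehension corresponds to the innermost fn:string-join over words, and the two outer joins correspond to the "&#10;" and "&#10;&#10;" separators.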

----------

Surprisingly, the JSONTemplater or the XMLTemplater (Write XML Header: No) can also be used to execute the same expression.

ottadini
Author
16 June 2018

@takashi -- thank you! I imagine that those changes will speed up the process a little.

Very impressive to again see the breadth of your knowledge! JSONiq seems very powerful, and something I could use in standalone python as well.
