Question

recursively concatenate json attributes

  • June 14, 2018
  • 4 replies
  • 239 views

ottadini
Supporter

I have a response from Azure's OCR service, which looks like this (full response attached):

{
    "language": "en",
    "orientation": "Up",
    "textAngle": 0,
    "regions": [
        {
            "boundingBox": "316,555,1597,123",
            "lines": [
                {
                    "boundingBox": "1515,555,398,29",
                    "words": [
                        {
                            "boundingBox": "1515,555,82,23",
                            "text": "CRMA"
                        },
                        {
                            "boundingBox": "1608,555,154,29",
                            "text": "Completion"
                        },
                        {
                            "boundingBox": "1775,555,138,23",
                            "text": "Document"
                        }
                    ]
                },
                {
                    "boundingBox": "316,632,556,46",
                    "words": [
                        {
                            "boundingBox": "316,632,233,46",
                            "text": "As-built"
                        },
                        {
                            "boundingBox": "570,632,302,46",
                            "text": "Certificate"
                        }
                    ]
                }
            ]
        },

and so on.

I want to combine all of this into a single attribute: the text of the "words" joined with spaces, the "lines" separated by a single newline, and the "regions" separated by two newlines. The above snippet would then look like:


CRMA Completion Document
As-built Certificate

I'm new to the json transformers and seem to be going around in circles with this. How can I do it?

At the moment I have three JSONFragmenters chained together, followed by three Aggregators chained together. It seems a bit awkward.


4 replies

daveatsafe
Safer
  • June 14, 2018

Hi @ottadini,

I have modified your workspace to use a single JSONFragmenter that flattens the fragment into a list attribute. We can then use FME's list manipulation transformers to rebuild your document.

The modified workspace actually has a few more transformers, but it does preserve the order of the regions and lines when rebuilding.

m-azure-json-to-text.fmw


ottadini
Supporter
  • Author
  • June 15, 2018


This is much nicer, thanks Dave! The sorting issue was something I was wrestling with, and I ended up with 3 of them at one stage.

 

 


takashi
Celebrity
  • June 15, 2018



You can probably remove the Counter and the Sorter from @DaveAtSafe's solution if, in the Aggregator, you set the "Group By" parameter to "json_index" (which is provided by the JSONFragmenter) and the "Input is Ordered by Group" parameter to "Yes".
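
As a rough analogy only (plain Python, not FME code), grouping input that is already ordered by its group key can be sketched with itertools.groupby, which also merges consecutive records and therefore needs no separate sort step. The field names "region" and "line" and the helper aggregate() are hypothetical and are not the attribute names used in the workspace.

```python
from itertools import groupby

# Hypothetical flattened records, already in document order.
words = [
    {"region": 0, "line": 0, "text": "CRMA"},
    {"region": 0, "line": 0, "text": "Completion"},
    {"region": 0, "line": 0, "text": "Document"},
    {"region": 0, "line": 1, "text": "As-built"},
    {"region": 0, "line": 1, "text": "Certificate"},
]


def aggregate(records, group_by, sep):
    """Concatenate "text" over consecutive records that share the group_by keys.

    Relies on the input already being ordered by group, so no sort is done.
    """
    def key(rec):
        return tuple(rec[name] for name in group_by)

    result = []
    for _, run in groupby(records, key=key):
        run = list(run)
        merged = dict(run[0])  # keep the key fields of the first record
        merged["text"] = sep.join(rec["text"] for rec in run)
        result.append(merged)
    return result


lines = aggregate(words, ["region", "line"], " ")    # words -> lines
regions = aggregate(lines, ["region"], "\n")         # lines -> regions
document = "\n\n".join(r["text"] for r in regions)   # regions -> document
print(document)
```

If the records were not in order, groupby would split one group into several runs, which is why the ordering assumption matters here.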

 

---------

 

FME bundles Zorba to execute XQuery expressions, and Zorba supports the JSONiq extension, which allows you to manipulate JSON documents through XQuery expressions. So, in an FME workspace, you can use the XMLXQueryExtractor to execute an XQuery expression that includes JSONiq syntax.

 

Your question can also be solved with a short XQuery expression.

 

----------

 

XQuery Expression:

 

[Edit] "&#10;" is the character reference for the newline character. See also: XQuery/Special Characters

 

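let $doc := fme:get-json-attribute("azure_response")
let $regions := {
    for $r in jn:members($doc("regions"))
    let $lines := {
        for $ln in jn:members($r("lines"))
        (: words in a line are joined with spaces :)
        return fn:string-join(jn:members($ln("words"))("text"), " ")
    }
    (: lines in a region are joined with one newline :)
    return fn:string-join($lines, "&#10;")
}
(: regions are joined with two newlines :)
return fn:string-join($regions, "&#10;&#10;")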


 

----------

 

Surprisingly, the JSONTemplater or the XMLTemplater (Write XML Header: No) can also be used to execute the same expression.

ottadini
Supporter
  • Author
  • June 16, 2018


@takashi -- thank you! I imagine that those changes will speed up the process a little.

 

Very impressive to see the breadth of your knowledge again! JSONiq seems very powerful, and something I could use in standalone Python as well.
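
For comparison, here is a minimal sketch of the same nested join in plain Python, using only the standard json module rather than JSONiq. The function name ocr_response_to_text is hypothetical, and the sample is just the snippet from the question with the bounding boxes omitted.

```python
import json


def ocr_response_to_text(response: dict) -> str:
    """Join words with spaces, lines with one newline, regions with two."""
    regions = []
    for region in response.get("regions", []):
        lines = [
            " ".join(word["text"] for word in line.get("words", []))
            for line in region.get("lines", [])
        ]
        regions.append("\n".join(lines))
    return "\n\n".join(regions)


# Trimmed-down version of the snippet in the question.
sample = json.loads("""
{
  "regions": [
    {
      "lines": [
        {"words": [{"text": "CRMA"}, {"text": "Completion"}, {"text": "Document"}]},
        {"words": [{"text": "As-built"}, {"text": "Certificate"}]}
      ]
    }
  ]
}
""")

print(ocr_response_to_text(sample))
```

For the sample above this prints the two lines shown in the question, "CRMA Completion Document" followed by "As-built Certificate".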