
I have a response from Azure's OCR service, which looks like this (full response attached):

{
    "language": "en",
    "orientation": "Up",
    "textAngle": 0,
    "regions": [
        {
            "boundingBox": "316,555,1597,123",
            "lines": [
                {
                    "boundingBox": "1515,555,398,29",
                    "words": [
                        {
                            "boundingBox": "1515,555,82,23",
                            "text": "CRMA"
                        },
                        {
                            "boundingBox": "1608,555,154,29",
                            "text": "Completion"
                        },
                        {
                            "boundingBox": "1775,555,138,23",
                            "text": "Document"
                        }
                    ]
                },
                {
                    "boundingBox": "316,632,556,46",
                    "words": [
                        {
                            "boundingBox": "316,632,233,46",
                            "text": "As-built"
                        },
                        {
                            "boundingBox": "570,632,302,46",
                            "text": "Certificate"
                        }
                    ]
                }
            ]
        },

and so on.

I want to combine all of the text into a single attribute: the "text" of each entry in "words" joined with spaces, "lines" separated by a single newline, and "regions" separated by two newlines. The snippet above would then look like this:


CRMA Completion Document
As-built Certificate

I'm new to the JSON transformers and seem to be going around in circles with this. How can I do it?

At the moment I have three JSONFragmenters chained together, followed by three Aggregators chained together. It seems a bit awkward.

Hi @ottadini,

I have modified your workspace to use a single JSONFragmenter that flattens the fragment into a list attribute. We can then use FME's list manipulation transformers to rebuild your document.

The modified workspace actually has a few more transformers, but it does preserve the order of the regions and lines when rebuilding.

m-azure-json-to-text.fmw



This is much nicer, thanks Dave! The sorting issue was something I was wrestling with, and ended up with 3 of them at one stage.

 

 



You can probably remove the Counter and the Sorter from @DaveAtSafe's solution if, in the Aggregator, you set the "Group By" parameter to "json_index" (which is provided by the JSONFragmenter) and set the "Input is Ordered by Group" parameter to "Yes".
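
As a rough analogy only (plain Python, not how the Aggregator is implemented), grouping records that already arrive in document order behaves like itertools.groupby: consecutive records with the same key are merged, so no counter or sort step is needed. The record layout below (region index, line index, word text) and the second region are made up for illustration; they are not the actual attributes produced by the workspace.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flattened records: (region index, line index, word text),
# already in document order. The second region is invented sample data.
words = [
    (0, 0, "CRMA"), (0, 0, "Completion"), (0, 0, "Document"),
    (0, 1, "As-built"), (0, 1, "Certificate"),
    (1, 0, "Next"), (1, 0, "region"),
]


def rebuild(records):
    """Group consecutive records by region, then by line, and join the text."""
    regions = []
    for _, region_words in groupby(records, key=itemgetter(0)):
        lines = [
            " ".join(text for _, _, text in line_words)
            for _, line_words in groupby(region_words, key=itemgetter(1))
        ]
        regions.append("\n".join(lines))
    return "\n\n".join(regions)


print(rebuild(words))
```

Because groupby only merges neighbouring records, this relies on the input order being preserved, which is the same assumption the "Input is Ordered by Group" setting makes.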

 

---------

 

FME bundles Zorba to execute XQuery expressions, and Zorba supports the JSONiq extension, which allows you to manipulate JSON documents through XQuery expressions. So, in an FME workspace, you can use the XMLXQueryExtractor to execute an XQuery expression that includes JSONiq syntax.

 

Your question can also be solved with a short XQuery expression.

 

----------

 

XQuery Expression:

 

[Edit] "&#10;" is the character reference for a newline. See also: XQuery/Special Characters

 

let $doc := fme:get-json-attribute("azure_response")
let $regions := {
    for $r in jn:members($doc("regions"))
    let $lines := {
        for $ln in jn:members($r("lines"))
        return fn:string-join(jn:members($ln("words"))("text"), " ")
    }
    return fn:string-join($lines, "&#10;")
}
return fn:string-join($regions, "&#10;&#10;")


 

----------

 

Surprisingly, the JSONTemplater or the XMLTemplater (Write XML Header: No) can also be used to execute the same expression.
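
For comparison outside FME, the same joining logic can be sketched in plain Python with only the standard library. This is a minimal sketch, not part of the workspace: the function name ocr_to_text is hypothetical, and the sample is a cut-down copy of the snippet in the question (bounding boxes omitted).

```python
import json


def ocr_to_text(response: dict) -> str:
    """Join words with spaces, lines with one newline, regions with two."""
    regions = []
    for region in response.get("regions", []):
        lines = [
            " ".join(word["text"] for word in line.get("words", []))
            for line in region.get("lines", [])
        ]
        regions.append("\n".join(lines))
    return "\n\n".join(regions)


# Cut-down sample based on the response shown in the question.
sample = json.loads("""
{
  "language": "en",
  "regions": [
    {
      "lines": [
        {"words": [{"text": "CRMA"}, {"text": "Completion"}, {"text": "Document"}]},
        {"words": [{"text": "As-built"}, {"text": "Certificate"}]}
      ]
    }
  ]
}
""")

print(ocr_to_text(sample))
```

The nesting mirrors the XQuery expression: the innermost join handles "words", the middle one "lines", and the outer one "regions".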


@takashi -- thank you! I imagine that those changes will speed up the process a little.

 

Very impressive to see the breadth of your knowledge again! JSONiq seems very powerful, and something I could use in standalone Python as well.

 

