Skip to main content

I am receiving a JSON file that's around 85 MB. The file contains a large around of elements in a single nesting. 

I want to extract a given value by looking at the neighbor key.  

In the example below it could be that I want to extract the value "2.000" from the section where "field" has the value "Frontside heights interval".

         "field" : "Frontside heights interval",
         "value" : "2.000",

Can this be done using the built in JSON transformers?

{
   "stackId" : "d9a5ba5e2a00407ca52b68f902e43393",
   "departmentId" : "5f4adeebe93b46368e6451f8b43ba914",
   "registrationPosition" : null,
   "measurementPosition" : {
      "longitude" : 9.6567316353411,
      "latitude" : 56.30290238191994,
      "height" : 29.993129805661738,
      "accuracy" : 6.213567568298807
   },
   "stackNumber" : 0,
   "polterId" : null,
   "registrator" : null,
   "surveyor" : "ipadreg@hdlogsystems.com",
   "state" : "Measured",
   "registrationDateTime" : "2023-08-18T07:33:01.735+00:00",
   "measurementDateTime" : "2023-08-18T07:33:01.735+00:00",
   "lastModifiedDateTime" : "2023-08-18T07:33:01.735+00:00",
   "species" : null,
   "quality" : null,
   "assortmentLength" : 11.6,
   "fieldAndValue" : 0
      {
         "field" : "Frontside polygon",
         "value" : 61,
         "valueId" : "",
         "fieldType" : "Integer",
         "fieldId" : "0cd786727bac4c2fbc13de14b5122aac",
         "unit" : null
      },
      {
         "field" : "Frontside heights interval",
         "value" : "2.000",
         "valueId" : "",
         "fieldType" : "Double",
         "fieldId" : "104b46ae9e0944bc875d87e9a4ebd59f",
         "unit" : "Meters"
      },
      {
         "field" : "Tree species",
         "value" : "ABI, Abies sp.: 100, ",
         "valueId" : "",
         "fieldType" : "KeyValuePairStringInt",
         "fieldId" : "15b9f72bba62426cb764bdc3cfda5617",
         "unit" : "Percentages"
      },

 

If you flatten the JSON you can then search the lists for your key and then extract the relevant value

image


If you flatten the JSON you can then search the lists for your key and then extract the relevant value

image

Thank you. That might be how I should do it, but is it correctly understood that it increases memory-consumption? Does the list live as a reference to json elements or is it new data? The memory overhead is what I wanted to avoid using json extractor directly.


@ebygomm​ has a good solution. However, if you feel this could be easier (I do), please upvote this idea for full JsonPath support in FME: https://community.safe.com/s/bridea/a0r4Q00000HbrITQAZ/more-robust-json-query-support

The example I give in the comments would be exactly what you're looking for.


@ebygomm​ has a good solution. However, if you feel this could be easier (I do), please upvote this idea for full JsonPath support in FME: https://community.safe.com/s/bridea/a0r4Q00000HbrITQAZ/more-robust-json-query-support

The example I give in the comments would be exactly what you're looking for.

Will do. JSON (and XML) handling often results in quite a few transformers, and I see more and more json abuse resulting in huge files, so a more powerful integration would be most welcome.


Will do. JSON (and XML) handling often results in quite a few transformers, and I see more and more json abuse resulting in huge files, so a more powerful integration would be most welcome.

I'm finding more and more it's easier to do the json processing in python with the json module than to try and do the same thing in FME, it's often only a few lines.

 

Xquery and JSONiq can also be often do things in a much neater way but I find that's a steeper learning curve.


Will do. JSON (and XML) handling often results in quite a few transformers, and I see more and more json abuse resulting in huge files, so a more powerful integration would be most welcome.

Ugh... Xquery, I tried, but I had to give up. In regards to python I have very little control over the server where the client will run the workspaces, so I'd rather not create new dependencies.


Will do. JSON (and XML) handling often results in quite a few transformers, and I see more and more json abuse resulting in huge files, so a more powerful integration would be most welcome.

The json module is installed with fme, it doesn't require any additional installations


Will do. JSON (and XML) handling often results in quite a few transformers, and I see more and more json abuse resulting in huge files, so a more powerful integration would be most welcome.

@ebygomm​ I completely agree regarding Python for complex JSON handling, but I'd like to do "simple" JSON lookups directly in e.g. the JSONExtractor without having to break out Python for just small things. Serializing (string to dict and back) is also costly if the JSON document is non-trivial, not even mentioning the fmeobjects interface. Safe is (rightly) putting a lot of clout in how FME can connect to any API natively, and I believe stronger JSON query support would be a logical extension to this.


I know there are already some good answers in here.

I just wanted to share that the JSON Structure presented is quite similar to the one I encountered in this question; https://community.safe.com/s/question/0D54Q00009781bISAQ

 

If you would be interested, the JSONiq (kind of like XQuery for JSON) solution is quite simple here;

for $item in jn:members(fme:get-json-attribute("json_attr")("fieldAndValue"))
where $item("field") = "Frontside heights interval"
return $item("value")

You can use the above code in an XMLXQueryExtractor transformer, while setting the 'XML Input' to 'None (File is specified in query)'. Also, as might be evident, this code assumes your JSON is stored in an attribute named 'json_attr'.

 

The benefit of a solution like this one (or an equivalently suggested Python solution), is that you extract just the result you were looking for, whereas with the JSONFlattener, an attribute (or (nested) list attribute) is created for every component of the JSON. For a JSON of 85 Mb, that seems like quite some overhead.

If you're open to a Pythoniq solution, ebygomm posted some nice sample code at the community question I linked above :)


Reply