Solved

Loop calls to an OData REST endpoint to download all JSON features



Hi everyone,

I have to download all the features from an OData REST endpoint, but each request is limited to 100,000 features (I expect there are 8 million or so in the full database).

I've got the process working well with a JSON reader for the first request. But how do I loop so that I don't need 80 JSON readers?

Each response is a JSON object containing the feature values, plus a "nextlink" if there are still more features to return. If you then send the same request with a "$skip=100000" query parameter, the service returns the next 100,000 features.

I've experimented with an HTTPCaller in a custom transformer loop but I'm not familiar enough with FME to test for the existence of the "nextlink" attribute and then increment a skip counter by the number of features returned.
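The logic described here (test for "nextlink", then increase the skip counter by the number of features returned) can be sketched outside FME in Python. This is only an illustration under assumptions: `fetch_page` is a hypothetical callable standing in for the HTTP request with the "$skip" parameter, and the "values" and "nextlink" key names are the ones mentioned in this thread, not confirmed against the service itself.

```python
from typing import Callable, Iterator


def download_with_skip(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every feature by repeating the request with a growing $skip.

    fetch_page(skip) is a hypothetical helper: it should send the request
    with "$skip=<skip>" and return the parsed JSON response as a dict.
    """
    skip = 0
    while True:
        page = fetch_page(skip)
        features = page.get("values", [])  # key name as reported in the thread
        yield from features
        # Stop once the service no longer advertises more data,
        # or if a page comes back empty (guards against looping forever).
        if not page.get("nextlink") or not features:
            break
        # Advance by the number of features actually returned.
        skip += len(features)
```

In a real run, `fetch_page` would wrap whatever HTTP client is in use; keeping it as a parameter makes the loop easy to try against a fake service first.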

Thanks for your help everyone,

Aiden.


8 replies

david_r
Evangelist
  • Best Answer
  • June 17, 2016

I think you're definitely on the right track with the looping custom transformer.

To extract the nextlink, use the JSONExtractor. The trick will be to find the correct JSON query, but normally it shouldn't be too complicated. Here's an example that assumes that nextlink is a top-level element in the attribute my_json:

After the JSONExtractor, use a Tester to check whether URL has a value. If it does, loop back with URL as the input to the HTTPCaller. If URL has no value, you've probably reached the end.

Note that you'll have to insert the logic for handling the JSON features somewhere, as it's not included in the screenshot above.
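For readers who think more easily in code than in transformers, here is a rough Python analogue of the same loop. It is not FME and makes some assumptions: `http_get` is a hypothetical callable playing the role of the HTTPCaller, and "nextlink" is assumed to be a top-level key in the response, as in the JSONExtractor example.

```python
import json
from typing import Callable, Iterator, Optional


def extract_next_link(json_text: str) -> Optional[str]:
    """Return the top-level "nextlink" value, or None if missing or empty."""
    doc = json.loads(json_text)
    url = doc.get("nextlink") if isinstance(doc, dict) else None
    return url or None


def follow_next_links(first_url: str,
                      http_get: Callable[[str], str]) -> Iterator[str]:
    """Yield each raw JSON response, following nextlink until it runs out."""
    url: Optional[str] = first_url
    while url:
        body = http_get(url)           # stands in for the HTTPCaller
        yield body                     # hand the page to the feature-handling logic
        url = extract_next_link(body)  # JSONExtractor + Tester in one step
```

Each yielded response still has to be split into individual features by the caller, which corresponds to the feature-handling logic that the answer leaves out.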


  • Author
  • June 20, 2016

Hi David,

Thanks for that answer, you've helped get me 90% of the way there (which I couldn't have done on my own).

I've noticed one strange thing, though: I now have an odd "off by 1" style problem with the workflow. The very first pass through the loop doesn't reach the feature extraction logic (please see below).

I'm not sure what could cause the flow to skip the branch on the first run through. Every loop afterwards hits the JSONFragmenter etc.

Any ideas?


david_r
Evangelist
  • June 20, 2016
aidenprice wrote:

Hi David,

Thanks for that answer, you've helped get me 90% of the way there (which I couldn't have done on my own).

I've noticed one strange thing, though: I now have an odd "off by 1" style problem with the workflow. The very first pass through the loop doesn't reach the feature extraction logic (please see below).

I'm not sure what could cause the flow to skip the branch on the first run through. Every loop afterwards hits the JSONFragmenter etc.

Any ideas?

That is strange. Try connecting either a Logger or an Inspector to the <Rejected> port and see if something shows up there.


  • Author
  • June 20, 2016
david_r wrote:

That is strange. Try connecting either a Logger or an Inspector to the <Rejected> port and see if something shows up there.

Hi again David,

It isn't passing out of the rejected port so the logger isn't catching anything helpful.

Looking at the responses to the first and subsequent queries, they all have the "values" key holding the feature array, with a slew of features.

I tried wiring the JSONFragmenter up to the end of the JSONExtractor, but that hasn't made a difference either.

I'll try changing the execution order somehow. I think I can do that with FME 2016, right?

Thanks,

Aiden

  • Author
  • June 20, 2016
david_r wrote:

That is strange. Try connecting either a Logger or an Inspector to the <Rejected> port and see if something shows up there.

Reordering the execution and cutting the connection between the FeatureWriter and the Tester seems to have solved it. I suppose it was short-circuiting itself.

Do you have any advice for speeding up the workflow? Would it help if I increased the number of features per database transaction perhaps?

Thanks again.


david_r
Evangelist
  • June 20, 2016
aidenprice wrote:

Reordering the execution and cutting the connection between the FeatureWriter and the Tester seems to have solved it. I suppose it was short-circuiting itself.

Do you have any advice for speeding up the workflow? Would it help if I increased the number of features per database transaction perhaps?

Thanks again.

It is possible, but it depends on so many factors that I couldn't tell without trying. If you have a lot of indexes on the table you're writing to, consider also dropping the indexes before loading the data, then re-creating them after.

You are writing a huge number of features, though, so it will take some time regardless.
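As a small illustration of the two ideas above (larger transactions, and dropping indexes around the load), here is a sketch using Python's built-in sqlite3 module. SQLite is only a stand-in, since the thread doesn't say which database is being written to, and the table name, index name and batch size are hypothetical. Whether either change helps in practice has to be measured on the real system.

```python
import sqlite3


def bulk_load(conn, rows, batch_size=10_000):
    """Drop the index, insert rows in batched transactions, recreate the index."""
    conn.execute("DROP INDEX IF EXISTS idx_features_name")
    for start in range(0, len(rows), batch_size):
        with conn:  # commits once per batch, rolls back on error
            conn.executemany(
                "INSERT INTO features (id, name) VALUES (?, ?)",
                rows[start:start + batch_size],
            )
    # Rebuild the index once, after all rows are in place.
    conn.execute("CREATE INDEX idx_features_name ON features (name)")
    conn.commit()


# Hypothetical table and data, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_features_name ON features (name)")
bulk_load(conn, [(i, f"feature-{i}") for i in range(25_000)])
```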


  • Author
  • June 20, 2016
david_r wrote:

It is possible, but it depends on so many factors that I couldn't tell without trying. If you have a lot of indexes on the table you're writing to, consider also dropping the indexes before loading the data, then re-creating them after.

You are writing a huge number of features, though, so it will take some time regardless.

Yeah, I thought about the index problem so I'm actually writing to a temporary table while I'm looping. After I have all the features I drop the main table, copy across the new features and rebuild the index.
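Roughly, the swap step looks like the following sketch, again using sqlite3 purely as a stand-in for the real database. The table and index names are made up for the example.

```python
import sqlite3


def swap_in_staging(conn):
    """Drop the main table, copy the staged features across, rebuild the index."""
    conn.execute("DROP TABLE IF EXISTS features")  # also removes its indexes
    conn.execute("CREATE TABLE features AS SELECT * FROM features_staging")
    conn.execute("CREATE INDEX idx_features_name ON features (name)")
    conn.commit()


# Hypothetical setup: an old main table and a freshly loaded staging table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER, name TEXT)")
conn.execute("INSERT INTO features VALUES (1, 'old')")
conn.execute("CREATE TABLE features_staging (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO features_staging VALUES (?, ?)", [(10, "a"), (11, "b")]
)
conn.commit()

swap_in_staging(conn)
```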

Thanks for all your help through this.

Aiden.

  • August 29, 2017
aidenprice wrote:

Hi David,

Thanks for that answer, you've helped get me 90% of the way there (which I couldn't have done on my own).

I've noticed one strange thing, though: I now have an odd "off by 1" style problem with the workflow. The very first pass through the loop doesn't reach the feature extraction logic (please see below).

I'm not sure what could cause the flow to skip the branch on the first run through. Every loop afterwards hits the JSONFragmenter etc.

Any ideas?

Hi guys, I'm trying to accomplish the same thing here, but with a catch: I need an attribute to be carried back to the HTTPCaller to be used as a reference for the next call. When I loop, the attribute isn't brought back to the beginning. Is there a way to do it? Any ideas, @david_r?

