Hi everyone,

I have to download all the features from an OData REST endpoint, but each request has a limit of 100,000 features (I expect there are 8 million or so in the full database).

I've got the process working well with a JSON reader for the first request. But how do I loop so that I don't need 80 JSON readers?

Each time, the service responds with a JSON object containing the features in a "values" array, plus a "nextlink" if there are still more features that could be returned. If you then send the same request with a "$skip=100000" query parameter, it returns the next 100,000 features.
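
Outside of FME, the logic I'm trying to reproduce is roughly this Python sketch (the endpoint URL is made up, and the "values"/"nextlink" keys are taken from the responses described above):

import requests

BASE_URL = "https://example.com/odata/Features"  # hypothetical endpoint

def fetch_all_features():
    skip = 0
    while True:
        response = requests.get(BASE_URL, params={"$skip": skip})
        response.raise_for_status()
        payload = response.json()
        features = payload.get("values", [])
        yield from features
        # Stop once the service no longer advertises another page.
        if "nextlink" not in payload or not features:
            break
        # Advance by the number of features actually returned.
        skip += len(features)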

I've experimented with an HTTPCaller in a custom transformer loop, but I'm not familiar enough with FME to test for the existence of the "nextlink" attribute and then increment a skip counter by the number of features returned.

Thanks for your help everyone,

Aiden.

I think you're definitely on the right track with the looping custom transformer.

To extract the nextlink, use the JSONExtractor. The trick will be to find the correct JSON query, but normally it shouldn't be too complicated. Here's an example that assumes that nextlink is a top-level element in the attribute my_json:
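
If I remember the query syntax correctly, it would be something along these lines (mapping the result to a new attribute, e.g. URL):

json["nextlink"]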

After the JSONExtractor, use a Tester to check whether URL has a value. If it does, loop back and use URL for the HTTPCaller; if URL has no value, you've probably reached the end.

Note that you'll have to insert the logic for how you handle the JSON features somewhere, as it's not included in the example above.


Hi David,

Thanks for that answer, you've helped get me 90% of the way there (which I couldn't have done on my own).

I've noticed one strange thing, though: I have an odd "off by one" style problem with the workflow now. The very first pass through the loop doesn't enter the feature extraction logic; please see below.

I'm not sure what could cause the flow to skip the branch on the first run through. Every loop afterwards hits the JSONFragmenter etc.

Any ideas?


That is strange. Try connecting either a Logger or an Inspector to the <Rejected> port and see if something shows up there.


Hi again David,

It isn't coming out of the <Rejected> port, so the Logger isn't catching anything helpful.

Looking at the responses to the first and subsequent queries, both have the "values" key pointing to a feature array with a slew of features.

I tried wiring the JSONFragmenter up to the end of the JSONExtractor, but that hasn't made a difference either.

I'll try changing the execution order somehow; I think I can do that with FME 2016, right?

Thanks,

Aiden

Reordering the execution and cutting the connection between the FeatureWriter and the Tester seems to have solved it. I suppose it was short-circuiting itself.

Do you have any advice for speeding up the workflow? Would it help if I increased the number of features per database transaction, perhaps?

Thanks again.


It is possible, but it depends on so many factors that I couldn't tell without trying. If you have a lot of indexes on the table you're writing to, consider also dropping the indexes before loading the data, then re-creating them after.

You are writing a huge number of features, though, so it will take some time regardless.
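
For illustration, outside of FME the pattern is roughly this; a minimal Python sketch assuming a SQLite database, with made-up table, index, and column names:

import sqlite3

con = sqlite3.connect("features.db")
cur = con.cursor()

# 1. Drop the index so each insert doesn't pay for index maintenance.
cur.execute("DROP INDEX IF EXISTS idx_features_geom")

# 2. Load in large batches: one transaction per batch, not per feature.
for batch in batches_of_features():  # hypothetical generator of row tuples
    cur.executemany("INSERT INTO features (id, geom) VALUES (?, ?)", batch)
    con.commit()

# 3. Re-create the index once, after all the data is loaded.
cur.execute("CREATE INDEX idx_features_geom ON features (geom)")
con.commit()
con.close()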


Yeah, I thought about the index problem, so I'm actually writing to a temporary table while looping. Once I have all the features, I drop the main table, copy across the new features, and rebuild the index.
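
In SQL terms, the final swap is roughly the following (sketched with Python's sqlite3 for concreteness; my real table and index names differ):

import sqlite3

con = sqlite3.connect("features.db")
con.executescript("""
    DROP TABLE IF EXISTS features;
    CREATE TABLE features AS SELECT * FROM features_staging;
    DROP TABLE features_staging;
    CREATE INDEX idx_features_geom ON features (geom);
""")
con.close()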

Thanks for all your help through this.

Aiden.

Hi guys, I'm attempting to accomplish the same thing here, but with a different catch: I need an attribute to be carried through to the HTTPCaller to be used as a reference for the next call. When I loop, it doesn't bring the attribute back to the beginning. Is there a way to do that? Any ideas @david_r?
