Question

Loop on HTTPCaller

August 22, 2017
10 replies
236 views

nmeriotdev
12 replies

Hi,

I'm sorry if this has already been discussed, but I haven't found it.

Here's my problem :

I'm querying a Solr index to fetch some data, and I'm using an HTTPCaller to do that.

There are more than 10M documents indexed, and I can't fetch them all with a single Solr request.

So I would like to first fetch 200,000 records (0 to 200,000), then fetch the next 200k (200k to 400k), and so on. (I hope that's clear.)

I'm able to pass attributes directly into the Solr URL used to fetch the data. I've also found a way to determine dynamically how many times I'll have to loop. But I don't know how to loop on the caller and change its variables to increment the offset and the range of the data I need to fetch.
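As an illustration of the batching being described, here is a minimal Python sketch (outside FME) that computes how many 200k requests are needed and builds one URL per batch. It assumes the query goes to Solr's standard select handler, which pages with the `start` (offset) and `rows` (page size) parameters; the core URL `http://localhost:8983/solr/mycore/select` and the query `*:*` are hypothetical placeholders. The total document count would come from `numFound` in a first response.

```python
import math
from urllib.parse import urlencode

# Hypothetical Solr core URL; replace with the real endpoint.
BASE_URL = "http://localhost:8983/solr/mycore/select"
BATCH_SIZE = 200_000


def batch_count(num_found: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of requests needed to cover num_found documents (ceiling division)."""
    return math.ceil(num_found / batch_size)


def batch_urls(num_found: int, query: str = "*:*", batch_size: int = BATCH_SIZE) -> list:
    """One request URL per batch, advancing Solr's `start` offset by batch_size."""
    urls = []
    for i in range(batch_count(num_found, batch_size)):
        params = {"q": query, "start": i * batch_size, "rows": batch_size}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # 450,000 documents -> 3 requests: start=0, 200000, 400000
    for url in batch_urls(450_000):
        print(url)
```

In FME terms, each of these URLs corresponds to one feature sent to the HTTPCaller. For very deep offsets, the Solr documentation recommends cursorMark-based paging over large `start` values, which may be worth checking given 10M+ documents.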

So do you know of any way to loop and fetch the data 200k at a time?

Thanks in advance,

Kind regards,

Nicolas

tim_wood
Contributor
311 replies
August 22, 2017

I'm not familiar with Solr, but I think two Workspaces could solve this: one Workspace sets the parameters for each 200k batch, then sends them to the second Workspace using the WorkspaceRunner.

What source data do you have to work with? If you are not reading anything that can be used to set the 200k batches, then a Counter could be used, possibly with a Creator if you are not reading any data at all.

nmeriotdev
Author
12 replies
August 22, 2017

Thanks for your answer @tim_wood.

I'm not reading anything else, just working with Solr, and I'm already using a Creator to trigger the HTTPCaller.

Can you give me more details about your idea with the Counter and Creator?


erik_jan
Contributor
2181 replies
August 22, 2017

Creating a custom transformer around the HTTPCaller would allow you to loop over the HTTPCaller multiple times.

The custom transformer would need a start number (default 0), an end number and an increment.

Then create a loop, and exit the loop when the end number is reached.
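The loop logic of such a custom transformer can be sketched in plain Python as follows. This is only an analogy for the control flow (start, end, increment, exit when end is reached), not FME's looping mechanism itself; `fetch` is a hypothetical stand-in for one HTTPCaller request taking an offset and a row count.

```python
from typing import Callable, List


def fetch_in_loop(fetch: Callable[[int, int], List[str]], end: int,
                  increment: int = 200_000, start: int = 0) -> List[str]:
    """Call fetch(offset, rows) repeatedly, advancing the offset by `increment`,
    and stop once the end number is reached."""
    results: List[str] = []
    current = start
    while current < end:  # exit condition: end number reached
        rows = min(increment, end - current)  # last batch may be smaller
        results.extend(fetch(current, rows))
        current += increment
    return results
```

For example, with `end=10` and `increment=4`, `fetch` is called with `(0, 4)`, `(4, 4)` and `(8, 2)`, so the final partial batch does not request past the end.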


jdh
Contributor
1981 replies
August 22, 2017

Does it actually need to be a loop? Could you not determine the number of batches, clone the trigger feature that many times, use _copynum to compute the start feature (_copynum*200000), and send them all to the HTTPCaller?
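The clone-instead-of-loop idea can be sketched in Python as below: one independent copy of the trigger per batch, each carrying a 0-based `_copynum` and a start offset of `_copynum*200000`. This only mimics what the Cloner plus an attribute calculation would produce; the `start` and `rows` attribute names are assumptions matching Solr's paging parameters.

```python
def clone_trigger(trigger: dict, num_copies: int, batch_size: int = 200_000) -> list:
    """Return num_copies independent copies of trigger, each tagged with
    a 0-based _copynum and the start offset for its batch."""
    features = []
    for copynum in range(num_copies):
        feature = dict(trigger)                   # copy; the original is untouched
        feature["_copynum"] = copynum             # 0, 1, 2, ...
        feature["start"] = copynum * batch_size   # _copynum*200000
        feature["rows"] = batch_size              # assumed page-size attribute
        features.append(feature)
    return features
```

Because every copy already knows its own offset, there is no loop state to carry between requests, which is why this approach avoids building a looping custom transformer.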

You may also wish to use a Decelerator to avoid hammering the service.

nmeriotdev
Author
12 replies
August 22, 2017

Thanks for your answers @erik_jan and @jdh, but could you give me more details? I haven't had any training in FME; I'm learning it from scratch and on my own :) (but I'm a software developer, so that may help).

nmeriotdev
Author
12 replies
August 22, 2017

jdh wrote:

You may also wish to use a Decelerator to avoid hammering the service.

Thanks, I'll try that


jdh
Contributor
1981 replies
August 22, 2017

nmeriotdev wrote:

Thanks for your answers @erik_jan and @jdh, but could you give me more details? I haven't had any training in FME; I'm learning it from scratch and on my own :) (but I'm a software developer, so that may help).

See the screenshot in my answer. To be more specific we would need the structure of the Solr request. I am assuming it has both a start and an end feature, but it may have a start and a number of features.
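The two request shapes mentioned here differ only in how the second parameter is computed. A small Python sketch: Solr's select handler uses `start` plus `rows` (a count), while the `end` key below is a hypothetical name for a service that expects an exclusive end index instead.

```python
def batch_params(batch: int, batch_size: int = 200_000, use_end: bool = False) -> dict:
    """Request parameters for batch number `batch` (0-based).

    use_end=False: start + number of features (Solr's start/rows style).
    use_end=True:  start + exclusive end index ("end" is a hypothetical name).
    """
    start = batch * batch_size
    if use_end:
        return {"start": start, "end": start + batch_size}
    return {"start": start, "rows": batch_size}
```

Either way the start is the same `_copynum*200000` value; only the second attribute passed into the URL changes.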

nmeriotdev
Author
12 replies
August 23, 2017

jdh wrote:

You may also wish to use a Decelerator to avoid hammering the service.

Thanks a lot, jdh, your solution works well (1 hour to fetch more than 2M records), but I've removed the Decelerator.

Now I'll try to improve the XML exploder part.


jdh
Contributor
1981 replies
August 23, 2017

jdh wrote:

You may also wish to use a Decelerator to avoid hammering the service.

Depending on the service being used, there may be a maximum number of requests per time period. The Decelerator is there to make sure you don't unintentionally use FME to launch a denial-of-service attack on the service.
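The throttling idea can be sketched in Python as a generator that pauses between items, roughly what a Decelerator in front of the HTTPCaller does. The delay needed to stay under a limit of N requests per period is period / N (for example, 60 requests per minute gives 1 second between requests). The injectable `sleep` argument is only there so the sketch can be tested without waiting; the limits themselves are service-specific assumptions.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def delay_for_limit(max_requests: int, period_s: float) -> float:
    """Seconds to wait between requests to stay within max_requests per period_s."""
    return period_s / max_requests


def decelerate(items: Iterable[T], delay_s: float,
               sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time, sleeping delay_s seconds between consecutive items."""
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay_s)  # no pause before the first request
        yield item
```

Usage: `for url in decelerate(urls, delay_for_limit(60, 60.0)): ...` would issue at most one request per second.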


mark2atsafe
Safer
2517 replies
August 23, 2017

jdh wrote:

You may also wish to use a Decelerator to avoid hammering the service.

It adds a level of complexity, but you could make this a "child" workspace and create a second (master) workspace that calls the child with the WorkspaceRunner. That way you could launch up to 8 calls at a time. Each call needs to start and stop FME, though, so you would have to check whether it's truly faster (in 2018 there's going to be a new setting to deal with that).

Of course, a faster workspace means it's even more likely to trigger an issue if the server you are hitting isn't capable of handling traffic at that rate!
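As a rough analogy for running up to 8 batches at once, here is a Python sketch using a thread pool with 8 workers. It is not how the WorkspaceRunner works internally; `fetch` is a hypothetical stand-in for one child-workspace run that fetches the batch beginning at a given offset. Results come back in the same order as the offsets.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

R = TypeVar("R")


def fetch_parallel(fetch: Callable[[int], R], starts: List[int],
                   max_workers: int = 8) -> List[R]:
    """Run fetch(start) for every offset, at most max_workers at a time,
    returning results in the same order as `starts`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, starts))
```

Lowering `max_workers` is the simplest way to trade speed for a lighter load on the server, which matters for the reason given above.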
