
Hi,

I'm sorry if there's already a discussion about this, but I haven't found it.

Here's my problem:

I'm querying a Solr index to fetch some data, and I'm using an HTTPCaller to do it.

There are more than 10M documents indexed, and I can't fetch them all with a single Solr request.

So I would like to first fetch 200,000 records (from 0 to 200,000), then fetch the next 200k (from 200k to 400k), and so on (I hope that's clear).

I'm able to pass parameters directly in the Solr URL used to fetch the data. I've also found a way to determine dynamically how many times I'll have to loop.
But I don't know how to loop on the HTTPCaller and change its variables to increment the start and range of the data I need to fetch.
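In pseudocode, the batching I'm after looks like this (a rough Python sketch, assuming Solr's standard `start`/`rows` paging parameters; the base URL is made up):

```python
# Sketch of the paging arithmetic, assuming Solr's standard
# "start"/"rows" pagination parameters; the base URL is illustrative.
BATCH_SIZE = 200_000

def batch_urls(base_url, total_docs, batch_size=BATCH_SIZE):
    """Yield one request URL per batch of documents."""
    num_batches = -(-total_docs // batch_size)  # ceiling division
    for i in range(num_batches):
        start = i * batch_size
        yield f"{base_url}?q=*:*&start={start}&rows={batch_size}"

urls = list(batch_urls("http://solr.example/select", 1_000_000))
# 5 URLs: start = 0, 200000, 400000, 600000, 800000
```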

So, do you know a way to loop and fetch the data in batches of 200k?

Thanks in advance,

Kind regards,

Nicolas

I'm not familiar with Solr, but I'm thinking that two workspaces could solve this: one workspace sets the parameters for the 200k batches, then sends them to the second workspace using the WorkspaceRunner.

What source data do you have to work with? If you are not reading anything that can be used to set up the 200k batches, then a Counter could be used, possibly with a Creator if you are not reading any data at all.


Thanks for your answer @tim_wood.


I'm not reading anything else, just working with Solr, and I'm already using a Creator to trigger the HTTPCaller.


Can you give me more details about your idea with the Counter and Creator?


Creating a custom transformer from the HTTPCaller would allow you to loop over the HTTPCaller multiple times.

The custom transformer would need a start number (default 0), an end number and an increment.

Then create a loop and exit the loop when the end number is reached.
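A rough Python sketch of that loop logic (the `fetch` callable stands in for one HTTPCaller request; the start/end/increment parameters are the ones suggested above, with this thread's numbers as defaults):

```python
# Sketch of the loop the custom transformer would implement.
# `fetch` stands in for one HTTPCaller request.
def fetch_in_batches(fetch, start=0, end=1_000_000, increment=200_000):
    results = []
    current = start
    while current < end:  # exit the loop when the end number is reached
        count = min(increment, end - current)  # last batch may be smaller
        results.append(fetch(current, count))
        current += increment
    return results

calls = fetch_in_batches(lambda s, n: (s, n), end=500_000)
# [(0, 200000), (200000, 200000), (400000, 100000)]
```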


Does it actually need to be a loop? Could you not determine the number of batches, clone the trigger that many times, use _copynum to determine the start feature (_copynum*200000), and send them to the HTTPCaller?

You may also wish to use a Decelerator to avoid hammering the service.
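The per-clone arithmetic is simple; a hedged Python sketch (the `_copynum` attribute name follows the Cloner's convention and is assumed to be 0-based, so check your FME version):

```python
# Each clone computes its own start offset from its copy number,
# so no loop is needed. _copynum is assumed to start at 0.
BATCH_SIZE = 200_000

def start_for_copy(copy_num, batch_size=BATCH_SIZE):
    """Start offset for one clone, i.e. _copynum * 200000."""
    return copy_num * batch_size

offsets = [start_for_copy(n) for n in range(4)]
# [0, 200000, 400000, 600000]
```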


Thanks for your answers @eric_jan and @jdh, but could you give me more details? I haven't had any training in FME; I'm learning it from scratch and on my own 🙂 (but I'm a software dev, so that helps).



Thanks, I'll try that


See the screenshot in my answer. To be more specific, we would need the structure of the Solr request. I'm assuming it has both a start and an end feature, but it may have a start and a number of features.



So thanks a lot @jdh, your solution works well (1 hour to fetch more than 2M records), though I've removed the Decelerator.


Now I'll try to improve the XML exploder part.




Depending on the service being used, there may be a maximum number of requests per time period. The Decelerator is there to make sure you don't unintentionally use FME to launch a denial-of-service attack on the service.
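The idea behind the Decelerator, sketched as a minimal client-side throttle in Python (the one-second interval is just an example value, not an FME default):

```python
import time

def throttled(request, urls, min_interval=1.0):
    """Call request(url) for each URL, waiting at least min_interval
    seconds between calls -- the same idea as FME's Decelerator."""
    results = []
    last = None
    for url in urls:
        if last is not None:
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        results.append(request(url))
    return results
```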




It adds a level of complexity, but you could make this a "child" workspace and create a second (master) workspace that calls the child with the WorkspaceRunner. That way you could launch up to 8 calls at a time. Each call needs to start and stop FME, though, so you would have to check whether it's truly faster (in 2018 there's going to be a new setting to deal with that).
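A hedged sketch of that fan-out logic (Python's thread pool stands in for the WorkspaceRunner; `run_child` is a placeholder for launching the child workspace with the batch start as a published parameter):

```python
from concurrent.futures import ThreadPoolExecutor

# Master/child pattern: up to 8 child runs in parallel, one per batch
# offset. `run_child` is a stand-in for the WorkspaceRunner call.
def run_batches(run_child, total_docs, batch_size=200_000, max_parallel=8):
    offsets = range(0, total_docs, batch_size)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_child, offsets))  # order is preserved

done = run_batches(lambda start: f"fetched from {start}", 600_000)
# ["fetched from 0", "fetched from 200000", "fetched from 400000"]
```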


Of course, a faster workspace means it's even more likely to trigger an issue if the server you are hitting isn't capable of handling traffic at that rate!
