Question

Reading WFS in defined portions


Badge

I have to read all the features from a WFS, but I can't read them in a single sequential request because the server crashes (the feature count is ~3,000,000). I managed to read all the features using 10 different WFS readers with XML filtering by attribute. Is there a simpler or faster way to pull them? For example, is it possible to read the first 100k features, then request the next 100k, and so on?


14 replies

Userlevel 4

Try using a FeatureReader in a looping custom transformer. That way you can parametrize the start index of each successive WFS call.

Badge

Can you be more specific on this? An example would be helpful.

Userlevel 4

There have been a few examples posted here on the forums, I believe, but you can also look at this chapter from the FME training manual, which describes all the details:

https://github.com/safesoftware/FMETraining/blob/master/DesktopAdvanced3CustomTransformers/3.11.CustomTransformerLoops.md

Also look in the FME Desktop documentation:

http://docs.safe.com/fme/2019.0/html/FME_Desktop_Documentation/FME_Workbench/Workbench/transformers_custom_looping.htm

 

Userlevel 6
Badge +33

What you are looking for is called response paging. It requires WFS version 2.0.0, and server-side pagination must be enabled for it to work.

In short, pagination splits one request into several sub-requests to work around the CountDefault constraint. (Look in the GetCapabilities response for "<ows:Constraint name="CountDefault">". In this case it is limited to 1000 features per request.)

request 1: ...&SERVICE=WFS&COUNT=1000&STARTINDEX=0
request 2: ...&SERVICE=WFS&COUNT=1000&STARTINDEX=1000
request 3: ...&SERVICE=WFS&COUNT=1000&STARTINDEX=2000
etc...
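The request pattern above can be sketched in Python. This is only an illustration under stated assumptions: the endpoint https://example.com/wfs, the type name "roads", and the sample capabilities document are hypothetical, and the total feature count is assumed to be known already. It reads CountDefault from a capabilities document and builds one GetFeature URL per page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OWS namespace used by WFS 2.0.0 capabilities documents.
OWS_NS = "http://www.opengis.net/ows/1.1"

# Hypothetical, heavily trimmed capabilities document for illustration only.
SAMPLE_CAPABILITIES = """<wfs:WFS_Capabilities
    xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:ows="http://www.opengis.net/ows/1.1">
  <ows:OperationsMetadata>
    <ows:Constraint name="CountDefault">
      <ows:NoValues/>
      <ows:DefaultValue>1000</ows:DefaultValue>
    </ows:Constraint>
  </ows:OperationsMetadata>
</wfs:WFS_Capabilities>"""


def read_count_default(capabilities_xml):
    """Return the CountDefault constraint as an int, or None if it is absent."""
    root = ET.fromstring(capabilities_xml)
    for constraint in root.iter(f"{{{OWS_NS}}}Constraint"):
        if constraint.get("name") == "CountDefault":
            value = constraint.find(f"{{{OWS_NS}}}DefaultValue")
            if value is not None and value.text:
                return int(value.text.strip())
    return None


def paged_urls(base_url, type_name, total, page_size):
    """Yield one WFS 2.0.0 GetFeature URL per page of `page_size` features."""
    for start in range(0, total, page_size):
        query = urlencode({
            "SERVICE": "WFS",
            "VERSION": "2.0.0",
            "REQUEST": "GetFeature",
            "TYPENAMES": type_name,
            "COUNT": page_size,
            "STARTINDEX": start,
        })
        yield f"{base_url}?{query}"


page_size = read_count_default(SAMPLE_CAPABILITIES)
# The endpoint, type name and total (2500) are assumptions for this sketch.
urls = list(paged_urls("https://example.com/wfs", "roads", 2500, page_size))
for url in urls:
    print(url)
```

If the total is not known in advance, a common alternative is to keep requesting pages until one returns fewer features than the page size.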

Nowadays (since 2017, I think) response paging is built into FME's WFS FeatureReader. All you need to do is enter 0 as the Start Index to enable it.

0684Q00000ArJQ0QAN.png

If this does not work, other solutions are possible, such as iterating through the data as @david_r suggests, but I would try this first.

I must admit it might not work as expected: I have seen a wide range of WFS implementations and have not always been happy with the results.

Badge

How can I dynamically pass a start index to the reader that increases by a fixed amount on each call (for example: 0, 10, 20, 30...)?

Badge

Unfortunately, this doesn't work. WFS version is 1.1.0.

Userlevel 4

In the screenshot I cannot see any looping components. Did you add any? Notably, you'll need two outputs (a regular output and a "goto loop input") as well as two inputs (a regular input and a loop input).

Badge

I've checked whether the WFS I'm using is 2.0.0, but it isn't; it is actually 1.1.0. So I think the start index approach won't work.

Userlevel 6
Badge +33

Ah ouch.

The other approach I once used was to query by bounding box with "&resultType=hits" to get the number of results. If the number of hits is smaller than the CountDefault, use this bounding box for the request. If the number of hits is larger than the CountDefault, use a Tiler to split the bounding box into 4 parts and test again with the smaller parts. For the looping part you need an exported custom transformer. I could not find my original, so I quickly recreated it in 2019.

workspace 

custom 

TestHits.fmx

Wfs.fmwt

It is a bit unfinished, but you get the idea. This approach has its own problems: long, thin diagonal polygons produce huge bounding boxes and many unnecessary requests, but you can test whether each tile intersects the original polygon and discard those that do not.
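As a rough illustration of the bounding-box idea outside FME, here is a minimal Python sketch. It is not the attached workspace: the function names are hypothetical, and the hit counter is passed in as a callable so the logic can be tried against fake data. In practice that callable would send a resultType=hits request for the box and read the count from the response (in WFS 1.1.0, the numberOfFeatures attribute of the returned FeatureCollection).

```python
import xml.etree.ElementTree as ET

# A bounding box is a tuple: (minx, miny, maxx, maxy).


def parse_hits(response_xml):
    """Read the feature count from a WFS 1.1.0 resultType=hits response."""
    root = ET.fromstring(response_xml)
    return int(root.attrib["numberOfFeatures"])


def split_in_four(bbox):
    """Split a bounding box into four equal quarters."""
    minx, miny, maxx, maxy = bbox
    midx, midy = (minx + maxx) / 2, (miny + maxy) / 2
    return [
        (minx, miny, midx, midy),
        (midx, miny, maxx, midy),
        (minx, midy, midx, maxy),
        (midx, midy, maxx, maxy),
    ]


def plan_requests(bbox, count_hits, limit, max_depth=20):
    """Return bounding boxes that each hold at most `limit` features.

    `count_hits` is any callable returning the number of features in a box.
    `max_depth` stops the recursion when many features share one location.
    """
    hits = count_hits(bbox)
    if hits == 0:
        return []  # empty tile: no request needed
    if hits <= limit or max_depth == 0:
        return [bbox]
    tiles = []
    for quarter in split_in_four(bbox):
        tiles.extend(plan_requests(quarter, count_hits, limit, max_depth - 1))
    return tiles


# Fake data standing in for the server: five point features.
POINTS = [(1, 1), (2, 2), (3, 3), (9, 9), (9.5, 9.5)]


def fake_count_hits(bbox):
    minx, miny, maxx, maxy = bbox
    return sum(1 for x, y in POINTS if minx <= x <= maxx and miny <= y <= maxy)


tiles = plan_requests((0, 0, 10, 10), fake_count_hits, limit=2)
for tile in tiles:
    print(tile, fake_count_hits(tile))
```

Features that touch a tile boundary can be returned by more than one tile, so the downloaded results may need de-duplicating by feature ID.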

Badge

It looks far more complicated than having 10 readers, but thanks for the effort. I managed to reduce the reader count to only 5, which I think is reasonable :)

Userlevel 3
Badge +18

Hi @nielsgerrits, just to say thanks for this very useful and inspiring post. I used it to create my own custom WFS downloader transformer.

Userlevel 6
Badge +33

You're welcome, @becchr.

Thanks for the feedback, good to know it is useful to someone :)

Badge +8

@nielsgerrits​ I would like to test your testhits.fmx and wfs.fmwt but I can't find them. I have the same problem as described above.

Thanks in advance!

Felipe Verdú

Userlevel 3
Badge +18

Hi @felipeverdu, you can also contact support and ask them to add the attachments to this question again (I had the same issue before).
