Question

Non-spatial WFS to CSV

  • 19 September 2017
  • 8 replies
  • 15 views

Badge

Hi all,

Would anyone be able to suggest a workflow for exporting non-spatial records from a WFS service to CSV format? It is a big file with property title information (2+ million records), so I was thinking of narrowing the search by specifying the Land District (GetFeature?). I have not worked much with WFS services, so a simplified explanation would be great.

Maybe someone from New Zealand with data.linz experience, especially their WFS services, could help?

Regards,

Maksym


8 replies

Badge +16

Hi Maksym

makefeaturetypes.fmw

downloadfeatures.fmw

MakeFeatureTypes creates a geodatabase table that is handy for browsing LDS data, but it is primarily useful for creating the picklist of available layers for DownloadFeatures, which does what its name says; tables will ignore the bounding box parameter.

Output is Geodatabase.

DownloadFeatures is suitable for embedding in a ModelBuilder tool to pick up the map extent; I'll make a sample when I get time.

You will need to edit the URL parameters to insert your API key.
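For anyone new to LDS, the GetFeature request with the key inserted looks roughly like the pattern below. This is only a sketch based on the LDS documentation; the layer id (layer-XXXXX) is a placeholder and the exact form should be checked against the data.linz.govt.nz help pages.

https://data.linz.govt.nz/services;key=YOUR_API_KEY/wfs?service=WFS&version=2.0.0&request=GetFeature&typeNames=layer-XXXXX&count=100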

Badge +1

Bruce's two workbenches do work, but for spatial layers only. They do not use CSV, and they do not make a subset of non-spatial records, which was the question. They also leave all field widths at 2048+ characters, which will crash ArcGIS when you try to index those fields.

The problem is that there is no easy query you can run on the non-spatial tables, so you have to download the whole table and make a subset using some other related spatial layer afterwards.

Trying to download very large tables with WFS is sure to end badly. My system halts suspiciously at exactly 1 M records for some layers, so there is some sort of limit that I could not remove, even though LINZ say there is no limit on their side and it works for them.

For very large tables LINZ provide a changeset mechanism to download just the changes and merge them into an existing table that was previously downloaded using the zip-and-ship interface. This is quite a bit more complex but much faster in the end.

My solution is to use the HTTPCaller transformer to download the changesets to a CSV file and then update a geodatabase with a predefined schema. The GML format is no better than CSV at defining field widths; both only give you the field names. The advantage of CSV is that it is much more compact, human readable and flexible.
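Outside of FME, a minimal sketch of what such a changeset request might look like in Python is below. The changeset layer name, the viewparams from/to syntax and CSV output support are assumptions based on the LDS changeset documentation, so verify them for your layer before relying on this.

import requests

API_KEY = "YOUR_API_KEY"
url = "https://data.linz.govt.nz/services;key=" + API_KEY + "/wfs"
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "layer-XXXXX-changeset",           # assumed changeset variant of the layer
    "viewparams": "from:2017-09-01;to:2017-09-19",  # assumed date-window syntax for the changes
    "outputFormat": "csv",                          # assumed; fall back to GML if not supported
}

# Stream the response to disk so a large changeset does not sit in memory.
with requests.get(url, params=params, stream=True, timeout=600) as r:
    r.raise_for_status()
    with open("changeset.csv", "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)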

Even spatial layers delivered as CSV can be converted into feature classes, because the shape field is in WKT format, which can be transformed.
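As a quick illustration of the WKT point, assuming the downloaded CSV has a shape column holding WKT text (the file and column names here are invented), the geometry can be rebuilt with something like Shapely before it is loaded into a feature class:

import csv
from shapely import wkt

with open("parcels.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        geom = wkt.loads(row["shape"])   # WKT text -> geometry object
        print(row.get("id"), geom.geom_type, geom.area)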

Web services are unreliable: often you get a failure to connect, or an interruption part way through. It is helpful to check that you have received the expected number of records, and to allow for a restart if the count is not correct, before you plough on and corrupt your database.

lds-bde-sqlite-changeset-sample.fmw

Badge +1


Another solution is to use an equivalent simplified layer that LINZ provide for common usage. For example, the entire titles database has no spatial layers at all! But there are relation tables that link to survey layers, and LINZ provide a dissolved parcel-title (spatial) layer against which you could run a spatial query. Note that you have to use an HTTPCaller, not a WFS reader, because the extensions used for a TLA clip are not in the OGC standard filter format, so they will not work through the reader. You have to assemble the URL yourself.


If you use a WFS FeatureReader (instead of a Reader) then you can use any extent to download a (small!) subset of features.

 

An extent is supported, which is usually better than an actual polygon; do a clip or select after you have reduced the features, because you may want adjacent parcels.


Badge


@kimo thank you for the explanation. I just started looking at the workspaces that @bruceharold attached and realised they work only for spatial WFS services.


You are correct, there is an option of exporting a CSV table (700+ MB) or a .gdb (nearly 1 GB). The idea of this exercise is to have all spatially represented parcels with the title number as a reference, to match the rest of the information from the title memorial list (which is a non-spatial web service).


In the end I would have all required parcels (around 100 - a potential subdivision) with additional information (joined, related?), so I could see what parcels are affected by easements etc. I can do that with ArcMap, but the problem is how to query out the unnecessary titles before bringing the data into ArcMap, so I would have only those that match by certificate of title number.
Badge +1
Because the memorial table is so large, the best way is to use an SQL query with title_no as the key. It helps to have it indexed. It is much faster to use a full relational database; the easiest one is SQLite, which is supported by FME and ArcGIS (for mobile applications) and built into Python.

 

You would download the memorials as a GeoPackage (gpkg), add in your 100 titles and run a workbench that starts with an SQLCreator containing something like:


SELECT * FROM title_memorial, mytitle WHERE title_memorial.title_no = mytitle.title_no AND current = 'T'
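Since SQLite is built into Python, the same query can also be run there. Below is a minimal sketch, assuming the memorials have already been loaded into an SQLite/GeoPackage file and that the table and field names match the query above (they are assumptions here).

import sqlite3

con = sqlite3.connect("titles.gpkg")   # a GeoPackage is just an SQLite file
cur = con.cursor()

# An index on title_no makes the join workable on a 2M+ row table.
cur.execute("CREATE INDEX IF NOT EXISTS idx_memorial_title_no ON title_memorial (title_no)")

cur.execute("""
    SELECT title_memorial.*
    FROM title_memorial, mytitle
    WHERE title_memorial.title_no = mytitle.title_no
      AND title_memorial.current = 'T'
""")
for row in cur.fetchmany(10):          # peek at the first few matches
    print(row)

con.close()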

 

Another way is to use a Python script with a dictionary or set, for example:
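A minimal sketch of that approach, assuming the memorial table has been exported to CSV and that the column names below (title_no, current) match the real ones:

import csv

# The ~100 certificates of title you care about; the values here are invented.
wanted = {"CB123/456", "WN789/12"}

with open("title_memorials.csv", newline="", encoding="utf-8") as src, \
     open("memorials_subset.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Set membership is O(1), so scanning 2M+ rows stays quick.
        if row["title_no"] in wanted and row.get("current") == "T":
            writer.writerow(row)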


You will need a relate rather than a join, because there will be multiple current memorials for a given title. There is also a memorial_type field you could use to discard mortgages etc.


Badge


Thanks @kimo, I will give it a go and hopefully come back with some results. I think I met you at an FME workshop in Auckland two years ago; correct me if I am wrong. Anyway, I got lucky to get an answer from a Kiwi.

 

Badge +16


downloadfeatures.fmw will work with non-spatial data: just add a TestFilter for layer name contains 'layer' or 'table' and route the table output port through a StringConcatenator that doesn't add a BBOX to the GetFeature URL. Any number of features can be downloaded. Text fields are 2048 characters wide, but in a file geodatabase the bytes aren't written to disk unless the field is filled.


Badge

I ended up running a workbench with an SQLExecutor, which uses the required parcels as the initiator. I also used the syntax that @kimo suggested, exposed the attributes and ran the result through an AttributeManager to retrieve the duplicated values. On the output I received a table of title_no relations between the two sources. It has duplicated values for title_no, as expected.

Reply