Skip to main content

Hello to the whole FME community!

 

I have a very simple and basic question to ask you.

I work with large volumes of hydrographic data, address data, etc. and soon lidar data.

With each treatment / test I perform, my files often duplicate and it becomes easy to get lost in this mountain of data.

 

So I would like to have your opinion on the advantages that I would have to use a database management system (PostgreSQL with the PostGIS extension)

For now, I think that this would allow me to:

 

- better manage the storage and versions of my files

- optimize the transitions between my files and FME (by directly connecting my DB in FME)

- increase the speed of loading and writing my data

- get my data instantly in QGIS (by also connecting my DB in QGIS)

- to decrease the storage space of my files given that they will no longer be stored in shapefile (+ dbf + shx)?

 

Tell me if I'm wrong, and if you have other advantages to list me, I'm interested! :)

 

Thank you !

@theodrogo I think you've answered your own question. You might also add cloud database solutions to your analysis. But if you search for "reasons for implementing a database" you'll pretty well get the same points that you listed.


I can only agree with @markatsafe. FME and PostGIS is a great combination.

An added benefit is that with indexed key fields you can join tables in the database (using the SQLExecutor rather than a regular reader) and avoid using FeatureMergers/Joiners in the workspace, it is usually much, much faster when the datasets grow.


+1 from me, I use FME and PostGIS on a regular basis and the combination works great. Especially if you want to offer easier access to certain subsets from the data a spatial database is a smart move and PostGIS is certainly capable of offering all that.


I've similarly setup and run a PostGIS server and found it relatively straightforward and painless, as long as you are somewhat SQL and enterprise database savvy. It's a relatively well featured and high performance environment.

BUT, for single user, "enterprise-grade" database performance I have generally found I don't need to go the PostGIS route anymore.

That's because in a single user environment, SpatiaLite as an alternative does pretty much everything PostGIS can do, and it is easier to setup and administer being file based and only "runs" when you connect to the single database file. My last SpatiaLite "server" was a single 600GB database file with 1.3 billion features on an SSD with both a spatial index and field indices. In a single user environment, it was faster than any equivalent server I've run like Oracle, PostGIS or SQL Server and it didn't come with having to install and maintain a server running in the background of your PC an enterprise database. Frankly, the speed it ran at with 1.3 billion records was nearly the same as when it was running with 5,000 records. It would happily accept me throwing another 3 billion records at it and it wouldn't care the way it is currently running!

Some advantages are:

  • Not having to run a server in the background on your PC as the host
  • Being a single file: Shifting your "server" to another computer is just copy/paste
  • It is extremely compact.
  • No admin accounts or user accounts to have to setup/configure, and remember passwords for
  • QGIS works pretty well with it.
  • FME reads and writes really fast to it being SQL transaction based
  • Setting up a Command LIne Interface is just grabbing the SpatiaLite.EXE CLI file and running it. Connecting by CLI is literally just running SpatiaLite.EXE "DB filepath". The SpatiaLiteGUI.exe interface application is also not too bad as well.
  • It has a 2 way ODBC driver that works straight off the bat in any ODBC capable software. So far successfully used for read/write in Excel, Access etc.
  • You can run a Test or Development space to trial operations on your data without affecting your production data by simply starting a new, in-memory database and attaching your Production database to it to copy temporary tables across and test in there.
  • It can directly connect to XLS, TXT as "Virtual" tables (either permanently or temporarily) and use these as the source tables for appends to your master tables
  • It has a library of Spatial functions nearly equivalent to PostGIS
  • FME's SQLExector and SQLCreator similarly also work directly with it. Just as for any of the SQL based interactions FME has with it, these run very fast.

The downsides are that there isn't as high a user base to consult with on issues, it does rely on you being pretty SQL aware as to what does and does not make for high performance in large datasets, and it is pickier with you being careful with setting and using indices, but it's amazing to get a server grade system and performance out of a single file.

But on the other hand, if you want to also learn a bit of enterprise server admin work, PostGIS I found pretty fun, and wasn't as painful to get the client PC able to connect to the data as Oracle or SQL Server.


I've similarly setup and run a PostGIS server and found it relatively straightforward and painless, as long as you are somewhat SQL and enterprise database savvy. It's a relatively well featured and high performance environment.

BUT, for single user, "enterprise-grade" database performance I have generally found I don't need to go the PostGIS route anymore.

That's because in a single user environment, SpatiaLite as an alternative does pretty much everything PostGIS can do, and it is easier to setup and administer being file based and only "runs" when you connect to the single database file. My last SpatiaLite "server" was a single 600GB database file with 1.3 billion features on an SSD with both a spatial index and field indices. In a single user environment, it was faster than any equivalent server I've run like Oracle, PostGIS or SQL Server and it didn't come with having to install and maintain a server running in the background of your PC an enterprise database. Frankly, the speed it ran at with 1.3 billion records was nearly the same as when it was running with 5,000 records. It would happily accept me throwing another 3 billion records at it and it wouldn't care the way it is currently running!

Some advantages are:

  • Not having to run a server in the background on your PC as the host
  • Being a single file: Shifting your "server" to another computer is just copy/paste
  • It is extremely compact.
  • No admin accounts or user accounts to have to setup/configure, and remember passwords for
  • QGIS works pretty well with it.
  • FME reads and writes really fast to it being SQL transaction based
  • Setting up a Command LIne Interface is just grabbing the SpatiaLite.EXE CLI file and running it. Connecting by CLI is literally just running SpatiaLite.EXE "DB filepath". The SpatiaLiteGUI.exe interface application is also not too bad as well.
  • It has a 2 way ODBC driver that works straight off the bat in any ODBC capable software. So far successfully used for read/write in Excel, Access etc.
  • You can run a Test or Development space to trial operations on your data without affecting your production data by simply starting a new, in-memory database and attaching your Production database to it to copy temporary tables across and test in there.
  • It can directly connect to XLS, TXT as "Virtual" tables (either permanently or temporarily) and use these as the source tables for appends to your master tables
  • It has a library of Spatial functions nearly equivalent to PostGIS
  • FME's SQLExector and SQLCreator similarly also work directly with it. Just as for any of the SQL based interactions FME has with it, these run very fast.

The downsides are that there isn't as high a user base to consult with on issues, it does rely on you being pretty SQL aware as to what does and does not make for high performance in large datasets, and it is pickier with you being careful with setting and using indices, but it's amazing to get a server grade system and performance out of a single file.

But on the other hand, if you want to also learn a bit of enterprise server admin work, PostGIS I found pretty fun, and wasn't as painful to get the client PC able to connect to the data as Oracle or SQL Server.

That's some very interesting food for thought, thanks for sharing.

I've worked a lot with file-based SQL databases (even long before SQLite existed) and can attest that they can get you some amazing performance if you know what you're doing. However, I've also had some very unfortunate cases of file corruption e.g. when used over unreliable network connections, and that did sour my opinion on them. I'm not saying that is a current problem, but it used to be a problem before if you ventured outside the type of use these single-file databases were meant for.

So for me, actually communicating with a service rather than a file is a good thing, especially if there are multiple users and/or the data is on a different physical machine.

However, for the scenarios where there is only a single user (FME) and everything is located on the same machine, I can easily see the benefits.


That's some very interesting food for thought, thanks for sharing.

I've worked a lot with file-based SQL databases (even long before SQLite existed) and can attest that they can get you some amazing performance if you know what you're doing. However, I've also had some very unfortunate cases of file corruption e.g. when used over unreliable network connections, and that did sour my opinion on them. I'm not saying that is a current problem, but it used to be a problem before if you ventured outside the type of use these single-file databases were meant for.

So for me, actually communicating with a service rather than a file is a good thing, especially if there are multiple users and/or the data is on a different physical machine.

However, for the scenarios where there is only a single user (FME) and everything is located on the same machine, I can easily see the benefits.

I forgot to add SpatiaLite also natively supports both linking to SHP files as a VirtualSHP table ie. Act as Linked tables, or you can just import/append SHP data directly into a native SpatiaLite table, and also go the other way and export it back out to SHP. This can be done in any of the interfaces (ie. Don't need QGIS or FME to import/export SHP data), and also, since ArcGIS 10.2, Spatialite database files are also recognised as a Geodatabase Feature Class container (read-only, although ArcCatalog will also happily create new, blank Spatialite tables and copy whole feature classes across from container to container) so they can also be viewed in ArcMap.

Agree @david_r, when I get to multi-user, remote PC is the server type, need data security/user privilege lockouts/kicking ability, then these are all boxes that PostGiS ticks and where that is probably where I would be looking too.

 

Accessing a SpatiaLite/SQLite file on a remote file server tends to reduce its advantages, particularly performance as it now has to deal with file server comms overheads that aren't necessarily great at DB I/O type network traffic (same issues putting SHP/FGDBs on remote file servers as well). In theory, due to its architecture/rollbacks/checks/journalling it should be reasonably resilient to loss of connection/packet loss etc. so long as it is only a single application/single user only ever touching it at one time, some complications can possibly arise otherwise.

However, if I need to keep the "Master" database on a remote file server, it works better if you run a local "Slave" on your own PC where all the transactions occur across the day and then just say setting a schedule to copy/sync the database file up to refresh/backup the Master, say in the evening. In a single user case, they are the only one who is going to be accessing/changing the data!

 

 


Reply