Question

Putting in the Second Reader once the First Output is Written


Badge

Hello.

Can we control the flow of the input gdb feature classes so that the first feature class is read first, passes through the complete workspace, and its output is written? Then the second feature class is read, passes through the complete workspace, and its output is written, and so on. Thanks.


16 replies

Badge

You can't control the order of FeatureTypes, nor is it possible to run the FeatureTypes iteratively using only one Workspace.

 

What you can do is publish the "Feature Types to Read" parameter and then start the workspace FeatureType by FeatureType.
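If it helps, here is a minimal sketch of what starting the workspace FeatureType by FeatureType could look like when scripted with FME's fmeobjects Python API. The workspace path, the feature type names, and the published parameter name FEATURE_TYPES are placeholders for illustration, not anything confirmed in this thread.

```python
# Minimal sketch: run one workspace per feature type, in a fixed order.
# Assumes the workspace publishes its "Feature Types to Read" parameter
# under the name FEATURE_TYPES -- an assumption for illustration.
import fmeobjects

WORKSPACE = r"C:\FME\convert_one_featuretype.fmw"   # hypothetical path
FEATURE_TYPES = ["Roads", "Rivers", "Buildings"]    # desired read order

runner = fmeobjects.FMEWorkspaceRunner()
for ft in FEATURE_TYPES:
    # Each call blocks until the translation finishes, so the next
    # feature type is only read after the previous one has been written.
    runner.runWithParameters(WORKSPACE, {"FEATURE_TYPES": ft})
```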

Badge +6

You can't with default readers and writers.

 

But you could use a FeatureWriter transformer to write the first feature class and then use a FeatureReader transformer to read the second feature class.

Badge +15

I think there are three options, two are mentioned already.

1. Use FeatureReaders and FeatureWriters instead of Readers and Writers. Then you can control the order of reading and writing.

2. Use a WorkspaceRunner that runs twice, reading the file based on a published parameter (see the sketch after this list).

3. If the file to write is not the file to read, you could read all the files with Readers but use a FeatureHolder to hold back the rest of the process until you are ready to write the first file. The Readers will still read all the files one after another; the complete process runs and then all files are written one after another. That is probably not what you want.
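For option 2, a scripted equivalent of a WorkspaceRunner that runs once per file might look like the following sketch; the workspace path and the SOURCE_DATASET parameter name are assumptions for illustration.

```python
# Minimal sketch of option 2: the same workspace is started once per
# source file, strictly one after the other.
import fmeobjects

WORKSPACE = r"C:\FME\read_and_write_one_file.fmw"   # hypothetical path
SOURCE_FILES = [r"C:\data\first.gdb", r"C:\data\second.gdb"]

runner = fmeobjects.FMEWorkspaceRunner()
for src in SOURCE_FILES:
    try:
        # Blocks until this run completes; raises FMEException on failure.
        runner.runWithParameters(WORKSPACE, {"SOURCE_DATASET": src})
    except fmeobjects.FMEException as exc:
        print(f"Run for {src} failed: {exc}")
        break  # don't start the next file if this one failed
```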

Badge +7

I'd probably go with FeatureReaders and FeatureWriters.

The other option would be a parent Workspace containing 2 WorkspaceRunners, the first to run a Workspace that reads and writes the first Feature Class, the second to run a Workspace that reads the first Feature Class and writes the second.

@ThomasAtAxmann I think if all your Feature Types are coming from the same Reader, they are read in alphabetical order of the Feature Type name. But you could read them with separate Readers, then you can control the order by moving the Readers up or down in the Navigator pane. But it still wouldn't deliver the solution you want.

There's also the FeatureHolder transformer if you want to hold everything until all the data has been processed up to that point.

The other thing to be aware of is potential locking of the GDB layers. I'm not sure what format you're working with, but you can sometimes get locking issues when you're reading from the same data you're writing to. If using ESRI FGDBs, you might need to test with both the Open API Reader/Writer and the ESRI one that requires ArcGIS to be installed. I recently had a deadlocking issue with SQL Server because I was using a WorkspaceRunner with parallel processing so that 4 Workspaces were running concurrently. I'm looking at a few options to resolve this, including using SQL to change the database settings before and after the child Workspaces have run.
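Not the actual fix from that SQL Server incident, but one possible shape of "change the database settings before and after the child Workspaces have run", sketched with pyodbc. The connection string, database name, and the READ_COMMITTED_SNAPSHOT setting are all assumptions for illustration.

```python
# Rough sketch: toggle a SQL Server database setting around the child
# workspace runs. READ_COMMITTED_SNAPSHOT is only an example setting.
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=master;Trusted_Connection=yes")

def set_rcsi(enabled):
    # ALTER DATABASE cannot run inside a transaction, hence autocommit.
    conn = pyodbc.connect(CONN_STR, autocommit=True)
    try:
        state = "ON" if enabled else "OFF"
        conn.execute("ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT "
                     + state + " WITH ROLLBACK IMMEDIATE")
    finally:
        conn.close()

set_rcsi(True)
try:
    pass  # ... run the child workspaces here ...
finally:
    set_rcsi(False)
```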

Badge +7

(Quoting the three-options reply above.)

Great minds think alike!

Badge

@tim_wood: "if all your Feature Types are coming from the same Reader, they are read in alphabetical order of the Feature Type name" - I'm not sure about that. Actually I don't know which FT FME reads first, but I never noticed an alphabetical order.

As I understand it, dewan wants the second FT to start reading when the first is completely finished with writing. You can use a FeatureReader/FeatureWriter combination, but if there are a lot of FeatureTypes it could be a very long chain of those transformer combinations connected in series.

I would prefer a 2-workspace solution: the first workspace uses a SchemaReader and sends each FT to a WorkspaceRunner with the "Wait for Job to Complete" option, and a second workspace writes the incoming FeatureTypes with a fanout, or filters them to the appropriate Writer FT.

Badge +7

@tim_wood: "if all your Feature Types are coming from the same Reader, they are read in alphabetical order of the Feature Type name" - I'm not sure about that. Actually I don't know which FT FME reads first, but I never noticed an alphabetical order.

As I understand, dewan wants that the second FT starts reading, when the first is completely finished with writing. You can use FeatureReader-FeatureWriter-Combination but if there are a lot of FeatureTypes it could be a very looooong chain of those Transformer-combinations connected in series.

I would prever a 2-Workspace-Solution: The first workspace using a SchemaReader and sending the FT to a WorkspaceRunner using the "wait for job to complete"-option and a second workspace which writes the incoming FeatureTypes in fanout or filtering them to the appropriate Writer-FT.

Maybe you're right. It was alphabetical in the FGDB I tested, but that was also the order the feature classes were created in.

Separate Workspaces vs. FeatureReaders/FeatureWriters will depend on the circumstances. FeatureReaders/FeatureWriters can be neater as it's all in the same Workspace. With WorkspaceRunners, it can be easier to catch errors and halt the process if there's a problem, e.g. if Workspace 1 fails, don't run Workspace 2.

Badge

@tim_wood: "if all your Feature Types are coming from the same Reader, they are read in alphabetical order of the Feature Type name" - I'm not sure about that. Actually I don't know which FT FME reads first, but I never noticed an alphabetical order.

As I understand, dewan wants that the second FT starts reading, when the first is completely finished with writing. You can use FeatureReader-FeatureWriter-Combination but if there are a lot of FeatureTypes it could be a very looooong chain of those Transformer-combinations connected in series.

I would prever a 2-Workspace-Solution: The first workspace using a SchemaReader and sending the FT to a WorkspaceRunner using the "wait for job to complete"-option and a second workspace which writes the incoming FeatureTypes in fanout or filtering them to the appropriate Writer-FT.

Hi @ThomasAtAxmann and @tim_wood, thanks for your suggestions.

 

That is exactly the problem with the FeatureReader, since I have a long list of FTs, and I am also not sure if we can control the flow of data one by one.

 

'As I understand, dewan wants that the second FT starts reading, when the first is completely finished with writing.' That is precisely what I want.

I somewhat understood your solution about the WorkspaceRunner, and I had tried it as well. The problem I am facing is that the output of the first workbench (from the WorkspaceRunner) does not populate attributes. Could you please explain it a bit more? Or if you already have an example workspace, that might help. Thanks.

Userlevel 4
Badge +25

So I have a single workspace that will do this... but it's fairly cluttered. Basically you read the source data, but only one feature per feature class.

That will give you a list of feature classes to read.

Then you sort that list into whatever order you want, and push it into a FeatureReader. That will let you read the data for that feature class.

There are two setups. One is that you filter the list (FeatureTypeFilter) and have a FeatureReader per feature class. The other is that you don't filter the list, and have one FeatureReader that reads only the fme_feature_type class (i.e. read the class that this feature represents).

The second way is tidier, but it means you can't access the attributes directly.

After reading you process the data and write it with a FeatureWriter. The only restriction is that you can't use a group-based transformer, because it will affect the order of features.

 

I made a short video demonstration here: https://www.screencast.com/t/iKQL5CnZ

 

Really though, the question is why you want to do this. If it's just that the data needs to be written in a certain order, then that can be done much more easily. Also, information can be passed from one feature class to another (with a FeatureMerger), which again would be WAY easier. For example, rather than "read table 1, write table 1, read table 2 with some info from table 1" you could just "read both tables, pass info from table 1 to table 2, write both tables".

In short, if you can say *why* the work needs to be in this order, perhaps we can suggest a solution that is easier to implement.

Badge +7

(Quoting @Mark2AtSafe's reply above.)

In the past, I've used different feature classes to break a process which takes a long time into different stages (a bit like Feature Caching). So rather than have a single process that takes 32 hours to run, I would run a 4 hour process, then a 2 hour one, etc.

Now, I tend to have a single SQL Server table which I load the data into, then use SQLExecutor or a Writer with fme_db_operation set to UPDATE in subsequent Workspaces to populate additional fields or do other stuff.
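For anyone unfamiliar with fme_db_operation: it is a standard FME attribute that database Writers inspect per feature. A minimal PythonCaller sketch for setting it follows (an AttributeCreator does the same job without code):

```python
# Minimal PythonCaller sketch: tag each feature so the database Writer
# performs an UPDATE instead of an INSERT. Which rows get updated is
# controlled by the Writer's match-columns configuration, not here.
import fmeobjects

class FeatureProcessor(object):
    def input(self, feature):
        feature.setAttribute("fme_db_operation", "UPDATE")
        self.pyoutput(feature)

    def close(self):
        pass
```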

Badge

(Quoting @Mark2AtSafe's reply above.)

Alright @Mark2AtSafe, maybe I didn't explain why I need this.

So, I have data in tabular form on the server. I am using a SQLCreator (PostGIS) to extract the data from the server and then converting it to spatial form, with the attributes populated as per the requirement.

The other input I have is the clusters (grids, with no attribute info) of all countries, organized by priority (P1, P2, P3, and P4), in the form of shapefiles that I am downloading from AWS using the S3 downloader (downloading the zipped files and storing them locally; I use a WorkspaceRunner for this). There can be two priorities in one country: P1 and P2, P2 and P3, or P3 and P4. The last priority can sometimes be the boundary of the country. So, when all the regions are run in one go, the boundary information is automatically written as output for the countries that didn't have the boundary as a priority. Also, when running the workbench on 204 countries, keeping track of errors and warnings is a bit complicated.

So, basically, we need to run the workbench country by country (all priorities in one go), and write the output to the priority folder inside the country folder.

 

Now, if we use FeatureReader/FeatureWriter, it is too manual to join 204 connectors.

Sharing a basic flow of the data.

Badge

(Quoting @tim_wood's reply above.)

Hi @tim_wood, my input is multiple zipped shapefiles (described a bit more in the comment above) which I want to be read one by one, so GDB locking is not an issue.

I have tried using the WorkspaceRunner and checked the data. I think the easiest way to check whether the second dataset is written only once the first is complete is to watch whether one dataset is written completely before the next one starts. Well, in my workbench, all the datasets are coming in at once and their sizes keep increasing. So I guess it is running in parallel. Is it?

Badge +7

(Quoting @tim_wood's reply above.)

Would it help to use the "Read Multiple Files/Folders" functionality so you read in *.SHP from *.ZIP similar to the examples below:

Badge

(Quoting @tim_wood's question above.)

@tim_wood I think I can read the zip files, but there is an issue with getting the WorkspaceRunner to start the second dataset only after the first one is finished. And if I use the FeatureReader/FeatureWriter, then connecting 204 datasets would require too much manual intervention. Also, the WorkspaceRunner does not carry the attributes forward into the parent workspace. Do you already have an existing wb for this, or documentation?

Badge +7

(Quoting the reply above.)

The way I've done it before is...

Parent Workspace runs child Workspace 1. Child Workspace 1 reads data and writes intermediate data.

On completion of child Workspace 1, Parent Workspace runs child Workspace 2. Child Workspace 2 reads intermediate data and writes the next output (intermediate or final).

Once you've built and run child Workspace 1 (e.g. using some test data), you can create the Reader for the intermediate data in child Workspace 2, which should pick up all the attributes, including the ones added in child Workspace 1.
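Scripted rather than with two WorkspaceRunner transformers, that parent/child chain could look like this sketch (workspace paths are placeholders); because run raises FMEException on failure, child Workspace 2 never starts if child Workspace 1 fails:

```python
# Minimal sketch of the parent-runs-children pattern described above.
import fmeobjects

CHILD_1 = r"C:\FME\stage1_write_intermediate.fmw"  # hypothetical path
CHILD_2 = r"C:\FME\stage2_read_intermediate.fmw"   # hypothetical path

runner = fmeobjects.FMEWorkspaceRunner()
runner.run(CHILD_1)  # blocks until done; raises FMEException on failure
runner.run(CHILD_2)  # only reached if child Workspace 1 succeeded
```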

Userlevel 4
Badge +25

(Quoting dewan's explanation above.)

OK, my apologies but I made a few assumptions that might not be correct. I thought you were reading from a Geodatabase, and I thought that there is a requirement for table 2 not to start processing until table 1 is fully finished. But I'm not seeing that here. I'm seeing that it needs to be processed separately, but not that it couldn't be processed simultaneously. Correct me if I'm wrong, but the results of Country A don't affect Country B, right?

So ignoring the AWS and SQL complications, what you have is a set of Shape files, and a polygon with attributes that you need to overlay once per set of Shape features.

There are two ways of achieving that.

Firstly, you could read all of the Shape data at once, read the polygon feature, and process them in the AreaOnAreaOverlayer using a Group By setting. You would probably need a polygon feature per Shape dataset, with each polygon having a name matching the Shape data. That way the group-by will overlay each shapefile only against its own polygon, not shapefile on shapefile. Then you need to write the output. You can set up what we call a fanout to do this. A fanout lets you specify which folder to write to. So you have an attribute with priority, and one with country, and you write to C:\MyData\<Country>\<Priority>\filename.xxx

This method is fine for small amounts of data, but not if each Shapefile is very large (say 200+ files, each 10 MB or greater).

The second method would be a separate process for each file. There I would generally have a control workspace with a File/Directory/Path reader to read a list of the Shape files. Then I pass each file name to the worker workspace, which reads the supplied Shapefile and overlays it against the polygon. The WorkspaceRunner works fine there. Then, as before, you can use a fanout to write the data to the correct folder.

In the second scenario you could also publish/pass in the name of the country to use as a log file name. That way you get a log file per country. Otherwise each time you run the workspace it could overwrite the same log file.
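A rough sketch of that control process, assuming the worker workspace publishes parameters named SRC_SHAPEFILE and COUNTRY (both hypothetical); whether your FME version accepts LOG_FILE as a run-time parameter is worth verifying before relying on it:

```python
# Rough sketch: enumerate the zipped Shapefiles, derive the country
# name from each file name, and run the worker once per file.
import glob
import os
import fmeobjects

WORKER = r"C:\FME\overlay_one_country.fmw"   # hypothetical worker

runner = fmeobjects.FMEWorkspaceRunner()
for zip_path in sorted(glob.glob(r"C:\data\countries\*.zip")):
    country = os.path.splitext(os.path.basename(zip_path))[0]
    runner.runWithParameters(WORKER, {
        "SRC_SHAPEFILE": zip_path,          # assumed parameter name
        "COUNTRY": country,                 # assumed parameter name
        "LOG_FILE": rf"C:\logs\{country}.log",  # assumed directive
    })
```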

I hope this helps. The AWS and SQL parts certainly make it a little more complicated too. Although we can help here, it might be worth seeing if a local certified FME partner is able to assist. Safe doesn't provide consulting services, but our partners do. Obviously there might be a cost involved, but other aspects would be easier (easier to share your workspaces with them in private, for example). Check out the partners page online at: https://www.safe.com/partners/
