I have a complex workflow using multiple Workspaces. Each reads and/or writes temporal operational data to CSVs.

My question is, when do you use a Reader/Writer as opposed to a FeatureReader/FeatureWriter?

I’ve settled on using FeatureReaders/FeatureWriters, though I’m not sure it makes much difference.

I understand the Feature* transformers are geared towards mid-workflow use where you have input and output.

Thanks in advance.

I think this is a personal preference for the most part. For me, when the FeatureReader and FeatureWriter were introduced, the possibilities to run different subprocesses in one bigger workflow were endless. This was addictive and I never used the classic Readers anymore, except in some situations where the FeatureReader did not have the same functions implemented as the classic Reader, like point cloud reading using a bounding box. (This might be fixed nowadays?) I also find the FeatureReader more intuitive in terms of dataset and feature type selection.

The only advantage I see for a classic Reader over the FeatureReader is that when you drag a file onto the canvas, you get the correct classic Reader. And when using a FeatureReader, you always need a Creator to initiate it if you do not have an initiating feature.


Hey, thanks for the input, Niels. You mentioned ‘drag a file to the canvas’. I never knew this! Just tried it. Very nice! I learn something new every day.

Cheers!


I agree with @nielsgerrits that it is largely a matter of personal preference, unless you really want the dynamic options that the FeatureReader and FeatureWriter bring to the mix (i.e. triggering them based on other things happening in the workspace).

 

One advantage the classic Readers do give you, though, is that you can more easily control which one is read first, simply by ordering them in the Navigator.


Another thing I ran into with the traditional Readers was when I was reading in a CSV file. The file contained some ID values that were very long number strings (0360002622328000087538). The Reader converted the values to real64, which was not what I needed. With the FeatureReader, I was able to manually override the data type. I think that’s what sent me down this rabbit hole to begin with. :)
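As a side note, here is a plain-Python sketch of why that conversion mangles these IDs (using a Python float as a stand-in for real64, so an approximation rather than exactly what the Reader does internally):

```python
# Sketch of why a long ID string should not be interpreted as a number:
# real64 (an IEEE double) carries only ~15-17 significant decimal digits,
# and the leading zero disappears once the value is numeric.

raw_id = "0360002622328000087538"   # the 22-digit ID as it sits in the CSV

as_real64 = float(raw_id)           # roughly what an auto-typing reader does
print(as_real64)                    # prints something like 3.6000262232800009e+20
print(f"{as_real64:.0f}")           # the last digits differ and the leading zero is gone

# Keeping the attribute as a string (what overriding the data type achieves)
# preserves the ID exactly:
print(raw_id)                       # 0360002622328000087538
```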


There is probably a good reason why this attribute type interpretation was introduced in the CSV Reader, but I have had my share of issues with it. I don’t like it.

I think you can also fix this with a classic reader, but as I never use them, I do not know how. But for me, this stuff is more intuitive in the FeatureReader.


While I really like the FeatureReaders, I often use the default Readers for two main reasons. The first is what @redgeographics mentioned: it is easier to control read order with them. The second is very use-case specific. As @nielsgerrits said, not all functions are included in both. When I am working with MongoDB, the Reader allows me to easily use JSON queries, whereas the FeatureReader does not offer this functionality.


One advantage of FeatureWriters is that you can verify the results (what actually got written) by following up with transformers performing QA checks.

All without having to create a new workspace.

For instance: some database formats have “bulk insert” or similar for writing chunks of records to a database. In some circumstances, the db schema may accept only some of these features due to specific db rules.
 


Generally the only time I start a process with a Reader now is if it’s a legacy process that always started that way! 99% of the time my processes start with a Creator and feed into FeatureReaders.

 

Of course, there are some limitations, especially around more advanced functionality (as mentioned in other posts).


About the advantage of reader order: I only want to read data from a second or third FeatureReader when I really need it, so if there is no feature to initiate the FeatureReader, no time is lost loading unnecessary data. Depending on the type of data and where it is located, I/O takes the biggest chunk of the total processing time. In the end it comes down to different specific scenarios. It is nice that we have the option to choose what we like :)


My personal preference is to use Readers as much as possible. That way you can identify all sources of the workspace in the Navigator.

FeatureReaders, SQLCreators, DatabaseJoiners and such transformers obscure what sources are used.

Of course, with good annotation of your workspace this should not be too much of a problem.


Those are great points! For the use case that I have, the sources are all CSV files. These are working files I’m using to house temporary data. But if the source changes from dev to test to prod, I like your approach of using Readers/Writers. I’ll definitely add this to my decision-making going forward.

Thanks!

