I have a csv file looking as below ATT1 ATT2 ATT3 New Attribute Fri Jun 24 08:01:00 2016 1.46676E+12 A Sat Jun 25 08:01:00 2016 1.46684E+12 A Thu Jul 07 08:01:00 2016 1.46788E+12 A Fri Jul 08 08:01:00 2016 1.46796E+12 A Sat Jul 09 08:01:00 2016 1.46805E+12 A Sun Jul 10 08:01:00 2016 1.46814E+12 B Mon Jul 11 08:01:00 2016 1.46822E+12 B Sat Sep 24 08:01:00 2016 1.4747E+12 B Sun Sep 25 08:01:00 2016 1.47479E+12 B I want to set the "New Attribute" to 'Yes' based on the "ATT3" value of the last row. Or i want to filter the records based on the ATT3 value of the last row.

How to filter a csv file based on the last row in FME

Userlevel 2

+16

erik_jan
Contributor
2180 replies
6 years ago
28 September 2017

AttributeCreator has the option to look a number of rows ahead or back.

Have a look at the Adjacent feature handling:

In a Tester you can then do: If ATTR3 from this feature != ATTR3 from next feature the ATTR4 = "Yes" or use the Conditional value when creating ATTR4

+14

davideagle
Contributor
577 replies
6 years ago
28 September 2017

Or use the Text Reader and set the parameter to read from the bottom, then use the Sampler to sample off the first feature (actually the last).

Userlevel 2

+16

erik_jan
Contributor
2180 replies
6 years ago
28 September 2017

Or use the Text Reader and set the parameter to read from the bottom, then use the Sampler to sample off the first feature (actually the last).

This will get you the last record of the entire dataset, not per ATTR3 value.

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
29 September 2017

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

S

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

Thanks @takashi I have a million files to read, is there any simpler method with less number of transformers?

S

AttributeCreator has the option to look a number of rows ahead or back.

Have a look at the Adjacent feature handling:

In a Tester you can then do: If ATTR3 from this feature != ATTR3 from next feature the ATTR4 = "Yes" or use the Conditional value when creating ATTR4

Thanks @erik_jan can you explain in bit more detail as I am not very proficient in FME

S

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

I am reading multiple files at a time and this will filter all attributes based on the last attribute of the last file.

T

+6

This will get you the last record of the entire dataset, not per ATTR3 value.

But I think it's a good idea, use AttributeSplitter can more easily from the entire line ATT3, if there are many lines of the original CSV, set the text reader's Max Features to Read=1, can only read a line as the last line of the ATT3, such execution efficiency will be higher.

filterbylastrowvalue.fmwt

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
29 September 2017

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

You are right. With the workflow I provided above, only the last ATT3 in the final file will be used when multiple CSV files are processed at once.

In order to process for each file individually, make the variable name unique to each file. For example, expose "fme_dataset" in the CSV reader and use the dataset name as suffix of the variable name, like this.

to be continued.

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
29 September 2017

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

However, if the number of input files was very large, the workflow could consume huge memory space, since the FeatureHolder should hold all input features before starting output.

If there were millions files to be process at once, I would use a PythonCaller with this script. Note that you still need to expose "fme_dataset" in the CSV reader.

# PythonCaller Script Example
class FeatureProcessor(object):
    def __init__(self):
        self.features = []
        self.prevDataset = None
        
    def input(self, feature):
        dataset = feature.getAttribute('fme_dataset')
        if dataset != self.prevDataset:
            self.flush()
            self.prevDataset = dataset
        self.features.append(feature)
        
    def close(self):
        self.flush()
        
    def flush(self):
        if self.features:
            lastAtt3 = self.features[-1].getAttribute('ATT3')
            for feature in self.features:
                if feature.getAttribute('ATT3') == lastAtt3:
                    feature.setAttribute('New Attribute', 'Yes')
                self.pyoutput(feature)
            self.features = []

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
29 September 2017

This will get you the last record of the entire dataset, not per ATTR3 value.

Agree with @taojunabc. In addition, if there are multiple CSV files to be processed at once, you can expose "fme_dataset" in the reader feature types and set it to the Group By parameter in the FeatureMerger.

S

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

Where the Python caller needs to be inserted?

My workspace looks like below.

capture5.png

Userlevel 4

+30

danilo_fme
Evangelist
1880 replies
6 years ago
30 September 2017

Where the Python caller needs to be inserted?

My workspace looks like below.

capture5.png

@sobanmughal, i believe after your Reader CSV

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
30 September 2017

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

You can replace the VariableSetter, FeatureHolder, and VariableRetriever in your workspace with a single PythoCaller containing the script I posted above. The script adds a new attribute called 'New Attribute' which stores 'Yes' if ATT3 of the feature is equal to the ATT3 of the last feature for each file.

Expose the attribute name 'New Attribute' via the 'Attributes to Expose' parameter in the PythonCaller. You can then filter the features output from the PythonCaller by testing whether the 'New Attribute' stores 'Yes' or not.

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
30 September 2017

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

Besides, if all the features whose ATT3 were NOT equal to the last ATT3 can be discarded, you can modify the 'flush' method like this, so that the PythonCaller outputs only features whose ATT3 were equal to the last ATT3, rather than adding 'New Attribute'. If you do so, the subsequent filtering process can also be removed.

    def flush(self):
        if self.features:
            lastAtt3 = self.features[-1].getAttribute('ATT3')
            for feature in self.features:
                if feature.getAttribute('ATT3') == lastAtt3:
                    self.pyoutput(feature)
            self.features = []

+2

Thanks @erik_jan can you explain in bit more detail as I am not very proficient in FME

Hi @sobanmughal, I think takashi and 1spatialDave's suggestions were based on the understanding that you want to use the row at the end of a file to set the new attribute's value. But if by last row, you mean the row above the row you want to set value for, then, @erik_jan has got the right idea for you. The documentation would have some more details, but here is a screenshot on how you'd set this up in the AttributeCreator:

Hope this helps clarify things a bit.

Xiaomeng

Userlevel 4

+13

fmelizard
Contributor
3725 replies
6 years ago
3 October 2017

Given that you're facing huge data volumes, I'd be VERY tempted to read all the data twice.

I'd suggest using @takashi's idea of the variables, but extending to having the data read twice.

The first time (in the first reader you put down), you'd set a Variable whose name was the fme_dataset and whose value was the value of Attr3. When the datasets are exhausted, you've got your variables all set.

The second time (in the second reader) you'd retrieve that Variable and do your test and then act accordingly.

Basically, reading CSV is very cheap (especially in FME 2017 and later). So its better NOT to hold on to lots of data in FME land and instead just read the CSVs twice.

My sample workspace & data are here: lastrecord.fmwt

S

Given that you're facing huge data volumes, I'd be VERY tempted to read all the data twice.

I'd suggest using @takashi's idea of the variables, but extending to having the data read twice.

The first time (in the first reader you put down), you'd set a Variable whose name was the fme_dataset and whose value was the value of Attr3. When the datasets are exhausted, you've got your variables all set.

The second time (in the second reader) you'd retrieve that Variable and do your test and then act accordingly.

Basically, reading CSV is very cheap (especially in FME 2017 and later). So its better NOT to hold on to lots of data in FME land and instead just read the CSVs twice.

My sample workspace & data are here: lastrecord.fmwt

Great suggestion @daleatsafe.

S

However, if the number of input files was very large, the workflow could consume huge memory space, since the FeatureHolder should hold all input features before starting output.

If there were millions files to be process at once, I would use a PythonCaller with this script. Note that you still need to expose "fme_dataset" in the CSV reader.

# PythonCaller Script Example
class FeatureProcessor(object):
    def __init__(self):
        self.features = []
        self.prevDataset = None
        
    def input(self, feature):
        dataset = feature.getAttribute('fme_dataset')
        if dataset != self.prevDataset:
            self.flush()
            self.prevDataset = dataset
        self.features.append(feature)
        
    def close(self):
        self.flush()
        
    def flush(self):
        if self.features:
            lastAtt3 = self.features[-1].getAttribute('ATT3')
            for feature in self.features:
                if feature.getAttribute('ATT3') == lastAtt3:
                    feature.setAttribute('New Attribute', 'Yes')
                self.pyoutput(feature)
            self.features = []

@takashi the script works fine. However, I have come across with another problem where aggregator holds on to all the data before it outputs to the writer. I want to take average for each file and the pass on to the writer.capture6.png

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
5 October 2017

@takashi the script works fine. However, I have come across with another problem where aggregator holds on to all the data before it outputs to the writer. I want to take average for each file and the pass on to the writer.capture6.png

If you are using the Aggregator to perform something for each source file, you can set "Yes" to the "Input is Ordered by Group" parameter in the Aggregator to process for each group.

S

@takashi the script works fine. However, I have come across with another problem where aggregator holds on to all the data before it outputs to the writer. I want to take average for each file and the pass on to the writer.capture6.png

That was easy:)

Thanks!!!!

S

Hi @sobanmughal, a pair of VariableSetter and VariableRetriever in conjunction with a FeatureHolder can be used to add the last ATT3 to every feature as an attribute. You can then compare ATT3 with the last ATT3 for each feature.

This is the final script for PythonCaller Transformer

import fme
import fmeobjects


class FeatureProcessor(object):
    def __init__(self):
        self.features = []
        self.prevDataset = None
        
    def input(self, feature):
        dataset = feature.getAttribute('fme_dataset')
        if dataset != self.prevDataset:
            self.flush()
            self.prevDataset = dataset
        self.features.append(feature)
        
    def close(self):
        self.flush()
        
    def flush(self):
        if self.features:
            lastAtt3 = self.features[-1].getAttribute('Ordered Rate')
            for feature in self.features:
                if feature.getAttribute('Ordered Rate') == lastAtt3:
                    self.pyoutput(feature)
        self.features = []

Userlevel 4

+13

fmelizard
Contributor
3725 replies
6 years ago
5 October 2017

Great suggestion @daleatsafe.

BTW I just wish there was a way to use the Sampler to get the very last record only. That would be even a bit more efficient. I'm pretty tempted to make that change to our codebase I must say...

T

+6

Except for the advice above, I also thought of a approach that uses InlineQuerier.

Of course, this approach is not the most efficient, but provides another way to think about it.

if you have a lot of csv files to read, I think that using WorkspaceRunner would be more efficient than using Group By.

filterbylastrowvalue2.fmwt

Userlevel 1

+4

ulfme
Contributor
20 replies
6 years ago
9 October 2017

Hi!

I wonder if using Pointclouds could have a performance-impact positively/negatively here ?

For instance like the screenshot below: You can get the last value in the text-file with PointCloud-operations. (Don't think you can do the actual filtering without changing the component to some Integer though. Would be cool though.).

Now - this if the case really is the last record in the file.

@daleatsafe , @xiaomengatsafe?

How to filter a csv file based on the last row in FME

28 replies

Reply

Community Stats

Reply

Community Stats

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded