Solved

How to write to many formats in an optimized way?



The user selects one feature class (published parameter) and FME writes it to between one and six formats (CSV, Shape, DGN, ACAD, XLS, MapInfo, etc.).

In the FMW I ask for the formats to write via a published parameter:

output_format = dwg,shape,xls (comma separated)

Then I use a Python program to split these values and create a new attribute for each format (set to YES):

import fme
import fmeobjects

def frq(feature):
    # '_list{}' holds the formats requested by the user; fall back to an
    # empty list when the attribute is missing
    my_list = feature.getAttribute('_list{}') or []
    # One flag attribute per format: "YES" when requested, otherwise empty
    ESRISHAPE=''
    XLSXW=''
    CSV=''
    AUTOCAD_OD=''
    AUTOCAD_DWF=''
    FILEGDB=''
    for item in my_list:
        if item=="ESRISHAPE":
            ESRISHAPE="YES"
        if item=="FILEGDB":
            FILEGDB="YES"
        if item=="XLSXW":
            XLSXW="YES"
        if item=="CSV":
            CSV="YES"
        if item=="AUTOCAD_OD":
            AUTOCAD_OD="YES"
        if item=="DWF":
            AUTOCAD_DWF="YES"
    feature.setAttribute("ESRISHAPE", ESRISHAPE)
    feature.setAttribute("XLSXW", XLSXW)
    feature.setAttribute("CSV", CSV)
    feature.setAttribute("AUTOCAD_OD", AUTOCAD_OD)
    feature.setAttribute("AUTOCAD_DWF", AUTOCAD_DWF)
    feature.setAttribute("FILEGDB", FILEGDB)
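The splitting step mentioned above is not shown in the post. A minimal sketch of how a comma-separated parameter value could be turned into a clean list is given here in plain Python, outside FME. The function name parse_formats is hypothetical, and upper-casing the names is an assumption made so that "xls" and "XLS" are treated the same.

```python
def parse_formats(raw):
    """Split a comma-separated parameter value into a clean list of names."""
    cleaned = []
    for part in raw.split(","):
        # Strip stray spaces (as in "shape, xls") and normalise the case
        name = part.strip().upper()
        # Skip empty entries and duplicates, keeping the original order
        if name and name not in cleaned:
            cleaned.append(name)
    return cleaned


print(parse_formats("dwg,shape, xls"))  # ['DWG', 'SHAPE', 'XLS']
```

Cleaning the value once at the start means every later comparison can be an exact string match.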
    
 

At the end I check these values using multiple TestFilters and send the features to each writer.

Now my question is:

If the user wants to write 3 formats, all the features still go to every TestFilter transformer (6 filters for six formats):

dwg = yes, then the feature goes to the DWG writer

dgn = yes, then the feature goes to the DGN writer

........

All the features go to 6 different filters for testing the attribute value.

I think the same FMW can be written in a more optimized way. My worry is: if I have to write only three formats, why should all the features go to six filters? This is a huge load for FME, right?

Can we do something to optimize this?


Best answer by larry 5 October 2017, 17:10


13 replies

Userlevel 2
Badge +12

I used a similar process:

Have a "Choice with alias (multiple)" published parameter with a list of formats.

Then use a Tester transformer to see if the format is in the chosen list (e.g. the parameter contains ESRISHAPE for Shape output, etc.).

Then use a dynamic Shape writer to write to Shape files.

Repeat this for each format in the list.

If you allow the user to pick just one format at a time, you can use a "Choice with alias" published parameter with a list of formats instead, followed by the Generic writer.

More information here

Badge +22

So you are essentially cloning the features 6 times, regardless of how many formats are chosen.

 

What about simply exploding the format list and using a FeatureWriter set to the Generic format? In the parameters you can set the Output Format to an attribute. All you need to do is ensure that your format names exactly match those used by FME.
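Since the names have to match exactly, it may help to translate user-facing labels into writer format names in one place and fail early on anything unknown. This is only a sketch in plain Python: the alias table and the function name to_writer_formats are hypothetical, and the right-hand values are simply the names used in this thread, so check them against your own FME installation.

```python
# Hypothetical alias table: user-facing label -> format name the writer expects.
FORMAT_ALIASES = {
    "shape": "ESRISHAPE",
    "xls": "XLSXW",
    "csv": "CSV",
    "gdb": "FILEGDB",
}


def to_writer_formats(choices):
    """Translate user choices to writer format names, rejecting unknown ones."""
    resolved, unknown = [], []
    for choice in choices:
        key = choice.strip().lower()
        if key in FORMAT_ALIASES:
            resolved.append(FORMAT_ALIASES[key])
        else:
            unknown.append(choice)
    if unknown:
        # Fail before any feature is processed rather than at write time
        raise ValueError("Unsupported format(s): " + ", ".join(unknown))
    return resolved


print(to_writer_formats(["shape", " XLS "]))  # ['ESRISHAPE', 'XLSXW']
```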

Badge

In your PythonCaller, I would replace the function with a class. Instead of creating six attributes, I would create one copy of the feature for each format requested by the user, with the attribute "FORMAT" set to that format.

Then only one AttributeFilter (on the FORMAT attribute) is required to filter features based on the format requested.


Python code:

import fme
import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        pass

    def input(self, feature):
        # Emit one clone per requested format, tagged through the FORMAT attribute
        my_list = feature.getAttribute('_list{}') or []
        for item in my_list:
            if item == "ESRISHAPE":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "ESRISHAPE")
                self.pyoutput(outputFeature)
            if item == "FILEGDB":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "FILEGDB")
                self.pyoutput(outputFeature)
            if item == "XLSXW":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "XLSXW")
                self.pyoutput(outputFeature)
            if item == "CSV":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "CSV")
                self.pyoutput(outputFeature)
            if item == "AUTOCAD_OD":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "AUTOCAD_OD")
                self.pyoutput(outputFeature)
            if item == "DWF":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "AUTOCAD_DWF")
                self.pyoutput(outputFeature)

    def close(self):
        pass
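To see the effect of this approach without running FME, here is a rough stand-in in plain Python that uses dictionaries in place of features. The name fan_out is hypothetical, and unlike the class above it does not filter out unknown format names. It only illustrates that the number of copies follows the number of requested formats, not the number of supported ones.

```python
import copy


def fan_out(record, requested_formats):
    """Yield one independent copy of the record per requested format."""
    for fmt in requested_formats:
        clone = copy.deepcopy(record)  # stands in for feature.clone()
        clone["FORMAT"] = fmt
        yield clone


record = {"id": 1, "name": "parcel"}
copies = list(fan_out(record, ["ESRISHAPE", "CSV", "XLSXW"]))

print(len(copies))                     # 3
print([c["FORMAT"] for c in copies])   # ['ESRISHAPE', 'CSV', 'XLSXW']
print("FORMAT" in record)              # False, the original is untouched
```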

Userlevel 4
Badge +25

Does it look like this?

Notice both Passed/Failed are connected to the next Tester, but only one of them should ever occur.

I think if you do this then you will have only one extra copy. Yes, the data goes to each Tester, but it passes through because the Tester is not group-based. You don't get a copy of the data at every Tester, a copy occurs only when that test is true. So if the number of formats is n, then you'll get n+1 copies at most (and maybe not even that since the final tester has no failed output connected).

I think what might be more important is to use a FeatureWriter transformer instead of a writer. Then things will happen in parallel and the data won't be cached at each writer (e.g. see Dale's answer to a previous question).

If you really want to make sure you only get n copies of the data, then make a count of the chosen formats and use a Cloner to create that many copies. Then you can use a single TestFilter to separate each cloned set and point it to the correct writer (you'd need to be able to map the clone number to a particular format, but that shouldn't be too hard).
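The mapping from clone number to format could be as simple as indexing into the list of chosen formats. The sketch below is plain Python and assumes the clone numbers start at 0 and run up to the number of copies minus one; verify how your Cloner numbers its copies. The name format_for_copy is hypothetical.

```python
def format_for_copy(copy_number, chosen_formats):
    """Return the format assigned to a given clone number (assumed 0-based)."""
    if not 0 <= copy_number < len(chosen_formats):
        raise IndexError("no format for copy number %d" % copy_number)
    return chosen_formats[copy_number]


chosen = ["ESRISHAPE", "CSV", "XLSXW"]
number_of_copies = len(chosen)  # value to feed into the Cloner

assignment = {n: format_for_copy(n, chosen) for n in range(number_of_copies)}
print(assignment)  # {0: 'ESRISHAPE', 1: 'CSV', 2: 'XLSXW'}
```

Each cloned set then carries exactly one format, so a single filter on that value is enough to route it.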

Hope this helps.

Badge +3

@Mark2AtSafe

Hi Mark,

I used the TestFilter in this way. Is this costlier than the Tester?

Badge +3

@larry OK, I will implement this, but I have one question.

Does this Python code copy the features? If yes, then this is also a huge load for FME, right?

If there are 1 million records in the source and the user wants to write three formats, then the code will copy 3 million records. Is that true?

Badge +3

@jdh, not 6 times I think, but (number of formats) times.

If it is two formats, then two times.

Correct me if I am wrong.

I will also try the FeatureWriter with the Generic format.

Badge +3

[Mark said] a copy occurs only when that test is true...

But the features still go to each Tester (even the unwanted ones).

Cloner: a good choice, I think.

Badge +3

@Mark2atsafe

@Larry

Is the Cloner faster, or is Python (Larry's code) faster?

Which one will execute faster?

Userlevel 4
Badge +25

The TestFilter is certainly costlier the way you have it set up, because you are always creating 6 copies of the data. Basically your transformers operate in parallel when you want to have them in series (if that makes sense). It's not a difference between the Tester and the TestFilter, just a difference in how the connections are set up. If you connect your TestFilters the same way I connected my Testers, it would help for sure.
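To put rough numbers on the layouts discussed in this thread, here is a small back-of-the-envelope sketch in plain Python. The function names are hypothetical, and the figures only restate the copy counts claimed above (6 for the parallel layout, at most n+1 for chained Testers, n for a Cloner). They are not measurements.

```python
TOTAL_FORMATS = 6  # formats offered by the workspace described in this thread


def features_parallel(records, chosen):
    """Filters in parallel: one copy per filter, whatever the user chose."""
    return records * TOTAL_FORMATS


def features_series_max(records, chosen):
    """Chained Testers: at most one copy per chosen format plus the pass-through."""
    return records * (chosen + 1)


def features_cloner(records, chosen):
    """Cloner (or one clone per format in Python): one copy per chosen format."""
    return records * chosen


records, chosen = 1_000_000, 3
print(features_parallel(records, chosen))    # 6000000
print(features_series_max(records, chosen))  # 4000000
print(features_cloner(records, chosen))      # 3000000
```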

 

Userlevel 4
Badge +25

Yes, features will always go to each Tester, but only one copy of the data and it is passed through (not duplicated). A copy is created only where you actually need the data. In your workspace you are creating a copy whether you need it or not.

 

Badge

 

Yes, the code will create 3 million features in that case, but I don't see any other way to implement this without having a copy of the feature for each required format.

 

 

Badge

The Cloner should be faster, but you'll have to test to see what the impact is.
