Solved

How to write to many formats in an optimized way?

6 years ago
5 October 2017
13 replies
5 views

f.kemminje
187 replies

The user selects one feature class (published parameter) and FME writes it to between one and six formats (CSV, Shape, DGN, ACAD, XLS, MapInfo, etc.).

In the workspace (.fmw) I ask for the formats to write via a published parameter:

output_format = dwg,shape,xls (comma separated)

Then I use a Python script to split these values and create a new attribute for each format (set to YES):

import fme
import fmeobjects

def frq(feature):
    # '_list{}' holds the format names split from the published parameter.
    my_list = feature.getAttribute('_list{}') or []
    # Requested format name -> attribute to set (DWF is stored as AUTOCAD_DWF).
    flags = {
        "ESRISHAPE": "ESRISHAPE",
        "XLSXW": "XLSXW",
        "CSV": "CSV",
        "AUTOCAD_OD": "AUTOCAD_OD",
        "DWF": "AUTOCAD_DWF",
        "FILEGDB": "FILEGDB",
    }
    for item, attr in flags.items():
        # "YES" if the format was requested, otherwise an empty string.
        feature.setAttribute(attr, "YES" if item in my_list else "")

At the end I check these values using multiple TestFilters and send the features to each writer.

Now my question is:

If the user wants to write 3 formats, all the features still go to every TestFilter transformer (6 filters for six formats):

dwg = yes, then it goes to the DWG writer

dgn = yes, then it goes to the DGN writer

........

All the features go to 6 different filters to test the attribute value.

I think the same workspace can be written in a more optimized way. My worry is: if I have to write only three formats, why should all the features go to six filters? This is a huge load for FME, right?

Can we do something to optimize this?

Best answer by larry 5 October 2017, 17:10

erik_jan
Contributor
2177 replies
6 years ago
5 October 2017

I used a similar process:

Have a "Choice with alias (multiple)" published parameter with a list of formats.

Then use a Tester transformer to see if the format is in the chosen list (eg parameter contains ESRISHAPE for Shape output etc.)

Then use a dynamic Shape writer to write to Shape files.

Repeat this for each format in the list.

If you allow the user to pick just one format at a time, you can use a "Choice with alias" published parameter with a list of formats instead, followed by the Generic writer.

More information here
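
One thing to watch with a "contains" test on the raw parameter string: a substring check can match more than intended when one format name is part of another. Below is a minimal Python sketch of an exact-token check. It assumes the parameter arrives as a comma-separated string, as in the question (FME may deliver a multiple-choice parameter with a different delimiter, hence the `sep` argument), and `format_requested` is a hypothetical helper name, not an FME API.

```python
def format_requested(param_value, fmt, sep=","):
    """Return True if fmt is one of the delimited tokens in param_value.

    Hypothetical helper: compares whole tokens, ignoring case and
    surrounding whitespace, instead of doing a substring search.
    """
    tokens = [t.strip().upper() for t in param_value.split(sep)]
    return fmt.strip().upper() in tokens


chosen = "AUTOCAD_OD, XLSXW"

# A plain substring test matches "AUTOCAD" even though only AUTOCAD_OD was chosen.
substring_hit = "AUTOCAD" in chosen

# The token test only matches names that were actually chosen.
token_hit = format_requested(chosen, "AUTOCAD")
exact_hit = format_requested(chosen, "AUTOCAD_OD")
```

In a Tester the same idea could be approximated by testing against the split list values rather than the whole parameter string.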

jdh
Contributor
1959 replies
6 years ago
5 October 2017

So you are essentially cloning the features 6 times, regardless of how many formats are chosen.

What about simply exploding the format list, and using a FeatureWriter set to the Generic format? In the parameters you can set the Output Format to an attribute. All you need to do is ensure that your format names exactly match those used by FME.
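
As a rough illustration of what "exploding the format list" produces, here is a small plain-Python sketch. Dictionaries stand in for FME features, and `explode_formats` and the `FORMAT` key are hypothetical names chosen for the example; inside a workspace this would more likely be done with list transformers than with code.

```python
def explode_formats(record, format_value, sep=","):
    """Yield one copy of record per requested format, tagged with 'FORMAT'.

    Blank entries and repeated format names are skipped so that each
    format is written only once.
    """
    seen = set()
    for token in format_value.split(sep):
        fmt = token.strip()
        if not fmt or fmt in seen:
            continue
        seen.add(fmt)
        copy = dict(record)      # shallow copy; the original record is untouched
        copy["FORMAT"] = fmt     # attribute a generic writer could read
        yield copy


source = {"id": 1, "name": "parcel"}
rows = list(explode_formats(source, "ESRISHAPE, CSV,,CSV"))
```

Each resulting row carries exactly one format name, so a single writer driven by that attribute can replace the per-format filters.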

larry
173 replies
6 years ago
5 October 2017
Best Answer

In your PythonCaller, I would replace the function by a class and instead of creating six attributes, I would create a copy of the feature with the attribute "FORMAT" set to the required format for each format requested by the user.

Then only one AttributeFilter (on the FORMAT attribute) is required to filter features based on the format requested.

Python code:

import fme
import fmeobjects

class FeatureProcessor(object):
	# Requested format name -> value written to the FORMAT attribute.
	FORMATS = {
		"ESRISHAPE": "ESRISHAPE",
		"FILEGDB": "FILEGDB",
		"XLSXW": "XLSXW",
		"CSV": "CSV",
		"AUTOCAD_OD": "AUTOCAD_OD",
		"DWF": "AUTOCAD_DWF",
	}

	def __init__(self):
		pass

	def input(self, feature):
		# '_list{}' holds the formats requested by the user.
		my_list = feature.getAttribute('_list{}') or []
		for item in my_list:
			if item in self.FORMATS:
				# Output one copy of the feature per requested format.
				outputFeature = feature.clone()
				outputFeature.setAttribute("FORMAT", self.FORMATS[item])
				self.pyoutput(outputFeature)

	def close(self):
		pass

Mark2AtSafe

Does it look like this?

Notice both Passed/Failed are connected to the next Tester, but only one of them should ever occur.

I think if you do this then you will have only one extra copy. Yes, the data goes to each Tester, but it passes through because the Tester is not group-based. You don't get a copy of the data at every Tester, a copy occurs only when that test is true. So if the number of formats is n, then you'll get n+1 copies at most (and maybe not even that since the final tester has no failed output connected).

I think what might be more important is to be using a FeatureWriter transformer, instead of a writer. Then things will happen in parallel and the data won't be cached at each writer. eg - see Dale's answer to a previous question.

If you really want to make sure you only get n copies of the data, then make a count of the chosen formats and use a Cloner to create that many copies. Then you can use a single TestFilter to separate each cloned set and point it to the correct writer (you'd need to be able to map the clone number to a particular format, but that shouldn't be too hard).
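
The clone-number-to-format mapping mentioned above could look roughly like this in Python. It is only a sketch: it assumes the copy number is 0-based (as I believe the Cloner's copy number attribute is, but check your FME version), and `format_for_copy` is a hypothetical helper name.

```python
def format_for_copy(copy_number, chosen_formats):
    """Return the format assigned to a given clone.

    Assumes copy_number is 0-based and that exactly
    len(chosen_formats) copies were created.
    """
    if not 0 <= copy_number < len(chosen_formats):
        raise IndexError("copy number %d has no matching format" % copy_number)
    return chosen_formats[copy_number]


chosen = ["ESRISHAPE", "CSV", "XLSXW"]
number_of_copies = len(chosen)   # value to give the Cloner
assignments = [format_for_copy(i, chosen) for i in range(number_of_copies)]
```

With this mapping every clone gets exactly one format, so no feature is created for a format the user did not ask for.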

Hope this helps.

@Mark2AtSafe

Hi Mark,

I used a TestFilter in this way. Is this costlier than a Tester?

Then only one AttributeFilter (on the FORMAT attribute) is required to filter features based on the format requested.

OK, I will implement this. I have one question here.

Does this Python code copy the features? If yes, then this is also a huge load for FME, right?

If there are 1 million records in the source and the user wants to write three formats,

then the code will copy 3 million records. Is that true?

So you are essentially cloning the features 6 times, regardless of how many formats are chosen.

@jdh, not 6 times, I think, but (number of formats) times.

If it is two formats, then two times.

Correct me if I am wrong.

I will also try the FeatureWriter with the Generic format.

[Mark said] A copy occurs only when that test is true...

But features still go to each Tester (even the unwanted ones).

Cloner: a good choice, I think.

@Mark2atsafe

@Larry

Is the Cloner faster, or is Python (Larry's code) faster?

Which one will execute faster?

Mark2AtSafe

@Mark2AtSafe

Hi Mark,

I used a TestFilter in this way. Is this costlier than a Tester?

It's certainly costlier the way you have it set up, because you are always creating 6 copies of the data. Basically your transformers operate in parallel when you want to have them in series (if that makes sense). It's not a difference between the Tester and TestFilter, just a difference in how the connections are set up. If you connect your TestFilters the same way I connected my Testers then it would help for sure.

Mark2AtSafe

[Mark said] A copy occurs only when that test is true...

But features still go to each Tester (even the unwanted ones).

Cloner: a good choice, I think.

Yes, features will always go to each Tester, but only one copy of the data and it is passed through (not duplicated). A copy is created only where you actually need the data. In your workspace you are creating a copy whether you need it or not.

OK, I will implement this. I have one question here.

Does this Python code copy the features? If yes, then this is also a huge load for FME, right?

If there are 1 million records in the source and the user wants to write three formats,

then the code will copy 3 million records. Is that true?

Yes, but I don't see any other way to implement this without having a copy of the feature for each required format.
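
To put numbers on the approaches discussed in this thread, here is a small Python sketch of the feature counts. The function name and dictionary keys are made up for the example; the formulas come from the replies above (six copies in the original layout, at most n+1 with chained Testers, n with one copy per requested format).

```python
def features_handled(records, chosen, available=6):
    """Estimate how many feature copies each workspace layout creates.

    records   -- number of source features
    chosen    -- number of formats the user requested (n)
    available -- number of format branches in the workspace
    """
    return {
        # Original layout: every branch gets a copy, requested or not.
        "all_branches": records * available,
        # Testers chained in series: at most n + 1 copies.
        "chained_testers_max": records * (chosen + 1),
        # Clone per requested format (PythonCaller or Cloner).
        "one_copy_per_format": records * chosen,
    }


counts = features_handled(1_000_000, 3)
```

For 1 million records and three formats this gives 6 million, at most 4 million, and 3 million features respectively, so the 3 million copies are the minimum rather than extra load.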

@Mark2atsafe

@Larry

Is the Cloner faster, or is Python (Larry's code) faster?

Which one will execute faster?

The Cloner should be faster, but you'll have to test to see what the impact is.
