Solved

How to write to many formats in an optimized way?


f.kemminje
Contributor

The user selects one feature class (a published parameter) and FME writes it to between one and six formats (CSV, Shape, DGN, ACAD, XLS, MapInfo, etc.).

In the workspace (.fmw) I ask which formats to write via a published parameter:

output_format = dwg,shape,xls (comma-separated)

Then I use a Python script to split these values and create a new attribute for each format, set to "YES" when that format was requested:

import fme
import fmeobjects

# Token in the requested-format list -> flag attribute set on the feature.
# Note that the "DWF" token sets the AUTOCAD_DWF attribute.
FLAGS = {
    "ESRISHAPE": "ESRISHAPE",
    "XLSXW": "XLSXW",
    "CSV": "CSV",
    "AUTOCAD_OD": "AUTOCAD_OD",
    "DWF": "AUTOCAD_DWF",
    "FILEGDB": "FILEGDB",
}

def frq(feature):
    # '_list{}' holds the formats requested by the user; treat a missing list as empty.
    my_list = feature.getAttribute('_list{}') or []
    for item, flag in FLAGS.items():
        # "YES" if the format was requested, otherwise an empty string.
        feature.setAttribute(flag, "YES" if item in my_list else "")

At the end I check these values using multiple TestFilters and send the features to each writer.

Now my question is:

If the user wants to write only three formats, all the features still go to every TestFilter transformer (six filters for six formats):

dwg = yes, then the feature goes to the DWG writer

dgn = yes, then the feature goes to the DGN writer

...

All the features go to six different filters that test the attribute value.

I think the same workspace could be written in a more optimized way. My worry is: if I only have to write three formats, why should all the features go through six filters? That is a huge load for FME, right?

Can we do something to optimize this?
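
The splitting step mentioned in the question is not shown in the thread. A minimal sketch of it in plain Python, assuming the published parameter arrives as a single comma-separated string. `parse_formats` is a hypothetical helper name, not part of FME, and the upper-casing is an assumption made because the code in this thread compares against upper-case names:

```python
def parse_formats(raw):
    """Split a comma-separated format string into a clean, de-duplicated list."""
    seen = []
    for token in (raw or "").split(","):
        # Upper-casing is an assumption; drop it if names are case-sensitive.
        name = token.strip().upper()
        # Skip empty tokens (e.g. a trailing comma) and repeats.
        if name and name not in seen:
            seen.append(name)
    return seen


print(parse_formats("dwg,shape, xls"))  # ['DWG', 'SHAPE', 'XLS']
```

Stripping whitespace matters here because the example value in the question contains a space after one of the commas.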

Best answer by larry

In your PythonCaller, I would replace the function with a class and, instead of creating six attributes, create one copy of the feature for each format requested by the user, with the attribute "FORMAT" set to that format.

Then only one AttributeFilter (on the FORMAT attribute) is required to route the features to the requested formats.

Python code:

import fme
import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        pass

    def input(self, feature):
        # '_list{}' holds the formats requested by the user.
        my_list = feature.getAttribute('_list{}')
        for item in my_list:
            # Emit one tagged copy of the feature per requested format.
            if item == "ESRISHAPE":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "ESRISHAPE")
                self.pyoutput(outputFeature)
            if item == "FILEGDB":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "FILEGDB")
                self.pyoutput(outputFeature)
            if item == "XLSXW":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "XLSXW")
                self.pyoutput(outputFeature)
            if item == "CSV":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "CSV")
                self.pyoutput(outputFeature)
            if item == "AUTOCAD_OD":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "AUTOCAD_OD")
                self.pyoutput(outputFeature)
            if item == "DWF":
                outputFeature = feature.clone()
                outputFeature.setAttribute("FORMAT", "AUTOCAD_DWF")
                self.pyoutput(outputFeature)

    def close(self):
        pass
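
To see how many features this approach emits without running FME, the same loop can be exercised in plain Python. This is only a sketch: `StubFeature` is a hypothetical stand-in for an FME feature (it is not the fmeobjects API), and `fan_out` is a hypothetical function that mirrors the logic of the `input` method.

```python
import copy

# Token requested by the user -> value written to the FORMAT attribute.
FORMATS = {
    "ESRISHAPE": "ESRISHAPE",
    "FILEGDB": "FILEGDB",
    "XLSXW": "XLSXW",
    "CSV": "CSV",
    "AUTOCAD_OD": "AUTOCAD_OD",
    "DWF": "AUTOCAD_DWF",
}


class StubFeature:
    """Hypothetical stand-in for an FME feature: just a dict of attributes."""

    def __init__(self, attributes=None):
        self.attributes = dict(attributes or {})

    def clone(self):
        return StubFeature(copy.deepcopy(self.attributes))


def fan_out(feature, requested):
    """Return one tagged copy of the feature per recognised requested format."""
    copies = []
    for item in requested:
        if item in FORMATS:
            out = feature.clone()
            out.attributes["FORMAT"] = FORMATS[item]
            copies.append(out)
    return copies


copies = fan_out(StubFeature({"id": 1}), ["ESRISHAPE", "CSV", "DWF"])
print([c.attributes["FORMAT"] for c in copies])  # ['ESRISHAPE', 'CSV', 'AUTOCAD_DWF']
```

Three requested formats give three output features per input feature, and formats that were not requested produce nothing.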


13 replies

erik_jan
  • Contributor
  • October 5, 2017

I used a similar process:

Have a "Choice with alias (multiple)" published parameter with a list of formats.

Then use a Tester transformer to see if the format is in the chosen list (e.g. the parameter contains ESRISHAPE for Shape output).

Then use a dynamic Shape writer to write to Shapefiles.

Repeat this for each format in the list.

If you allow the user to pick just one format at a time, you can use a "Choice with alias" published parameter with a list of formats instead, followed by the Generic writer.

More information here


jdh
  • Contributor
  • October 5, 2017

So you are essentially cloning the features 6 times, regardless of how many formats are chosen.

What about simply exploding the format list and using a FeatureWriter set to the Generic format? In the parameters you can set the Output Format to an attribute. All you need to do is ensure that your format names exactly match those used by FME.
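
Because the names have to match exactly, it may help to check the requested list before it reaches the writer. A minimal sketch in plain Python: `split_known` is a hypothetical helper, and `KNOWN_FORMATS` only contains names that appear in the code in this thread, so treat it as an assumption rather than an authoritative list of FME format names.

```python
# Format names taken from the code in this thread; assumed, not authoritative.
KNOWN_FORMATS = {"ESRISHAPE", "FILEGDB", "XLSXW", "CSV", "AUTOCAD_OD"}


def split_known(requested, known=KNOWN_FORMATS):
    """Partition requested names into (usable, rejected), comparing exactly."""
    usable = [name for name in requested if name in known]
    rejected = [name for name in requested if name not in known]
    return usable, rejected


usable, rejected = split_known(["ESRISHAPE", "shape", "CSV"])
print(usable)    # ['ESRISHAPE', 'CSV']
print(rejected)  # ['shape']
```

The comparison is deliberately case-sensitive, so a value such as `shape` is reported instead of being passed on silently.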


larry
  • Best Answer
  • October 5, 2017



mark2atsafe
  • Safer
  • October 5, 2017

Does it look like this?

Notice both Passed/Failed are connected to the next Tester, but only one of them should ever occur.

I think if you do this then you will have only one extra copy. Yes, the data goes to each Tester, but it passes through because the Tester is not group-based. You don't get a copy of the data at every Tester, a copy occurs only when that test is true. So if the number of formats is n, then you'll get n+1 copies at most (and maybe not even that since the final tester has no failed output connected).

I think what might be more important is to be using a FeatureWriter transformer, instead of a writer. Then things will happen in parallel and the data won't be cached at each writer. eg - see Dale's answer to a previous question.

If you really want to make sure you only get n copies of the data, then make a count of the chosen formats and use a Cloner to create that many copies. Then you can use a single TestFilter to separate each cloned set and point it to the correct writer (you'd need to be able to map the clone number to a particular format, but that shouldn't be too hard).

Hope this helps.
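
The mapping from clone number to format could be a simple list lookup. A sketch in plain Python, assuming the Cloner numbers its copies from 0 (check the transformer's copy number setting in your workspace); `format_for_copy` is a hypothetical helper name:

```python
def format_for_copy(copy_number, chosen_formats):
    """Return the format that a given clone should be routed to."""
    if not 0 <= copy_number < len(chosen_formats):
        raise ValueError("copy number %d has no matching format" % copy_number)
    return chosen_formats[copy_number]


chosen = ["ESRISHAPE", "XLSXW", "CSV"]  # formats picked by the user
number_of_copies = len(chosen)          # value to give the Cloner

for n in range(number_of_copies):
    print(n, format_for_copy(n, chosen))
```

With this lookup the number of copies always equals the number of chosen formats, and a copy number outside that range is reported as an error.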


f.kemminje
  • Author
  • Contributor
  • October 6, 2017

@Mark2AtSafe

Hi Mark,

I used the TestFilter in this way. Is this costlier than the Tester?


f.kemminje
  • Author
  • Contributor
  • October 6, 2017

@larry OK, I will implement this. I have one question here.

Does this Python code copy the features? If yes, then this is also a huge load for FME, right?

If there are 1 million records in the source and the user wants to write three formats, then the code will copy 3 million records. Is that true?

f.kemminje
  • Author
  • Contributor
  • October 6, 2017

@jdh, not 6 times I think, but (number of formats) times.

If it is two formats, then two times.

Correct me if I am wrong.

I will also try the FeatureWriter with the Generic format.


f.kemminje
  • Author
  • Contributor
  • October 6, 2017

[Mark said] a copy occurs only when that test is true...

But the features still go to each Tester (even the unwanted ones).

Cloner: a good choice, I think.


f.kemminje
  • Author
  • Contributor
  • October 6, 2017

@Mark2atsafe

@Larry

Is the Cloner faster, or is Python (Larry's code) faster?

Which one will execute faster?


mark2atsafe
  • Safer
  • October 6, 2017
f.kemminje wrote:

@Mark2AtSafe

Hi Mark,

I used the TestFilter in this way. Is this costlier than the Tester?

It's certainly costlier the way you have it set up, because you are always creating 6 copies of the data. Basically your transformers operate in parallel when you want to have them in series (if that makes sense). It's not a difference between the Tester and TestFilter, just a difference in how the connections are set up. If you connect your TestFilters the same way I connected my Testers then it would help for sure.

 


mark2atsafe
  • Safer
  • October 6, 2017
f.kemminje wrote:

[Mark said] a copy occurs only when that test is true...

But the features still go to each Tester (even the unwanted ones).

Cloner: a good choice, I think.

Yes, features will always go to each Tester, but only one copy of the data and it is passed through (not duplicated). A copy is created only where you actually need the data. In your workspace you are creating a copy whether you need it or not.

 


  • October 6, 2017
f.kemminje wrote:

OK, I will implement this. I have one question here.

Does this Python code copy the features? If yes, then this is also a huge load for FME, right?

If there are 1 million records in the source and the user wants to write three formats, then the code will copy 3 million records. Is that true?
 

Yes, but I don't see any other way to implement this without having a copy of the feature for each required format.
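
The feature counts discussed in this thread can be compared with simple arithmetic. A sketch in plain Python: the function names are hypothetical, and the counts follow the descriptions given in the replies (six copies for the parallel TestFilter layout, at most n + 1 for Testers connected in series, and n when one copy is made per chosen format).

```python
def copies_parallel(source_count, total_filters=6):
    # Every feature is sent to every filter, whether that format was chosen or not.
    return source_count * total_filters


def copies_series(source_count, chosen):
    # Upper bound quoted in the thread: n + 1 copies for n chosen formats.
    return source_count * (chosen + 1)


def copies_per_format(source_count, chosen):
    # One copy per chosen format (Cloner or PythonCaller approach).
    return source_count * chosen


records = 1_000_000
print(copies_parallel(records))        # 6000000
print(copies_series(records, 3))       # 4000000
print(copies_per_format(records, 3))   # 3000000
```

These are feature counts only. They say nothing about run time, which would have to be measured in the actual workspace.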

 

 


  • October 6, 2017
f.kemminje wrote:

@Mark2atsafe

@Larry

Is the Cloner faster, or is Python (Larry's code) faster?

Which one will execute faster?

The Cloner should be faster, but you'll have to test to see what the impact is.
