I was hoping to get your input into how to speed up the following workflow: Read an Excel File with several sheets from a Network Share → Output each sheet to a different output port → Create a CSV file for each output port in a network share.
I already have the above working, the problem is that CSV writer is taking a loooong time writing the files. My connection to the network is pretty good so it is not a matter of latency between my PC and the network share (at least I don’t think so).
Does anyone have any idea on how to improve the writing speed of the CSV files or what to do improve the overall time it takes for the workflow to complete?.
Thanks in advance.
RZ.-
Page 1 / 1
Relatively speaking, networks are really slow.
How’s the comparative performance if you write the CSV to a local folder rather than the network? If that’s much faster, then it’s indeed the network connection that is slowing it down.
Hi,
So I have test it locally and it takes a little over 2 minutes to for the CSV Writers to create the CSV files (it creates 3 different files, one each based on each sheet from the original excel file).
However, when creating the files in the network it takes over 30 mins, that I think is just way to long even if doing so over a network connection. And the files created are about 15MB aprox for the biggest one. I mean I would understand 5 or even 10mins, but over 30mins that seems a bit too much. Any ideas to speed it up?
Thanks again.
RZ.-
Consider writing the CSV files to a local, temporary folder (use the TempPathnameCreator for this) using the FeatureWriter, then moving the resulting files to the network share at the end. That is usually much quicker because of less I/O operations over the network.
Thanks David,
That’s actually what I’m working on right now, requirement just change to have the same files in multiple locations so I was thinking of using FileCopy to create multiple copies of the file once created in a temp folder.
I’m new to FME so I’m not familiar with all the transformers available didn’t knew about this TempPathnameCreator transformer, I’ll play around with it to see if I can make it work.
Do I understand correctly that the temp folder created by this transformer is available only during the execution of the workspace and all content including the file is deleted once the workplace finishes? Thanks again,
RZ.-
Do I understand correctly that the temp folder created by this transformer is available only during the execution of the workspace and all content including the file is deleted once the workplace finishes?
Yes, that’s correct. Of course you could also do it yourself, but you’ll get a lot of functionality for free using this transformer instead.
@david_r
Thanks for the suggestion, unfortunately since I am new to FME I was not really able to make the TempPathnameCreator transformer work. I can put the transformer after the Feature Reader because I’m creating 3 x Outputs from the excel sheets, when I pass all 3 outputs into the TempPathnameCreator it bundles all the 3 x outputs into one, so I can’t create separate CSV Files with the FeatureWriter, that is why I linked it with the creator and pass it down to the FeatureWriter directily.
I then set the Dataset for the FeatureWriter to _pathname for each input in the writer. Later with FileCopy I take the _dataset information (which stores the files location from the summary output port of the writer and create a copy of the files to their final destination in 2 different folders.
But it doesn’t work, the translation log gives me the following error: FeatureWriter (WriterFactory): MULTI_WRITER: No dataset was specified for MULTI_WRITER_DATASET or FeatureWriter_0_DATASET or CSV2_DATASET My guess is that the temporary path is not being passed down so it doesn’t know were to write the first CSV file.
It's good to know: In FME if you connect two transformers to a single port of another transformer you get 2 sets of Features and not one set of Features with both attributes.
If you want to bring the attributes of one Feature to another you could use a FeatureMerger, but in most cases you connect the Transformers in Serie not in Parallel.
Thumbs up for using Creator → FeatureReader and not a Reader.
@jkr_wrk Thanks for the idea, I’m still not able to make it work.
I tried simplifying things by just extracting one sheet (figure I would make it work first for one and then work on making it work for 3), but when putting the TempPathnameCreator before the Reader I get the message in the Writer that the _pathname attribute does not exists.
The _pathname attribute is added to the <Schema> output port in the Reader and not to the “Live” output port, if I try to expose it in the “Live” (via Parameters in the Reader) it gives me an error and doesn’t allow me to set it:
I even tried adding in the Shcema/Data Features option in the Reader:
But still the attribute is not pass to the Writer.
Like I said not much experience with FME so I guess I’m missing something that is so obvious that I cannot see it
Right now the flow looks like this:
If you have any other ideas I would be greatfull.
Regards,
RZ.-
Yes!!, thanks @jkr_wrk that did the trick and it works for each sheet as well.
Thank you both @david_r, @jkr_wrk for your help!
I will mark this post as answered, the final solution to improve the speed of writing the CSV file was:
Change the mindset: Instead of writing each CSV file (in this case 3) to 2 different locations, write the CSV files into a temporary location in the FME Flow server using the TempPathnameCreator (before reading the original excel file).
Once all CSV files are created in the temporary location, copy them to the different destinations.
This approach change the overall processing time from 30 minutes to just under 2 minutes.