Question

How can I convert this script to run in FME so that the paths of all Excel (.xlsx) files are written out to an attribute? I want to go through every folder on my list, recursing into subfolders, to identify all .xlsx files and return their file paths.

1 year ago
August 24, 2023
16 replies
47 views

fme
Contributor
31 replies

import os

def get_excel_file_paths(folder_path):
    excel_file_paths = []
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.endswith(".xlsx"):
                excel_file_paths.append(os.path.join(root, file))
    return excel_file_paths

# I have a list of folders in a CSV file, which I read into the workspace,
# so I plan to use that attribute here.
folder_path = "/path/to/my/folder"

excel_paths = get_excel_file_paths(folder_path)

for path in excel_paths:
    print(path)

I'm having trouble converting this to run in FME.

Thanks!
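For reference, here is a minimal sketch of how the script above could loop over the folder list from the CSV when run outside FME. It assumes the CSV has a header row with a column named folder_path; that column name and the helper get_excel_paths_from_csv are hypothetical, not something from the original workspace.

```python
import csv
import os


def get_excel_file_paths(folder_path):
    """Walk folder_path and all subfolders, returning every .xlsx path."""
    excel_file_paths = []
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            # Compare case-insensitively so "REPORT.XLSX" is also matched.
            if name.lower().endswith(".xlsx"):
                excel_file_paths.append(os.path.join(root, name))
    return excel_file_paths


def get_excel_paths_from_csv(csv_path, column="folder_path"):
    """Read one folder per CSV row and collect the .xlsx paths under each.

    The column name "folder_path" is an assumption made for this sketch.
    """
    all_paths = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            folder = (row.get(column) or "").strip()
            # Skip blank rows and folders that do not exist.
            if folder and os.path.isdir(folder):
                all_paths.extend(get_excel_file_paths(folder))
    return all_paths
```

Missing or blank folders are skipped silently here; logging them instead may be preferable when the folder list is long.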

ebygomm
Influencer
3267 replies
1 year ago
August 24, 2023

A Directory and File Pathnames reader would be the FME way to achieve this

boydfme
Contributor
24 replies
1 year ago
August 24, 2023

Basic Single Implementation

You can use a PATH (Directory and File Pathnames) reader to read in a single folder location.

Then set the path filter to match Excel files (*.xlsx) and choose Yes for looking in subfolders.

Once set up, you will get the file paths and other ancillary information as attributes. This covers searching one folder location, including its subfolders (if present).

For a recursive approach using your csv:

You could read in the CSV and connect it to a FeatureReader transformer, configured as a PATH reader with the same settings as above. To make it dynamic, use the CSV attribute that contains the folder directories to populate the base folder path in the FeatureReader settings.

Each "feature" from the CSV then enters the FeatureReader, sets the base folder where it should look, and finds all the .xlsx files under it.

fme
Author
Contributor
31 replies
1 year ago
August 24, 2023

ebygomm wrote:

A Directory and File Pathnames reader would be the FME way to achieve this

@ebygomm could you please share the workspace in the screenshot? I'm trying it out now, but the FeatureReader does not seem to give me the ports shown in your screenshot, and it also failed.

I did not think the FeatureReader offered the Directory and File Pathnames reader, so I was looking at more of a Python implementation, as it needs to go through lots of folders to retrieve the file paths. However, I will try this out and monitor its efficiency. Thanks

fme
Author
Contributor
31 replies
1 year ago
August 24, 2023

@boydfme Thanks for your response! It runs but fails partway through. Also, there are no output ports for path_windows etc., as there are when using the Directory and File Pathnames reader by itself.

redgeographics
Celebrity
3626 replies
1 year ago
August 24, 2023

@fme you've said, twice, that "it fails". Can you elaborate? Are you getting an error message and if so, which one? How exactly have you set up your workspace?

fme
Author
Contributor
31 replies
1 year ago
August 24, 2023

redgeographics wrote:

@fme you've said, twice, that "it fails". Can you elaborate? Are you getting an error message and if so, which one? How exactly have you set up your workspace?

@Hans van der Maarel I've resolved the error. However, after running I don't see port outputs like path_windows, path_unix, etc. (the output ports seen when using the standalone Directory and File Pathnames reader). Do I need to configure something else on the FeatureReader to see these?

ebygomm
Influencer
3267 replies
1 year ago
August 24, 2023

fme wrote:

@Hans van der Maarel I've resolved the error. However, after running I don't see port outputs like path_windows, path_unix, etc. (the output ports seen when using the standalone Directory and File Pathnames reader). Do I need to configure something else on the FeatureReader to see these?

In the FeatureReader, if you are using the Directory and File Pathnames reader, the easiest thing is to set the Output Port to Single Output Port and then choose which attributes to expose. Features with those attributes should then exit the Generic port.

fme
Author
Contributor
31 replies
1 year ago
August 24, 2023

Thanks, @ebygomm!

fme
Author
Contributor
31 replies
1 year ago
August 24, 2023

@ebygomm Can the path reader filter two types of files, e.g. PDF and XLSX, instead of using separate path readers?

boydfme
Contributor
24 replies
1 year ago
August 24, 2023

Yes, a single path filter can match both types. For example:

*.{pdf,xlsx}
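If the same two-type filter is ever needed in a Python approach, note that Python's glob module does not expand brace patterns such as {pdf,xlsx}. A rough equivalent, sketched here with a hypothetical helper named find_files, is to walk the tree once and test each name against a tuple of extensions:

```python
import os


def find_files(folder_path, extensions=(".pdf", ".xlsx")):
    """Return paths under folder_path whose extension is in extensions."""
    wanted = tuple(ext.lower() for ext in extensions)
    matches = []
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            # str.endswith accepts a tuple, so one pass covers every extension.
            if name.lower().endswith(wanted):
                matches.append(os.path.join(root, name))
    return matches
```

Walking once and checking several extensions avoids scanning the same folders twice, which matters when the tree is large.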

fme
Author
Contributor
31 replies
1 year ago
August 24, 2023

@boydfme Thank you! Appreciate the help!

fme
Author
Contributor
31 replies
1 year ago
August 29, 2023

Is there a way to implement batch processing in FME Workbench? The path reader is currently taking days to go through the folders to retrieve the files. Is there a more efficient way to do this?

ebygomm
Influencer
3267 replies
1 year ago
August 29, 2023

fme wrote:

Is there a way to implement batch processing in FME Workbench? The path reader is currently taking days to go through the folders to retrieve the files. Is there a more efficient way to do this?

I've seen reports previously of poor performance with the path reader when recurse is set to yes

Perhaps you could see if a python solution is any quicker?

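import fme
import fmeobjects
import glob

class FeatureProcessor(object):

    def __init__(self):
        pass

    def input(self, feature):
        # Folder to search, read from the incoming feature's 'directory' attribute
        root_dir = feature.getAttribute('directory')
        # '**' with recursive=True matches files in root_dir and all its subfolders
        for filename in glob.iglob(root_dir + '/**/*.xlsx', recursive=True):
            # Output one feature per Excel file found
            feature.setAttribute("filename", filename)
            self.pyoutput(feature)

    def close(self):
        pass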
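To check whether a Python search really is quicker before wiring it into a workspace, the two standard-library approaches can be timed outside FME. This is only a sketch; the function names find_with_glob, find_with_walk and time_search are hypothetical, and results will depend heavily on the storage being scanned.

```python
import glob
import os
import time


def find_with_glob(root_dir):
    """Recursive search using glob with a '**' pattern."""
    # glob.escape protects folder names that contain characters like '['.
    pattern = os.path.join(glob.escape(root_dir), "**", "*.xlsx")
    return list(glob.iglob(pattern, recursive=True))


def find_with_walk(root_dir):
    """Recursive search using os.walk, as in the original question."""
    found = []
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(".xlsx"):
                found.append(os.path.join(root, name))
    return found


def time_search(search, root_dir):
    """Return (number of files found, elapsed seconds) for one search."""
    start = time.perf_counter()
    paths = search(root_dir)
    return len(paths), time.perf_counter() - start
```

Running time_search with each function against one representative folder from the CSV gives a rough comparison. Note that the two can differ slightly in what they return: glob skips hidden files and folders by default, while os.walk does not.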

fme
Author
Contributor
31 replies
1 year ago
August 30, 2023

Thank you, @ebygomm. I got an error after replacing 'directory' with path_windows, which comes from the path reader that reads the folder names. Attached is a screenshot of the error. Thanks for all the help!

1 Attachments

ebygomm
Influencer
3267 replies
1 year ago
August 30, 2023

You've duplicated feature.getAttribute when changing the attribute name in the line starting with root_dir.
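The screenshot is not reproduced here, so the exact broken line is unknown; presumably it nested one getAttribute call inside another. As a sketch of the intended form, the lookup should call getAttribute once with the attribute name as a string. StubFeature below is a hypothetical stand-in so the example runs outside FME, and xlsx_paths_for is an illustrative helper, not part of the FME API.

```python
import glob
import os


class StubFeature:
    """Stand-in for an FME feature, used only so this sketch runs outside FME."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def getAttribute(self, name):
        # This stub returns None for a missing attribute (sketch behaviour).
        return self._attributes.get(name)


def xlsx_paths_for(feature, attribute_name="path_windows"):
    """Return the .xlsx paths under the folder named by one attribute."""
    # Call getAttribute exactly once, passing the attribute NAME as a string.
    # Nesting it, e.g. feature.getAttribute(feature.getAttribute(...)),
    # would look up an attribute named after the folder path instead.
    root_dir = feature.getAttribute(attribute_name)
    if not root_dir:
        return []
    pattern = os.path.join(glob.escape(root_dir), "**", "*.xlsx")
    return sorted(glob.iglob(pattern, recursive=True))
```

In the PythonCaller itself, the equivalent change is limited to the attribute name inside the single getAttribute call on the root_dir line.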

fme
Author
Contributor
31 replies
1 year ago
August 31, 2023

Thank you, @ebygomm. I am currently using a WorkspaceRunner to go through all the files within the folders to be processed. However, I noticed something strange: when I set Workspace run per FME process to 1, with Maximum Concurrent FME processes set to 7, I get more records in my output than when I increase that number. It looks like data is lost when Workspace run per FME process is greater. With a low number it takes hours to process the files, and I would like it to work with Workspace run per FME process set to 1000 or more. Is this a bug in the WorkspaceRunner, or am I missing something?
