Question

How can I convert this script to run in FME so that the paths of all Excel (.xlsx) files are output as an attribute? I want to recurse into every folder on my list, including all subfolders, to find every .xlsx file and return its file path.


fme
Contributor

import os

def get_excel_file_paths(folder_path):
    excel_file_paths = []
    for root, dirs, files in os.walk(folder_path):
        for name in files:
            # Case-insensitive match so .XLSX files are found too
            if name.lower().endswith(".xlsx"):
                excel_file_paths.append(os.path.join(root, name))
    return excel_file_paths

# I have a list of folders in a CSV file, which I read into the workspace,
# so I plan to use that attribute here instead of a fixed path.
folder_path = "/path/to/my/folder"
excel_paths = get_excel_file_paths(folder_path)

for path in excel_paths:
    print(path)

 

I'm having trouble converting this to run in FME.

 

Thanks!
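A minimal sketch of how the function above could be wrapped for a PythonCaller, assuming the CSV column holding each folder is exposed as an attribute named `folder_path` and that the result goes into an attribute named `excel_path` (both hypothetical names; use your own). It emits one output feature per Excel file found. FME supplies `self.pyoutput` at run time; outside FME you have to provide it yourself.

```python
import os


def get_excel_file_paths(folder_path, extensions=(".xlsx",)):
    """Walk folder_path and every subfolder, returning matching file paths."""
    excel_file_paths = []
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            # Compare case-insensitively so "Report.XLSX" is also found.
            if name.lower().endswith(extensions):
                excel_file_paths.append(os.path.join(root, name))
    return excel_file_paths


class FeatureProcessor(object):
    """PythonCaller entry point: one output feature per Excel file found."""

    # Hypothetical attribute name; set it to the CSV column holding the folder.
    FOLDER_ATTR = "folder_path"

    def input(self, feature):
        folder = feature.getAttribute(self.FOLDER_ATTR)
        if not folder:
            return  # no folder on this feature, nothing to search
        for path in get_excel_file_paths(folder):
            out = feature.clone()  # copy so each output keeps its own path
            out.setAttribute("excel_path", path)  # hypothetical output attribute
            self.pyoutput(out)

    def close(self):
        pass
```

For a quick local check outside FME, you can assign the output hook yourself, e.g. `proc.pyoutput = results.append`, and pass in any object that has `getAttribute`, `setAttribute` and `clone`.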

16 replies

ebygomm
Influencer
  • August 24, 2023

A Directory and File Pathnames reader would be the FME way to achieve this.

[Screenshot]


boydfme
Contributor
  • August 24, 2023

Basic Single Implementation

You can use a Path reader (Directory and File Pathnames) to read a single folder location:

[Screenshot: Path reader]

Then set the filter to match Excel files (*.xlsx) and choose Yes to look in subfolders:

[Screenshot]

Once set up, you will get the file paths and other ancillary information as attributes. This covers searching one folder location, including its subfolders (if present).
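For comparison, the two reader settings described above (a file filter and whether to look in subfolders) map onto pathlib's `glob` and `rglob`. This is a plain-Python sketch of that behaviour, not how the Path reader works internally; `list_paths` is a hypothetical helper name. Note that pathlib matching is case-sensitive on Linux and case-insensitive on Windows.

```python
from pathlib import Path


def list_paths(folder, pattern="*.xlsx", recurse=True):
    """Return matching file paths under folder as sorted strings.

    recurse=True also searches subfolders (like answering Yes to the
    subfolder option); recurse=False looks in folder only.
    """
    base = Path(folder)
    matches = base.rglob(pattern) if recurse else base.glob(pattern)
    # Keep files only; a folder whose name ends in .xlsx would otherwise match.
    return sorted(str(p) for p in matches if p.is_file())
```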

 

To search every folder listed in your CSV:

Read in the CSV and connect it to a FeatureReader transformer, configured as above to use the PATH (Directory and File Pathnames) reader. To make it dynamic, set the FeatureReader's base folder (dataset) to the CSV attribute that holds the folder directories you need to search.

Each feature from the CSV will then enter the FeatureReader, supply the base folder where it should look, and return all the .xlsx files in that folder and its subfolders.
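Outside FME, the same one-search-per-CSV-row pattern looks roughly like this. It is a sketch assuming the CSV has a header row with a column named `folder` (a hypothetical name; match it to your file), and it yields one (folder, file path) pair per Excel file, which is the shape of output you would expect from the FeatureReader set up this way.

```python
import csv
import os


def excel_files_from_csv(csv_path, column="folder", extensions=(".xlsx",)):
    """Yield (folder, file_path) for every matching file under each listed folder."""
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            folder = (row.get(column) or "").strip()
            if not folder:
                continue  # skip rows without a folder value
            # One recursive search per CSV row, like one FeatureReader run per feature.
            for root, _dirs, files in os.walk(folder):
                for name in files:
                    if name.lower().endswith(extensions):
                        yield folder, os.path.join(root, name)
```

Usage: `for folder, path in excel_files_from_csv("folders.csv"): print(folder, path)`. Keeping the source folder alongside each path makes it easy to check afterwards which folders produced no files.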

 

 

 


fme
Contributor
  • Author
  • August 24, 2023

@ebygomm could you please share the workspace from the screenshot? I'm trying it out now, but the FeatureReader doesn't seem to give me the ports shown in your screenshot, and the run also failed.

 

I didn't think the FeatureReader supported the Directory and File Pathnames reader, so I was looking at a Python implementation, since it needs to go through lots of folders to retrieve the file paths. However, I will try this out and monitor its performance. Thanks!


fme
Contributor
  • Author
  • August 24, 2023

@boydfme Thanks for your response! It runs but fails partway through. Also, there are no output ports for path_windows etc., as there are when using the Directory and File Pathnames reader on its own.


redgeographics
Celebrity

@fme you've said, twice, that "it fails". Can you elaborate? Are you getting an error message and if so, which one? How exactly have you set up your workspace?


fme
Contributor
  • Author
  • August 24, 2023

@Hans van der Maarel I've resolved the error. However, after running, I don't see output ports like path_windows, path_unix, etc. (the ports I see when using the standalone Directory and File Pathnames reader). Do I need to configure something else on the FeatureReader to see these?


ebygomm
Influencer
  • August 24, 2023

In the FeatureReader, when using the Directory and File Pathnames reader, the easiest approach is to set Output Ports to Single Output Port and then choose which attributes to expose. Features with those attributes should then exit the Generic port.

[Screenshot]


fme
Contributor
  • Author
  • August 24, 2023

Thanks! @ebygomm


fme
Contributor
  • Author
  • August 24, 2023

@ebygomm Can the Path reader filter for two file types, e.g. PDF and XLSX, instead of using separate Path readers?


boydfme
Contributor
  • August 24, 2023

Yes. For example, use this filter:

*.{pdf,xlsx}
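If you take the Python route instead, note that Python's `glob` and `fnmatch` do not expand `{pdf,xlsx}`-style braces, so the pattern has to be split first. A minimal sketch that handles simple, non-nested brace groups; `expand_braces` and `matches_any` are hypothetical helper names.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups: '*.{pdf,xlsx}' -> ['*.pdf', '*.xlsx']."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if not m:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for option in m.group(1).split(","):
        # Recurse so any further groups later in the pattern are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches_any(filename, pattern):
    """True if filename matches any expansion of pattern (case-insensitive)."""
    name = filename.lower()
    return any(fnmatch.fnmatchcase(name, p.lower()) for p in expand_braces(pattern))
```

For example, `matches_any("Report.PDF", "*.{pdf,xlsx}")` is true, while `matches_any("notes.docx", "*.{pdf,xlsx}")` is false.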

 

 


fme
Contributor
  • Author
  • August 24, 2023

@boydfme Thank you! I appreciate the help!


fme
Contributor
  • Author
  • August 29, 2023

Is there a way to implement batch processing in FME Workbench? The Path reader is currently taking days to go through the folders and retrieve the files. Is there a more efficient way to do this?


ebygomm
Influencer
  • August 29, 2023

I've seen previous reports of poor performance with the Path reader when recursion into subfolders is set to Yes.

Perhaps you could test whether a Python solution is any quicker?

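import os
import glob
import fme
import fmeobjects

class FeatureProcessor(object):

    def __init__(self):
        pass

    def input(self, feature):
        # Attribute holding the folder to search; change it to match your
        # data (e.g. path_windows from a Path reader).
        root_dir = feature.getAttribute('directory')
        if not root_dir:
            return  # nothing to search for this feature
        # glob.escape protects folder names containing [, ] or *;
        # '**' with recursive=True descends into all subfolders.
        pattern = os.path.join(glob.escape(root_dir), '**', '*.xlsx')
        for filename in glob.iglob(pattern, recursive=True):
            # Output a copy per file so each output feature keeps its own path.
            out = feature.clone()
            out.setAttribute("filename", filename)
            self.pyoutput(out)

    def close(self):
        pass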

[Screenshot]
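One thing that may be worth trying on a slow or networked file system is scanning several top-level folders at once. The sketch below uses `os.scandir` and a thread pool from the standard library; whether it is actually faster than the Path reader or a single `glob` call depends on your storage, so time it on a sample of your folders first. `scan_folder`, `scan_all` and `max_workers=8` are illustrative choices, not tuned values.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_folder(root, extensions=(".xlsx",)):
    """Walk root iteratively with os.scandir and return matching file paths."""
    found, stack = [], [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file() and entry.name.lower().endswith(extensions):
                        found.append(entry.path)
        except OSError:
            continue  # skip unreadable or missing folders instead of failing the run
    return found


def scan_all(roots, max_workers=8):
    """Scan each root folder in its own thread; returns {root: [paths]}."""
    roots = list(roots)  # materialise so it can be reused in zip() below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(scan_folder, roots)
        return dict(zip(roots, results))
```

Threads are used rather than processes because the work is mostly waiting on the file system, during which Python releases the GIL.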


fme
Contributor
  • Author
  • August 30, 2023

Thank you @ebygomm! I got an error after replacing 'directory' with path_windows from a Path reader that reads the folder names. Attached is a screenshot of the error. Thanks for all the help!

 


ebygomm
Influencer
  • August 30, 2023

You've duplicated feature.getAttribute when changing the attribute name in the line starting with root_dir.


fme
Contributor
  • Author
  • August 31, 2023

Thank you @ebygomm. I am currently using a WorkspaceRunner to go through all the files within the folders to be processed. However, I noticed something strange: when I set Workspace run per FME process to 1 with Maximum Concurrent FME Processes set to 7, I get more records in my output than when I increase that number. It looks like data is being lost when Workspace run per FME process is greater. With a low number, it takes hours to process the files, and I wish it would work with Workspace run per FME process set to 1000 or more. Is this a bug in the WorkspaceRunner, or am I missing something?
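To pin down whether records are really being dropped, it may help to compare the list of files the runs should have processed with what actually arrived in the output. A sketch, assuming both lists can be exported to text files with one path per line (`expected.txt` and `processed.txt` are hypothetical names); it normalises slashes and case so the same file written two ways is not reported as missing.

```python
def normalise(path):
    """Make paths comparable: unify slashes, drop trailing separators, ignore case."""
    return path.strip().replace("\\", "/").rstrip("/").lower()


def read_paths(list_file):
    """Read one path per line, skipping blank lines."""
    with open(list_file, encoding="utf-8") as handle:
        return {normalise(line) for line in handle if line.strip()}


def missing_from_output(expected_file, processed_file):
    """Return sorted paths present in expected_file but absent from processed_file."""
    return sorted(read_paths(expected_file) - read_paths(processed_file))
```

Usage: `print(len(missing_from_output("expected.txt", "processed.txt")))`. Because the comparison uses sets, duplicated records are collapsed; compare raw line counts separately if duplicates matter.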

