Question

Reading multiple files from multiple zipfiles

  • 7 December 2023
  • 6 replies
  • 29 views

Userlevel 5
Badge +25

I have a bit of a dilemma. I'm setting up a self-serve workflow where the end-user will have to upload some files and then run them through a validation and processing workspace.


The files needed are one MID/MIF file pair and five CSVs with different schemas. To make things easier I want the user to zip them all up and upload them as a single file. I'm using a File user parameter and pointing it to a Generic reader, which reads [somethingsomething].zip\*.*, and then I use wildcards to match the files to the specific feature types. This works fine if a single zip file is processed.
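For anyone who wants to check the wildcard-matching idea outside FME, here is a minimal Python sketch using the standard `zipfile` and `fnmatch` modules. It is not FME's own reader logic, and the feature-type names and patterns in `FEATURE_TYPE_PATTERNS` are hypothetical, since the actual CSV names aren't given in the thread.

```python
import fnmatch
import zipfile

# Hypothetical feature types and wildcard patterns; the real file names are not known.
FEATURE_TYPE_PATTERNS = {
    "boundaries": "*.mif",
    "table_a": "*table_a*.csv",
    "table_b": "*table_b*.csv",
}


def match_members(zip_path, patterns=FEATURE_TYPE_PATTERNS):
    """Map each feature type to the zip members whose names match its wildcard."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip folder entries, which end in "/"
        names = [n for n in zf.namelist() if not n.endswith("/")]
    matches = {}
    for feature_type, pattern in patterns.items():
        # Compare lowercased names so the match is case-insensitive on every OS
        matches[feature_type] = [
            n for n in names if fnmatch.fnmatchcase(n.lower(), pattern.lower())
        ]
    return matches
```

Calling `match_members(r"C:\Temp\File1.zip")` would return a dict with one list of member names per feature type. Lowercasing both sides before `fnmatchcase` keeps the behaviour identical on Windows and Linux, whereas plain `fnmatch.fnmatch` follows the operating system's case rules.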


However... this data is grouped by region and the customer indicated that sometimes they need multiple regions. Plus... features from two adjacent regions may actually geographically coincide, so the processing workspace needs to have them both at the same time.


I set up the File parameter to accept multiple files as input, split the value and then ran each part through a FeatureReader. That's where the issue is: I can't get it to recognize the separate CSVs. They're all lumped into one port, and I'm not even sure it's reading all the attributes from all five CSVs.

Any ideas? I really don't want to resort to adding separate readers for everything, because of the high potential for user error.



Badge +13

It's possible that CSVs in different zip files have the same name, which would cause their data to be lumped together when FME reads them.

1) Try setting the FeatureReader to read Directory and File Pathnames, set Path Filter to *.* and expose path_windows. This should read all the files the user submitted.

2) Add a second FeatureReader after that one to read the CSV data, and set its dataset to path_windows. (You will need a separate reader for any file extension other than .csv.)

3) In the CSV FeatureReader, go to Parameters > Schema Attribute to Expose and select fme_dataset and fme_basename.

4) Under Output > Single Output Port > Attribute and Geometry Handling > <Generic> Port, add fme_dataset and fme_basename.

5) Connect a Matcher to the <Generic> port. In the Matcher, uncheck "Check Geometry", and under Check Attributes choose Match Selected Attributes and select fme_dataset.

6) Expose the CSV attributes and start your processing.

This should let you read all the CSVs regardless of their schemas, and regardless of whether files in different zip files share a name. The _match_id attribute lets you group the CSVs more easily so you can handle each group differently.
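Steps 1 to 6 boil down to: list every file in every zip, read each CSV, tag its rows with the dataset and base name, then group on one of those tags. Here is a minimal plain-Python sketch of that idea (not FME's API). The `fme_dataset` value built here as zip path + "!" + member name is a hypothetical stand-in, not what FME actually writes, and UTF-8 encoding is assumed.

```python
import csv
import io
import os
import zipfile


def read_csv_features(zip_paths, encoding="utf-8-sig"):
    """Read every CSV row from every zip, tagging rows like fme_dataset/fme_basename."""
    features = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for member in zf.namelist():
                if not member.lower().endswith(".csv"):
                    continue  # MID/MIF and other files need their own reader
                # Hypothetical stand-ins for FME's fme_dataset and fme_basename
                dataset = f"{zip_path}!{member}"
                basename = os.path.splitext(member.rsplit("/", 1)[-1])[0]
                # utf-8-sig also drops a byte-order mark if the CSV has one
                with io.TextIOWrapper(zf.open(member), encoding=encoding, newline="") as text:
                    for row in csv.DictReader(text):
                        features.append({**row, "fme_dataset": dataset, "fme_basename": basename})
    return features


def assign_match_ids(features, key="fme_dataset"):
    """Give features that share a value for `key` the same _match_id."""
    ids = {}
    matched = []
    for feature in features:
        match_id = ids.setdefault(feature[key], len(ids))
        matched.append({**feature, "_match_id": match_id})
    return matched
```

Grouping on `fme_dataset` keeps same-named CSVs from different zips apart, which is the point of the steps above; passing `key="fme_basename"` instead would pool the same table from two adjacent regions into one group.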


I hope I understood your problem correctly. Happy FMEing!

Userlevel 5
Badge +25

Thanks for your suggestion but no luck so far I'm afraid.


Basically what I want to do is allow the user to upload two or more zipfiles with this content:

[screenshot: zip file contents]

The CSVs all have different schemas (which matters, because there's a lot of logic happening) and their names *may* be prefixed, although I think we can make a good case for not allowing that. I also want to keep it as simple as possible for the user, so only a single upload.

I currently have it set up with a "Files" type parameter:

[screenshot: Files parameter settings]

The parameter value is then passed to a Generic reader:

[screenshot: Generic reader settings]

The \* forces the Generic reader to look inside the zip file. I've added a CSV reader as a resource and imported its feature types, using wildcard matching to make sure the right data ends up in the right place.

This works fine when I only process one zip file at a time. But when I try two, the $ZIPFILE parameter value looks like this: ""C:\Temp\File1.zip" "C:\Temp\File2.zip"", and the Generic reader then only processes the last one. That's why I decided to try the same approach with a FeatureReader, splitting the parameter into its parts first and using each part as a separate initiator. But then I can't seem to get it to separate the CSVs out 😕 In fact, I can't even expose the original filename as an attribute so I can filter them manually.
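The quoted value suggests each path is wrapped in double quotes and separated by a space, with an extra pair of quotes around the whole value. Assuming that format holds, a minimal Python sketch of the splitting step could look like this. `split_multi_path_value` is a hypothetical helper name, and an unquoted value is only handled when it holds a single path.

```python
import re

# One double-quoted path; the bare outer "" pair has no characters between it, so it never matches
QUOTED_PATH = re.compile(r'"([^"]+)"')


def split_multi_path_value(value):
    """Return the individual file paths held in a multiple-file parameter value."""
    paths = [p.strip() for p in QUOTED_PATH.findall(value) if p.strip()]
    if paths:
        return paths
    # Assumption: an unquoted value is a single path
    single = value.strip()
    return [single] if single else []
```

Passing the value quoted above yields the two separate paths, `C:\Temp\File1.zip` and `C:\Temp\File2.zip`, each of which can then drive its own read. Inside FME the raw value could, for example, be read in a PythonCaller via `fme.macroValues`; that wiring is an assumption about your setup, not something described in this thread.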

Userlevel 2
Badge +17

Hi @Hans van der Maarel,

Can FilePathExtractor from FME Hub help you?

This transformer extracts multiple zip files and outputs features with attributes containing the extracted file paths, etc. It's something like the PATH reader.
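To illustrate the general idea (this is not FilePathExtractor's actual code), here is a minimal Python sketch that extracts each zip into its own subfolder and returns one record per extracted file. The record keys are hypothetical, and two uploads that share a file name (from different folders) would still land in the same subfolder.

```python
import os
import zipfile


def extract_zips(zip_paths, target_dir):
    """Extract each zip into its own subfolder and return one record per extracted file."""
    records = []
    for zip_path in zip_paths:
        stem = os.path.splitext(os.path.basename(zip_path))[0]
        # One folder per zip, so same-named files from different zips don't collide
        out_dir = os.path.join(target_dir, stem)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out_dir)
            for member in zf.namelist():
                if member.endswith("/"):
                    continue  # folder entry, not a file
                records.append({
                    "zip_path": zip_path,
                    # Zip member names always use "/", so rebuild with the OS separator
                    "extracted_path": os.path.join(out_dir, *member.split("/")),
                    "extension": os.path.splitext(member)[1].lower().lstrip("."),
                })
    return records
```

The `extension` key plays roughly the role of a path-extension attribute, so the records can be filtered per file type before reading.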

Badge +13

Good morning @Hans van der Maarel. The only difference between our user parameters is that mine has Path Selection = Multiple Paths and an extension filter of *.zip:

[screenshot: user_param]

However, leaving the filter as * and having the FeatureReader recurse into subfolders should achieve the same thing, so I don't think that was the issue.

I created some dummy CSV files, split them across two different zip files, and tried it. I was able to read all the CSV files, and the Feature Information for the SingleMatched output showed the different schemas.

[screenshot: all_files]

I've also attached the workspace so you can see whether this works.


Please let me know how this goes!

Userlevel 5
Badge +25

Thanks, that does seem to work for the CSVs (although I need to expose the attributes manually), but there are MID/MIF files inside the zip as well 😅

Badge +13

Oh yeah, I forgot about those. If it's always only CSV and MID/MIF in those zip files, I think you can expose path_extension in the first FeatureReader, use a TestFilter to separate the files by extension, and then send each group to its designated reader (CSV, MapInfo, or Generic (Any Format)). I've never tried Generic (Any Format), so I'm not quite sure whether that would work.
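The TestFilter routing described above could be sketched in Python as follows. The extension-to-reader table is an assumption: `.csv` goes to a CSV reader, `.mif` goes to MapInfo, `.mid` is skipped because it is the attribute companion of its `.mif`, and anything else falls through to a generic reader.

```python
from collections import defaultdict

# Assumed routing table: file extension -> reader that should handle it
READER_FOR_EXTENSION = {"csv": "CSV", "mif": "MapInfo"}


def route_by_extension(paths):
    """Group file paths by target reader, like a TestFilter on path_extension."""
    routes = defaultdict(list)
    for path in paths:
        # Take the file name whether the path uses / or \ separators
        name = path.replace("\\", "/").rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[1].lower() if "." in name else ""
        if ext == "mid":
            continue  # the .mid is read together with its .mif
        routes[READER_FOR_EXTENSION.get(ext, "Generic")].append(path)
    return dict(routes)
```

Keeping the table as data rather than branching logic means a new file type in the upload only needs one extra entry.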


And yes, that's one downside of the <Generic> port: it doesn't expose any of the attributes, so we have to expose them as we go, lol. The AttributeExposer's option to import from a feature cache or dataset might help, but sadly it's not very dynamic.
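One way to avoid hand-exposing attributes is to read each CSV's header row and build the attribute list from that. Here is a minimal Python sketch, assuming the first row of every CSV holds the column names and the files are UTF-8. How the resulting list is fed back into the workspace is left open.

```python
import csv
import io
import zipfile


def csv_schemas(zip_paths, encoding="utf-8-sig"):
    """Return {(zip_path, member): [column names]} read from each CSV's header row."""
    schemas = {}
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for member in zf.namelist():
                if member.lower().endswith(".csv"):
                    with io.TextIOWrapper(zf.open(member), encoding=encoding, newline="") as text:
                        # An empty file has no header row, so default to no columns
                        schemas[(zip_path, member)] = next(csv.reader(text), [])
    return schemas


def attributes_to_expose(schemas):
    """Union of all column names, sorted, e.g. as a list of attributes to expose."""
    return sorted({column for columns in schemas.values() for column in columns})
```

Keeping the per-file schemas (rather than only the union) also makes it easy to spot when the same CSV arrives with different columns in two regions.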
