Not that you can't use Python, but why not simply use the Directory and File Pathnames reader with Retrieve File Properties set to Yes, followed by a ChangeDetector?
Hi @arthy, thanks for posting your question!
Do wait for more serpent-like answers from Python experts, but in the meantime you might want to look at the Directory and File Pathnames Reader. It'll grab all your directory and/or file names and attributes (like size, owner, date created, etc.) for your comparison. Transformers like the Matcher, ChangeDetector, or DuplicateFilter could then be used to compare the results after reading.
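To make the "compare after reading" step concrete, here is a minimal plain-Python sketch of the kind of comparison a ChangeDetector is used for: match the two listings on the relative path, then sort each path into added, deleted, updated or unchanged. This only illustrates the idea and is not FME's implementation; the `classify_changes` function and the `(size, modification time)` record layout are hypothetical choices for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: relative path -> (size in bytes, modification time)
Listing = Dict[str, Tuple[int, float]]


def classify_changes(original: Listing, revised: Listing) -> Dict[str, List[str]]:
    """Split two folder listings into added, deleted, updated and unchanged paths.

    Paths are matched on the key (the relative path). A path present in both
    listings counts as 'updated' if any compared attribute differs.
    """
    result: Dict[str, List[str]] = {
        "added": [], "deleted": [], "updated": [], "unchanged": []
    }
    # Union of both key sets, sorted so the output order is stable.
    for path in sorted(original.keys() | revised.keys()):
        if path not in original:
            result["added"].append(path)
        elif path not in revised:
            result["deleted"].append(path)
        elif original[path] != revised[path]:
            result["updated"].append(path)
        else:
            result["unchanged"].append(path)
    return result


if __name__ == "__main__":
    before = {"a.txt": (10, 1.0), "b.txt": (20, 1.0)}
    after = {"a.txt": (12, 2.0), "c.txt": (5, 2.0)}
    print(classify_changes(before, after))
    # {'added': ['c.txt'], 'deleted': ['b.txt'], 'updated': ['a.txt'], 'unchanged': []}
```

Comparing whole tuples means a change in either size or timestamp marks a file as updated; comparing only `original[path][0]` would restrict the check to size.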
Hope this helps!
Nathan
Thanks @NathanAtSafe and @jdh for your replies,
The Directory and File Pathnames reader can give the results, but it is extremely slow. That's why I would like to use a PythonCaller, given that this is a workbench I will have to run manually several times.
@takashi, @david_r, or anyone else any thoughts?
For example, a PythonCaller with this script creates three list attributes (described below) from two folder paths stored in attributes called "_original_folder" and "_revised_folder", and adds the lists to the input feature. Note: this script is only meant to illustrate one possible way to collect the sizes of all files under a folder. It may not be optimal for your final goal, so please modify it as needed.
# PythonCaller Script Example
import os

class FileSizesComparer(object):
    def input(self, feature):
        # Collect sizes of all files under the two folders.
        originalSizes, revisedSizes = {}, {}  # {<relative file path> : <size>}
        collectFileSizes(originalSizes, feature.getAttribute('_original_folder'))
        collectFileSizes(revisedSizes, feature.getAttribute('_revised_folder'))

        originalPaths = set(originalSizes.keys())
        revisedPaths = set(revisedSizes.keys())

        # Files under both folders (sizes may differ); sorted for a stable order.
        for i, path in enumerate(sorted(originalPaths & revisedPaths)):
            sizeDiff = revisedSizes[path] - originalSizes[path]
            feature.setAttribute('_unchanged{%d}.filename' % i, path)
            feature.setAttribute('_unchanged{%d}.size_original' % i, originalSizes[path])
            feature.setAttribute('_unchanged{%d}.size_revised' % i, revisedSizes[path])
            feature.setAttribute('_unchanged{%d}.size_diff' % i, sizeDiff)

        # Files only under the revised folder.
        for i, path in enumerate(sorted(revisedPaths - originalPaths)):
            feature.setAttribute('_added{%d}.filename' % i, path)
            feature.setAttribute('_added{%d}.size' % i, revisedSizes[path])

        # Files only under the original folder.
        for i, path in enumerate(sorted(originalPaths - revisedPaths)):
            feature.setAttribute('_deleted{%d}.filename' % i, path)
            feature.setAttribute('_deleted{%d}.size' % i, originalSizes[path])

        self.pyoutput(feature)

# Helper function: Collect sizes of all files under a specified root folder.
# Arguments
# - pathToSize: dictionary {<relative file path> : <size>}
# - absRoot: absolute path of the root folder
# - relRoot: relative path of the root folder (empty by default)
def collectFileSizes(pathToSize, absRoot, relRoot=''):
    for name in os.listdir(absRoot):
        absPath = os.path.join(absRoot, name)  # absolute path of file or directory
        relPath = os.path.join(relRoot, name)  # relative path of file or directory
        if not os.path.islink(absPath):  # skip symbolic links
            if os.path.isfile(absPath):
                pathToSize[relPath] = os.path.getsize(absPath)
            elif os.path.isdir(absPath):
                collectFileSizes(pathToSize, absPath, relPath)  # recurse into subfolder
The "_unchanged{}" list contains information on files that exist under both the original and the revised folders (their sizes may still differ). Each element has these four members:
- "filename" stores the relative file path.
- "size_original" stores the size of the file under the original folder.
- "size_revised" stores the size of the file under the revised folder.
- "size_diff" stores the size difference between the revised and original files (revised minus original).
The "_added{}" list contains information (filename and size) on files that exist only under the revised folder.
The "_deleted{}" list contains information (filename and size) on files that exist only under the original folder.
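Because the original concern was speed, one variation worth trying is a walker built on `os.scandir` instead of `os.listdir` plus separate `islink`/`isfile`/`getsize` calls. Per the Python docs, `DirEntry` objects can usually answer `is_file()`/`is_dir()` from the directory listing itself, and on Windows `DirEntry.stat()` normally needs no extra system call; whether this is noticeably faster on your data is an assumption you would need to measure. The function name `collect_file_sizes` is hypothetical. It returns the same `{relative path: size}` mapping as the helper above, skips symbolic links the same way, and uses an explicit stack instead of recursion. Since it is plain Python, it can also be run outside FME to test a PRE/POST folder pair before putting it into a PythonCaller.

```python
import os
from typing import Dict, List, Tuple


def collect_file_sizes(root: str) -> Dict[str, int]:
    """Return {relative file path: size in bytes} for every regular file under root.

    Symbolic links are skipped. Subfolders are visited with an explicit stack
    rather than recursion, so very deep trees do not hit the recursion limit.
    """
    sizes: Dict[str, int] = {}
    stack: List[Tuple[str, str]] = [(root, "")]  # (absolute folder, relative folder)
    while stack:
        abs_dir, rel_dir = stack.pop()
        with os.scandir(abs_dir) as entries:
            for entry in entries:
                if entry.is_symlink():
                    continue  # mirror the original helper: ignore links
                rel_path = os.path.join(rel_dir, entry.name)
                if entry.is_file():
                    sizes[rel_path] = entry.stat().st_size
                elif entry.is_dir():
                    stack.append((entry.path, rel_path))
    return sizes
```

To compare two runs outside FME, call it once per folder (for example `collect_file_sizes(pre_folder)` and `collect_file_sizes(post_folder)`, where both folder variables are placeholders for your own paths) and apply the same set operations on the keys as in the PythonCaller script.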
@takashi I have the same issue and found this post online. Can you give me a better explanation of how to modify this code and test it? I need to compare the file sizes of PRE-upgrade and POST-upgrade output files.
@tosinbabs The workspace suggested by @jdh would look something like this (2018.1): filechangedetector.fmw