Not that you can't use Python, but why not simply use the Directory and File Pathnames reader, with Retrieve File Properties set to Yes, followed by a ChangeDetector?
Hi @arthy, thanks for posting your question!
Do wait for more serpent-like answers from Python experts, but in the meantime you might want to look at the Directory and File Pathnames Reader. It'll grab all your directory and/or file names and attributes (like size, owner, date created, etc.) for your comparison. Transformers like the Matcher, ChangeDetector, or DuplicateFilter could be used to compare the results after reading.
Hope this helps!
Nathan
Thanks @NathanAtSafe and @jdh for your replies,
The Directory and File Pathnames reader can give the results, but it is extremely slow. That's why I would like to use a PythonCaller, given that this is a workbench I will have to run manually several times.
@takashi, @david_r, or anyone else any thoughts?
For example, a PythonCaller with this script creates three list attributes (see below) from two folder paths specified by attributes called "_original_folder" and "_revised_folder", and adds the lists to the input feature. Note: This script is only meant to show one possible way to collect the sizes of all files under a folder. It may not be optimal for your final goal, so please modify it as needed.
# PythonCaller Script Example
class FileSizesComparer(object):
    def input(self, feature):
        # Collect sizes of all files under the two folders.
        originalSizes, revisedSizes = {}, {} # {<relative file path> : <size>}
        collectFileSizes(originalSizes, feature.getAttribute('_original_folder'))
        collectFileSizes(revisedSizes, feature.getAttribute('_revised_folder'))
        originalPaths = set(originalSizes.keys())
        revisedPaths = set(revisedSizes.keys())
        for i, path in enumerate(originalPaths & revisedPaths):
            sizeDiff = revisedSizes[path] - originalSizes[path]
            feature.setAttribute('_unchanged{%d}.filename' % i, path)
            feature.setAttribute('_unchanged{%d}.size_original' % i, originalSizes[path])
            feature.setAttribute('_unchanged{%d}.size_revised' % i, revisedSizes[path])
            feature.setAttribute('_unchanged{%d}.size_diff' % i, sizeDiff)
        for i, path in enumerate(revisedPaths - originalPaths):
            feature.setAttribute('_added{%d}.filename' % i, path)
            feature.setAttribute('_added{%d}.size' % i, revisedSizes[path])
        for i, path in enumerate(originalPaths - revisedPaths):
            feature.setAttribute('_deleted{%d}.filename' % i, path)
            feature.setAttribute('_deleted{%d}.size' % i, originalSizes[path])
        self.pyoutput(feature)

# Helper function: Collect sizes of all files under a specified root folder.
# Arguments
# - pathToSize: dictionary {<relative file path> : <size>}
# - absRoot: absolute path of the root folder
# - relRoot: relative path of the root folder (empty by default)
import os
def collectFileSizes(pathToSize, absRoot, relRoot=''):
    for name in os.listdir(absRoot):
        absPath = os.path.join(absRoot, name) # absolute path of file or directory
        relPath = os.path.join(relRoot, name) # relative path of file or directory
        if not os.path.islink(absPath):
            if os.path.isfile(absPath):
                pathToSize[relPath] = os.path.getsize(absPath)
            else:
                collectFileSizes(pathToSize, absPath, relPath) # recursive call
"_unchanged{}" list contains the information on files existing under both original and revised folders. The list consists of these four members.
- "filename" stores the relative file path.
- "size_original" stores the size of the file under the original folder.
- "size_revised" stores the size of the file under the revised folder.
- "size_diff" stores the revised size minus the original size, so 0 means the file size did not change.
"_added{}" list contains the information (filename and size) on files existing only under revised folder.
"_deleted{}" list contains the information (filename and size) on files existing only under original folder.
@takashi I have the same issue and I found this post online. Can you give me a better explanation of how to modify this code and test it out? I need to compare the file sizes of PRE upgrade and POST upgrade output files.
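One possible way to try the comparison logic outside FME before wiring it into a PythonCaller is a plain Python sketch like the one below. It is a rewrite under assumptions, not the script posted above: the names collect_file_sizes and compare_folders are made up for illustration, it uses os.walk instead of the recursive os.listdir helper, and it returns plain dictionaries rather than setting list attributes on a feature.

```python
import os

def collect_file_sizes(root):
    """Return {relative file path: size in bytes} for files under root.

    Hypothetical helper for illustration; symbolic links are skipped.
    """
    sizes = {}
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            abs_path = os.path.join(dirpath, name)
            if os.path.islink(abs_path):
                continue  # skip symlinked files
            sizes[os.path.relpath(abs_path, root)] = os.path.getsize(abs_path)
    return sizes

def compare_folders(original_root, revised_root):
    """Compare two folder trees by relative path and file size."""
    original = collect_file_sizes(original_root)
    revised = collect_file_sizes(revised_root)
    in_both = set(original) & set(revised)
    return {
        # path -> (original size, revised size, revised minus original)
        'common': {p: (original[p], revised[p], revised[p] - original[p])
                   for p in in_both},
        # files found only under the revised folder
        'added': {p: revised[p] for p in set(revised) - set(original)},
        # files found only under the original folder
        'deleted': {p: original[p] for p in set(original) - set(revised)},
    }
```

For a PRE/POST upgrade check, you could call compare_folders with the two output folders and look at the entries in 'common' whose third value is nonzero; those are the files whose size changed. Once the results look right, the same logic can be moved back into the input method of the PythonCaller class.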
@tosinbabs The workspace suggested by @jdh would look something like this (2018.1): filechangedetector.fmw