Given a list of full directory paths, finds and groups duplicate files in those directories recursively.
Results are not guaranteed to be 100% accurate: files reported in the same group might not be exactly identical.
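Inexact grouping like this is typical of tools that compare files by cheap signals (size, a hash of the first bytes) instead of full byte-by-byte contents, since two files can agree on those signals and still differ. The following is a minimal sketch of that idea in Python; it is an illustration under that assumption, not necessarily this tool's actual algorithm.

import hashlib
from collections import defaultdict
from pathlib import Path

def partial_digest(path, n_bytes=4096):
    # Hash only the first n_bytes: fast, but files with equal digests
    # are not guaranteed to be identical beyond those bytes.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(n_bytes))
    return h.hexdigest()

def group_candidates(paths):
    # Bucket by size first (cheap to compute), then refine by partial hash.
    by_size = defaultdict(list)
    for p in paths:
        by_size[p.stat().st_size].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_digest = defaultdict(list)
        for p in same_size:
            by_digest[partial_digest(p)].append(p)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups

# Example: group all regular files under the current directory.
# print(group_candidates([p for p in Path(".").rglob("*") if p.is_file()]))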
Open a terminal and go to src/filedups.
Put the full paths of the directories you want to search in the in-dirs.txt file, one path per line.
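For example (hypothetical paths, use your own; on Windows, paths like C:\Users\you\Documents work the same way), in-dirs.txt might contain:

/home/user/Pictures
/mnt/backup/photos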
Options (M and X are sizes in bytes; M defaults to 1024000 (1000 KB), X defaults to None, i.e. no upper limit; a parsing sketch follows this list):
--min-file-size M
--max-file-size X
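main.py's actual argument handling isn't shown in this README; the sketch below assumes the semantics described above and shows how such options are typically parsed with argparse and applied as a size filter.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("in_file")  # e.g. in-dirs.txt
parser.add_argument("--min-file-size", type=int, default=1024000)  # M, in bytes
parser.add_argument("--max-file-size", type=int, default=None)     # X, in bytes; None = no limit
args = parser.parse_args()

def size_ok(n_bytes):
    # Keep a file only if its size is >= M and (when X is set) <= X.
    if n_bytes < args.min_file_size:
        return False
    if args.max_file_size is not None and n_bytes > args.max_file_size:
        return False
    return True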
Then run main.py:
For Linux:
python3 main.py in-dirs.txt
1000 KB minimum file size:
python3 main.py in-dirs.txt --min-file-size 1024000
200 KB minimum, 2000 KB maximum file size:
python3 main.py in-dirs.txt --min-file-size 204800 --max-file-size 2048000
For Windows:
py main.py in-dirs.txt
Results are written to a text file in the command line's current working directory; its name starts with filedups and includes the timestamp of the scan.
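The exact naming scheme beyond the filedups prefix and the timestamp is not documented here; a name of that shape could be produced like this (the format string is an assumption):

from datetime import datetime

# Hypothetical layout: only the "filedups" prefix and a scan timestamp
# are promised by the README.
out_name = f"filedups-{datetime.now():%Y%m%d-%H%M%S}.txt"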
Rough timings: filtering 284000 files down to 40300 and then finding duplicates takes at least 3 minutes; filtering 286000 files down to 140000 and then finding duplicates takes at least 19 minutes.
Executables: https://mega.nz/folder/9MtnBS6Y#mX-uxPin8hcAnt5ENvXBOg
