Given a list of full directory paths, finds and groups duplicate files in those directories recursively.
Results are not guaranteed to be 100% accurate: files reported in the same group might not be exactly identical.
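Inexact grouping like this is typical of tools that compare files by cheap signals (size, a hash of the first bytes) instead of full byte-by-byte contents, since two files can agree on those signals and still differ. The following is a minimal sketch of that idea in Python; it is an illustration under that assumption, not necessarily this tool's actual algorithm.

import hashlib
from collections import defaultdict
from pathlib import Path

def partial_digest(path, n_bytes=4096):
    # Hash only the first n_bytes: fast, but files with equal digests
    # are not guaranteed to be identical beyond those bytes.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(n_bytes))
    return h.hexdigest()

def group_candidates(paths):
    # Bucket by size first (cheap to compute), then refine by partial hash.
    by_size = defaultdict(list)
    for p in paths:
        by_size[p.stat().st_size].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_digest = defaultdict(list)
        for p in same_size:
            by_digest[partial_digest(p)].append(p)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups

# Example: group all regular files under the current directory.
# print(group_candidates([p for p in Path(".").rglob("*") if p.is_file()]))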
Open a terminal and go to src/filedups.
Put the full paths of the directories you want to search in the in-dirs.txt file, one path per line.
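For example (hypothetical paths, use your own; on Windows, paths like C:\Users\you\Documents work the same way), in-dirs.txt might contain:

/home/user/Pictures
/mnt/backup/photos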
Options (M and X are sizes in bytes; M defaults to 1024000 (1000 KB), X defaults to None, i.e. no upper limit; a parsing sketch follows this list):
--min-file-size M
--max-file-size X
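main.py's actual argument handling isn't shown in this README; the sketch below assumes the semantics described above and shows how such options are typically parsed with argparse and applied as a size filter.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("in_file")  # e.g. in-dirs.txt
parser.add_argument("--min-file-size", type=int, default=1024000)  # M, in bytes
parser.add_argument("--max-file-size", type=int, default=None)     # X, in bytes; None = no limit
args = parser.parse_args()

def size_ok(n_bytes):
    # Keep a file only if its size is >= M and (when X is set) <= X.
    if n_bytes < args.min_file_size:
        return False
    if args.max_file_size is not None and n_bytes > args.max_file_size:
        return False
    return True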
Then run main.py:
For Linux:
python3 main.py in-dirs.txt
1000 KB minimum file size:
python3 main.py in-dirs.txt --min-file-size 1024000
200 KB minimum, 2000 KB maximum file size:
python3 main.py in-dirs.txt --min-file-size 204800 --max-file-size 2048000
For Windows:
py main.py in-dirs.txt
Results are written to a text file in the command line's current working directory; its name starts with filedups and includes the timestamp of the scan.
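The exact naming scheme beyond the filedups prefix and the timestamp is not documented here; a name of that shape could be produced like this (the format string is an assumption):

from datetime import datetime

# Hypothetical layout: only the "filedups" prefix and a scan timestamp
# are promised by the README.
out_name = f"filedups-{datetime.now():%Y%m%d-%H%M%S}.txt"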
Rough timings: filtering 284000 files down to 40300 and then finding duplicates takes at least 3 minutes; filtering 286000 files down to 140000 and then finding duplicates takes at least 19 minutes.
Executables: https://mega.nz/folder/9MtnBS6Y#mX-uxPin8hcAnt5ENvXBOg
