Perl CPAN modules to simplify file cleanup
I had a clever idea a couple of months ago: to write a blog post detailing how to recursively find duplicate files in a folder. My technique was good enough: track file sizes, find files that had the same file size and MD5 hash, and display the resulting list. It wasn’t foolproof, but it showed some thought. After spending a little too much time on the post, I realized I had never checked CPAN. Of course there is already a module to handle that exact task.
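For comparison, the size-then-MD5 technique described above fits in a few dozen lines using only modules that ship with Perl (File::Find and Digest::MD5). This is a minimal sketch, not the code from that abandoned post. The script name dupes-by-hand.pl and the choice to skip empty files are assumptions.

```perl
#!/usr/bin/env perl
# dupes-by-hand.pl (hypothetical name): group files by size, then by MD5 digest.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Returns a list of array refs; each holds paths with identical size and MD5.
sub find_dupes_by_hand {
    my @dirs = @_;

    # Pass 1: bucket regular files by size. Only same-size files can match.
    my %by_size;
    find(
        sub {
            return unless -f;
            my $size = -s _ or return;    # skip empty files (an assumption)
            push @{ $by_size{$size} }, $File::Find::name;
        },
        @dirs
    );

    # Pass 2: hash only the buckets that hold more than one file.
    my @sets;
    for my $paths ( values %by_size ) {
        next if @$paths < 2;
        my %by_md5;
        for my $path (@$paths) {
            open my $fh, '<:raw', $path or next;    # unreadable files are skipped
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            push @{ $by_md5{$digest} }, $path;
        }
        push @sets, grep { @$_ > 1 } values %by_md5;
    }
    return @sets;
}

# Usage: perl dupes-by-hand.pl DIR...
if (@ARGV) {
    print join( "\n", @$_ ), "\n\n" for find_dupes_by_hand(@ARGV);
}
```

Checking sizes first means most files never get hashed at all, which is the same shortcut File::Find::Duplicates takes.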
The Problem
So here is my problem. I have, let’s see, 44,388 files in my ~/Sync folder.
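One way to get a count like that in Perl is File::Find, which ships with Perl. This sketch counts regular files under the folders named on the command line; the script name count-files.pl is hypothetical.

```perl
#!/usr/bin/env perl
# count-files.pl (hypothetical name): count regular files under one or more folders.
use strict;
use warnings;
use File::Find;

sub count_files {
    my @dirs  = @_;
    my $count = 0;
    # find() chdirs into each directory by default, so -f tests the current entry in $_.
    find( sub { $count++ if -f }, @dirs );
    return $count;
}

# Usage: perl count-files.pl ~/Sync
if (@ARGV) {
    my $n = count_files(@ARGV);
    print "$n files\n";
}
```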
I organized my home machines recently. When I say “organized,” I mean that everything got swept into my ~/Sync folder to deal with later. The refuse of several years of squirreling files into random locations is now sitting in that single folder.
Well, now it is time to clean that single folder up. I want to find and delete duplicate files. I had planned to focus on image files, but File::Find::Duplicates makes it easier to just find all duplicates.
The Solution
File::Find::Duplicates exports a find_duplicate_files subroutine, which finds the duplicate files in a list of folders.
First, tell me how many sets of duplicates I have.
count-dupes.pl
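A minimal sketch of what count-dupes.pl could contain. It assumes the interface described in the File::Find::Duplicates documentation: find_duplicate_files returns a list of duplicate sets, and each set's files method returns an array reference of paths (sets also carry size and md5). Taking the folder from the command line is an assumption.

```perl
#!/usr/bin/env perl
# count-dupes.pl (sketch): how many sets of duplicate files live under a folder?
use strict;
use warnings;
use File::Find::Duplicates;

# Returns (number of duplicate sets, number of redundant copies).
sub count_dupes {
    my @dirs = @_;
    my @sets = find_duplicate_files(@dirs);

    # Each set has one file worth keeping; every other file in it is a copy.
    my $extra = 0;
    $extra += @{ $_->files } - 1 for @sets;
    return ( scalar @sets, $extra );
}

# Usage: perl count-dupes.pl ~/Sync
if (@ARGV) {
    my ( $sets, $extra ) = count_dupes(@ARGV);
    print "$sets sets of duplicates, $extra redundant copies\n";
}
```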
This will tell me how much work is ahead of me.
Removing the files was easy, but it rattled my nerves.
remove-dupes.pl
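A sketch of remove-dupes.pl under two assumptions that do not come from the post: the file whose path sorts first in each set is the one to keep, and a hypothetical --dry-run flag previews the deletions before anything is unlinked.

```perl
#!/usr/bin/env perl
# remove-dupes.pl (sketch): keep one file from each duplicate set, unlink the rest.
use strict;
use warnings;
use File::Find::Duplicates;

# Returns the paths that were removed (or would be, when $dry_run is true).
sub remove_dupes {
    my ( $dir, $dry_run ) = @_;
    my @removed;
    for my $set ( find_duplicate_files($dir) ) {
        # Assumption: keep whichever path sorts first; the others are copies.
        my ( $keep, @copies ) = sort @{ $set->files };
        for my $copy (@copies) {
            if ( $dry_run or unlink $copy ) {
                push @removed, $copy;
            }
            else {
                warn "Could not remove $copy: $!\n";
            }
        }
    }
    return @removed;
}

# Hypothetical interface: perl remove-dupes.pl DIR [--dry-run]
if (@ARGV) {
    my ( $dir, $flag ) = @ARGV;
    my $dry_run = defined $flag && $flag eq '--dry-run';
    my $verb    = $dry_run ? 'would remove' : 'removed';
    my @removed = remove_dupes( $dir, $dry_run );
    print "$verb $_\n" for @removed;
    printf "%d duplicate copies\n", scalar @removed;
}
```

Running it with --dry-run first is a cheap way to steady the nerves before the real pass.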
I fought the temptation to add progress bars or anything like that. Focus on getting the job done. I can add polish if I end up revisiting this task later.
I removed a lot of files. Are there still any duplicates?
Thing is, I suspect that my Sync directory contains many empty subdirectories.
About Those Directories
File::Find::Rule::DirectoryEmpty helps with exactly that problem. It extends the useful File::Find::Rule module, which simplifies finding files with characteristics you define, with a rule that matches empty directories.
find-leaves.pl
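A sketch of find-leaves.pl. File::Find::Rule::DirectoryEmpty adds a directoryempty rule to File::Find::Rule, so listing empty directories is one chained call ending in in(). The command-line handling is an assumption.

```perl
#!/usr/bin/env perl
# find-leaves.pl (sketch): list the directories that contain nothing at all.
use strict;
use warnings;
use File::Find::Rule;
use File::Find::Rule::DirectoryEmpty;

sub find_leaves {
    my ($dir) = @_;
    # directoryempty() is the rule that File::Find::Rule::DirectoryEmpty adds.
    return File::Find::Rule->directoryempty->in($dir);
}

# Usage: perl find-leaves.pl ~/Sync
if (@ARGV) {
    my @leaves = find_leaves( $ARGV[0] );
    print "$_\n" for @leaves;
    print scalar(@leaves), " empty directories\n";
}
```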
Yow. I can delete those directories, but then there could be parent directories that are now empty, and then grandparent directories, and then...
You know what? Just keep looking and deleting until there are no more empty directories.
remove-leaves.pl
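A sketch of remove-leaves.pl that repeats until a pass finds nothing to remove, printing a line per pass. Two details are assumptions: mindepth(1) keeps the top-level folder itself from being deleted if it empties out, and the loop also stops when a pass cannot remove anything (for example, because of permissions), so it never spins forever.

```perl
#!/usr/bin/env perl
# remove-leaves.pl (sketch): delete empty directories pass after pass until none remain.
# Removing a leaf can leave its parent empty, so a single pass is not enough.
use strict;
use warnings;
use File::Find::Rule;
use File::Find::Rule::DirectoryEmpty;

# Returns the total number of directories removed under $root.
sub remove_leaves {
    my ($root) = @_;
    my ( $pass, $total ) = ( 0, 0 );
    while (1) {
        # mindepth(1) keeps $root itself off the list even if it ends up empty.
        my @leaves = File::Find::Rule->mindepth(1)->directoryempty->in($root);
        last unless @leaves;
        $pass++;
        print "Pass $pass: found ", scalar(@leaves), " empty directories\n";

        my $removed = 0;
        for my $leaf (@leaves) {
            if ( rmdir $leaf ) { $removed++ }
            else               { warn "Could not remove $leaf: $!\n" }
        }
        $total += $removed;
        # Stop if nothing could be removed, rather than looping forever.
        last unless $removed;
    }
    return $total;
}

# Usage: perl remove-leaves.pl ~/Sync
if (@ARGV) {
    my $total = remove_leaves( $ARGV[0] );
    print "Removed $total directories\n";
}
```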
I like a little logging on each pass so that I know what my program is seeing.
I might dig in later to actually organize the remaining files. I may even automate it with some Perl. This is good enough for today, though.
Done
Now I have 40,880 files in my ~/Sync folder. Maybe I should have counted directories too.
Added to vault 2024-01-15. Updated on 2024-02-01