BLU Discuss list archive


Re: Help with finding duplicate photos



So, I've been thinking: I'm no good at the command line, but I can hold 
my own with MySQL, so I'm going to populate a MySQL table with the 
filename (plus path) and the MD5 checksum of each file. Then I'll run 
queries on the table. Watch this space... 
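
For anyone who wants to try the same idea without setting up a server, 
here is a rough sketch in Python. It uses the standard-library sqlite3 
module as a stand-in for MySQL (that swap is my assumption; the SELECT 
should carry over more or less unchanged). The table name "photos" and 
the function names are made up for illustration.

```python
import hashlib
import os
import sqlite3


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_checksums(conn, root):
    """Walk root recursively and store (path, md5) for every regular file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos "
        "(path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                # INSERT OR REPLACE is SQLite syntax; MySQL spells it REPLACE INTO.
                conn.execute(
                    "INSERT OR REPLACE INTO photos (path, md5) VALUES (?, ?)",
                    (full, md5_of_file(full)),
                )
    conn.commit()


def duplicate_groups(conn):
    """Return lists of paths that share a checksum, one list per checksum."""
    rows = conn.execute(
        "SELECT md5, path FROM photos "
        "WHERE md5 IN (SELECT md5 FROM photos GROUP BY md5 HAVING COUNT(*) > 1) "
        "ORDER BY md5, path"
    ).fetchall()
    groups = {}
    for md5, path in rows:
        groups.setdefault(md5, []).append(path)
    return list(groups.values())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_checksums(connection, "/home/photos")
    for group in duplicate_groups(connection):
        print("\n".join(group), end="\n\n")
```

Each blank-line-separated group in the output is a set of files with 
the same checksum. Matching MD5 sums almost certainly mean identical 
contents, but a byte-for-byte compare before deleting anything is cheap 
insurance. 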

Tom Haskins-Vaughan wrote: 
> Thanks guys, I'll have a look and let you know how I get on. 
> 
> Tom Metro wrote: 
>> David Kramer wrote: 
>>> Tom Haskins-Vaughan wrote: 
>>>> I have a directory, /home/photos and in that folder are lots and 
>>>> lots of photos in many different subfolders. 
>>> 
>>> sum /home/photos/* | sort 
>>> 
>>> Wherever you see the same number at the beginning of two consecutive 
>>> lines, you have a match. 
>> 
>> Good idea, but the OP mentioned that they're in subfolders, and sum 
>> won't traverse the directory tree. You can, however, use 'find' to do 
>> that, and then post-process the output with 'cut', 'sort', and 'uniq' 
>> to report only the files that are identical. 
>> 
>> But I'd probably just write a small Perl program to do it using 
>> File::Find and Digest::MD5, or Perl's built-in checksum capability. 
>> 
>> This is a common problem, so if you're not up to scripting a solution, 
>> check Freshmeat.net. There's probably one specifically designed for 
>> finding duplicate images. 
>> 
>>  -Tom 
>> 
> 
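
The scripted approach Tom Metro describes in the quoted thread (walk 
the tree, checksum each file, report the files whose checksums match) 
could look roughly like this. It is written in Python rather than Perl, 
with os.walk and hashlib playing the roles of File::Find and 
Digest::MD5; the function name find_duplicates is hypothetical. 

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Group files under root by MD5 digest; return groups with 2+ files."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large photos are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates("/home/photos"):
        print(" ".join(group))
```

Unlike 'sum /home/photos/*', this descends into every subfolder, and 
grouping by digest in a dictionary means no sorting step is needed to 
bring matching files together. 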


BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Boston Linux & Unix / webmaster@blu.org