dupfinder for Linux 1.4.3

Company: Andriy Mylenkyy
Date Added: September 04, 2013  |  Visits: 529




dupfind is a Python utility that finds duplicated files and directories in your file system.

How the utility finds duplicated files:

By default, the utility identifies duplicate files by their content.

First, create several different files in the current directory.

    >>> createFile('tfile1.txt', "A"*10)
    >>> createFile('tfile2.txt', "A"*1025)
    >>> createFile('tfile3.txt', "A"*2048)

Then create more files in another directory, one of which is identical to a file created above.

    >>> mkd("dir1")
    >>> createFile('tfile1.txt', "A"*20, "dir1")
    >>> createFile('tfile2.txt', "A"*1025, "dir1")
    >>> createFile('tfile13.txt', "A"*48, "dir1")

Look at the contents of both directories:

    >>> ls()
    === list directory ===
    D :: dir1 :: ...
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

    >>> ls("dir1")
    === list dir1 directory ===
    F :: tfile1.txt :: 20
    F :: tfile13.txt :: 48
    F :: tfile2.txt :: 1025

We see that "tfile2.txt" is the same in both directories, while "tfile1.txt" has the same name but differs in size. So the utility must report only "tfile2.txt" as a duplicate.

We write the results to the outputf file with the "-o" argument and pass testdir as the directory to search for duplicates.

    >>> dupfind("-o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})

Now check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

Quick and slow modes:

As mentioned above, the utility identifies duplicate files by their content. This mode is slow and consumes a lot of system resources.

However, in most cases the file name and size are enough to identify a duplicate. In that case you can use quick mode with the --quick (-q) option.

Test the previous files in quick mode:

    >>> dupfind("-q -o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})

Now check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

As we can see, quick mode identifies the duplicates correctly.

There are cases, however, where this mode leads to mistakes. To show this, add two files with the same name and size but different content, and run the utility in both modes:

    >>> createFile('tfile000.txt', "First  "*20)
    >>> createFile('tfile000.txt', "Second "*20, "dir1")

Now check the results using the default (not quick) mode:

    >>> dupfind(" -o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})
    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

As we can see, the default mode identifies the duplicates correctly.

Now check the results using quick mode:

    >>> dupfind(" -q -o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})
    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,140,F,txt,tfile000.txt,.../tmp.../dir1,...
    ...,140,F,txt,tfile000.txt,.../tmp...,...
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

As we can see, quick mode wrongly reports "tfile000.txt" as a duplicate.

Clean up the test:

    >>> cleanTestDir()

How the utility finds duplicated directories:

The utility treats two directories as duplicates when all of their files are duplicates and all of their inner directories are also duplicates.

First, compare two directories with the same files.

Create directories with the same content.

    >>> def mkDir(dpath):
    ...     mkd(dpath)
    ...     createFile('tfile1.txt', "A"*10, dpath)
    ...     createFile('tfile2.txt', "A"*1025, dpath)
    ...     createFile('tfile3.txt', "A"*2048, dpath)
    ...
    >>> mkDir("dir1")
    >>> mkDir("dir2")

Confirm that the directories' contents are really identical:

    >>> ls("dir1")
    === list dir1 directory ===
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

    >>> ls("dir2")
    === list dir2 directory ===
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

Now run the utility and check the results file:

    >>> dupfind("-o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})
    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,D,,dir1,...
    ...,D,,dir2,...

Next, compare two directories with the same files and subdirectories.

Inside the previously created directories, create new directories with the same content but different names. Directories do not need the same name to be reported as duplicates, only identical content.

Add two identical directories to the previous ones.

    >>> def mkDir1(dpath):
    ...     mkd(dpath)
    ...     createFile('tfile11.txt', "B"*4000, dpath)
    ...     createFile('tfile12.txt', "B"*222, dpath)
    ...
    >>> mkDir1("dir1/dir11")
    >>> mkDir1("dir2/dir21")

Note that we added two directories with the same contents but different names. This should not break duplicate detection.

    >>> def mkDir2(dpath):
    ...     mkd(dpath)
    ...     createFile('tfile21.txt', "C"*4096, dpath)
    ...     createFile('tfile22.txt', "C"*123, dpath)
    ...     createFile('tfile23.txt', "C"*444, dpath)
    ...     createFile('tfile24.txt', "C"*555, dpath)
    ...
    >>> mkDir2("dir1/dir22")
    >>> mkDir2("dir2/dir22")

Confirm that the directories' contents are really identical:

    >>> ls("dir1")
    === list dir1 directory ===
    D :: dir11 :: -1
    D :: dir22 :: -1
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048
    >>> ls("dir2")
    === list dir2 directory ===
    D :: dir21 :: -1
    D :: dir22 :: -1
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

And the contents of the inner directories.

First subdirectory:

    >>> ls("dir1/dir11")
    === list dir1/dir11 directory ===
    F :: tfile11.txt :: 4000
    F :: tfile12.txt :: 222
    >>> ls("dir2/dir21")
    === list dir2/dir21 directory ===
    F :: tfile11.txt :: 4000
    F :: tfile12.txt :: 222

Second subdirectory:

    >>> ls("dir1/dir22")
    === list dir1/dir22 directory ===
    F :: tfile21.txt :: 4096
    F :: tfile22.txt :: 123
    F :: tfile23.txt :: 444
    F :: tfile24.txt :: 555
    >>> ls("dir2/dir22")
    === list dir2/dir22 directory ===
    F :: tfile21.txt :: 4096
    F :: tfile22.txt :: 123
    F :: tfile23.txt :: 444
    F :: tfile24.txt :: 555

Now test the utility.

    >>> dupfind("-o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})

Check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,D,,dir1,...
    ...,D,,dir2,...

NOTE:

Inner duplicated directories are excluded from the results:

    >>> outputres = open(outputf).read()
    >>> "dir1/dir11" in outputres
    False
    >>> "dir1/dir22" in outputres
    False
    >>> "dir2/dir21" in outputres
    False
    >>> "dir2/dir22" in outputres
    False

The utility accepts more than one directory as arguments:

Use the previous directory structure to show this.

Pass the "dir1/dir11" and "dir2" directories to the utility:

    >>> dupfind("-o %(o)s %(dir1-11)s %(dir2)s" % {
    ...     'o':outputf,
    ...     'dir1-11': os.path.join(testdir,"dir1/dir11"),
    ...     'dir2': os.path.join(testdir,"dir2"),})

Now check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,D,,dir11,.../tmp.../dir1,...
    ...,D,,dir21,.../tmp.../dir2,...

DUPMANAGE UTILITY:

The dupmanage utility lets you manage duplicated files and directories in your file system through a CSV data file.

The utility reads a CSV-formatted data file to process duplicated items. The data file must contain the following columns:

 * type
 * name
 * directory
 * operation
 * operation_data

The utility supports 2 types of operations on duplicated items:

 * deleting ("D")
 * symlinking ("L"), only for UNIX-like systems

operation_data is used only for the symlinking operation and must contain the path to the source item of the symlink.

How the utility manages duplicates:

To demonstrate, reuse the previous directory structure and add more duplicates.

Create a file in the root directory and the same file in another directory.

    >>> createFile('tfile03.txt', "D"*100)
    >>> mkd("dir3")
    >>> createFile('tfile03.txt', "D"*100, "dir3")

Look at the directory contents:

    >>> ls()
    === list directory ===
    D :: dir1 :: ...
    D :: dir2 :: ...
    D :: dir3 :: ...
    F :: tfile03.txt :: 100

    >>> ls("dir3")
    === list dir3 directory ===
    F :: tfile03.txt :: 100

We already know the previous duplicates, so now we create a CSV-formatted data file to manage them.

    >>> manage_data = """type,name,directory,operation,operation_data
    ... F,tfile03.txt,%(testdir)s/dir3,L,%(testdir)s/tfile03.txt
    ... D,dir2,%(testdir)s,D,
    ... """ % {'testdir': testdir}
    >>> createFile('manage.csv', manage_data)

Now call the utility and check the resulting directory contents:

    >>> manage_path = os.path.join(testdir, 'manage.csv')
    >>> dupmanage("%s -v" % manage_path)
    [...
    [...]: Symlink .../tfile03.txt item to .../dir3/tfile03.txt
    [...]: Remove .../dir2 directory
    [...]: Processed 2 items

Review the directory contents:

    >>> ls()
    === list directory ===
    D :: dir1 :: ...
    D :: dir3 :: ...
    F :: tfile03.txt :: 100

    >>> ls("dir3")
    === list dir3 directory ===
    L :: tfile03.txt :: ...

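The difference between the default and quick modes described above can be sketched as follows. This is only an illustration, not dupfind's actual code: the function names file_key and find_duplicate_files are hypothetical, and the use of MD5 for the content hash is an assumption.

```python
import hashlib
import os
from collections import defaultdict


def file_key(path, quick=False):
    """Return the key under which a file is grouped as a possible duplicate."""
    size = os.path.getsize(path)
    if quick:
        # Quick mode (assumption): trust the file name and size only.
        return (os.path.basename(path), size)
    # Default mode (assumption): hash the content; MD5 is an arbitrary choice.
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return (digest.hexdigest(), size)


def find_duplicate_files(root, quick=False):
    """Walk root and return sorted lists of paths that share the same key."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_key(path, quick)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

With quick=True, two files that share a name and size land in the same group even when their bytes differ, which is the false positive shown for "tfile000.txt". The content hash avoids that at the cost of reading every file.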
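The rule for duplicated directories (all files duplicated, all inner directories duplicated, names of the directories irrelevant) can be sketched with a recursive digest. This is a hedged illustration, not the utility's implementation: dir_digest and find_duplicate_dirs are hypothetical names, MD5 is an assumed hash, and ignoring file names inside the directories is an assumption the description does not confirm.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path):
    """Hash a file's content in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dir_digest(path, seen):
    """Digest a directory from its children's digests, ignoring their names."""
    parts = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                parts.append("D:" + dir_digest(entry.path, seen))
            elif entry.is_file(follow_symlinks=False):
                parts.append("F:" + file_digest(entry.path))
    digest = hashlib.md5("\n".join(sorted(parts)).encode()).hexdigest()
    seen[digest].append(path)
    return digest


def find_duplicate_dirs(root):
    """Return groups of duplicated directories below root, outermost only."""
    seen = defaultdict(list)
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dir_digest(entry.path, seen)
    groups = [sorted(paths) for paths in seen.values() if len(paths) > 1]
    reported = [path for group in groups for path in group]

    def nested(path):
        # True when path lies inside another directory that is itself reported.
        return any(path.startswith(top + os.sep) for top in reported if top != path)

    # Drop groups made up only of directories inside reported duplicates.
    return sorted(g for g in groups if not all(nested(p) for p in g))
```

The final filter mirrors the note that inner duplicated directories are excluded from the results once their parents are reported.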
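The way a data file with "D" and "L" operations might be applied can be sketched as below. This is a minimal illustration under stated assumptions, not dupmanage's real code: process_rows and remove_item are hypothetical names, and the sketch assumes that symlinking first removes the duplicate and then links it to the path in operation_data.

```python
import csv
import io
import os
import shutil


def remove_item(path):
    """Delete a file, a symlink, or a whole directory tree."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.remove(path)


def process_rows(csv_text):
    """Apply the D (delete) and L (symlink) operations listed in csv_text."""
    processed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = os.path.join(row["directory"], row["name"])
        operation = row["operation"]
        if operation == "D":
            remove_item(target)
        elif operation == "L":
            # Replace the duplicate with a symlink to the item that is kept.
            remove_item(target)
            os.symlink(row["operation_data"], target)
        else:
            continue  # Empty or unknown operation: leave the item untouched.
        processed += 1
    return processed
```

Both operations are destructive, so it is worth reviewing the data file before running anything like this against real data.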
Requirements: No special requirements
Platforms: *nix, Linux

License: Freeware
Size: 10.24 KB

