Download Shareware and Freeware Software for Windows, Linux, Macintosh, PDA


dupfinder for Linux 1.4.3

Company: Andriy Mylenkyy
Date Added: September 04, 2013  |  Visits: 256





dupfind is a Python utility that finds duplicate files and directories in your file system.

How the utility finds duplicate files:

By default, the utility identifies duplicate files by their content.

First, create several different files in the current directory.

    >>> createFile('tfile1.txt', "A"*10)
    >>> createFile('tfile2.txt', "A"*1025)
    >>> createFile('tfile3.txt', "A"*2048)

Then create files in another directory, one of which is identical to a file already created.

    >>> mkd("dir1")
    >>> createFile('tfile1.txt', "A"*20, "dir1")
    >>> createFile('tfile2.txt', "A"*1025, "dir1")
    >>> createFile('tfile13.txt', "A"*48, "dir1")

Look at the directory contents:

    >>> ls()
    === list directory ===
    D :: dir1 :: ...
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

    >>> ls("dir1")
    === list dir1 directory ===
    F :: tfile1.txt :: 20
    F :: tfile13.txt :: 48
    F :: tfile2.txt :: 1025

"tfile2.txt" is identical in both directories, while "tfile1.txt" has the same name but a different size. So the utility must report only "tfile2.txt" as a duplicate.

We write the results to the file outputf with the "-o" option and pass testdir as the directory to search for duplicates.

    >>> dupfind("-o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})

Now check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

The quick and slow utility modes:

As mentioned above, by default the utility identifies duplicate files by their contents.
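The two comparison strategies in this section can be sketched in Python. This is not dupfind's own code: MD5 as the content hash, grouping by size before hashing, and the names file_digest and find_duplicate_files are assumptions made for illustration. With quick=False, files are keyed on size plus content digest (the default behaviour described above); with quick=True, they are keyed on name plus size, which is the idea behind the --quick (-q) option discussed next.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root, quick=False):
    """Group files under root that look like duplicates (hypothetical helper).

    quick=False: key on (size, content digest), reading every candidate file.
    quick=True:  key on (file name, size) only, without reading contents.
    Returns {key: [paths]} for keys shared by two or more files.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            if quick:
                key = (os.path.basename(path), size)
            else:
                key = (size, file_digest(path))
            groups[key].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

Grouping by size first means only files that share a size are ever read, which keeps the content mode cheaper; the quick key never opens a file, which is why it can pair two same-named, same-sized files whose contents differ.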
Comparing file contents slows down the system and consumes a lot of system resources.

In most cases, however, the file name and size are enough to identify a duplicate. In that case you can use the quick mode with the --quick (-q) option.

Test the previous files in quick mode:

    >>> dupfind("-q -o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})

Now check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

As we can see, the quick mode identifies the duplicates correctly.

However, there are cases where this mode makes mistakes. To show this, add two files with the same name and size (140 bytes) but different content, and run the utility in both modes:

    >>> createFile('tfile000.txt', "First  "*20)
    >>> createFile('tfile000.txt', "Second "*20, "dir1")

Now check for duplicates using the default (non-quick) mode...

    >>> dupfind("-o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})
    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

As we can see, the non-quick mode identifies the duplicates correctly.

Now check for duplicates using the quick mode...

    >>> dupfind("-q -o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})
    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,140,F,txt,tfile000.txt,.../tmp.../dir1,...
    ...,140,F,txt,tfile000.txt,.../tmp...,...
    ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...
    ...,1025,F,txt,tfile2.txt,.../tmp...,...

As we can see, the quick mode reports a false duplicate: the two tfile000.txt files have the same name and size but different contents.

Clean up the test:

    >>> cleanTestDir()

How the utility finds duplicate directories:

The utility treats two directories as duplicates when all of their files are duplicates and all of their inner directories are also duplicate directories.

First, compare two directories containing the same files.

Create directories with the same content.

    >>> def mkDir(dpath):
    ...     mkd(dpath)
    ...     createFile('tfile1.txt', "A"*10, dpath)
    ...     createFile('tfile2.txt', "A"*1025, dpath)
    ...     createFile('tfile3.txt', "A"*2048, dpath)
    ...
    >>> mkDir("dir1")
    >>> mkDir("dir2")

Confirm that the directories' contents really are identical:

    >>> ls("dir1")
    === list dir1 directory ===
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

    >>> ls("dir2")
    === list dir2 directory ===
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

Now run the utility and check the results file:

    >>> dupfind("-o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})
    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,D,,dir1,...
    ...,D,,dir2,...

Comparing two directories with the same files and subdirectories:

Now create new subdirectories with the same content but different names inside the previously created directories. For directories to be treated as duplicates, they do not need the same name, only identical content.

Add two identical subdirectories to the previous ones:

    >>> def mkDir1(dpath):
    ...     mkd(dpath)
    ...     createFile('tfile11.txt', "B"*4000, dpath)
    ...     createFile('tfile12.txt', "B"*222, dpath)
    ...
    >>> mkDir1("dir1/dir11")
    >>> mkDir1("dir2/dir21")

Note that these two subdirectories have the same contents but different names.
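The definition above is recursive, so it can be sketched as a bottom-up signature: each directory's signature is built from its files' content digests and its subdirectories' signatures, and never from its own name. This is a hypothetical illustration, not dupfind's code; MD5, the helper names, and the choice to ignore entry names as well are assumptions. The last step keeps only the outermost duplicates, mirroring the NOTE further down that inner duplicate directories are excluded, and accepts several roots, as the utility does.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path):
    """MD5 hex digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def dir_signatures(root):
    """Map every directory under root to a content-only signature."""
    sigs = {}
    # topdown=False yields children before parents, so subdirectory
    # signatures are already known when their parent is processed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        parts = [("F", file_digest(os.path.join(dirpath, n))) for n in filenames]
        # Symlinked directories are listed but not walked, so skip them.
        parts += [("D", sigs[os.path.join(dirpath, d)])
                  for d in dirnames if os.path.join(dirpath, d) in sigs]
        # Sorting makes the signature independent of listing order.
        sigs[dirpath] = hashlib.md5(repr(sorted(parts)).encode()).hexdigest()
    return sigs


def find_duplicate_dirs(roots):
    """Return the outermost duplicate directories found under the given roots."""
    sigs = {}
    for root in roots:
        sigs.update(dir_signatures(root))
    groups = defaultdict(list)
    for path, sig in sigs.items():
        groups[sig].append(path)
    dups = {p for paths in groups.values() if len(paths) > 1 for p in paths}
    # Drop a duplicate whose parent directory is itself a duplicate.
    return sorted(p for p in dups if os.path.dirname(p) not in dups)
```

Because the directory's own name never enters its signature, dir11 and dir21 hash identically, which in turn lets dir1 and dir2 match.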
The different names should not prevent dir1 and dir2 from being detected as duplicates.

    >>> def mkDir2(dpath):
    ...     mkd(dpath)
    ...     createFile('tfile21.txt', "C"*4096, dpath)
    ...     createFile('tfile22.txt', "C"*123, dpath)
    ...     createFile('tfile23.txt', "C"*444, dpath)
    ...     createFile('tfile24.txt', "C"*555, dpath)
    ...
    >>> mkDir2("dir1/dir22")
    >>> mkDir2("dir2/dir22")

Confirm that the directories' contents really are identical:

    >>> ls("dir1")
    === list dir1 directory ===
    D :: dir11 :: -1
    D :: dir22 :: -1
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048
    >>> ls("dir2")
    === list dir2 directory ===
    D :: dir21 :: -1
    D :: dir22 :: -1
    F :: tfile1.txt :: 10
    F :: tfile2.txt :: 1025
    F :: tfile3.txt :: 2048

And the contents of the inner directories.

First subdirectory:

    >>> ls("dir1/dir11")
    === list dir1/dir11 directory ===
    F :: tfile11.txt :: 4000
    F :: tfile12.txt :: 222
    >>> ls("dir2/dir21")
    === list dir2/dir21 directory ===
    F :: tfile11.txt :: 4000
    F :: tfile12.txt :: 222

Second subdirectory:

    >>> ls("dir1/dir22")
    === list dir1/dir22 directory ===
    F :: tfile21.txt :: 4096
    F :: tfile22.txt :: 123
    F :: tfile23.txt :: 444
    F :: tfile24.txt :: 555
    >>> ls("dir2/dir22")
    === list dir2/dir22 directory ===
    F :: tfile21.txt :: 4096
    F :: tfile22.txt :: 123
    F :: tfile23.txt :: 444
    F :: tfile24.txt :: 555

Now test the utility.

    >>> dupfind("-o %(o)s %(dir)s" % {'o':outputf, 'dir': testdir})

Check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,D,,dir1,...
    ...,D,,dir2,...

NOTE:

Inner duplicate directories are excluded from the results:

    >>> outputres = open(outputf).read()
    >>> "dir1/dir11" in outputres
    False
    >>> "dir1/dir22" in outputres
    False
    >>> "dir2/dir21" in outputres
    False
    >>> "dir2/dir22" in outputres
    False

The utility accepts more than one directory as arguments:

Use the previous directory structure to show this.

Now pass the "dir1/dir11" and "dir2" directories to the utility:

    >>> dupfind("-o %(o)s %(dir1-11)s %(dir2)s" % {
    ...     'o':outputf,
    ...     'dir1-11': os.path.join(testdir,"dir1/dir11"),
    ...     'dir2': os.path.join(testdir,"dir2"),})

Now check the results file for duplicates.

    >>> cat(outputf)
    hash,size,type,ext,name,directory,modification,operation,operation_data
    ...,D,,dir11,.../tmp.../dir1,...
    ...,D,,dir21,.../tmp.../dir2,...

DUPMANAGE UTILITY:

The dupmanage utility manages duplicate files and directories in your file system based on a CSV-formatted data file that lists the duplicate items to process.
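The results file written by dupfind (header: hash,size,type,ext,name,directory,modification,operation,operation_data) already contains every column that dupmanage requires, so a data file could in principle be derived from it. The sketch below is hypothetical: build_manage_file is not part of either utility, and it assumes that rows belonging to one duplicate group share the same value in the hash column. It keeps the first row of each group and marks every other row with the delete operation ("D") described below.

```python
import csv


def build_manage_file(results_path, manage_path):
    """Derive a dupmanage-style data file from a dupfind results file.

    Assumption: rows of one duplicate group share the same "hash" value.
    The first row of each group is kept; the others are marked "D".
    """
    seen = set()
    fields = ["type", "name", "directory", "operation", "operation_data"]
    with open(results_path, newline="") as src, \
            open(manage_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in csv.DictReader(src):
            if row["hash"] not in seen:
                seen.add(row["hash"])  # first copy of this group is kept
                continue
            writer.writerow({
                "type": row["type"],
                "name": row["name"],
                "directory": row["directory"],
                "operation": "D",
                "operation_data": "",
            })
```

Which copy survives depends only on row order here; choosing the copy to keep is a policy decision the sketch does not attempt.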
The data file must contain the following columns:

 * type
 * name
 * directory
 * operation
 * operation_data

The utility supports two operations on duplicate items:

 * deleting ("D")
 * symlinking ("L"), only on UNIX-like systems

operation_data is used only by the symlinking operation and must contain the path to the symlink source item.

How the utility manages duplicates:

To demonstrate, use the previous directory structure and add a few more duplicates.

Create a file in the root directory and the same file in another directory.

    >>> createFile('tfile03.txt', "D"*100)
    >>> mkd("dir3")
    >>> createFile('tfile03.txt', "D"*100, "dir3")

Look at the directory contents:

    >>> ls()
    === list directory ===
    D :: dir1 :: ...
    D :: dir2 :: ...
    D :: dir3 :: ...
    F :: tfile03.txt :: 100

    >>> ls("dir3")
    === list dir3 directory ===
    F :: tfile03.txt :: 100

We already know the duplicates from the previous steps, so now we create a CSV-formatted data file to manage them.

    >>> manage_data = """type,name,directory,operation,operation_data
    ... F,tfile03.txt,%(testdir)s/dir3,L,%(testdir)s/tfile03.txt
    ... D,dir2,%(testdir)s,D,
    ... """ % {'testdir': testdir}
    >>> createFile('manage.csv', manage_data)

Now call the utility and check the resulting directory contents:

    >>> manage_path = os.path.join(testdir, 'manage.csv')
    >>> dupmanage("%s -v" % manage_path)
    [...
    [...]: Symlink .../tfile03.txt item to .../dir3/tfile03.txt
    [...]: Remove .../dir2 directory
    [...]: Processed 2 items

Review the directory contents:

    >>> ls()
    === list directory ===
    D :: dir1 :: ...
    D :: dir3 :: ...
    F :: tfile03.txt :: 100

    >>> ls("dir3")
    === list dir3 directory ===
    L :: tfile03.txt :: ...
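A minimal Python sketch of how a data file in this format could be processed. It is not dupmanage's implementation: the function name manage_duplicates and the dry_run parameter are hypothetical, and the behaviour for "L" (remove the item, then create a symlink with the same path pointing at operation_data) is inferred from the log and the directory listing above.

```python
import csv
import os
import shutil


def manage_duplicates(csv_path, dry_run=False):
    """Apply the D (delete) and L (symlink) operations listed in a CSV file.

    Returns (operation, path) tuples for the rows that were processed,
    or, with dry_run=True, for the rows that would be processed.
    """
    actions = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            op = row["operation"]
            if op not in ("D", "L"):
                continue  # rows without a supported operation are skipped
            path = os.path.join(row["directory"], row["name"])
            actions.append((op, path))
            if dry_run:
                continue
            # Both operations start by removing the duplicate item.
            if row["type"] == "D":
                shutil.rmtree(path)
            else:
                os.remove(path)
            if op == "L":
                # os.symlink(src, dst): dst becomes a link pointing to src.
                os.symlink(row["operation_data"], path)
    return actions
```

Running once with dry_run=True lists what would be deleted or replaced before anything on disk changes, which is useful because both operations are destructive.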

Requirements: No special requirements
Platforms: *nix, Linux

License: Freeware
Size: 10.24 KB


DUPFINDER FOR LINUX RELATED
Desktop Monitors  -  365DirMon (directory monitor expert) 1.0
Directory Monitor is a free software that allows you to monitor the files in the directories on your computer. It monitors the directories for any changes including file modifications, deletions and new files. Support for all versions of Windows...
1.69 MB  
FTP Clients  -  SynchronEX Backup & FTP 2.1
SynchronEX is a versatile tool for synchronization, backup and FTP of files and directories - optimized for one-click automation and supporting shell usage. Possible applications are laptop/server-synchronisation, backup and incremental backup...
600 KB  
Utilities  -  Dir 2 File 1.2.1
Dir 2 File is a small utility that allows you to export directory listings to HTML, plain text, CSV, or Excel files quickly and extremely easily. Includes options for HTML page customisation, and a backup facility to quickly backup files in a...
2.3 MB  
File Cataloguers  -  PrintFolders 2.0
PrintFolders provides you an easy way to print contents of folders into a plain-text (TXT) or HTML file. It is a simple and easy-to-use utility which helps you to catalogue masses of files automatically. That is especially useful for listing MP3...
408 KB  
Programming  -  GenJar 1.0.2
GenJar is a specialized Ant task that builds jar files based on class dependencies rather than simply the contents of a directory.
38.59 KB  
Utilities  -  Syncranator 0.9.0
A free, simple and fast utility to synchronise the contents of two directory trees.
83.57 KB  
Programming  -  nautilus-python 0.7.0
These are unstable bindings for the nautilus extension library introduced in Gnome 2.6. For examples and documentation check the examples sub directory. Note that scripts are loaded from...
327.68 KB  
Security Tools  -  Appnimi Web Directory Buster 1.0
Appnimi Web Directory Buster is designed to let you search for files on a remote web directory. This program guarantees the most complicated filenames can also be retrieved. Appnimi Web Directory Buster allows you to search for files on the web...
2.11 MB  
Productivity  -  Console WP8 Lite 1.6.0.0
Execute 'prompt windows' commands from your Windows Phone. The only shell app on Windows Phone! Access over 45 functions with the ("MS-DOS") commands: show the contents of a directory, read the contents of a file, edit a...
3 MB  
Virus Removers  -  MJ Registry Watcher 1.2.6.1
MJ Registry Watcher is a simple registry, file and directory hooker/poller, that safeguards the most important startup files, registry keys and values, and other more exotic registry locations commonly attacked by trojans. It has very low resource...
575 KB  
NEW DOWNLOADS IN SHELL & DESKTOP, FILE UTILITIES
Shell & Desktop  -  Glunarclock 0.32.4
GNOME Lunar Clock Applet displays the current phase of the Moon as an applet for the gnome panel. In the properties box you can choose between a real image Features Pointing with the mouse at the applet...
522.24 KB  
Shell & Desktop  -  KOpenBabel 0.2
KOpenBabel is a graphical interface to Open Babel.[1] KOpenBabel can handle and convert over 70 chemical file formats. At this time, it can convert files, guess input file type and convert a large number of files with a single click. The user...
20.48 KB  
Shell & Desktop  -  Fekete 5
Icon theme for Linux, for all possible desktops and Linux distros. Special additives: SUSE's YaST icons, Xfce system icons, archaic mimetype icons, Mandriva "special placed" status icons and LibreOffice icons.
71.59 MB  
Shell & Desktop  -  DesktopTools 02-alpha
DesktopTools is a collection of small utilities that help make your daily life easier. Since the tools themselves are rather small (projectwise), I keep them under this collective name rather than as individual projects.
102.4 KB  
Shell & Desktop  -  XFast 0.9
XFast is a slim and lightweight desktop environment that incorporates X and a window manager within the same project.
1.15 MB  
File Utilities  -  Active@ KillDisk Linux Console 9.1.1110
Active@ KillDisk for Linux (Console) is a powerful utility that will: wipe confidential data from unused space on your hard drive; erase data from partitions or from an entire hard disk; destroy data permanently. Active@ KillDisk for Linux...
11.07 MB  
File Utilities  -  Metalinks 5.1
Metalinks is a project to facilitate data distribution over mirrors and P2P networks. It does so by defining an XML format and the tools to handle these. The metalink files contain all the information needed to download and verify files.
5.05 MB  
File Utilities  -  SSH LoginWatcher 0.9
LoginWatcher tails your messages file and waits for an entry representing a failed login attempt via SSH. After a predefined number of attempts, the IP address of the offending host is added to the hosts.deny file to prevent further logins.
10.24 KB  
File Utilities  -  PUFS 0.0.2c
PUFS - Peer Union File System - is a poor man's naïve distributed file system built on top of FUSE, hence running totally in user space. The project is distributed under the GPL license. PUFS' philosophy is somewhat in line...
408.58 KB  
File Utilities  -  frfs 0.0.3
frfs implements a fully functional in-RAM filesystem using the FUSE framework. Overview: With Linux, creating RAM-backed file system is easy: su to root, mount a tmpfs some place, come back to plain user. Ah, but...
153.6 KB