Download Shareware and Freeware Software for Windows, Linux, Macintosh, PDA



Text::Bloom 1.07

  Date Added: August 27, 2010  |  Visits: 604



Text::Bloom evaluates the Bloom signature of a set of terms.

SYNOPSIS

    my $b = Text::Bloom->new();
    $b->Compute( qw( foo bar baz ) );
    my $sig = $b->WriteToString();
    $b->WriteToFile( 'afile.sig' );
    my $b2 = Text::Bloom::NewFromFile( 'afile.sig' );
    my $b3 = Text::Bloom->new();
    $b3->Compute( qw( foo bar barbaz ) );
    my $sim = $b->Similarity( $b2 );
    my $b4 = Text::Bloom::NewFromString( $sig );

Text::Bloom applies the Bloom filtering technique to the statistical analysis of documents. The terms in the document are quantized using a base-36 radix representation; each term thus corresponds to an integer in the range 0..p-1, where p is a prime, currently set to the greatest prime less than 2^32. Each quantized value is mapped to d integers in the range 0..size-1, where size is an integer less than p, currently 2^17, using a family of hash functions computed by the HashV function.

Each hashed value is used as an index into a large bit vector. Bits corresponding to terms present in the document are set to 1; all other bits are set to 0. Collisions may cause different terms to set the same bit. It follows that, if the document contains n distinct terms, at most n * d bits of the resulting bit vector are set to 1.

The resulting bit string is a very compact representation of the presence or absence of terms in the document, and is therefore characterised as a signature. Moreover, it does not depend on a pre-set dictionary of terms. The signature may be used for testing whether a given set of terms is present in the document, and for computing what fraction of terms two documents have in common.

The bit representation may be written to and read from a file. Text::Bloom prepends a header to the bit stream proper; moreover, whenever the package Compress::Zlib is available, the bit vector is compressed, so that disk space requirements are drastically reduced, especially for small documents.
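The construction described above can be sketched in a few lines. This is only an illustration, written in Python rather than the module's Perl. The value of d, the affine hash family in `hash_v`, and the bit-overlap measure in `similarity` are assumptions made for the example and are not taken from Text::Bloom itself; only p (the greatest prime below 2^32) and size (2^17) come from the description.

```python
import re

P = 4294967291   # greatest prime below 2**32, as stated in the description
SIZE = 2 ** 17   # number of bits in the signature, as stated in the description
D = 4            # number of hash functions: assumed, the description gives no value

TERM_RE = re.compile(r"[0-9a-z]+")


def quantize(term):
    """Read a term as a base-36 number and reduce it to the range 0..P-1."""
    if not TERM_RE.fullmatch(term):
        raise ValueError(f"term {term!r} must consist of 0-9 and a-z only")
    return int(term, 36) % P


def hash_v(value, d=D, size=SIZE):
    """Hypothetical hash family: d affine maps modulo P, folded into 0..size-1."""
    return [((2 * i + 3) * value + (i + 1)) % P % size for i in range(d)]


def signature(terms):
    """Set one bit per hashed value; the integer `bits` serves as the bit vector."""
    bits = 0
    for term in terms:
        for index in hash_v(quantize(term)):
            bits |= 1 << index
    return bits


def contains(sig, term):
    """True if every bit for `term` is set (false positives remain possible)."""
    return all(sig >> index & 1 for index in hash_v(quantize(term)))


def similarity(sig_a, sig_b):
    """Assumed measure: shared set bits divided by bits set in either signature."""
    union = bin(sig_a | sig_b).count("1")
    return bin(sig_a & sig_b).count("1") / union if union else 1.0


doc_a = signature(["foo", "bar", "baz"])
doc_b = signature(["foo", "bar", "barbaz"])
print(contains(doc_a, "foo"))        # True: a term that was added is always found
print(similarity(doc_a, doc_b))      # a value between 0 and 1
```

Because each term sets at most d bits, a signature built from n distinct terms has at most n * d bits set, which matches the bound given in the description.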
The hash function is obviously a crucial component of the filter; the reference implementation uses a radix representation of strings. Each term must therefore match the regular expression /[0-9a-z]+/. There are quite a few viable alternatives, which can be pursued by subclassing and redefining the method QuantizeV.
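As a rough illustration of swapping the quantization step through subclassing, the sketch below uses Python rather than Perl. The class names `RadixQuantizer` and `DigestQuantizer` and the method `quantize` are hypothetical stand-ins for the module's QuantizeV, and the SHA-256 variant is just one possible alternative, not something Text::Bloom provides.

```python
import hashlib
import re

P = 4294967291  # greatest prime below 2**32


class RadixQuantizer:
    """Base-36 quantization as described above; accepts only 0-9 and a-z."""

    TERM_RE = re.compile(r"[0-9a-z]+")

    def quantize(self, term):
        if not self.TERM_RE.fullmatch(term):
            raise ValueError(f"term {term!r} must match /[0-9a-z]+/")
        return int(term, 36) % P


class DigestQuantizer(RadixQuantizer):
    """Hypothetical alternative: quantize any string via a SHA-256 digest."""

    def quantize(self, term):
        digest = hashlib.sha256(term.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as an integer, reduced to 0..P-1.
        return int.from_bytes(digest[:8], "big") % P


print(RadixQuantizer().quantize("foo"))            # base-36 value of "foo"
print(DigestQuantizer().quantize("Mixed Case!"))   # accepted despite case and punctuation
```

The trade-off is that a digest-based quantizer lifts the restriction on term characters, while the radix form keeps quantized values directly interpretable as base-36 numbers.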

Requirements: No special requirements
Platforms: Linux
Keyword: Bit Bloom Document Libraries Programming Representation Signature Terms Textbloom
Users rating: 0/10

License: Freeware
Size: 13.31 KB

MS Office Add-Ins  -  BI Share
BI Share, the document management system for SharePoint, broadens, simplifies and accelerates work with document libraries and other types of contents: BI Share will help to turn corporate information into intellectual assets of your organization,...
1.88 MB  
Puzzles  -  X-pired 1.22
X-pired is an action-puzzle game written in C using SDL, SDL_mixer, SDL_image and SDL_gfx libraries distributed under the terms of GNU GPL. The goal of the game in each level is to reach the exit square, avoiding exploding barrels and other...
2.1 MB  
Server Tools  -  SharePoint Search and Replace 1.0.16
The SharePoint Search and Replace package provides a tool that replaces hardcoded strings in SharePoint lists, document libraries, web part properties and text-based documents. SharePoint administrators will find this useful during SharePoint...
266 KB  
Games  -  Castle Hero 0.03
This is Castle Hero, a small text adventure/RPG I'm making to gain a bit more skill in programming.
181.14 KB  
Programming  -  sven for Linux 0.7.2
sven is a document-oriented programming library that helps you put content in SVN. It requires `pysvn` which you will probably want to install system-wide. Basic usage: from sven.backend import SvnAccess ...
10.24 KB  
Business  -  Aviation Docs 1.99.6
Aviation Docs provides pilots, fleet operators, and maintenance personnel with access to Fleet, Aircraft and Aircraft-type specific document libraries on the iPad from anywhere in the world! This application is made available at no charge...
30.8 MB  
Libraries  -  Inline::C 0.44
Inline::C is a Perl module that allows you to write Perl subroutines in C. Since version 0.30 the Inline module supports multiple programming languages and each language has its own...
92.16 KB  
Libraries  -  Bio::Ontology::GOterm 1.4
Bio::Ontology::GOterm is a representation of GO terms. SYNOPSIS $term = Bio::Ontology::GOterm->new ( -go_id => "GO:0016847", -name => "1-aminocyclopropane-1-carboxylate synthase", -definition => "Catalysis of ...", -is_obsolete => 0,...
4.7 MB  
Libraries  -  Bit::Vector 6.4
Bit::Vector is an efficient bit vector, set of integers and "big int" math library. CLASS METHODS Version $version = Bit::Vector->Version(); Word_Bits $bits = Bit::Vector->Word_Bits(); # bits in a machine word Long_Bits $bits =...
133.12 KB  
Libraries  -  Module::Signature 0.55
Module::Signature is a Perl module for manipulating module signature files. SYNOPSIS As a shell command: % cpansign # verify an existing SIGNATURE, or # make a new one if none exists % cpansign sign # make signature; overwrites existing one %...
68.61 KB  
Programming  -  FLEX-db Digital Asset Manager 3.0.9
FLEX-db - an enterprise Digital Asset Manager (DAM). It ingests and links metadata with files, creates thumbnails, and processes files using business rules. FLEX-db has a JSP client, Java app server for file input and output and an EJB metadata...
21.57 MB  
Programming  -  Libicom 0.9.0
The libicom library is a character-based, dynamically linked library for Linux. It is used to remotely control the Icom IC-R8500 wide band receiver via an RS232 link. All call and return parameters to the control functions are character string based....
20.48 KB  
Programming  -  dotdesktop 0.3
Dotdesktop library provides ability to parse desktop entry file and access the information in a convenient way. Desktop entry file format is defined by, it is used to describe information about an application such as the name and...
327.68 KB  
Programming  -  Cedalion for Linux 0.2.6
Cedalion is a programming language that allows its users to add new abstractions and define (and use) internal DSLs. Its innovation is in the fact that it uses projectional editing to allow the new abstractions to have no syntactic limitations.
471.04 KB  
Programming  -  libyasl 0.2
Libyasl is a C++ class library to easily realize TCP/UDP/Multicast clients and servers in IPv4 and IPv6 environments under GNU/Linux systems.
143.36 KB  
Libraries  -  EuGTK 4.8.9
Makes it easy to develop good-looking, fast, cross-platform programs that run on Linux, OS X, and Windows. Euphoria is a very fast interpreted/compiled language with straightforward syntax. EuGTK allows programming in a clean, object-oriented...
10.68 MB  
Libraries  -  Linux User Group Library Manager 1.0
The LUG Library Manager is a project to help Linux User Groups start their own library. A LUG library is helpful to the community at large because it increases access to information, and gives everyone the opportunity to become more knowledgeable.
5.35 KB  
Libraries  -  Module::MakefilePL::Parse 0.12
Module::MakefilePL::Parse is a Perl module to parse required modules from Makefile.PL. SYNOPSIS use Module::MakefilePL::Parse; open $fh, 'Makefile.PL'; $parser = Module::MakefilePL::Parse->new( join("", <$fh>) ); $info = $parser->required;...
8.19 KB  
Libraries  -  sqlpp 0.06
sqlpp Perl package is a SQL preprocessor. sqlpp is a conventional cpp-alike preprocessor taught to understand SQL ( PgSQL, in particular) syntax specificities. In addition to the standard #define/#ifdef/#else/#endif cohort, provides also...
10.24 KB  
Libraries  -  App::SimpleScan::Substitution::Line 2.02
App::SimpleScan::Substitution::Line is a line with optional fixed variable values. SYNOPSIS my $line = App::SimpleScan::Substitution::Line->new(" this "); # Use only this value when substituting " ". $line->fix(substituite =>...
54.27 KB