Download Shareware and Freeware Software for Windows, Linux, Macintosh, PDA


Text::Scraper 0.02

  Date Added: March 23, 2010  |  Visits: 744


Text::Scraper extracts structured data from (un)structured text.

SYNOPSIS

    use Text::Scraper;
    use LWP::Simple;
    use Data::Dumper;

    #
    # 1. Get our template and source text
    #
    my $tmpl = Text::Scraper->slurp(*DATA);
    my $src  = get('http://search.cpan.org/recent') || die $!;

    #
    # 2. Extract data from source
    #
    my $obj  = Text::Scraper->new(tmpl => $tmpl);
    my $data = $obj->scrape($src);

    #
    # 3. Do something really neat...(left as exercise)
    #
    print "Newest Submission: ", $data->[0]{submissions}[0]{name}, "\n\n";
    print "Scraper model:\n", Dumper($obj), "\n\n";
    print "Parsed model:\n", Dumper($data), "\n\n";

    __DATA__
    <div class=path><center><table><tr>
    <?tmpl stuff pre_nav ?>
    <td class=datecell><span><big><b>
        <?tmpl var date_string ?>
    </b></big></span></td>
    <?tmpl stuff post_nav ?>
    </tr></table></center></div>
    <ul>
    <?tmpl loop submissions ?>
        <li><a href="<?tmpl var link ?>"><?tmpl var name ?></a>
        <?tmpl if has_description ?>
            <small> -- <?tmpl var description ?></small>
        <?tmpl end has_description ?>
        </li>
    <?tmpl end submissions ?>
    </ul>

ABSTRACT

Text::Scraper provides a fully functional base class for quickly developing screen scrapers and other text extraction tools. Programmatically generated text, such as a dynamic web page, is trivially reverse engineered. Using templates, the programmer is freed from staring at fragile, heavily escaped regular expressions, mapping capture groups to named variables, or wrestling with the DOM and badly formed HTML. In addition, extracted data can be hierarchical, which is beyond the capabilities of vanilla regular expressions.

Text::Scraper's functionality overlaps with some existing CPAN modules: Template::Extract and WWW::Scraper. Text::Scraper is much more lightweight than either and has a more general application domain than the latter. It has no dependencies on other frameworks, modules or design decisions. On average, Text::Scraper benchmarks around 250% faster than Template::Extract and uses significantly less memory.

Unlike both existing modules, Text::Scraper generalizes its functionality to allow the programmer to refine template capture groups beyond (.*?), fully redefine the template syntax, and introduce new template constructs bound to custom classes.
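To make the idea of "reversing" a template more concrete, here is a minimal sketch in core Perl of how a flat template could be turned into a regular expression whose capture groups default to (.*?). It is an illustration only and is not Text::Scraper's implementation: the helper name template_to_regex and the sample markup are assumptions, and it handles only <?tmpl var NAME ?> placeholders, with no loops, conditionals or hierarchical data.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Hypothetical helper (NOT part of Text::Scraper): build a regex from a
# template containing <?tmpl var NAME ?> placeholders.
sub template_to_regex {
    my ($tmpl) = @_;

    # split with a capturing group returns literal text and variable
    # names alternately: (literal, name, literal, name, ...).
    my @parts = split /<\?tmpl\s+var\s+([A-Za-z_]\w*)\s*\?>/, $tmpl;

    my $pattern = '';
    while (@parts) {
        # Literal template text must match itself exactly, so escape it.
        $pattern .= quotemeta shift @parts;

        # Each placeholder becomes a non-greedy named capture group.
        if (@parts) {
            my $name = shift @parts;
            $pattern .= "(?<$name>.*?)";
        }
    }
    return qr/$pattern/s;
}

# Sample template and source text (made up for this sketch).
my $tmpl = '<li><a href="<?tmpl var link ?>"><?tmpl var name ?></a></li>';
my $src  = '<ul><li><a href="/dist/Foo-Bar">Foo-Bar-1.0</a></li></ul>';

my %row;
if ($src =~ template_to_regex($tmpl)) {
    %row = %+;    # copy the named captures into an ordinary hash
    print "$row{name} => $row{link}\n";    # prints "Foo-Bar-1.0 => /dist/Foo-Bar"
}
```

The template author never writes or escapes a regular expression by hand, and captured values arrive already keyed by variable name. Nested constructs such as loops are where a single flat regex stops being sufficient, which is the gap the module's hierarchical extraction is described as filling.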

Requirements: No special requirements
Platforms: Linux

License: Freeware  |  Size: 46.08 KB
TEXT::SCRAPER RELATED
Libraries  -  Audio::Data 1.029
Audio::Data is a module for representing audio data to perl. SYNOPSIS use Audio::Data; my $audio = Audio::Data->new(rate => , ...); $audio->method(...) $audio OP ... Audio::Data represents audio data to perl in a fairly compact and...
88.06 KB  
Libraries  -  LBC Libraries 0.0.7
LBC Libraries is a set of easy C libraries that provide classical data types and structures (string, hash, queue, stack, tree). LBC Libraries package is focused on simplicity and portability. It uses ANSI C (C98 standard not required), and...
57.34 KB  
Libraries  -  C Generic Library 0.4.2
C Generic Library is a collection of data structures that are designed and created in as generic a fashion as possible. Each data structure will contain its own basic memory management, be able to handle any object...
26.62 KB  
Libraries  -  Test::Data 1.20
Test::Data is a Perl module to test functions for particular variable types. SYNOPSIS use Test::Data qw(Scalar Array Hash Function); Test::Data provides utility functions to check properties and values of data and variables. Functions...
8.19 KB  
Libraries  -  Data::Serializer 0.41
The Data::Serializer package contains modules that serialize data structures. SYNOPSIS use Data::Serializer; $obj = Data::Serializer->new(); $obj = Data::Serializer->new( serializer => 'Storable', digester => 'MD5', cipher => 'DES', secret => 'my...
25.6 KB  
Libraries  -  Data::TreeDumper 0.33
Data::TreeDumper is an improved replacement for Data::Dumper. Powerful filtering capability. SYNOPSIS use Data::TreeDumper ; my $sub = sub {} ; my $s = { A => { a => { } , bbbbbb => $sub , c123 => $sub , d => $sub } , C => {...
26.62 KB  
File Utilities  -  Data::Locations 5.4
Data::Locations - magic insertion points in your data. Did you already encounter the problem that you had to produce some data in a particular order, but that some piece of the data was still unavailable at the point in the sequence where it...
44.03 KB  
File Renamers  -  DiskGetor Data Recovery Free 2.05
DiskGetor Data Recovery Free: 1. Increased undeletion ability; 2. Recovers files destroyed by the system and perfectly recovers the names of deleted files (Word, Excel, photo, DWG, CAD, Office PPT and other important data) that were erased from the Recycle Bin; e...
2.6 MB  
Backup Utilities  -  Mac iPhone Data Recovery 1.0.0
Mac iPhone Data Recovery is the world's best data recovery software for iPhone on the Mac platform. This data recovery tool supports all kinds of iPhone models, like the popular iPhone4s, iPhone4, iPhone3GS, and the previous version are also...
14.8 MB  
Libraries  -  ULDBF 0.0.8
The ULBC project is a set of ANSI C libraries that add typical data types such as Strings, Queues, Stacks, Hashes and Trees, each with a corresponding API to manage them. This initiative is the base for bigger tools that will use ULBC as their general...
68.61 KB  
NEW DOWNLOADS IN PROGRAMMING, LIBRARIES
Programming  -  Cedalion for Linux 0.2.6
Cedalion is a programming language that allows its users to add new abstractions and define (and use) internal DSLs. Its innovation is in the fact that it uses projectional editing to allow the new abstractions to have no syntactic limitations.
471.04 KB  
Programming  -  Math::GMPf 0.29
Math::GMPf - perl interface to the GMP library's floating point (mpf) functions.
30.72 KB  
Programming  -  Net::Wire10 1.08
Net::Wire10 is a Pure Perl connector that talks to Sphinx, MySQL and Drizzle servers. Net::Wire10 implements the low-level network protocol, alias the MySQL wire protocol version 10, necessary for talking to one of the aforementioned...
30.72 KB  
Programming  -  logilab-common 0.56.2
A bunch of modules providing low-level functionality shared among some Python projects. Please note that some of the modules have some extra dependencies. For instance, logilab.common.db will require a db-api 2.0 compliant...
174.08 KB  
Programming  -  OpenSSL for linux 1.0.0a
The OpenSSL Project is a collaborative effort to develop a robust, commercial-grade, full-featured, and Open Source toolkit implementing the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) protocols as well as a...
3.83 MB  
Libraries  -  wolfSSL 3.15.3
The wolfSSL embedded SSL/TLS library is a lightweight SSL library written in ANSI standard C and targeted for embedded and RTOS environments - primarily because of its small size, speed, and feature set. It is commonly used in standard operating...
3.88 MB  
Libraries  -  EuGTK 4.8.9
Makes it easy to develop good-looking, fast, cross-platform programs that run on Linux, OS X, and Windows. Euphoria is a very fast interpreted/compiled language with straightforward syntax. EuGTK allows programming in a clean, object-oriented...
10.68 MB  
Libraries  -  Linux User Group Library Manager 1.0
The LUG Library Manager is a project to help Linux User Groups start their own library. A LUG library is helpful to the community at large because it increases access to information, and gives everyone the opportunity to become more knowledgeable.
5.35 KB  
Libraries  -  Module::MakefilePL::Parse 0.12
Module::MakefilePL::Parse is a Perl module to parse required modules from Makefile.PL. SYNOPSIS use Module::MakefilePL::Parse; open $fh, 'Makefile.PL'; $parser = Module::MakefilePL::Parse->new( join("", <$fh>) ); $info = $parser->required;...
8.19 KB  
Libraries  -  sqlpp 0.06
The sqlpp Perl package is an SQL preprocessor. sqlpp is a conventional cpp-like preprocessor taught to understand the specifics of SQL (PgSQL, in particular) syntax. In addition to the standard #define/#ifdef/#else/#endif cohort, it also provides...
10.24 KB