


Text::Scraper 0.02

  Date Added: March 23, 2010  |  Visits: 700







Text::Scraper extracts structured data from (un)structured text.

SYNOPSIS

    use Text::Scraper;
    use LWP::Simple;
    use Data::Dumper;

    #
    # 1. Get our template and source text
    #
    my $tmpl = Text::Scraper->slurp(*DATA);
    my $src  = get('http://search.cpan.org/recent') || die "Could not fetch page\n";

    #
    # 2. Extract data from source
    #
    my $obj  = Text::Scraper->new(tmpl => $tmpl);
    my $data = $obj->scrape($src);

    #
    # 3. Do something really neat... (left as an exercise)
    #
    print "Newest Submission: ", $data->[0]{submissions}[0]{name}, "\n\n";
    print "Scraper model:\n", Dumper($obj),  "\n\n";
    print "Parsed model:\n",  Dumper($data), "\n\n";

    __DATA__
    <div class=path><center><table><tr>
    <?tmpl stuff pre_nav ?>
    <td class=datecell><span><big><b>
    <?tmpl var date_string ?>
    </b></big></span></td>
    <?tmpl stuff post_nav ?>
    </tr></table></center></div>
    <ul>
    <?tmpl loop submissions ?>
    <li><a href="<?tmpl var link ?>"><?tmpl var name ?></a>
    <?tmpl if has_description ?>
    <small> -- <?tmpl var description ?></small>
    <?tmpl end has_description ?>
    </li>
    <?tmpl end submissions ?>
    </ul>

ABSTRACT

Text::Scraper provides a fully functional base class for quickly developing screen scrapers and other text-extraction tools. Programmatically generated text, such as dynamic web pages, can be reverse-engineered trivially. Using templates frees the programmer from staring at fragile, heavily escaped regular expressions, from mapping capture groups to named variables, and from wrestling with the DOM and badly formed HTML. In addition, extracted data can be hierarchical, which is beyond the capabilities of vanilla regular expressions.

Text::Scraper's functionality overlaps with two existing CPAN modules, Template::Extract and WWW::Scraper. Text::Scraper is much more lightweight than either, and it has a more general application domain than WWW::Scraper. It has no dependencies on other frameworks, modules or design decisions. On average, Text::Scraper benchmarks around 250% faster than Template::Extract and uses significantly less memory.
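To make the template idea and the claim about hierarchical data concrete, here is a minimal, self-contained sketch of how a template could be compiled into ordinary regular expressions, with a loop section turned into a nested list of records. It is not Text::Scraper's implementation and does not use its API: the helpers `compile_tmpl`, `literal` and `scrape` are hypothetical, only the `var` and `loop`/`end` directives are handled, and matching literal whitespace loosely with `\s*` is an assumption of this sketch.

```perl
use strict;
use warnings;

# Matches either a whole loop block (name, body, matching end) or a variable.
# Groups: 1 = loop name, 2 = loop body, 3 = variable name.
my $TOKEN = qr{
    <\?tmpl \s+ loop \s+ (\w+) \s* \?>
    (.*?)
    <\?tmpl \s+ end \s+ \1 \s* \?>
  | <\?tmpl \s+ var \s+ (\w+) \s* \?>
}xs;

# Escape literal template text; any run of whitespace becomes \s*.
sub literal {
    (my $re = quotemeta $_[0]) =~ s/(?:\\\s)+/\\s*/g;
    return $re;
}

# Compile a template into { re => qr//, fields => [...] }. A field is a
# variable name, or [ loop_name, compiled_body ] for a loop section.
sub compile_tmpl {
    my ($tmpl) = @_;
    my ($re, $pos, @fields) = ('', 0);
    while ($tmpl =~ /$TOKEN/g) {
        # Copy match data before the recursive call runs its own matches.
        my ($loop, $body, $var, $start, $end) = ($1, $2, $3, $-[0], $+[0]);
        $re .= literal(substr($tmpl, $pos, $start - $pos));
        push @fields, defined $loop ? [ $loop, compile_tmpl($body) ] : $var;
        $re .= '(.*?)';    # one lazy capture per variable or loop section
        $pos = $end;
    }
    $re .= literal(substr($tmpl, $pos));
    return { re => qr/$re/s, fields => \@fields };
}

# Apply a compiled model to text, returning an array ref of hash refs.
sub scrape {
    my ($model, $text) = @_;
    my $re   = $model->{re};
    my $n    = @{ $model->{fields} };
    my @caps = $text =~ /$re/g;    # captures of every match, flattened
    my @records;
    while (my @row = splice(@caps, 0, $n)) {
        my %rec;
        for my $i (0 .. $n - 1) {
            my $f = $model->{fields}[$i];
            if (ref $f) { $rec{ $f->[0] } = scrape($f->[1], $row[$i]) }
            else        { $rec{$f} = $row[$i] }
        }
        push @records, \%rec;
    }
    return \@records;
}

my $tmpl = <<'TMPL';
<h1><?tmpl var title ?></h1>
<ul>
<?tmpl loop items ?><li><a href="<?tmpl var link ?>"><?tmpl var name ?></a></li>
<?tmpl end items ?></ul>
TMPL

my $html = <<'HTML';
<h1>Recent uploads</h1>
<ul>
  <li><a href="/dist/Foo">Foo-1.0</a></li>
  <li><a href="/dist/Bar">Bar-0.2</a></li>
</ul>
HTML

my $data = scrape(compile_tmpl($tmpl), $html);
printf "%s: %s => %s\n", $data->[0]{title}, $_->{name}, $_->{link}
    for @{ $data->[0]{items} };
```

Each loop body is compiled into its own regex and applied repeatedly to the text captured for that loop, which is how nesting is obtained from plain regular expressions. The catch in this simplified design is that the literal text following a loop must not also occur inside the loop items, because the lazy capture stops at its first occurrence.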
Unlike both Template::Extract and WWW::Scraper, Text::Scraper generalizes its functionality so that the programmer can refine template capture groups beyond (.*?), fully redefine the template syntax, and introduce new template constructs bound to custom classes.
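Refining a capture group beyond (.*?) can be sketched in the same spirit. In the hypothetical `compile_flat` helper below, the caller passes a hash of per-variable patterns, and variables without an entry fall back to the lazy default. This illustrates the idea only; it is an assumption for demonstration and not the module's actual interface for redefining capture groups.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a flat template (var directives only) into a
# regex. Variables listed in %$patterns use that pattern as their capture
# group; all others use the default lazy (.*?).
sub compile_flat {
    my ($tmpl, $patterns) = @_;
    # split with a capture group returns literal text at even indices and
    # variable names at odd indices; -1 keeps a trailing empty literal.
    my @parts = split /<\?tmpl\s+var\s+(\w+)\s*\?>/, $tmpl, -1;
    my ($re, @names) = ('');
    for my $i (0 .. $#parts) {
        if ($i % 2) {
            push @names, $parts[$i];
            my $p = $patterns->{ $parts[$i] };
            $re .= defined $p ? "($p)" : '(.*?)';
        }
        else {
            # escape literal text; runs of whitespace match loosely as \s*
            (my $lit = quotemeta $parts[$i]) =~ s/(?:\\\s)+/\\s*/g;
            $re .= $lit;
        }
    }
    return (qr/$re/s, \@names);
}

my $tmpl = 'Price: <?tmpl var price ?> <?tmpl var currency ?>.';
my $text = 'Price: 12.50 EUR.';

# Overrides must not contain capturing groups, so use (?:...) inside them.
my ($re, $names) = compile_flat($tmpl, {
    price    => qr/\d+(?:\.\d+)?/,
    currency => qr/[A-Z]{3}/,
});

my %rec;
@rec{@$names} = $text =~ $re;
print "$_ = $rec{$_}\n" for @$names;
```

With only the default (.*?) groups, the same template still matches this text but captures an empty price and `12` as the currency, because the lazy groups stop as soon as the next literal `.` can match; the explicit number and currency-code patterns remove that ambiguity.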

Requirements: No special requirements
Platforms: Linux
Keyword: Data Data From Libraries Programming Structured Structured Data Template Textscraper Tmpl Tmpl Var Un Var
Users rating: 0/10

License: Freeware  |  Size: 46.08 KB


TEXT::SCRAPER RELATED
Libraries  -  Audio::Data 1.029
Audio::Data is a module for representing audio data to perl. SYNOPSIS use Audio::Data; my $audio = Audio::Data->new(rate => , ...); $audio->method(...) $audio OP ... Audio::Data represents audio data to perl in a fairly compact and...
88.06 KB  
Libraries  -  LBC Libraries 0.0.7
LBC Libraries is a set of easy C libraries that provide classical data types and structures (string, hash, queue, stack, tree). LBC Libraries package is focused on simplicity and portability. It uses ANSI C (C98 standard not required), and...
57.34 KB  
Libraries  -  C Generic Library 0.4.2
C Generic Library is a generic data structure library: a collection of data structures designed and created in as generic a fashion as possible. Each data structure will contain its own basic memory management, be able to handle any object...
26.62 KB  
Libraries  -  Test::Data 1.20
Test::Data is a Perl module to test functions for particular variable types. SYNOPSIS use Test::Data qw(Scalar Array Hash Function); Test::Data provides utility functions to check properties and values of data and variables. Functions...
8.19 KB  
Libraries  -  Data::Serializer 0.41
Data::Serializer package contains modules that serialize data structures. SYNOPSIS use Data::Serializer; $obj = Data::Serializer->new(); $obj = Data::Serializer->new( serializer => Storable, digester => MD5, cipher => DES, secret => my...
25.6 KB  
Libraries  -  Data::TreeDumper 0.33
Data::TreeDumper is an improved replacement for Data::Dumper. Powerful filtering capability. SYNOPSIS use Data::TreeDumper ; my $sub = sub {} ; my $s = { A => { a => { } , bbbbbb => $sub , c123 => $sub , d => $sub } , C => {...
26.62 KB  
File Utilities  -  Data::Locations 5.4
Data::Locations - magic insertion points in your data Did you already encounter the problem that you had to produce some data in a particular order, but that some piece of the data was still unavailable at the point in the sequence where it...
44.03 KB  
File Renamers  -  DiskGetor Data Recovery Free 2.05
DiskGetor Data Recovery Free: (1) improved undeletion; (2) recovers files destroyed by the system and deleted files (Word, Excel, photos, DWG, CAD, Office PPT and other important data), with their file names intact, even after they were erased from the Recycle Bin; e...
2.6 MB  
Backup Utilities  -  Mac iPhone Data Recovery 1.0.0
Mac iPhone Data Recovery is the world's best data recovery software for iPhone on the Mac platform. This fantastic data recovery tool supports all kinds of iPhone models, such as the popular iPhone4s, iPhone4 and iPhone3GS, and the previous versions are also...
14.8 MB  
Libraries  -  ULDBF 0.0.8
The ULBC project is a set of ANSI C libraries that add typical data types such as strings, queues, stacks, hashes and trees, each with a corresponding API to manage it. This initiative is the base for bigger tools that will use ULBC as its general...
68.61 KB  
NEW DOWNLOADS IN PROGRAMMING, LIBRARIES
Programming  -  FLEX-db Digital Asset Manager 3.0.9
FLEX-db - an enterprise Digital Asset Manager (DAM). It ingests and links metadata with files, creates thumbnails, and processes files using business rules. FLEX-db has a JSP client, Java app server for file input and output and an EJB metadata...
21.57 MB  
Programming  -  Libicom 0.9.0
The libicom library is a character-based, dynamically linked library for Linux. It is used to remotely control the Icom IC-R8500 wide-band receiver via an RS232 link. All call and return parameters to the control functions are character-string based...
20.48 KB  
Programming  -  dotdesktop 0.3
The dotdesktop library provides the ability to parse desktop entry files and access their information in a convenient way. The desktop entry file format is defined by freedesktop.org; it is used to describe information about an application such as the name and...
327.68 KB  
Programming  -  Cedalion for Linux 0.2.6
Cedalion is a programming language that allows its users to add new abstractions and define (and use) internal DSLs. Its innovation is in the fact that it uses projectional editing to allow the new abstractions to have no syntactic limitations.
471.04 KB  
Programming  -  libyasl 0.2
Libyasl is a C++ class library for easily implementing TCP/UDP/multicast clients and servers in IPv4 and IPv6 environments on GNU/Linux systems.
143.36 KB  
Libraries  -  wolfSSL 3.11.0
The wolfSSL embedded SSL/TLS library is a lightweight SSL library written in ANSI standard C and targeted for embedded and RTOS environments - primarily because of its small size, speed, and feature set. It is commonly used in standard operating...
2.73 MB  
Libraries  -  EuGTK 4.8.9
Makes it easy to develop good-looking, fast, cross-platform programs that run on Linux, OS X, and Windows. Euphoria is a very fast interpreted/compiled language with straightforward syntax. EuGTK allows programming in a clean, object-oriented...
10.68 MB  
Libraries  -  Linux User Group Library Manager 1.0
The LUG Library Manager is a project to help Linux User Groups start their own library. A LUG library is helpful to the community at large because it increases access to information, and gives everyone the opportunity to become more knowledgeable.
5.35 KB  
Libraries  -  Module::MakefilePL::Parse 0.12
Module::MakefilePL::Parse is a Perl module to parse required modules from Makefile.PL. SYNOPSIS use Module::MakefilePL::Parse; open $fh, 'Makefile.PL'; $parser = Module::MakefilePL::Parse->new( join("", <$fh>) ); $info = $parser->required;...
8.19 KB  
Libraries  -  sqlpp 0.06
The sqlpp Perl package is an SQL preprocessor. sqlpp is a conventional cpp-like preprocessor taught to understand the syntax specifics of SQL (PgSQL in particular). In addition to the standard #define/#ifdef/#else/#endif cohort, it also provides...
10.24 KB