Text Extraction
PDF2Text Pilot is a PDF-to-text converter that extracts text from a batch of PDF files. It is an open-source tool, so software developers can use its code as an example of solving a text extraction task. Command-line operation is supported.
Platforms: Windows
License: Freeware | Size: 2.04 MB | Download (240): PDF2Text Pilot Download |
Miraplacid Text Driver SDK generates a virtual printer driver with all the functionality you find in Miraplacid Text Driver. You can customize it and embed it into your software. With the driver generated by Miraplacid Text Driver SDK, you can save the extracted information as plain, formatted text...
Platforms: Windows 7, Windows, Other
License: Freeware | Size: 8.81 MB | Download (384): Miraplacid Text Driver SDK Download |
Text::Scraper extracts structured data from (un)structured text. SYNOPSIS:
    use Text::Scraper;
    use LWP::Simple;
    use Data::Dumper;
    #
    # 1. Get our template and source text
    #
    my $tmpl = Text::Scraper->slurp(*DATA);
    my $src = get('http://search.cpan.org/recent') || die $!;
    #
    # 2. Extract...
Platforms: *nix
License: Freeware | Size: 46.08 KB | Download (142): Text::Scraper Download |
Miraplacid Text Driver SDK generates a virtual printer driver with all the functionality you find in Miraplacid Text Driver Terminal Server Edition. You can customize it and embed it into your software. With the driver generated by Miraplacid Text Driver SDK TE, you can save the extracted...
Platforms: Windows 7, Windows, Other
License: Freeware | Size: 8.82 MB | Download (497): Miraplacid Text Driver SDK TE Download |
The TextMarker system is a rule-based tool designed for information extraction and text processing tasks. Its comprehensible rule language can be easily extended and supports several scripting features. TextMarker uses DLTK and UIMA.
Platforms: Windows
License: Freeware | Download (47): TextMarker Download |
HTMLParser is a super-fast real-time parser for real-world HTML. What has attracted most developers to HTMLParser is its simplicity of design, its speed and its ability to handle streaming real-world HTML. The two fundamental use-cases that are handled by the parser are extraction and...
Platforms: *nix
License: Freeware | Size: 4.2 MB | Download (98): HTML Parser Download |
This program extracts text even from damaged or corrupted Microsoft Office and OpenOffice 2.X and 3.X files with the extensions .doc, .docx, .xls, .xlsx, .ppt, .pptx, .odt, .ods and .odp, and possibly from the template and macro variants of these extensions, such as .dot, .xlt and .pps, if...
Platforms: Windows
License: Freeware | Size: 10.33 MB | Download (556): Corrupt office2txt Download |
All Free OCR provides a solution for companies and users looking to manage their documents efficiently. It can extract text from images, scanned papers and scanned PDF documents, eliminating the need for retyping. The cutting-edge OCR technology guarantees you highly accurate text...
Platforms: Windows, Windows 7, Windows Server
License: Freeware | Size: 6.97 MB | Download (1417): All Free OCR Download |
Advanced OCR Free helps you view PDF files and automate the transformation of image-based content into editable, searchable PDF files within workflows, so you can access content for vital business processes more easily. The simple design of the interface makes text extraction a breeze....
Platforms: Windows, Windows 7
License: Freeware | Size: 6.83 MB | Download (232): Advanced OCR Free Download |
PDF-Analyzer is a tool for extracting all attributes from PDF files. You can use it from the Explorer context menu or standalone as a 'PDF Browser'. It shows all attributes/properties of a selected PDF file. These range from document information, e.g. titles, topic affiliation and...
Platforms: Windows, Windows 8, Windows 7, Windows Server
License: Freeware | Size: 5.95 MB | Download (150): PDF-Analyzer Download |
Word Extractor is a hacking tool that extracts (human) words from binary (machine) files. It is suitable for many purposes: finding a cheat in a game, finding hidden text or passwords in a file (exe, bin, dll), text extraction from corrupted documents, etc...
Platforms: Windows
License: Freeware | Size: 220 KB | Download (428): Word Extractor Download |
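The general idea behind extracting human-readable words from a binary file can be sketched in a few lines of Python. This is only an illustration of the technique (scanning for runs of printable ASCII characters), not the actual implementation of Word Extractor; the function name `extract_words` and the minimum run length of 4 are assumptions made for the example.

```python
import re


def extract_words(data, min_len=4):
    """Return runs of printable ASCII characters at least min_len long.

    Hypothetical helper: the name and the default threshold are
    illustrative choices, not taken from Word Extractor itself.
    """
    # \x20-\x7e covers the space character through '~' (printable ASCII).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


if __name__ == "__main__":
    # Made-up bytes standing in for the contents of an exe/bin/dll file.
    sample = b"\x00\x01PASSWORD=hunter2\x00\xff\xfeok\x00Hello, world!\x7f"
    for word in extract_words(sample):
        print(word)
```

Raising `min_len` reduces noise from short accidental matches at the cost of missing short strings.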
Minnow is a Web-based PDF discussion engine for discussing and annotating documents in an online, multi-user environment. Minnow was developed by the Mathematical Biology group at the Northwest Fisheries Science Center in Seattle, WA with support by NOAA/NMFS. What's New in This Release: -...
Platforms: *nix
License: Freeware | Size: 440.32 KB | Download (91): Minnow Download |
Generate Word documents (doc, docx) in ASP.NET, Visual Basic .NET and C#.
Bytescout Document SDK is a 100% managed .NET (1.1, 2.0 and higher) library for writing, reading and modifying documents (DOC, DOCX).
Benefits:
* Microsoft Word (or Microsoft Office) is not required;
* Made...
Platforms: Windows
License: Freeware | Size: 1.33 MB | Download (233): Bytescout Document SDK for .NET Download |
This script extracts the contents of all verbatim environments from the LaTeX file specified on the command line. The modified LaTeX code, with verbatiminput commands in place of the verbatim environments, is written to standard output.
Platforms: Windows, Mac, *nix, Python, BSD, Solaris
License: Freeware | Download (57): Extract verbatim texts from LaTeX file Download |
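A minimal sketch of the transformation described above, written in Python. It is not the listed script: the function name `extract_verbatim` and the `verb1.txt`, `verb2.txt`, ... file naming are assumptions for illustration, and the sketch works on a string rather than reading a file from the command line. The `\verbatiminput` command comes from the LaTeX `verbatim` package.

```python
import re

# Matches one verbatim environment; the body is captured lazily so that
# several environments in the same file are handled separately.
VERBATIM = re.compile(r"\\begin\{verbatim\}\n?(.*?)\\end\{verbatim\}", re.DOTALL)


def extract_verbatim(latex, prefix="verb"):
    """Return (modified_latex, files).

    files maps a hypothetical file name to the verbatim body that was
    removed; the naming scheme is an assumption for this example.
    """
    files = {}

    def replace(match):
        name = f"{prefix}{len(files) + 1}.txt"
        files[name] = match.group(1)
        return f"\\verbatiminput{{{name}}}"

    return VERBATIM.sub(replace, latex), files


if __name__ == "__main__":
    source = "Intro\n\\begin{verbatim}\nx = 1\n\\end{verbatim}\nDone\n"
    modified, files = extract_verbatim(source)
    print(modified)
    for name, body in files.items():
        print(name, repr(body))
```

A real tool would also write each extracted body to its file so that the modified document still compiles.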
PDF to Text is a 100% freeware tool used to quickly convert PDF documents to plain text files in batch mode. It works without Adobe Acrobat or Adobe Reader, and has a friendly interface, small size, and accurate, fast conversion. It retains the original text, format and layout (as much as...
Platforms: Windows
License: Freeware | Size: 1.26 MB | Download (799): PDF to Text Download |
Text Mining Tool is a freeware program for extracting text from files of the following types: pdf, doc, rtf, chm, html, without needing any other programs such as Word or Acrobat to be installed. One of its most important features is a simple, user-friendly interface with hotkeys. It...
Platforms: Windows
License: Freeware | Size: 8.39 MB | Download (212): Text Mining Tool Download |
T-Rex (Trainable Relation Extraction) is a highly configurable, machine learning-based framework for information extraction from text, which includes tools for document classification, entity extraction and relation extraction.
Platforms: Windows, Mac, Linux
License: Freeware | Size: 26.54 MB | Download (46): Trainable Relation Extraction framework Download |
Text Inserter inserts pre-defined pieces of text into any application, making text entry a lot quicker and easier.
Once the text entry has been defined (takes 30 seconds) it will be inserted wherever the cursor happens to be by one of 3 methods:
- clicking a button on the floating bar.
- using...
Platforms: Windows
License: Freeware | Size: 340 KB | Download (404): Text Inserter Download |
HTML To Text utility converts HTML documents to simple text files, by removing all HTML tags and formatting the text according to your preferences.
Features:
* HTML To Text automatically removes all tags and scripts from the document.
* The remaining text is formatted according to...
Platforms: Windows, Other
License: Freeware | Size: 4.24 MB | Download (295): HTML To Text Download |
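For readers curious how tag and script removal of this kind can work, here is a rough sketch using Python's standard `html.parser` module. It is not the HTML To Text utility's implementation (and is unrelated to the HTMLParser product listed earlier); the class name `TextOnly`, the function `html_to_text` and the list of block-level tags are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Tags after which a line break is inserted so adjacent blocks do not
# run together; this short list is an illustrative assumption.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip:
                self._skip -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    page = ("<html><head><script>var x = 1;</script></head>"
            "<body><h1>Title</h1><p>Hello <b>world</b>!</p></body></html>")
    print(html_to_text(page))
```

Collapsing all whitespace to single spaces is the simplest formatting choice; a tool with user preferences would presumably keep paragraph breaks and wrap lines instead.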
Make your site come alive! Create slide shows with a fractal transition effect with the Cloud Text Applet. Cloud Text Applet can run in the middle
Platforms: Windows, Mac, *nix
License: Freeware | Size: 90 KB | Download (229): Cloud Text Applet Download |