Download Shareware and Freeware Software for Windows, Linux, Macintosh, PDA


Serving Software Downloads in 976 Categories, Downloaded 29.875.540 Times

PDFlib TET 2.2

  Date Added: August 16, 2010  |  Visits: 1.232




Product Homepage
Download (83 downloads)



PDFlib TET (Text Extraction Toolkit) is software for reliably extracting text information from any PDF file. It is available as a library/component and as a command-line tool. TET makes available the text contents of a PDF as Unicode strings or structured XML, plus detailed glyph and font information. With TET you can retrieve the corresponding Unicode values for text in a PDF document, as well as its position on the page. In addition to low-level text retrieval, TET contains advanced content analysis algorithms for determining word boundaries and removing redundant duplicate text (such as shadows and artificial bold). Using the auxiliary pCOS interface you can retrieve arbitrary objects from the PDF, such as metadata and hypertext.

Fully functional evaluation versions of TET, including documentation and samples, are available from the TET download page for all supported platforms. Purchasing a license and applying the license key will fully enable the evaluation version for production deployment.

With PDFlib TET you can:
- extract text from PDF, e.g. to store it in a database
- implement a search engine for processing PDF
- convert the text content of PDF pages to XML for processing with other tools
- process PDFs based on their contents

Supported PDF Input
PDFlib TET supports all relevant flavors of PDF input:
- all PDF versions up to PDF 1.7 (Acrobat 8)
- all font and encoding types: base 14 fonts, TrueType, PostScript, OpenType, CID fonts
- encrypted PDF with 40- and 128-bit encryption (appropriate permission settings or password required)

Unicode
Although text in PDF is usually not encoded in Unicode, PDFlib TET will normalize the text from a PDF document to Unicode:
- TET converts all text contents to Unicode. In C the text will be returned in the UTF-8 or UTF-16 formats, and as native Unicode strings in all other language bindings.
- Ligatures and other multi-character glyphs will be decomposed into a sequence of their constituent Unicode characters.
- Vendor-specific Unicode assignments (Private Use Area, PUA) are identified, and mapped to characters in the common Unicode area if possible.
- Glyphs without appropriate Unicode mappings are identified as such, and are mapped to a configurable replacement character.

Full CJK Support
TET includes full support for extracting Chinese, Japanese, and Korean text. All predefined CJK CMaps (encodings) are recognized; horizontal and vertical writing modes are supported.

Content Analysis and Word Identification
TET can be used to retrieve low-level glyph information, but also includes advanced algorithms for content analysis:
- Detect word boundaries to retrieve words instead of characters.
- Recombine the parts of hyphenated words.
- Remove duplicate instances of text, e.g. shadow and artificial bold text.
- Recombine paragraphs into reading order.
- Reorder text which is scattered over the page.
- Reconstruct lines of text.

Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded or included in the text extraction, e.g. to ignore headers and footers or margins.
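
The Unicode handling described above (decomposing ligatures, mapping Private Use Area code points, and substituting a replacement character) can be illustrated with a minimal sketch using only Python's standard library. This is not TET's API or its implementation: the function names, the `pua_map` parameter, and the choice of U+FFFD as the default replacement are assumptions made for illustration. NFKC normalization is swapped in as a stand-in for ligature decomposition, and it also folds other compatibility characters (such as fullwidth forms), which a real extraction toolkit may treat differently. For simplicity, a PUA code point with no entry in the map is treated as unmappable.

```python
import unicodedata
from typing import Dict, Optional

# Assumed default; the description only says the replacement is configurable.
REPLACEMENT = "\uFFFD"


def is_private_use(ch: str) -> bool:
    """Return True if ch is a Private Use code point (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def normalize_glyph_text(
    text: str,
    pua_map: Optional[Dict[str, str]] = None,  # hypothetical vendor PUA table
    replacement: str = REPLACEMENT,
) -> str:
    """Illustrative normalization of extracted text to plain Unicode."""
    pua_map = pua_map or {}
    out = []
    for ch in text:
        if is_private_use(ch):
            # Map known PUA assignments; otherwise fall back to the replacement.
            out.append(pua_map.get(ch, replacement))
        else:
            # NFKC splits ligatures such as U+FB01 into their constituents.
            out.append(unicodedata.normalize("NFKC", ch))
    return "".join(out)


if __name__ == "__main__":
    print(normalize_glyph_text("\uFB01le"))                     # ligature fi + "le"
    print(normalize_glyph_text("\uE000", {"\uE000": "A"}))      # mapped PUA code point
```

A caller would pass a mapping table only for fonts whose vendor-specific assignments are known; everything else falls through to the replacement character.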

Requirements: No special requirements
Platforms: Linux
Keyword: Extracting Graphics Information Multimedia Page Pdflib Pdflib Tet Tet Text Text Extraction Text Extraction Toolkit Text Information Unicode Versions
Users rating: 0/10

License: Shareware Cost: $995.00 USD


PDFLIB TET RELATED
Modules  -  Update Session on Page Change MOD 1.0.0
This MOD modifies the phpBB session management so it updates the user session information on page changes (so we get up to date info for viewonline).
 
Utilities  -  3D IGES Viewer RS 3.1
This is a great standard-version 3D IGES (.igs or .iges format) model viewer for iPad. The IGES format (Initial Graphics Exchange Specification) is a vendor-neutral data format that allows the digital exchange of information among CAD, CAE, CAM, PLM,...
10.7 MB  
Utilities  -  3D IGES Viewer RSi 3.1
This is a great standard-version 3D IGES (.igs or .iges format) model viewer for iPhone. The IGES format (Initial Graphics Exchange Specification) is a vendor-neutral data format that allows the digital exchange of information among CAD, CAE, CAM,...
6.5 MB  
Web Searching Tools  -  Easy Web Page Watcher 2.11
Easy Web Page Watcher continuously watches web pages that you specify for text, updates, or availability. If the text appears or an update occurs, Easy Web Page Watcher will alert you with an audible sound and email. It displays the information...
824 KB  
Utilities  -  Linux-EduCD 0.8
Linux-EduCD is a Polish live DVD based on KANOTIX, with a focus on education, graphics, office, multimedia, and software development.
1.33 GB  
Multimedia  -  Pulsating text for Graphics and Animations 1.1
This script makes use of IE's multimedia filters, coupled with scripting, to render an "aura" around any given text. The effect can easily be applied to multiple text elements, each with a different pulsating color and speed.
102.4 KB  
Graphics Editors  -  The Creator for Mac 7.2.9
The Logo Creator by Laughingbird Software - create logos that look like a Photoshop guru spent hours laboring over! The Logo Creator starts you off by showing you over 200 creative templates! No extra work on your part. So now your landing...
28.68 MB  
Graphics Editors  -  The Logo Creator for Mac 7.2.9.2
The Logo Creator by Laughingbird Software - create logos that look like a Photoshop guru spent hours laboring over! The Logo Creator starts you off by showing you over 200 creative templates! No extra work on your part. So now your landing...
28.68 MB  
Graphics Editors  -  The Logo Creator 6.8.1
The Logo Creator by Laughingbird Software - create logos that look like a Photoshop guru spent hours laboring over! The Logo Creator starts you off by showing you over 200 creative templates! No extra work on your part. So now your landing...
185.94 MB  
Backup Utilities  -  SIM Card Reader Tool 3.0.1.5
This cell phone SIM card message and contact number recovery tool retrieves all read and unread short messages (commonly known as SMS), contact names and numbers, phonebook directory information, and multimedia messages that are stored on the SIM card. Mobile SIM...
484 KB  
NEW DOWNLOADS IN MULTIMEDIA & GRAPHICS, 3D GRAPHIC TOOLS
Multimedia & Graphics  -  Free Video Capture 5.4.9
The main feature of Free Video Capture is screen video recording. It lets you make your own videos with a web camera or record your online games to share with friends or upload to websites for entertainment. It is a...
1.54 MB  
Multimedia & Graphics  -  Open Factory 3D For Linux 2.4
Open Factory 3D is a free factory design application that helps you to place your machines and factory equipment on a factory 2D plan, with a 3D preview.
15.49 MB  
Multimedia & Graphics  -  Fractal4D 1.30
Fractal4D is an Adobe AIR application that lets you draw really cool detailed fractal swirls that can then be exported as a vector for use in Adobe Illustrator or as a plain PNG. There are a whole load of options that allow you to tweak the...
40.96 KB  
Multimedia & Graphics  -  gst-simple-player 0.0.0
gst-player is a very basic media player that uses GStreamer. The objective is to be simple but useful. Some of the features are based on MPlayer's UI. gst-player can also be used as an example for bigger projects.
40.96 KB  
Multimedia & Graphics  -  PVR150 Capture Utility 0.8
PVR150 Capture Utility is a video capture tool for MythTV. Developer comments: I haven't been at this very long so I really don't know what I'm doing. I just know it works on my computer. Your mileage may vary wildly. I was...
81.92 KB  
3D Graphic Tools  -  progeCAD 2018 Professional CAD Software 18.0.2.8
The progeCAD perpetual license offers an affordable way to draw AutoCAD DWG and DXF files - edit existing files or create your own - using commands and icons similar to those in previous releases of AutoCAD itself. Many choose progeCAD over AutoCAD LT...
512.02 MB  
3D Graphic Tools  -  Cheewoo Shape Tracer 2.7.2003.1003
Converts a raster image file of a part or pattern into a vector shape file without manual digitizing. Supports TWAIN image-scanning devices for acquiring image data as input. Contains advanced image processing features to remove noise in the...
4.99 MB  
3D Graphic Tools  -  XnConvert 1.74
XnConvert is a cross-platform batch image converter and resizer that is powerful and easy to use. All common picture and graphics formats are supported (e.g. JPG, PNG, TIFF, GIF, Camera RAW, JPEG2000, WebP, OpenEXR) as well as supporting...
3.84 MB  
3D Graphic Tools  -  Print2CAD 2017 8.0
Print2CAD 2017 8th Generation quickly and precisely converts a raster or vector-based PDF to DWG or DXF with full OCR capability to create fully editable, AutoCAD and other CAD system ready files. Print2CAD 2017 8th Generation also converts...
359.3 KB  
3D Graphic Tools  -  progeCAD 2017 Professional CAD Software 17.0.6.15
The progeCAD perpetual license offers an affordable way to read and write AutoCAD files, using commands and toolbars similar to those in previous releases of AutoCAD itself. Many choose progeCAD over AutoCAD LT because of its low cost (currently a...
410.29 MB