Corpus
The Arabic Corpus is composed of Arabic texts for text categorization. The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories).
Platforms: *nix
License: Freeware | Size: 13.76 MB | Download (32): Arabic Corpus Download |
Zeno Systems Omnibus is a comprehensive English dictionary and thesaurus housed in a single, easy-to-use package. Featuring a corpus gleaned from several sources, it comprises very nearly 3 million cross-linked definitions, synonyms, and all manner of related terms....
Platforms: Windows
License: Shareware | Cost: $14.00 USD | Size: 13.08 MB | Download (147): Zeno Systems Omnibus Download |
No serious language specialist today does work without the aid of Corpus Query Software to readily and rapidly study actual language usage. The tlCorpus Corpus Query Software brings the efficiency and professionalism of the TLex Lexicography Software to corpus work.
FEATURES:
·...
Platforms: Windows
License: Shareware | Cost: $55.00 USD | Size: 11.6 MB | Download (50): tlCorpus Download |
No serious language specialist today does work without the aid of Corpus Query Software to readily and rapidly study actual language usage. The tlCorpus Corpus Query Software brings the efficiency and professionalism of the TLex Lexicography Software to corpus work.
FEATURES:
·...
Platforms: Mac
License: Shareware | Cost: $55.00 USD | Size: 11.6 MB | Download (44): tlCorpus for Mac OS X Download |
Bitextor is an application created to generate translation memories using multilingual websites as a corpus source. It downloads an entire website and applies a set of heuristics (based mainly on HTML tag structure and text block length) to find bitexts.
Platforms: *nix
License: Freeware | Size: 204.8 KB | Download (35): Bitextor Download |
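The kind of heuristic described for Bitextor (HTML tag structure plus text block length) can be illustrated with a minimal sketch. This is not Bitextor's actual implementation: the function names `fingerprint` and `bitext_score`, the equal weighting of the two signals, and the scoring formula are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _Fingerprint(HTMLParser):
    """Collects the sequence of opening tags and the lengths of text blocks."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.block_lengths = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.block_lengths.append(len(text))


def fingerprint(html):
    """Return (tag sequence, text block lengths) for one HTML page."""
    parser = _Fingerprint()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.block_lengths


def bitext_score(html_a, html_b):
    """Hypothetical score in [0, 1]: higher means the pages look more like translations."""
    tags_a, lens_a = fingerprint(html_a)
    tags_b, lens_b = fingerprint(html_b)
    # Signal 1: how similar the tag sequences are.
    tag_sim = SequenceMatcher(None, tags_a, tags_b).ratio()
    # Signal 2: how close the text blocks are in length (assumed 0 if block counts differ).
    if not lens_a or len(lens_a) != len(lens_b):
        length_sim = 0.0
    else:
        ratios = [min(x, y) / max(x, y) for x, y in zip(lens_a, lens_b)]
        length_sim = sum(ratios) / len(ratios)
    # Equal weighting is an arbitrary choice for this sketch.
    return 0.5 * tag_sim + 0.5 * length_sim
```

Under these assumptions, two pages with the same layout and similarly sized text blocks score higher than two pages with unrelated layouts, which is the intuition behind using structure to pair candidate translations.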
CorpusSearch is a tool that finds syntactic structures in a corpus of annotated sentence trees. It can be used as a research tool on a corpus, or as a development tool for building the corpus. CorpusSearch 2 is a Java program that supports research in corpus linguistics. It is useful both for...
Platforms: *nix
License: Freeware | Size: 2.92 MB | Download (36): CorpusSearch for Linux Download |
The Catholic Calendar v1.0 calculates the dates of Catholic religious feasts associated with Easter (Western churches only). This means you can calculate the dates of the following feasts (in the full version): Ash Wednesday, Palm Sunday, Holy Thursday, Good Friday, Holy Saturday, Easter Sunday,...
Platforms: Windows
License: Shareware | Cost: $5.00 USD | Size: 594 KB | Download (315): Catholic Calendar Download |
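As an illustration of how such dates can be derived, the sketch below computes Western (Gregorian) Easter Sunday with the widely published anonymous Gregorian algorithm and offsets the other feasts from it. It is not the Catholic Calendar program's own code; the function names `easter_sunday` and `easter_feasts` are assumptions for this example.

```python
from datetime import date, timedelta


def easter_sunday(year):
    """Gregorian Easter Sunday (anonymous Gregorian / Meeus-Jones-Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def easter_feasts(year):
    """Feasts named in the listing, expressed as day offsets from Easter Sunday."""
    easter = easter_sunday(year)
    offsets = {
        "Ash Wednesday": -46,
        "Palm Sunday": -7,
        "Holy Thursday": -3,
        "Good Friday": -2,
        "Holy Saturday": -1,
        "Easter Sunday": 0,
    }
    return {name: easter + timedelta(days=n) for name, n in offsets.items()}


print(easter_feasts(2024)["Easter Sunday"])  # 2024-03-31
```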
Uplug is a collection of tools for linguistic corpus processing, word alignment, and term extraction from parallel corpora. Several tools have been integrated in Uplug. Pre-processing tools include a sentence splitter, tokenizer, and external part-of-speech tagger and shallow parsers. The...
Platforms: *nix
License: Freeware | Size: 21.9 MB | Download (107): Uplug Download |
An open-source corpus analysis class library written in C#. The GUI of Tenka Text 0.1.3 comes with Wordlister, an advanced, extremely fast graphical wordlist tool, and a simple regex concordance tool. Tenka Text is the open-source answer to WordSmith Tools.
Platforms: Windows, Mac, BSD, Solaris, Linux
License: Freeware | Size: 707.74 KB | Download (51): Corsis (formerly Tenka Text) Download |
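For readers unfamiliar with the two tools mentioned, here is a minimal sketch of what a wordlist and a regex concordance (keyword in context) produce. It is written in Python rather than C# and does not reflect the Corsis API; `wordlist` and `concordance` are hypothetical names, and the tokenization rule (`\w+` on lowercased text) is an assumption.

```python
import re
from collections import Counter


def wordlist(text):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(re.findall(r"\w+", text.lower())).most_common()


def concordance(text, pattern, width=30):
    """Return one keyword-in-context line per regex match."""
    lines = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


sample = "The cat sat. The cats sat on the mat."
for line in concordance(sample, r"\bcats?\b", width=12):
    print(line)
```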
Emdros is a corpus query system for storage and retrieval of linguistic analyses of text. It is especially applicable in corpus linguistics dealing with syntax, morphology, phonology, and/or discourse. It is also a generally useful text database engine.
Platforms: Windows, Mac, Solaris, Linux
License: Freeware | Size: 8.33 MB | Download (48): Emdros Download |
PyAnnotation is a Python library for accessing and manipulating linguistically annotated corpus files. Supported file formats are Kura XML, Elan XML and Toolbox files. A Corpus Reader API is provided to support statistical analysis within the NLTK.
Platforms: Windows, Mac, Linux
License: Freeware | Size: 45.38 KB | Download (46): PyAnnotation Download |
TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in.
It offers a comprehensive...
Platforms: Windows, Mac, Linux
License: Freeware | Size: 2.46 MB | Download (46): TXM Download |
This module extends nodereference fields with a filter-based search engine that fills them automatically, using the Solace API filter features as a backend. This means you can attach a SolR filter instance to node reference fields. Then, any node owner can enable and configure a...
Platforms: PHP
License: Freeware | Size: 30.72 KB | Download (46): Solace Node Reference Download |
The Deep Email Miner application is a software solution for the multistage analysis of an email corpus. Social network analysis and text mining techniques are combined to give an in-depth view of the underlying information. Written in Java, published under the GNU GPL and hosted by...
Platforms: Mac
License: Shareware | Cost: $0.00 USD | Size: 9.38 MB | Download (41): Deep Email Miner Download |
Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.
Platforms: *nix
License: Freeware | Size: 174.08 KB | Download (38): Cunei Machine Translation Platform Download |
A meta package manager for deploying projects on UNIX systems, sponsored by Makina Corpus.
FEATURES:
· Auto-update system: when minimerge is upgraded (easy_install -U), the infrastructure is now in place to run update callbacks.
· Minibuilds now have revisions, which can facilitate their reinstallation as...
Platforms: *nix
License: Freeware | Size: 133.12 KB | Download (32): minitage.core Download |
PasteScripts to facilitate the use of minitage and the creation of minitage-based projects, sponsored by Makina Corpus.
Project templates:
· minitage.zope3: A sample layout for a Zope 3 application
· minitage.plone25: A sample layout for a Plone 2.5 application
· minitage.plone3: A sample layout for a...
Platforms: *nix
License: Freeware | Size: 634.88 KB | Download (40): minitage.paste Download |
MaxEntropy is a Perl5 module for Maximum Entropy Modeling and Feature Induction.
SYNOPSIS
    use Statistics::MaxEntropy;
    # debugging messages; default 0
    $Statistics::MaxEntropy::debug = 0;
    # maximum number of iterations for IIS; default 100
    $Statistics::MaxEntropy::NEWTON_max_it = 100;
    #...
Platforms: *nix
License: Freeware | Size: 41.98 KB | Download (100): Statistics::MaxEntropy Download |
DadaDodo is a program that generates random sentences based on input files. Sometimes these sentences are nonsense, but sometimes they cut right through to the heart of the matter and reveal hidden meanings. DadaDodo works rather differently than Dissociated Press; whereas...
Platforms: *nix
License: Freeware | Size: 22.53 KB | Download (106): DadaDodo Download |
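One common way to generate random sentences from input text is a word-level Markov chain, sketched below. The listing does not say which method DadaDodo uses, so this is an illustration of the general idea only; `build_chain` and `generate` are hypothetical names, and whitespace splitting is an assumed tokenization.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the input."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=12, rng=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = rng or random.Random()
    word = start
    out = [word]
    # Stop at the length limit or when the current word has no recorded successor.
    while len(out) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        out.append(word)
    return " ".join(out)


corpus = "the cat sat on the mat and the dog sat on the cat"
print(generate(build_chain(corpus), "the", max_words=8))
```

Because successors are stored with repetition, words that follow more often in the input are proportionally more likely to be chosen.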
Knorpora is a modified version of the Knoppix 3.3 Live CD for students of corpus-based computational linguistics. Like Knoppix, the Knorpora CD allows you to run a fully operational Debian/Linux operating system from the CD-ROM drive, without installing anything on the computer. The Knorpora...
Platforms: *nix
License: Freeware | Size: 676.4 MB | Download (91): Knorpora Download |