Flour Bluff ISD Corpus Christi
The Arabic Corpus is composed of Arabic texts for text categorization. The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories).
Platforms: *nix
License: Freeware | Size: 13.76 MB | Download (32): Arabic Corpus Download |
Bitextor is an application created to generate translation memories using multilingual websites as a corpus source. It downloads an entire website and applies a set of heuristics (based mainly on HTML tag structure and text block length) to find bitexts.
Platforms: *nix
License: Freeware | Size: 204.8 KB | Download (35): Bitextor Download |
CorpusSearch is a tool that finds syntactic structures in a corpus of annotated sentence trees. It can be used as a research tool on a corpus, or as a development tool for building the corpus. CorpusSearch 2 is a Java program that supports research in corpus linguistics. It is useful both for...
Platforms: *nix
License: Freeware | Size: 2.92 MB | Download (36): CorpusSearch for Linux Download |
Uplug is a collection of tools for linguistic corpus processing, word alignment, and term extraction from parallel corpora. Several tools have been integrated in Uplug. Pre-processing tools include a sentence splitter, a tokenizer, and external part-of-speech taggers and shallow parsers. The...
Platforms: *nix
License: Freeware | Size: 21.9 MB | Download (108): Uplug Download |
Corsis (formerly Tenka Text) is an open-source corpus analysis class library written in C#. The GUI of Tenka Text 0.1.3 comes with Wordlister, an advanced, extremely fast graphical wordlist tool, and a simple regex concordance tool. Tenka Text is the open-source answer to WordSmith Tools.
Platforms: Windows, Mac, BSD, Solaris, Linux
License: Freeware | Size: 707.74 KB | Download (51): Corsis (formerly Tenka Text) Download |
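To make the two tool types named above concrete, here is a minimal Python sketch of a frequency word list and a regex keyword-in-context concordance. It is not Corsis or Wordlister code (the library itself is written in C#); the function names `word_list` and `concordance`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_list(text):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def concordance(text, pattern, width=20):
    """Return (left context, match, right context) for every regex match in text."""
    rows = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


# Hypothetical sample text, used only to exercise the two functions.
sample = "The cat sat on the mat. The dog sat too."

print(word_list(sample)[:3])
for left, hit, right in concordance(sample, r"\bsat\b", width=8):
    print(f"{left:>8} [{hit}] {right}")
```

A real corpus tool would also need proper tokenization and an index to stay fast on large corpora; this sketch rescans the whole text on every call.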
Emdros is a corpus query system for storage and retrieval of linguistic analyses of text. It is especially applicable in corpus linguistics dealing with syntax, morphology, phonology, and/or discourse. It is also a generally useful text database engine.
Platforms: Windows, Mac, Solaris, Linux
License: Freeware | Size: 8.33 MB | Download (48): Emdros Download |
PyAnnotation is a Python library for accessing and manipulating linguistically annotated corpus files. Supported file formats are Kura XML, Elan XML and Toolbox files. A Corpus Reader API is provided to support statistical analysis within the NLTK.
Platforms: Windows, Mac, Linux
License: Freeware | Size: 45.38 KB | Download (46): PyAnnotation Download |
TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in.
It offers a comprehensive...
Platforms: Windows, Mac, Linux
License: Freeware | Size: 2.46 MB | Download (46): TXM Download |
This module extends nodereference fields with a filter-based search engine that fills them in automatically, using the filter features of the Solace API as a backend. This means you can attach a SolR filter instance to node reference fields. Then, any node owner can enable and configure a...
Platforms: PHP
License: Freeware | Size: 30.72 KB | Download (46): Solace Node Reference Download |
"Mean" might be too strong a word, but this is not a friendly game. It's you versus the computer, and the program plays to win. You take turns playing ascending cards (from Ace to King) in the red area, trying to be the first to get rid of your pile of 26 red cards. At the start of every...
Platforms: Mac
License: Freeware | Size: 1.92 MB | Download (35): Growly Spite & Malice Download |
Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.
Platforms: *nix
License: Freeware | Size: 174.08 KB | Download (38): Cunei Machine Translation Platform Download |
A meta package manager for deploying projects on UNIX systems, sponsored by Makina Corpus. Features: * Auto-update system: when minimerge is upgraded (easy_install -U), the infrastructure to run update callbacks is now in place. * Minibuilds now have revisions, which can facilitate their reinstallation as...
Platforms: *nix
License: Freeware | Size: 133.12 KB | Download (32): minitage.core Download |
PasteScripts that facilitate using minitage and creating minitage-based projects, sponsored by Makina Corpus. Project templates: * minitage.zope3: a sample layout for a Zope 3 application * minitage.plone25: a sample layout for a Plone 2.5 application * minitage.plone3: a sample layout for a...
Platforms: *nix
License: Freeware | Size: 634.88 KB | Download (40): minitage.paste Download |
MaxEntropy is a Perl5 module for Maximum Entropy Modeling and Feature Induction. SYNOPSIS use Statistics::MaxEntropy; # debugging messages; default 0 $Statistics::MaxEntropy::debug = 0; # maximum number of iterations for IIS; default 100 $Statistics::MaxEntropy::NEWTON_max_it = 100; #...
Platforms: *nix
License: Freeware | Size: 41.98 KB | Download (100): Statistics::MaxEntropy Download |
DadaDodo is a program that generates random sentences based on input files. Sometimes these sentences are nonsense, but sometimes they cut right through to the heart of the matter and reveal hidden meanings. DadaDodo works rather differently than Dissociated Press; whereas...
Platforms: *nix
License: Freeware | Size: 22.53 KB | Download (106): DadaDodo Download |
Knorpora is a modified version of the Knoppix 3.3 Live CD for students of corpus-based computational linguistics. Like Knoppix, the Knorpora CD allows you to run a fully operational Debian/Linux operating system from the CD-ROM drive, without installing anything on the computer. The Knorpora...
Platforms: *nix
License: Freeware | Size: 676.4 MB | Download (91): Knorpora Download |
Understanding computer networks without performing practical experiments is difficult, if not almost impossible. Unfortunately, setting up a networking lab can be very expensive. Netkit has been conceived as an environment for setting up and performing networking experiments at...
Platforms: *nix
License: Freeware | Size: 778.24 KB | Download (137): Netkit 4 Download |
Search::Lemur is a Perl class to query a Lemur server and parse the results. SYNOPSIS use Search::Lemur; my $lem = Search::Lemur->new("http://url/to/lemur.cgi"); # run some queries, and get back an array of results # a query with a single term: my @results1 = $lem->query("encryption");...
Platforms: *nix
License: Freeware | Size: 8.19 KB | Download (89): Search::Lemur Download |
Search::FreeText is a free text indexing module for medium-to-large text corpora. SYNOPSIS my $text = new Search::FreeText(-db => [DB_File, "stories.db"]); $text->open_index(); $text->clear_index(); $text->index_document(1, "Hello world"); $text->index_document(2, "World in motion");...
Platforms: *nix
License: Freeware | Size: 10.24 KB | Download (95): Search::FreeText Download |
TextSearch is a program that helps you search through a set of text files which are in a hierarchical structure, i.e. a directory structure. Each document is searched using a regular expression and an overview of the results is shown as a tree structure. By clicking on a file, it can be viewed,...
Platforms: *nix
License: Freeware | Size: 15.36 KB | Download (96): TextSearch Download |
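The approach described for TextSearch (walk a directory structure, test each file against a regular expression, and present the hits as a tree) can be sketched roughly as follows. This is an illustrative Python sketch, not TextSearch's own code; the names `search_tree` and `print_tree`, the indentation-based tree layout, and the temporary demo files are all assumptions.

```python
import os
import re
import tempfile


def search_tree(root, pattern):
    """Map each matching file (path relative to root) to its (line number, line) hits."""
    regex = re.compile(pattern)
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    matches = [(number, line.rstrip("\n"))
                               for number, line in enumerate(handle, start=1)
                               if regex.search(line)]
            except OSError:
                continue  # skip files that cannot be read
            if matches:
                hits[os.path.relpath(path, root)] = matches
    return hits


def print_tree(hits):
    """Print matching files indented by directory depth, with their hits beneath them."""
    for rel_path in sorted(hits):
        depth = rel_path.count(os.sep)
        print("  " * depth + os.path.basename(rel_path))
        for number, line in hits[rel_path]:
            print("  " * (depth + 1) + f"{number}: {line}")


# Demo on a throwaway directory with two hypothetical files.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "notes"))
    with open(os.path.join(demo_root, "readme.txt"), "w", encoding="utf-8") as f:
        f.write("corpus tools\nnothing here\n")
    with open(os.path.join(demo_root, "notes", "todo.txt"), "w", encoding="utf-8") as f:
        f.write("index the corpus\n")
    print_tree(search_tree(demo_root, r"corpus"))
```

The sketch reads every file as UTF-8 text and replaces undecodable bytes, so binary files may produce spurious matches; a fuller tool would filter by file type first.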