Mguesser 0.4

Company: Alexander Barkov
Date Added: August 04, 2013  |  Visits: 230


WHAT'S THIS?

mguesser is a standalone part of libmnogosearch (the core of the mnoGoSearch search engine, http://www.mnogosearch.org) that guesses the character set and language of a text file.

mguesser uses the "N-Gram-Based Text Categorization" technique, the same one implemented in the TextCat language guesser written in Perl (http://www.let.rug.nl/~vannoord/TextCat/). mguesser is significantly faster than TextCat, especially on large texts.

This package consists of N-gram based algorithms written in C, plus a number of maps for texts in various languages and character sets. Look in the "maps" directory of this package to see the currently supported languages and character sets.

INSTALLATION

 * Download the source package from http://www.mnogosearch.org/guesser/mguesser-0.4.tar.gz
 * Unpack the distribution
 * Change directory to the unpacked distribution, then type "make".

By default, mguesser looks for language maps in the "maps" subdirectory of the current directory. You can change the default language map location in the Makefile by redefining the "-DLMDIR" value.

USAGE

mguesser reads plain text data from STDIN. Note that other "almost text" formats such as HTML will give bad results. In later releases I'll possibly add a command line switch to tell mguesser that the input data is HTML. mguesser works well on texts of 500 bytes or more. Shorter texts are guessed less reliably.

To guess the language and character set of a text file, use:

 mguesser < text_file

mguesser displays how closely your file corresponds to the various language maps, in order of quality. The scores it returns are values between 0 and 1.

You can also display only a specified number of the best results with the -n command line switch. For example, this command displays the 3 best results:

 mguesser -n3 < text_file

To make mguesser load language maps from a non-default directory, use:

 mguesser -d/path/to/maps/

To load language maps from multiple directories, use a colon-separated list:

 mguesser -d/path/to/maps1/:/path/to/maps2/:/path/to/maps3/

To create a new language map, use:

 mguesser -p -c charset -l language < text_file

When executed with the -p command line parameter, mguesser creates a new language map built from text_file and prints it to STDOUT. Please note that the source text file must be large enough to produce a high quality language map. A 500 Kb text is usually enough.

You can also include these source files in your own applications. See the main() function in guesser.c for the order of the guesser function calls.

TODO

 * Make it possible to guess formats other than plain text: HTML, XML
 * Implement command line switches to choose the output format

Alexander Barkov <bar@mnogosearch.org>
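
For readers unfamiliar with N-gram-based text categorization, here is a minimal Python sketch of the general idea: build a ranked profile of the most frequent character n-grams for each language, then compare a document's profile against each one with the rank-order ("out-of-place") distance. This is not mguesser's code or map format. The names ngram_profile, similarity and guess are hypothetical, the tiny sample texts are placeholders, and the scaling of the distance to a score between 0 and 1 is an assumption made for illustration, not necessarily the formula mguesser uses.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top=300):
    """Return the `top` most frequent character n-grams (n = 1..max_n), best first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending frequency; ties broken alphabetically so the order is deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top]


def similarity(doc_profile, lang_profile):
    """Rank-order similarity in [0, 1]; 1.0 means identical profiles.

    The normalization is an assumption for this sketch, not mguesser's formula.
    """
    if not doc_profile:
        return 0.0
    rank = {g: r for r, g in enumerate(lang_profile)}
    # Penalty charged for an n-gram that is missing from the language profile.
    max_penalty = max(len(doc_profile), len(lang_profile))
    total = 0
    for r, g in enumerate(doc_profile):
        total += abs(r - rank[g]) if g in rank else max_penalty
    return 1.0 - total / (len(doc_profile) * max_penalty)


def guess(text, maps, n_best=3):
    """Return up to n_best (score, name) pairs, best match first."""
    doc = ngram_profile(text)
    scores = [(similarity(doc, profile), name) for name, profile in maps.items()]
    return sorted(scores, reverse=True)[:n_best]


# Placeholder training texts; real language maps need far more data.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "de": "der schnelle braune fuchs springt ueber den faulen hund und die katze sitzt auf der matte",
}
MAPS = {name: ngram_profile(sample) for name, sample in SAMPLES.items()}

for score, name in guess("the dog and the fox", MAPS, n_best=2):
    print(f"{score:.4f} {name}")
```

With training texts this small the scores mean little; the sketch only shows why longer inputs and larger maps help, since both give more stable n-gram rankings.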

Requirements: No special requirements
Platforms: *nix, Linux

License: Freeware  |  Size: 143.36 KB