text-hr 0.17

Company: Robert Lujo
Date Added: December 03, 2013  |  Visits: 370


text-hr is a morphological/inflection engine for the Croatian language, written in Python. It includes stopwords and a part-of-speech (POS) tagging engine that uses an inverse inflection algorithm for detection.

Since the API is not yet frozen, this project is still in alpha.

TAGS

Croatian language, python, natural language processing (NLP), part-of-speech (POS) tagging, stopwords, inverse inflection, morphological lexicon

FEATURES

The most important features are:

 * inflection system - for producing all forms of one word
 * detection of word types (POS tagging) - from an existing list of word forms
 * list of stopwords

The system is based on unicode strings; the default codepage for converting to and from byte strings is cp-1250.

See GETTING STARTED.

INSTALLATION

If you have pip (http://pypi.python.org/pypi/pip) installed:

    pip install text-hr

If not, install the old-fashioned way:

 * download the zip from http://pypi.python.org/pypi/text-hr/
 * unzip it
 * open a shell
 * go to the distribution directory
 * run python setup.py install

GETTING STARTED

This project provides three important parts:

 * Inflection system - for producing all forms of one word
 * Detection of word types (POS tagging) - from an existing list of word forms
 * List of stopwords

Inflection system

Usage example - start a python shell:

    > python
    >>> from text_hr.verbs import Verb
    >>> v = Verb("platiti")
    >>> for k in sorted(v.forms.keys()):
    ...     print k, v.forms[k]
    ...
    AOR/P/1 [u'platismo']
    AOR/P/2 [u'platiste']
    AOR/P/3 [u'plati\u0161e']
    AOR/S/1 [u'platih']
    AOR/S/2 [u'plati']
    AOR/S/3 [u'plati']
    IMP/P/1 [u'platasmo', u'pla\u0107asmo', u'platijasmo']
    IMP/P/2 [u'plataste', u'pla\u0107aste', u'platijaste']
    IMP/P/3 [u'platahu', u'pla\u0107ahu', u'platijahu']
    ...
    VA_PA//P_O+S+V+N [u'pla\u0107eno']
    X_INF// [u'platiti']
    X_VAD_PAS// [u'plativ\u0161i']
    X_VAD_PRE// [u'plate\u0107i']

Detection of word types (POS tagging)

TODO: this section is still to be written. See test_detect.txt for samples and detect.py for the logic.

The first example in test_detect.txt:

    >>> from text_hr.detect import WordTypeRecognizerExample
    >>> def test_it(word_list, word_types_filter=None, level=2):
    ...     wdh = WordTypeRecognizerExample(word_list, silent=True)
    ...     if word_types_filter is not None:
    ...         wdh.detect(word_types_filter=word_types_filter, level=level) # e.g. word_types_filter=["N"]
    ...     else:
    ...         wdh.detect(level=level) # all word types
    ...     lines_file = LinesFile()
    ...     wdh.dump_result(lines_file) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
    ...     print "\n".join(lines_file.lines)
    ...     return wdh

    >>> class LinesFile(object):
    ...     def __init__(self):
    ...         self.lines = []
    ...     def write(self, s):
    ...         self.lines.append(repr(s.rstrip()))

    >>> word_list = [
    ...     "Broj 84"
    ...     , "broji 34"
    ...     , "Brojila 28"
    ...     , "broje 23"
    ...     , "brojeći 22"
    ...     , "brojim 7"
    ...     , "brojimo 5"
    ...     , "brojiš 4"
    ...     , "brojahu 2"
    ...     , "brojaše 1"
    ...     , "brojite 1"
    ...     , "-brijestovu 1"
    ...     , "brijestovi 1" # the only one checked with endswith; all others are checked with get_freq
    ...     , "-brijestove 1"
    ...     , "-brijestova 1"
    ...     ]

Lowest quality, but fastest:

    >>> wdh = test_it(word_list, level=4) # doctest: +ELLIPSIS
    " 10/ 183 -> brojati (u'V-XX_-_JATI-je\u0107i-0') 84/broj,34/broji,23/broje,22/broje\xe6i,7/brojim,5/brojimo,4/broji\x9a,2/brojahu,1/brojite,1/broja\x9ae"

List of stopwords

TODO: to be simplified and explained in detail. This is not tested.

Something like:

    from text_hr import word_types

    word_types_list = None
    # Assumed start values; the original snippet uses these names without defining them.
    wordobj_old = l_key_old = None
    wordobj_data = {}
    for wordobj, l_key, cnt, _suff_id, wform_key, wform in word_types.get_all_std_words(word_types_list):
        if not (wordobj==wordobj_old and l_key==l_key_old):
            wordobj_data["value_base"] = wordobj
            l_key_flds = l_key.split("#")
            # wordobj  l_key            wform_key    form
            # ondje    FX#ADV#MJE.GDJE               ''
            # one      CH#PRON.OSO#     #P/3F#|A#1   'njih'
            assert len(l_key_flds)==3, l_key_flds
            is_changeable = (l_key_flds[0]=="CH")
            print "word_type", l_key_flds[1]
            print "subtype", l_key_flds[2]
            wordobj_old, l_key_old = wordobj, l_key  # assumed: remember the last entry seen

        assert wordobj_obj  # NOTE: wordobj_obj is not defined anywhere in this untested snippet
        # TODO:
        # if wform:
        #     raise NotImplementedError("now wordforms don't hold wf/key, but wf/cnt - it is reduced. Here this is not implemented!!!")

Further

Since there is currently no good documentation, the best sources of further information are the tests inside the modules and the tests in the tests directory (dev version). See RUNNING TESTS for more information. You can also always read the source.

DOCUMENTATION

Sorry, but there is currently no good documentation. In progress ...

SUPPORT

Since this project is limited by my free time, support will be limited.

REPORT BUG OR REQUEST FEATURE

If you encounter a bug, it is best to report it on the Bitbucket page http://bitbucket.org/trebor74hr/text-hr.

If there is interest in development for other inflection-rich languages, I'd be glad to decouple the language-specific code and create a new project capable of dealing with multiple languages.

The best way to contact me is by mail (find the address in LICENCE).

The TODO list is in readme.txt (dev version).

CONTRIBUTION

Since this project does not yet have a stable API, contributions should wait for a while.

RUNNING TESTS

All tests are doctests (not unittests). There are three types of tests in the package:

 1. doctests in each module - e.g. in verbs.py
 2. doctests in tests/test_*.txt - only in the development version
 3. tests which are not compared automatically - i.e. in a special call mode detect.py can produce an output file which needs to be compared manually with some existing file. Such tests are very slow. This needs to be changed to be automatic.

Running each module directly will run 1. and, if running from the development version, 2.
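The "run the module directly" behaviour described above can be sketched with the standard doctest module. This is a minimal illustration written for Python 3, not the package's own runner: the function names add and run_doctests and the file name test_example.txt are hypothetical, and only the printed messages are modelled on the sample output in this README.

```python
import doctest
import os
import sys


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b


def run_doctests(module, extra_file=None):
    """Run the doctests of a module, then an optional external doctest file.

    Returns the total number of failed examples.
    """
    print("%s: running doctests" % module.__name__)
    failed = doctest.testmod(module).failed
    if extra_file is not None:
        if os.path.exists(extra_file):
            print("%s: running doctests" % extra_file)
            failed += doctest.testfile(extra_file, module_relative=False).failed
        else:
            # Mirrors the case of an installed package without the tests directory.
            print("%s: Not found, skipping" % extra_file)
    return failed


if __name__ == "__main__":
    # Hypothetical location of the external doctest file.
    run_doctests(sys.modules[__name__],
                 os.path.join("..", "tests", "test_example.txt"))
```

Checking for the external file before calling doctest.testfile is what lets the same module work both from a development checkout and from an installed copy that ships without the tests directory.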

To use the development version (http://bitbucket.org/trebor74hr/text-hr):

    hg clone https://trebor74hr@bitbucket.org/trebor74hr/text-hr

Create text_hr.pth in the Python site-packages directory, containing the path to text-hr, e.g.:

    r:\hg-clones\python\text-hr

To run all tests:

 * go to the tests directory
 * run tests.py (shown with sample output):

    > python tests.py
    testing module __init__
    testing module adjectives
    ...
    testing module word_types
    testing textfile R:\hg-clones\python\text-hr\tests\test_adj.txt
    ...
    testing textfile R:\hg-clones\python\text-hr\tests\test_verbs_type.txt

To run tests for just one module:

 * go to the text_hr directory
 * run the tests by running the module, e.g.:

    > py pronouns.py
    __main__: running doctests
    ..\tests\test_pronouns.txt: running doctests

 * if you are not running from the dev version, you will get output like this:

    > py pronouns.py
    __main__: running doctests
    ..\tests\test_pronouns.txt: Not found, skipping
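As a side note on the cp-1250 default mentioned under FEATURES: the hexadecimal codes e6 and 9a that appear inside the word forms of the sample detection output are the single-byte cp1250 codes for ć and š. The following Python 3 sketch uses only the standard codecs; the helper names to_cp1250 and from_cp1250 are illustrative and are not part of text-hr.

```python
def to_cp1250(text):
    """Encode a unicode string into cp1250 bytes."""
    return text.encode("cp1250")


def from_cp1250(raw):
    """Decode cp1250 bytes back into a unicode string."""
    return raw.decode("cp1250")


for word in ["brojeći", "brojiš", "brojaše"]:
    raw = to_cp1250(word)
    # The round trip must return the original word unchanged.
    assert from_cp1250(raw) == word
    print(word, "->", raw)

# prints:
# brojeći -> b'broje\xe6i'
# brojiš -> b'broji\x9a'
# brojaše -> b'broja\x9ae'
```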

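The sample detection output under GETTING STARTED begins with the numbers 10 and 183. Reading them as "number of matched forms" and "sum of their frequencies" is an interpretation of that output, not something the README states, but the arithmetic is consistent with it. The Python 3 sketch below checks this against the sample word list; parse_word_list and parse_matched_forms are hypothetical helpers, the lowercasing of input forms is an assumption, and the diacritics in the data are written out as plain characters.

```python
def parse_word_list(entries):
    """Turn 'form count' strings into a {lowercased form: count} dict."""
    freq = {}
    for entry in entries:
        form, count = entry.rsplit(" ", 1)
        key = form.lower()  # assumption: detection compares lowercased forms
        freq[key] = freq.get(key, 0) + int(count)
    return freq


def parse_matched_forms(field):
    """Turn '84/broj,34/broji' into [('broj', 84), ('broji', 34)]."""
    pairs = []
    for item in field.split(","):
        count, form = item.split("/", 1)
        pairs.append((form, int(count)))
    return pairs


# Input word list from the README example.
word_list = [
    "Broj 84", "broji 34", "Brojila 28", "broje 23", "brojeći 22",
    "brojim 7", "brojimo 5", "brojiš 4", "brojahu 2", "brojaše 1",
    "brojite 1", "-brijestovu 1", "brijestovi 1", "-brijestove 1",
    "-brijestova 1",
]

# Matched forms as listed at the end of the sample output line.
matched_field = ("84/broj,34/broji,23/broje,22/brojeći,7/brojim,"
                 "5/brojimo,4/brojiš,2/brojahu,1/brojite,1/brojaše")

freq = parse_word_list(word_list)
matched = parse_matched_forms(matched_field)

# Every matched form carries the frequency it had in the input list.
assert all(freq[form] == count for form, count in matched)

form_count = len(matched)
total = sum(count for _, count in matched)
print("%d/%d" % (form_count, total))  # prints 10/183
```

Note that "Brojila 28" is in the input but not among the matched forms, which is why the total is 183 rather than 211.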
Requirements: No special requirements
Platforms: *nix, Linux

License: Freeware
Size: 112.64 KB