Open Source Tamil Tools and NLP Library for Python 3
======================================================
திற மூல தமிழ் கருவிகள் version 1.1
-------------------------------------
.. image:: https://github.com/Ezhil-Language-Foundation/open-tamil/actions/workflows/regression.yml/badge.svg
.. image:: open-tamil-logo.jpg
மென்பொருள் (Software)
===================
பைதான் தொகுப்புகள் (Python Packages)
-----------------------------------
We provide the Python package 'tamil'
=====================================
tamilstemmer
------------
This module (introduced in v0.96) provides access to simple stemmer functions
originally created by Damodharan Rajalingam.
tamil
-----
open-tamil provides the Python package 'tamil', which can:

1. map Unicode code points to Tamil letters, a basic but important parsing step.
   The `tamil.utf8.get_letters` and `tamil.utf8.get_letters_iterable` APIs return the Tamil letters of a word from the code points of a normalized Unicode string.
   These routines are written with efficiency in mind and are tested for accuracy.
2. work with vowels (uyir), consonants (mei), and compound (uyir-mei) letters
3. reverse the letters in a Tamil word
4. numeral - convert a given integer into Tamil number words using either the Indian or the American numbering system.
   For example, the following call returns the string shown::

       >>> tamil.numeral.num2tamilstr_american(int(1e7))
       'பத்து மில்லியன்'

5. date module - updated in the v1.1 release by Arunmozhi (Techolic); the update
   adds a datetime class with Tamil-aware strftime support and tamil_weekday().
Example usage::
>>> from tamil.date import datetime
>>> d = datetime(2022, 1, 25, 9, 30)
>>> d.strftime_ta("%a %d, %b %Y")
'செவ்வாய் 25, ஜனவரி 2022'
>>> d.strftime_ta("%A (%d %b %Y) %p %I:%M")
'செவ்வாய்க்கிழமை (25 ஜனவரி 2022) முற்பொழுது 09:30'
The module adds a subclass of the datetime.datetime class from the Python standard
library. It can be used in place of the standard library class and has one
extra date-to-string method, strftime_ta, which works like strftime
except that day and month names are returned in Tamil.
txt2unicode
-----------
Converts Tamil text between legacy encodings and Unicode, in both directions.
If you don't know which encoding your Tamil text uses, don't worry; the `tamil.txt2unicode.auto2unicode` function will detect it and convert the text to Unicode for you.
யுனிகோட் மாற்றி மற்றும் மாறாகவும் தமிழ் உரைக் குறியாக்கம்.
உங்களது தமிழ் உரைக் குறியீடு என்னவென்று தெரியாதெனில், நீங்கள் கவலை கொள்ளத் தேவையில்லை; `tamil.txt2unicode.auto2unicode` செயல்பாடு இதனைக் கண்டறியும் மற்றும் இதனை யுனிகோடுக்கு மாற்றும்.
It currently supports 25 known Tamil encodings. Read more details about [txt2unicode](tamil/txt2unicode/README.md) and the [limitations](examples/txt2unicode/encodes_chars/README.md) of `auto2unicode` and `unicode2auto`.
தற்சமயம், இது 25 தமிழ் குறியாக்கம் கொண்ட எழுத்துருக்களை ஆதரிக்கிறது. [txt2unicode](tamil/txt2unicode/README.md) பற்றி மேலும் விவரங்களும் 'auto2unicode' மற்றும் 'unicode2auto'-வின் [குறைபாடுகளையும்](examples/txt2unicode/encodes_chars/README.md) காண்க.
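One way automatic detection of this kind can work is to look for characters that occur in only one candidate encoding table. The sketch below shows that idea with two toy tables; it is an assumption-laden illustration, not the algorithm or the tables used by `auto2unicode`. The names `TOY_ENCODINGS`, `detect_encoding` and `to_unicode` are hypothetical, and the mappings are invented and do not correspond to any real Tamil font encoding.

```python
# Toy tables invented for this sketch; they are NOT real Tamil font encodings.
TOY_ENCODINGS = {
    "toy_a": {"m": "அ", "M": "ஆ", "j": "த"},
    "toy_b": {"m": "அ", "q": "ஆ", "x": "த"},
}


def detect_encoding(text, encodings=TOY_ENCODINGS):
    """Return the table name whose unique characters occur most often in text."""
    best_name, best_score = None, 0
    for name, table in encodings.items():
        # Collect every character used by the *other* tables.
        others = set()
        for other_name, other_table in encodings.items():
            if other_name != name:
                others.update(other_table)
        unique = set(table) - others
        score = sum(1 for ch in text if ch in unique)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def to_unicode(text, encodings=TOY_ENCODINGS):
    """Convert text with the detected table; leave it unchanged if none fits."""
    name = detect_encoding(text, encodings)
    if name is None:
        return text
    table = encodings[name]
    return "".join(table.get(ch, ch) for ch in text)


print(detect_encoding("mMj"))  # toy_a
print(to_unicode("mqx"))       # அஆத
```

Text made up only of characters shared by several tables cannot be told apart this way, which is one reason any such detector has limitations.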
txt2ipa
-------
Converts Tamil Unicode text to the International Phonetic Alphabet (IPA).
Read more details about [txt2ipa](tamil/txt2ipa/README.md).
சர்வதேச (ஐபிஏ) மாற்றி, தமிழ் யுனிகோட் உரை; மேலும் விபரங்களுக்கு -> படிக்க [இங்கு சொடுக்கவும்](tamil/txt2ipa/README.md).
transliterate
-------------
The Python package `transliterate` provides commonly used phonetic
transliteration schemes, including:

1. Azhagi - phonetic maps for all Tamil letters; many-to-one, so several input spellings map to the same letter
2. Jaffna Library - phonetic maps for all Tamil letters; one-to-one
3. Combinational layout - based on phonetic mapping of vowel+consonant
4. University of Madras and ISO transliteration schemes

You supply English (Latin-script) text that phonetically encodes Tamil and receive Tamil Unicode text in return; the conversion is a best-effort algorithm that prefers the longest phonetic match.
`transliterate` தொகுப்பு பொதுவாக பயன்படுத்தப்படும் ஒலிபெயர்ப்புகளை வழங்குகிறது; அவை,
1. அழகி - தமிழ் கடிதங்கள் ஒலிப்பு வரைபடங்கள் - பல -> ஒரு ஆதரவு பல வடிவம் உள்ளீடுகள்
2. யாழ்ப்பாண நூலகம் - தமிழ் கடிதங்கள் ஒலிப்பு வரைபடங்கள் - ஒன்று> ஒரு
3. பலதரப்பட்ட அமைப்பு - உயிர் + மெய் உச்சரிப்பு மேப்பிங் அடிப்படையில்
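The longest-phonetic-match idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the `transliterate` package's implementation: `TOY_TABLE` is a tiny table invented for the example (real schemes such as Azhagi are far larger), and `transliterate_longest_match` is a hypothetical helper name.

```python
# Tiny romanization table invented for this sketch; real schemes are far larger.
TOY_TABLE = {
    "a": "அ",
    "aa": "ஆ",
    "m": "ம்",
    "ma": "ம",
    "maa": "மா",
    "th": "த்",
    "tha": "த",
}


def transliterate_longest_match(text, table=TOY_TABLE):
    """Greedily replace the longest table key found at each position."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No key matches here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate_longest_match("ammaa"))  # அம்மா
```

Trying the longest key first is what lets "maa" become a single letter instead of being split into "m" followed by "aa".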
tamilmorse
----------
This package can generate Morse code for Tamil text and decode Morse code
back into Tamil.
tamilsandhi
-------------
This package helps you build a Tamil sandhi (word-joining) error checker and correct such errors. It contains roughly 40 rules.
Tamil Sandhi Checker is a project created and maintained by Nithya Duraisamy,
with contributions from the Ezhil Language Foundation. It is distributed under the terms of the GNU GPLv3.
For more details, see https://github.com/nithyadurai87/tamil-sandhi-checker
For convenience, this code is packaged with Open-Tamil.
C-tamil
-------
The package under C-tamil provides some of the same functionality as the Python 'tamil' package, but in ISO C for use from C/C++.
*சி தமிழ்*
பைதான் 'தமிழ்' தொகுப்பில் உள்ள சில பயன்பாடுகளை 'சி தமிழ்' ஐஎஸ்ஓ-சி-யில், சி/சி++ பயன்படுத்தும் வகையில் கொடுக்கும்.
திரை விசைப்பலகை (Onscreen Keyboard)
----------------------------------
Open-Tamil provides the keyboard layout in the file `keyboard/tamil.js` for the jQuery UI plugin.
'tamil.js' விசைப்பலகை அமைப்பை வழங்குகிறது.
மாதிரிகள் (Language Models)
---------------------------
The package 'ngram/' provides basic support for letter unigram and bigram models built from UTF-8 corpora;
at the moment only the unigram model is supported. More complex language models are expected to be developed soon.
எழுத்து unigram அடிப்படை ஆதரவு, மற்றும் UTF-8 அடிப்படையில் சொற்கிடங்கின் பயன்படுத்தி bigram மாதிரிகள் 'ngram/' தொகுப்பால் ஆதரிக்கப்படுகின்றன, தற்பொழுது அது மாதிரி unigram-ஐ ஆதரிக்கிறது. மிகவும் நுணுக்கமான மொழி மாதிரிகள் விரைவில் அபிவிருத்தி செய்யப்படும் என எதிர்பார்க்கப்படுகிறது.
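For readers unfamiliar with letter n-grams, the counting step behind such models can be sketched with the standard library alone. This is a minimal illustration, not the code in 'ngram/': the function name `letter_ngrams` is hypothetical, and the sketch assumes each word has already been split into Tamil letters (for instance with `tamil.utf8.get_letters`).

```python
from collections import Counter


def letter_ngrams(words):
    """Count letter unigrams and bigrams over words given as lists of letters."""
    unigrams, bigrams = Counter(), Counter()
    for letters in words:
        unigrams.update(letters)
        # Pair each letter with its successor inside the same word.
        bigrams.update(zip(letters, letters[1:]))
    return unigrams, bigrams


# Two words, already split into letters for this example.
corpus = [["த", "மி", "ழ்"], ["த", "மி", "ழா"]]
unigrams, bigrams = letter_ngrams(corpus)
print(unigrams["த"])          # 2
print(bigrams[("த", "மி")])   # 2
print(bigrams[("மி", "ழ்")])  # 1
```

Dividing each count by the total turns these tables into the relative frequencies a unigram or bigram model would use.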
நிறுவுதல் (Installation)
=======================
Installing from the Python Package Index (PyPI) is recommended; run the following command::

    $ pip install open-tamil

Installing from sources
=======================
After pulling the sources from the git repository, you need to sync the submodule
for tamilsandhi by issuing the following commands::

    $ git submodule init
    $ git submodule update --force

This is required to package tamilsandhichecker along with open-tamil.
உதாரணங்கள் (Examples)
===================
Open-Tamil is a set of Python libraries that help your application (web, system software, desktop GUI, etc.) support Tamil text processing, input and related tasks.
Open-Tamil is still a basic collection of tools; it is not complete yet. It includes keyboard layouts, converters from old encodings to UTF-8, N-gram language models, transliterators and more.
Examples for using Python Open-Tamil are found [here](tests/).
ஓபன்-தமிழ் என்பது தொகுக்கப்பட்ட பைதான் நூலகமாகும், உங்கள் வலை, கணினி நிரல், முகத்திரை வரைகலை மற்றும் பல தமிழ் எழுத்துரு செயற்பாடுகளுக்கு மிகவும் உதவியாக இருக்கும்.
ஓபன்-தமிழ் என்பது அடிப்படை தொகுப்புக்களை மட்டுமே கொண்ட கருவிகளாகும், இது இன்னும் முழுமை பெறவில்லை. இதில் UTF-8, என்-கிராம் மொழி மாதிரிகள், transliterators முதலியன பழைய முறையை மாற்ற விசைப்பலகை அமைப்பு, மாற்றிகள் உள்ளன. பைதான் ஓபன் தமிழ் பயன்படுத்தி உதாரணங்கள் [இங்கு](tests/) காணப்படுகின்றன.
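As a small illustration of letter-level processing, the sketch below groups Unicode code points into Tamil letters and reverses a word letter by letter. It uses only the standard library and is an approximation written for this README, not the implementation behind `tamil.utf8.get_letters`; the function names `split_letters` and `reverse_word` are hypothetical, and special ligatures may need extra handling that this sketch omits.

```python
import unicodedata


def split_letters(word):
    """Attach each combining mark (vowel sign or pulli) to the preceding base."""
    letters = []
    for ch in word:
        # Tamil vowel signs and the pulli have Unicode category Mc or Mn.
        if letters and unicodedata.category(ch) in ("Mn", "Mc"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def reverse_word(word):
    """Reverse a word by letters rather than by raw code points."""
    return "".join(reversed(split_letters(word)))


print(split_letters("தமிழ்"))  # ['த', 'மி', 'ழ்']
print(reverse_word("தமிழ்"))   # ழ்மித
```

Reversing by raw code points would detach the vowel signs from their consonants, which is why the word is split into letters first.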
இலக்குகள் (Goals)
=================
The goal of this package is to collect and develop open-source licensed Tamil tools in one place, providing the following:

1. Unicode standard tools for Tamil - various tools for Tamil Unicode development. Currently 25 encodings are supported; read about them [here](tamil/txt2unicode/README.md).
2. Access to Unicode Tamil letters, vowels and consonants.
3. Breaking down Tamil glyphs and Unicode code points into Tamil letter representations - collation.
4. Tools for navigating a corpus of data and building word-frequency tables, prediction tables, etc.
5. Conversion from various encodings, e.g. TSCII to Unicode. We hope eventually to convert the other major Tamil encodings, such as TAB, TAM and Bamini (*insert-your-favorite-font-encoding*), into the Tamil Unicode encoding.
6. Support for all of the above in Python 3.

While most of the tools in this package will be written in Python 2.6 or later, we are open to source-code contributions in other open-source languages.
Contributing to Open-Tamil
===========================
1. Please add your code and unit tests under the MIT, GNU GPL or ASF licenses.
2. Organize your code into modules, and add unit tests that follow the Python flake8 and pylint standards.
3. Please do not mix tabs and spaces; use 4 spaces in place of each tab.
4. Make sure your module is installed as part of the pip package.
5. Ensure your code works on both Python 2 and Python 3.
பற்றி (About)
============
Tamil is a classical language spoken primarily in South India.
தமிழ் முதன்மையாக தென் இந்தியாவில் பேசப்படும் பாரம்பரிய மொழி ஆகும்.
Raw data
{
"_id": null,
"home_page": "https://github.com/Ezhil-Language-Foundation/open-tamil",
"name": "Open-Tamil",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "M. Annamalai, T. Arulalan, and other contributors",
"author_email": "ezhillang@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b5/56/63140925e005952f3c7c521b039948108cdbbf782ec205aff85a6a465164/Open-Tamil-1.1.tar.gz",
"platform": "PC",
"description": "Open Source Tamil Tools and NLP Library for Python 3\n======================================================\n\u0ba4\u0bbf\u0bb1 \u0bae\u0bc2\u0bb2 \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b95\u0bb0\u0bc1\u0bb5\u0bbf\u0b95\u0bb3\u0bcd version 1.1\n-------------------------------------\n.. image:: https://github.com/Ezhil-Language-Foundation/open-tamil/actions/workflows/regression.yml/badge.svg\n.. image:: open-tamil-logo.jpg\n\n\u0bae\u0bc6\u0ba9\u0bcd\u0baa\u0bca\u0bb0\u0bc1\u0bb3\u0bcd (Software)\n===================\n\u0baa\u0bc8\u0ba4\u0bbe\u0ba9\u0bcd \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bc1\u0b95\u0bb3\u0bcd (Python Packages)\n-----------------------------------\n'tamil' \u0b8e\u0ba9\u0bcd\u0bb1 \u0baa\u0bc8\u0ba4\u0bbe\u0ba9\u0bcd \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bc8 \u0bb5\u0bb4\u0b99\u0bcd\u0b95\u0bc1\u0b95\u0bbf\u0bb1\u0bcb\u0bae\u0bcd\n=====================================\ntamilstemmer\n------------\nThis module (introduced in v0.96) provides access to simple stemmer functions\n originally created by Damodharan Rajalingam.\n\ntamil\n-----\nopen-tamil provides Python package 'tamil' with ability to,\n\n1. map unicode code-points to Tamil letters - basic but important parsing - in a routine called get_letters from a Tamil word\n `tamil.utf8.get_letters` and `tamil.utf8.get_letters_iterable` API return the Tamil letters from the unicode points of a normalized unicode string.\n These routines are written with efficiency in mind, and tested for accuracy.\n\n2. work with vowels (uyir) and consonants (mei), compound, uyir-mei letters\n3. reverse letters in Tamil word\n4. numeral - convert a given number (integer) into a numeral in Indian or American based system.\n e.g. following call will return the string\n >> tamil.numeral.num2tamilstr_american( long(1e7) )\n u\"\u0baa\u0ba4\u0bcd\u0ba4\u0bc1 \u0bae\u0bbf\u0bb2\u0bcd\u0bb2\u0bbf\u0baf\u0ba9\u0bcd\",\n5. 
date module: new update to this module in the v1.1 release was added by Arunmozhi (Techolic)\n adds datetime class with strftime, tamil_weekday(),\n Example usage::\n\n >>> from tamil.date import datetime\n >>> d = datetime(2022, 1, 25, 9, 30)\n >>> d.strftime_ta(\"%a %d, %b %Y\")\n '\u0b9a\u0bc6\u0bb5\u0bcd\u0bb5\u0bbe\u0baf\u0bcd 25, \u0b9c\u0ba9\u0bb5\u0bb0\u0bbf 2022'\n >>> d.strftime_ta(\"%A (%d %b %Y) %p %I:%M\")\n '\u0b9a\u0bc6\u0bb5\u0bcd\u0bb5\u0bbe\u0baf\u0bcd\u0b95\u0bcd\u0b95\u0bbf\u0bb4\u0bae\u0bc8 (25 \u0b9c\u0ba9\u0bb5\u0bb0\u0bbf 2022) \u0bae\u0bc1\u0bb1\u0bcd\u0baa\u0bca\u0bb4\u0bc1\u0ba4\u0bc1 09:30'\n\nThis adds a subclass of datetime.datetime class from the Python standard\nlibrary that can be used as an alternate to the standard library class\nwith an extra date-to-string function called strftime_ta which functions\nsimilar to the strftime function, except day, month names are returned\nin Tamil.\n\ntxt2unicode\n-----------\nTamil Text Encode to Unicode Converter and vice versa.\nIf you don't you know what your Tamil text encoding is, don't worry; the `tamil.txt2unicode.auto2unicode` function will find it and convert to unicode for you.\n\u0baf\u0bc1\u0ba9\u0bbf\u0b95\u0bcb\u0b9f\u0bcd \u0bae\u0bbe\u0bb1\u0bcd\u0bb1\u0bbf \u0bae\u0bb1\u0bcd\u0bb1\u0bc1\u0bae\u0bcd \u0bae\u0bbe\u0bb1\u0bbe\u0b95\u0bb5\u0bc1\u0bae\u0bcd \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b89\u0bb0\u0bc8\u0b95\u0bcd \u0b95\u0bc1\u0bb1\u0bbf\u0baf\u0bbe\u0b95\u0bcd\u0b95\u0bae\u0bcd.\n\u0b89\u0b99\u0bcd\u0b95\u0bb3\u0ba4\u0bc1 \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b89\u0bb0\u0bc8\u0b95\u0bcd \u0b95\u0bc1\u0bb1\u0bbf\u0baf\u0bc0\u0b9f\u0bc1 \u0b8e\u0ba9\u0bcd\u0ba9\u0bb5\u0bc6\u0ba9\u0bcd\u0bb1\u0bc1 \u0ba4\u0bc6\u0bb0\u0bbf\u0baf\u0bbe\u0ba4\u0bc6\u0ba9\u0bbf\u0bb2\u0bcd, \u0ba8\u0bc0\u0b99\u0bcd\u0b95\u0bb3\u0bcd \u0b95\u0bb5\u0bb2\u0bc8 \u0b95\u0bca\u0bb3\u0bcd\u0bb3\u0ba4\u0bcd \u0ba4\u0bc7\u0bb5\u0bc8\u0baf\u0bbf\u0bb2\u0bcd\u0bb2\u0bc8; `tamil.txt2unicode.auto2unicode` 
\u0b9a\u0bc6\u0baf\u0bb2\u0bcd\u0baa\u0bbe\u0b9f\u0bc1 \u0b87\u0ba4\u0ba9\u0bc8\u0b95\u0bcd \u0b95\u0ba3\u0bcd\u0b9f\u0bb1\u0bbf\u0baf\u0bc1\u0bae\u0bcd \u0bae\u0bb1\u0bcd\u0bb1\u0bc1\u0bae\u0bcd \u0b87\u0ba4\u0ba9\u0bc8 \u0baf\u0bc1\u0ba9\u0bbf\u0b95\u0bcb\u0b9f\u0bc1\u0b95\u0bcd\u0b95\u0bc1 \u0bae\u0bbe\u0bb1\u0bcd\u0bb1\u0bc1\u0bae\u0bcd.\n\nRight now, it supports 25 known Tamil encodings. Read more details about [txt2unicode](tamil/txt2unicode/README.md) and [limitation](examples/txt2unicode/encodes_chars/README.md) of `auto2unicode` and `unicode2auto`.\n\u0ba4\u0bb1\u0bcd\u0b9a\u0bae\u0baf\u0bae\u0bcd, \u0b87\u0ba4\u0bc1 25 \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b95\u0bc1\u0bb1\u0bbf\u0baf\u0bbe\u0b95\u0bcd\u0b95\u0bae\u0bcd \u0b95\u0bca\u0ba3\u0bcd\u0b9f \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1\u0bb0\u0bc1\u0b95\u0bcd\u0b95\u0bb3\u0bc8 \u0b86\u0ba4\u0bb0\u0bbf\u0b95\u0bcd\u0b95\u0bbf\u0bb1\u0ba4\u0bc1. [txt2unicode](tamil/txt2unicode/README.md) \u0baa\u0bb1\u0bcd\u0bb1\u0bbf \u0bae\u0bc7\u0bb2\u0bc1\u0bae\u0bcd \u0bb5\u0bbf\u0bb5\u0bb0\u0b99\u0bcd\u0b95\u0bb3\u0bc1\u0bae\u0bcd 'auto2unicode' \u0bae\u0bb1\u0bcd\u0bb1\u0bc1\u0bae\u0bcd 'unicode2auto'-\u0bb5\u0bbf\u0ba9\u0bcd [\u0b95\u0bc1\u0bb1\u0bc8\u0baa\u0bbe\u0b9f\u0bc1\u0b95\u0bb3\u0bc8\u0baf\u0bc1\u0bae\u0bcd] (examples/txt2unicode/encodes_chars/README.md) \u0b95\u0bbe\u0ba3\u0bcd\u0b95.\n\ntxt2ipa\n-------\nTamil Unicode Text to International Phonetic Alphabet (IPA) converter\nRead more details about [txt2ipa](tamil/txt2ipa/README.md)\n\u0b9a\u0bb0\u0bcd\u0bb5\u0ba4\u0bc7\u0b9a (\u0b90\u0baa\u0bbf\u0b8f) \u0bae\u0bbe\u0bb1\u0bcd\u0bb1\u0bbf, \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0baf\u0bc1\u0ba9\u0bbf\u0b95\u0bcb\u0b9f\u0bcd \u0b89\u0bb0\u0bc8; \u0bae\u0bc7\u0bb2\u0bc1\u0bae\u0bcd \u0bb5\u0bbf\u0baa\u0bb0\u0b99\u0bcd\u0b95\u0bb3\u0bc1\u0b95\u0bcd\u0b95\u0bc1 -> \u0baa\u0b9f\u0bbf\u0b95\u0bcd\u0b95 [\u0b87\u0b99\u0bcd\u0b95\u0bc1 
\u0b9a\u0bca\u0b9f\u0bc1\u0b95\u0bcd\u0b95\u0bb5\u0bc1\u0bae\u0bcd](tamil/txt2ipa/README.md).\n\ntransliterate\n-------------\nThe python package `transliterate` provides for commonly used transliteration\nphonetic schemes like,\n\n1. Azhagi - phonetic maps for all Tamil letters - many -> one supporting multiple form inputs\n2. Jaffna Library - phonetic maps for all Tamil letters - one->one\n3. Combinational layout - based on phonetic mapping of vowel+consonant\n4. University of Madras, ISO - transliteration schemes are added.\n\nwhere you can supply English text, which phonetically encodes Tamil, and then receive Unicode encoded, in a best-effort algorithm for the longest phonetic match.\n\n`transliterate` \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bc1 \u0baa\u0bca\u0ba4\u0bc1\u0bb5\u0bbe\u0b95 \u0baa\u0baf\u0ba9\u0bcd\u0baa\u0b9f\u0bc1\u0ba4\u0bcd\u0ba4\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0bae\u0bcd \u0b92\u0bb2\u0bbf\u0baa\u0bc6\u0baf\u0bb0\u0bcd\u0baa\u0bcd\u0baa\u0bc1\u0b95\u0bb3\u0bc8 \u0bb5\u0bb4\u0b99\u0bcd\u0b95\u0bc1\u0b95\u0bbf\u0bb1\u0ba4\u0bc1; \u0b85\u0bb5\u0bc8,\n1. \u0b85\u0bb4\u0b95\u0bbf - \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b95\u0b9f\u0bbf\u0ba4\u0b99\u0bcd\u0b95\u0bb3\u0bcd \u0b92\u0bb2\u0bbf\u0baa\u0bcd\u0baa\u0bc1 \u0bb5\u0bb0\u0bc8\u0baa\u0b9f\u0b99\u0bcd\u0b95\u0bb3\u0bcd - \u0baa\u0bb2 -> \u0b92\u0bb0\u0bc1 \u0b86\u0ba4\u0bb0\u0bb5\u0bc1 \u0baa\u0bb2 \u0bb5\u0b9f\u0bbf\u0bb5\u0bae\u0bcd \u0b89\u0bb3\u0bcd\u0bb3\u0bc0\u0b9f\u0bc1\u0b95\u0bb3\u0bcd\n2. \u0baf\u0bbe\u0bb4\u0bcd\u0baa\u0bcd\u0baa\u0bbe\u0ba3 \u0ba8\u0bc2\u0bb2\u0b95\u0bae\u0bcd - \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b95\u0b9f\u0bbf\u0ba4\u0b99\u0bcd\u0b95\u0bb3\u0bcd \u0b92\u0bb2\u0bbf\u0baa\u0bcd\u0baa\u0bc1 \u0bb5\u0bb0\u0bc8\u0baa\u0b9f\u0b99\u0bcd\u0b95\u0bb3\u0bcd - \u0b92\u0ba9\u0bcd\u0bb1\u0bc1> \u0b92\u0bb0\u0bc1\n3. 
\u0baa\u0bb2\u0ba4\u0bb0\u0baa\u0bcd\u0baa\u0b9f\u0bcd\u0b9f \u0b85\u0bae\u0bc8\u0baa\u0bcd\u0baa\u0bc1 - \u0b89\u0baf\u0bbf\u0bb0\u0bcd + \u0bae\u0bc6\u0baf\u0bcd \u0b89\u0b9a\u0bcd\u0b9a\u0bb0\u0bbf\u0baa\u0bcd\u0baa\u0bc1 \u0bae\u0bc7\u0baa\u0bcd\u0baa\u0bbf\u0b99\u0bcd \u0b85\u0b9f\u0bbf\u0baa\u0bcd\u0baa\u0b9f\u0bc8\u0baf\u0bbf\u0bb2\u0bcd\n\ntamilmorse\n----------\n\u0b87\u0ba8\u0bcd\u0ba4 \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bbf\u0bb2\u0bcd \u0ba4\u0bae\u0bbf\u0bb4\u0bc1\u0b95\u0bcd\u0b95\u0bbe\u0ba9 \u0bae\u0bcb\u0bb0\u0bcd\u0b9a\u0bc1 \u0b95\u0bc1\u0bb1\u0bbf\u0b95\u0bb3\u0bc8 \u0b89\u0bb0\u0bc1\u0bb5\u0bbe\u0b95\u0bcd\u0b95\u0bb5\u0bc1\u0bae\u0bcd, \u0b95\u0bc1\u0bb1\u0bbf\u0baf\u0bc0\u0b9f\u0bc1\u0b95\u0bb3\u0bc8\n\u0baa\u0bbf\u0bb0\u0bbf\u0ba4\u0bcd\u0ba4\u0bc1\u0baa\u0bcd\u0baa\u0bbe\u0bb0\u0bcd\u0b95\u0bb5\u0bc1\u0bae\u0bcd \u0bae\u0bc1\u0b9f\u0bbf\u0baf\u0bc1\u0bae\u0bcd.\n\ntamilsandhi\n-------------\n\u0ba4\u0bae\u0bbf\u0bb4\u0bbf\u0bb2\u0bcd \u0b9a\u0ba8\u0bcd\u0ba4\u0bbf\u0baa\u0bcd\u0baa\u0bbf\u0bb4\u0bc8 \u0ba4\u0bbf\u0bb0\u0bc1\u0ba4\u0bcd\u0ba4\u0bbf \u0b89\u0bb0\u0bc1\u0bb5\u0bbe\u0b95\u0bcd\u0b95\u0bb5\u0bc1\u0bae\u0bcd \u0baa\u0bbf\u0bb4\u0bc8\u0b95 \u0ba4\u0bbf\u0bb0\u0bc1\u0ba4\u0bcd\u0ba4\u0bb5\u0bc1\u0bae\u0bcd \u0b89\u0ba4\u0bb5\u0bbf\u0baf\u0bbe\u0b95\u0b87\u0ba8\u0bcd\u0ba4 \u0ba8\u0bbf\u0bb0\u0bb2\u0bcd \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bc1 \u0bb5\u0bb4\u0bbf\u0bb5\u0b95\u0bc1\u0b95\u0bcd\u0b95\u0bc1\u0bae\u0bcd. \u0b8f\u0bb0\u0b95\u0bcd\u0b95\u0bc1\u0bb1\u0bc8\u0baf 40-\u0bb5\u0bbf\u0ba4\u0bbf\u0b95\u0bb3\u0bc8 \u0b95\u0bca\u0ba3\u0bcd\u0b9f\u0ba4\u0bc1 \u0b87\u0ba8\u0bcd\u0ba4 \u0ba8\u0bbf\u0bb0\u0bb2\u0bcd \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bc8 \u0b89\u0bb0\u0bc1\u0bb5\u0bbe\u0b95\u0bcd\u0b95\u0bbf\u0baf\u0bb5\u0bb0\u0bcd \u0ba4\u0bbf\u0bb0\u0bc1\u0bae\u0ba4\u0bbf. \u0ba8\u0bbf\u0ba4\u0bcd\u0baf\u0bbe. 
\u0bae\u0bc7\u0bb2\u0bc1\u0bae\u0bcd \u0bb5\u0bbf\u0bb5\u0bb0\u0b99\u0bcd\u0b95\u0bb3\u0bc1\u0b95\u0bcd\u0b95 https://github.com/nithyadurai87/tamil-sandhi-checker\nTamil Sandhi Checker is a project created and maintained by Nithya Duraisamy,\nwith contributions from Ezhil Language Foundation. It is distributed under terms of GNU GPLv3.\n\nFor convenience this code is packaged with Open-Tamil.\n\nC-tamil\n-------\nThe package under C-tamil provides some of the same functionality as Python 'tamil' but in ISO-C for C/C++ use.\n*\u0b9a\u0bbf \u0ba4\u0bae\u0bbf\u0bb4\u0bcd*\n\u0baa\u0bc8\u0ba4\u0bbe\u0ba9\u0bcd '\u0ba4\u0bae\u0bbf\u0bb4\u0bcd' \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bbf\u0bb2\u0bcd \u0b89\u0bb3\u0bcd\u0bb3 \u0b9a\u0bbf\u0bb2 \u0baa\u0baf\u0ba9\u0bcd\u0baa\u0bbe\u0b9f\u0bc1\u0b95\u0bb3\u0bc8 '\u0b9a\u0bbf \u0ba4\u0bae\u0bbf\u0bb4\u0bcd' \u0b90\u0b8e\u0bb8\u0bcd\u0b93-\u0b9a\u0bbf-\u0baf\u0bbf\u0bb2\u0bcd, \u0b9a\u0bbf/\u0b9a\u0bbf++ \u0baa\u0baf\u0ba9\u0bcd\u0baa\u0b9f\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1\u0bae\u0bcd \u0bb5\u0b95\u0bc8\u0baf\u0bbf\u0bb2\u0bcd \u0b95\u0bca\u0b9f\u0bc1\u0b95\u0bcd\u0b95\u0bc1\u0bae\u0bcd.\n\n\u0ba4\u0bbf\u0bb0\u0bc8 \u0bb5\u0bbf\u0b9a\u0bc8\u0baa\u0bcd\u0baa\u0bb2\u0b95\u0bc8 (Onscreen Keyboard)\n----------------------------------\nOpen-tamil provides the keyboard layout in the file `keyboard/tamil.js` for they jQuery UI plugin.\n'tamil.js' \u0bb5\u0bbf\u0b9a\u0bc8\u0baa\u0bcd\u0baa\u0bb2\u0b95\u0bc8 \u0b85\u0bae\u0bc8\u0baa\u0bcd\u0baa\u0bc8 \u0bb5\u0bb4\u0b99\u0bcd\u0b95\u0bc1\u0b95\u0bbf\u0bb1\u0ba4\u0bc1.\n\n\u0bae\u0bbe\u0ba4\u0bbf\u0bb0\u0bbf\u0b95\u0bb3\u0bcd (Language Modes)\n-------------------------\nBasic support for letter unigram, bigram models using UTF-8 based corpora are supported in the package 'ngram/'\nwhich supports unigram model at the moment. 
More complex language models are expected to be developed soon.\n\u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1 unigram \u0b85\u0b9f\u0bbf\u0baa\u0bcd\u0baa\u0b9f\u0bc8 \u0b86\u0ba4\u0bb0\u0bb5\u0bc1, \u0bae\u0bb1\u0bcd\u0bb1\u0bc1\u0bae\u0bcd UTF-8 \u0b85\u0b9f\u0bbf\u0baa\u0bcd\u0baa\u0b9f\u0bc8\u0baf\u0bbf\u0bb2\u0bcd \u0b9a\u0bca\u0bb1\u0bcd\u0b95\u0bbf\u0b9f\u0b99\u0bcd\u0b95\u0bbf\u0ba9\u0bcd \u0baa\u0baf\u0ba9\u0bcd\u0baa\u0b9f\u0bc1\u0ba4\u0bcd\u0ba4\u0bbf bigram \u0bae\u0bbe\u0ba4\u0bbf\u0bb0\u0bbf\u0b95\u0bb3\u0bcd 'ngram/' \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bbe\u0bb2\u0bcd \u0b86\u0ba4\u0bb0\u0bbf\u0b95\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0b95\u0bbf\u0ba9\u0bcd\u0bb1\u0ba9, \u0ba4\u0bb1\u0bcd\u0baa\u0bca\u0bb4\u0bc1\u0ba4\u0bc1 \u0b85\u0ba4\u0bc1 \u0bae\u0bbe\u0ba4\u0bbf\u0bb0\u0bbf unigram-\u0b90 \u0b86\u0ba4\u0bb0\u0bbf\u0b95\u0bcd\u0b95\u0bbf\u0bb1\u0ba4\u0bc1. \u0bae\u0bbf\u0b95\u0bb5\u0bc1\u0bae\u0bcd \u0ba8\u0bc1\u0ba3\u0bc1\u0b95\u0bcd\u0b95\u0bae\u0bbe\u0ba9 \u0bae\u0bca\u0bb4\u0bbf \u0bae\u0bbe\u0ba4\u0bbf\u0bb0\u0bbf\u0b95\u0bb3\u0bcd \u0bb5\u0bbf\u0bb0\u0bc8\u0bb5\u0bbf\u0bb2\u0bcd \u0b85\u0baa\u0bbf\u0bb5\u0bbf\u0bb0\u0bc1\u0ba4\u0bcd\u0ba4\u0bbf \u0b9a\u0bc6\u0baf\u0bcd\u0baf\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0bae\u0bcd \u0b8e\u0ba9 \u0b8e\u0ba4\u0bbf\u0bb0\u0bcd\u0baa\u0bbe\u0bb0\u0bcd\u0b95\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0b95\u0bbf\u0bb1\u0ba4\u0bc1.\n\n\u0ba8\u0bbf\u0bb1\u0bc1\u0bb5\u0bc1\u0ba4\u0bb2\u0bcd (Installation)\n=======================\nInstallation from Python Package Index is also recommended, following the commands,\n\n $ pip install open-tamil\n\nInstalling from sources\n=======================\nAfter pulling sources from git repo you need to sync the submodule\nfor tamilsandhi by issuing the following commands,\n\n$ git submodule init \n$ git submodule update --force\n\nThis is required for packaging, tamilsandhichecker, along with 
open-tamil.\n\n\u0b89\u0ba4\u0bbe\u0bb0\u0ba3\u0b99\u0bcd\u0b95\u0bb3\u0bcd (Example\n===================\nOpen-Tamil is a set of Python libraries which can help your application - web, system software, GUI on desktop etc. support Tamil text processing, inputs etc.\n\nOpen-Tamil is still a basic collection of tools - its not complete yet. We have keyboard layouts, converters to change old encoding to UTF-8, N-gram language models, transliterators etc.\n\nExamples for using Python Open-Tamil are found [here](tests/).\n\n\u0b93\u0baa\u0ba9\u0bcd-\u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b8e\u0ba9\u0bcd\u0baa\u0ba4\u0bc1 \u0ba4\u0bca\u0b95\u0bc1\u0b95\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0b9f\u0bcd\u0b9f \u0baa\u0bc8\u0ba4\u0bbe\u0ba9\u0bcd \u0ba8\u0bc2\u0bb2\u0b95\u0bae\u0bbe\u0b95\u0bc1\u0bae\u0bcd, \u0b89\u0b99\u0bcd\u0b95\u0bb3\u0bcd \u0bb5\u0bb2\u0bc8, \u0ba3\u0bbf\u0ba9\u0bbf \u0ba8\u0bbf\u0bb0\u0bb2\u0bcd, \u0bae\u0bc1\u0b95\u0ba4\u0bcd\u0ba4\u0bbf\u0bb0\u0bc8 \u0bb5\u0bb0\u0bc8\u0b95\u0bb2\u0bc8 \u0bae\u0bb1\u0bcd\u0bb1\u0bc1\u0bae\u0bcd \u0baa\u0bb2 \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1\u0bb0\u0bc1 \u0b9a\u0bc6\u0baf\u0bb1\u0bcd\u0baa\u0bbe\u0b9f\u0bc1\u0b95\u0bb3\u0bc1\u0b95\u0bcd\u0b95\u0bc1 \u0bae\u0bbf\u0b95\u0bb5\u0bc1\u0bae\u0bcd \u0b89\u0bb5\u0bbf\u0baf\u0bbe\u0b95 \u0b87\u0bb0\u0bc1\u0b95\u0bcd\u0b95\u0bc1\u0bae\u0bcd.\n\u0b93\u0baa\u0ba9\u0bcd-\u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b8e\u0ba9\u0bcd\u0baa\u0ba4\u0bc1 \u0b85\u0b9f\u0bbf\u0baa\u0bcd\u0baa\u0b9f\u0bc8 \u0ba4\u0bca\u0b95\u0bc1\u0baa\u0bcd\u0baa\u0bc1\u0b95\u0bcd\u0b95\u0bb3\u0bc8 \u0bae\u0b9f\u0bcd\u0b9f\u0bc1\u0bae\u0bc7 \u0b95\u0bca\u0ba3\u0bcd\u0b9f \u0b95\u0bb0\u0bc1\u0bb5\u0bbf\u0b95\u0bb3\u0bbe\u0bc1\u0bae\u0bcd, \u0b87\u0ba4\u0bc1 \u0b87\u0ba9\u0bcd\u0ba9\u0bc1\u0bae\u0bcd \u0bae\u0bc1\u0bb4\u0bc1\u0bae\u0bc8 \u0baa\u0bc6\u0bb1\u0bb5\u0bbf\u0bb2\u0bcd\u0bb2\u0bc8. 
\u0b87\u0ba4\u0bbf\u0bb2\u0bcd UTF-8, \u0b8e\u0ba9\u0bcd-\u0b95\u0bbf\u0bb0\u0bbe\u0bae\u0bcd \u0bae\u0bca\u0bb4\u0bbf \u0bae\u0bbe\u0ba4\u0bbf\u0bb0\u0bbf\u0b95\u0bb3\u0bcd, transliterators \u0bae\u0bc1\u0ba4\u0bb2\u0bbf\u0baf\u0ba9 \u0baa\u0bb4\u0bc8\u0baf \u0bae\u0bc1\u0bb1\u0bc8\u0baf\u0bc8 \u0bae\u0bbe\u0bb1\u0bcd\u0bb1 \u0bb5\u0bbf\u0b9a\u0bc8\u0baa\u0bcd\u0baa\u0bb2\u0b95\u0bc8 \u0b85\u0bae\u0bc8\u0baa\u0bcd\u0baa\u0bc1, \u0bae\u0bbe\u0bb1\u0bcd\u0bb1\u0bbf\u0b95\u0bb3\u0bcd \u0b89\u0bb3\u0bcd\u0bb3\u0ba9. \u0baa\u0bc8\u0ba4\u0bbe\u0ba9\u0bcd \u0b93\u0baa\u0ba9\u0bcd \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0baa\u0baf\u0ba9\u0bcd\u0baa\u0b9f\u0bc1\u0ba4\u0bcd\u0ba4\u0bbf \u0b89\u0ba4\u0bbe\u0bb0\u0ba3\u0b99\u0bcd\u0b95\u0bb3\u0bcd [\u0b87\u0b99\u0bcd\u0b95\u0bc1](tests/) \u0b95\u0bbe\u0ba3\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0b95\u0bbf\u0ba9\u0bcd\u0bb1\u0ba9.\n\n\u0b87\u0bb2\u0b95\u0bcd\u0b95\u0bc1\u0b95\u0bb3\u0bcd (Goals)\n=================\nGoal of this package is to collect and develop open-source licensed Tamil tools, in one location that provide the following,\n\n1. Unicode standard tools for Tamil - provide various tools for Tamil Unicode development. Currently 25 encodes are supported, read about it [here](tamil/txt2unicode/README.md)\n2. Access Unicode Tamil letters, vowels and consonants.\n3. Breakdown Tamil glyphs and unicode code-points into Tamil letter representations - collation\n4. Tools for navigating a corpus of data, build word frequency, prediction tables etc.\n5. Conversion from various encodings. e.g. TSCII to Unicode etc. We hope eventually to converts between the other major Tamil encodings like TAB, TAM, Bamini (*insert-your-favortie-font-encoding*) into Tamil Unicode encoding.\n6. Support all of above in Python3.\n\nWhile most of tools in this package will be in Python 2.6. or later, we are open to other open-source language source code contributions.\n\nContributing to Open-Tamil\n===========================\n1. 
Please add your code, and unit tests under MIT, GNU GPL or ASF licenses.\n2. Update your code into modules, add unit tests following the Python flake8, pylint standards\n3. Please do not mix TABS and SPACES. Use 4-space for Tabs.\n4. Make sure your module installed as part of pip package\n5. Ensure your code works for Python 2 and 3.\n\n\u0baa\u0bb1\u0bcd\u0bb1\u0bbf(About)\n============\nTamil is classical language primarily spoken in South India.\n\u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0bae\u0bc1\u0ba4\u0ba9\u0bcd\u0bae\u0bc8\u0baf\u0bbe\u0b95 \u0ba4\u0bc6\u0ba9\u0bcd \u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe\u0bb5\u0bbf\u0bb2\u0bcd \u0baa\u0bc7\u0b9a\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0bae\u0bcd \u0baa\u0bbe\u0bb0\u0bae\u0bcd\u0baa\u0bb0\u0bbf\u0baf \u0bae\u0bca\u0bb4\u0bbf \u0b86\u0b95\u0bc1\u0bae\u0bcd.",
"bugtrack_url": null,
"license": "MIT",
"summary": "Tamil language text processing tools for Python v3",
"version": "1.1",
"project_urls": {
"Download": "https://github.com/Ezhil-Language-Foundation/open-tamil/archive/master.zip",
"Homepage": "https://github.com/Ezhil-Language-Foundation/open-tamil"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b55663140925e005952f3c7c521b039948108cdbbf782ec205aff85a6a465164",
"md5": "46abc35c93d39313ca873e40b9564343",
"sha256": "339623ef6aefbca8e6d8fff3e2901d437f5d9b774576f1eb6c1540fbbe20ebb8"
},
"downloads": -1,
"filename": "Open-Tamil-1.1.tar.gz",
"has_sig": false,
"md5_digest": "46abc35c93d39313ca873e40b9564343",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 2554397,
"upload_time": "2022-05-27T03:38:24",
"upload_time_iso_8601": "2022-05-27T03:38:24.687287Z",
"url": "https://files.pythonhosted.org/packages/b5/56/63140925e005952f3c7c521b039948108cdbbf782ec205aff85a6a465164/Open-Tamil-1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-05-27 03:38:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Ezhil-Language-Foundation",
"github_project": "open-tamil",
"travis_ci": true,
"coveralls": false,
"github_actions": true,
"lcname": "open-tamil"
}