goslate


:Name: goslate
:Version: 1.5.4
:Home page: https://pypi.python.org/pypi/goslate
:Summary: Goslate: Free Google Translate API
:Upload time: 2022-06-13 07:44:32
:Docs URL: https://pythonhosted.org/goslate/
:Author: ZHUO Qiang
:License: MIT
:Keywords: google translation i18n l10n
:Requirements: No requirements were recorded.

Goslate: Free Google Translate API
##################################################

.. note::
   Google has recently updated its translation service with a ticket mechanism that prevents simple crawler programs like ``goslate`` from accessing it.
   Though a more sophisticated crawler might still work technically, it would cross the fine line between using the service and breaking the service.
   ``goslate`` will not be updated to break Google's ticket mechanism. The free lunch is over. Thanks for using it.

.. contents:: :local:

``goslate`` provides a *free* Python API to the Google translation service by querying the Google Translate website.

It is:

- **Free**: gets translations through the public Google website at no charge
- **Fast**: batches, caches and fetches concurrently
- **Simple**: a single-file module, just ``Goslate().translate('Hi!', 'zh')``


Simple Usage
==============

The basic usage is simple:

.. sourcecode:: python

 >>> import goslate
 >>> gs = goslate.Goslate()
 >>> print(gs.translate('hello world', 'de'))
 hallo welt

 
Installation
===============

goslate supports both Python 2 and Python 3. You can install it via:


.. sourcecode:: bash
  
  $ pip install goslate

 
or just download the `latest goslate.py <https://bitbucket.org/zhuoqiang/goslate/raw/tip/goslate.py>`_ and use it directly.

The ``futures`` `package <https://pypi.python.org/pypi/futures>`_ is optional but recommended for best performance on large translation tasks.

 
Proxy Support
===============

Proxy support can be added as follows:

.. sourcecode:: python

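 try:
     import urllib2                      # Python 2
 except ImportError:
     import urllib.request as urllib2    # Python 3
 import goslate

 # Pass the ProxyHandler itself to build_opener; HTTPHandler and
 # HTTPSHandler do not accept a handler as their argument.
 proxy_handler = urllib2.ProxyHandler({"http" : "http://proxy-domain.name:8080"})
 proxy_opener = urllib2.build_opener(proxy_handler)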
                                     
 gs_with_proxy = goslate.Goslate(opener=proxy_opener)
 translation = gs_with_proxy.translate("hello world", "de")
 
 
Romanization
====================

Romanization or latinization (or romanisation, latinisation), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so.

For example, pinyin is the default romanization method for the Chinese language.

You can get the translation in romanized writing as follows:

.. sourcecode:: python

 >>> import goslate
 >>> roman_gs = goslate.Goslate(writing=goslate.WRITING_ROMAN)
 >>> print(roman_gs.translate('China', 'zh'))
 Zhōngguó
  

You can also get the translation in both the native writing system and the roman writing system:

.. sourcecode:: python

 >>> import goslate                
 >>> gs = goslate.Goslate(writing=goslate.WRITING_NATIVE_AND_ROMAN)
 >>> gs.translate('China', 'zh')
 ('中国', 'Zhōngguó')

 
As you can see, the result is a tuple in this case: ``(Translation-in-Native-Writing, Translation-in-Roman-Writing)``

Language Detection
====================

Sometimes all you need is to find out which language a text is written in:

.. sourcecode:: python

 >>> import goslate
 >>> gs = goslate.Goslate()
 >>> language_id = gs.detect('hallo welt')
 >>> language_id
 'de'
 >>> gs.get_languages()[language_id]
 'German'


Concurrent Querying 
====================

You do not need to roll your own multithreaded solution to speed up large translation jobs; goslate already does it for you, using ``concurrent.futures`` for concurrent querying. The maximum number of workers is 120 by default.

The number of workers can be changed as follows:

.. sourcecode:: python

 >>> import goslate
 >>> import concurrent.futures
 >>> executor = concurrent.futures.ThreadPoolExecutor(max_workers=200)
 >>> gs = goslate.Goslate(executor=executor)
 >>> it = gs.translate(['text1', 'text2', 'text3'], 'de')
 >>> list(it)
 ['translation1', 'translation2', 'translation3']

 
On Python 2.7 it is advisable to install the ``concurrent.futures`` backport library to enable concurrent querying (Python 3 includes it by default).
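
To illustrate the idea behind concurrent querying without depending on goslate or the network, here is a minimal sketch that fans a list of texts out over a ``ThreadPoolExecutor`` and collects the results in input order. ``fake_translate`` and ``translate_all`` are hypothetical names used only for this illustration; this is not goslate's actual implementation.

.. sourcecode:: python

 import concurrent.futures
 import time


 def fake_translate(text):
     """Hypothetical stand-in for one network round trip."""
     time.sleep(0.05)  # simulate network latency
     return text.upper()


 def translate_all(texts, max_workers=8):
     """Run fake_translate on every text concurrently, keeping input order."""
     with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
         # Executor.map yields results in the order of the inputs,
         # regardless of which worker finishes first.
         return list(pool.map(fake_translate, texts))


 results = translate_all(['text1', 'text2', 'text3'])
 print(results)

Because the work is dominated by waiting on I/O, threads are sufficient here even under the GIL.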

The input can be a list, a tuple or any iterator, even a file object, which iterates line by line:

.. sourcecode:: python

 >>> translated_lines = gs.translate(open('readme.txt'), 'de')
 >>> translation = '\n'.join(translated_lines)

 
Do not worry that many short texts will increase the query time. Internally, goslate joins small texts into one big text to avoid unnecessary query round trips.
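
The joining step can be sketched as a greedy packer that concatenates consecutive short texts until a size limit is reached. This is only an illustration of the idea: ``pack_texts``, the ``max_chars`` limit of 2000 and the newline separator are assumptions, not goslate's real code or Google's real limit.

.. sourcecode:: python

 def pack_texts(texts, max_chars=2000, sep='\n'):
     """Greedily join consecutive texts into batches of at most max_chars."""
     batches, current, size = [], [], 0
     for text in texts:
         # Cost of adding this text: its length plus a separator if needed.
         extra = len(text) + (len(sep) if current else 0)
         if current and size + extra > max_chars:
             # The batch would overflow: flush it and start a new one.
             batches.append(sep.join(current))
             current, size = [], 0
             extra = len(text)
         current.append(text)
         size += extra
     if current:
         batches.append(sep.join(current))
     return batches


 print(pack_texts(['aa', 'bb', 'cc'], max_chars=5))

A single text longer than ``max_chars`` still ends up in a batch of its own, so a real implementation would combine this with splitting. Texts that already contain the separator would also need a different delimiter to be split apart again after translation.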
 
 
Batch Translation
====================

Google translation does not support very long texts. Goslate works around this limitation by splitting a long text internally before sending it to Google, then joining the partial results into a single translation for the end user.

.. sourcecode:: python

 >>> import goslate
 >>> with open('the game of thrones.txt', 'r') as f:
 ...     novel_text = f.read()
 ...
 >>> gs = goslate.Goslate()
 >>> translation = gs.translate(novel_text, 'de')
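
The splitting step can be pictured with the following minimal sketch, which cuts a long text into chunks no longer than a limit and prefers to cut at line breaks. ``split_text`` and the ``max_chars`` value are hypothetical; goslate's internal splitting rules and Google's actual length limit may differ.

.. sourcecode:: python

 def split_text(text, max_chars=2000):
     """Split text into chunks of at most max_chars, preferring line breaks."""
     chunks = []
     while len(text) > max_chars:
         # Look for the last newline inside the allowed window.
         cut = text.rfind('\n', 0, max_chars)
         if cut <= 0:
             cut = max_chars  # no usable newline: fall back to a hard cut
         else:
             cut += 1  # keep the newline with the preceding chunk
         chunks.append(text[:cut])
         text = text[cut:]
     if text:
         chunks.append(text)
     return chunks


 chunks = split_text('ab\ncd\nef', max_chars=4)
 print(chunks)

Concatenating the chunks reproduces the original text exactly, which makes it straightforward to join the translated pieces back together in order.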


Performance Consideration
================================

Internally, goslate uses batching and concurrent fetching aggressively to maximize translation speed.

All you need to do is reduce the number of API calls by making use of batch translation and concurrent querying.

For example, suppose you want to translate 3 big text files. Instead of manually translating them one by one, line by line:

.. sourcecode:: python

 import goslate
 
 big_files = ['a.txt', 'b.txt', 'c.txt']
 gs = goslate.Goslate()
 
 translation = []
 for big_file in big_files:
     with open(big_file, 'r') as f:
         translated_lines = []
         for line in f:
             translated_line = gs.translate(line, 'de')
             translated_lines.append(translated_line)
     
         translation.append('\n'.join(translated_lines))
 
         
It is better to leave the whole job to goslate. The following code is not only simpler but also much faster (+100x):

.. sourcecode:: python

 import goslate
 
 big_files = ['a.txt', 'b.txt', 'c.txt']
 gs = goslate.Goslate()
 
 # A generator expression needs its own parentheses when it is not the only argument.
 translation_iter = gs.translate((open(big_file, 'r').read() for big_file in big_files), 'de')
 translation = list(translation_iter)
 
 
Internally, goslate first adjusts the texts so that they are neither too big to fit the Google query API nor so small that they increase the total number of HTTP queries. It then uses concurrent queries to speed things up even further.
 

Lookup Details in Dictionary
================================

If you want a detailed dictionary explanation for a single word/phrase, you can use ``lookup_dictionary()``:

.. sourcecode:: python

 >>> import goslate
 >>> gs = goslate.Goslate()
 >>> gs.lookup_dictionary('sun', 'de')
 [[['Sonne', 'sun', 0]],
  [['noun',
    ['Sonne'],
    [['Sonne', ['sun', 'Sun', 'Sol'], 0.44374731, 'die']],
    'sun',
    1],
   ['verb',
    ['der Sonne aussetzen'],
    [['der Sonne aussetzen', ['sun'], 1.1544633e-06]],
    'sun',
    2]],
  'en',
  0.9447732,
  [['en'], [0.9447732]]]

This API has 2 limitations:

* The result is a complex list structure which you have to parse yourself

* The input must be a single word/phrase; batch translation and concurrent querying are not supported
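
As a starting point for parsing, the sketch below pulls a few fields out of the sample result shown above. The meaning assigned to each position (main translation, source language, part-of-speech entries) is inferred from that one sample and is an assumption; ``summarize`` is a hypothetical helper, not part of goslate.

.. sourcecode:: python

 # Sample result copied from the lookup_dictionary('sun', 'de') call above.
 result = [[['Sonne', 'sun', 0]],
           [['noun', ['Sonne'],
             [['Sonne', ['sun', 'Sun', 'Sol'], 0.44374731, 'die']],
             'sun', 1],
            ['verb', ['der Sonne aussetzen'],
             [['der Sonne aussetzen', ['sun'], 1.1544633e-06]],
             'sun', 2]],
           'en', 0.9447732, [['en'], [0.9447732]]]


 def summarize(result):
     """Return (main translation, source language, {part of speech: translations})."""
     main_translation = result[0][0][0]
     source_language = result[2]  # assumed to be the detected source language
     by_part_of_speech = {entry[0]: entry[1] for entry in result[1]}
     return main_translation, source_language, by_part_of_speech


 main, source, by_pos = summarize(result)
 print(main, source, by_pos)

Since the structure is undocumented, real code should guard each index access against results with a different shape.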


Query Error
==================

If you get an HTTP 5xx error, it is probably because Google has banned your client IP address from making translation queries.

You can verify this by accessing the Google translation service manually in a browser.

You can try the following to work around the issue:

* query through an HTTP/SOCKS5 proxy, see `Proxy Support`_

* use another Google domain for translation: ``gs = Goslate(service_urls=['http://translate.google.de'])``

* wait for 3 seconds before issuing another query
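
The last two suggestions can be combined into a small retry wrapper, sketched here under stated assumptions: ``call_with_retry`` is a hypothetical helper, ``func`` stands for any callable that performs one query against a given service URL, and failures are assumed to surface as ``IOError`` (which ``urllib`` HTTP and URL errors derive from). It is not part of goslate.

.. sourcecode:: python

 import itertools
 import time


 def call_with_retry(func, service_urls, retries=3, wait_seconds=3.0):
     """Call func(url), rotating through service_urls and pausing after failures."""
     if retries < 1:
         raise ValueError('retries must be at least 1')
     urls = itertools.cycle(service_urls)
     last_error = None
     for attempt in range(retries):
         url = next(urls)
         try:
             return func(url)
         except IOError as error:
             last_error = error
             if attempt < retries - 1:
                 time.sleep(wait_seconds)  # back off before the next attempt
     raise last_error


 # Usage with a stand-in query function that fails twice before succeeding.
 calls = []

 def flaky_query(url):
     calls.append(url)
     if len(calls) < 3:
         raise IOError('HTTP 503')
     return 'ok via ' + url

 outcome = call_with_retry(flaky_query,
                           ['http://translate.google.com', 'http://translate.google.de'],
                           wait_seconds=0)
 print(outcome)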
  
  
API References 
================================

Please see the `API reference <http://pythonhosted.org/goslate/#module-goslate>`_.
 

Command Line Interface
==============================

``goslate.py`` is also a command line tool which you can use directly:
    
- Translate ``stdin`` input into Chinese in GBK encoding

  .. sourcecode:: bash
  
     $ echo "hello world" | goslate.py -t zh-CN -o gbk

- Translate 2 text files into Chinese, output to UTF-8 file

  .. sourcecode:: bash
  
     $ goslate.py -t zh-CN -o utf-8 source/1.txt "source 2.txt" > output.txt

     
Use ``--help`` for detailed usage:
     
.. sourcecode:: bash
  
   $ goslate.py -h
     
     
How to Contribute
==================

- Report `issues & suggestions <https://bitbucket.org/zhuoqiang/goslate/issues>`_
- Fork `repository <https://bitbucket.org/zhuoqiang/goslate>`_
- `Donation <http://pythonhosted.org/goslate/#donate>`_

What's New
============

1.5.4
----------

* handle deprecated ``threading.currentThread()`` properly
* add ``retry_wait_duration`` param to fine-tune the retry behavior in case of connection errors


1.5.2
----------

* [fix bug] removes newlines from descriptions to avoid installation failure


1.5.0
----------

* Add new API ``Goslate.lookup_dictionary()`` to get detailed information for a single word/phrase, thanks to Adam's suggestion
  
* Improve documentation with more user scenarios and performance considerations


1.4.0
----------

* [fix bug] update to adapt to the latest Google translation service changes


1.3.2
----------

* [fix bug] fix compatibility issue with the latest Google translation service JSON format changes

* [fix bug] unit test failure



1.3.0
---------

* [new feature] Translation in the roman writing system (romanization), thanks to Javier del Alamo's contribution.
  
* [new feature] Customizable service URL. You can provide multiple Google translation service URLs for better concurrency performance

* [new option] roman writing translation option for CLI
  
* [fix bug] Google translation may change normal space to no-break space

* [fix bug] Google web API changed for getting supported language list




            

        