tika-app


:Name: tika-app
:Version: 1.1.1
:Home page: https://github.com/fedelemantuano/tika-app-python
:Summary: Python client for Apache Tika App
:Upload time: 2017-06-25 13:36:23
:Author: Fedele Mantuano
:License: Apache License, Version 2.0
:Keywords: tika, apache, toolkit
:Requirements: chainmap, mail-parser, python-magic, simplejson, six

|PyPI version| |Build Status| |Coverage Status|

tika-app-python
===============

Overview
--------

tika-app-python is a wrapper for `Apache Tika App`_.

Apache 2 Open Source License
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

tika-app-python can be downloaded, used, and modified free of charge. It
is available under the Apache 2 license.

Authors
-------

Main Author
~~~~~~~~~~~

Fedele Mantuano (**Twitter**:
`@fedelemantuano <https://twitter.com/fedelemantuano>`_)

Installation
------------

Clone the repository:

::

    git clone https://github.com/fedelemantuano/tika-app-python.git

and install tika-app-python with ``setup.py``:

::

    cd tika-app-python

    python setup.py install

or use ``pip``:

::

    pip install tika-app

Usage in a project
------------------

Import the ``TikaApp`` class and create a client that points to the
Apache Tika App JAR:

::

    from tikapp import TikaApp

    tika_client = TikaApp(file_jar="/opt/tika/tika-app-1.15.jar")

To get the **content type**:

::

    tika_client.detect_content_type("your_file")

To detect the **language**:

::

    tika_client.detect_language("your_file")

To extract **all metadata and content**:

::

    tika_client.extract_all_content("your_file")

To extract **only the content**:

::

    tika_client.extract_only_content("your_file")

To submit a base64-encoded payload instead of a file path, call the same
methods with the ``payload`` argument:

::

    tika_client.detect_content_type(payload="base64_payload")
    tika_client.detect_language(payload="base64_payload")
    tika_client.extract_all_content(payload="base64_payload")
    tika_client.extract_only_content(payload="base64_payload")
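
The ``payload`` argument expects base64 text, so a file already on disk has to be encoded first. The helper below is a minimal sketch of that step using only the standard library; ``file_to_base64_payload`` is a hypothetical name for illustration and is not part of tika-app-python.

```python
import base64


def file_to_base64_payload(path):
    """Read a file and return its content as a base64 text string.

    Hypothetical helper for illustration; not part of tika-app-python.
    """
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read()).decode("ascii")


# Usage, assuming tika-app is installed and the JAR exists at this path:
#
#     from tikapp import TikaApp
#     tika_client = TikaApp(file_jar="/opt/tika/tika-app-1.15.jar")
#     payload = file_to_base64_payload("your_file")
#     tika_client.detect_content_type(payload=payload)
```

The result is decoded to ASCII text so that it is a plain string on both Python 2 and Python 3.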

Usage from command-line
-----------------------

If you installed tika-app-python with ``pip`` or ``setup.py``, you can
use it from the command line. The tool needs the path to the Apache Tika
App JAR. You can:

- leave the default value, ``/opt/tika/tika-app-1.15.jar``
- set the ``TIKA_APP_JAR`` environment variable
- use the ``--jar`` switch

The last one overrides all the others.
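
The precedence described above can be sketched in a few lines. This is an illustration under assumptions, not the package's actual implementation: ``resolve_tika_jar`` and ``DEFAULT_TIKA_JAR`` are hypothetical names, and the sketch assumes the environment variable takes priority over the default value.

```python
import os

# Default location quoted in this README.
DEFAULT_TIKA_JAR = "/opt/tika/tika-app-1.15.jar"


def resolve_tika_jar(cli_jar=None, environ=None):
    """Pick the JAR path: --jar value, then TIKA_APP_JAR, then the default.

    Hypothetical helper for illustration; not part of tika-app-python.
    """
    environ = os.environ if environ is None else environ
    if cli_jar:
        # A value passed with --jar overrides everything else.
        return cli_jar
    # Fall back to the environment variable, then to the default path.
    return environ.get("TIKA_APP_JAR") or DEFAULT_TIKA_JAR
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.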

These are all the switches:

::

    usage: tikapp [-h] (-f FILE | -p PAYLOAD) [-j JAR] [-d] [-t] [-l] [-a]
                       [-v]

    Wrapper for Apache Tika App.

    optional arguments:
      -h, --help            show this help message and exit
      -f FILE, --file FILE  File to submit (default: None)
      -p PAYLOAD, --payload PAYLOAD
                            Base64 payload to submit (default: None)
      -j JAR, --jar JAR     Apache Tika app JAR (default: None)
      -d, --detect          Detect document type (default: False)
      -t, --text            Output plain text content (default: False)
      -l, --language        Output only language (default: False)
      -a, --all             Output metadata and content from all embedded files
                            (default: False)
      -v, --version         show program's version number and exit

Example:

::

    $ tikapp -f example_file -a
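
For readers who want to see how such an interface fits together, the usage text above can be approximated with ``argparse``. This is a reconstruction from the help output only, not the package's source; ``build_parser`` is a hypothetical name, and the version string is taken from the package metadata.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented tikapp switches.

    Reconstructed from the help text for illustration; not the real source.
    """
    parser = argparse.ArgumentParser(
        prog="tikapp",
        description="Wrapper for Apache Tika App.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # Exactly one input source is required: a file or a base64 payload.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--file", help="File to submit")
    source.add_argument("-p", "--payload", help="Base64 payload to submit")

    parser.add_argument("-j", "--jar", help="Apache Tika app JAR")
    parser.add_argument("-d", "--detect", action="store_true",
                        help="Detect document type")
    parser.add_argument("-t", "--text", action="store_true",
                        help="Output plain text content")
    parser.add_argument("-l", "--language", action="store_true",
                        help="Output only language")
    parser.add_argument("-a", "--all", action="store_true",
                        help="Output metadata and content from all "
                             "embedded files")
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 1.1.1")
    return parser


# Parse the same arguments as the example command line.
args = build_parser().parse_args(["-f", "example_file", "-a"])
```

The mutually exclusive group is what produces the ``(-f FILE | -p PAYLOAD)`` part of the usage line: giving both switches, or neither, makes the parser exit with an error.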


Performance tests
-----------------

These are the results of the performance tests in the `tests`_ folder:

::

    (Python 2)
    tika_content_type()             0.704840 sec
    tika_detect_language()          1.592066 sec
    magic_content_type()            0.000215 sec
    tika_extract_all_content()      0.816366 sec
    tika_extract_only_content()     0.788667 sec

    (Python 3)
    tika_content_type()             0.698357 sec
    tika_detect_language()          1.593452 sec
    magic_content_type()            0.000226 sec
    tika_extract_all_content()      0.785915 sec
    tika_extract_only_content()     0.766517 sec
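
Timings in this format can be collected with the standard ``timeit`` module. The sketch below is only an assumption about how such a measurement might look; it is not the script from the tests folder, and ``time_call`` and ``format_result`` are hypothetical names.

```python
import timeit


def time_call(func, number=1):
    """Return the average wall-clock seconds per call of ``func``."""
    return timeit.timeit(func, number=number) / number


def format_result(name, seconds):
    """Format one result line in the same layout as the table above."""
    return "{:<32}{:.6f} sec".format(name + "()", seconds)


# Usage, assuming a configured client named tika_client:
#
#     seconds = time_call(lambda: tika_client.detect_language("your_file"))
#     print(format_result("tika_detect_language", seconds))
```

Each Tika call starts a Java process, which may explain why the Tika timings are orders of magnitude above ``magic_content_type()``.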

.. _tests: https://github.com/fedelemantuano/tika-app-python/tree/develop/tests
.. _Apache Tika App: https://tika.apache.org/

.. |PyPI version| image:: https://badge.fury.io/py/tika-app.svg
   :target: https://badge.fury.io/py/tika-app
.. |Build Status| image:: https://travis-ci.org/fedelemantuano/tika-app-python.svg?branch=master
   :target: https://travis-ci.org/fedelemantuano/tika-app-python
.. |Coverage Status| image:: https://coveralls.io/repos/github/fedelemantuano/tika-app-python/badge.svg?branch=master
   :target: https://coveralls.io/github/fedelemantuano/tika-app-python?branch=master
            
