atap-corpus


| Field | Value |
| --- | --- |
| Name | atap-corpus |
| Version | 0.1.15 |
| home_page | https://github.com/Australian-Text-Analytics-Platform/atap_corpus |
| Summary | Corpus mini-framework allowing for memory-efficient slicing and provides a standardised base corpus structure for the collection of ATAP tools. |
| upload_time | 2024-09-24 07:08:12 |
| maintainer | None |
| docs_url | None |
| author | Jack Chan |
| requires_python | `<3.13,>=3.10` |
| license | MIT |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

<a href="https://atap.edu.au"><img src="https://www.atap.edu.au/atap-logo.png" width="125" height="50" align="right"></a>  
# ATAP Corpus

Provides a standardised base Corpus structure for ATAP tools.

A Corpus can be sliced into subcorpora based on different criteria, and slicing always returns an instance of a
BaseCorpus subclass.
The slicing criterion is flexible: it accepts a user-defined function, and convenience slicing operations are
layered on top of it out of the box.
Internally, each subcorpus maintains a parent-child relationship with its original corpus in a tree.
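
The idea can be illustrated with a minimal, self-contained sketch. Note that `SketchCorpus`, `slice_by`, `slice_containing` and `root` are hypothetical names invented for this illustration and are not the atap_corpus API; the sketch only assumes that a subcorpus can share its parent's documents and store the positions it keeps.

```python
from __future__ import annotations

from typing import Callable, Optional


class SketchCorpus:
    """Hypothetical stand-in for a BaseCorpus subclass (not the atap_corpus API)."""

    def __init__(self, docs: list[str], indices: Optional[list[int]] = None,
                 parent: Optional[SketchCorpus] = None) -> None:
        # Every corpus in one tree shares the same list object; a subcorpus
        # only stores the positions of the documents it contains.
        self._docs = docs
        self._indices = list(range(len(docs))) if indices is None else indices
        self.parent = parent
        self.children: list[SketchCorpus] = []

    def __len__(self) -> int:
        return len(self._indices)

    def docs(self) -> list[str]:
        return [self._docs[i] for i in self._indices]

    def slice_by(self, condition: Callable[[str], bool]) -> SketchCorpus:
        """Slice with a user-defined function; the result has the caller's class."""
        kept = [i for i in self._indices if condition(self._docs[i])]
        child = type(self)(self._docs, kept, parent=self)
        self.children.append(child)  # record the parent-child link in the tree
        return child

    def slice_containing(self, term: str) -> SketchCorpus:
        """Convenience slice layered on top of slice_by."""
        return self.slice_by(lambda doc: term.lower() in doc.lower())

    def root(self) -> SketchCorpus:
        """Walk up the tree to the original corpus."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


corpus = SketchCorpus(["Text analytics", "Corpus slicing", "Slicing text"])
texty = corpus.slice_containing("text")             # convenience slice
short = texty.slice_by(lambda doc: len(doc) < 13)   # user-defined function
```

Here `short` is a grandchild of `corpus`: it holds a single index into the shared document list, and `short.root()` walks back up to the original corpus.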

A Corpus can also be serialised and deserialised, so it can be carried across different ATAP analytics notebooks.
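
As a rough sketch of what such a round trip looks like, the example below writes a toy corpus to a JSON file and reads it back, as one notebook might save a corpus for another to load. The class `SketchCorpus` and its `serialise` and `deserialise` methods are assumptions made for illustration; this README does not describe the actual method names or on-disk format used by atap_corpus.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SketchCorpus:
    """Hypothetical corpus used only for illustration (not the atap_corpus API)."""
    name: str
    docs: list[str] = field(default_factory=list)

    def serialise(self, path: Path) -> Path:
        # Write the corpus fields to a JSON file and return the file's path.
        path.write_text(json.dumps(asdict(self)), encoding="utf-8")
        return path

    @classmethod
    def deserialise(cls, path: Path) -> "SketchCorpus":
        # Rebuild the corpus from the JSON file written by serialise().
        return cls(**json.loads(path.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    # "Notebook A" saves the corpus; "notebook B" loads it from the same file.
    saved = SketchCorpus("demo", ["first doc", "second doc"]).serialise(Path(tmp) / "demo.json")
    restored = SketchCorpus.deserialise(saved)
```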

```shell
pip install atap_corpus
```

[//]: # (### Extras: Viz:)

[//]: # ()
[//]: # (Out of the box, Corpus also comes with simple and quick visualisations such as word clouds, timelines etc.)

[//]: # ()
[//]: # (```shell)

[//]: # (pip install atap_corpus[viz])

[//]: # (```)

## Tests

To run all the unit tests, execute the provided script:

```shell
./scripts/run_tests.sh
```

This repo originated from Juxtorpus as a decoupling effort.
The Juxtorpus repo is available [here](https://github.com/Sydney-Informatics-Hub/juxtorpus).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/Australian-Text-Analytics-Platform/atap_corpus",
    "name": "atap-corpus",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Jack Chan",
    "author_email": "huen.chan@sydney.edu.au",
    "download_url": "https://files.pythonhosted.org/packages/3d/90/c14bd27250e7c9c44e06ac0083c4096310f2d7c63fb5f2364fa262ff41e4/atap_corpus-0.1.15.tar.gz",
    "platform": null,
    "description": "<a href=\"https://atap.edu.au\"><img src=\"https://www.atap.edu.au/atap-logo.png\" width=\"125\" height=\"50\" align=\"right\"></a>  \n# ATAP Corpus\n\nProvides a standardised base Corpus structure for ATAP tools.\n\nDifferent Corpus can be sliced into subcorpus based on different criterias and will always return an subclass\ninstance of BaseCorpus.\nThe slicing criteria is flexible, it accepts a user defined function and comes with convenience slicing\noperations layered on top of it out-of-the-box.\nSubcorpus maintains a parent-child relationship with original corpus in a tree internally.\n\nCorpus can also be serialised and deserialised which can be used to carry across different ATAP analytics notebooks.\n\n```shell\npip install atap_corpus\n```\n\n[//]: # (### Extras: Viz:)\n\n[//]: # ()\n[//]: # (Out of the box, Corpus also comes with simple and quick visualisations such as word clouds, timelines etc.)\n\n[//]: # ()\n[//]: # (```shell)\n\n[//]: # (pip install atap_corpus[viz])\n\n[//]: # (```)\n\n## Tests\n\nTo run all the unit tests, there is a script you can execute.\n\n```shell\n./scripts/run_tests.sh\n```\n\nThis repo originated from Juxtorpus and is a decoupling effort.\nJuxtorpus repo may be accessed [here](https://github.com/Sydney-Informatics-Hub/juxtorpus).\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Corpus mini-framework allowing for memory-efficient slicing and provides a standardised base corpus structure for the collection of ATAP tools. ",
    "version": "0.1.15",
    "project_urls": {
        "Homepage": "https://github.com/Australian-Text-Analytics-Platform/atap_corpus",
        "Repository": "https://github.com/Australian-Text-Analytics-Platform/atap_corpus"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6790baa6c9391f44db3a23a0f66be2942b4415d82ca43ce4ec51d0e702105791",
                "md5": "508cf0a6b6758ceea19b619e9e10c5ca",
                "sha256": "dc7a06e315dbe7ef39992bc8d310874a175c7860e9e9b9dd184cc353d84a89d8"
            },
            "downloads": -1,
            "filename": "atap_corpus-0.1.15-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "508cf0a6b6758ceea19b619e9e10c5ca",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 30837,
            "upload_time": "2024-09-24T07:08:11",
            "upload_time_iso_8601": "2024-09-24T07:08:11.143467Z",
            "url": "https://files.pythonhosted.org/packages/67/90/baa6c9391f44db3a23a0f66be2942b4415d82ca43ce4ec51d0e702105791/atap_corpus-0.1.15-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3d90c14bd27250e7c9c44e06ac0083c4096310f2d7c63fb5f2364fa262ff41e4",
                "md5": "570610343c3a54bdd09a2710eedf760d",
                "sha256": "cd7a15177c4636899f90fed708de7d5280a89839258101b2ab3954e03580015e"
            },
            "downloads": -1,
            "filename": "atap_corpus-0.1.15.tar.gz",
            "has_sig": false,
            "md5_digest": "570610343c3a54bdd09a2710eedf760d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 25526,
            "upload_time": "2024-09-24T07:08:12",
            "upload_time_iso_8601": "2024-09-24T07:08:12.508455Z",
            "url": "https://files.pythonhosted.org/packages/3d/90/c14bd27250e7c9c44e06ac0083c4096310f2d7c63fb5f2364fa262ff41e4/atap_corpus-0.1.15.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-24 07:08:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Australian-Text-Analytics-Platform",
    "github_project": "atap_corpus",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "atap-corpus"
}
        