# ArrowTextClassifier
ArrowTextClassifier is a Python package for text classification. It lets you train a model, summarize a trained model, and classify text using a convolutional neural network (CNN) architecture.
## Installation
You can install ArrowTextClassifier via pip:
```bash
pip install ArrowTextClassifier
```
## How it Works
ArrowTextClassifier implements a convolutional neural network (CNN) architecture for text classification. It tokenizes input text, embeds the tokens, applies convolutional filters over the embedded tokens to extract features, and then classifies the text into predefined categories.
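The four steps above (tokenize, embed, convolve, classify) can be sketched in plain Python. This is only an illustration of the general technique, not the package's actual PyTorch implementation: the whitespace tokenizer, the `<pad>`/`<unk>` entries, the layer sizes, and the random untrained weights are all assumptions made for the example.

```python
import math
import random


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the package's tokenizer may differ.
    return text.lower().split()


def build_vocab(texts):
    # Map every token to an integer id, reserving ids for padding and unknown tokens.
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text, vocab, min_len):
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]
    # Pad short inputs so that at least one convolution window fits.
    ids += [vocab["<pad>"]] * max(0, min_len - len(ids))
    return ids


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(ids, embedding, filters, kernel_size, out_weights):
    # 1. Embed: look up one vector per token id.
    embedded = [embedding[i] for i in ids]
    # 2. Convolve: slide each filter over windows of `kernel_size` tokens,
    #    apply ReLU, and keep the maximum activation (max-over-time pooling).
    features = []
    for filt in filters:
        best = 0.0  # ReLU floor: activations below zero are clipped
        for start in range(len(embedded) - kernel_size + 1):
            window = [x for vec in embedded[start:start + kernel_size] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        features.append(best)
    # 3. Classify: linear layer followed by a numerically stable softmax.
    logits = [sum(w * f for w, f in zip(row, features)) for row in out_weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
labels = ["normal", "toxic"]
vocab = build_vocab(["Hey there!", "Hi!", "You suck!"])
EMBED_DIM, NUM_FILTERS, KERNEL_SIZE = 4, 3, 2

embedding = random_matrix(len(vocab), EMBED_DIM, rng)
filters = random_matrix(NUM_FILTERS, KERNEL_SIZE * EMBED_DIM, rng)
out_weights = random_matrix(len(labels), NUM_FILTERS, rng)

probs = forward(encode("Hey there!", vocab, KERNEL_SIZE),
                embedding, filters, KERNEL_SIZE, out_weights)
print(dict(zip(labels, probs)))
```

Because the weights are random, the printed probabilities are meaningless; training is what adjusts the embedding, filter, and output weights so that the probabilities match the labels.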
## Usage
### Training
To train a text classification model, use the `train_model` method of the `Model` class:
```python
from ArrowTextClassifier import Model

model = Model(name="your_model_name")
# `dataset` is your training data, prepared as described under "How to make a dataset".
model.train_model(dataset)
```
#### How to make a dataset
To train on your own data, create a Parquet file with two columns: `label` (the category) and `example` (the text). Each row is one training example.
*Example Parquet File* (rows shown as JSON for readability)
```json
{"label":"normal","example":"Hey there!"}
{"label":"normal","example":"Hi!"}
{"label":"toxic","example":"You suck!"}
```
After you have created the Parquet file in the format above, provide it as the dataset to start training the model.
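One way to produce such a file is sketched below. It assumes `pandas` and a Parquet engine such as `pyarrow` are installed; the helper names `validate_rows` and `write_parquet` and the output filename are hypothetical and not part of ArrowTextClassifier.

```python
from collections import Counter

# Rows in the same shape as the example above: one label and one text per row.
ROWS = [
    {"label": "normal", "example": "Hey there!"},
    {"label": "normal", "example": "Hi!"},
    {"label": "toxic", "example": "You suck!"},
]


def validate_rows(rows):
    """Check that every row has non-empty string columns and count rows per label."""
    counts = Counter()
    for i, row in enumerate(rows):
        for column in ("label", "example"):
            value = row.get(column)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"row {i}: '{column}' must be a non-empty string")
        counts[row["label"]] += 1
    return counts


def write_parquet(rows, path):
    """Write the rows to a Parquet file with `label` and `example` columns."""
    # Imported here so the validation helper works even without pandas installed.
    import pandas as pd

    frame = pd.DataFrame(rows, columns=["label", "example"])
    frame.to_parquet(path, index=False)


print(validate_rows(ROWS))
```

Calling `write_parquet(ROWS, "dataset.parquet")` then writes the file. Checking the label counts first is a cheap way to spot a badly imbalanced or malformed dataset before training.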
### Summarization
To summarize a trained model, you can use the `summarize` method:
```python
model.summarize(
    model_path="path_to_your_model",
    hyperparams_path="path_to_hyperparameters_file",
    vocabulary_path="path_to_vocabulary_file",
    modelSummary_write_path="path_to_write_model_summary"
)
```
### Classification
To classify text with a trained model, use the `classify` method:
```python
result = model.classify(
    model_path="path_to_your_model",
    hyperparams_path="path_to_hyperparameters_file",
    text="your_input_text",
    vocabulary_path="path_to_vocabulary_file"
)
print(result)
```
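To classify several texts in one go, the call above can be wrapped in a small loop. The helper `classify_many` below is hypothetical and not part of the package; it relies only on the keyword arguments of `classify` shown above and makes no assumption about the type of the returned result.

```python
def classify_many(model, texts, model_path, hyperparams_path, vocabulary_path):
    """Classify each text and pair it with whatever `model.classify` returns."""
    results = []
    for text in texts:
        result = model.classify(
            model_path=model_path,
            hyperparams_path=hyperparams_path,
            text=text,
            vocabulary_path=vocabulary_path,
        )
        results.append((text, result))
    return results
```

For example, `classify_many(model, ["Hi!", "You suck!"], "path_to_your_model", "path_to_hyperparameters_file", "path_to_vocabulary_file")` returns a list of `(text, result)` pairs in input order.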
## Getting Started
This package provides tools for text classification that you can explore and customize to your requirements. Refer to the documentation for detailed usage instructions. We have also made a Colab [notebook](https://colab.research.google.com/drive/1fGDLICkctfdpTgLoh_Bouv-NY-q-kdlQ?usp=sharing) that walks through training a custom offensive-language classifier with this package.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
---
## Contact
For any questions or feedback, please contact technologypower24@gmail.com, or reach me on Discord at techpowerb.
## Raw data
{
"_id": null,
"home_page": "https://github.com/Bhargav230m/ArrowTextClassifier.git",
"name": "ArrowTextClassifier",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "text classification, natural language processing, NLP, PyTorch, machine learning, deep learning, text summarization, preprocessing, data science, artificial intelligence, dataset, discord",
"author": "techpowerb",
"author_email": "technologypower24@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/6c/1a/0010a3aef31d2ce95efdf9d42bc66475060b9c9c5d57887a4446b3b79846/ArrowTextClassifier-1.0.3.tar.gz",
"platform": null,
"description": "# ArrowTextClassifier\r\n\r\nArrowTextClassifier is a Python package for text classification tasks, offering functionalities to train, summarize, and classify text using convolutional neural network (CNN) architecture.\r\n\r\n## Installation\r\n\r\nYou can install ArrowTextClassifier via pip:\r\n\r\n```bash\r\npip install ArrowTextClassifier\r\n```\r\n\r\n## How it Works\r\n\r\nArrowTextClassifier implements a convolutional neural network (CNN) architecture for text classification. It tokenizes input text, embeds the tokens, applies convolutional filters over the embedded tokens to extract features, and then classifies the text into predefined categories.\r\n\r\n## Usage\r\n\r\n### Training\r\n\r\nTo train a text classification model, you can utilize the `train_model` method provided by the `Model` class:\r\n\r\n```python\r\nfrom ArrowTextClassifier import Model\r\n\r\nmodel = Model(name=\"your_model_name\")\r\nmodel.train_model(dataset)\r\n```\r\n\r\n#### How to make a dataset\r\n\r\nTo make your own custom dataset for training you need to create a parquet file with the following format:\r\n\r\n*Example Parquet File*\r\n\r\n```json\r\n{\"label\":\"normal\",\"example\":\"Hey there!\"}\r\n{\"label\":\"normal\",\"example\":\"Hi!\"}\r\n{\"label\":\"toxic\",\"example\":\"You suck!\"}\r\n```\r\n\r\nAfter you have created the parquet file with the data in the format above, you can provide to the dataset to start training the model.\r\n\r\n### Summarization\r\n\r\nTo summarize a trained model, you can use the `summarize` method:\r\n\r\n```python\r\nmodel.summarize(\r\n model_path=\"path_to_your_model\",\r\n hyperparams_path=\"path_to_hyperparameters_file\",\r\n vocabulary_path=\"path_to_vocabulary_file\",\r\n modelSummary_write_path=\"path_to_write_model_summary\"\r\n)\r\n```\r\n\r\n### Classification\r\n\r\nFor classifying text using the trained model:\r\n\r\n```python\r\nresult = model.classify(\r\n model_path=\"path_to_your_model\",\r\n 
hyperparams_path=\"path_to_hyperparameters_file\",\r\n text=\"your_input_text\",\r\n vocabulary_path=\"path_to_vocabulary_file\"\r\n)\r\nprint(result)\r\n```\r\n\r\n## Getting Started\r\n\r\nThis package provides tools for text classification tasks. You can explore and customize it according to your requirements. Refer to the documentation for detailed usage instructions. We have also made our own colab [notebook](https://colab.research.google.com/drive/1fGDLICkctfdpTgLoh_Bouv-NY-q-kdlQ?usp=sharing) to help you train a custom offensive language classifier using this.\r\n\r\n## License\r\n\r\nThis project is licensed under the MIT License - see the LICENSE file for details.\r\n\r\n---\r\n\r\n## Contact\r\n\r\nFor any questions or feedback, please contact technologypower24@gmail.com or you can contact me at discord - techpowerb.\r\n",
"bugtrack_url": null,
"license": null,
"summary": "ArrowTextClassifier is a simple text classification tool written in pytorch that allows you to train, summarize, and use text classification models for various tasks.",
"version": "1.0.3",
"project_urls": {
"Homepage": "https://github.com/Bhargav230m/ArrowTextClassifier.git"
},
"split_keywords": [
"text classification",
" natural language processing",
" nlp",
" pytorch",
" machine learning",
" deep learning",
" text summarization",
" preprocessing",
" data science",
" artificial intelligence",
" dataset",
" discord"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2bd2e6a1111141a1abed2d53209fb18315c298b7754d049173bf12e850f64644",
"md5": "cb8b0d04dff09bf09616d16a6e4a0c5b",
"sha256": "3433b196ff044e80e4c5fc016e9726ae01a133dc9d8fc3b4deecbb083b1f22af"
},
"downloads": -1,
"filename": "ArrowTextClassifier-1.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cb8b0d04dff09bf09616d16a6e4a0c5b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 9941,
"upload_time": "2024-04-20T14:25:37",
"upload_time_iso_8601": "2024-04-20T14:25:37.514166Z",
"url": "https://files.pythonhosted.org/packages/2b/d2/e6a1111141a1abed2d53209fb18315c298b7754d049173bf12e850f64644/ArrowTextClassifier-1.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6c1a0010a3aef31d2ce95efdf9d42bc66475060b9c9c5d57887a4446b3b79846",
"md5": "80c29ad861f574fe7e106975be132599",
"sha256": "d128a1210cc580c66fb0b6e2f98a27b9d117193945d5c6fbc26b53f93d041697"
},
"downloads": -1,
"filename": "ArrowTextClassifier-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "80c29ad861f574fe7e106975be132599",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 7928,
"upload_time": "2024-04-20T14:25:39",
"upload_time_iso_8601": "2024-04-20T14:25:39.284521Z",
"url": "https://files.pythonhosted.org/packages/6c/1a/0010a3aef31d2ce95efdf9d42bc66475060b9c9c5d57887a4446b3b79846/ArrowTextClassifier-1.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-20 14:25:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Bhargav230m",
"github_project": "ArrowTextClassifier",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "arrowtextclassifier"
}