# word_family_counter
A Python script for counting word families in a text file using advanced morphological analysis with spaCy.
## Features
- Processes text files to count word families
- Uses spaCy for advanced linguistic analysis and lemmatization
- Handles contractions, compound words, and various text preprocessing tasks
- Supports multiple languages (depending on available spaCy models)
- Provides detailed output with word family frequencies
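The core idea behind the features above can be sketched as "map every word to a base form, then count the base forms". The snippet below is a minimal, hypothetical illustration and not the project's actual code. The `count_word_families` function and the toy lemma table are assumptions. The lemmatizer is passed in as a plain function so the sketch runs without spaCy. Note that lemma-based grouping merges inflections (`ran`, `runs`, `running` → `run`) but not derivations (`happy` and `happiness` keep different lemmas).

```python
from collections import Counter
from typing import Callable, Iterable


def count_word_families(
    tokens: Iterable[str],
    lemmatize: Callable[[str], str],
) -> Counter:
    """Group tokens into families by their lemma and count each family.

    `lemmatize` is a hypothetical hook: any function mapping a lowercase
    word to its base form (for example, a wrapper around spaCy lemmas).
    """
    families = Counter()
    for token in tokens:
        word = token.lower()
        if word.isalpha():  # skip punctuation and numbers
            families[lemmatize(word)] += 1
    return families


# Toy lemma table for illustration only; a real run would rely on spaCy.
TOY_LEMMAS = {"running": "run", "ran": "run", "runs": "run", "mice": "mouse"}

counts = count_word_families(
    ["Running", "ran", "runs", "mice", "mouse", "!"],
    lambda w: TOY_LEMMAS.get(w, w),
)
print(counts)
```

With spaCy, the tokens and lemmas would come from a parsed document instead, e.g. `doc = spacy.load("en_core_web_sm")(text)` followed by `Counter(t.lemma_.lower() for t in doc if t.is_alpha)`.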
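## Installation
Requires Python 3.8 or later. The package is also published on PyPI as `word-family-counter` (`pip install word-family-counter`). Whichever way you install it, you must download a spaCy language model separately (step 4).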
1. Clone the repository:
   ```
   git clone https://github.com/BlueBirdBack/word_family_counter.git
   cd word_family_counter
   ```
2. Create a virtual environment (optional but recommended):
   ```
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```
3. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```
4. Download the spaCy language model:
   ```
   python -m spacy download en_core_web_sm
   ```
## Usage
Run the command with a text file as an argument:
```
word_family_counter path/to/your/text_file.txt
```
Optional arguments:
- `--verbose`: Increase output verbosity for debugging purposes
- `--language`: Specify the spaCy model to use (default: `en_core_web_sm`)

Example:
```
word_family_counter sample.txt --verbose --language en_core_web_md
```
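For readers who want to script around the command, the options listed above map naturally onto Python's standard `argparse` module. This is a hypothetical reconstruction of the documented interface only; the real `word_family_counter` entry point may define its arguments differently, and the `build_parser` name and the positional `file` argument name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI, not the project's code.
    parser = argparse.ArgumentParser(prog="word_family_counter")
    parser.add_argument("file", help="path to the input text file")
    parser.add_argument(
        "--verbose", action="store_true", help="print extra debugging output"
    )
    parser.add_argument(
        "--language",
        default="en_core_web_sm",
        help="name of the spaCy model to load",
    )
    return parser


# Mirrors the example invocation shown above.
args = build_parser().parse_args(
    ["sample.txt", "--verbose", "--language", "en_core_web_md"]
)
print(args.file, args.verbose, args.language)
```

Note that `--language` takes a spaCy model package name rather than a bare language code such as `en`.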
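Note: The model passed to `--language` must be installed before you run the command. Step 4 downloads only `en_core_web_sm`; to use another model, such as `en_core_web_md` in the example above, download it first with `python -m spacy download en_core_web_md`. If you see an error about a missing model, run the download command for that model.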
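To check for a missing model before a run, a minimal sketch like the one below can help. It assumes the model was installed as a regular Python package, which is what `python -m spacy download` does; models loaded from a directory path would not be detected. The `spacy_model_installed` helper is hypothetical and not part of this project.

```python
import importlib.util


def spacy_model_installed(model_name: str) -> bool:
    """Return True if `model_name` is importable as a Python package.

    spaCy model packages such as `en_core_web_sm` are ordinary pip
    packages, so an import-spec lookup finds them without loading them.
    """
    return importlib.util.find_spec(model_name) is not None


for name in ("en_core_web_sm", "en_core_web_md"):
    if not spacy_model_installed(name):
        print(f"Missing model; install it with: python -m spacy download {name}")
```

If spaCy itself is available, `spacy.util.is_package(name)` performs a similar check.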
## Output
The script will display:
1. Total number of words in the text
2. Total number of unique word families
3. A list of word families sorted by frequency (descending) and then alphabetically
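The ordering in item 3 (frequency descending, then alphabetical) corresponds to sorting on the key `(-count, family)`. The sketch below is a hypothetical illustration of that report, not the script's actual output format. The `rank_families` and `print_report` names are assumptions, and computing "total words" as the sum of family counts assumes every counted word belongs to exactly one family.

```python
from collections import Counter
from typing import List, Tuple


def rank_families(counts: Counter) -> List[Tuple[str, int]]:
    # Highest frequency first; ties broken alphabetically by family name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def print_report(counts: Counter) -> None:
    print(f"Total words: {sum(counts.values())}")
    print(f"Unique word families: {len(counts)}")
    for family, freq in rank_families(counts):
        print(f"{family}: {freq}")


counts = Counter({"run": 3, "mouse": 2, "be": 3, "cat": 2})
print_report(counts)
```

Negating the count lets a single ascending sort handle both criteria, so `be` and `run` (3 each) come before `cat` and `mouse` (2 each), with each tie resolved alphabetically.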
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Contact
BlueBirdBack - avery@bluebirdback.com

Project Link: [https://github.com/BlueBirdBack/word_family_counter](https://github.com/BlueBirdBack/word_family_counter)