# word_family_counter

- Name: word-family-counter
- Version: 0.1.1
- Summary: A script for counting word families in a text file.
- Upload time: 2024-09-26 09:37:45
- Requires Python: >=3.8
- License: MIT License
- Keywords: linguistics, nlp, text analysis, word families
- Requirements: spacy, contractions, psutil

A Python script for counting word families in a text file using advanced morphological analysis with spaCy.

## Features

- Processes text files to count word families
- Uses spaCy for advanced linguistic analysis and lemmatization
- Handles contractions, compound words, and various text preprocessing tasks
- Supports multiple languages (depending on available spaCy models)
- Provides detailed output with word family frequencies
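
As a rough illustration of the pipeline these features describe, the sketch below expands contractions, lemmatizes with spaCy, and counts each lemma as one family. It is not the package's actual source: the function names `lemma_pairs` and `count_families` are hypothetical, and treating a lemma as the family key is an assumption (a lemma groups inflected forms such as "runs" and "ran", which may be narrower than the grouping the script really uses).

```python
from collections import Counter


def lemma_pairs(text, model="en_core_web_sm"):
    """Yield (surface form, lemma) pairs; needs spaCy, contractions, and the model."""
    # Imported lazily so the counting helper below runs without these packages.
    import contractions
    import spacy

    nlp = spacy.load(model)
    expanded = contractions.fix(text)  # e.g. "don't" becomes "do not"
    for token in nlp(expanded):
        if token.is_alpha:  # skip punctuation, numbers, and whitespace
            yield token.text.lower(), token.lemma_.lower()


def count_families(pairs):
    """Count occurrences per lemma, using the lemma as the family key (assumption)."""
    return Counter(lemma for _, lemma in pairs)


# Hand-written pairs stand in for spaCy output so the example runs anywhere.
sample = [("runs", "run"), ("running", "run"), ("ran", "run"), ("cats", "cat")]
families = count_families(sample)
print(families)
```

With the model installed, `count_families(lemma_pairs(text))` would run the whole chain on real text.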

## Installation

1. Clone the repository:
   ```
   git clone https://github.com/BlueBirdBack/word_family_counter.git
   cd word_family_counter
   ```

2. Create a virtual environment (optional but recommended):
   ```
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```

4. Download the spaCy language model:
   ```
   python -m spacy download en_core_web_sm
   ```
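
spaCy models downloaded this way are installed as ordinary Python packages, so one way to confirm that step 4 worked is to check whether the package can be found. The helper name `model_installed` is hypothetical, and this is only a quick sanity check, not part of the tool.

```python
import importlib.util


def model_installed(name):
    """Return True if a package called `name` is importable in this environment."""
    return importlib.util.find_spec(name) is not None


# Report the status of the default model and one alternative.
for model in ("en_core_web_sm", "en_core_web_md"):
    status = "installed" if model_installed(model) else "missing"
    print(f"{model}: {status}")
```

Run it inside the same virtual environment you created in step 2, since packages are looked up per environment.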

## Usage

Run the command with a text file as an argument:

```
word_family_counter path/to/your/text_file.txt
```

Optional arguments:
- `--verbose`: Increase output verbosity for debugging purposes
- `--language`: Specify the spaCy model to use (default: en_core_web_sm)

Example:
```
word_family_counter sample.txt --verbose --language en_core_web_md
```

Note: The spaCy model you select must be installed before you run the command. If you see an error about a missing model, repeat the download command from Installation step 4 with the name of that model (for example, `python -m spacy download en_core_web_md`).
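
For readers curious how a command line like this is typically parsed, here is a minimal `argparse` sketch that accepts the same arguments. It is an assumption about the interface, not the package's real code: `build_parser` is a hypothetical name, and `--verbose` is assumed to be a plain on/off flag because the example passes it without a value.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="word_family_counter",
        description="Count word families in a text file.",
    )
    parser.add_argument("file", help="path to the text file to analyse")
    parser.add_argument(
        "--verbose",
        action="store_true",  # assumed: a flag with no value
        help="increase output verbosity for debugging",
    )
    parser.add_argument(
        "--language",
        default="en_core_web_sm",  # default stated in the README
        help="name of the spaCy model to load",
    )
    return parser


# Parse the example invocation from the README.
args = build_parser().parse_args(
    ["sample.txt", "--verbose", "--language", "en_core_web_md"]
)
print(args.file, args.verbose, args.language)
```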

## Output

The script will display:
1. Total number of words in the text
2. Total number of unique word families
3. A list of word families sorted by frequency (descending) and then alphabetically
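
The ordering in item 3 (frequency descending, then alphabetical) can be expressed with a single sort key. The sketch below uses made-up counts, and the labels it prints are placeholders, not the script's actual output format; `rank_families` is a hypothetical helper.

```python
from collections import Counter


def rank_families(counts):
    """Sort (family, frequency) pairs by frequency descending, then by name."""
    # Negating the frequency makes larger counts sort first, while ties
    # fall back to ascending alphabetical order on the family name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative counts only; real values would come from the analysed text.
counts = Counter({"run": 3, "cat": 1, "apple": 3, "be": 1})
total_words = sum(counts.values())
ranked = rank_families(counts)

print(f"Total words: {total_words}")
print(f"Unique word families: {len(counts)}")
for family, freq in ranked:
    print(f"{family}: {freq}")
```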

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Contact

BlueBirdBack - avery@bluebirdback.com

Project Link: [https://github.com/BlueBirdBack/word_family_counter](https://github.com/BlueBirdBack/word_family_counter)

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "word-family-counter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "linguistics, nlp, text analysis, word families",
    "author": null,
    "author_email": "BlueBirdBack <avery@bluebirdback.com>",
    "download_url": "https://files.pythonhosted.org/packages/bf/fc/47e67159c9801a568931f9444cf81b6830ef98b5b8b034ed90aea7fcdffd/word_family_counter-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "A script for counting word families in a text file.",
    "version": "0.1.1",
    "project_urls": {
        "Documentation": "https://github.com/BlueBirdBack/word_family_counter#readme",
        "Homepage": "https://github.com/BlueBirdBack/word_family_counter",
        "Issues": "https://github.com/BlueBirdBack/word_family_counter/issues",
        "Repository": "https://github.com/BlueBirdBack/word_family_counter.git"
    },
    "split_keywords": [
        "linguistics",
        " nlp",
        " text analysis",
        " word families"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a1950f1e0c6c6d5c96a9fa169ba7eaf1f188a8be2773a0c5f994c26ed319bb41",
                "md5": "0962e8e8fc4a6ab1ba13941b430b1696",
                "sha256": "8a66bdd584a57a99ddc2283bd08212f7a50ade2181e8258a803a22bb9479d444"
            },
            "downloads": -1,
            "filename": "word_family_counter-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0962e8e8fc4a6ab1ba13941b430b1696",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 6873,
            "upload_time": "2024-09-26T09:37:44",
            "upload_time_iso_8601": "2024-09-26T09:37:44.000027Z",
            "url": "https://files.pythonhosted.org/packages/a1/95/0f1e0c6c6d5c96a9fa169ba7eaf1f188a8be2773a0c5f994c26ed319bb41/word_family_counter-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bffc47e67159c9801a568931f9444cf81b6830ef98b5b8b034ed90aea7fcdffd",
                "md5": "81ff9720daa1160c591b82e469ad130d",
                "sha256": "8a27260c9a9c79fdbdcae72211c9e95642236e7950cf75d1310a7a291e36e8c8"
            },
            "downloads": -1,
            "filename": "word_family_counter-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "81ff9720daa1160c591b82e469ad130d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 7246,
            "upload_time": "2024-09-26T09:37:45",
            "upload_time_iso_8601": "2024-09-26T09:37:45.981948Z",
            "url": "https://files.pythonhosted.org/packages/bf/fc/47e67159c9801a568931f9444cf81b6830ef98b5b8b034ed90aea7fcdffd/word_family_counter-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-26 09:37:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "BlueBirdBack",
    "github_project": "word_family_counter#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "spacy",
            "specs": []
        },
        {
            "name": "contractions",
            "specs": []
        },
        {
            "name": "psutil",
            "specs": []
        }
    ],
    "lcname": "word-family-counter"
}
        