sparkgrep


- **Name**: sparkgrep
- **Version**: 0.1.1a1
- **Summary**: Pre-commit hooks for Apache Spark development (Databricks, EMR, Dataproc, and more)
- **Upload time**: 2025-09-04 13:21:44
- **Requires Python**: >=3.8
- **License**: MIT
- **Keywords**: spark, databricks, pre-commit, code-quality, linting
- **Requirements**: black, flake8, nbformat, ruff, bandit, pytest, pytest-cov, build

# SparkGrep

![Static Badge](https://img.shields.io/badge/preview-red)
[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=ncloc)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=security_rating)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=reliability_rating)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=coverage)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Bugs](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=bugs)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=vulnerabilities)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Code Smells](https://sonarcloud.io/api/project_badges/measure?project=sparkgrep&metric=code_smells)](https://sonarcloud.io/summary/new_code?id=sparkgrep)
[![Python Version](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![Code style: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Security: Bandit](https://img.shields.io/badge/security-bandit-greenb.svg)](https://github.com/PyCQA/bandit)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

Pre-commit hook that detects debugging leftovers in Apache Spark applications.

## 🎯 Purpose

SparkGrep helps maintain clean Apache Spark codebases by detecting common debugging leftovers and performance anti-patterns that developers often forget to remove before committing code.

### 🔍 What it Detects

- **`display()` calls** - Jupyter/Databricks debugging function
- **`.show()` methods** - DataFrame inspection calls
- **`.collect()` without assignment** - Potential performance issues
- **`.count()` without assignment** - Unnecessary computations
- **Custom patterns** - User-defined patterns via configuration
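
To make the list above concrete, here is a minimal sketch of a line-based check of this kind. The regular expressions and the `find_leftovers` helper are illustrative assumptions, not SparkGrep's actual patterns or API, and the "without assignment" rule is approximated by requiring that no `=` appears before the call.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only; SparkGrep's real rules may differ.
ILLUSTRATIVE_PATTERNS = [
    (re.compile(r"^\s*display\s*\("), "display() call"),
    (re.compile(r"\.show\s*\("), ".show() call"),
    # "Without assignment" is approximated as: no '=' or '#' before the call.
    (re.compile(r"^\s*[^=#]*\.collect\s*\(\s*\)\s*$"), ".collect() without assignment"),
    (re.compile(r"^\s*[^=#]*\.count\s*\(\s*\)\s*$"), ".count() without assignment"),
]


def find_leftovers(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, description) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip lines that are entirely comments
        for pattern, description in ILLUSTRATIVE_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, description))
    return findings


sample = """\
df = spark.read.parquet("events")
display(df)
df.show()
rows = df.collect()
df.collect()
df.count()
total = df.count()
"""

for lineno, description in find_leftovers(sample):
    print(f"line {lineno}: {description}")
```

In this sketch, `rows = df.collect()` and `total = df.count()` are not reported because their results are assigned, while the bare calls on lines 5 and 6 are.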

## 🚀 Installation

```bash
pip install sparkgrep
```

## 📋 Usage

### As a Pre-commit Hook

Add to your `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/leandroasaservice/sparkgrep
    rev: v0.1.1a1  # Use this preview version.
    hooks:
      - id: sparkgrep
```

### Command Line

```bash
# Check specific files
sparkgrep src/my_script.py notebook.ipynb

# Check with additional patterns
sparkgrep --additional-patterns "debug_print:Debug print statement" src/

# Disable default patterns and use only custom ones
sparkgrep --disable-default-patterns --additional-patterns "my_pattern:My description" src/
```
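
The `--additional-patterns` values above pair a pattern with a description, separated by a colon. Below is a minimal sketch of splitting such a value, assuming the split happens at the first colon. The source does not say how SparkGrep treats patterns that themselves contain a colon, and `parse_pattern_spec` is a hypothetical helper, not part of SparkGrep.

```python
from typing import Tuple


def parse_pattern_spec(spec: str) -> Tuple[str, str]:
    """Split 'pattern:description' at the first colon (an assumption)."""
    pattern, separator, description = spec.partition(":")
    if not separator or not pattern:
        raise ValueError(f"expected 'pattern:description', got {spec!r}")
    return pattern.strip(), description.strip()


print(parse_pattern_spec("debug_print:Debug print statement"))
```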

----

## 🛡️ Security & Quality

This project maintains high security and code quality standards:

### 🔒 Security Measures

- **Automated vulnerability detection** and issue creation
- **Admin-protected CI/CD** pipelines
- **Dependency vulnerability monitoring**

### 📊 Code Quality

- **80% minimum code coverage** enforced in CI
- **SonarCloud integration** for continuous code quality analysis
- **Automated testing** on every PR
- **Code formatting** with Ruff

----

## 📁 Project Structure

```sh
sparkgrep/
├── src/sparkgrep/          # Main package
│   ├── cli.py              # Command-line interface
│   ├── patterns.py         # Pattern definitions
│   ├── file_processors.py  # File processing logic
│   └── utils.py            # Utility functions
├── tests/                  # Test suite
│   ├── unit/               # Unit tests
│   └── integration/        # Integration tests
├── .github/                # GitHub configuration
│   ├── workflows/          # CI/CD pipelines
│   └── ISSUE_TEMPLATE/     # Issue templates
└── doc/                    # Documentation
```
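
The command-line example checks both a `.py` file and `notebook.ipynb`, so notebook sources have to be pulled out of the `.ipynb` JSON before any pattern can be applied. Below is a minimal sketch of that step using only the standard library, assuming the nbformat v4 layout (a top-level `cells` list whose code cells have `cell_type` set to `code`). `iter_code_cells` is a hypothetical helper and not necessarily how `file_processors.py` works.

```python
import json
from typing import Iterator, Tuple


def iter_code_cells(notebook_json: str) -> Iterator[Tuple[int, str]]:
    """Yield (cell_index, source) for each code cell in a v4 notebook."""
    notebook = json.loads(notebook_json)
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        yield index, source


# A tiny in-memory notebook used only for this illustration.
sample_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Exploration"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["df = load()\n", "df.show()"]},
    ],
})

for index, source in iter_code_cells(sample_notebook):
    print(f"cell {index}:\n{source}")
```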

## 🤝 Contributing

1. **Fork** the repository
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **Make** your changes with tests
4. **Ensure** all checks pass (`task quality`, `task test`)
5. **Submit** a pull request

### Contribution Guidelines

- **Tests required** for all new features
- **Security scans** must pass
- **Code coverage** must remain ≥ 80%
- **Admin approval** required for all PRs to main
- **Follow** existing code style and patterns

See [CONTRIBUTING.md](doc/CONTRIBUTING.md) for details.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/leandroasaservice/sparkgrep/issues)
- **Discussions**: [GitHub Discussions](https://github.com/leandroasaservice/sparkgrep/discussions)
- **Documentation**: [Project Docs](doc/)

----

## Made with ❤️ for the Apache Spark community

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "sparkgrep",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Leandro Kellermann de Oliveira <lkellermann@leandroasaservice.com>",
    "keywords": "spark, databricks, pre-commit, code-quality, linting",
    "author": null,
    "author_email": "Leandro Kellermann de Oliveira <lkellermann@leandroasaservice.com>",
    "download_url": "https://files.pythonhosted.org/packages/72/de/aa525ba89fdfa4b26460ffca69068dbb7cd39b967044e01024b82a9bce19/sparkgrep-0.1.1a1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Pre-commit hooks for Apache Spark development (Databricks, EMR, Dataproc, and more)",
    "version": "0.1.1a1",
    "project_urls": {
        "Bug Reports": "https://github.com/leandroasaservice/sparkgrep/issues",
        "Contributing": "https://github.com/leandroasaservice/sparkgrep/blob/main/doc/CONTRIBUTING.md",
        "Documentation": "https://github.com/leandroasaservice/sparkgrep/blob/main/README.md",
        "Homepage": "https://github.com/leandroasaservice/sparkgrep",
        "Repository": "https://github.com/leandroasaservice/sparkgrep",
        "Source": "https://github.com/leandroasaservice/sparkgrep"
    },
    "split_keywords": [
        "spark",
        " databricks",
        " pre-commit",
        " code-quality",
        " linting"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "84e925d6bc4400d372013c12cc5b098778c98d762dd4edab0ee19a32d3b9f650",
                "md5": "7f62ebbf6dda0472df4b05621210945c",
                "sha256": "fb363d714c7883e4e6c344b973b83905164f401a18f7c670954deaf91baf00be"
            },
            "downloads": -1,
            "filename": "sparkgrep-0.1.1a1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7f62ebbf6dda0472df4b05621210945c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8945,
            "upload_time": "2025-09-04T13:21:42",
            "upload_time_iso_8601": "2025-09-04T13:21:42.778957Z",
            "url": "https://files.pythonhosted.org/packages/84/e9/25d6bc4400d372013c12cc5b098778c98d762dd4edab0ee19a32d3b9f650/sparkgrep-0.1.1a1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "72deaa525ba89fdfa4b26460ffca69068dbb7cd39b967044e01024b82a9bce19",
                "md5": "4e3c258c51b247a3e3fb754dff21281e",
                "sha256": "3d30cea61b3e315b1eef82f0ebf13b3ef676689bd52360f670e58004ef425473"
            },
            "downloads": -1,
            "filename": "sparkgrep-0.1.1a1.tar.gz",
            "has_sig": false,
            "md5_digest": "4e3c258c51b247a3e3fb754dff21281e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10004,
            "upload_time": "2025-09-04T13:21:44",
            "upload_time_iso_8601": "2025-09-04T13:21:44.075668Z",
            "url": "https://files.pythonhosted.org/packages/72/de/aa525ba89fdfa4b26460ffca69068dbb7cd39b967044e01024b82a9bce19/sparkgrep-0.1.1a1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-04 13:21:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "leandroasaservice",
    "github_project": "sparkgrep",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "black",
            "specs": [
                [
                    ">=",
                    "23.0.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "nbformat",
            "specs": [
                [
                    ">=",
                    "5.0.0"
                ]
            ]
        },
        {
            "name": "ruff",
            "specs": [
                [
                    "==",
                    "0.12.7"
                ]
            ]
        },
        {
            "name": "bandit",
            "specs": [
                [
                    "==",
                    "1.8.6"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.4.1"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    "==",
                    "6.2.1"
                ]
            ]
        },
        {
            "name": "build",
            "specs": [
                [
                    "==",
                    "1.3.0"
                ]
            ]
        }
    ],
    "lcname": "sparkgrep"
}
        