groovy-parser


* Name: groovy-parser
* Version: 0.1.1
* Home page: https://github.com/inab/python-groovy-parser
* Summary: Groovy 3.0.x parser based on Pygments and Lark
* Upload time: 2023-07-12 18:15:14
* Author: José M. Fernández <https://orcid.org/0000-0002-4806-5140>
* Requires Python: >=3.7
* License: Apache-2.0
* Requirements: lark, Pygments

# python-groovy-parser

Python package which implements a Groovy 3.0.X parser, built on Pygments, Lark and a corresponding grammar.

The tokenizer, lexer and grammar have been tested, stressed and fine-tuned
to properly parse Nextflow files (i.e. `*.nf`), `nextflow.config`-like files
and real Groovy code from:

* https://github.com/nf-core/modules.git
* https://github.com/nf-core/rnaseq.git
* https://github.com/nf-core/viralintegration.git
* https://github.com/nf-core/viralrecon.git
* https://github.com/wombat-p/WOMBAT-Pipelines.git
* https://github.com/nextflow-io/nextflow.git

## Install
You can install the development version of this package with pip by running:

```bash
pip install git+https://github.com/inab/python-groovy-parser.git
```

## Test programs

This repo contains two test programs,
[translated-groovy3-parser.py](translated-groovy3-parser.py) and
[cached-translated-groovy3-parser.py](cached-translated-groovy3-parser.py),
which demonstrate how to use the parser and post-process its output.

The programs take one or more files as input.

```bash
git clone https://github.com/nf-core/rnaseq.git
translated-groovy3-parser.py $(find rnaseq -type f -name "*.nf")
```

If an input file is, for instance, `rnaseq/modules/local/bedtools_genomecov.nf`,
the program generates a log file `rnaseq/modules/local/bedtools_genomecov.nf.lark`,
where the parsing traces are stored (emitted tokens, parsing errors, etc.).

In addition, when parsing succeeds, the program condenses and serializes
the parse tree into a file with extension `.lark.json` (for instance,
`rnaseq/modules/local/bedtools_genomecov.nf.lark.json`).
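As a quick way to inspect one of these `.lark.json` files, the sketch below loads it with the standard library and counts the JSON objects, arrays and leaf values it contains. The layout of the serialized tree is not documented here, so the code deliberately assumes nothing about key names; `summarize_tree` and `summarize_file` are hypothetical helper names, not part of this package.

```python
import json
from collections import Counter


def summarize_tree(root):
    """Count objects, arrays and leaf values in any JSON-like structure.

    An explicit stack is used instead of recursion, so that deeply nested
    parse trees cannot hit Python's recursion limit.
    """
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            counts["objects"] += 1
            stack.extend(node.values())
        elif isinstance(node, list):
            counts["arrays"] += 1
            stack.extend(node)
        else:
            # Strings, numbers, booleans and null are all leaves.
            counts["leaves"] += 1
    return counts


def summarize_file(path):
    """Load a serialized tree (e.g. a `.lark.json` file) and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_tree(json.load(handle))
```

For example, `summarize_file("rnaseq/modules/local/bedtools_genomecov.nf.lark.json")` would give a rough idea of how large the condensed tree is before writing any structure-specific code.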

As a proof of concept, it also tries to identify features of Nextflow files,
such as the declared processes, includes and workflows, and writes them in rough form
to a file with extension `.lark.result` (for instance `rnaseq/modules/local/bedtools_genomecov.nf.lark.result`).
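The naming convention described above (each output file is the input path with a suffix appended) can be sketched as follows. This is only an illustration for locating the generated files from your own scripts; `OUTPUT_SUFFIXES`, `expected_outputs` and `find_nextflow_files` are hypothetical names, not functions provided by the test programs.

```python
from pathlib import Path

# Suffixes taken from the description above; the dictionary keys are
# illustrative labels chosen for this sketch.
OUTPUT_SUFFIXES = {
    "log": ".lark",
    "tree": ".lark.json",
    "features": ".lark.result",
}


def expected_outputs(input_file):
    """Map each kind of output to the path where it should be written."""
    source = Path(input_file)
    return {
        kind: source.with_name(source.name + suffix)
        for kind, suffix in OUTPUT_SUFFIXES.items()
    }


def find_nextflow_files(root):
    """Rough equivalent of `find <root> -type f -name "*.nf"`."""
    return sorted(p for p in Path(root).rglob("*.nf") if p.is_file())
```

Combining both helpers, `[expected_outputs(p)["tree"] for p in find_nextflow_files("rnaseq")]` lists where the serialized trees should appear after a run.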

Because parsing is computationally heavy, the parsing module also provides a way to
cache the parsed tree in JSON format in a persistent store,
such as a filesystem. The following command is therefore expensive the first time it runs,
but not on later runs:

```bash
GROOVY_CACHEDIR=/tmp/somecachedir cached-translated-groovy3-parser.py $(find rnaseq -type f -name "*.nf")
```

The contents of the caching directory depend on the grammar and the implementation, as well as on the versions of the dependencies.
So, if this software is updated (because the grammar changes or a bug is fixed),
cached contents from previous versions are not reused.
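One common way to obtain this kind of invalidation is to derive the cache key from everything that can change the result. The sketch below illustrates the idea only and is not the scheme this package actually uses: `dependency_versions` and `cache_key` are hypothetical names, and the choice of SHA-256 is an assumption.

```python
import hashlib
from importlib import metadata  # available since Python 3.8


def dependency_versions(names=("lark", "Pygments")):
    """Return the installed version of each dependency, or "unknown"."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "unknown"
    return versions


def cache_key(content, grammar_text, versions):
    """Hash the source text together with the grammar and dependency versions.

    Any change to the grammar, to a dependency version or to the parsed
    content yields a different key, so stale entries are never looked up.
    """
    digest = hashlib.sha256()
    for part in (grammar_text, repr(sorted(versions.items())), content):
        data = part.encode("utf-8")
        # Length-prefix each part so that different splits of the same
        # concatenated bytes cannot collide.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With such a key used as the file name inside the directory given by `GROOVY_CACHEDIR`, entries written by an older grammar or dependency set would simply never be matched again.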

## Acknowledgements

The tokenizer evolved from the Pygments Groovy lexer: https://github.com/pygments/pygments/blob/b7c8f35440f591c6687cb912aa223f5cf37b6704/pygments/lexers/jvm.py#L543-L618

The Lark grammar was derived from https://github.com/apache/groovy/blob/3b6909a3dbb574e66f5d0fb6aafb6e28316033a8/src/antlr/GroovyParser.g4 ,
first converting it to EBNF using https://bottlecaps.de/convert/ ,
and then translating the EBNF representation to Lark format, partially by hand.

Some fixes were inspired by https://github.com/daniellansun/groovy-antlr4-grammar-optimized/tree/master/src/main/antlr4/org/codehaus/groovy/parser/antlr4

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/inab/python-groovy-parser",
    "name": "groovy-parser",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Jos\u00e9 M. Fern\u00e1ndez <https://orcid.org/0000-0002-4806-5140>",
    "author_email": "jose.m.fernandez@bsc.es",
    "download_url": "https://files.pythonhosted.org/packages/bd/40/ed2c1f425057c4f0e06a6982c5e205dab8bdd28c4cf595ca391c390d0ae9/groovy-parser-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Groovy 3.0.x parser based on Pygments and Lark",
    "version": "0.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/inab/python-groovy-parser/issues",
        "Homepage": "https://github.com/inab/python-groovy-parser"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c8b393120431fea654e9d7b3eef133d36a8f080d1f70b48d0d81a6d3f3c95069",
                "md5": "66cd9b457916b6ced39f1c29626a1e64",
                "sha256": "ec97a0047c2456a9b48bc570a0143b3a3b8f2ab03af1e52397f93b538373eeb1"
            },
            "downloads": -1,
            "filename": "groovy_parser-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "66cd9b457916b6ced39f1c29626a1e64",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 30512,
            "upload_time": "2023-07-12T18:15:11",
            "upload_time_iso_8601": "2023-07-12T18:15:11.965395Z",
            "url": "https://files.pythonhosted.org/packages/c8/b3/93120431fea654e9d7b3eef133d36a8f080d1f70b48d0d81a6d3f3c95069/groovy_parser-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bd40ed2c1f425057c4f0e06a6982c5e205dab8bdd28c4cf595ca391c390d0ae9",
                "md5": "bad64e51b90020e877d55ace17037229",
                "sha256": "b05f1bbb1fe8ab245f025c08fb7f96f072a7fa3ff5c7475b345e0f597b29de32"
            },
            "downloads": -1,
            "filename": "groovy-parser-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bad64e51b90020e877d55ace17037229",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 27845,
            "upload_time": "2023-07-12T18:15:14",
            "upload_time_iso_8601": "2023-07-12T18:15:14.042218Z",
            "url": "https://files.pythonhosted.org/packages/bd/40/ed2c1f425057c4f0e06a6982c5e205dab8bdd28c4cf595ca391c390d0ae9/groovy-parser-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-12 18:15:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "inab",
    "github_project": "python-groovy-parser",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "lark",
            "specs": []
        },
        {
            "name": "Pygments",
            "specs": []
        }
    ],
    "lcname": "groovy-parser"
}
        