alpino-query


* Name: alpino-query
* Version: 2.1.10
* Home page: https://github.com/UUDigitalHumanitieslab/alpino-query
* Summary: Generating XPATH queries based on a Dutch Alpino syntax tree and user-specified token properties.
* Upload time: 2023-03-22 13:25:26
* Author: Digital Humanities Lab, Utrecht University
* Requires Python: >=3.7, <4
* License: CC BY-NC-SA 4.0
* Requirements: No requirements were recorded.

[![Actions Status](https://github.com/UUDigitalHumanitiesLab/alpino-query/workflows/Python%20package/badge.svg)](https://github.com/UUDigitalHumanitiesLab/alpino-query/actions)

# Alpino Query

```bash
pip install alpino-query
```

When running locally without installing, use `python -m alpino_query` instead of `alpino-query`.

## Parse

Parse a tokenized sentence using the Alpino instance running on [gretel.hum.uu.nl](https://gretel.hum.uu.nl).

For example:

```bash
alpino-query parse Dit is een voorbeeldzin .
```

Note that the period is a separate token.

It also works when the sentence is passed as a single argument.

```bash
alpino-query parse "Dit is een voorbeeldzin ."
```
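
Both invocation styles amount to the same whitespace-separated token list. A minimal sketch of that normalization, assuming plain whitespace splitting; the helper name `to_tokens` is hypothetical and not part of the package:

```python
def to_tokens(args):
    """Hypothetical helper: flatten command-line style arguments into tokens."""
    tokens = []
    for arg in args:
        # An argument may hold one token or a whole space-separated sentence.
        tokens.extend(arg.split())
    return tokens


# Tokens passed as separate arguments.
separate = to_tokens(["Dit", "is", "een", "voorbeeldzin", "."])
# The whole sentence passed as a single quoted argument.
single = to_tokens(["Dit is een voorbeeldzin ."])
```

In both cases the period ends up as its own token, which is what the parser expects.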

## Mark

Mark which part of the treebank should be selected for filtering. It has three inputs:

1. [Lassy/Alpino XML](https://www.let.rug.nl/~vannoord/Lassy/)
2. the tokens of the sentence
3. for each token, the properties which should be marked

For example:

```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos pos pos pos"
```

It is also possible to mark multiple properties for a token by separating them with commas. Each property can also be negated by prefixing it with `-` (as in `-word` below); negated properties are marked as 'exclude' in the tree.

```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos,-word,rel pos pos pos"
```
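
To make the shape of the third argument concrete, here is a minimal sketch of how such a specification could be split into included and excluded properties per token. The function `parse_attribute_spec` is a hypothetical illustration and not the package's own parser:

```python
def parse_attribute_spec(spec):
    """Hypothetical helper: return one (include, exclude) pair per token."""
    result = []
    for token_spec in spec.split():
        include, exclude = [], []
        for prop in token_spec.split(","):
            if prop.startswith("-"):
                # A leading '-' negates the property.
                exclude.append(prop[1:])
            else:
                include.append(prop)
        result.append((include, exclude))
    return result


parsed = parse_attribute_spec("pos pos,-word,rel pos pos pos")
# The second token includes 'pos' and 'rel' and excludes 'word'.
second_token = parsed[1]
```

The specification needs exactly one entry per token, so for the five-token example sentence `parsed` has five pairs.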

## Subtree

Generates a subtree containing only the marked properties. The subtree also carries additional attributes indicating which properties should be excluded and/or matched case-sensitively.

The second argument can be empty, `cat`, `rel` or both (i.e. `catrel` or `cat,rel`). This indicates which attributes should be removed from the top node. When only one node is left in the subtree, this argument is ignored.

```bash
alpino-query subtree "$(<tests/data/001.marked.xml)" cat
```
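
The effect of the second argument can be sketched with the standard library's `xml.etree.ElementTree`. This is only an illustration of the behavior described above, under the assumption that "only one node is left" means the top node has no children; the helper `strip_top_attributes` and the sample XML are made up and are not taken from the package:

```python
import xml.etree.ElementTree as ET


def strip_top_attributes(root, remove):
    """Hypothetical helper: drop the named attributes from the top node only."""
    # Assumption: a subtree consisting of a single node is left untouched.
    if len(root) == 0:
        return root
    for name in remove:
        root.attrib.pop(name, None)
    return root


# Made-up subtree: a top node with one child.
subtree = ET.fromstring(
    '<node cat="smain" rel="--"><node pos="verb" rel="hd"/></node>'
)
strip_top_attributes(subtree, ["cat"])
top_attributes = dict(subtree.attrib)
child_attributes = dict(subtree[0].attrib)

# A single-node subtree keeps its attributes.
single = strip_top_attributes(ET.fromstring('<node cat="np" rel="su"/>'), ["cat", "rel"])
single_attributes = dict(single.attrib)
```

Only the top node is affected; the attributes of the child nodes stay as they were.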

## XPath

Generates an XPath query for a treebank from the generated subtree. The second argument indicates whether the generated query should be order-sensitive.

```bash
alpino-query xpath "$(<tests/data/001.subtree.xml)" 0
```

## Using as Module

```python
from alpino_query import AlpinoQuery

tokens = ["Dit", "is", "een", "voorbeeldzin", "."]
attributes = ["pos", "pos,-word,rel", "pos", "pos", "pos"]

query = AlpinoQuery()
alpino_xml = query.parse(tokens)
query.mark(alpino_xml, tokens, attributes)
print(query.marked_xml) # query.marked contains the lxml Element

query.generate_subtree(["rel", "cat"])
print(query.subtree_xml) # query.subtree contains the lxml Element

query.generate_xpath(False) # True to make order sensitive
print(query.xpath)
```

## Considerations

### Exclusive

A query on a node can be exclusive in more than one way.
For example:

* a node should not be a noun `node[@pos!="noun"]`
* it should not have a node which is a noun `not(node[@pos="noun"])`

The first statement does *require* the existence of a node, whereas the second also holds true if there is no node at all. When a token is only exclusive (e.g. not a noun), a query of the second form is generated; if a token has both inclusive and exclusive properties, a query of the first form is generated.
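
The difference between the two forms can be illustrated without an XPath engine. The sketch below uses the standard library's `xml.etree.ElementTree` and plain Python predicates that mirror the two expressions; the helper names and the sample XML are made up for illustration and are not part of alpino-query.

```python
import xml.etree.ElementTree as ET


def has_non_noun_child(parent):
    """Mirrors node[@pos!="noun"]: a child exists whose pos is set and is not 'noun'."""
    return any(
        child.get("pos") is not None and child.get("pos") != "noun"
        for child in parent.findall("node")
    )


def has_no_noun_child(parent):
    """Mirrors not(node[@pos="noun"]): no child has pos 'noun'."""
    return not any(child.get("pos") == "noun" for child in parent.findall("node"))


empty = ET.fromstring('<node cat="np"/>')
with_verb = ET.fromstring('<node cat="smain"><node pos="verb"/></node>')
with_noun = ET.fromstring('<node cat="np"><node pos="noun"/></node>')

# One (first form, second form) pair per sample element.
results = {
    "empty": (has_non_noun_child(empty), has_no_noun_child(empty)),
    "with_verb": (has_non_noun_child(with_verb), has_no_noun_child(with_verb)),
    "with_noun": (has_non_noun_child(with_noun), has_no_noun_child(with_noun)),
}
```

The element without children is where the two forms disagree: the first form fails because there is no node to match, while the second form succeeds.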

### Relations

`@cat` and `@rel` are always preserved for nodes which have children. They are dropped only when all the children are removed by specifying the `na` property for the child tokens.

## Upload to PyPI

```bash
pip install twine
python setup.py sdist
twine upload dist/*
```



            

        