# Alpino Query
```bash
pip install alpino-query
```
When running locally without installing, use `python -m alpino_query` instead of `alpino-query`.
## Parse
Parse a tokenized sentence using the Alpino instance running on [gretel.hum.uu.nl](https://gretel.hum.uu.nl).
For example:
```bash
alpino-query parse Dit is een voorbeeldzin .
```
Note that the period is a separate token.
It also works when the sentence is passed as a single argument.
```bash
alpino-query parse "Dit is een voorbeeldzin ."
```
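Since `parse` expects punctuation as separate tokens, a raw sentence has to be tokenized first. The snippet below is a minimal sketch of one way to do that; `naive_tokenize` is a hypothetical helper and not part of alpino-query. It splits punctuation from words with a simple regular expression, and it will also split words that contain apostrophes or hyphens, so it is no substitute for a proper Dutch tokenizer.

```python
import re
from typing import List


def naive_tokenize(sentence: str) -> List[str]:
    """Hypothetical helper: split a sentence into words and punctuation marks.

    Runs of word characters become one token; every other non-space
    character (such as the final period) becomes a token of its own.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)


tokens = naive_tokenize("Dit is een voorbeeldzin.")
print(" ".join(tokens))  # prints: Dit is een voorbeeldzin .
```

The space-separated result can then be passed to `alpino-query parse` as a single quoted argument.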
## Mark
Mark which part of the treebank should be selected for filtering. It has three inputs:
1. [Lassy/Alpino XML](https://www.let.rug.nl/~vannoord/Lassy/)
2. the tokens of the sentence
3. the properties to mark for each token
For example:
```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos pos pos pos"
```
It is also possible to mark multiple properties for a token by separating them with a comma. Each property can also be negated by prefixing it with `-`; negated properties are marked as 'exclude' in the tree.
```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos,-word,rel pos pos pos"
```
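To make the format of the third argument concrete, here is a small sketch that reads such a specification into included and excluded properties per token. `split_properties` is a hypothetical helper written for illustration; it is not the parsing code alpino-query itself uses, and it only assumes the format shown above (spaces between tokens, commas between properties, a leading `-` for negation).

```python
from typing import Dict, List


def split_properties(spec: str) -> List[Dict[str, List[str]]]:
    """Hypothetical helper: read a property specification, one entry per token."""
    result = []
    for token_spec in spec.split():
        include, exclude = [], []
        for prop in token_spec.split(","):
            if prop.startswith("-"):
                # A leading "-" negates the property.
                exclude.append(prop[1:])
            else:
                include.append(prop)
        result.append({"include": include, "exclude": exclude})
    return result


for entry in split_properties("pos pos,-word,rel pos pos pos"):
    print(entry)
```

For the second token this yields `pos` and `rel` as included properties and `word` as an excluded one.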
## Subtree
Generates a subtree containing only the marked properties. The subtree also contains additional attributes that mark which properties should be excluded and/or matched case-sensitively.
The second argument can be empty, `cat`, `rel` or both (i.e. `catrel` or `cat,rel`). This indicates which attributes should be removed from the top node. When only one node is left in the subtree, this argument is ignored.
```bash
alpino-query subtree "$(<tests/data/001.marked.xml)" cat
```
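The module interface shown in "Using as Module" passes a list such as `["rel", "cat"]` to `generate_subtree`. The sketch below shows how the accepted command-line spellings could be normalized into such a list. `top_node_attributes` is a hypothetical helper, and it assumes that the list carries the same meaning as the command-line argument; the package may handle this differently internally.

```python
from typing import List


def top_node_attributes(argument: str) -> List[str]:
    """Hypothetical helper: turn '', 'cat', 'rel', 'catrel' or 'cat,rel' into a list."""
    # Substring checks are enough because only these two names are accepted.
    return [name for name in ("cat", "rel") if name in argument]


for argument in ["", "cat", "rel", "catrel", "cat,rel"]:
    print(repr(argument), top_node_attributes(argument))
```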
## XPath
Generates an XPath to query a treebank from the generated subtree. The second argument indicates whether the generated query should be order-sensitive.
```bash
alpino-query xpath "$(<tests/data/001.subtree.xml)" 0
```
## Using as Module
```python
from alpino_query import AlpinoQuery
tokens = ["Dit", "is", "een", "voorbeeldzin", "."]
attributes = ["pos", "pos,-word,rel", "pos", "pos", "pos"]
query = AlpinoQuery()
alpino_xml = query.parse(tokens)
query.mark(alpino_xml, tokens, attributes)
print(query.marked_xml) # query.marked contains the lxml Element
query.generate_subtree(["rel", "cat"])
print(query.subtree_xml) # query.subtree contains the lxml Element
query.generate_xpath(False) # True to make order sensitive
print(query.xpath)
```
## Considerations
### Exclusive
A query on a node can be exclusive in more than one way.
For example:
* a node should not be a noun `node[@pos!="noun"]`
* it should not have a node which is a noun `not(node[@pos="noun"])`
The first statement *requires* the existence of a node, whereas the second also holds true if there is no node at all. When a token is only exclusive (e.g. not a noun), a query of the second form is generated. If a token has both inclusive and exclusive properties, a query of the first form is generated.
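The difference between the two forms can be checked on tiny hand-made trees. The sketch below does not run real XPath: it emulates both predicates in plain Python on `xml.etree.ElementTree` elements, because the standard library's XPath subset has no `not()`. The function names and the sample trees are made up for illustration.

```python
import xml.etree.ElementTree as ET


def has_non_noun_child(parent: ET.Element) -> bool:
    """Emulates node[@pos!="noun"]: a child node exists whose pos is not 'noun'."""
    return any(
        child.get("pos") is not None and child.get("pos") != "noun"
        for child in parent.findall("node")
    )


def has_no_noun_child(parent: ET.Element) -> bool:
    """Emulates not(node[@pos="noun"]): no child node has pos 'noun'."""
    return not any(child.get("pos") == "noun" for child in parent.findall("node"))


samples = {
    "verb child": ET.fromstring('<node cat="top"><node pos="verb"/></node>'),
    "noun child": ET.fromstring('<node cat="top"><node pos="noun"/></node>'),
    "no children": ET.fromstring('<node cat="top"/>'),
}

for name, tree in samples.items():
    print(name, has_non_noun_child(tree), has_no_noun_child(tree))
# prints:
# verb child True True
# noun child False False
# no children False True
```

Only the tree without children separates the two forms: the first form fails because no node exists, while the second succeeds.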
### Relations
`@cat` and `@rel` are always preserved for nodes which have children. They are only dropped when all the children are removed by specifying the `na` property for the child tokens.
## Upload to PyPI
```bash
pip install twine
python setup.py sdist
twine upload dist/*
```
Raw data
{
"_id": null,
"home_page": "https://github.com/UUDigitalHumanitieslab/alpino-query",
"name": "alpino-query",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7, <4",
"maintainer_email": "",
"keywords": "",
"author": "Digital Humanities Lab, Utrecht University",
"author_email": "digitalhumanities@uu.nl",
"download_url": "https://files.pythonhosted.org/packages/37/b0/a38ca7df3c6a5414fe0706006204ab0d5fa036a44d2285157430f33c9f92/alpino-query-2.1.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "CC BY-NC-SA 4.0",
"summary": "Generating XPATH queries based on a Dutch Alpino syntax tree and user-specified token properties.",
"version": "2.1.10",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "37b0a38ca7df3c6a5414fe0706006204ab0d5fa036a44d2285157430f33c9f92",
"md5": "07083784f04c41847f152d43605a267b",
"sha256": "a69395a3cfcd2ddadb9c7696eb2716660067fa5466f1143a4ff7cf4884d5863e"
},
"downloads": -1,
"filename": "alpino-query-2.1.10.tar.gz",
"has_sig": false,
"md5_digest": "07083784f04c41847f152d43605a267b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7, <4",
"size": 16418,
"upload_time": "2023-03-22T13:25:26",
"upload_time_iso_8601": "2023-03-22T13:25:26.137584Z",
"url": "https://files.pythonhosted.org/packages/37/b0/a38ca7df3c6a5414fe0706006204ab0d5fa036a44d2285157430f33c9f92/alpino-query-2.1.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-22 13:25:26",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "UUDigitalHumanitieslab",
"github_project": "alpino-query",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "alpino-query"
}