jaf


| Field | Value |
| --- | --- |
| Name | jaf |
| Version | 0.2.0 |
| Summary | JSON Array Filter (`jaf`) is a versatile filter for JSON arrays. It is a domain-specific language (DSL) that allows you to filter JSON arrays using a simple, yet powerful syntax. Programmatically, the AST can be used to filter JSON arrays in a more flexible way. |
| Upload time | 2024-12-21 05:45:03 |
| License | MIT License |
| Keywords | filter, json, ast, dsl, lark, parser, array |
| Requirements | lark-parser, pytest, flake8 |

# `jaf` - JSON Array Filter

`jaf` is a versatile filtering system designed to sift through JSON arrays.
It allows users to filter JSON arrays based on complex conditions using a
simple and intuitive query language. The query language is designed to be
easy to use and understand, while still being powerful enough to handle
complex filtering tasks.

## Builtins

We refer to the available operators as `builtins`. These are the functions that
users can call in their queries: they compare values, perform operations, or
combine other functions into more complex queries. The `builtins` are the core
of the filtering system.

Predicates canonically end with a `?`, e.g., `eq?` (equals) and `lt?` (less-than).
General operators canonically do not end with a `?`, e.g., `lower-case` and `or`.
Predicates compare values, while operators combine predicates or perform other
operations that make the desired comparison or result possible.
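
To make the naming convention concrete, here is a minimal sketch of how such a set of builtins could be organized. The `BUILTINS` dictionary and its contents are assumptions made for illustration, not the package's actual registry:

```python
import operator

# Hypothetical registry (not jaf's real one): names map to plain callables.
BUILTINS = {
    # Predicates end with `?` and return a Boolean.
    'eq?': operator.eq,
    'neq?': operator.ne,
    'lt?': operator.lt,
    'gt?': operator.gt,
    # General operators return values or combine other results.
    'lower-case': str.lower,
    'and': lambda *args: all(args),
    'or': lambda *args: any(args),
}

# An operator prepares a value, then a predicate compares it.
lowered = BUILTINS['lower-case']('Python')
print(BUILTINS['eq?'](lowered, 'python'))  # True
print(BUILTINS['gt?'](150, 100))           # True
```

Keeping builtins in a plain mapping is one way to let callers swap in or extend the set without touching the evaluator.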

> *Note*: We do not use operators like `==` or `>`, but instead use `eq?` and
> `gt?`. The primary reason for this choice is that we provide a command-line
> tool, and if we used `>` it would be interpreted as a redirection operator
> by the shell.

For example, the `lower-case` operator is used to convert a string to lowercase before
comparison, so that the comparison is case-insensitive. Here is an example
query that uses the `lower-case` operator:

```python
['and', ['eq?', ['lower-case', ['path', 'language']], 'python']]
```

This query will filter repositories where the `language` field is equal to
`"python"`, regardless of the case of the letters.

> *Note*: Depending on the `builtins`, the query language can be Turing complete.
> For example, it would be trivial to add a `lambda` builtin that allows users to
> define their own functions. However, this is not a safe practice, as it would
> allow users to execute arbitrary code. Therefore, we have chosen to limit the
> default `builtins` to a safe set of functions that are useful for filtering JSON
> arrays. If you need additional functionality, you can always extend the defaults
> or provide your own set of `builtins`.

## Query Language

Queries are represented using an Abstract Syntax Tree (AST) based on nested
lists, where each list takes the form of `[<expression>, <arg1>, <arg2>, ...]`.
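
A nested-list AST of this shape can be evaluated with a short recursive function. The sketch below is illustrative only: `evaluate`, `get_path`, and the small `BUILTINS` table are hypothetical names, and the real evaluator may differ (for instance, this version evaluates all arguments eagerly rather than short-circuiting `and`/`or`):

```python
import operator

# Hypothetical, minimal builtins table for this sketch.
BUILTINS = {
    'eq?': operator.eq,
    'gt?': operator.gt,
    'lower-case': str.lower,
    'and': lambda *args: all(args),
}

def get_path(obj, dotted):
    """Follow a dotted field path such as 'owner.name' through nested dicts."""
    for key in dotted.split('.'):
        obj = obj[key]
    return obj

def evaluate(ast, obj, builtins=BUILTINS):
    """Evaluate a nested-list AST against one element of the array."""
    if not isinstance(ast, list):
        return ast  # anything that is not a list is a literal value
    name, *args = ast
    if name == 'path':
        return get_path(obj, args[0])
    values = [evaluate(arg, obj, builtins) for arg in args]
    return builtins[name](*values)

repo = {'language': 'Python', 'stars': 150}
query = ['and',
    ['eq?', ['lower-case', ['path', 'language']], 'python'],
    ['gt?', ['path', 'stars'], 100]]
print(evaluate(query, repo))  # True
```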

We also provide a Domain-Specific Language (DSL) that allows users to craft
queries using an intuitive infix notation. The DSL is converted into the AST
before being evaluated. Here is the grammar for the query language, written in
Lark's EBNF-like notation:

```ebnf
%import common.WS
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%ignore WS

start: expr

expr: bool_expr

?bool_expr: or_expr

?or_expr: and_expr
        | or_expr OR and_expr -> or_operation

?and_expr: primary
        | and_expr AND primary -> and_operation

?primary: operand
       | "(" bool_expr ")"

?operand: condition
       | function_call
       | path
       | bare_path
       | value

condition: operand operator operand

operator: IDENTIFIER

function_call: "(" IDENTIFIER operand+ ")"

path: ":" path_component ("." path_component)*

bare_path: path_component ("." path_component)*

path_component: IDENTIFIER 
             | STAR  
             | DOUBLESTAR

STAR: "*" 
DOUBLESTAR: "**"

value: ESCAPED_STRING
     | NUMBER
     | BOOLEAN

BOOLEAN: "True" | "False"
NUMBER: SIGNED_NUMBER

IDENTIFIER: /[a-zA-Z][a-zA-Z0-9_\-\?]*/

OR: "OR"
AND: "AND"
```

For example, consider the following query AST:

```python
['and',
    ['eq?', ['lower-case', ['path', 'language']], 'python'],
    ['gt?', ['path', 'stars'], 100],
    ['eq?', ['path', 'owner.name'], ['path', 'user.name']]]
```

It has an equivalent DSL given by:

```text
(lower-case :language) eq? "python" AND :stars gt? 100 AND :owner.name eq? :user.name
```

We see that we have a special notation for `path` commands: we prefix the field
name with a colon (`:`), as in `:language` and `:owner.name`. This distinguishes
field names from other strings in the query. The `path` command accesses the
value of a field in an element of the JSON array. For example, `:owner.name`
accesses the value of the `name` field in the `owner` object, whereas `owner.name`
is interpreted as a string.
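
The difference between `:owner.name` and `owner.name` can be illustrated with a small sketch. The helpers `operand_to_ast` and `resolve` are hypothetical and only mimic the behaviour described above; they are not part of the package:

```python
def operand_to_ast(token):
    """Turn one DSL operand token into its AST form (illustrative only)."""
    if token.startswith(':'):
        return ['path', token[1:]]  # ':owner.name' -> ['path', 'owner.name']
    return token                    # a bare token stays a plain string

def resolve(node, obj):
    """Resolve a ['path', ...] node against obj; return other nodes unchanged."""
    if isinstance(node, list) and node[0] == 'path':
        for key in node[1].split('.'):
            obj = obj[key]
        return obj
    return node

repo = {'owner': {'name': 'alice'}}
print(resolve(operand_to_ast(':owner.name'), repo))  # alice
print(resolve(operand_to_ast('owner.name'), repo))   # owner.name
```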

Paths can also include two kinds of wildcards, `*` and `**`. The wildcard `*`
matches any field name, e.g., `a.*.b.c` will match `a.d.b.c.a` (it will return `{'c': 'a'}`).
The wildcard `**` will match any field name at any depth after the specified path,
e.g., `a.**.c` will match `a.b.c.a` (it will also return `{'c': 'a'}`). You can use
as many wildcards as you wish in a single query. If *any* of the objects denoted by
a wildcard path satisfies the query, the object satisfies the query.
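
One way to picture wildcard matching is a recursive walk that yields every value a path can reach. This is a sketch under stated assumptions, not the library's algorithm: here `*` matches exactly one level, `**` matches zero or more levels, only dictionaries are traversed, and the names `match_path` and `any_match` are hypothetical:

```python
def match_path(obj, components):
    """Yield every value reachable from obj by following the path components."""
    if not components:
        yield obj
        return
    head, rest = components[0], components[1:]
    if head == '**':
        # `**` may match zero levels...
        yield from match_path(obj, rest)
        # ...or descend one level and keep matching `**`.
        if isinstance(obj, dict):
            for child in obj.values():
                yield from match_path(child, components)
    elif isinstance(obj, dict):
        if head == '*':
            for child in obj.values():
                yield from match_path(child, rest)
        elif head in obj:
            yield from match_path(obj[head], rest)

def any_match(obj, dotted, predicate):
    """True if any value denoted by the wildcard path satisfies predicate."""
    return any(predicate(v) for v in match_path(obj, dotted.split('.')))

doc = {'a': {'d': {'b': {'c': 'x'}}, 'e': {'b': {'c': 'y'}}}}
print(list(match_path(doc, 'a.*.b.c'.split('.'))))    # ['x', 'y']
print(list(match_path(doc, 'a.**.c'.split('.'))))     # ['x', 'y']
print(any_match(doc, 'a.*.b.c', lambda v: v == 'y'))  # True
```

The `any_match` helper mirrors the "any value satisfies the query" rule stated above.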

The DSL is converted into the AST (see the above EBNF) before being evaluated.
This query AST is evaluated against each element of the JSON array, and if it
returns `True`, the corresponding index into the JSON array for that element is
added to the result. This is how we filter the JSON array. Alternatively, since
queries can also specify general functions, the result may be a value rather
than a Boolean, e.g., `['lower-case', 'Python']` will return `'python'`.
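
The index-collecting step described above can be sketched as follows. Here a plain Python callable stands in for the evaluated query, and `filter_indices` is a hypothetical name. Treating only an exact `True` as a match is an assumption of this sketch:

```python
def filter_indices(array, predicate):
    """Return the indices of elements for which predicate(element) is True."""
    return [i for i, item in enumerate(array) if predicate(item) is True]

repos = [
    {'name': 'a', 'stars': 150},
    {'name': 'b', 'stars': 20},
    {'name': 'c', 'stars': 300},
]

# A Boolean-valued query selects indices.
print(filter_indices(repos, lambda r: r['stars'] > 100))   # [0, 2]
# A value-returning query (here a string) selects nothing in this sketch.
print(filter_indices(repos, lambda r: r['name'].lower()))  # []
```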

## Relative Advantages of AST and DSL

Both have their own advantages and can be used interchangeably based on the
user's preference. The AST is:

- Programmatic
- Easily manipulated
- Generable from a DSL
- Easily serialized for storage or transmission
- Flexible enough to let operators themselves be queries, facilitating some meta-programming

The DSL is:

- More human-readable, e.g. infix notation for logical operators
- Easier to write and understand
- Compact
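
Because the AST is made of plain lists, strings, and numbers, the manipulation and serialization points above can be shown with the standard library alone. The helper `all_of` is a hypothetical convenience function, not part of the package:

```python
import json

base = ['eq?', ['lower-case', ['path', 'language']], 'python']

def all_of(*queries):
    """Combine several query ASTs under a single 'and' node."""
    return ['and', *queries]

# Programmatic composition: wrap an existing query with an extra condition.
query = all_of(base, ['gt?', ['path', 'stars'], 100])

# Serialization: the AST survives a JSON round trip unchanged.
text = json.dumps(query)
restored = json.loads(text)
print(text)
print(restored == query)  # True
```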

## Installation

You can install `jaf` via PyPI:

```bash
pip install jaf
```

Or install directly from the source:

```bash
git clone https://github.com/queelius/jaf.git
cd jaf
pip install .
```

## Examples

Suppose we have a list of repositories in the following format:

```python
repos = [
    {
        'id': 1,
        'name': 'DataScienceRepo',
        'language': 'Python',
        'stars': 150,
        'forks': 30,
        'description': 'A repository for data science projects.',
        'owner': {
            'name': 'alice',
            'active': True
        }
    },
    # ... other repositories ...
]
```

### AST-Based Query

Filter repositories where the lower-case of `language` is `"python"`,
`owner.active` is `True`, and `stars` are greater than `100`:

```python
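from pprint import pprint
# Assumption: `jaf` is importable from the package top level; adjust if needed.
from jaf import jaf

query = ['and',
    ['eq?', ['lower-case', ['path', 'language']], 'python'],
    ['path', 'owner.active'],
    ['gt?', ['path', 'stars'], 100]]

filtered = jaf(repos, query)
print("Filtered Repositories:")
pprint(filtered)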
# Output: [1, ...]
```

### DSL-Based Query

The equivalent query using the DSL:

```python
query = '(lower-case :language) eq? "python" AND :owner.active AND :stars gt? 100'
filtered = jaf(repos, query)
print("Filtered Repositories:")
print(filtered)
# Output: [1, ...]
```

### Complex Queries

Combine multiple conditions with logical operators.

```python
query = ':language neq? "R" AND (:stars gt? 100 OR :forks gt? 50)'
filtered = jaf(repos, query)
print("Filtered Repositories:")
print(filtered)
```

### Handling Errors

Catch and handle filtering errors gracefully.

```python
try:
    invalid_query = 'language unknown "Python"'
    jaf(repos, invalid_query)
except FilterError as e:
    print(f"Error: {e}")
```

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "jaf",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "filter, JSON, AST, DSL, Lark, parser, array",
    "author": null,
    "author_email": "Alex Towell <lex@metafunctor.com>",
    "download_url": "https://files.pythonhosted.org/packages/26/27/67bfcb35d8e4e0d822319958f78ee5170e64882e15e25a10576f86288ed1/jaf-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "JSON Array Filter (`jaf`) is a versatile filter for JSON arrays. It is a domain-specific language (DSL) that allows you to filter JSON arrays using a simple, yet powerful syntax. Programmatically, the AST can be used to filter JSON arrays in a more flexible way.",
    "version": "0.2.0",
    "project_urls": {
        "Documentation": "https://github.com/queelius/jaf#readme",
        "Homepage": "https://github.com/queelius/jaf",
        "Repository": "https://github.com/queelius/jaf"
    },
    "split_keywords": [
        "filter",
        " json",
        " ast",
        " dsl",
        " lark",
        " parser",
        " array"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "72a0c05641a296510bcf04a17fb2c84cd646384940a573caffe88282afb6db92",
                "md5": "1cdcd53ac269abe9862aa8e1b0dbb54a",
                "sha256": "775415995de93b1b2d768a9e40436e98a6276fc9a0fefef249952a4ee8ec60cb"
            },
            "downloads": -1,
            "filename": "jaf-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1cdcd53ac269abe9862aa8e1b0dbb54a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 13154,
            "upload_time": "2024-12-21T05:45:01",
            "upload_time_iso_8601": "2024-12-21T05:45:01.759168Z",
            "url": "https://files.pythonhosted.org/packages/72/a0/c05641a296510bcf04a17fb2c84cd646384940a573caffe88282afb6db92/jaf-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "262767bfcb35d8e4e0d822319958f78ee5170e64882e15e25a10576f86288ed1",
                "md5": "d0a4a36a6f58638fe8d0be5f81ae1249",
                "sha256": "61be525ebe2986e2c69616dbf5bbc82db49b18d4341b7791e612c9177df2c9bb"
            },
            "downloads": -1,
            "filename": "jaf-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "d0a4a36a6f58638fe8d0be5f81ae1249",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 14497,
            "upload_time": "2024-12-21T05:45:03",
            "upload_time_iso_8601": "2024-12-21T05:45:03.870653Z",
            "url": "https://files.pythonhosted.org/packages/26/27/67bfcb35d8e4e0d822319958f78ee5170e64882e15e25a10576f86288ed1/jaf-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-21 05:45:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "queelius",
    "github_project": "jaf#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "lark-parser",
            "specs": [
                [
                    ">=",
                    "0.12.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "3.8.0"
                ]
            ]
        }
    ],
    "lcname": "jaf"
}
        