# apies

- **Name**: apies
- **Version**: 1.11.0
- **Home page**: https://github.com/OpenBudget/apies
- **Upload time**: 2024-04-18 14:35:27
- **Author**: Adam Kariv
- **License**: MIT
- **Keywords**: data

[![Travis](https://img.shields.io/travis/OpenBudget/apies/master.svg)](https://travis-ci.org/datahq/apies)
[![Coveralls](http://img.shields.io/coveralls/OpenBudget/apies.svg?branch=master)](https://coveralls.io/r/OpenBudget/apies?branch=master)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/apies.svg)

apies is a flask blueprint providing an API for accessing and searching an ElasticSearch index created from source datapackages.

## endpoints

### `/get/<doc-id>`

Fetches a document from the index.

Query parameters that can be used:
- **type**: The type of the document to fetch (if not `docs`)

### `/search/count`

### `/search/<doc-types>`

Performs a search on the index.

`doc-types` is a comma-separated list of document types to search.

Query parameters that can be used:
- **q**: The full text search textual query

- **filter**: A JSON array of filter objects to apply to the search. Filters restrict the results but don't affect their scoring.
  Each object in the array describes a single filter, and the filters are combined with an `OR` operator. For example:
    ```
    [
        {
            "first-name": "John",
            "last-name": "Watson"
        },
        {
            "first-name": "Sherlock",
            "last-name": "Holmes"
        }
    ]
    ```
  Each object contains a set of rules that all must match. Each rule is a key-value pair, where the key is the field name and the value is the value to match. The value can be a string or an array of strings. If the value is an array, the rule will match if any of the values in the array match. For example:
    ```
    {
        "first-name": ["Emily", "Charlotte"],
        "last-name": "Bronte"
    }
    ```
  Field names can be suffixed with two underscores and an operator to express relations other than equality. For example:
    ```
    {
        "first-name": "Emily",
        "last-name": "Bronte",
        "age__gt": 30
    }
    ```
  Allowed operators are:
    - `gt`: greater than
    - `gte`: greater than or equal to
    - `lt`: less than
    - `lte`: less than or equal to
    - `eq`: equal to
    - `not`: not equal to
    - `like`: like (textual match)
    - `bounded`: bounded (geospatial match to a bounding box)
    - `all`: all (for arrays - all values in the array must exist in the target)

  If multiple rules are needed for the same field, the field name can also be suffixed with a hash sign (`#`) and a number. For example:
    ```
    {
        "city": "San Francisco",
        "price__lt": 300000,
        "bedrooms__gt": 4,
        "amenities": "garage",
        "amenities#1": ["pool", "back yard"]
    }
    ```
    The filter above matches all documents where `city` is "San Francisco", `price` is less than 300000, `bedrooms` is greater than 4, and the `amenities` field contains "garage" and at least one of "pool" and "back yard".

- **lookup**: A JSON object with lookup filters to apply to the search. These filter the results, but also affect the scoring of the results.
- **context**: A textual context to search in (i.e. run the search in a subset of results matching the full-text-search query provided in this field)

- **extra**: Extra information that's passed to library extensions

- **size**: Number of results to fetch (default: 10)
- **offset**: Offset of first result to fetch (default: 0)
- **order**: Order results by (default: _score)

- **highlight**: Comma-separated list of fields to highlight
- **snippets**: Comma-separated list of fields to fetch snippets from

- **match_type**: ElasticSearch match type (default: most_fields)
- **match_operator**: ElasticSearch match operator (default: and)
- **minscore**: Minimum score for a result to be returned (default: 0.0)
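
The `filter` rules described above can be illustrated with a small in-memory sketch. This is not how apies evaluates filters (that happens inside ElasticSearch); the helper names `rule_matches` and `filter_matches` are hypothetical, the `like` and `bounded` operators are left out, and the handling of missing fields is an assumption.

```python
import operator
import re

# Range operators from the list above. 'like' and 'bounded' are omitted
# because their behaviour depends on ElasticSearch itself.
_COMPARATORS = {
    'gt': operator.gt,
    'gte': operator.ge,
    'lt': operator.lt,
    'lte': operator.le,
}


def _as_list(value):
    """Treat scalars as one-element lists so all rules share one code path."""
    return value if isinstance(value, list) else [value]


def rule_matches(doc, key, expected):
    """Check a single key-value rule against a document (a plain dict)."""
    key = re.sub(r'#\d+$', '', key)  # drop an optional '#<number>' suffix
    if '__' in key:
        field, op = key.rsplit('__', 1)
    else:
        field, op = key, 'eq'
    # Assumption: a missing field behaves like an empty list of values.
    actual = _as_list(doc[field]) if field in doc else []
    wanted = _as_list(expected)
    if op == 'eq':
        return any(w in actual for w in wanted)
    if op == 'not':
        return not any(w in actual for w in wanted)
    if op == 'all':
        return all(w in actual for w in wanted)
    if op in _COMPARATORS:
        return any(_COMPARATORS[op](a, w) for a in actual for w in wanted)
    raise ValueError(f'unsupported operator: {op}')


def filter_matches(doc, filters):
    """Filter objects are OR-ed; the rules inside each object are AND-ed."""
    return any(
        all(rule_matches(doc, key, expected) for key, expected in obj.items())
        for obj in filters
    )


listing = {
    'city': 'San Francisco',
    'price': 250000,
    'bedrooms': 5,
    'amenities': ['garage', 'pool'],
}
filters = [{
    'city': 'San Francisco',
    'price__lt': 300000,
    'bedrooms__gt': 4,
    'amenities': 'garage',
    'amenities#1': ['pool', 'back yard'],
}]
print(filter_matches(listing, filters))  # True
```

Because both `amenities` and `amenities#1` resolve to the same field, the two rules are checked independently and both must hold.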

### `/download/<doctypes>`

Downloads search results in either csv, xls or xlsx format.

Query parameters that can be used:
- **types_formatted**: The type of the documents to search
- **search_term**: The Elastic search query
- **size**: Number of hits to return
- **offset**: What offset to use for the pagination
- **filters**: Filters to apply to the search
- **dont_highlight**:
- **from_date**: If there should be a date range applied to the search, and from what date
- **to_date**: If there should be a date range applied to the search, and until what date
- **order**:
- **file_format**: The format of the file to be returned: 'csv', 'xls' or 'xlsx'. Defaults to 'xlsx' if not passed.
- **file_name**: The name of the file to be returned, by default the name will be 'search_results'
- **column_mapping**: If the columns should get different names than in the original data, a column map can be sent, for example:
```
{
  "עיר": "address.city",
  "תקציב": "details.budget"
}
```

For example, get a csv file with column mapping:
```
http://localhost:5000/api/download/jobs?q=engineering&size=2&file_format=csv&file_name=my_results&column_mapping={%22mispar%22:%22Job%20ID%22}
```

Or get an xlsx file without column mapping:
```
http://localhost:5000/api/download/jobs?q=engineering&size=2&file_format=xlsx&file_name=my_results
```
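
URLs like the ones above are easier to build programmatically than by hand, since the JSON value of `column_mapping` has to be percent-encoded. The following is a minimal sketch using only the standard library; `build_download_url` is a hypothetical helper, the base URL is the one used by the sample server, and the parameter names follow the example URLs above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_download_url(base_url, doc_types, **params):
    """Build a download URL, JSON-encoding structured parameter values."""
    query = {
        name: json.dumps(value, ensure_ascii=False)
        if isinstance(value, (dict, list)) else value
        for name, value in params.items()
    }
    # urlencode percent-encodes quotes, braces and non-ASCII characters.
    return f"{base_url.rstrip('/')}/download/{doc_types}?{urlencode(query)}"


url = build_download_url(
    'http://localhost:5000/api',  # assumption: the sample server's URL prefix
    'jobs',
    q='engineering',
    size=2,
    file_format='csv',
    file_name='my_results',
    column_mapping={'mispar': 'Job ID'},
)
print(url)

# Decoding the query string recovers the original mapping.
decoded = parse_qs(urlsplit(url).query)
print(json.loads(decoded['column_mapping'][0]))
```

Passing `ensure_ascii=False` keeps non-ASCII column names (such as the Hebrew ones in the mapping example) as UTF-8 text, which `urlencode` then percent-encodes.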

## configuration

Flask configuration for this blueprint:


```python
from flask import Flask
from datapackage import Package  # assumption: Package comes from the datapackage library
import elasticsearch

from apies import apies_blueprint

app = Flask(__name__)

app.register_blueprint(
    apies_blueprint(['path/to/datapackage.json', Package(), ...],
                    elasticsearch.Elasticsearch(...),
                    {'doc-type-1': 'index-for-doc-type-1', ...},
                    'index-for-documents',
                    dont_highlight=['fields', 'not.to', 'highlight'],
                    text_field_rules=lambda schema_field: [],  # list of tuples: ('exact'/'inexact'/'natural', <field-name>)
                    multi_match_type='most_fields',
                    multi_match_operator='and'),
    url_prefix='/search/'
)
```

## local development

You can start a local development server by following these steps:

1. Install Dependencies:
    
    a. Install Docker locally
    
    b. Install Python dependencies:

    ```bash
    $ pip install dataflows dataflows-elasticsearch
    $ pip install -e .
    ```
2. Go to the `sample/` directory
3. Start ElasticSearch locally:
   ```bash
   $ ./start_elasticsearch.sh
   ```

   This script will wait and poll the server until it's up and running.
   You can test it yourself by running:
   ```bash
   $ curl -s http://localhost:9200
        {
        "name" : "99cd2db44924",
        "cluster_name" : "docker-cluster",
        "cluster_uuid" : "nF9fuwRyRYSzyQrcH9RCnA",
        "version" : {
            "number" : "7.4.2",
            "build_flavor" : "default",
            "build_type" : "docker",
            "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
            "build_date" : "2019-10-28T20:40:44.881551Z",
            "build_snapshot" : false,
            "lucene_version" : "8.2.0",
            "minimum_wire_compatibility_version" : "6.8.0",
            "minimum_index_compatibility_version" : "6.0.0-beta1"
        },
        "tagline" : "You Know, for Search"
        }
   ```
4. Load data into the database
   ```bash
   $ DATAFLOWS_ELASTICSEARCH=localhost:9200 python load_fixtures.py
   ```
   You can test that data was loaded:
   ```bash
   $ curl -s http://localhost:9200/jobs-job/_count?pretty
    {
        "count" : 1757,
        "_shards" : {
            "total" : 1,
            "successful" : 1,
            "skipped" : 0,
            "failed" : 0
        }
    }
   ```
5. Start the sample server
   ```bash
   $ python server.py 
    * Serving Flask app "server" (lazy loading)
    * Environment: production
    WARNING: Do not use the development server in a production environment.
    Use a production WSGI server instead.
    * Debug mode: off
    * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
   ```  
6. Now you can hit the server's endpoints, for example:
   ```bash
        $ curl -s 'localhost:5000/api/search/jobs?q=engineering&size=2' | jq
        127.0.0.1 - - [26/Jun/2019 10:45:31] "GET /api/search/jobs?q=engineering&size=2 HTTP/1.1" 200 -
        {
            "search_counts": {
                "_current": {
                "total_overall": 617
                }
            },
            "search_results": [
                {
                "score": 18.812,
                "source": {
                    "# Of Positions": "5",
                    "Additional Information": "TO BE APPOINTED TO ANY CIVIL <em>ENGINEERING</em> POSITION IN BRIDGES, CANDIDATES MUST POSSESS ONE YEAR OF CIVIL <em>ENGINEERING</em> EXPERIENCE IN BRIDGE DESIGN, BRIDGE CONSTRUCTION, BRIDGE MAINTENANCE OR BRIDGE INSPECTION.",
                    "Agency": "DEPARTMENT OF TRANSPORTATION",
                    "Business Title": "Civil Engineer 2",
                    "Civil Service Title": "CIVIL ENGINEER",
                    "Division/Work Unit": "<em>Engineering</em> Review & Support",
            ...
        }
    ```
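
A client will usually want to pull a few values out of a response like the one above. The sketch below works on a hand-written, trimmed stand-in for that response (only the keys visible in the sample output are used); `strip_highlights` and `summarize` are hypothetical helper names, not part of apies.

```python
import re

# Trimmed stand-in for the JSON returned by the search endpoint,
# shaped like the sample output above.
response = {
    'search_counts': {'_current': {'total_overall': 617}},
    'search_results': [
        {
            'score': 18.812,
            'source': {
                'Agency': 'DEPARTMENT OF TRANSPORTATION',
                'Business Title': 'Civil Engineer 2',
                'Division/Work Unit': '<em>Engineering</em> Review & Support',
            },
        },
    ],
}

EM_TAG = re.compile(r'</?em>')


def strip_highlights(source):
    """Return a copy of a result's source with <em> highlight tags removed."""
    return {
        key: EM_TAG.sub('', value) if isinstance(value, str) else value
        for key, value in source.items()
    }


def summarize(response):
    """Return the overall hit count and a list of (score, clean source) pairs."""
    total = response['search_counts']['_current']['total_overall']
    hits = [
        (hit['score'], strip_highlights(hit['source']))
        for hit in response['search_results']
    ]
    return total, hits


total, hits = summarize(response)
print(total)                             # 617
print(hits[0][1]['Division/Work Unit'])  # Engineering Review & Support
```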

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/OpenBudget/apies",
    "name": "apies",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "data",
    "author": "Adam Kariv",
    "author_email": "adam.kariv@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/81/c1/ee074c4f174f103f1d8e218730a19416dc83851ab6395e9de9265fc2db2e/apies-1.11.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A flask blueprint providing an API for accessing and searching an ElasticSearch index created from source datapackages",
    "version": "1.11.0",
    "project_urls": {
        "Homepage": "https://github.com/OpenBudget/apies"
    },
    "split_keywords": [
        "data"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b3670c58f4ba42757001eee056eb36916154cca73fe4888ab41809e20cc259eb",
                "md5": "1d1d6acc8719903b0a7288986c656b37",
                "sha256": "641ed6efb6850782064befb583c8b0b21b471723264523a9e286e75878cca415"
            },
            "downloads": -1,
            "filename": "apies-1.11.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1d1d6acc8719903b0a7288986c656b37",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 16757,
            "upload_time": "2024-04-18T14:35:25",
            "upload_time_iso_8601": "2024-04-18T14:35:25.835556Z",
            "url": "https://files.pythonhosted.org/packages/b3/67/0c58f4ba42757001eee056eb36916154cca73fe4888ab41809e20cc259eb/apies-1.11.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "81c1ee074c4f174f103f1d8e218730a19416dc83851ab6395e9de9265fc2db2e",
                "md5": "8b3685a36876340e32305f796d507c1b",
                "sha256": "25c25d03eb7c118cfec245578ae3c2ec54bda2e19e300e195260ae5b9ee7ea32"
            },
            "downloads": -1,
            "filename": "apies-1.11.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8b3685a36876340e32305f796d507c1b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 19921,
            "upload_time": "2024-04-18T14:35:27",
            "upload_time_iso_8601": "2024-04-18T14:35:27.949145Z",
            "url": "https://files.pythonhosted.org/packages/81/c1/ee074c4f174f103f1d8e218730a19416dc83851ab6395e9de9265fc2db2e/apies-1.11.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-18 14:35:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "OpenBudget",
    "github_project": "apies",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "tox": true,
    "lcname": "apies"
}