# FHIR Views
## Introduction
FHIR Views is a way to define simple, tabular views over complex FHIR data and
turn them into queries that use
[SQL on FHIR conventions](https://github.com/FHIR/sql-on-fhir), or other data
sources in the future. It is installed as part of a simple `pip install
google-fhir-views[r4,bigquery]` command.
FHIR Views has two main concepts:
* A view *definition*, which defines the fields and criteria created by a
view. It provides a Python API for convenience, but ultimately a
view definition is a set of FHIRPath expressions that we'll explore below.
* A view *runner*, which creates that view over some data source.
For example, let's create a simple view of patient resources for patients born
before 1960:
```py
import datetime
from google.fhir.views import bigquery_runner, r4
# Load views based on the base FHIR R4 profile definitions.
views = r4.base_r4()
# Creates a view using the base patient profile.
pats = views.view_of('Patient')
# In this case we interpret the 'current' address as one where period is empty.
# This can be adjusted to meet the needs of a specific dataset.
current = pats.address.where(pats.address.period.empty()).first()
simple_pats = pats.select([
pats.id.alias('id'),
pats.gender.alias('gender'),
pats.birthDate.alias('birthdate'),
current.line.first().alias('street'),
current.city.alias('city'),
current.state.alias('state'),
current.postalCode.alias('zip')
]).where(
pats.birthDate < datetime.date(1960,1,1))
```
With support for
[SQL on FHIR v2](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/), the above
view can also be defined as:
```py
simple_pats_config = {
'resource': 'Patient',
'select': [
{
'alias': 'id',
'path': 'id',
},
{
'alias': 'gender',
'path': 'gender',
},
{
'alias': 'birthDate',
'path': 'birthDate',
},
{
'alias': 'street',
'path': 'address.where(address.period.empty()).first().line.first()',
},
{
'alias': 'city',
'path': 'address.where(address.period.empty()).first().city',
},
{
'alias': 'state',
'path': 'address.where(address.period.empty()).first().state',
},
{
'alias': 'zip',
'path': 'address.where(address.period.empty()).first().postalCode',
},
],
'where': [
{'path': 'birthDate < @1960-01-01'},
],
}
views = r4.base_r4()
simple_pats = views.from_view_definition(simple_pats_config)
```
If you run the above in a Jupyter notebook or similar tool, you'll notice that
the view builder supports tab suggestions that matches the fields in the FHIR
resource of the given profile. In fact, this is just a Pythonic way to build
FHIRPath expressions to be used by the runner, with suggestions available
by just pressing tab:
![tab suggestion image](./tab_suggest_example.png)
That builder is convenient for Python users, but you can also see the FHIRPath
expression themselves by just getting the string representation of the view,
such as by running `print(simple_pats)`. Notice every column and the 'where'
criteria are defined by FHIRPath expressions, while every column must define
its name in the alias() function appended to the FHIRPath:
```
View<http://hl7.org/fhir/StructureDefinition/Patient.select(
id.alias(id),
gender.alias(gender),
birthDate.alias(birthdate),
address.where(period.empty()).first().line.first().alias(street),
address.where(period.empty()).first().city.alias(city),
address.where(period.empty()).first().state.alias(state),
address.where(period.empty()).first().postalCode.alias(zip)
).where(
birthDate < @1960-01-01
)>
```
In other words, any _runner_ implementation would basically use the FHIRPath
expressions to select and filter the underlying data. The example below
will use a BigQuery runner, which translates FHIRPath expressions into SQL,
but runners in Apache Spark and directly on JSON will follow. This could
also be exported as a simple JSON structure and passed to remote services
to evaluate the FHIRPath expressions and produce a view for the user.
Now that we've defined a view, let's run it against a real dataset. We'll run
this over BigQuery:
```py
# Get a BigQuery client. This may require additional authentication to access
# BigQuery, depending on your notebook environment. Typically the client
# and runner are created only once at the start of a notebook.
from google.cloud import bigquery as bq
client = bq.Client()
runner = bigquery_runner.BigQueryRunner(
client,
fhir_dataset='bigquery-public-data.fhir_synthea',
snake_case_resource_tables=True)
runner.to_dataframe(simple_pats, limit = 5)
```
Which produces this table:
| | id | gender | birthdate | street | city | state | zip |
|---:|:-------------------------------------|:---------|:------------|:-------------------------|:------------|:--------------|------:|
| 0 | 6759d2b7-38b4-4798-97c0-d171a53e013a | male | 1916-03-21 | 659 Bayer Wall Apt 61 | Boston | Massachusetts | 02108 |
| 1 | 41dbee4d-d355-413f-a040-93ca037fe646 | male | 1951-12-05 | 226 Sipes Ranch Unit 37 | Lynnfield | Massachusetts | 01940 |
| 2 | e194d708-8989-4e0c-a8e1-eda7351672ce | male | 1947-09-24 | 638 Pouros Wall Suite 52 | Lynnfield | Massachusetts | 01940 |
| 3 | 4bccdc85-c040-45dd-ada3-a55064439a01 | male | 1943-06-20 | 825 Jakubowski Extension | Tewksbury | Massachusetts | 01876 |
| 4 | 8dca4c3c-d2d5-460f-9168-5f18e5d29b2b | male | 1945-12-13 | 319 Cronin Light | Hubbardston | Massachusetts | 01452 |
That's it! Now the returned dataframe contains a table of the example patients
described in the query, pulled from the FHIR data stored in BigQuery. Examples
below will show more sophisticated use cases such as turning a FHIR view into
a BigQuery virtual view or incorporating clinical content from code value sets.
At this time we support a BigQuery runner to consume FHIR data in BigQuery as
our data source, but future runners may support other data stores, FHIR servers,
or FHIR bulk extracts on disk.
## Working with code values
Most meaningful analysis of healthcare data involves navigating clinical
terminologies. In some cases these value sets come from an established authority
like the [Value Set Authority Center](https://vsac.nlm.nih.gov/), and other
times they are defined and maintained locally for custom use cases.
FHIR Views offers a convenient mechanism to create and use such value sets in
your queries. Here is an example that defines a collection of LOINC codes
indicating LDL results:
```py
LDL_TEST = r4.value_set('urn:example:value_set:ldl').with_codes(
'http://loinc.org', ['18262-6', '18261-8', '12773-8']).build()
```
Now we can easily query observations with a view that uses the FHIRPath
`memberOf` function:
```py
# Creates the base observation view for convenience, typically done once per
# base type in a notebook.
obs = views.view_of('Observation')
ldl_obs = obs.select([
obs.subject.idFor('Patient').alias('patient'),
# Below is a Pythonic shorthand -- users could type
# `obs.value.ofType('Quantity').value` instead for the FHIRPath ofType
# expression, but the shorthand helps autocompletion
obs.valueQuantity.value.alias('value'),
obs.valueQuantity.unit.alias('unit'),
obs.code.coding.display.first().alias('test'),
obs.effectiveDateTime.alias('effectiveTime')
]).where(obs.code.memberOf(LDL_TEST))
runner.to_dataframe(ldl_obs, limit=5)
```
| | patient | value | unit | test | effectiveTime |
|---:|:-------------------------------------|---------:|:-------|:------------------------------------|:--------------------------|
| 0 | 903156da-ca5d-4ec3-ad36-073a9437afe4 | 153.058 | mg/dL | Low Density Lipoprotein Cholesterol | 2014-06-20 11:30:15+00:00 |
| 1 | 3d268dce-fed4-4bc7-b156-c78e810c5183 | 149.379 | mg/dL | Low Density Lipoprotein Cholesterol | 2013-06-10 16:20:36+00:00 |
| 2 | fdf7c87b-1c8f-4d09-8d51-e622f747a7c8 | 88.047 | mg/dL | Low Density Lipoprotein Cholesterol | 2013-10-07 00:08:45+00:00 |
| 3 | 9007c0ff-a0ad-48dc-adc2-c0908c06fba8 | 108.145 | mg/dL | Low Density Lipoprotein Cholesterol | 2016-03-18 10:31:54+00:00 |
| 4 | cc5a2dd6-37b6-4f15-9da7-53f3b85e3370 | 64.5849 | mg/dL | Low Density Lipoprotein Cholesterol | 2012-05-26 13:27:46+00:00 |
## Working with external value sets and terminology services
You can also work with value sets defined by external terminology services. To
do so, you must first create a terminology service client.
This example uses the UMLS terminology service from the NIH. In order access
this terminology service, you need to
[sign up here.](https://uts.nlm.nih.gov/uts/signup-login) You should then enter
the API key found [on your profile page](https://uts.nlm.nih.gov/uts/profile) in
the place of 'your-umls-api-key' below.
```py
from google.fhir.r4.terminology import terminology_service_client
tx_client = terminology_service_client.TerminologyServiceClient({
'http://cts.nlm.nih.gov/fhir/': ('apikey', 'your-umls-api-key'),
})
```
Before making queries against an externally-defined value set, you must first
get the codes defined by the value set and write them to a BigQuery table. You
only need to perform this step once. After doing so, you'll be able to reference
the value set definitions you've written in future queries.
```py
injury_value_set_url = 'http://cts.nlm.nih.gov/fhir/ValueSet/2.16.840.1.113762.1.4.1029.5'
wound_disorder_value_set_url = 'http://cts.nlm.nih.gov/fhir/ValueSet/2.16.840.1.113762.1.4.1219.178'
runner.materialize_value_set_expansion((injury_value_set_url, wound_disorder_value_set_url), tx_client)
```
To make queries against an externally-defined value set which you've saved to
BigQuery, you can simply refer to its URL.
```py
injury_conds = cond.select([
cond.id.alias('id'),
cond.subject.idFor('Patient').alias('patientId'),
cond.code.alias('codes')
]).where(cond.code.memberOf(injury_value_set_url))
runner.create_database_view(injury_conds, 'injury_conditions')
```
## Saving FHIR Views as BigQuery Views
While `runner.to_dataframe` is convenient to retrieve data for local analysis,
it's often useful to create such flattened views in BigQuery itself. They can be
easily queried with much simpler SQL, or used by a variety of business
intelligence or other data analysis tools.
For this reason, the BigQueryRunner offers a `create_database_view` method that
will convert the view definition into a
[BigQuery View](https://cloud.google.com/bigquery/docs/views), which can then
just be consumed as if it was a first-class table that is updated when the
underlying data is updated. Here's an example:
```py
runner.create_database_view(ldl_obs, 'ldl_observations')
```
By default the view is created in the fhir_dataset used by the runner, but this
isn't always desirable (for example, a user may want to do their analysis in
their own, isolated dataset). Therefore it's common to specify a `view_dataset`
when creating the runner as the target for any views created. Here's an example:
```py
runner = bigquery_runner.BigQueryRunner(
client,
fhir_dataset='bigquery-public-data.fhir_synthea',
view_dataset='example_project.diabetic_care_example')
```
Raw data
{
"_id": null,
"home_page": "https://github.com/google/fhir-py",
"name": "google-fhir-views",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.8",
"maintainer_email": null,
"keywords": "google, fhir, python, healthcare",
"author": "Google LLC",
"author_email": "google-fhir-pypi@google.com",
"download_url": "https://github.com/google/fhir-py/releases",
"platform": null,
"description": "# FHIR Views\n\n## Introduction\n\nFHIR Views is a way to define simple, tabular views over complex FHIR data and\nturn them into queries that use\n[SQL on FHIR conventions](https://github.com/FHIR/sql-on-fhir), or other data\nsources in the future. It is installed as part of a simple `pip install\ngoogle-fhir-views[r4,bigquery]` command.\n\nFHIR Views has two main concepts:\n\n* A view *definition*, which defines the fields and criteria created by a\n view. It provides a Python API for convenience, but ultimately a\n view definition is a set of FHIRPath expressions that we'll explore below.\n* A view *runner*, which creates that view over some data source.\n\nFor example, let's create a simple view of patient resources for patients born\nbefore 1960:\n\n```py\nimport datetime\nfrom google.fhir.views import bigquery_runner, r4\n\n# Load views based on the base FHIR R4 profile definitions.\nviews = r4.base_r4()\n\n# Creates a view using the base patient profile.\npats = views.view_of('Patient')\n\n# In this case we interpret the 'current' address as one where period is empty.\n# This can be adjusted to meet the needs of a specific dataset.\ncurrent = pats.address.where(pats.address.period.empty()).first()\n\nsimple_pats = pats.select([\n pats.id.alias('id'),\n pats.gender.alias('gender'),\n pats.birthDate.alias('birthdate'),\n current.line.first().alias('street'),\n current.city.alias('city'),\n current.state.alias('state'),\n current.postalCode.alias('zip')\n ]).where(\n pats.birthDate < datetime.date(1960,1,1))\n```\n\nWith support for\n[SQL on FHIR v2](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/), the above\nview can also be defined as:\n\n```py\nsimple_pats_config = {\n 'resource': 'Patient',\n 'select': [\n {\n 'alias': 'id',\n 'path': 'id',\n },\n {\n 'alias': 'gender',\n 'path': 'gender',\n },\n {\n 'alias': 'birthDate',\n 'path': 'birthDate',\n },\n {\n 'alias': 'street',\n 'path': 'address.where(address.period.empty()).first().line.first()',\n },\n {\n 'alias': 'city',\n 'path': 'address.where(address.period.empty()).first().city',\n },\n {\n 'alias': 'state',\n 'path': 'address.where(address.period.empty()).first().state',\n },\n {\n 'alias': 'zip',\n 'path': 'address.where(address.period.empty()).first().postalCode',\n },\n ],\n 'where': [\n {'path': 'birthDate < @1960-01-01'},\n ],\n}\n\nviews = r4.base_r4()\nsimple_pats = views.from_view_definition(simple_pats_config)\n```\n\nIf you run the above in a Jupyter notebook or similar tool, you'll notice that\nthe view builder supports tab suggestions that matches the fields in the FHIR\nresource of the given profile. In fact, this is just a Pythonic way to build\nFHIRPath expressions to be used by the runner, with suggestions available\nby just pressing tab:\n\n![tab suggestion image](./tab_suggest_example.png)\n\nThat builder is convenient for Python users, but you can also see the FHIRPath\nexpression themselves by just getting the string representation of the view,\nsuch as by running `print(simple_pats)`. Notice every column and the 'where'\ncriteria are defined by FHIRPath expressions, while every column must define\nits name in the alias() function appended to the FHIRPath:\n\n```\nView<http://hl7.org/fhir/StructureDefinition/Patient.select(\n id.alias(id),\n gender.alias(gender),\n birthDate.alias(birthdate),\n address.where(period.empty()).first().line.first().alias(street),\n address.where(period.empty()).first().city.alias(city),\n address.where(period.empty()).first().state.alias(state),\n address.where(period.empty()).first().postalCode.alias(zip)\n).where(\n birthDate < @1960-01-01\n)>\n```\n\nIn other words, any _runner_ implementation would basically use the FHIRPath\nexpressions to select and filter the underlying data. The example below\nwill use a BigQuery runner, which translates FHIRPath expressions into SQL,\nbut runners in Apache Spark and directly on JSON will follow. This could\nalso be exported as a simple JSON structure and passed to remote services\nto evaluate the FHIRPath expressions and produce a view for the user.\n\nNow that we've defined a view, let's run it against a real dataset. We'll run\nthis over BigQuery:\n\n```py\n# Get a BigQuery client. This may require additional authentication to access\n# BigQuery, depending on your notebook environment. Typically the client\n# and runner are created only once at the start of a notebook.\nfrom google.cloud import bigquery as bq\nclient = bq.Client()\nrunner = bigquery_runner.BigQueryRunner(\n client,\n fhir_dataset='bigquery-public-data.fhir_synthea',\n snake_case_resource_tables=True)\n\nrunner.to_dataframe(simple_pats, limit = 5)\n```\n\nWhich produces this table:\n\n| | id | gender | birthdate | street | city | state | zip |\n|---:|:-------------------------------------|:---------|:------------|:-------------------------|:------------|:--------------|------:|\n| 0 | 6759d2b7-38b4-4798-97c0-d171a53e013a | male | 1916-03-21 | 659 Bayer Wall Apt 61 | Boston | Massachusetts | 02108 |\n| 1 | 41dbee4d-d355-413f-a040-93ca037fe646 | male | 1951-12-05 | 226 Sipes Ranch Unit 37 | Lynnfield | Massachusetts | 01940 |\n| 2 | e194d708-8989-4e0c-a8e1-eda7351672ce | male | 1947-09-24 | 638 Pouros Wall Suite 52 | Lynnfield | Massachusetts | 01940 |\n| 3 | 4bccdc85-c040-45dd-ada3-a55064439a01 | male | 1943-06-20 | 825 Jakubowski Extension | Tewksbury | Massachusetts | 01876 |\n| 4 | 8dca4c3c-d2d5-460f-9168-5f18e5d29b2b | male | 1945-12-13 | 319 Cronin Light | Hubbardston | Massachusetts | 01452 |\n\n\nThat's it! Now the returned dataframe contains a table of the example patients\ndescribed in the query, pulled from the FHIR data stored in BigQuery. Examples\nbelow will show more sophisticated use cases such as turning a FHIR view into\na BigQuery virtual view or incorporating clinical content from code value sets.\n\nAt this time we support a BigQuery runner to consume FHIR data in BigQuery as\nour data source, but future runners may support other data stores, FHIR servers,\nor FHIR bulk extracts on disk.\n\n## Working with code values\n\nMost meaningful analysis of healthcare data involves navigating clinical\nterminologies. In some cases these value sets come from an established authority\nlike the [Value Set Authority Center](https://vsac.nlm.nih.gov/), and other\ntimes they are defined and maintained locally for custom use cases.\n\nFHIR Views offers a convenient mechanism to create and use such value sets in\nyour queries. Here is an example that defines a collection of LOINC codes\nindicating LDL results:\n\n```py\nLDL_TEST = r4.value_set('urn:example:value_set:ldl').with_codes(\n 'http://loinc.org', ['18262-6', '18261-8', '12773-8']).build()\n```\n\nNow we can easily query observations with a view that uses the FHIRPath\n`memberOf` function:\n\n```py\n# Creates the base observation view for convenience, typically done once per\n# base type in a notebook.\nobs = views.view_of('Observation')\n\nldl_obs = obs.select([\n obs.subject.idFor('Patient').alias('patient'),\n # Below is a Pythonic shorthand -- users could type\n # `obs.value.ofType('Quantity').value` instead for the FHIRPath ofType\n # expression, but the shorthand helps autocompletion\n obs.valueQuantity.value.alias('value'),\n obs.valueQuantity.unit.alias('unit'),\n obs.code.coding.display.first().alias('test'),\n obs.effectiveDateTime.alias('effectiveTime')\n ]).where(obs.code.memberOf(LDL_TEST))\n\nrunner.to_dataframe(ldl_obs, limit=5)\n```\n\n| | patient | value | unit | test | effectiveTime |\n|---:|:-------------------------------------|---------:|:-------|:------------------------------------|:--------------------------|\n| 0 | 903156da-ca5d-4ec3-ad36-073a9437afe4 | 153.058 | mg/dL | Low Density Lipoprotein Cholesterol | 2014-06-20 11:30:15+00:00 |\n| 1 | 3d268dce-fed4-4bc7-b156-c78e810c5183 | 149.379 | mg/dL | Low Density Lipoprotein Cholesterol | 2013-06-10 16:20:36+00:00 |\n| 2 | fdf7c87b-1c8f-4d09-8d51-e622f747a7c8 | 88.047 | mg/dL | Low Density Lipoprotein Cholesterol | 2013-10-07 00:08:45+00:00 |\n| 3 | 9007c0ff-a0ad-48dc-adc2-c0908c06fba8 | 108.145 | mg/dL | Low Density Lipoprotein Cholesterol | 2016-03-18 10:31:54+00:00 |\n| 4 | cc5a2dd6-37b6-4f15-9da7-53f3b85e3370 | 64.5849 | mg/dL | Low Density Lipoprotein Cholesterol | 2012-05-26 13:27:46+00:00 |\n\n\n## Working with external value sets and terminology services\n\nYou can also work with value sets defined by external terminology services. To\ndo so, you must first create a terminology service client.\n\nThis example uses the UMLS terminology service from the NIH. In order access\nthis terminology service, you need to\n[sign up here.](https://uts.nlm.nih.gov/uts/signup-login) You should then enter\nthe API key found [on your profile page](https://uts.nlm.nih.gov/uts/profile) in\nthe place of 'your-umls-api-key' below.\n\n```py\nfrom google.fhir.r4.terminology import terminology_service_client\n\ntx_client = terminology_service_client.TerminologyServiceClient({\n 'http://cts.nlm.nih.gov/fhir/': ('apikey', 'your-umls-api-key'),\n})\n```\n\nBefore making queries against an externally-defined value set, you must first\nget the codes defined by the value set and write them to a BigQuery table. You\nonly need to perform this step once. After doing so, you'll be able to reference\nthe value set definitions you've written in future queries.\n\n```py\ninjury_value_set_url = 'http://cts.nlm.nih.gov/fhir/ValueSet/2.16.840.1.113762.1.4.1029.5'\nwound_disorder_value_set_url = 'http://cts.nlm.nih.gov/fhir/ValueSet/2.16.840.1.113762.1.4.1219.178'\nrunner.materialize_value_set_expansion((injury_value_set_url, wound_disorder_value_set_url), tx_client)\n```\n\nTo make queries against an externally-defined value set which you've saved to\nBigQuery, you can simply refer to its URL.\n\n```py\ninjury_conds = cond.select([\n cond.id.alias('id'),\n cond.subject.idFor('Patient').alias('patientId'),\n cond.code.alias('codes')\n ]).where(cond.code.memberOf(injury_value_set_url))\n\nrunner.create_database_view(injury_conds, 'injury_conditions')\n```\n\n## Saving FHIR Views as BigQuery Views\n\nWhile `runner.to_dataframe` is convenient to retrieve data for local analysis,\nit's often useful to create such flattened views in BigQuery itself. They can be\neasily queried with much simpler SQL, or used by a variety of business\nintelligence or other data analysis tools.\n\nFor this reason, the BigQueryRunner offers a `create_database_view` method that\nwill convert the view definition into a\n[BigQuery View](https://cloud.google.com/bigquery/docs/views), which can then\njust be consumed as if it was a first-class table that is updated when the\nunderlying data is updated. Here's an example:\n\n```py\nrunner.create_database_view(ldl_obs, 'ldl_observations')\n```\n\nBy default the view is created in the fhir_dataset used by the runner, but this\nisn't always desirable (for example, a user may want to do their analysis in\ntheir own, isolated dataset). Therefore it's common to specify a `view_dataset`\nwhen creating the runner as the target for any views created. Here's an example:\n\n```py\nrunner = bigquery_runner.BigQueryRunner(\n client,\n fhir_dataset='bigquery-public-data.fhir_synthea',\n view_dataset='example_project.diabetic_care_example')\n```\n",
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "Tools to create views of FHIR data for analysis.",
"version": "0.11.0",
"project_urls": {
"Download": "https://github.com/google/fhir-py/releases",
"Homepage": "https://github.com/google/fhir-py"
},
"split_keywords": [
"google",
" fhir",
" python",
" healthcare"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b76efe7a5251a0aeffdcedc7881f5194bbe1f0e712e0b2dc35d102654db2b83d",
"md5": "d194def5c040259dbf3121d8b737b14e",
"sha256": "60d2295974363bc29d465fb9237c385fc260ec561b6649f935a9bc9533f6bdcb"
},
"downloads": -1,
"filename": "google_fhir_views-0.11.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d194def5c040259dbf3121d8b737b14e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.8",
"size": 65792,
"upload_time": "2024-07-15T21:51:30",
"upload_time_iso_8601": "2024-07-15T21:51:30.683921Z",
"url": "https://files.pythonhosted.org/packages/b7/6e/fe7a5251a0aeffdcedc7881f5194bbe1f0e712e0b2dc35d102654db2b83d/google_fhir_views-0.11.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-15 21:51:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "google",
"github_project": "fhir-py",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "google-fhir-views"
}