bigquery-ml-utils


- Name: bigquery-ml-utils
- Version: 1.3.0
- Home page: https://github.com/GoogleCloudPlatform/bigquery-ml-utils
- Summary: BigQuery ML Utils
- Upload time: 2024-02-28 02:24:13
- Author: Google Inc.
- License: Apache 2.0
- Keywords: bqml, utils
- Requirements: No requirements were recorded.

# BigQuery ML Utils

[BigQuery ML](https://cloud.google.com/bigquery-ml/docs/introduction) (also
known as BQML) lets you create and run machine learning models in [BigQuery](https://cloud.google.com/bigquery/docs/introduction)
using standard SQL queries. The BigQuery ML Utils library is an integrated suite
of machine learning tools for building and using BigQuery ML models.


## Installation

Install this library in a [virtualenv](https://virtualenv.pypa.io/en/latest/)
using pip. [virtualenv](https://virtualenv.pypa.io/en/latest/) is a tool to
create isolated Python environments. The basic problem it addresses is one of
dependencies and versions, and indirectly permissions.

With [virtualenv](https://virtualenv.pypa.io/en/latest/), you can install this
library without system install permissions and without clashing with the
installed system dependencies.

### Mac/Linux

```
    pip install virtualenv
    virtualenv <your-env>
    source <your-env>/bin/activate
    <your-env>/bin/pip install bigquery-ml-utils
```

### Windows

```
    pip install virtualenv
    virtualenv <your-env>
    <your-env>\Scripts\activate
    <your-env>\Scripts\pip.exe install bigquery-ml-utils
```
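
To confirm that the install landed in the intended environment, a quick check
from Python can help. The sketch below uses only the standard library; the
helper names `in_virtual_env` and `installed_version` are illustrative and are
not part of this library.

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtual_env() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside an environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation. Very old
    # virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def installed_version(dist_name: str = "bigquery-ml-utils") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment:", in_virtual_env())
    print("bigquery-ml-utils version:", installed_version())
```

If `installed_version()` returns `None` while the environment is active, the
package was most likely installed with a different `pip` than the one inside
`<your-env>`.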

## Overview

### Inference

#### Transform Predictor

The Transform Predictor feeds input data into a BQML model that was trained
with a TRANSFORM clause. It preprocesses the input and postprocesses the
output. The first argument is a [SavedModel](https://www.tensorflow.org/guide/saved_model/)
which represents the [TRANSFORM clause](https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform/)
for feature preprocessing. The second argument is a
[SavedModel](https://www.tensorflow.org/guide/saved_model/) or
[XGBoost Booster](https://xgboost.readthedocs.io/en/latest/) which represents
the model logic.
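
The two-argument structure can be pictured as function composition: the
TRANSFORM stage turns raw inputs into features, and the model stage turns
features into predictions. The sketch below is purely conceptual.
`TwoStagePredictor`, `scale_features`, and `linear_model` are hypothetical
stand-ins written in plain Python. They are not this library's API, and a real
predictor would wrap a SavedModel or XGBoost Booster instead.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class TwoStagePredictor:
    """Hypothetical illustration: a preprocessing stage followed by a model stage."""

    def __init__(self, transform: Callable[[Row], List[float]],
                 model: Callable[[List[float]], float]) -> None:
        self._transform = transform  # stands in for the TRANSFORM SavedModel
        self._model = model          # stands in for the model SavedModel or Booster

    def predict(self, rows: List[Row]) -> List[Dict[str, float]]:
        outputs = []
        for row in rows:
            features = self._transform(row)              # preprocessing
            score = self._model(features)                # model logic
            outputs.append({"predicted_value": score})   # postprocessing
        return outputs


def scale_features(row: Row) -> List[float]:
    # Toy preprocessing: rescale two raw columns.
    return [row["age"] / 100.0, row["income"] / 1000.0]


def linear_model(features: List[float]) -> float:
    # Toy model: fixed weights plus a bias.
    return 2.0 * features[0] + 3.0 * features[1] + 1.0


predictor = TwoStagePredictor(scale_features, linear_model)
print(predictor.predict([{"age": 50.0, "income": 2000.0}]))
```

Keeping the two stages as separate objects mirrors the description above: the
preprocessing can stay fixed while the model behind it is either a SavedModel
or a Booster.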

#### XGBoost Predictor

The XGBoost Predictor feeds input data into a BQML XGBoost model. It
preprocesses the input and postprocesses the output. The first
argument is an [XGBoost Booster](https://xgboost.readthedocs.io/en/latest/) which
represents the model logic. The remaining arguments are model assets.
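
One plausible reading, assumed here rather than stated by the source, is that
assets are needed because a tree model consumes numeric vectors and emits
numeric scores, so something has to map raw values in and labels out. The
sketch below illustrates that idea in plain Python. The asset names
(`feature_names`, `category_vocab`, `class_labels`) and the
`fake_booster_predict` stand-in are hypothetical and do not reflect the
library's actual asset format or the XGBoost API.

```python
from typing import Dict, List, Sequence, Union

RawRow = Dict[str, Union[str, float]]

# Hypothetical assets; the real asset format is not described in this README.
feature_names = ["color", "weight"]
category_vocab = {"color": ["red", "green", "blue"]}
class_labels = ["small", "large"]


def preprocess(row: RawRow) -> List[float]:
    """Encode a raw row as a numeric vector in feature_names order."""
    vector = []
    for name in feature_names:
        value = row[name]
        if name in category_vocab:
            # Categorical feature: replace the string by its vocabulary index.
            vector.append(float(category_vocab[name].index(str(value))))
        else:
            vector.append(float(value))
    return vector


def fake_booster_predict(vector: Sequence[float]) -> List[float]:
    """Stand-in for the model: returns one probability per class."""
    p_large = 0.9 if vector[1] > 10.0 else 0.2
    return [1.0 - p_large, p_large]


def postprocess(probs: Sequence[float]) -> Dict[str, object]:
    """Map the highest-probability class index back to its label."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"predicted_label": class_labels[best], "probs": list(probs)}


row = {"color": "green", "weight": 12.5}
print(postprocess(fake_booster_predict(preprocess(row))))
```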

### Tensorflow Ops

BQML TensorFlow Custom Ops provide SQL functions ([Date functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions),
[Datetime functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/datetime_functions),
[Time functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/time_functions)
and [Timestamp functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions))
that are not available in TensorFlow. The implementation and function behavior
align with [BigQuery](https://cloud.google.com/bigquery). This is part of an
effort to bridge the gap between the SQL community and the TensorFlow community.
The following example returns the same result as `TIMESTAMP_ADD(timestamp_expression, INTERVAL int64_expression date_part)`:

```
>>> import tensorflow as tf
>>> # `timestamp_ops` is this library's timestamp ops module, imported beforehand.
>>> timestamp = tf.constant(['2008-12-25 15:30:00+00', '2023-11-11 14:30:00+00'], dtype=tf.string)
>>> interval = tf.constant([200, 300], dtype=tf.int64)
>>> result = timestamp_ops.timestamp_add(timestamp, interval, 'MINUTE')
>>> print(result)
tf.Tensor([b'2008-12-25 18:50:00.0 +0000' b'2023-11-11 19:30:00.0 +0000'], shape=(2,), dtype=string)
```
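
For readers more familiar with Python's standard library, the same arithmetic
can be cross-checked with `datetime`. This is only a sanity check of the
`MINUTE` case, not how the custom op is implemented; the helper
`timestamp_add_minutes` is a name made up for this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def timestamp_add_minutes(timestamps: List[datetime],
                          minutes: List[int]) -> List[datetime]:
    """Add minutes[i] minutes to timestamps[i], element-wise."""
    if len(timestamps) != len(minutes):
        raise ValueError("timestamps and minutes must have the same length")
    return [ts + timedelta(minutes=m) for ts, m in zip(timestamps, minutes)]


timestamps = [
    datetime(2008, 12, 25, 15, 30, tzinfo=timezone.utc),
    datetime(2023, 11, 11, 14, 30, tzinfo=timezone.utc),
]
for shifted in timestamp_add_minutes(timestamps, [200, 300]):
    print(shifted.isoformat())
```

The two printed instants, 18:50 on 2008-12-25 and 19:30 on 2023-11-11 (both
UTC), match the tensor output shown above.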

Note: `/usr/share/zoneinfo` is needed for parsing time zones and might not be
available on your OS. If it is missing, install `tzdata` to generate it. For
example, add the following to your Dockerfile:

```
RUN apt-get update && DEBIAN_FRONTEND="noninteractive" \
    TZ="America/Los_Angeles" apt-get install -y tzdata
```
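
Before changing an image, it can be worth checking whether the time zone
database is already present. The following is a minimal standard-library
sketch; `zoneinfo_available` is a hypothetical helper name, and the default
path is the one mentioned in the note above.

```python
import os


def zoneinfo_available(path: str = "/usr/share/zoneinfo") -> bool:
    """Return True if the time zone database directory exists and is non-empty."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0


if __name__ == "__main__":
    if zoneinfo_available():
        print("time zone database found")
    else:
        print("time zone database missing; install tzdata")
```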

### Model Generator

#### Text Embedding Model Generator

The Text Embedding Model Generator automatically loads a text embedding model
from TensorFlow Hub and adds a signature so that the resulting model can
be used in BQML right away. Currently, the NNLM, SWIVEL, and BERT embedding
models can be selected.

##### NNLM Text Embedding Model

The [NNLM](https://tfhub.dev/google/nnlm-en-dim50-with-normalization/2) model
has a model size of <150MB and is recommended for phrases, news, tweets,
reviews, etc. NNLM does not carry any default signatures because it is designed
to be used as a Keras layer; the Text Embedding Model Generator adds the
signature for you.

##### SWIVEL Text Embedding Model

The [SWIVEL](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1) model
has a model size of <150MB and is recommended for phrases, news, tweets,
reviews, etc. SWIVEL does not require pre-processing because the embedding model
already satisfies BQML imported model requirements. However, in order to align
signatures for NNLM, SWIVEL, and BERT, the Text Embedding Model Generator
establishes the same input label for SWIVEL.


##### BERT Text Embedding Model

The [BERT](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4) model
has a model size of ~200MB and is recommended for phrases, news, tweets,
reviews, paragraphs, etc. BERT does not carry any default signatures because it
is designed to be used as a Keras layer. The Text Embedding Model Generator adds
the signature and also integrates a
[text preprocessing layer](https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3)
for BERT.
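
The three options described in this section can be summarized as a small lookup
table. The sketch below only restates the TensorFlow Hub URLs and size notes
given above; `EmbeddingModelInfo`, `EMBEDDING_MODELS`, and `describe_model` are
hypothetical names for illustration and are not the generator's interface.

```python
from typing import Dict, NamedTuple, Optional


class EmbeddingModelInfo(NamedTuple):
    hub_url: str
    approx_size: str
    # Separate preprocessing model, if this section mentions one.
    preprocess_url: Optional[str]


EMBEDDING_MODELS: Dict[str, EmbeddingModelInfo] = {
    "nnlm": EmbeddingModelInfo(
        "https://tfhub.dev/google/nnlm-en-dim50-with-normalization/2",
        "<150MB", None),
    "swivel": EmbeddingModelInfo(
        "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
        "<150MB", None),
    "bert": EmbeddingModelInfo(
        "https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4",
        "~200MB",
        "https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3"),
}


def describe_model(name: str) -> EmbeddingModelInfo:
    """Look up a model by case-insensitive name; raise ValueError if unknown."""
    key = name.strip().lower()
    if key not in EMBEDDING_MODELS:
        known = ", ".join(sorted(EMBEDDING_MODELS))
        raise ValueError(f"unknown model {name!r}; expected one of: {known}")
    return EMBEDDING_MODELS[key]


print(describe_model("BERT").approx_size)
```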

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/GoogleCloudPlatform/bigquery-ml-utils",
    "name": "bigquery-ml-utils",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "bqml utils",
    "author": "Google Inc.",
    "author_email": "no-reply@google.com",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "BigQuery ML Utils",
    "version": "1.3.0",
    "project_urls": {
        "Homepage": "https://github.com/GoogleCloudPlatform/bigquery-ml-utils"
    },
    "split_keywords": [
        "bqml",
        "utils"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4e1cd77a3ca33bba6f20adc33f7d03c376cf93e3b9d53532681dc157c82dc2da",
                "md5": "8bd21c6fb65a74a63f4940067453b631",
                "sha256": "0f330cf16b701072ca5b0fa37847ebf95225542f5064e4ff1c63462b9b2f01a1"
            },
            "downloads": -1,
            "filename": "bigquery_ml_utils-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "8bd21c6fb65a74a63f4940067453b631",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": null,
            "size": 8102138,
            "upload_time": "2024-02-28T02:24:13",
            "upload_time_iso_8601": "2024-02-28T02:24:13.589432Z",
            "url": "https://files.pythonhosted.org/packages/4e/1c/d77a3ca33bba6f20adc33f7d03c376cf93e3b9d53532681dc157c82dc2da/bigquery_ml_utils-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cb21a248dfb5ba79facd69a1ebd4d8c15ff8897e8c9de2ab395e91705f598204",
                "md5": "fb32a9d4229ba7409214a7615483baee",
                "sha256": "46d3d46d0f339c0f4bb329bd4cd6929ffeb64c621477b5577caa13db98e39aaa"
            },
            "downloads": -1,
            "filename": "bigquery_ml_utils-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "fb32a9d4229ba7409214a7615483baee",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": null,
            "size": 8102136,
            "upload_time": "2024-02-28T02:24:17",
            "upload_time_iso_8601": "2024-02-28T02:24:17.509385Z",
            "url": "https://files.pythonhosted.org/packages/cb/21/a248dfb5ba79facd69a1ebd4d8c15ff8897e8c9de2ab395e91705f598204/bigquery_ml_utils-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4f8bc2b1dfcb93e260f05ead09f5025b6d9b314dcd7dbcac63f91f64560967ce",
                "md5": "9c5b6c62be088fb5b5f060efd50da908",
                "sha256": "098fd79b6d28415ea88176dbaf64bc6860109e702f95795473d3fcc8718fd571"
            },
            "downloads": -1,
            "filename": "bigquery_ml_utils-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "9c5b6c62be088fb5b5f060efd50da908",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": null,
            "size": 8102138,
            "upload_time": "2024-02-28T02:24:19",
            "upload_time_iso_8601": "2024-02-28T02:24:19.545725Z",
            "url": "https://files.pythonhosted.org/packages/4f/8b/c2b1dfcb93e260f05ead09f5025b6d9b314dcd7dbcac63f91f64560967ce/bigquery_ml_utils-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bfbb44059c759634d0c411d87443b75d346ba3bd1093ca1c42b4083a98aa566a",
                "md5": "77261d6d3b2a020ad042200c2ff1c0e9",
                "sha256": "27606f4d9b750758ae89719eb543ed65a3c05b0fdde4b10b5ccb7f1e15860fae"
            },
            "downloads": -1,
            "filename": "bigquery_ml_utils-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "77261d6d3b2a020ad042200c2ff1c0e9",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": null,
            "size": 8102138,
            "upload_time": "2024-02-28T02:24:23",
            "upload_time_iso_8601": "2024-02-28T02:24:23.589996Z",
            "url": "https://files.pythonhosted.org/packages/bf/bb/44059c759634d0c411d87443b75d346ba3bd1093ca1c42b4083a98aa566a/bigquery_ml_utils-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-28 02:24:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "GoogleCloudPlatform",
    "github_project": "bigquery-ml-utils",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "bigquery-ml-utils"
}
        