gooddata-flight-server

- Name: gooddata-flight-server
- Version: 1.33.0
- Summary: Flight RPC server to host custom functions
- Upload time: 2024-12-12 12:30:40
- Author: GoodData
- Requires Python: >=3.9.0
- License: MIT
- Keywords: gooddata, flight rpc, custom functions, analytics, headless-bi, cloud native, semantic layer, sql, metrics
# GoodData Flight Server

The GoodData Flight Server is an opinionated, pluggable Flight RPC Server implementation.

It builds on top of the Flight RPC components provided by [PyArrow](https://pypi.org/project/pyarrow/) and
adds functions and capabilities typically needed when building production-ready
Flight RPC data services:

- A robust configuration system leveraging [Dynaconf](https://www.dynaconf.com/)
- Enablement of data service observability (logging, metrics, tracing)
- Health checking exposed via liveness and readiness endpoints
- Token-based authentication with pluggable token verification methods

Next to this, the server also comes with infrastructure that you can leverage
when building the data service functionality itself:

- A library for generating and serving Flights created using long-running tasks
- Extendable error handling infrastructure that allows your service to
  provide error information in a structured manner

Code in this package is derived from our production codebase, where we run
and operate many different data services; the infrastructure is proven
and battle-tested there.

## Getting Started

The `gooddata-flight-server` package is like any other. You can install it
using `pip install gooddata-flight-server` or - more commonly - add it as a dependency
to your project.

The server takes care of all the boilerplate, and you take care of implementing
the Flight RPC methods - similarly to how you would implement them using PyArrow's
Flight server.

Here is a very simple example of the data service's Flight RPC methods implementation:

```python
import gooddata_flight_server as gf
import pyarrow.flight


class DataServiceMethods(gf.FlightServerMethods):
  """
  This example data service serves some sample static data. Any
  DoGet request will return that static data. All other Flight RPC
  methods are left unimplemented.

  Note how the class holds onto the `ServerContext` - the implementations
  will usually want to do this because the context contains additional
  dependencies such as:

  - Location to send out in FlightInfo
  - Health monitor that the implementation can use to indicate
    its status
  - Task executor to perform long-running tasks
  """

  StaticData = pyarrow.table({
    "col1": [1, 2, 3]
  })

  def __init__(self, ctx: gf.ServerContext) -> None:
    self._ctx = ctx

  def do_get(self,
             context: pyarrow.flight.ServerCallContext,
             ticket: pyarrow.flight.Ticket
             ) -> pyarrow.flight.FlightDataStream:
    return pyarrow.flight.RecordBatchStream(
      self.StaticData
    )


@gf.flight_server_methods
def my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:
  """
  Factory function for the data service. It returns implementation of Flight RPC
  methods which are then integrated into the server.

  The ServerContext passed in `ctx` allows you to access available configuration
  and various useful server components.
  """
  return DataServiceMethods(ctx)


if __name__ == "__main__":
  # additional options & config files can be passed to the
  # create_server methods; more on this later
  server = gf.create_server(my_service)
  server.start()

  # the main thread will block on this call
  #
  # SIGINT/SIGTERM causes graceful shutdown - the method will
  # exit once server is stopped.
  server.wait_for_stop()
```

Notice the annotated `my_service` function. This is a factory for your data service's
Flight RPC methods. The server will call it at the appropriate time during startup,
passing in the full context available at that time, from which your code can access:

- available configuration loaded using Dynaconf
- health-checking components
- components to use for running long-running tasks.

During startup, the server will register signal handlers for SIGINT and SIGTERM - it will
perform graceful shutdown and tear everything down in the correct order when it receives them.

The server also comes with a simple CLI that you can use to start it up and load a
particular data service:

```shell
$ gooddata-flight-server start --methods-provider my_service.main
```

The CLI will import the `my_service.main` Python module and look for a function decorated
with `@flight_server_methods`. It will start the server and make it initialize your data service
implementation and integrate it into the Flight RPC server.

Without any configuration, the server will bind to `127.0.0.1:17001` and run without TLS
or authentication. It will not start the health check or metric endpoints and will not start
the OpenTelemetry exporters.

NOTE: the CLI also has other arguments that let you specify configuration files to load and
logging configuration to use.

### Configuration

The server uses [Dynaconf](https://www.dynaconf.com/) for all its configuration. There are
many settings already in place that influence the server's configuration and behavior. Your data
service code can also leverage Dynaconf to configure itself: you can pass any number of configuration
files / env variables at startup; the server will load them all using Dynaconf and let your code
work with the resulting Dynaconf structures.

We recommend checking out the Dynaconf documentation to learn more about how it works and
what its capabilities are. This text will only highlight the most common usage.

The available server settings are documented in the [sample-config.toml](sample-config.toml).
You can take this file and use it as a template for your own configuration.

To use a configuration file during startup, you can start the server like this:

```shell
$ gooddata-flight-server start \
  --methods-provider my_service.main \
  --config server.config.toml
```

In case your service needs its own configuration, it is often a good idea to keep it in
a separate file and add that to startup:

```shell
$ gooddata-flight-server start \
  --methods-provider my_service.main \
  --config server.config.toml my_service.config.toml
```

#### Environment variables

All settings that you can code into the config file can also be provided using environment
variables.

The server's Dynaconf integration is set up so that all environment variables are
expected to be prefixed with `GOODDATA_FLIGHT_`.

The environment variable naming convention is set up by Dynaconf and goes as follows:
`GOODDATA_FLIGHT_{SECTION}__{SETTING_NAME}`

Where the `SECTION` is, for example, `[server]`. For convenience, the [sample-config.toml](sample-config.toml)
indicates the full name of the respective environment variable in each setting's documentation.
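
The convention is mechanical, so a tiny helper can illustrate it (this function is not part of the package; it is a hypothetical sketch of the mapping):

```python
def env_var_name(section: str, setting: str) -> str:
    """Hypothetical helper (not part of the package) that builds the
    environment variable name for a given config section and setting."""
    return f"GOODDATA_FLIGHT_{section.upper()}__{setting.upper()}"

print(env_var_name("server", "listen_host"))
# GOODDATA_FLIGHT_SERVER__LISTEN_HOST
```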

#### Configuration for your service

If your service needs its own configuration, you should aim to have a TOML config file like this:

```toml
[my_service]
# env: GOODDATA_FLIGHT_MY_SERVICE__OPT1
opt1 = "value"
```

When you provide such a config file to the server, it will parse it and make its contents available in `ctx.settings`.
You can then access the value of this setting in your factory function. For example:

```python
import gooddata_flight_server as gf

_MY_CONFIG_SECTION = "my_service"

@gf.flight_server_methods
def my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:
    opt1 = ctx.settings.get(f"{_MY_CONFIG_SECTION}.opt1")

    # ... create and return server methods ...
```

### Authentication

Currently, the server supports two modes of authentication:

- no authentication
- token-based authentication, with the ability to plug in custom token verification logic

The token verification method that comes built-in with the server is a simple one: the token is
an arbitrary, secret value shared between the server and its clients. You configure the list of valid
secret tokens at server start-up and then distribute these secret values to clients at your discretion.

By default, the server runs with no authentication. To turn on token-based authentication,
you have to:

- Set the `authentication_method` setting to `token`.

  By default, the server will use the built-in token verification strategy
  called `EnumeratedTokenVerification`.

- Configure the secret tokens.

  You can do this using an environment variable: `GOODDATA_FLIGHT_ENUMERATED_TOKENS__TOKENS='["", ""]'`.
  Put the secret token(s) inside the quotes. Alternatively, you can code the tokens into a configuration file
  like this:

  ```toml
  [enumerated_tokens]
  tokens = ["", ""]
  ```

  IMPORTANT: never commit secrets to your VCS.

With this setup in place, the server will expect Flight clients to include a token in the
`authorization` header in the form `Bearer <token>`. The token must be present on every
call.

Here is an example of how to make a call that includes the `authorization` header:

```python
import pyarrow.flight

def example_call_using_tokens():
    opts = pyarrow.flight.FlightCallOptions(headers=[(b"authorization", b"Bearer <token>")])
    client = pyarrow.flight.FlightClient("grpc+tls://localhost:17001")

    for flight in client.list_flights(b"", opts):
        print(flight)
```

## Developer Manual

This part of the documentation explains additional capabilities of the server.

### Long-running tasks

Part of this package is a component that you can use to generate Flight data using long-running
tasks: the `TaskExecutor`. The server will configure and create an instance of `TaskExecutor`
at startup; your service can access it via the `ServerContext`.

The `TaskExecutor` implementation wraps a `ThreadPoolExecutor`: you can configure the number of
threads available for your tasks using the `task_threads` setting. Each active task will use one thread
from this pool. If all threads are occupied, tasks will be queued using a FIFO strategy.
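
The FIFO queueing behavior can be sketched with a plain `ThreadPoolExecutor` from the standard library (this is only an illustration of the semantics, not the actual `TaskExecutor` implementation):

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

# With a single worker thread, the first submitted task occupies the
# thread and the remaining submissions queue up, running in FIFO order.
order = []
lock = threading.Lock()

def task(name: str) -> str:
    time.sleep(0.05)
    with lock:
        order.append(name)
    return name

with ThreadPoolExecutor(max_workers=1) as pool:
    futures = [pool.submit(task, n) for n in ("t1", "t2", "t3")]
    results = [f.result() for f in futures]

print(order)    # ['t1', 't2', 't3'] - completed in submission order
```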

To use the `TaskExecutor`, you encapsulate the Flight data generation logic in a class
that extends the `Task` interface. In its `run()` method, you implement the
algorithm that generates the data.

The `Task` interface comes with a contract for how your code should return the result (data) or raise
errors. The `TaskExecutor` will hold onto the results generated by your task and retain them for
a configured amount of time (see the `task_result_ttl_sec` setting). The infrastructure recognizes that
your task may generate a result that can be consumed either repeatedly (say, Arrow Tables) or just
once (say, a RecordBatchReader backed by a live stream).

Here is an example showing how to code a task, how to integrate its execution, and how to
send out the data that it generates:

```python
from typing import Union, Any

import pyarrow.flight

import gooddata_flight_server as gf


class MyServiceTask(gf.Task):
    def __init__(
            self,
            task_specific_payload: Any,
            cmd: bytes,
    ):
        super().__init__(cmd)

        self._task_specific_payload = task_specific_payload

    def run(self) -> Union[gf.TaskResult, gf.TaskError]:
        # tasks support cancellation; your code can check for
        # cancellation at any time; if the task was cancelled the
        # method will raise exception.
        #
        # do not forget to do cleanup on cancellation
        self.check_cancelled()

        # ... do whatever is needed to generate the data
        data: pyarrow.RecordBatchReader = some_method_to_generate_data()

        # when the data is ready, wrap it in a result that implements
        # the FlightDataTaskResult interface; there are built-in implementations
        # to wrap Arrow Table or Arrow RecordBatchReader.
        #
        # you can write your own result if you need special handling
        # of result and/or resources bound to the result.
        return gf.FlightDataTaskResult.for_data(data)


class DataServiceMethods(gf.FlightServerMethods):
    def __init__(self, ctx: gf.ServerContext) -> None:
        self._ctx = ctx

    def _prepare_flight_info(self, task_result: gf.TaskExecutionResult) -> pyarrow.flight.FlightInfo:
        if task_result.error is not None:
            raise task_result.error.as_flight_error()

        if task_result.cancelled:
            raise gf.ErrorInfo.for_reason(
                gf.ErrorCode.COMMAND_CANCELLED,
                f"Service call was cancelled. Invocation task was: '{task_result.task_id}'.",
            ).to_server_error()

        result = task_result.result

        return pyarrow.flight.FlightInfo(
            schema=result.get_schema(),
            descriptor=pyarrow.flight.FlightDescriptor.for_command(task_result.cmd),
            endpoints=[
                pyarrow.flight.FlightEndpoint(
                    ticket=pyarrow.flight.Ticket(ticket=task_result.task_id.encode()),
                    locations=[self._ctx.location],
                )
            ],
            total_records=-1,
            total_bytes=-1,
        )

    def get_flight_info(
            self,
            context: pyarrow.flight.ServerCallContext,
            descriptor: pyarrow.flight.FlightDescriptor,
    ) -> pyarrow.flight.FlightInfo:
        cmd = descriptor.command
        # parse & validate the command
        some_parsed_command = ...

        # create your custom task; you will usually pass the parsed command
        # so that task knows what to do. The 'raw' command is required as well because
        # it should be bounced back in the FlightInfo
        task = MyServiceTask(task_specific_payload=some_parsed_command, cmd=cmd)
        self._ctx.task_executor.submit(task)

        # wait for the task to complete
        result = self._ctx.task_executor.wait_for_result(task_id=task.task_id)

        # once the task completes, create the FlightInfo or raise exception in
        # case the task failed. The ticket in the FlightInfo should contain the
        # task identifier.
        return self._prepare_flight_info(result)

    def do_get(self,
               context: pyarrow.flight.ServerCallContext,
               ticket: pyarrow.flight.Ticket
               ) -> pyarrow.flight.FlightDataStream:
        # caller comes to pick the data; the ticket should be the task identifier
        task_id = ticket.ticket.decode()

        # this utility method on the base class takes care of everything needed
        # to correctly create FlightDataStream from the task result (or die trying
        # in case the task result is no longer present, or the result indicates that
        # the task has failed)
        return self.do_get_task_result(context, self._ctx.task_executor, task_id)
```

### Custom token verification strategy

At the moment, the built-in token verification strategy supported by the server is the
most basic one. When this strategy is not good enough, you can code your own
and plug it into the server.

The `TokenVerificationStrategy` interface sets the contract for your custom strategy. You
implement this class inside a Python module and then tell the server to load that
module.

For example, you create a module `my_service.auth.custom_token_verification` where you
implement the verification strategy:

```python
import gooddata_flight_server as gf
import pyarrow.flight
from typing import Any


class MyCustomTokenVerification(gf.TokenVerificationStrategy):
    def verify(self, call_info: pyarrow.flight.CallInfo, token: str) -> Any:
        # implement your arbitrary logic here;
        #
        # see method and class documentation to learn more
        raise NotImplementedError

    @classmethod
    def create(cls, ctx: gf.ServerContext) -> "gf.TokenVerificationStrategy":
        # code has chance to read any necessary settings from `ctx.settings`
        # property and then use those values to construct the class
        #
        # see method and class documentation to learn more
        return MyCustomTokenVerification()
```

Then, you can use the `token_verification` setting to tell the server to look up
and load the token verification strategy from the `my_service.auth.custom_token_verification` module.

Using a custom verification strategy, you can implement support for, say, JWT tokens, or look
up valid tokens in a database.

NOTE: As is, the server infrastructure does not concern itself with how the clients actually
obtain valid tokens. At the moment, this is outside of this project's scope. You can distribute
tokens to clients using some out-of-band procedure or implement custom APIs where clients have to log
in in order to obtain a valid token.

### Logging

The server comes with `structlog` installed by default. `structlog` is configured
to use the Python stdlib logging backend. The `structlog` pipeline is set up so that:

- In dev mode, the logs are pretty-printed to the console (enabled by the `--dev-log` option of the server)
- In production deployments, the logs are serialized into JSON (using orjson) which is then written out.
  This is ideal for consumption by log aggregators.

By default, the stdlib loggers are configured using the [default.logging.ini](./gooddata_flight_server/server/default.logging.ini)
file. In the default setup, all INFO-level logs are emitted.

If you want to customize the logging configuration, then:

- make a copy of this file and tweak it as needed
- either pass the path to your config file to the `create_server` function or use the `--logging-config`
  argument of the CLI

The config file is the standard Python logging configuration file. You can learn about its intricacies
in Python documentation.
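
For illustration, here is a minimal, self-contained sketch of loading such a file with the stdlib machinery (the ini content below is a generic example, not a copy of `default.logging.ini`):

```python
import logging
import logging.config
import tempfile

# A minimal logging config in the standard ini format; this is a generic
# example, not a copy of default.logging.ini.
CONFIG = """\
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=plain

[logger_root]
level=INFO
handlers=console

[handler_console]
class=StreamHandler
level=INFO
formatter=plain
args=(sys.stderr,)

[formatter_plain]
format=%(levelname)s %(name)s %(message)s
"""

with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as f:
    f.write(CONFIG)
    path = f.name

# this is, in effect, the kind of file --logging-config points at
logging.config.fileConfig(path, disable_existing_loggers=False)
logging.getLogger("my_service").info("configured from %s", path)
```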

NOTE: you typically do not want to touch the formatter settings inside the logging ini file - the
`structlog` library creates the entire log lines according to the deployment mode.

The use of `structlog` and loggers is fairly straightforward:

```python
import structlog

_LOGGER = structlog.get_logger("my_service")
_LOGGER.info("event-name", some_event_key="value_to_log")
```

#### Recommendations

Here are a few assorted recommendations based on our production experience with `structlog`:

- You can log complex objects such as lists, tuples, dicts and data classes without issues
  - Be careful though: what can be serialized into the dev-log may not always serialize
    using `orjson` into production logs
- Always log exceptions using the special [exc_info](https://www.structlog.org/en/stable/exceptions.html) event key.
- Mind the cardinality of logger instances. If you have a class of which there may be thousands of
  instances, then it is **not a good idea** to create a logger instance for each instance of your class - even
  if the logger name is the same; each logger instance comes with memory overhead.
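
On the `exc_info` recommendation: here is a dependency-free sketch using the stdlib logger (structlog honors the same `exc_info` convention, so with structlog the call would also be `log.error("event", exc_info=True)`):

```python
import io
import logging

# Route log output into a buffer so the effect of exc_info is visible.
buffer = io.StringIO()
logging.basicConfig(stream=buffer, level=logging.INFO, force=True)
log = logging.getLogger("my_service")

try:
    1 / 0
except ZeroDivisionError:
    # exc_info=True attaches the active exception and its traceback to
    # the log record; structlog honors the same key.
    log.error("computation-failed", exc_info=True)

output = buffer.getvalue()
print("Traceback" in output)   # True
```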

### Prometheus Metrics

The server can be configured to start an HTTP endpoint that exposes the values of Prometheus
metrics. This is disabled by default.

To get started with Prometheus metrics you need to:

- Set `metrics_host` and `metrics_port`

  - Check out the config file comments to learn more about these settings.
  - Remember that the Prometheus scraper is an external process that
    needs to reach the HTTP endpoint over the network.

From then on, you can start using the Prometheus client to create various types of metrics. For example:

```python
from prometheus_client import Counter

# instantiate counter
MY_COUNTER = Counter(
    "my_counter",
    "Fitting description of `my_counter`.",
)

def some_function():
    # ...
    MY_COUNTER.inc()
```

#### Recommendations

Here are a few assorted recommendations based on our production experience:

- You must avoid double-declaration of metrics. If you try to define a metric with the same
  identifier twice, the registration will fail.

- It is good practice to declare all/most metrics in a single place. For example, create a `my_metrics.py`
  file containing a `MyMetrics` class with one static field per metric.

  This approach leads to better 'discoverability' of available metrics just by looking
  at the code. Using a class with a static field per metric in turn makes imports and autocomplete
  more convenient.
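
The pattern can be sketched as follows. To keep the example self-contained, a minimal stand-in `Counter` is defined inline instead of importing `prometheus_client`; the real client similarly raises `ValueError` when a metric with the same identifier is registered twice:

```python
# A stand-in Counter that mimics prometheus_client's duplicate-name
# check; it is NOT the real client, only a dependency-free sketch.
_REGISTRY: set = set()

class Counter:
    def __init__(self, name: str, documentation: str) -> None:
        if name in _REGISTRY:
            raise ValueError(f"Duplicated metric: {name}")
        _REGISTRY.add(name)
        self.name = name
        self.documentation = documentation
        self.value = 0

    def inc(self, amount: int = 1) -> None:
        self.value += amount

class MyMetrics:
    """All service metrics declared once, as static fields."""
    REQUESTS = Counter("my_requests", "Number of requests served.")
    FAILURES = Counter("my_failures", "Number of failed requests.")

MyMetrics.REQUESTS.inc()

try:
    Counter("my_requests", "Accidental re-declaration.")
except ValueError as err:
    print(err)   # Duplicated metric: my_requests
```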

### Open Telemetry

The server can be configured to integrate with OpenTelemetry: it will start and auto-configure
the OpenTelemetry exporters. It will also auto-fill the ResourceAttributes by doing discovery where possible.

See the `otel_*` options in the configuration files to learn more. In a nutshell, it
goes as follows:

- Configure which exporter to use with the `otel_exporter_type` setting.

  Nowadays, `otlp-grpc` or `otlp-http` is the usual choice.

  Depending on the exporter you use, you may/must specify additional, exporter-specific
  environment variables to configure the exporter. The supported environment variables
  are documented in the respective OpenTelemetry exporter package; i.e., they are not
  something special to GoodData's Flight Server.

  See the [official exporter documentation](https://opentelemetry-python.readthedocs.io/en/latest/exporter/otlp/otlp.html#module-opentelemetry.exporter.otlp.proto.grpc).

- Install the respective exporter package.

- Tweak the other `otel_*` settings: at minimum, you must set `otel_service_name`.

  All settings apart from `otel_service_name` will fall back to defaults.

To start tracing, you need to initialize a tracer. You can do so as follows:

```python
from opentelemetry import trace

MY_TRACER: trace.Tracer = trace.ProxyTracer("my_tracer")
```

Typically, you want to create one tracer instance for your entire data service and then
import that instance and use it wherever needed to create spans:

```python
from your_module_with_tracer import MY_TRACER

def some_function():
    # ... code
    with MY_TRACER.start_as_current_span("do_some_work") as span:
        # ... code
        pass
```

Note: there are many ways to instrument your code with spans. See [OpenTelemetry documentation](https://opentelemetry.io/docs/languages/python/instrumentation/)
to find out more.

#### Recommendations

Here are a few assorted recommendations based on our production experience:

- Always use the `ProxyTracer`. The underlying initialization code done by the server
  will correctly set the actual tracer that will be called from the ProxyTracer.

  This way, if you turn off OpenTelemetry (by commenting out the `otel_exporter_type` setting or setting it
  to 'none'), a NoOpTracer will be injected under the covers and all the tracing code will
  be a no-op as well.

### Health Checks

The server comes with a basic health-checking infrastructure - this is especially useful
when deploying to environments (such as k8s) that monitor the health of your server and can automatically
restart it in case of problems.

When you configure the `health_check_host` (and optionally also `health_check_port`) setting, the
server will expose two HTTP endpoints:

- `/ready` - indicates whether the server is up and ready to serve requests

  The endpoint will respond with status `500` if the server is not ready; otherwise, it responds with
  `202`. The server is deemed ready when all its modules are up and the Flight RPC server is
  'unlocked' to handle requests.

- `/live` - indicates whether the server is still alive and can be used. The liveness is determined
  from the status of the modules.

  Each of the server's modules can report its status to a central health checking service. If any of
  the modules is unhealthy, the whole server is unhealthy.

  Similarly to the readiness endpoint, the server will respond with status `500` when not healthy; otherwise, it
  will respond with status `202`.
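
A hypothetical readiness probe client built on this contract might look as follows (the helper names and the URL are illustrative; host and port come from your `health_check_host`/`health_check_port` settings):

```python
import urllib.error
import urllib.request

# Per the contract above: 202 means healthy/ready, 500 means not.
def is_healthy(status_code: int) -> bool:
    return status_code == 202

def probe(url: str) -> bool:
    """Return True when the endpoint reports a healthy status."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return is_healthy(resp.status)
    except urllib.error.HTTPError as e:
        # a 500 arrives as an HTTPError rather than a response object
        return is_healthy(e.code)
    except (urllib.error.URLError, OSError):
        return False

# probe("http://<health_check_host>:<health_check_port>/ready")
print(is_healthy(202), is_healthy(500))   # True False
```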

Creating health-checks is fairly straightforward:

- Your service's factory function receives the `ServerContext`

  - The `ServerContext` contains the `health` property, which returns an instance of `ServerHealthMonitor`

  - At this point, your code should hold onto / propagate the health monitor to any mission-critical
    modules / components used by your implementation

- The `ServerHealthMonitor` has a `set_module_status(module, status)` method - you can use this to indicate status

  - The `module` argument to this method can be any name you see fit
  - The status is either `ModuleHealthStatus.OK` or `ModuleHealthStatus.NOT_OK`
  - When your module is `NOT_OK`, the entire server is `NOT_OK`
  - Usually, there is a grace period for which the server can be `NOT_OK`; after that time is up,
    the environment will restart the server
  - If you return your module back to `OK` status, the server returns to `OK` status as well - thus
    avoiding the automatic restarts.

Here is an example component using health monitoring:

```python
import gooddata_flight_server as gf

class YourMissionCriticalComponent:
    """
    Let's say this component is used to perform some heavy lifting / important job.

    The component is created in your service's factory and is used during Flight RPC
    invocation. You propagate the `health` monitor to it at construction time.
    """
    def __init__(self, health: gf.ServerHealthMonitor) -> None:
        self._health = health

    def some_important_method(self):
        try:
            # this does some important work
            return
        except OSError:
            # it runs into some kind of unrecoverable error (OSError here is purely example);
            # by setting the status to NOT_OK, your component indicates that it is unhealthy
            # and the /live endpoint will report the entire server as unhealthy.
            #
            # usually, the liveness checks have a grace period. if you set the module back
            # to `gf.ModuleHealthStatus.OK` everything turns healthy again. If the grace
            # period elapses, the server will usually be restarted by the environment.
            self._health.set_module_status("YourMissionCriticalComponent", gf.ModuleHealthStatus.NOT_OK)
            raise
```

## Troubleshooting

### Clients cannot read data during GetFlightInfo->DoGet flow; getting DNS errors

The root cause here is usually a misconfiguration of `listen_host` and `advertise_host`.

You must always remember that `GetFlightInfo` returns a `FlightInfo` that is then used
by clients to obtain the data using `DoGet`. The `FlightInfo` contains the location(s)
that the client will connect to - these must be reachable by the client.

There are a few things to check:

1. Ensure that your service implementation correctly sets the location in the `FlightInfo`.

   Usually, you want to set the location to the value that your service implementation
   receives in the `ServerContext`. This location is prepared by the server and contains
   the values of `advertise_host` and `advertise_port`.

2. Ensure that the `advertise_host` is set correctly; mistakes can happen easily, especially
   in dockerized environments. The documentation of `listen_host` and `advertise_host`
   has additional detail.

   To highlight the specifics of a dockerized deployment:

   - The server most often needs to listen on `0.0.0.0`
   - The server must, however, advertise a different hostname/IP - one that is reachable from
     outside the container
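
For example, a dockerized deployment might use a configuration along these lines (a sketch only - verify the exact section and setting names against [sample-config.toml](sample-config.toml); the hostname is a placeholder):

```toml
[server]
# bind inside the container so the server is reachable from outside of it
listen_host = "0.0.0.0"

# the hostname/IP that clients outside the container can actually reach;
# this value ends up in the advertised location
advertise_host = "data-service.example.com"
advertise_port = 17001
```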

### The server's RSS keeps on growing; the server looks like it is leaking memory

This is usually observed on servers that are write-heavy: servers that handle a lot
of `DoPut` or `DoExchange` requests. When such servers run in environments that enforce
RSS limits, they can end up being killed.

Often, this is not a leak but a behavior of `malloc`. Even if you tell PyArrow to use
the `jemalloc` allocator, the underlying gRPC server used by Flight RPC will use `malloc`, and
by default `malloc` takes its time returning unused memory back to the system.

And since the gRPC server is responsible for allocating memory for the received Arrow data,
it is often the `DoPut` or `DoExchange` workloads that look like they are leaking memory.

If the RSS size is a problem (say you are running the service inside k8s with a memory limit), the
usual strategy is to:

1. Set / tweak malloc behavior using the `GLIBC_TUNABLES` environment variable; reduce
   the malloc trim threshold and possibly also reduce the number of malloc arenas.

   Here is a quite aggressive example: `GLIBC_TUNABLES="glibc.malloc.trim_threshold=4:glibc.malloc.arena_max=2:glibc.malloc.tcache_count=0"`

2. Periodically call `malloc_trim` to poke malloc to trim any unneeded allocations and
   return them to the system.

   The GoodData Flight Server already implements periodic malloc trim. By default, the interval
   is set to `30 seconds`. You can change this interval using the `malloc_trim_interval_sec`
   setting.
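
If you want to trigger a trim manually (for example, while experimenting with the tunables above), `malloc_trim` can be invoked via `ctypes` on glibc systems; this sketch is a no-op on platforms without it:

```python
import ctypes
import ctypes.util

def trim_malloc() -> bool:
    """Ask glibc to return unused heap pages to the OS.

    Returns False on platforms without malloc_trim (e.g. macOS or musl).
    """
    for name in (ctypes.util.find_library("c"), "libc.so.6"):
        if not name:
            continue
        try:
            libc = ctypes.CDLL(name)
            libc.malloc_trim(0)
            return True
        except (OSError, AttributeError):
            continue
    return False

trim_malloc()
```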

Additionally, we recommend reading up on [Python Memory Management](https://realpython.com/python-memory-management/) -
especially the part about CPython not returning unused blocks back to the system. This may be another reason for
RSS growth - the tricky bit being that it really depends on the object creation patterns in your service.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "gooddata-flight-server",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9.0",
    "maintainer_email": null,
    "keywords": "gooddata, flight, rpc, flight rpc, custom functions, analytics, headless, business, intelligence, headless-bi, cloud, native, semantic, layer, sql, metrics",
    "author": "GoodData",
    "author_email": "support@gooddata.com",
    "download_url": "https://files.pythonhosted.org/packages/df/8c/3bd2ca2f28cac9fa050593e1e4a88ce3f82ae6bd2a0d00d6d25f6d6fe79b/gooddata_flight_server-1.33.0.tar.gz",
    "platform": null,
    "description": "# GoodData Flight Server\n\nThe GoodData Flight Server is an opinionated, pluggable Flight RPC Server implementation.\n\nIt builds on top of the Flight RPC components provided by [PyArrow](https://pypi.org/project/pyarrow/) and\non functions and capabilities typically needed when building production-ready\nFlight RPC data services:\n\n- A robust configuration system leveraging [Dynaconf](https://www.dynaconf.com/)\n- Enablement of data service observability (logging, metrics, tracing)\n- Health checking exposed via liveness and readiness endpoints\n- Token-based authentication with pluggable token verification methods\n\nNext to this, the server also comes with infrastructure that you can leverage\nfor building data service functionality itself:\n\n- Library for generating and serving Flights created using long-running tasks\n- Extendable error handling infrastructure that allows your service to\n  provide error information in structured manner\n\nCode in this package is derived from our production codebase, where we run\nand operate many different data services and have this infrastructure proven\nand battle-tested.\n\n## Getting Started\n\nThe `gooddata-flight-server` package is like any other. You can install it\nusing `pip install gooddata-flight-server` or - more common - add it as dependency\nto your project.\n\nThe server takes care of all the boilerplate, and you take care of implementing\nthe Flight RPC methods - similar as you would implement them using PyArrow's Flight\nserver.\n\nHere is a very simple example of the data service's Flight RPC methods implementation:\n\n```python\nimport gooddata_flight_server as gf\nimport pyarrow.flight\n\n\nclass DataServiceMethods(gf.FlightServerMethods):\n  \"\"\"\n  This example data service serves some sample static data. Any\n  DoGet request will return that static data. 
All other Flight RPC\n  methods are left unimplemented.\n\n  Note how the class holds onto the `ServerContext` - the implementations\n  will usually want to do this because the context contains additional\n  dependencies such as:\n\n  - Location to send out in FlightInfo\n  - Health monitor that the implementation can use to indicate\n    its status\n  - Task executor to perform long-running tasks\n  \"\"\"\n\n  StaticData = pyarrow.table({\n    \"col1\": [1, 2, 3]\n  })\n\n  def __init__(self, ctx: gf.ServerContext) -> None:\n    self._ctx = ctx\n\n  def do_get(self,\n             context: pyarrow.flight.ServerCallContext,\n             ticket: pyarrow.flight.Ticket\n             ) -> pyarrow.flight.FlightDataStream:\n    return pyarrow.flight.RecordBatchStream(\n      self.StaticData\n    )\n\n\n@gf.flight_server_methods\ndef my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:\n  \"\"\"\n  Factory function for the data service. It returns implementation of Flight RPC\n  methods which are then integrated into the server.\n\n  The ServerContext passed in `ctx` allows you to access available configuration\n  and various useful server components.\n  \"\"\"\n  return DataServiceMethods(ctx)\n\n\nif __name__ == \"__main__\":\n  # additional options & config files can be passed to the\n  # create_server methods; more on this later\n  server = gf.create_server(my_service)\n  server.start()\n\n  # the main thread will block on this call\n  #\n  # SIGINT/SIGTERM causes graceful shutdown - the method will\n  # exit once server is stopped.\n  server.wait_for_stop()\n```\n\nNotice the annotated `my_service` function. This is a factory for your data service's\nFlight RPC methods. 
The server will call it at the appropriate time during startup.\nIt will pass in the full context available at that time, from which your code can access:\n\n- available configuration loaded using Dynaconf\n- health-checking components\n- components to use for running long-running tasks.\n\nDuring startup, the server will register signal handlers for SIGINT and SIGTERM - it will\nperform a graceful shutdown and tear everything down in the correct order when it receives them.\n\nThe server also comes with a simple CLI that you can use to start it up and load a particular\ndata service:\n\n```shell\n$ gooddata-flight-server start --methods-provider my_service.main\n```\n\nThe CLI will import the `my_service.main` Python module and look for a function decorated\nwith `@flight_server_methods`. It will start the server and make it initialize your data service\nimplementation and integrate it into the Flight RPC server.\n\nWithout any configuration, the server will bind to `127.0.0.1:17001`, run without TLS, and not\nuse any authentication. It will not start health check or metric endpoints and will not start\nthe OpenTelemetry exporters.\n\nNOTE: the CLI also has other arguments that let you specify configuration files to load and\nlogging configuration to use.\n\n### Configuration\n\nThe server uses [Dynaconf](https://www.dynaconf.com/) for all its configuration. There are\nmany settings already in place to influence the server's configuration and behavior. Your data service\ncode can also leverage Dynaconf config to configure itself: you can pass any number of configuration\nfiles / env variables at startup; the server will load them all using Dynaconf and let your code\nwork with Dynaconf structures.\n\nWe recommend checking out the Dynaconf documentation to learn more about how it works and\nwhat its capabilities are. 
This text will only highlight the most common usage.\n\nThe available server settings are documented in the [sample-config.toml](sample-config.toml).\nYou can take this and use it as a template for your own configuration.\n\nTo use a configuration file during startup, you can start the server like this:\n\n```shell\n$ gooddata-flight-server start \\\n  --methods-provider my_service.main \\\n  --config server.config.toml\n```\n\nIn case your service needs its own configuration, it is often a good idea to keep it in\na separate file and add that at startup:\n\n```shell\n$ gooddata-flight-server start \\\n  --methods-provider my_service.main \\\n  --config server.config.toml my_service.config.toml\n```\n\n#### Environment variables\n\nAll settings that you can code into the config file can also be provided using environment\nvariables.\n\nThe server's Dynaconf integration is set up so that all environment variables are\nexpected to be prefixed with `GOODDATA_FLIGHT_`.\n\nThe environment variable naming convention is set up by Dynaconf and goes as follows:\n`GOODDATA_FLIGHT_{SECTION}__{SETTING_NAME}`\n\nHere, the `SECTION` is, for example, `[server]`. For convenience, the [sample-config.toml](sample-config.toml)\nindicates the full name of the respective environment variable in each setting's documentation.\n\n#### Configuration for your service\n\nIf your service needs its own configuration, you should aim to have a TOML config file like this:\n\n```toml\n[my_service]\n# env: GOODDATA_FLIGHT_MY_SERVICE__OPT1\nopt1 = \"value\"\n```\n\nWhen you provide such a config file to the server, it will parse it and make its contents available in `ctx.settings`.\nYou can then access the value of this setting in your factory function. 
For example like this:\n\n```python\nimport gooddata_flight_server as gf\n\n_MY_CONFIG_SECTION = \"my_service\"\n\n@gf.flight_server_methods\ndef my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:\n    opt1 = ctx.settings.get(f\"{_MY_CONFIG_SECTION}.opt1\")\n\n    # ... create and return server methods ...\n```\n\n### Authentication\n\nCurrently, the server supports two modes of authentication:\n\n- no authentication\n- token-based authentication, with the ability to plug in custom token verification logic\n\nThe token verification method that comes built-in with the server is a simple one: the token is\nan arbitrary, secret value shared between server and client. You configure the list of valid secret\ntokens at server start-up and then at your discretion distribute these secret values to clients.\n\nBy default, the server runs with no authentication. To turn on token-based authentication,\nyou have to:\n\n- Set the `authentication_method` setting to `token`.\n\n  By default, the server will use the built-in token verification strategy\n  called `EnumeratedTokenVerification`.\n\n- Configure the secret tokens.\n\n  You can do this using an environment variable: `GOODDATA_FLIGHT_ENUMERATED_TOKENS__TOKENS='[\"\", \"\"]'`.\n  Put the secret token(s) inside the quotes. Alternatively, you can code tokens into a configuration file\n  such as this:\n\n  ```toml\n  [enumerated_tokens]\n  tokens = [\"\", \"\"]\n  ```\n\n  IMPORTANT: never commit secrets to your VCS.\n\nWith this setup in place, the server will expect the Flight clients to include a token in the\n`authorization` header in the form `Bearer <token>`. 
The token must be present on every\ncall.\n\nHere is an example of how to make a call that includes the `authorization` header:\n\n```python\nimport pyarrow.flight\n\ndef example_call_using_tokens():\n    opts = pyarrow.flight.FlightCallOptions(headers=[(b\"authorization\", b\"Bearer <token>\")])\n    client = pyarrow.flight.FlightClient(\"grpc+tls://localhost:17001\")\n\n    for flight in client.list_flights(b\"\", opts):\n        print(flight)\n```\n\n## Developer Manual\n\nThis part of the documentation explains additional capabilities of the server.\n\n### Long-running tasks\n\nPart of this package is a component that you can use to generate Flight data using long-running\ntasks: the `TaskExecutor` component. The server will configure and create an instance of `TaskExecutor`\nat startup; your service can access it via `ServerContext`.\n\nThe `TaskExecutor` implementation wraps a `ThreadPoolExecutor`: you can configure the number of\nthreads available for your tasks using the `task_threads` setting. Each active task will use one thread from\nthis pool. If all threads are occupied, the tasks will be queued using a FIFO strategy.\n\nTo use the `TaskExecutor`, you have to encapsulate the Flight data generation logic into a class\nthat extends the `Task` interface. Here, in the `run()` method, you implement the necessary\nalgorithm that generates data.\n\nThe `Task` interface comes with a contract for how your code should return the result (data) or raise\nerrors. The `TaskExecutor` will hold onto the results generated by your task and retain them for\na configured amount of time (see the `task_result_ttl_sec` setting). 
The infrastructure recognizes that\nyour task may generate a result that can be consumed either repeatedly (say an Arrow Table) or just\nonce (say a RecordBatchReader backed by a live stream).\n\nHere is an example showing how to code a task, how to integrate its execution, and how to\nsend out the data that it generated:\n\n```python\nfrom typing import Union, Any\n\nimport pyarrow.flight\n\nimport gooddata_flight_server as gf\n\n\nclass MyServiceTask(gf.Task):\n    def __init__(\n            self,\n            task_specific_payload: Any,\n            cmd: bytes,\n    ):\n        super().__init__(cmd)\n\n        self._task_specific_payload = task_specific_payload\n\n    def run(self) -> Union[gf.TaskResult, gf.TaskError]:\n        # tasks support cancellation; your code can check for\n        # cancellation at any time; if the task was cancelled, the\n        # method will raise an exception.\n        #\n        # do not forget to do cleanup on cancellation\n        self.check_cancelled()\n\n        # ... 
do whatever is needed to generate the data\n        data: pyarrow.RecordBatchReader = some_method_to_generate_data()\n\n        # when the data is ready, wrap it in a result that implements\n        # the FlightDataTaskResult interface; there are built-in implementations\n        # to wrap Arrow Table or Arrow RecordBatchReader.\n        #\n        # you can write your own result if you need special handling\n        # of result and/or resources bound to the result.\n        return gf.FlightDataTaskResult.for_data(data)\n\n\nclass DataServiceMethods(gf.FlightServerMethods):\n    def __init__(self, ctx: gf.ServerContext) -> None:\n        self._ctx = ctx\n\n    def _prepare_flight_info(self, task_result: gf.TaskExecutionResult) -> pyarrow.flight.FlightInfo:\n        if task_result.error is not None:\n            raise task_result.error.as_flight_error()\n\n        if task_result.cancelled:\n            raise gf.ErrorInfo.for_reason(\n                gf.ErrorCode.COMMAND_CANCELLED,\n                f\"Service call was cancelled. 
Invocation task was: '{task_result.task_id}'.\",\n            ).to_server_error()\n\n        result = task_result.result\n\n        return pyarrow.flight.FlightInfo(\n            schema=result.get_schema(),\n            descriptor=pyarrow.flight.FlightDescriptor.for_command(task_result.cmd),\n            endpoints=[\n                pyarrow.flight.FlightEndpoint(\n                    ticket=pyarrow.flight.Ticket(ticket=task_result.task_id.encode()),\n                    locations=[self._ctx.location],\n                )\n            ],\n            total_records=-1,\n            total_bytes=-1,\n        )\n\n    def get_flight_info(\n            self,\n            context: pyarrow.flight.ServerCallContext,\n            descriptor: pyarrow.flight.FlightDescriptor,\n    ) -> pyarrow.flight.FlightInfo:\n        cmd = descriptor.command\n        # parse & validate the command\n        some_parsed_command = ...\n\n        # create your custom task; you will usually pass the parsed command\n        # so that the task knows what to do. The 'raw' command is required as well because\n        # it should be bounced back in the FlightInfo\n        task = MyServiceTask(task_specific_payload=some_parsed_command, cmd=cmd)\n        self._ctx.task_executor.submit(task)\n\n        # wait for the task to complete\n        result = self._ctx.task_executor.wait_for_result(task_id=task.task_id)\n\n        # once the task completes, create the FlightInfo or raise an exception in\n        # case the task failed. 
The ticket in the FlightInfo should contain the\n        # task identifier.\n        return self._prepare_flight_info(result)\n\n    def do_get(self,\n               context: pyarrow.flight.ServerCallContext,\n               ticket: pyarrow.flight.Ticket\n               ) -> pyarrow.flight.FlightDataStream:\n        # the caller comes to pick up the data; the ticket should be the task identifier\n        task_id = ticket.ticket.decode()\n\n        # this utility method on the base class takes care of everything needed\n        # to correctly create a FlightDataStream from the task result (or die trying\n        # in case the task result is no longer present, or the result indicates that\n        # the task has failed)\n        return self.do_get_task_result(context, self._ctx.task_executor, task_id)\n```\n\n### Custom token verification strategy\n\nAt the moment, the built-in token verification strategy supported by the server is the\nmost basic one. In cases where this strategy is not good enough, you can code your own\nand plug it into the server.\n\nThe `TokenVerificationStrategy` interface sets the contract for your custom strategy. 
You\nimplement this class inside a Python module and then tell the server to load that\nmodule.\n\nFor example, you create a module `my_service.auth.custom_token_verification` where you\nimplement the verification strategy:\n\n```python\nimport gooddata_flight_server as gf\nimport pyarrow.flight\nfrom typing import Any\n\n\nclass MyCustomTokenVerification(gf.TokenVerificationStrategy):\n    def verify(self, call_info: pyarrow.flight.CallInfo, token: str) -> Any:\n        # implement your arbitrary logic here;\n        #\n        # see method and class documentation to learn more\n        raise NotImplementedError\n\n    @classmethod\n    def create(cls, ctx: gf.ServerContext) -> \"gf.TokenVerificationStrategy\":\n        # your code has a chance to read any necessary settings from the `ctx.settings`\n        # property and then use those values to construct the class\n        #\n        # see method and class documentation to learn more\n        return MyCustomTokenVerification()\n```\n\nThen, you can use the `token_verification` setting to tell the server to look up\nand load the token verification strategy from the `my_service.auth.custom_token_verification` module.\n\nUsing a custom verification strategy, you can implement support for, say, JWT tokens, or look\nup valid tokens inside some database.\n\nNOTE: As is, the server infrastructure does not concern itself with how the clients actually\nobtain the valid tokens. At the moment, this is outside of this project's scope. You can distribute\ntokens to clients using some procedure or implement custom APIs where clients have to log in\nin order to obtain a valid token.\n\n### Logging\n\nThe server comes with `structlog` installed by default. The `structlog` library is configured\nso that it uses the Python stdlib logging backend. 
The `structlog` pipeline is set up so that:\n\n- In dev mode, the logs are pretty-printed to the console (achieved by the `--dev-log` option of the server)\n- In production deployments, the logs are serialized into JSON (using `orjson`), which is then written out.\n  This is ideal for consumption in log aggregators.\n\nBy default, the stdlib loggers are configured using the [default.logging.ini](./gooddata_flight_server/server/default.logging.ini)\nfile. In the default setup, all INFO-level logs are emitted.\n\nIf you want to customize the logging configuration, then:\n\n- make a copy of this file and tweak it as you need\n- either pass the path to your config file to the `create_server` function or use the `--logging-config`\n  argument on the CLI\n\nThe config file is the standard Python logging configuration file. You can learn about its intricacies\nin the Python documentation.\n\nNOTE: you typically do not want to touch the formatter settings inside the logging ini file - the\n`structlog` library creates the entire log line according to the deployment mode.\n\nThe use of `structlog` and loggers is fairly straightforward:\n\n```python\nimport structlog\n\n_LOGGER = structlog.get_logger(\"my_service\")\n_LOGGER.info(\"event-name\", some_event_key=\"value_to_log\")\n```\n\n#### Recommendations\n\nHere are a few assorted recommendations based on our production experience with `structlog`:\n\n- You can log complex objects such as lists, tuples, dicts, and data classes without a problem\n  - Be careful though. What can be serialized into the dev log may not always serialize\n    using `orjson` into production logs\n- Always log exceptions using the special [exc_info](https://www.structlog.org/en/stable/exceptions.html) event key.\n- Mind the cardinality of the logger instances. 
If you have a class of which you may have thousands of\n  instances, then it is **not a good idea** to create a logger instance for each instance of your class - even\n  if the logger name is the same; this is because each logger instance comes with memory overhead.\n\n### Prometheus Metrics\n\nThe server can be configured to start an HTTP endpoint that exposes the values of Prometheus\nmetrics. This is disabled by default.\n\nTo get started with Prometheus metrics, you need to:\n\n- Set `metrics_host` and `metrics_port`\n\n  - Check out the config file comments to learn more about these settings.\n  - What you have to remember is that the Prometheus scraper is an external process that\n    needs to reach the HTTP endpoint via the network.\n\nFrom then on, you can start using the Prometheus client to create various types of metrics. For example:\n\n```python\nfrom prometheus_client import Counter\n\n# instantiate counter\nMY_COUNTER = Counter(\n    \"my_counter\",\n    \"Fitting description of `my_counter`.\",\n)\n\ndef some_function():\n    # ...\n    MY_COUNTER.inc()\n```\n\n#### Recommendations\n\nHere are a few assorted recommendations based on our production experience:\n\n- You must avoid double-declaration of metrics. If you try to define a metric with the same\n  identifier twice, the registration will fail.\n\n- It is nice to declare all/most metrics in a single place. For example, create a `my_metrics.py`\n  file and in it have a `MyMetrics` class with one static field per metric.\n\n  This approach leads to better 'discoverability' of available metrics just by looking\n  at the code. Using a class with a static field per metric in turn makes imports and autocomplete\n  more convenient.\n\n### Open Telemetry\n\nThe server can be configured to integrate with OpenTelemetry and to start and auto-configure\nOpenTelemetry exporters. It will also auto-fill the ResourceAttributes by doing discovery where possible.\n\nSee the `otel_*` options in the configuration files to learn more. 
In a nutshell, it\ngoes as follows:\n\n- Configure which exporter to use with the `otel_exporter_type` setting.\n\n  Nowadays, `otlp-grpc` or `otlp-http` is the usual choice.\n\n  Depending on the exporter you use, you may/must specify additional, exporter-specific\n  environment variables to configure the exporter. The supported environment variables\n  are documented in the respective OpenTelemetry exporter package; i.e., they are not\n  something special to GoodData's Flight Server.\n\n  See the [official exporter documentation](https://opentelemetry-python.readthedocs.io/en/latest/exporter/otlp/otlp.html#module-opentelemetry.exporter.otlp.proto.grpc).\n\n- Install the respective exporter package.\n\n- Tweak the other `otel_*` settings: you must at a minimum set the `otel_service_name`.\n\n  The settings apart from `otel_service_name` will fall back to defaults.\n\nTo start tracing, you need to initialize a tracer. You can do so as follows:\n\n```python\nfrom opentelemetry import trace\n\nMY_TRACER: trace.Tracer = trace.ProxyTracer(\"my_tracer\")\n```\n\nTypically, you want to create one tracer instance for your entire data service and then\nimport that instance and use it wherever needed to create spans:\n\n```python\nfrom your_module_with_tracer import MY_TRACER\n\ndef some_function():\n    # ... code\n    with MY_TRACER.start_as_current_span(\"do_some_work\") as span:\n        # ... code\n        pass\n```\n\nNote: there are many ways to instrument your code with spans. See the [OpenTelemetry documentation](https://opentelemetry.io/docs/languages/python/instrumentation/)\nto find out more.\n\n#### Recommendations\n\nHere are a few assorted recommendations based on our production experience:\n\n- Always use the `ProxyTracer`. 
The underlying initialization code done by the server\n  will correctly set the actual tracer that will be called from the ProxyTracer.\n\n  This way, if you turn off OpenTelemetry (by commenting out the `otel_exporter_type` setting or setting it\n  to 'none'), the NoOpTracer will be injected under the covers and all the tracing code will\n  be a no-op as well.\n\n### Health Checks\n\nThe server comes with a basic health-checking infrastructure - this is especially useful\nwhen deploying to environments (such as k8s) that monitor the health of your server and can automatically\nrestart it in case of problems.\n\nWhen you configure the `health_check_host` (and optionally also `health_check_port`) setting, the\nserver will expose two HTTP endpoints:\n\n- `/ready` - indicates whether the server is up and ready to serve requests\n\n  The endpoint will respond with status `500` if the server is not ready. Otherwise, it will respond with\n  `202`. The server is deemed ready when all its modules are up and the Flight RPC server is\n  'unlocked' to handle requests.\n\n- `/live` - indicates whether the server is still alive and can be used. The liveness is determined\n  from the status of the modules.\n\n  Each of the server's modules can report its status to a central health checking service. If any of\n  the modules is unhealthy, the whole server is unhealthy.\n\n  Similar to the readiness endpoint, the server will respond with status `500` when not healthy. 
Otherwise, it\n  will respond with status `202`.\n\nCreating health-checks is fairly straightforward:\n\n- Your service's factory function receives the `ServerContext`\n\n  - The `ServerContext` contains the `health` property - which returns an instance of `ServerHealthMonitor`\n\n  - At this point, your code should hold onto / propagate the health monitor to any mission-critical\n    modules / components used by your implementation\n\n- The `ServerHealthMonitor` has a `set_module_status(module, status)` method - you can use this to indicate status\n\n  - The `module` name argument to this method can be anything you see fit\n  - The status is either `ModuleHealthStatus.OK` or `ModuleHealthStatus.NOT_OK`\n  - When your module is `NOT_OK`, the entire server is `NOT_OK`\n  - Usually, there is a grace period for which the server can be `NOT_OK`; after the time is up, the\n    environment will restart the server\n  - If you return your module back to `OK` status, the server returns to `OK` status as well - thus\n    avoiding the automatic restarts.\n\nHere is an example component using health monitoring:\n\n```python\nimport gooddata_flight_server as gf\n\nclass YourMissionCriticalComponent:\n    \"\"\"\n    Let's say this component is used to perform some heavy lifting / important job.\n\n    The component is created in your service's factory and is used during Flight RPC\n    invocation. 
You propagate the `health` monitor to it at construction time.\n    \"\"\"\n    def __init__(self, health: gf.ServerHealthMonitor) -> None:\n        self._health = health\n\n    def some_important_method(self):\n        try:\n            # this does some important work\n            return\n        except OSError:\n            # it runs into some kind of unrecoverable error (OSError here is purely an example);\n            # by setting the status to NOT_OK, your component indicates that it is unhealthy\n            # and the /live endpoint will report the entire server as unhealthy.\n            #\n            # usually, the liveness checks have a grace period. if you set the module back\n            # to `gf.ModuleHealthStatus.OK` everything turns healthy again. If the grace\n            # period elapses, the server will usually be restarted by the environment.\n            self._health.set_module_status(\"YourMissionCriticalComponent\", gf.ModuleHealthStatus.NOT_OK)\n            raise\n```\n\n## Troubleshooting\n\n### Clients cannot read data during GetFlightInfo->DoGet flow; getting DNS errors\n\nThe root cause here is usually a misconfiguration of `listen_host` and `advertise_host`.\n\nYou must always remember that `GetFlightInfo` returns a `FlightInfo` that is used\nby clients to obtain the data using `DoGet`. The `FlightInfo` contains the location(s)\nthat the client will connect to - they must be reachable by the client.\n\nThere are a few things to check:\n\n1. Ensure that your service implementation correctly sets the Location in the FlightInfo\n\n   Usually, you want to set the location to the value that your service implementation\n   receives in the `ServerContext`. This location is prepared by the server and contains\n   the value of `advertise_host` and `advertise_port`.\n\n2. Ensure that the `advertise_host` is set correctly; mistakes can happen easily, especially\n   in dockerized environments. 
The documentation of `listen_host` and `advertise_host`\n   has additional detail.\n\n   To highlight specifics of Dockerized deployment:\n\n   - The server most often needs to listen on `0.0.0.0`\n   - The server must, however, advertise a different hostname/IP - one that is reachable from\n     outside the container\n\n### The server's RSS keeps on growing; looks like the server is leaking memory\n\nThis can usually be observed on servers that are write-heavy: servers that handle a lot\nof `DoPut` or `DoExchange` requests. When such servers run in environments that enforce\nRSS limits, they can end up killed.\n\nOften, this is not a leak but a behavior of `malloc`. Even if you tell PyArrow to use\nthe `jemalloc` allocator, the underlying gRPC server used by Flight RPC will use `malloc`, and\nby default `malloc` will take its time returning unused memory back to the system.\n\nAnd since the gRPC server is responsible for allocating memory for the received Arrow data,\nit is often the `DoPut` or `DoExchange` workloads that look like they are leaking memory.\n\nIf the RSS size is a problem (say you are running the service inside k8s with a memory limit), the\nusual strategy is to:\n\n1. Set / tweak malloc behavior using the `GLIBC_TUNABLES` environment variable; reduce\n   the malloc trim threshold and possibly also reduce the number of malloc arenas\n\n   Here is a quite aggressive setting: `GLIBC_TUNABLES=\"glibc.malloc.trim_threshold=4:glibc.malloc.arena_max=2:glibc.malloc.tcache_count=0\"`\n\n2. Periodically call `malloc_trim` to poke malloc to trim any unneeded allocations and\n   return them to the system.\n\n   The GoodData Flight server already implements periodic malloc trim. By default, the interval\n   is set to `30 seconds`. 
You can change this interval using the `malloc_trim_interval_sec`\n   setting.\n\nAdditionally, we recommend reading up on [Python Memory Management](https://realpython.com/python-memory-management/) -\nespecially the part about CPython not returning unused blocks back to the system. This may be another reason for\nRSS growth - the tricky bit here being that it really depends on object creation patterns in your service.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Flight RPC server to host custom functions",
    "version": "1.33.0",
    "project_urls": {
        "Documentation": "https://gooddata-flight-server.readthedocs.io/en/v1.33.0",
        "Source": "https://github.com/gooddata/gooddata-python-sdk"
    },
    "split_keywords": [
        "gooddata",
        " flight",
        " rpc",
        " flight rpc",
        " custom functions",
        " analytics",
        " headless",
        " business",
        " intelligence",
        " headless-bi",
        " cloud",
        " native",
        " semantic",
        " layer",
        " sql",
        " metrics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "14d5d136c65742438ef937063ec32fc04d808210c3ee8181deabeddb7a22e784",
                "md5": "8e3d8d1e6d32c2ce3eb486fd700a212e",
                "sha256": "bb2137812f7d3498bd028bd76e7177996e37c9e9648e0ddfb6ea21b6b1c5497b"
            },
            "downloads": -1,
            "filename": "gooddata_flight_server-1.33.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8e3d8d1e6d32c2ce3eb486fd700a212e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9.0",
            "size": 76676,
            "upload_time": "2024-12-12T12:30:37",
            "upload_time_iso_8601": "2024-12-12T12:30:37.567169Z",
            "url": "https://files.pythonhosted.org/packages/14/d5/d136c65742438ef937063ec32fc04d808210c3ee8181deabeddb7a22e784/gooddata_flight_server-1.33.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df8c3bd2ca2f28cac9fa050593e1e4a88ce3f82ae6bd2a0d00d6d25f6d6fe79b",
                "md5": "c019ea0e33e97bec3f0425e31be30119",
                "sha256": "eb1db5ffae81d47f78945c7a97c095fcacee2b43b80614702dd283128fb2bd86"
            },
            "downloads": -1,
            "filename": "gooddata_flight_server-1.33.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c019ea0e33e97bec3f0425e31be30119",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9.0",
            "size": 70430,
            "upload_time": "2024-12-12T12:30:40",
            "upload_time_iso_8601": "2024-12-12T12:30:40.046809Z",
            "url": "https://files.pythonhosted.org/packages/df/8c/3bd2ca2f28cac9fa050593e1e4a88ce3f82ae6bd2a0d00d6d25f6d6fe79b/gooddata_flight_server-1.33.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-12 12:30:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "gooddata",
    "github_project": "gooddata-python-sdk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "gooddata-flight-server"
}
        