omero2pandas


Name: omero2pandas
Version: 0.3.0
Home page: https://github.com/glencoesoftware/omero2pandas
Summary: OMERO.tables to pandas bridge
Upload time: 2024-08-26 15:10:04
Author: Glencoe Software, Inc.
Requires Python: >=3.6

# omero2pandas

A convenience package to download data from OMERO.tables into Pandas dataframes.

# Installation

omero2pandas can be installed with pip on Python 3.6+:
```
pip install omero2pandas
```
omero2pandas also supports authentication using tokens generated by `omero-user-token`. 
Compatible versions can be installed as follows:
```
pip install omero2pandas[token]
```
See the [omero-user-token documentation](https://github.com/glencoesoftware/omero-user-token) for more information.


# Usage

```python
import omero2pandas
df = omero2pandas.read_table(file_id=402)
df.head()
```

Tables can be referenced based on their OriginalFile's ID or their Annotation's ID. 
These can be easily obtained by hovering over the relevant table in OMERO.web, which shows a tooltip with these IDs.

To avoid loading data directly into a dataframe, you can also download directly into a CSV:

```python
import omero2pandas
omero2pandas.download_table("/path/to/output.csv", file_id=2, chunk_size=1000)
```

`chunk_size` can be specified when both reading and downloading tables. It determines 
how many rows are loaded from the server in a single operation.
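
As a rough illustration of what `chunk_size` controls, the sketch below splits a table's rows into the consecutive ranges that would each be fetched in one operation. The `chunk_ranges` helper is hypothetical and not part of omero2pandas; it only demonstrates the arithmetic, and the package's internal batching may differ.

```python
def chunk_ranges(total_rows, chunk_size):
    """Yield (start, stop) row ranges covering total_rows in steps of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, total_rows, chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, total_rows)


# A 2,500-row table read with chunk_size=1000 needs three round trips.
ranges = list(chunk_ranges(2500, 1000))
print(ranges)
```

Smaller chunks mean more round trips but less memory held per request, so the best value depends on table width and network latency.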

## Supplying credentials

Multiple modes of connecting to OMERO are supported. If you're already familiar with omero-py, you can supply a premade client:

```python
import omero
import omero2pandas
my_client = omero.client(host="myserver", port=4064)
df = omero2pandas.read_table(file_id=402, omero_connector=my_client)
df.head()
```

Alternatively, your connection and login details can be provided via arguments:

```python
import omero2pandas
df = omero2pandas.read_table(file_id=402, server="omero.mysite.com", port=4064,
                             username="myuser", password="mypass")
df.head()
```

If you have `omero_user_token` installed, an existing token will be automatically detected and used to connect:

```python
import omero2pandas
df = omero2pandas.read_table(file_id=402)
df.head()
```

You can also generate the connection object separately using the built-in wrapper:
```python
import omero2pandas
connector = omero2pandas.connect_to_omero(server="myserver", port=4064)
# User will be prompted for any missing connection info. 

df = omero2pandas.read_table(file_id=402, omero_connector=connector)
df.head()
```

When connection information is missing, omero2pandas detects whether it is 
running in a Jupyter environment. If so, a login widget is shown for you to complete the details.
Otherwise, you will be prompted on the command line.

This behaviour can be disabled by supplying `interactive=False` to the connect call.
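
For readers curious how this kind of environment detection can work, here is a minimal sketch of one common heuristic: checking whether the active IPython shell is a Jupyter kernel. This is an assumption for illustration only; the `running_in_jupyter` function is hypothetical and omero2pandas may detect notebooks differently.

```python
import sys


def running_in_jupyter():
    """Best-effort check for a Jupyter kernel (a heuristic, not the package's own logic)."""
    ipython = sys.modules.get("IPython")
    if ipython is None:
        # IPython was never imported, so this cannot be a notebook kernel.
        return False
    shell = ipython.get_ipython()
    # Jupyter kernels use ZMQInteractiveShell; terminal IPython uses a different class.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


prompt_style = "widget" if running_in_jupyter() else "command line"
print(prompt_style)
```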

## Reading data

Several utility methods are provided for working with OMERO.tables. These all support the full range of connection modes.

Fetch the names of the columns in a table:
```python
import omero2pandas
columns = omero2pandas.get_table_columns(annotation_id=142)
# Returns a list of column names
```

Fetch the dimensions of a table:
```python
import omero2pandas
num_rows, num_cols = omero2pandas.get_table_size(annotation_id=12)
# Returns a tuple containing row and column count.
```


You can read out specific rows and/or columns:
```python
import omero2pandas
my_dataframe = omero2pandas.read_table(file_id=10, 
                                       column_names=['object', 'intensity'],
                                       rows=range(0, 100, 10))
my_dataframe.head()
# Returns the object and intensity columns for every 10th row among the first 100 rows
```

Returned dataframes also come with a pandas index column, representing the original row numbers from the OMERO.table.
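
To make the row selection concrete, the snippet below lists the row numbers requested by `range(0, 100, 10)`. Given the note on the index, these are the values you would expect to find in the returned dataframe's index, assuming the table has at least 91 rows.

```python
# The same selection passed as rows= in the read_table example.
rows = range(0, 100, 10)

selected = list(rows)
print(selected)       # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
print(len(selected))  # 10
```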

## Writing data

Pandas dataframes can also be written back as new OMERO.tables.
N.b. It is currently not possible to modify a table on the server.

Connection handling works just as it does when downloading: you can 
provide credentials, a token or a connection object.

To upload data, the user needs to specify which OMERO object(s) the table
will be associated with. This can be achieved with the `parent_id` and 
`parent_type` arguments. Supported objects are Dataset, 
Well, Plate, Project, Screen and Image.

```python
import pandas
import omero2pandas
my_data = pandas.read_csv("/path/to/my_data.csv")
ann_id = omero2pandas.upload_table(my_data, "Name for table", 
                                   parent_id=142, parent_type="Image")
# Returns the annotation ID of the uploaded FileAnnotation object
```

Once uploaded, the table will be accessible on OMERO.web under the file 
annotations panel of the parent object. Using unique table names is advised.

### Linking to multiple objects

To link to multiple objects, you can supply a list of `(<type>, <id>)`
tuples to the `links` parameter. The resulting table's FileAnnotation 
will be linked to all objects in the `links` parameter (plus 
`parent_type`:`parent_id` if provided).


```python
import omero2pandas
ann_id = omero2pandas.upload_table(
    "/path/to/my.csv", "My table", 
    links=[("Image", 101), ("Dataset", 2), ("Roi", 1923)])
# Uploads with Annotation links to Image 101, Dataset 2 and ROI 1923 
```

Links allow OMERO.web to display the resulting table as 
an annotation associated with those objects.
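
When linking one table to many objects, it can be convenient to build the list of `(<type>, <id>)` tuples programmatically. The sketch below is a hypothetical helper (`build_links` is not part of omero2pandas) that produces a list in the shape described above.

```python
def build_links(image_ids, dataset_ids=()):
    """Return a list of (type, id) tuples in the shape expected by a links argument."""
    links = [("Image", int(i)) for i in image_ids]
    links.extend(("Dataset", int(i)) for i in dataset_ids)
    return links


links = build_links([101, 102, 103], dataset_ids=[2])
print(links)
```

The resulting list could then be passed as `links=links` to `upload_table`.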


### Large Tables
The first argument to `upload_table` can be a pandas dataframe or a path to a 
.csv file containing the table data. In the latter case the table will be read 
in chunks corresponding to the `chunk_size` argument. This will allow you to 
upload tables which are too large to load into system memory.

```python
import omero2pandas
ann_id = omero2pandas.upload_table("/path/to/my.csv", "My table", 
                                   142, chunk_size=100)
# Reads and uploads the file to Image 142, loading 100 lines at a time 
```

The `chunk_size` argument sets how many rows to send with each call to the server. 
If not specified, omero2pandas will attempt to automatically optimise chunk 
size to send ~2 million table cells per call (up to a max of 50,000 
rows per message for narrow tables).
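
The automatic sizing rule can be approximated as follows. This is a sketch based only on the figures stated above (about 2 million cells per call, capped at 50,000 rows); `estimate_chunk_size` is a hypothetical function and the package's actual calculation may differ.

```python
TARGET_CELLS = 2_000_000  # approximate cells per call, from the text above
MAX_ROWS = 50_000         # row cap per message, from the text above


def estimate_chunk_size(n_columns):
    """Approximate rows per call for a table with n_columns columns."""
    if n_columns <= 0:
        raise ValueError("n_columns must be a positive integer")
    rows = TARGET_CELLS // n_columns
    # Never exceed the row cap, and always send at least one row.
    return max(1, min(rows, MAX_ROWS))


for n in (10, 100, 1000):
    print(n, estimate_chunk_size(n))
```

Under these assumptions a 10-column table hits the 50,000-row cap, while a 1,000-column table would be sent 2,000 rows at a time.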




# Advanced Usage

This package also contains utility functions for managing an OMERO connection.

`omero2pandas.connect_to_omero()` takes many of the arguments from the other functions and returns an `OMEROConnection` object.

The `OMEROConnection` handles your OMERO login and session, cleaning everything up automatically on exit. It also provides accessor methods for commonly used API objects:

```python
import omero2pandas
connector = omero2pandas.OMEROConnection()
connector.connect()
client = connector.get_client()
blitz = connector.get_gateway()
```
When a client is active within the `OMEROConnection` object, calls to this wrapper class will also be forwarded directly to the client object.

OMEROConnection objects can also be used as a context manager:
```python
import omero2pandas
with omero2pandas.OMEROConnection(server='my.server', port=4064, 
                                  username='test.user') as connector:
    blitz = connector.get_gateway()
    image = blitz.getObject('Image', id=100)
    # Continue using the standard OMERO API.
```

The context manager will handle session creation and cleanup automatically.
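
The context-manager behaviour follows the standard Python protocol. The sketch below uses a hypothetical stand-in class (`DemoConnection`, which never contacts a server) to show the general pattern of connecting on entry and shutting down on exit; it is not the `OMEROConnection` implementation.

```python
class DemoConnection:
    """Hypothetical stand-in for a connector; it never contacts a server."""

    def __init__(self):
        self.connected = False

    def connect(self):
        self.connected = True

    def shutdown(self):
        self.connected = False

    def __enter__(self):
        # Session creation happens on entry to the with-block.
        self.connect()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Cleanup runs even if the block raised an exception.
        self.shutdown()
        return False  # do not suppress exceptions


with DemoConnection() as conn:
    state_inside = conn.connected
state_after = conn.connected
print(state_inside, state_after)
```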

### Connection Management

omero2pandas keeps track of any active connector objects and shuts them down 
safely when Python exits. Deleting all references to a connector will also 
handle closing the connection to OMERO gracefully. You can also call 
`connector.shutdown()` to close a connection manually.

By default, omero2pandas also keeps active connections alive by pinging the 
server once per minute (otherwise the session may time out and require 
reconnecting). This can be disabled as follows:

```python
omero2pandas.connect_to_omero(keep_alive=False)
```

N.b. omero2pandas uses a different system from the native OMERO API's
`client.enableKeepAlive` function, so there is no need to use both.

### Querying tables

You can also supply [PyTables condition syntax](https://www.pytables.org/usersguide/condition_syntax.html) to the `read_table` and `download_table` functions.
Returned tables will only include rows which pass this filter.

**Basic syntax**

Select rows representing objects with area greater than 20:
```python
omero2pandas.read_table(file_id=10, query='(area>20)')
```

**Multiple conditions**

Select rows representing objects with an even ID number lower than 50:
```python
omero2pandas.read_table(file_id=10, query='(id%2==0) & (id<50)')
```
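
If you assemble filters programmatically, a small helper can keep the parentheses consistent. This is a hypothetical convenience function (`combine_conditions` is not part of omero2pandas) that joins condition strings using the `&` and `|` operators from the PyTables condition syntax.

```python
def combine_conditions(conditions, operator="&"):
    """Join condition strings with & (and) or | (or), parenthesising each one."""
    if operator not in ("&", "|"):
        raise ValueError("operator must be '&' or '|'")
    return f" {operator} ".join(f"({c})" for c in conditions)


query = combine_conditions(["id%2==0", "id<50"])
print(query)  # (id%2==0) & (id<50)
```

The resulting string could then be passed as the `query` argument.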

**Complex conditions** 

Select rows representing objects whose 'Roi Name' value is not 'Nucleus' (note that the query uses `!=`):
```python
import omero.rtypes
omero2pandas.read_table(file_id=10, query='x!="Nucleus"', variables={'x': omero.rtypes.rstring('Roi Name')})
```

N.b. Column names containing spaces aren't supported by the native syntax, but they can be passed in through the `variables` parameter instead.

The variables map needs to be a dictionary mapping string variables to [OMERO rtypes](https://omero.readthedocs.io/en/v5.6.9/developers/GettingStarted/AdvancedClientDevelopment.html#rtypes) objects rather than raw Python objects. 
These should match the relevant column type. Mapped variables are substituted into the query during processing.

A `variables` map usually isn't needed for simple queries. Values in the basic condition string should be converted to a suitable type automatically, but when this fails, 
replacing the problematic elements with variables may help.

            
