crunch-convert

Name	crunch-convert
Version	0.3.3
home_page	https://github.com/crunchdao/crunch-convert
Summary	crunch-convert - Conversion module for the CrunchDAO Platform
upload_time	2025-10-07 18:46:33
maintainer	None
docs_url	None
author	Enzo CACERES
requires_python	>=3
license	None
keywords	package development template
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Crunch Convert Tool

[![PyTest](https://github.com/crunchdao/crunch-convert/actions/workflows/pytest.yml/badge.svg)](https://github.com/crunchdao/crunch-convert/actions/workflows/pytest.yml)

This Python library is designed for the [CrunchDAO Platform](https://hub.crunchdao.com/), exposing the conversion tools in a very small CLI.

- [Crunch Convert Tool](#crunch-convert-tool)
- [Installation](#installation)
- [Usage](#usage)
  - [Convert a Notebook](#convert-a-notebook)
  - [Freeze Requirements](#freeze-requirements)
- [Features](#features)
  - [Automatic line commenting](#automatic-line-commenting)
  - [Specifying package versions](#specifying-package-versions)
  - [R imports via rpy2](#r-imports-via-rpy2)
  - [Embedded Files](#embedded-files)
- [Contributing](#contributing)
- [License](#license)

# Installation

Use [pip](https://pypi.org/project/crunch-convert/) to install `crunch-convert`.

```bash
pip install --upgrade crunch-convert
```

# Usage

## Convert a Notebook

```bash
crunch-convert notebook ./my-notebook.ipynb --write-requirements --write-embedded-files
```

<details>
<summary>Show a programmatic way</summary>

```python
from crunch_convert.notebook import extract_from_file
from crunch_convert.requirements_txt import CrunchHubWhitelist, format_files_from_imported

flatten = extract_from_file("notebook.ipynb")

# Write the main.py
with open("main.py", "w") as fd:
  fd.write(flatten.source_code)

# Map the imported requirements using the Crunch Hub's whitelist
whitelist = CrunchHubWhitelist()
requirements_files = format_files_from_imported(
  flatten.requirements,
  header="extracted from a notebook",
  whitelist=whitelist,
)

# Write the requirements.txt files (Python and/or R)
for requirement_language, content in requirements_files.items():
  with open(requirement_language.txt_file_name, "w") as fd:
    fd.write(content)

# Write the embedded files
for embedded_file in flatten.embedded_files:
  with open(embedded_file.normalized_path, "w") as fd:
    fd.write(embedded_file.content)
```
</details>

## Freeze Requirements

<details>
<summary>Show a programmatic way</summary>

```python
from crunch_convert import RequirementLanguage
from crunch_convert.requirements_txt import CrunchHubVersionFinder, CrunchHubWhitelist, format_files_from_named, freeze, parse_from_file

whitelist = CrunchHubWhitelist()
version_finder = CrunchHubVersionFinder()

# Open the requirements.txt to freeze
with open("requirements.txt", "r") as fd:
    content = fd.read()

# Parse it into NamedRequirement
requirements = parse_from_file(
    language=RequirementLanguage.PYTHON,
    file_content=content
)

# Freeze them
frozen_requirements = freeze(
    requirements=requirements,

    # Only freeze if required by the whitelist
    freeze_only_if_required=True,
    whitelist=whitelist,

    version_finder=version_finder,
)

# Format the new requirements.txt using now frozen requirements
frozen_requirements_files = format_files_from_named(
    frozen_requirements,
    header="frozen from registry",
    whitelist=whitelist,
)

# Write to the new file
with open("requirements.frozen.txt", "w") as fd:
    content = frozen_requirements_files[RequirementLanguage.PYTHON]
    fd.write(content)
```
</details>

> [!TIP]
> The output of `format_files_from_imported()` can be re-parsed right away; there is no need to store it in a file first.

# Features

## Automatic line commenting

Only functions, imports, and classes are kept.

Everything else is commented out to prevent side effects when your code is loaded into the cloud environment (e.g. when you're exploring the data, debugging your algorithm, or visualizing results with Matplotlib).

You can prevent this behavior by using special comments to tell the system to keep part of your code:

- To start a section that you want to keep, write: `@crunch/keep:on`
- To end the section, write: `@crunch/keep:off`

```python
# @crunch/keep:on

# keep global initialization
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# keep constants
TRAIN_DEPTH = 42
IMPORTANT_FEATURES = [ "a", "b", "c" ]

# @crunch/keep:off

# this will be commented out
x, y = crunch.load_data()

def train(...):
    ...
```

The result will be:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

TRAIN_DEPTH = 42
IMPORTANT_FEATURES = [ "a", "b", "c" ]

#x, y = crunch.load_data()

def train(...):
    ...
```

> [!TIP]
> You can put a `@crunch/keep:on` at the top of the cell and never close it to keep everything.
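
To make the rule above concrete, here is a minimal sketch of how such a pass could work using only the standard `ast` module. It is not the library's implementation: the function name `comment_out_side_effects` and the constants are hypothetical, and unlike the result shown above, this sketch leaves the `@crunch/keep` marker comments in place.

```python
import ast

# Marker comments described in this section.
KEEP_ON = "@crunch/keep:on"
KEEP_OFF = "@crunch/keep:off"

# Top-level statement types that are always kept.
KEPT_NODES = (
    ast.Import,
    ast.ImportFrom,
    ast.FunctionDef,
    ast.AsyncFunctionDef,
    ast.ClassDef,
)


def comment_out_side_effects(source: str) -> str:
    """Hypothetical helper: comment out top-level statements that may have side effects."""
    lines = source.splitlines()

    # Collect the 1-indexed line numbers located inside a keep:on / keep:off section.
    kept_lines = set()
    keeping = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("#") and KEEP_ON in stripped:
            keeping = True
        elif stripped.startswith("#") and KEEP_OFF in stripped:
            keeping = False
        elif keeping:
            kept_lines.add(number)

    # Mark every line of the remaining top-level statements for commenting.
    to_comment = set()
    for node in ast.parse(source).body:
        if isinstance(node, KEPT_NODES):
            continue
        for number in range(node.lineno, node.end_lineno + 1):
            if number not in kept_lines:
                to_comment.add(number)

    return "\n".join(
        "#" + line if number in to_comment else line
        for number, line in enumerate(lines, start=1)
    )
```

Calling `comment_out_side_effects("x = 1\ndef f():\n    return 1")` prefixes only the first line with `#`, because the assignment is neither an import, a function, nor a class.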

## Specifying package versions

Since submitting a notebook does not include a `requirements.txt`, users can instead specify the version of a package using import-level [requirement specifiers](https://pip.pypa.io/en/stable/reference/requirement-specifiers/#examples) in a comment on the same line.

```python
# Valid statements
import pandas # == 1.3
import sklearn # >= 1.2, < 2.0
import tqdm # [foo, bar]
import sklearn # ~= 1.4.2
from requests import Session # == 1.5
```

If the same package is given a version specifier more than once and the specifiers differ, the submission will be rejected.

```python
# Inconsistent versions will be rejected
import pandas # == 1.3
import pandas # == 1.5
```
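
The consistency rule can be illustrated with a small, standard-library-only sketch. The names `IMPORT_LINE` and `collect_specifiers` are hypothetical and not part of `crunch-convert`, and this simplified version treats any trailing comment on an import line as a specifier, which the real tool does not necessarily do.

```python
import re

# Matches `import name  # comment` and `from name import x  # comment`.
IMPORT_LINE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)[^#]*#\s*(.+?)\s*$")


def collect_specifiers(source: str) -> dict:
    """Hypothetical helper: map each top-level import name to its specifier comment."""
    specifiers = {}
    for line in source.splitlines():
        match = IMPORT_LINE.match(line)
        if match is None:
            continue  # not an import line, or no trailing comment
        name, specifier = match.groups()
        # Drop whitespace so that `== 1.3` and `==1.3` compare equal.
        specifier = re.sub(r"\s+", "", specifier)
        previous = specifiers.setdefault(name, specifier)
        if previous != specifier:
            raise ValueError(
                f"inconsistent versions for {name}: {previous} and {specifier}"
            )
    return specifiers
```

With this sketch, repeating `import pandas # == 1.3` twice is accepted, while following it with `import pandas # == 1.5` raises a `ValueError`.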

Specifying a version on a standard library module has no effect (but the submission will still be rejected if the same module is given inconsistent versions).

```python
# Will be ignored
import os # == 1.3
import sys # == 1.5
```

If an optional dependency is required for the code to work properly, an import statement must be added, even if the code does not use it directly.

```python
import castle.algorithms

# Keep me, I am needed by castle
import torch
```

The same import name can be provided by several different libraries on PyPI. If this happens, you must specify which one you want. If you do not need a specific version, use `@latest`; without it, the tool cannot distinguish between commented code and version specifiers.

```python
# Prefer https://pypi.org/project/EMD-signal/
import pyemd # EMD-signal @latest

# Prefer https://pypi.org/project/pyemd/
import pyemd # pyemd @latest
```

## R imports via rpy2

For notebook users, R packages are automatically extracted from `importr("<name>")` calls; the `importr` function is provided by [rpy2](https://rpy2.github.io/).

```python
# Import the `importr` function
from rpy2.robjects.packages import importr

# Import the "base" R package
base = importr("base")
```

The following format must be followed:
- The import must be declared at the root level.
- The result must be assigned to a variable; the variable's name will not matter.
- The function name must be `importr`, and it must be imported as shown in the example above.
- The first argument must be a string constant; variables and other expressions will be ignored.
- The other arguments are ignored; this allows for [custom import mapping](https://rpy2.github.io/doc/latest/html/robjects_rpackages.html#importing-r-packages) if necessary.

The line will not be commented out; [read more about line commenting here](#automatic-line-commenting).
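
The rules above can be sketched with the standard `ast` module. This is an illustration under stated assumptions, not the library's code: the function name `find_r_packages` is hypothetical, and the sketch does not verify that `importr` was imported from `rpy2.robjects.packages`.

```python
import ast


def find_r_packages(source: str) -> list:
    """Hypothetical helper: list R packages from root-level `name = importr("pkg")` lines."""
    packages = []
    for node in ast.parse(source).body:  # root-level statements only
        if not isinstance(node, ast.Assign):
            continue  # the result must be assigned to a variable
        call = node.value
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            continue
        if call.func.id != "importr" or not call.args:
            continue
        first = call.args[0]
        # Only string constants count; variables and expressions are skipped.
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            packages.append(first.value)
    return packages
```

Because the source is only parsed and never executed, rpy2 does not need to be installed to run this sketch.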

## Embedded Files

Additional files can be embedded in cells and submitted with the notebook. For the system to recognize a cell as an embedded file, the cell must use the following syntax:

```
---
file: <file_name>.md
---

<!-- File content goes here -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aenean rutrum condimentum ornare.
```

A submission containing multiple cells with the same file name will be rejected.

While the focus is on Markdown files, any text file will be accepted, including but not limited to `.txt`, `.yaml`, and `.json`.
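
A minimal sketch of how a cell with this header could be recognized is shown below. The regular expression and the helpers `parse_embedded_cell` and `collect_embedded_files` are hypothetical; the actual parser in `crunch-convert` may accept a different header grammar.

```python
import re

# A cell counts as an embedded file when it starts with a `---` block holding a `file:` key.
HEADER = re.compile(r"\A---\n\s*file:\s*(?P<name>.+?)\s*\n---\n?")


def parse_embedded_cell(cell_source: str):
    """Hypothetical helper: return (file_name, content), or None for a regular cell."""
    match = HEADER.match(cell_source)
    if match is None:
        return None
    # Everything after the header is the file content.
    content = cell_source[match.end():].lstrip("\n")
    return match.group("name"), content


def collect_embedded_files(cells: list) -> dict:
    """Hypothetical helper: gather embedded files and reject duplicate file names."""
    files = {}
    for cell in cells:
        parsed = parse_embedded_cell(cell)
        if parsed is None:
            continue
        name, content = parsed
        if name in files:
            raise ValueError(f"file specified multiple times: {name}")
        files[name] = content
    return files
```

Raising on the second occurrence mirrors the rule that a file name may only be used by one cell.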

# Contributing

Pull requests are always welcome! If you find any issues or have suggestions for improvements, please feel free to submit a pull request or open an issue in the GitHub repository.

# License

[MIT](https://choosealicense.com/licenses/mit/)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/crunchdao/crunch-convert",
    "name": "crunch-convert",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3",
    "maintainer_email": null,
    "keywords": "package development template",
    "author": "Enzo CACERES",
    "author_email": "enzo.caceres@crunchdao.com",
    "download_url": "https://files.pythonhosted.org/packages/db/d0/d9c1e3c5a8a108e5cd01923dd8e14d20b65ad2907888fe2c1e050e40e77a/crunch_convert-0.3.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "crunch-convert - Conversion module for the CrunchDAO Platform",
    "version": "0.3.3",
    "project_urls": {
        "Homepage": "https://github.com/crunchdao/crunch-convert"
    },
    "split_keywords": [
        "package",
        "development",
        "template"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5b24b36f001dc60be01851122e1602bc962f077f751dad7dfa6615dff6b14095",
                "md5": "aa3d33cbf85f0622abe319e40dea9d2e",
                "sha256": "aa60b312b203d8fec65570141bf605ed405c8d76d7045e2eca82badf68883a73"
            },
            "downloads": -1,
            "filename": "crunch_convert-0.3.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "aa3d33cbf85f0622abe319e40dea9d2e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3",
            "size": 20579,
            "upload_time": "2025-10-07T18:46:32",
            "upload_time_iso_8601": "2025-10-07T18:46:32.298248Z",
            "url": "https://files.pythonhosted.org/packages/5b/24/b36f001dc60be01851122e1602bc962f077f751dad7dfa6615dff6b14095/crunch_convert-0.3.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dbd0d9c1e3c5a8a108e5cd01923dd8e14d20b65ad2907888fe2c1e050e40e77a",
                "md5": "3083f0c3c795f3ce203ddb4666024d9e",
                "sha256": "a3f83eee7b57e8f89d26cfa835a0772127e9ac67df24437b94ad45dcb8b56160"
            },
            "downloads": -1,
            "filename": "crunch_convert-0.3.3.tar.gz",
            "has_sig": false,
            "md5_digest": "3083f0c3c795f3ce203ddb4666024d9e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3",
            "size": 19076,
            "upload_time": "2025-10-07T18:46:33",
            "upload_time_iso_8601": "2025-10-07T18:46:33.412399Z",
            "url": "https://files.pythonhosted.org/packages/db/d0/d9c1e3c5a8a108e5cd01923dd8e14d20b65ad2907888fe2c1e050e40e77a/crunch_convert-0.3.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-07 18:46:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "crunchdao",
    "github_project": "crunch-convert",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "crunch-convert"
}
