scilint


- Name: scilint
- Version: 0.2.4
- Home page: https://github.com/newday-data/scilint/tree/{branch}/
- Summary: Infuse quality into notebook-based workflows with a new type of build tool.
- Upload time: 2023-10-31 15:33:52
- Author: Donal Simmie
- Requires Python: >=3.8
- License: Apache Software License 2.0
- Keywords: research, production, exploration, CI/CD

# 🧐 `scilint`

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
<p align="center">
<a href="https://badge.fury.io/py/scilint">
<img src="https://badge.fury.io/py/scilint.svg" alt="Pypi Package"> </a>
</p>

`scilint` aims to **bring a style and quality standard into notebook
based Data Science workflows**. Defining what makes a quality notebook
is difficult and somewhat subjective. The obvious meaning is being free
of bugs, but legibility and ease of comprehension are important too.

`scilint` breaks a notebook down into aspects that are potentially
relevant to quality and provides what we believe are sensible defaults
that may correlate with higher quality workflows. Users can also define
the quality line as they see fit by configuring the existing thresholds
and, in future, by adding new metrics (coming soon). As use of the
library grows we anticipate being able to statistically relate some of
these quality-relevant attributes to key delivery metrics like “change
failure rate” or “lead time to production”.

# 🤔 Why do I need quality notebooks?

*If you prefer to move out of notebook-based workflows after exploration
and into an IDE+Python mix, I encourage you to have another ponder on
the benefits of staying in a notebook-based workflow. Notebooks have a
strong visual emphasis and proximity to data. They are also the primary
axis of change within Data Science: new ideas are gained from diving
into data. So instead of packing up your code and re-writing it
elsewhere, with all the waste that entails, bring quality to your
exploration workflow and spend more time building stuff that matters.*

If you’re still not convinced, watch this
[video](https://www.youtube.com/watch?v=9Q6sLbz37gk) where **Jeremy
Howard** does a far better job of explaining why notebooks are for
serious development too!

# ✅ What is Notebook Quality?

This is a good question and this library does not pretend to have the
answer. But we feel the problem space is worth exploring because high
quality deliveries mean lower time-to-market and less time spent on
re-work or rote porting of code, which frees people up to think about
and solve hard problems. That said, there are some practices that we
have observed producing “better” notebook workflows, from experience in
production Data Science teams. These are:

- **Extracting code to small modular functions**
- **Testing those functions work in a variety of scenarios**
- Putting sufficient **emphasis on legibility and ease of
  comprehension** through adequate use of markdown

These are the starting premises that allow the notebook quality
conversation to begin. To bring this to life a little, we would say that
**the notebook on the left is of lower quality than the notebook on the
right**.

<p align="center">
<img src="nbs/images/scilint_before_after_prep.png" alt="Low vs High" width="738" border="3px solid black">
</p>

# 🚀 Getting Started

> Please note `scilint` is currently only tested on Linux and macOS.

## Install

`pip install scilint`

## Commands

### **[`scilint_lint`](https://newday-data.github.io/scilint/scilint.html#scilint_lint)**

Exposes potential quality issues within your notebook using some
pre-defined checks. Default threshold values for these checks are
provided that will enable a build to be marked as passed or failed.

<details>
<summary>
<b>Show parameters</b>
</summary>

#### `--fail_over`

> For now a very basic failure threshold is set by providing a number of
> warnings that will be accepted without failing the build. The default
> is 1 but this can be increased via the `--fail_over` parameter. As the
> library matures we will revisit adding more nuanced options.

#### `--exclusions`

> You can exclude individual notebooks or directories using the
> `--exclusions` parameter. This is a comma-separated list of paths
> where you can provide directories like “dir/” or specific notebooks
> like “somenotebook.ipynb”.

#### `--display_report`

> Print the lint warnings report as a markdown formatted table.

#### `--out_dir`

> Directory to persist the lint_report, warning_violations and the
> configuration used.

#### `--print_syntax_errors`

> The code is parsed using the `ast` module. If that parsing fails due
> to syntax errors, this is noted in the warning report but the exact
> syntax error is not provided. With this flag, the `SyntaxError` reason
> message that caused notebook parsing to fail is printed to the screen
> for each offending notebook.

</details>
<p align="center">
<img src="nbs/images/scilint_lint.png" alt="scilint_lint" width="738" border="3px solid black">
</p>
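
The syntax check described under `--print_syntax_errors` can be illustrated with a minimal sketch. This is not `scilint`'s own implementation; `find_syntax_errors` is a hypothetical helper, and the only thing taken from the description above is that the code is parsed with the `ast` module and a `SyntaxError` is reported when parsing fails.

```python
import ast


def find_syntax_errors(code_cells):
    """Return (cell_index, message) for every code cell that fails to parse.

    Hypothetical helper for illustration only; not part of scilint.
    """
    errors = []
    for index, source in enumerate(code_cells):
        try:
            ast.parse(source)
        except SyntaxError as exc:
            # exc.msg is the reason message, exc.lineno the offending line
            errors.append((index, f"{exc.msg} (line {exc.lineno})"))
    return errors


# Two toy "cells": the first is valid Python, the second is not.
cells = ["x = 1\nprint(x)", "def broken(:\n    pass"]
for index, message in find_syntax_errors(cells):
    print(f"cell {index}: {message}")
```

Note that notebook cells often contain IPython magics such as `%matplotlib inline`, which plain `ast.parse` rejects; how `scilint` treats those is not covered here.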

### **[`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy)**

To get a consistent style across your notebooks you can run
[`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy);
this currently runs `autoflake`, `black` and `isort` **in-place across
all of the notebooks in your project**. This function wraps an
opinionated flavour of the excellent
[nbQA](https://github.com/nbQA-dev/nbQA) library.

> ⚠️ Note: as this **command runs in-place it will edit your existing
> notebooks**. If you would like to test what this formatting does
> without permanently affecting their state, we recommend trying it the
> first time from a clean git state. That way you can stash the
> changes if you are not happy with them.

<p align="center">
<img src="nbs/images/scilint_tidy.png" alt="scilint_lint" width="738">
</p>
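
To give a feel for what wrapping nbQA involves, here is a rough sketch that only builds the command lines for the three tools named above. The options `scilint_tidy` actually passes are not documented in this README, so the argument lists below are assumptions, and `tidy_commands` is a hypothetical helper rather than part of `scilint`.

```python
import shlex

# Tools in the order the README lists them. The extra arguments are an
# assumption: autoflake only rewrites files when given --in-place.
TOOL_ARGS = {
    "autoflake": ["--in-place"],
    "black": [],
    "isort": [],
}


def tidy_commands(notebook_dir="."):
    """Build one `nbqa <tool> <path> [args]` command per tool (not executed)."""
    return [["nbqa", tool, notebook_dir, *args] for tool, args in TOOL_ARGS.items()]


for command in tidy_commands("nbs"):
    print(shlex.join(command))
```

Each command could then be handed to `subprocess.run`; the sketch stops short of that so it can be run without modifying any notebooks.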

### **[`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)**

Chains existing functions together to form a build script for notebook
based projects. It has two variants, and the one to run is chosen
automatically depending on whether your project uses `nbdev`.

1.  Non-nbdev projects chain these commands:
    [`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy),
    [`scilint_lint`](https://newday-data.github.io/scilint/scilint.html#scilint_lint)
2.  `nbdev` projects chain the following commands:
    [`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy),
    [nbdev_export](https://nbdev.fast.ai/api/export.html),
    [nbdev_test](https://nbdev.fast.ai/api/test.html),
    [`scilint_lint`](https://newday-data.github.io/scilint/scilint.html#scilint_lint),
    [nbdev_clean](https://nbdev.fast.ai/api/clean.html)

<p align="center">
<img src="nbs/images/scilint_build.png" alt="scilint_lint" width="738">
</p>
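
The chaining described above can be sketched as follows. The command names and their order come from the two lists; everything else is an assumption made for illustration: how `scilint` detects an `nbdev` project is not stated here (the sketch guesses at a `settings.ini` file), and stopping at the first failing command is one plausible behaviour, not a documented one.

```python
from pathlib import Path

PLAIN_CHAIN = ["scilint_tidy", "scilint_lint"]
NBDEV_CHAIN = ["scilint_tidy", "nbdev_export", "nbdev_test", "scilint_lint", "nbdev_clean"]


def looks_like_nbdev(project_dir):
    # Assumption: treat a project with a settings.ini file as an nbdev project.
    return (Path(project_dir) / "settings.ini").is_file()


def build_chain(is_nbdev):
    """Return the ordered list of commands for the given project type."""
    return list(NBDEV_CHAIN if is_nbdev else PLAIN_CHAIN)


def run_chain(commands, runner):
    """Run commands in order; return (failed_command, exit_code) or (None, 0)."""
    for name in commands:
        code = runner(name)
        if code != 0:
            return name, code
    return None, 0


# Stand-in runner so the sketch runs anywhere; a real one could be
# `lambda name: subprocess.call([name])`.
failed, code = run_chain(build_chain(is_nbdev=False), runner=lambda name: 0)
print("build passed" if failed is None else f"{failed} failed with exit code {code}")
```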

### **[`scilint_ci`](https://newday-data.github.io/scilint/scilint.html#scilint_ci)** \[`nbdev` only\]

Adds documentation generation to
[`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build).
This requires an `nbdev` project and a working Quarto build. Quarto is a
core part of the nbdev system; if you are having trouble installing it,
check out the `nbdev` GitHub [page](https://github.com/fastai/nbdev).
For more details on the Quarto project, check out their home
[page](https://quarto.org/).

# 📈 Quality Indicators

The following are potential quality indicators that you can use to set a
minimum bar for quality and comprehensibility within your projects.
These are not exhaustive or definitive quality indicators; they are a
starting point to open the conversation about what it means to have a
high quality notebook in practice.

1.  Calls-Per-Function (CPF): compares the **number of calls to the
    number of functions**. *Looks for a possible relationship between
    function definitions and usage.*
2.  In-Function-Percent (IFP): the **percentage of code that is within a
    function** rather than outside function scope.
3.  Tests-Per-Function-Mean (TPF): the **average number of tests (where
    test==assert) for all functions**. *Mean value so may be dominated
    by outliers.*
4.  Tests-Function-Coverage-Pct (TFC): what **percentage of all
    functions have at least one test**. *Note: this is coverage at
    function-level not line-based coverage.*
5.  MarkdownToCodeRatio (MCP): what is the **ratio of markdown cells to
    code cells**.
6.  TotalCodeLen (TCL): the **total line length** of the notebook code
    cells.
7.  Loc-Per-MD-Section (LPS): the **lines of code per Markdown section**
    header.
8.  SyntaxErrors (SYN): if the code within the notebook has **invalid
    Python syntax**.
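
To make a few of these indicators concrete, the sketch below computes three of them for a toy notebook held in memory. The exact definitions `scilint` uses are not given in this README, so the counting rules here are assumptions: code is measured in non-blank lines, a function counts as tested when an `assert` statement calls it by name, and `notebook_indicators` is a hypothetical helper.

```python
import ast


def notebook_indicators(cells):
    """Compute three illustrative indicators from a list of notebook cells."""
    code = "\n".join(c["source"] for c in cells if c["cell_type"] == "code")
    n_markdown = sum(c["cell_type"] == "markdown" for c in cells)
    n_code = sum(c["cell_type"] == "code" for c in cells)
    tree = ast.parse(code)

    # Line numbers covered by any function definition
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    in_func_lines = set()
    for func in functions:
        in_func_lines.update(range(func.lineno, func.end_lineno + 1))

    # Non-blank code lines, and how many of them sit inside a function
    code_lines = [i for i, line in enumerate(code.splitlines(), start=1) if line.strip()]
    in_func = sum(1 for i in code_lines if i in in_func_lines)

    # Names called directly inside assert statements (test == assert)
    tested = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assert):
            for call in ast.walk(node):
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                    tested.add(call.func.id)

    names = {f.name for f in functions}
    return {
        "in_func_pct": 100 * in_func / len(code_lines) if code_lines else 0.0,
        "tests_func_coverage_pct": 100 * len(names & tested) / len(names) if names else 0.0,
        "markdown_code_pct": 100 * n_markdown / n_code if n_code else 0.0,
    }


cells = [
    {"cell_type": "markdown", "source": "# Cleaning"},
    {"cell_type": "code", "source": "def double(x):\n    return 2 * x"},
    {"cell_type": "code", "source": "assert double(2) == 4"},
    {"cell_type": "code", "source": "data = [1, 2]\nprint(double(data[0]))"},
]
print(notebook_indicators(cells))
```

For this toy notebook, two of the five code lines are inside a function (40%), the single function is called from an `assert` (100% function-level coverage), and there is one markdown cell for three code cells.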

> *As already stated, there is no definitive answer as to whether any of
> these indicate low or high quality. However, there are reasons to
> believe intuitively that higher or lower values of the above will
> produce higher quality notebooks. There are many questions left to
> answer, like the role of docstrings, comments and type annotations;
> their effectiveness may warrant inclusion but that is an open question
> at the moment. As this library is used and refined with more projects
> and more experimental metrics, these intuitions can be evaluated more
> rigorously.*

## ➕ Adding New Indicators

For now, post your ideas as a feature request and we can discuss them;
if accepted, you can provide a PR. We are looking for a more rigorous
way to link indicators and effectiveness. Until that is found,
discussion is the best we can do!

# 👓 Quality Specs (& a Quality Standard)

Often in Software Engineering, code is both likely to go into production
and likely to continue to be used once it does. In this environment it
makes sense for codebases to have a single quality standard. In the
**explore vs exploit** decision making
[trade-off](https://en.wikipedia.org/wiki/Exploration-exploitation_dilemma)
this environment could be classified as **high exploit**.

For problems that are **high explore**, like most Data Science work, we
argue that **a single quality bar is not sufficient**. `scilint`
promotes adopting a *progressive consolidation*\* approach where
exploration code starts with a speed of exploration goal and this may
gradually shift to increase the emphasis on quality and reuse as the
utility of the workflow becomes proven.

This feature is known as “Quality Specs” and it allows multiple
different specifications of quality to exist within a project. The
standard can be a relatively low bar for exploration work but can become
more demanding as you are closer to the productionisation of your work.

\**(term first used by [Gaël Varoquaux](https://gael-varoquaux.info/);
see
[here](https://gael-varoquaux.info/programming/software-for-reproducible-science-lets-not-have-a-misunderstanding.html)
for argument expansion).*

## Reference Quality Standard

> The progressive consolidation workflow that we use on projects is the
> reference implementation for `scilint` and is summarised in the below
> image:

<p align="center">
<img src="nbs/images/quality_standard.png" alt="Quality Standard" width="738" border="3px solid white">
</p>

- **Legacy:** especially on larger projects there may be a large number
  of legacy notebooks that are not in use and there is no obvious value
  in improving their quality. This stage could be removed from the
  workflow if you have enforced a quality standard from the outset.
- **Exploratory:** exploratory workflows are typically off-line and
  involve much iteration. The benefit of some quality bar here is that
  it aids collaboration and review, and generally makes team-based Data
  Science easier.
- **Experimental:** we split production workflows into two groups:
  experimental and validated. Experimental notebooks are, as the name
  suggests, experiments that are yet to be proven. As they are released
  to customers they should have a reasonably high quality standard but
  not the same as validated work.
- **Validated:** we need to have the most confidence that all validated
  learning activity (experiments which have been accepted and scaled out
  to all users) will run properly for a long time after it is written.
## What is a Quality Spec in practice?

In practice a quality spec is just a YAML configuration file holding the
properties of the spec. It contains threshold values for warnings along
with some other settings. To adopt a multi-spec standard, place a spec
file into each directory that should have a different standard. Look at
`nbs/examples/nbs` to see an example of a multi-spec standard.

    ---
      exclusions: ~
      fail_over: 1
      out_dir: "/tmp/scilint/"
      precision: 3
      print_syntax_errors: false
      evaluate: true
      warnings:
        lt:
          calls_per_func_median: 1
          calls_per_func_mean: 1
          in_func_pct: 20
          tests_func_coverage_pct: 20
          tests_per_func_mean: 0.5
          markdown_code_pct: 5
        gt:
          total_code_len: 50000
          loc_per_md_section: 2000
        equals:
          has_syntax_error: true
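
One way to read the `warnings` section of such a spec is as a set of comparisons applied to each notebook's metrics. The sketch below follows that reading; it is an interpretation, not `scilint`'s code. The helper names are hypothetical, and treating the build as failed once the warning count exceeds `fail_over` is an assumption based on the `--fail_over` description.

```python
import operator

# Map the spec's section names onto comparison functions.
COMPARATORS = {"lt": operator.lt, "gt": operator.gt, "equals": operator.eq}


def find_warnings(metrics, warning_spec):
    """Return a message for every metric that crosses its threshold."""
    found = []
    for kind, thresholds in warning_spec.items():
        compare = COMPARATORS[kind]
        for name, threshold in thresholds.items():
            if name in metrics and compare(metrics[name], threshold):
                found.append(f"{name} {kind} {threshold} (value: {metrics[name]})")
    return found


def build_passes(warnings, fail_over=1):
    # Assumption: up to `fail_over` warnings are accepted without failing.
    return len(warnings) <= fail_over


# A subset of the thresholds shown in the spec above.
warning_spec = {
    "lt": {"in_func_pct": 20, "tests_func_coverage_pct": 20},
    "gt": {"total_code_len": 50000},
    "equals": {"has_syntax_error": True},
}
# Made-up metrics for a single notebook.
metrics = {
    "in_func_pct": 12.5,
    "tests_func_coverage_pct": 40,
    "total_code_len": 61000,
    "has_syntax_error": False,
}

warnings = find_warnings(metrics, warning_spec)
for message in warnings:
    print("WARNING:", message)
print("build passes:", build_passes(warnings, fail_over=1))
```

With these made-up metrics, two thresholds are crossed (`in_func_pct` and `total_code_len`), so a `fail_over` of 1 would mark the build as failed under the assumption above.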

## What does a lint report look like?

The lint warnings are printed to the console and a more thorough report
is generated and saved as a CSV file which looks like this:

<p align="center">
<img src="nbs/images/sample_report.png" alt="Sample Report" width="738" border="3px solid white">
</p>

# 🔁 Changing Behaviour - Recommended Usage

Infusing quality into workflows is aided by having timely, short-cycle
feedback on issues. Additionally, whatever quality bar you choose as a
team should be non-negotiable. That way you can spend time thinking
about what matters, like the problem you are trying to solve, rather
than repeatedly nitpicking small details.

We recommend using `scilint` in the following way to maximise benefit:

1.  Decide upon a quality standard including the different specs for
    your ideal team workflow from idea to production - or just use the
    reference standard of:
    `legacy, exploratory=>experimental=>validated`. If you don’t want
    the complexity of a multi-spec standard you can just use a single
    default spec.
2.  Set `fail_over` to 1. There is a temptation to slide this value to
    match the number of warnings you have, but it is probably easier to
    enforce a `fail_over` of 1 and to discuss the value of the
    thresholds instead if you feel a warning is not warranted.
3.  Open a terminal environment alongside your notebook environment and
    run
    [`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)
    often to check your project is in good shape.
4.  Add pre-commit hooks to run
    [`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)
    or
    [`scilint_ci`](https://newday-data.github.io/scilint/scilint.html#scilint_ci)
    (`nbdev` only) before your changes are committed. Don’t forget to
    commit your work often!
5.  Add a CI build job that runs
    [`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)
    or
    [`scilint_ci`](https://newday-data.github.io/scilint/scilint.html#scilint_ci).
    A GitHub Actions workflow is included in this repo that does just
    that.

<p align="center">
<img src="nbs/images/scilint_pre_commit.png" alt="Pre-commit hook" width="738" border="3px solid black">
</p>
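
A pre-commit hook or CI step of this kind boils down to running the build command and passing its exit code on. The sketch below shows that shape. It assumes `scilint_build` signals a failed build with a non-zero exit code, which this README implies but does not state, and `run_build` is a hypothetical wrapper.

```python
import subprocess


def run_build(command="scilint_build", runner=subprocess.call):
    """Run the build command and return its exit code (0 means success)."""
    try:
        return runner([command])
    except FileNotFoundError:
        # The command is not on PATH, e.g. scilint is not installed here.
        print(f"{command} not found; is scilint installed in this environment?")
        return 1


# Stand-in runner so the sketch can be executed without scilint installed.
exit_code = run_build(runner=lambda argv: 0)
print("commit allowed" if exit_code == 0 else "commit blocked")
```

In a real hook script the last two lines would be replaced by `sys.exit(run_build())`, so that git (or the CI system) sees the build's exit code.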

# 🙌 Standing on the shoulders of giants - *an nbdev library*

> `scilint` is written on top of the excellent `nbdev` library. This
> library is revolutionary as it truly optimises all the benefits of
> notebooks and compensates for most of their weaker points. For more
> information on `nbdev` see the [homepage](https://nbdev.fast.ai/) or
> [GitHub repo](https://github.com/fastai/nbdev).

## 🤓 Make the switch to `nbdev`!

In case you hadn’t guessed yet, we are big `nbdev` fans. `scilint`
offers a better developer experience and is more fully featured on an
`nbdev` project, but the main reason to switch is that `nbdev` will
really help you when moving from exploratory development to production
processes.

Converting your libraries to `nbdev` is not required for this tool to
work but we argue that it would confer many benefits if you are part of
a Production Data Science team. `nbdev` contains many features that are
useful for Data Science workflows; too many in fact to cover here. We
will focus on the major features we consider to have the most impact:

1.  Explicit **separation of exploration from what is *fundamental* for
    the workflow to execute** using the `export` directive.
2.  Introducing a fit-for-purpose **test runner for notebooks**.
3.  **In-flow documentation** of a notebook that is focused on the
    reader and powerfully expressive thanks to Quarto Markdown (aids
    building towards published reproducible research)
4.  **Git friendly workflow** via pre-commit hooks.
5.  Being able to build a **modular notebook workflow as it is easy to
    export and import functions from notebooks** in your project - this
    puts shared reusable functions within reach of the team easily.
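
As a small illustration of the first two points, the cell below uses the `#| export` directive to mark a function as part of the library, followed by the kind of `assert`-based test that would normally live in the next notebook cell. The function `scale_to_unit` is a made-up example, not something from `scilint` or `nbdev`.

```python
#| export
def scale_to_unit(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# In a notebook these asserts would sit in a separate cell without the
# export directive, so they act as tests but are not exported.
assert scale_to_unit([2, 4, 6]) == [0.0, 0.5, 1.0]
assert scale_to_unit([3, 3]) == [0.0, 0.0]
```

Outside a notebook the directive is just a comment, so the snippet also runs as plain Python.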

# 👍 Contributing

After you clone this repository, please run `nbdev_install_hooks` in
your terminal. This sets up git hooks, which clean up the notebooks to
remove the extraneous stuff stored in the notebooks (e.g. which cells
you ran) which causes unnecessary merge conflicts.

To run the tests in parallel, launch `nbdev_test`.

Before submitting a PR, check that the local library and notebooks
match.

If you made a change to the notebooks in one of the exported cells, you
can export it to the library with `nbdev_prepare`. If you made a change
to the library, you can export it back to the notebooks with
`nbdev_update`.



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/newday-data/scilint/tree/{branch}/",
    "name": "scilint",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "research,production,exploration,CI/CD",
    "author": "Donal Simmie",
    "author_email": "oss@newday.co.uk",
    "download_url": "https://files.pythonhosted.org/packages/31/0e/31f81e7c00bd90e92c1ae1c61e9b47cada3ccca8363d52341914b31f9236/scilint-0.2.4.tar.gz",
    "platform": null,
    "description": "# \ud83e\uddd0 `scilint`\n\n<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->\n<p align=\"center\">\n<a href=\"https://badge.fury.io/py/scilint\">\n<img src=\"https://badge.fury.io/py/scilint.svg\" alt=\"Pypi Package\"> </a>\n</p>\n\n`scilint` aims to **bring a style and quality standard into notebook\nbased Data Science workflows**. How you define a quality notebook is\ndifficult and somewhat subjective. It can have the obvious meaning of\nbeing free of bugs but also legibility and ease of comprehension are\nimportant too.\n\n`scilint` takes the approach of breaking down potentially quality\nrelevant aspects of the notebook and providing what we believe are\nsensible defaults that potentially correlate with higher quality\nworkflows. We also let users define the quality line as they see fit\nthrough configuration of existing thresholds and ability to add new\nmetrics (coming soon). As use of the library grows we anticipate being\nable to statistically relate some of the quality relevant attributes to\nkey delivery metrics like \u201cchange failure rate\u201d or \u201clead time to\nproduction\u201d.\n\n# \ud83e\udd14 Why do I need quality notebooks?\n\n*If you prefer to move out of notebook-based workflows, post-exploration\nto an IDE+Python mix I encourage you to have another ponder on the\nbenefits of staying in a notebook-based workflow. Notebooks have a\nstrong visual emphasis and proximity to data. They are also the primary\naxis of change within Data Science - new ideas are gained from diving\ninto data. 
So instead of packing up your code, re-writing it for\nelsewhere and all the waste that entails bring quality to your\nexploration workflow and spend more time building stuff that matters.*\n\nIf you\u2019re still not convinced watch this\n[video](https://www.youtube.com/watch?v=9Q6sLbz37gk) where **Jeremy\nHoward** does a far better job of explaining why notebooks are for\nserious development too!\n\n# \u2705 What is Notebook Quality?\n\nThis is a good question and this library does not pretend to have the\nanswer. But we feel the problem space is worth exploring because the\nvalue of high quality deliveries means lower time-to-market, less time\nin re-work or rote porting of code and frees people up to think about\nand solve hard problems. That said, there are some practices that we\nhave observed producing \u201cbetter\u201d notebooks workflows from experience in\nproduction Data Science teams. These are:\n\n- **Extracting code to small modular functions**\n- **Testing those functions work in a variety of scenarios**\n- Putting sufficient **emphasis on legibility and ease of\n  comprehension** through adequate use of markdown\n\nThese are the starting premises that permit the notebook quality\nconversation to start. To bring this to life a little, we would say that\n**the notebook on the left is of lower quality than the notebook on the\nright**..\n\n<p align=\"center\">\n<img src=\"nbs/images/scilint_before_after_prep.png\" alt=\"Low vs High\" width=\"738\" border=\"3px solid black\">\n</p>\n\n# \ud83d\ude80 Getting Started\n\n> Please note `scilint` is only tested on linux and macos currently.\n\n## Install\n\n`pip install scilint`\n\n## Commands\n\n### **[`scilint_lint`](https://newday-data.github.io/scilint/scilint.html#scilint_lint)**\n\nExposes potential quality issues within your notebook using some\npre-defined checks. 
Default threshold values for these checks are\nprovided that will enable a build to be marked as passed or failed.\n\n<details>\n<summary>\n<b>Show parameters</b>\n</summary>\n\n#### `--fail_over`\n\n> For now a very basic failure threshold is set by providing a number of\n> warnings that will be accepted without failing the build. The default\n> is 1 but this can be increased via the `--fail_over` parameter. As the\n> library matures we will revisit adding more nuanced options.\n\n#### `--exclusions`\n\n> You can exclude individual notebooks or directories using the\n> `--exclusions` parameter. This is a comma separated list of paths\n> where you can provide directories like \u201cdir/\u201d or specific notebooks\n> like \u201csomenotebook.ipynb\u201d\n\n#### `--display_report`\n\n> Print the lint warnings report as a markdown formatted table.\n\n#### `--out_dir`\n\n> Directory to persist the lint_report, warning_violations and the\n> confgiruation used.\n\n#### `--print_syntax_errors`\n\n> The code is parsed using the `ast` module if that parsing fails due to\n> syntax errors that is noted in the warning report but the exact syntax\n> error is not provided. With this flag the `SyntaxError` reason message\n> that failed notebook parsing will be printed to the screen for each\n> offending notebook.\n\n</details>\n<p align=\"center\">\n<img src=\"nbs/images/scilint_lint.png\" alt=\"scilint_lint\" width=\"738\" border=\"3px solid black\">\n</p>\n\n### **[`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy)**\n\nTo get a consistent style across your notebooks you can run\n[`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy);\nthis currently runs `autoflake`, `black` and `isort` **in-place across\nall of the notebooks in your project**. 
This function wraps an\nopinionated flavour of the excellent\n[nbQA](https://github.com/nbQA-dev/nbQA) library.\n\n> \u26a0\ufe0fNote: as this **command runs in-place it will edit your existing\n> notebooks**. If you would like to test what this formatting does\n> without actually affecting their state then we recommended trying this\n> the first time from a clean git state. That way you can stash the\n> changes if you are not happy with them.\n\n<p align=\"center\">\n<img src=\"nbs/images/scilint_tidy.png\" alt=\"scilint_lint\" width=\"738\">\n</p>\n\n### **[`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)**\n\nChains existing functions together to form a build script for notebook\nbased projects. Has two versions which are executed automatically on\ndetection of whether your project uses `nbdev` or not.\n\n1.  Non-nbdev projects chain these commands:\n    [`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy),\n    [`scilint_lint`](https://newday-data.github.io/scilint/scilint.html#scilint_lint)\n2.  `nbdev` projects chain the following commands:\n    [`scilint_tidy`](https://newday-data.github.io/scilint/scilint.html#scilint_tidy),\n    [nbdev_export](https://nbdev.fast.ai/api/export.html),\n    [nbdev_test](https://nbdev.fast.ai/api/test.html),\n    [`scilint_lint`](https://newday-data.github.io/scilint/scilint.html#scilint_lint),\n    [nbdev_clean](https://nbdev.fast.ai/api/clean.html)\n\n<p align=\"center\">\n<img src=\"nbs/images/scilint_build.png\" alt=\"scilint_lint\" width=\"738\">\n</p>\n\n## **[`scilint_ci`](https://newday-data.github.io/scilint/scilint.html#scilint_ci)** \\[`nbdev` only\\]\n\nAdds documentation generation to\n[`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build).\nThis requires an `nbdev` project and a working quarto build. 
Quarto is a\ncore part of the nbdev system, if you are having trouble installing it,\ncheck out the `nbdev` Github [page](https://github.com/fastai/nbdev).\nFor more details on the Quarto project, check out their home\n[page](https://quarto.org/).\n\n# \ud83d\udcc8 Quality Indicators\n\nThe below are potential quality indicators that you can use to set a\nminimum bar for quality and comprehensibility within your projects.\nThese are not exhaustive or definite quality indicators - they are a\nstarting point to open the conversation about what it means to have a\nhigh quality notebook in practice.\n\n1.  Calls-Per-Function (CPF):\\*\\* compares the **amount of calls to the\n    amount of functions**. *Looks for possible relationship between\n    function definitions and usage.*\n2.  In-Function-Percent (IFP): the **percentage of code that is within a\n    function** rather than outside function scope.\n3.  Tests-Per-Function-Mean (TPF: the **average number of tests (where\n    test==assert) for all functions**. *Mean value so may be dominated\n    by outliers.*\n4.  Tests-Function-Coverage-Pct (TFC): what **percentage of all\n    functions have at least one test**. *Note: this is coverage at\n    function-level not line-based coverage.*\n5.  MarkdownToCodeRatio (MCP): what is the **ratio of markdown cells to\n    code cells**.\n6.  TotalCodeLen (TCL): the **total line length** of the notebook code\n    cells.\n7.  Loc-Per-MD-Section (LPS): the **lines of code per Markdown section**\n    header.\n8.  SyntaxErrors (SYN): if the code within the notebook has **invalid\n    Python syntax**.\n\n> *as already stated there is no definitive answer as to whether any of\n> these are low or high quality. However there are reasons to believe\n> inituitively that higher or lower values of the above will produce\n> higher quality notebooks. 
There are many questions left to answer,\n> like the role of docstrings, comments and type annotations; their\n> effectiveness may warrant inclusion but that is an open question at\n> the moment. As this library is used and refined with more projects and\n> more experimental metrics then these intuitions can evaluated more\n> rigorously.*\n\n## \u2795 Adding New Indicators\n\nFor now post your ideas as a feature request and we can discuss, if\naccepted you can provide a PR. We are looking for a more rigorous way\nlink indicator and effectivess, until that is found discussion is the\nbest we can do!\n\n# \ud83d\udc53 Quality Specs (& a Quality Standard)\n\nOften in Software Engineering code is both likely to go into production\nand likely to continue to be used once it does. In this enviroment it\nmakes sense for codebases to have a single quality standard. In the\n**explore vs exploit** decision making\n[trade-off](https://en.wikipedia.org/wiki/Exploration-exploitation_dilemma)\nthis environment could be classified as **high exploit**.\n\nFor problems that are **high explore**, like most Data Science work, we\nargue that **a single quality bar is not sufficient**. `scilint`\npromotes adopting a *progressive consolidation*\\* approach where\nexploration code starts with a speed of exploration goal and this may\ngradually shift to increase the emphasis on quality and reuse as the\nutility of the workflow becomes proven.\n\nThis feature is known as \u201cQuality Specs\u201d and it allows multiple\ndifferent specifications of quality to exist within a project. 
The\nstandard can be a relatively low bar for exploration work but can become\nmore demanding as you are closer to the productionisation of your work.\n\n\\**(term first used by [Ga\u00ebl Varoqouax](https://gael-varoquaux.info/);\nsee\n[here](https://gael-varoquaux.info/programming/software-for-reproducible-science-lets-not-have-a-misunderstanding.html)\nfor argument expansion).*\n\n## Reference Quality Standard\n\n> The progressive consolidation workflow that we use on projects is the\n> reference implementation for `scilint` and is summarised in the below\n> image:\n\n<p align=\"center\">\n<img src=\"nbs/images/quality_standard.png\" alt=\"Quality Standard\" width=\"738\" border=\"3px solid white\">\n</p>\n\n- **Legacy:** especially on larger projects there may be a large number\n  of legacy notebooks that are not in use and no there is no obvious\n  value in improving their quality. This could be removed from the\n  workflow if you have enforced a quality standard from the outset.\n- **Exploratory:** exploratory workflows are typically off-line and\n  involve much iteration. The benefit of some quality bar here is that\n  it aids collaboration, review and generally helps perform team-based\n  Data Science easier.\n- **Experimental:** we split production workflows into two groups:\n  experimental and validated. Experimental notebooks are, as the name\n  suggests, experiments that are yet to be proven. As they are released\n  to customers they should have a reasonably high quality standard but\n  not the same as validated work.\n- **Validated:** we need to have the most confidence that all validated\n  learning activity (experiments which have been accepted and scaled out\n  to all users) will run properly for along time after it is written.\n\n## What is a Quality Spec in practice?\n\nA quality spec in practice is just a yaml configuration file of the\nproperties of the quality spec. It contains threshold values for warning\nalong with some other settings. 
To adopt a multi-spec standard place a\nspec file into each directory that you want to have different standards\nfor. Look at `nbs/examples/nbs` to see an example of a multi-spec\nstandard.\n\n    ---\n      exclusions: ~\n      fail_over: 1\n      out_dir: \"/tmp/scilint/\"\n      precision: 3\n      print_syntax_errors: false\n      evaluate: true\n      warnings:\n        lt:\n          calls_per_func_median: 1\n          calls_per_func_mean: 1\n          in_func_pct: 20\n          tests_func_coverage_pct: 20\n          tests_per_func_mean: 0.5\n          markdown_code_pct: 5\n        gt:\n          total_code_len: 50000\n          loc_per_md_section: 2000\n        equals:\n          has_syntax_error: true\n\n## What does a lint report look like?\n\nThe lint warnings are printed to the console and a more thorough report\nis generated and saved as a CSV file which looks like this:\n\n<p align=\"center\">\n<img src=\"nbs/images/sample_report.png\" alt=\"Sample Report\" width=\"738\" border=\"3px solid white\">\n</p>\n\n# \ud83d\udd01 Changing Behaviour - Recommended Usage\n\nInfusing quality into workflows is aided by having timely, short-cycle\nfeedback of issues. Addtionally whatever quality bar you choose as a\nteam, it should be non-negotiable that way you can spend time thinking\nabout what matters like the problem you are trying to solve not\nnitpicking on small details repeatedly.\n\nWe recommend using `scilint` in the following way to maximise benefit:\n\n1.  Decide upon a quality standard including the different specs for\n    your ideal team workflow from idea to production - or just use the\n    reference standard of:\n    `legacy, exploratory=>experimental=>validated`. If you don\u2019t want\n    the complexity of a multi-spec standard you can just use a single\n    default spec.\n2.  
Set `fail_over` to 1 - there is a temptation to slide this value to\n    meet the number of warnings you have - it is probably easier to\n    enforce a `fail_over` of 1 and to discuss the value of the\n    thresholds instead if you feel a warning is not warranted.\n3.  Open a terminal environment alongside your notebook environment: run\n    [`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)\n    often to check your project is in good shape.\n4.  Add pre-commit hooks to run\n    [`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)\n    or\n    [`scilint_ci`](https://newday-data.github.io/scilint/scilint.html#scilint_ci)\n    (`nbdev` only) before your changes are committed. Don\u2019t forget to\n    commit your work often!\n5.  Add a CI build job that runs\n    [`scilint_build`](https://newday-data.github.io/scilint/scilint.html#scilint_build)\n    or\n    [`scilint_ci`](https://newday-data.github.io/scilint/scilint.html#scilint_ci).\n    A GitHub Actions workflow is included in this repo that does just\n    that.\n\n<p align=\"center\">\n<img src=\"nbs/images/scilint_pre_commit.png\" alt=\"Pre-commit hook\" width=\"738\" border=\"3px solid black\">\n</p>\n\n# \ud83d\ude4c Standing on the shoulders of giants - *an nbdev library*\n\n> `scilint` is written on top of the excellent `nbdev` library. This\n> library is revolutionary as it truly optimises all the benefits of\n> notebooks and compensates for most of their weaker points. For more\n> information on `nbdev` see the [homepage](https://nbdev.fast.ai/) or\n> [GitHub repo](https://github.com/fastai/nbdev).\n\n## \ud83e\udd13 Make the switch to `nbdev`!\n\nIn case you hadn\u2019t guessed yet, we are big `nbdev` fans. 
`scilint` has a\nbetter developer experience and is more fully featured on an `nbdev`\nproject, but we recommend the switch mostly because `nbdev` will really\nhelp you when trying to move from exploratory development to production\nprocesses.\n\nConverting your libraries to `nbdev` is not required for this tool to\nwork but we argue that it would confer many benefits if you are part of\na Production Data Science team. `nbdev` contains many features that are\nuseful for Data Science workflows; too many in fact to cover here. We\nwill focus on the major features we consider to have the most impact:\n\n1.  Explicit **separation of exploration from what is *fundamental* for\n    the workflow to execute** using the `export` directive.\n2.  Introducing a fit-for-purpose **test runner for notebooks**.\n3.  **In-flow documentation** of a notebook that is focused on the\n    reader and powerfully expressive thanks to Quarto Markdown (aids\n    building towards published reproducible research).\n4.  **Git friendly workflow** via pre-commit hooks.\n5.  Being able to build a **modular notebook workflow as it is easy to\n    export and import functions from notebooks** in your project - this\n    puts shared reusable functions within easy reach of the team.\n\n# \ud83d\udc4d Contributing\n\nAfter you clone this repository, please run `nbdev_install_hooks` in\nyour terminal. This sets up git hooks, which clean up the notebooks to\nremove the extraneous stuff stored in the notebooks (e.g.\u00a0which cells\nyou ran) which causes unnecessary merge conflicts.\n\nTo run the tests in parallel, launch `nbdev_test`.\n\nBefore submitting a PR, check that the local library and notebooks\nmatch.\n\nIf you made a change to the notebooks in one of the exported cells, you\ncan export it to the library with `nbdev_prepare`. If you made a change to\nthe library, you can export it back to the notebooks with `nbdev_update`.\n\n\n",
    "bugtrack_url": null,
    "license": "Apache Software License 2.0",
    "summary": "infuse quality into notebook based workflows with a new type of build tool.",
    "version": "0.2.4",
    "project_urls": {
        "Documentation": "https://newday-data.github.io/scilint/",
        "Homepage": "https://github.com/newday-data/scilint/tree/{branch}/"
    },
    "split_keywords": [
        "research",
        "production",
        "exploration",
        "ci/cd"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d84cebc4e0033ea3253d5c2ed61687087428e149f89d0ac5b26313a1a8b53d37",
                "md5": "35ae455b92fa30ce6b32d02f395783ec",
                "sha256": "1be5c0d1b67793980c6d9643767d910d5b32e9a50cf6c5ce21eb396c1ec0e679"
            },
            "downloads": -1,
            "filename": "scilint-0.2.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "35ae455b92fa30ce6b32d02f395783ec",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 33208,
            "upload_time": "2023-10-31T15:33:50",
            "upload_time_iso_8601": "2023-10-31T15:33:50.601745Z",
            "url": "https://files.pythonhosted.org/packages/d8/4c/ebc4e0033ea3253d5c2ed61687087428e149f89d0ac5b26313a1a8b53d37/scilint-0.2.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "310e31f81e7c00bd90e92c1ae1c61e9b47cada3ccca8363d52341914b31f9236",
                "md5": "85963c022164debf08a4d6be34dabdd9",
                "sha256": "3dfbea6bea8e8026f79e7208657906c42de920aec96d1cb3239d4701dc1718f8"
            },
            "downloads": -1,
            "filename": "scilint-0.2.4.tar.gz",
            "has_sig": false,
            "md5_digest": "85963c022164debf08a4d6be34dabdd9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 34369,
            "upload_time": "2023-10-31T15:33:52",
            "upload_time_iso_8601": "2023-10-31T15:33:52.514516Z",
            "url": "https://files.pythonhosted.org/packages/31/0e/31f81e7c00bd90e92c1ae1c61e9b47cada3ccca8363d52341914b31f9236/scilint-0.2.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-31 15:33:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "newday-data",
    "github_project": "scilint",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "scilint"
}
        