# langfree

- **Version**: 0.0.32
- **Summary**: Utilities to help you work with your language model data outside LangSmith
- **Homepage**: https://github.com/parlance-labs/langfree
- **Author**: Hamel Husain
- **License**: Apache Software License 2.0
- **Requires Python**: >=3.7
- **Keywords**: nbdev, jupyter, notebook, python, langchain, langsmith, openai
- **Uploaded**: 2024-02-27 01:44:54


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

[![](https://github.com/parlance-labs/langfree/actions/workflows/test.yaml/badge.svg)](https://github.com/parlance-labs/langfree/actions/workflows/test.yaml)
[![Deploy to GitHub
Pages](https://github.com/parlance-labs/langfree/actions/workflows/deploy.yaml/badge.svg)](https://github.com/parlance-labs/langfree/actions/workflows/deploy.yaml)

`langfree` helps you extract, transform and curate
[ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain.chat_models.openai.ChatOpenAI.html)
runs from
[traces](https://js.langchain.com/docs/modules/agents/how_to/logging_and_tracing)
stored in [LangSmith](https://www.langchain.com/langsmith), which can be
used for fine-tuning and evaluation.

![](https://github.com/parlance-labs/langfree/assets/1483922/0e37d5a4-1ffb-4661-85ba-7c9eb80dd06b.png)

### Motivation

LangChain has native [tracing
support](https://blog.langchain.dev/tracing/) that allows you to log
runs. This data is a valuable resource for fine-tuning and evaluation.
[LangSmith](https://docs.smith.langchain.com/) is a commercial
application that facilitates some of these tasks.

However, LangSmith may not suit everyone’s needs. It is often desirable
to build your own data inspection and curation infrastructure:

> One pattern I noticed is that great AI researchers are willing to
> manually inspect lots of data. And more than that, **they build
> infrastructure that allows them to manually inspect data quickly.**
> Though not glamorous, manually examining data gives valuable
> intuitions about the problem. The canonical example here is Andrej
> Karpathy doing the ImageNet 2000-way classification task himself.
>
> – [Jason Wei, AI Researcher at
> OpenAI](https://x.com/_jasonwei/status/1708921475829481683?s=20)

`langfree` helps you export data from LangSmith and build data curation
tools. By building your own data curation tools, you can add the
features you need, such as:

- connectivity to data sources beyond LangSmith.
- customized data transformations of runs.
- ability to route, tag and annotate data in special ways.
- … etc.

Furthermore, `langfree` provides a handful of [Shiny for
Python](04_shiny.ipynb) components that ease the process of creating
data curation applications.

## Install

``` sh
pip install langfree
```

## How to use

See the [docs site](http://langfree.parlance-labs.com/).
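
`langfree` fetches runs through the [langsmith
SDK](https://docs.smith.langchain.com/), which reads your credentials
from environment variables. A hedged sketch of that setup (the values
are placeholders; depending on your configuration you may also need to
point at a specific project, as described on the docs site):

``` python
import os

# Placeholder credentials: replace with your own LangSmith API key.
# The endpoint below is the public LangSmith API endpoint.
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
```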

### Get runs from LangSmith

The [runs](01_runs.ipynb) module contains utilities to quickly fetch
runs. We can get the most recent runs from LangSmith like so:

``` python
from langfree.runs import get_recent_runs
runs = get_recent_runs(last_n_days=3, limit=5)
```

    Fetching runs with this filter: and(eq(status, "success"), gte(start_time, "11/03/2023"), lte(start_time, "11/07/2023"))

``` python
print(f'Fetched {len(list(runs))} runs')
```

    Fetched 5 runs

There are other utilities like
[`get_runs_by_commit`](https://parlance-labs.github.io/langfree/runs.html#get_runs_by_commit)
if you are tagging runs by commit SHA. You can also use the [langsmith
SDK](https://docs.smith.langchain.com/) to get runs.
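
For example, a hedged sketch of fetching runs tagged with a particular
commit (the `commit_id` argument name is an assumption here; check the
linked reference for the exact signature):

``` python
# Hedged sketch: fetch runs tagged with a commit SHA.
# The argument name `commit_id` and the SHA value are illustrative.
from langfree.runs import get_runs_by_commit
runs_for_commit = get_runs_by_commit(commit_id="4893faa")
```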

### Parse The Data

[`ChatRecordSet`](https://parlance-labs.github.io/langfree/chatrecord.html#chatrecordset)
parses the LangChain run in the following ways:

- finds the last child run that calls the language model (`ChatOpenAI`)
  in the chain where the run resides. You are often interested in the
  last call to the language model in the chain when curating data for
  fine tuning.
- extracts the inputs, outputs and function definitions that are sent to
  the language model.
- extracts other metadata that influences the run, such as the model
  version and parameters.

``` python
from langfree.chatrecord import ChatRecordSet
llm_data = ChatRecordSet.from_runs(runs)
```

Inspect the data:

``` python
llm_data[0].child_run.inputs[0]
```

    {'role': 'system',
     'content': "You are a helpful documentation Q&A assistant, trained to answer questions from LangSmith's documentation. LangChain is a framework for building applications using large language models.\nThe current time is 2023-09-05 16:49:07.308007.\n\nRelevant documents will be retrieved in the following messages."}

``` python
llm_data[0].child_run.output
```

    {'role': 'assistant',
     'content': "Currently, LangSmith does not support project migration between organizations. However, you can manually imitate this process by reading and writing runs and datasets using the SDK. Here's an example of exporting runs:\n\n1. Read the runs from the source organization using the SDK.\n2. Write the runs to the destination organization using the SDK.\n\nBy following this process, you can transfer your runs from one organization to another. However, it may be faster to create a new project within your destination organization and start fresh.\n\nIf you have any further questions or need assistance, please reach out to us at support@langchain.dev."}

You can also see a flattened version of the input and the output:

``` python
print(llm_data[0].flat_input[:200])
```

    ### System

    You are a helpful documentation Q&A assistant, trained to answer questions from LangSmith's documentation. LangChain is a framework for building applications using large language models.
    T

``` python
print(llm_data[0].flat_output[:200])
```

    ### Assistant

    Currently, LangSmith does not support project migration between organizations. However, you can manually imitate this process by reading and writing runs and datasets using the SDK. Her

### Transform The Data

Perform data augmentation by rephrasing the first human input. Here is
the first human input before data augmentation:

``` python
run = llm_data[0].child_run
[x for x in run.inputs if x['role'] == 'user']
```

    [{'role': 'user',
      'content': 'How do I move my project between organizations?'}]

Update the inputs:

``` python
from langfree.transform import reword_input
run.inputs = reword_input(run.inputs)
```

    rephrased input as: How can I transfer my project from one organization to another?

Check that the inputs are updated correctly:

``` python
[x for x in run.inputs if x['role'] == 'user']
```

    [{'role': 'user',
      'content': 'How can I transfer my project from one organization to another?'}]

You can also call `.to_dicts()` to convert `llm_data` to a list of dicts
that can be converted to jsonl for fine-tuning OpenAI models.

``` python
llm_dicts = llm_data.to_dicts()
print(llm_dicts[0].keys(), len(llm_dicts))
```

    dict_keys(['functions', 'messages']) 5

You can use
[`write_to_jsonl`](https://parlance-labs.github.io/langfree/transform.html#write_to_jsonl)
and
[`validate_jsonl`](https://parlance-labs.github.io/langfree/transform.html#validate_jsonl)
to help write this data to `.jsonl` and validate it.
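
A hedged sketch of that final step (the file name is illustrative; the
exact signatures are documented at the links above):

``` python
# Hedged sketch: write the fine-tuning records to disk and validate the file.
from langfree.transform import write_to_jsonl, validate_jsonl

write_to_jsonl(llm_dicts, "train.jsonl")  # illustrative file name
validate_jsonl("train.jsonl")
```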

## Build & Customize Tools For Curating LLM Data

The previous steps showed you how to collect and transform your data
from LangChain runs. Next, you can feed this data into a tool that
helps you curate it for fine-tuning.
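
As a rough illustration, here is a minimal sketch that pages through the
records in `llm_data` using plain Shiny for Python. It does not use
`langfree`'s own Shiny components, which are richer and covered in the
tutorial linked below, and it assumes `len()` works on a
`ChatRecordSet`, consistent with the indexing shown above:

``` python
# Minimal illustrative sketch: a plain Shiny for Python app that pages
# through the ChatRecordSet built earlier. langfree's own Shiny components
# (see the tutorial) are better suited to real curation work.
from shiny import App, render, ui

app_ui = ui.page_fluid(
    ui.input_slider("idx", "Record", min=0, max=len(llm_data) - 1, value=0),
    ui.output_text_verbatim("record"),
)

def server(input, output, session):
    @output
    @render.text
    def record():
        rec = llm_data[input.idx()]
        return rec.flat_input + "\n\n" + rec.flat_output

app = App(app_ui, server)  # run with: shiny run app.py
```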

To learn how to run and customize this kind of tool, [read the
tutorial](tutorials/shiny.ipynb). `langfree` can help you quickly build
something that looks like this:

![](https://github.com/parlance-labs/langfree/assets/1483922/57d98336-d43f-432b-a730-e41261168cb2.png)

## Documentation

See the [docs site](http://langfree.parlance-labs.com/).

## FAQ

1.  **We don’t use LangChain. Can we still use something from this
    library?** No, not directly. However, we recommend looking at how
    the [Shiny for Python App works](tutorials/shiny.ipynb) so you can
    adapt it towards your own use cases.

2.  **Why did you use [Shiny For Python](https://shiny.posit.co/py/)?**
    Python has many great front-end libraries like Gradio, Streamlit,
    Panel and others. However, we liked Shiny For Python the best,
    because of its reactive model, modularity, strong integration with
    [Quarto](https://quarto.org/), and [WASM
    support](https://shiny.posit.co/py/docs/shinylive.html). You can
    read more about it
    [here](https://shiny.posit.co/py/docs/overview.html).

3.  **Does this only work with runs from LangChain/LangSmith?** Yes,
    `langfree` has only been tested with `LangChain` runs that have been
    logged to `LangSmith`. However, we suspect that you could log your
    traces elsewhere and pull them in a similar manner.

4.  **Does this only work with
    [`ChatOpenAI`](https://api.python.langchain.com/en/latest/chat_models/langchain.chat_models.openai.ChatOpenAI.html)
    runs?** Yes, `langfree` is opinionated and only works with runs
    that use chat models from OpenAI (which use
    [`ChatOpenAI`](https://api.python.langchain.com/en/latest/chat_models/langchain.chat_models.openai.ChatOpenAI.html)
    in LangChain). We didn’t want to over-generalize this tool too
    quickly and started with the most popular combination of things.

5.  **Do you offer support?** These tools are free and licensed under
    [Apache
    2.0](https://github.com/parlance-labs/langfree/blob/main/LICENSE).
    If you want support or customization, feel free to [reach out to
    us](https://parlance-labs.com/).

## Contributing

This library was created with [nbdev](https://nbdev.fast.ai/). See
[Contributing.md](https://github.com/parlance-labs/langfree/blob/main/CONTRIBUTING.md)
for further guidelines.

            
