soli-data-generator


- Name: soli-data-generator
- Version: 0.1.2
- Home page: https://openlegalstandard.org/
- Summary: Python library for SOLI data generation
- Upload time: 2024-09-07 21:15:04
- Author: ALEA Institute
- Requires Python: `<4.0.0,>=3.10`
- License: MIT
- Keywords: legal, information, standard, soli, open
- Requirements: none recorded

![SOLI Logo](https://openlegalstandard.org/assets/images/soli-intro-logo.png)

[![PyPI version](https://badge.fury.io/py/soli-data-generator.svg)](https://badge.fury.io/py/soli-data-generator)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Versions](https://img.shields.io/pypi/pyversions/soli-data-generator.svg)](https://pypi.org/project/soli-data-generator/)

# SOLI Data Generator

SOLI Data Generator is a Python package for generating synthetic legal data using
the [SOLI (Standards for Open Legal Information)](https://openlegalstandard.org) knowledge graph. It provides both
procedural and LLM-based generation techniques to create realistic legal text and data.

## Features

- Procedural generation using templates with SOLI and Faker tags
- LLM-based text generation using various AI models
- Easy integration with the SOLI knowledge graph
- Flexible and extensible architecture

## Installation

You can install SOLI Data Generator using pip:

```bash
pip install soli-data-generator
```

## Usage

### Procedural Template Generation

```python
from soli import SOLI
from soli_data_generator.procedural.template import TemplateFormatter

# Initialize the SOLI graph
soli_graph = SOLI()

# Initialize the TemplateFormatter
formatter = TemplateFormatter()

# Define a template with SOLI and Faker tags
template = """
Company: <|company|>
Industry: <|industry|>
Legal Issue: <|area_of_law|>
Date: <|date|>
Document Type: <|document_artifact|>
"""

# Format the template
formatted_text = formatter(template)
print(formatted_text)
```

**Output**:

```text
Company: Griffith-Mahoney
Industry: Electric Power Generation, Transmission and Distribution Industry
Legal Issue: Privacy
Date: 2024-08-19
Document Type: Request to Take Judicial Notice
```

### Multiple Values per Type

```python
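# Reuses `formatter` from the previous example. Tags that share a type but
# carry different suffixes (`:1`, `:2`, `:b`) are filled with separate values.
template = """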
From: <|name:1|>
To: <|name:2|>, <|email:1|>, <|email:b|>
Date: <|date|>
Subject: <|company|> matter updates
"""

print(formatter(template))
```

**Output**:

```text
From: David Henry
To: Jean Vance, obryant@example.com, landrysamuel@example.com
Date: 2024-08-31
Subject: Dorsey Ltd matter updates
```
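
The suffix behaviour shown above can be illustrated with a small standalone sketch. This is not the library's implementation: the value pools, the `fill_template` function, and the rule that a repeated `type:key` pair reuses its value are all assumptions made for illustration, and the real package draws its values from SOLI and Faker.

```python
import random
import re

# Hypothetical value pools standing in for SOLI and Faker lookups.
SAMPLE_VALUES = {
    "name": ["David Henry", "Jean Vance", "Maria Lopez"],
    "email": ["obryant@example.com", "landrysamuel@example.com", "kim@example.com"],
    "company": ["Dorsey Ltd", "Griffith-Mahoney"],
}

# Matches <|type|> and <|type:key|>.
TAG_PATTERN = re.compile(r"<\|(\w+)(?::(\w+))?\|>")


def fill_template(template: str, rng: random.Random) -> str:
    """Replace each tag, giving every distinct (type, key) pair its own value."""
    assigned: dict[tuple[str, str], str] = {}

    def resolve(match: re.Match) -> str:
        tag_type, key = match.group(1), match.group(2)
        pool = SAMPLE_VALUES[tag_type]
        if key is None:
            # Unkeyed tags are sampled independently each time.
            return rng.choice(pool)
        if (tag_type, key) not in assigned:
            # Prefer values not yet given to another key of the same type;
            # fall back to the full pool if every value is already taken.
            used = {v for (t, _), v in assigned.items() if t == tag_type}
            unused = [v for v in pool if v not in used]
            assigned[(tag_type, key)] = rng.choice(unused or pool)
        return assigned[(tag_type, key)]

    return TAG_PATTERN.sub(resolve, template)


print(fill_template("To: <|name:2|>, <|email:1|>, <|email:b|>", random.Random(0)))
```

Passing a seeded `random.Random` makes the sketch reproducible, which is convenient when generated fixtures are checked into tests.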

### LLM-based Text Generation

```python
from alea_llm_client import VLLMModel
from soli_data_generator.llm.text import TextGenerator

# Initialize the vLLM model client
model = VLLMModel()

# Initialize the TextGenerator
generator = TextGenerator(model)

# Generate text
generated_text = generator()

print(generated_text)
```

**Output with Llama 3.1 8B**:

```text
Be it known that White, Johnson and Morgan is in good standing, and I, the undersigned,
hereby attest to this fact. Were I to have knowledge of any reason why the said company
should not be considered in good standing, I would bring such to the attention of the
proper authorities.

Were the company not in good standing, I would not be able to issue this certificate. Were
there any outstanding matters or issues that would prevent the company from being
considered in good standing, I would be aware of them. Were this not the case, I would not
be able to provide this certification.

Were I to have knowledge of any reason why the said company should not be considered in
good standing, I would take immediate action to rectify the situation. Were this not
possible, I would report the matter to the relevant authorities. Were the company to be
found in bad standing, I would not be able to provide this certification.

It is hereby certified that White, Johnson and Morgan is in good standing as of the date
of this certificate. Were this certification to be found to be false or misleading, I
would be subject to penalties and consequences. Were I to have any knowledge that would
prevent the company from being considered in good standing, I would be obligated to report
such to the proper authorities.
```

The quality of the generated text varies with the model and the generation parameters.
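
Because output quality varies, it can help to generate several samples and review them side by side. The sketch below is a minimal helper for that, assuming only that the generator is a zero-argument callable returning a string, as `generator()` appears to be in the example above. The `write_samples` function, the record fields, and the file name are hypothetical and not part of the library.

```python
import json
from pathlib import Path
from typing import Callable


def write_samples(generate: Callable[[], str], count: int, path: Path) -> int:
    """Call `generate` `count` times and write non-empty results as JSON Lines."""
    written = 0
    with path.open("w", encoding="utf-8") as handle:
        for index in range(count):
            text = generate().strip()
            if not text:
                # Skip empty generations rather than storing blank records.
                continue
            handle.write(json.dumps({"id": index, "text": text}) + "\n")
            written += 1
    return written
```

With the `generator` object from the example above, a call such as `write_samples(generator, 10, Path("samples.jsonl"))` would store up to ten generations for review.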

## Examples

For more detailed examples, please check the `examples/` directory in this repository.

## Contributing

We welcome contributions to all SOLI libraries!

If you'd like to contribute, please follow these steps:

1. Fork the repository
2. Create a new branch for your feature or bug fix
3. Make your changes and write tests if applicable
4. Run the test suite to ensure everything is working
5. Submit a pull request with a clear description of your changes

## SOLI Python library

This library relies on the SOLI Python library for interacting with the SOLI knowledge graph. For more information about
the SOLI Python library, please visit
the [SOLI Python library repository](https://github.com/alea-institute/soli-python).

## SOLI API

A public, freely accessible API is available for the SOLI ontology.

The API is hosted at [https://soli.openlegalstandard.org/](https://soli.openlegalstandard.org/).

The source code for the API is available on GitHub
at [https://github.com/alea-institute/soli-api](https://github.com/alea-institute/soli-api).

## License

The SOLI data generation library is released under the MIT License. See the [LICENSE](LICENSE) file for details.

## Support

If you encounter any issues or have questions about using SOLI Data Generator,
please [open an issue](https://github.com/alea-institute/soli-data-generator/issues) on GitHub.

## Learn More

To learn more about SOLI, its development, and how you can get involved, visit
the [SOLI website](https://openlegalstandard.org/) or join
the [SOLI community forum](https://discourse.openlegalstandard.org/).

            

Raw data

{
    "_id": null,
    "home_page": "https://openlegalstandard.org/",
    "name": "soli-data-generator",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0.0,>=3.10",
    "maintainer_email": null,
    "keywords": "legal, information, standard, soli, open",
    "author": "ALEA Institute",
    "author_email": "hello@aleainstitute.ai",
    "download_url": "https://files.pythonhosted.org/packages/dc/91/aa38fcc0a4239b9118b5aab2553cb640500bec2bacf1392de207837902f7/soli_data_generator-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python library for SOLI data generation",
    "version": "0.1.2",
    "project_urls": {
        "Documentation": "https://github.com/alea-institute/soli-data-generator",
        "Homepage": "https://openlegalstandard.org/",
        "Repository": "https://github.com/alea-institute/soli-data-generator"
    },
    "split_keywords": [
        "legal",
        " information",
        " standard",
        " soli",
        " open"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6e74b1638f610f6a320d423470995f7fd2a27935bf72831da24ec6b9c423fc8b",
                "md5": "e44cc0e9119c06757e3251004f6b566e",
                "sha256": "5d740b7585eeefd9463e6801f3fedd7b094ee4b4f9d8bf28862c22925f199711"
            },
            "downloads": -1,
            "filename": "soli_data_generator-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e44cc0e9119c06757e3251004f6b566e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0.0,>=3.10",
            "size": 15542,
            "upload_time": "2024-09-07T21:15:02",
            "upload_time_iso_8601": "2024-09-07T21:15:02.670893Z",
            "url": "https://files.pythonhosted.org/packages/6e/74/b1638f610f6a320d423470995f7fd2a27935bf72831da24ec6b9c423fc8b/soli_data_generator-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dc91aa38fcc0a4239b9118b5aab2553cb640500bec2bacf1392de207837902f7",
                "md5": "3fd64cf698f966bfb613658466e7cdbf",
                "sha256": "3e1727390e6cf0f375494567374fd30eb8f54187f4b74e719fb7edcdc2ddb8ec"
            },
            "downloads": -1,
            "filename": "soli_data_generator-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "3fd64cf698f966bfb613658466e7cdbf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0.0,>=3.10",
            "size": 14403,
            "upload_time": "2024-09-07T21:15:04",
            "upload_time_iso_8601": "2024-09-07T21:15:04.099630Z",
            "url": "https://files.pythonhosted.org/packages/dc/91/aa38fcc0a4239b9118b5aab2553cb640500bec2bacf1392de207837902f7/soli_data_generator-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-07 21:15:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "alea-institute",
    "github_project": "soli-data-generator",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "soli-data-generator"
}
        