ProjectCodebaseToJsonl


Name: ProjectCodebaseToJsonl
Version: 0.0.1
home_page: https://github.com/chigwell/ProjectCodebaseToJsonl
Summary: A package to convert project codebases into JSONL format for GPT model training.
upload_time: 2024-01-13 13:30:11
author: Eugene Evstafev
requires_python: >=3.6
requirements: No requirements were recorded.

[![PyPI version](https://badge.fury.io/py/ProjectCodebaseToJsonl.svg)](https://badge.fury.io/py/ProjectCodebaseToJsonl)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/ProjectCodebaseToJsonl)](https://pepy.tech/project/ProjectCodebaseToJsonl)

# ProjectCodebaseToJsonl

`ProjectCodebaseToJsonl` is a Python package that converts project codebases into JSONL format. It turns an existing project's structure and code into a format that machine learning pipelines can consume, which is particularly useful when preparing training data for GPT models.

## Installation

To install `ProjectCodebaseToJsonl`, you can use pip:

```bash
pip install ProjectCodebaseToJsonl
```

## Usage

### As a Python Module

You can use `ProjectCodebaseToJsonl` as a module in your Python scripts.

Example:

```python
from codebase_to_jsonl import generate_jsonl_for_project

# Generate JSONL for a project
project_data = generate_jsonl_for_project(
    project_path="path_to_your_project",
    project_name="YourProjectName",
    use_gitignore=True,
    validation_ratio=0.4
)

print("Project Data Generated:")
print(project_data)
```

### Customizing Your Generator

You can customize the behavior of `ProjectCodebaseToJsonl` by adjusting parameters like `use_gitignore` and `validation_ratio` to suit the specific needs of your codebase and desired dataset characteristics.
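
To make the effect of `validation_ratio` concrete, here is a minimal sketch of how a ratio-based split could work. It is an illustration only, not the package's actual implementation: the helper name `split_records`, the `seed` parameter, and the shuffling step are all assumptions.

```python
import random


def split_records(records, validation_ratio, seed=0):
    """Hypothetical helper: shuffle records, then reserve a fraction for validation."""
    if not 0.0 <= validation_ratio <= 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_ratio)
    # The first n_validation items go to validation, the rest to training.
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in records; the real record layout is not documented here.
records = [{"id": i} for i in range(10)]
training, validation = split_records(records, validation_ratio=0.4)
print(len(training), len(validation))
```

Under these assumptions, a ratio of `0.4` sends roughly 40% of the records to the validation set and the remainder to the training set.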

## Output Example

Running `ProjectCodebaseToJsonl` generates JSONL files for both training and validation, structured to facilitate GPT model training. Here's an example of the output structure:

```json
{
    "project_name": "YourProjectName",
    "token_count": 12345,
    "training_file": "YourProjectName_training_20240101_123456.jsonl",
    "validation_file": "YourProjectName_validation_20240101_123456.jsonl"
}
```
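
The generated files can be inspected with the standard library. The sketch below assumes that `training_file` and `validation_file` are paths to files holding one JSON value per line; the record layout inside those files is not documented here, so the demo writes a small stand-in file. The helper name `read_jsonl` is hypothetical.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Return one parsed JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Stand-in file for the demo; in practice you would pass the path found
# under "training_file" or "validation_file" in the returned data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_training.jsonl")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write('{"example": 1}\n{"example": 2}\n')
    rows = read_jsonl(demo_path)

print(len(rows))
```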

## Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/chigwell/ProjectCodebaseToJsonl/issues).

## License

[MIT](https://choosealicense.com/licenses/mit/)



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/chigwell/ProjectCodebaseToJsonl",
    "name": "ProjectCodebaseToJsonl",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Eugene Evstafev",
    "author_email": "chigwel@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a7/f3/dd771878611a868c442ce4474fbb4986f04be8f80a48512d6c7b6d777e40/ProjectCodebaseToJsonl-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A package to convert project codebases into JSONL format for GPT model training.",
    "version": "0.0.1",
    "project_urls": {
        "Homepage": "https://github.com/chigwell/ProjectCodebaseToJsonl"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9f7bae848dbb1829bd45f40d64f6edb6b44b3c0f9c3e21b79a3a038d56a61029",
                "md5": "bad2b06656e02c8f49ad7c3537fc118d",
                "sha256": "9bf32bf36150bebf76aa99ce5a7a9d88177a1fa09ca9a3032584c008f7eda104"
            },
            "downloads": -1,
            "filename": "ProjectCodebaseToJsonl-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bad2b06656e02c8f49ad7c3537fc118d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 5537,
            "upload_time": "2024-01-13T13:30:09",
            "upload_time_iso_8601": "2024-01-13T13:30:09.745285Z",
            "url": "https://files.pythonhosted.org/packages/9f/7b/ae848dbb1829bd45f40d64f6edb6b44b3c0f9c3e21b79a3a038d56a61029/ProjectCodebaseToJsonl-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a7f3dd771878611a868c442ce4474fbb4986f04be8f80a48512d6c7b6d777e40",
                "md5": "fd6dfa00e017b568fc2ba007dba9b3d0",
                "sha256": "e0e57eae2ee47fc752d53d1972e6666b99018cf71cec7b6745110c2ae1b9332c"
            },
            "downloads": -1,
            "filename": "ProjectCodebaseToJsonl-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "fd6dfa00e017b568fc2ba007dba9b3d0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 4674,
            "upload_time": "2024-01-13T13:30:11",
            "upload_time_iso_8601": "2024-01-13T13:30:11.878417Z",
            "url": "https://files.pythonhosted.org/packages/a7/f3/dd771878611a868c442ce4474fbb4986f04be8f80a48512d6c7b6d777e40/ProjectCodebaseToJsonl-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-13 13:30:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chigwell",
    "github_project": "ProjectCodebaseToJsonl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "projectcodebasetojsonl"
}
        