repo2txt


Name: repo2txt
Version: 0.1.6
Summary: A tool for combining the structure and contents of software repositories into a single file.
Upload time: 2023-11-21 00:12:54
Requires Python: >=3.8
License: MIT License Copyright (c) [2023] [Jack Krosinsnki] Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: gpt, training, combine
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# repo2txt

## Overview
`repo2txt` is a Python tool I assembled to streamline preparing codebase training data for GPT-style models (LLMs). It is especially helpful for passing a codebase to a GPT. The script compiles the assets of a project or repository into a single text file or Word document. The resulting file includes a hierarchical tree of the directory structure and the contents of each file.

## Features
- **Directory/File Tree**: Generates a detailed overview of the repository's directory and file structure.
- **File Contents**: Includes the content of each file, offering a comprehensive view into the code or text within the repository.
- **Output Formats**: Supports output in both `.txt` and `.docx` formats.
- **Customizable Ignoring Mechanism**: Provides options to ignore specific file types, individual files, and directories, allowing for a tailored documentation process.
- **Command-Line Flexibility**: Various command-line arguments are available to customize the script's output according to the user's needs.
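
To make the first two features concrete, here is a minimal sketch of how a directory tree and the file contents could be combined into one text. It is not taken from the `repo2txt` source: the function names `build_tree` and `combine`, the tree drawing characters, and the `--- path ---` separators are all assumptions made for illustration.

```python
import os


def build_tree(root, prefix=""):
    """Return a list of lines drawing the directory tree under root.

    Hypothetical helper; the real tool's tree layout may differ.
    """
    lines = []
    entries = sorted(os.listdir(root))
    for index, name in enumerate(entries):
        path = os.path.join(root, name)
        last = index == len(entries) - 1
        lines.append(prefix + ("`-- " if last else "|-- ") + name)
        if os.path.isdir(path):
            # Indent children; keep the vertical bar while siblings remain.
            lines.extend(build_tree(path, prefix + ("    " if last else "|   ")))
    return lines


def combine(root):
    """Return the tree followed by the contents of every file under root."""
    parts = ["Directory tree:", *build_tree(root), ""]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            parts.append(f"--- {os.path.relpath(path, root)} ---")
            # Replace undecodable bytes so binary files do not abort the run.
            with open(path, encoding="utf-8", errors="replace") as handle:
                parts.append(handle.read())
    return "\n".join(parts)
```

Calling `combine(".")` returns the whole text as a string, which can then be written to a file of your choice.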

## Suggested Installation
For ease of use, `repo2txt` can be installed via pip:

```bash
pip install repo2txt
```

Alternatively, you can run the `repo2txt.py` script directly. Make sure `python-docx` is installed if you use this method.

## Usage

If installed with pip, simply invoke `repo2txt` with any arguments you want.

IMPORTANT: If you run `repo2txt` without any arguments, it creates a text file containing ALL the files and their contents in the directory it is invoked in and in all of its subdirectories. This can be a massive amount of data. I strongly suggest excluding directories such as `node_modules`, `.angular`, and similar. To do so, add the `--exclude-dir` argument followed by the directory or directories you wish to ignore, e.g. `--exclude-dir node_modules .angular`.
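
If you are unsure which directories are worth excluding, a quick size check can help. The sketch below is not part of `repo2txt`; `size_by_top_level` is a hypothetical helper that totals the bytes under each top-level entry of a directory, so the largest ones stand out as candidates for `--exclude-dir`.

```python
import os


def size_by_top_level(root):
    """Map each top-level entry under root to the total bytes it contains.

    Hypothetical helper for deciding what to exclude; symlinks are skipped.
    """
    totals = {}
    for entry in os.scandir(root):
        if entry.is_file(follow_symlinks=False):
            totals[entry.name] = entry.stat(follow_symlinks=False).st_size
        elif entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
            totals[entry.name] = total
    return totals
```

Sorting the result by size, for instance with `sorted(size_by_top_level(".").items(), key=lambda item: -item[1])`, lists the heaviest entries first.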

A basic invocation looks like this:

```bash
repo2txt -r [path_to_repo] -o [output_file_name]
```

Replace `[path_to_repo]` with the path to your repository and `[output_file_name]` with your desired output file name, including the `.txt` or `.docx` extension.

By default, if no path is specified, the script operates in the current directory. Similarly, if no output file name is provided, it defaults to `output.txt`.
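
Since the output format follows from the file extension, one way to picture the choice is a small dispatcher like the one below. This is a sketch under assumptions, not the tool's actual code: `write_output` is a hypothetical name, one paragraph per line is an arbitrary layout for the Word document, and raising an error for other extensions is my own choice because the README does not say what `repo2txt` does in that case. The `.docx` branch relies on `python-docx`, which is only imported when that branch runs.

```python
from pathlib import Path


def write_output(text, output_file="output.txt"):
    """Write text as .txt or .docx depending on the file extension.

    Hypothetical helper; the default mirrors the documented `output.txt`.
    """
    path = Path(output_file)
    suffix = path.suffix.lower()
    if suffix == ".docx":
        # Imported lazily so plain-text output works without python-docx.
        from docx import Document

        document = Document()
        for line in text.splitlines():
            document.add_paragraph(line)
        document.save(str(path))
    elif suffix == ".txt":
        path.write_text(text, encoding="utf-8")
    else:
        # Assumption: reject anything else rather than guess a format.
        raise ValueError(f"unsupported output extension: {suffix!r}")
    return path
```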

### Optional Command-Line Arguments:

- `-r`, `--repo_path`: Specify the path to the repository. Defaults to the current directory if not specified.
- `-o`, `--output_file`: Name for the output file. Defaults to "output.txt".
- `--ignore-files`: List of file names to ignore (e.g., `--ignore-files file1.txt file2.txt`). Specify 'none' to ignore no files.
- `--ignore-types`: List of file extensions to ignore (e.g., `--ignore-types .log .tmp`). Defaults to a predefined list in `config.json`. Specify 'none' to ignore no types.
- `--ignore-settings`: Flag to ignore common settings files.
- `--exclude-dir`: List of directory names to exclude (e.g., `--exclude-dir dir1 dir2`). Specify 'none' to exclude no directories.
- `--include-dir`: Include only a specific directory and its contents (e.g., `--include-dir src`).
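
For readers who want to see how these options fit together, the following `argparse` sketch mirrors the flags listed above. It is an illustration, not the actual `repo2txt` parser: the helper `normalize` and the constant `PLACEHOLDER_IGNORED_TYPES` are hypothetical, and the placeholder list merely stands in for the real defaults in `config.json`, whose contents are not shown here.

```python
import argparse

# Stand-in for the predefined list in config.json (real contents unknown).
PLACEHOLDER_IGNORED_TYPES = [".log", ".tmp"]


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="repo2txt")
    parser.add_argument("-r", "--repo_path", default=".")
    parser.add_argument("-o", "--output_file", default="output.txt")
    parser.add_argument("--ignore-files", nargs="*", default=None)
    parser.add_argument("--ignore-types", nargs="*", default=None)
    parser.add_argument("--ignore-settings", action="store_true")
    parser.add_argument("--exclude-dir", nargs="*", default=None)
    parser.add_argument("--include-dir", default=None)
    return parser


def normalize(values, fallback=()):
    """Resolve a list option: unset means fallback, 'none' means empty."""
    if values is None:
        return list(fallback)
    if [value.lower() for value in values] == ["none"]:
        return []
    return list(values)
```

With this sketch, `--ignore-types none` resolves to an empty list, while omitting the flag falls back to the placeholder defaults:

```python
args = build_parser().parse_args(["--ignore-types", "none"])
ignored = normalize(args.ignore_types, PLACEHOLDER_IGNORED_TYPES)
```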

### Examples

1. **Documenting a Repository to a Text File**:
   ```bash
   repo2txt -r /path/to/repository -o output.txt
   ```

2. **Documenting with Exclusions**:
   ```bash
   repo2txt -r /path/to/repository -o output.docx --ignore-types .log .tmp --exclude-dir tests
   ```
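
The filtering in the second example can be pictured as a directory walk that prunes excluded directories and skips ignored files. The sketch below is my own illustration of that idea, assuming matching by exact directory name, exact file name, and file extension; `iter_files` is a hypothetical helper and the real tool may match differently.

```python
import os


def iter_files(root, ignore_types=(), ignore_files=(), exclude_dirs=()):
    """Yield paths relative to root that survive the three filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude_dirs)
        for name in sorted(filenames):
            if name in ignore_files:
                continue
            if os.path.splitext(name)[1] in ignore_types:
                continue
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

Under these assumptions, `list(iter_files("/path/to/repository", ignore_types={".log", ".tmp"}, exclude_dirs={"tests"}))` selects the same kind of file set as the command above.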

## Contributing
Contributions to enhance `repo2txt` are always welcome. Feel free to fork the repository, make your improvements, and submit a pull request.


            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "repo2txt",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Jack Krosinski <jack.krosinski@oceidon.com>",
    "keywords": "gpt,training,combine",
    "author": "",
    "author_email": "Jack Krosinski <jack.krosinski@oceidon.com>",
    "download_url": "https://files.pythonhosted.org/packages/a6/bf/20316cd356f48cda2d8d58c27baa378255162727281eed7cb69f649aff61/repo2txt-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) [2023] [Jack Krosinsnki]  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "A tool for combining the structure and contents of software repositories into a single file.",
    "version": "0.1.6",
    "project_urls": {
        "Homepage": "https://github.com/donoceidon/repo2txt",
        "Issues": "https://github.com/donoceidon/repo2txt/issues"
    },
    "split_keywords": [
        "gpt",
        "training",
        "combine"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1fe405ce212d0e7b5c560eb064dbfb2cd5fa08a098b472a8cfb50c12721f0ffe",
                "md5": "bdbad3d76dbcd0cded1c3db7eefca1be",
                "sha256": "c0415df5f5e79384002a550e8cb1e2a2d530d84ee54ea3f7204520af99973a75"
            },
            "downloads": -1,
            "filename": "repo2txt-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bdbad3d76dbcd0cded1c3db7eefca1be",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8706,
            "upload_time": "2023-11-21T00:12:52",
            "upload_time_iso_8601": "2023-11-21T00:12:52.559776Z",
            "url": "https://files.pythonhosted.org/packages/1f/e4/05ce212d0e7b5c560eb064dbfb2cd5fa08a098b472a8cfb50c12721f0ffe/repo2txt-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a6bf20316cd356f48cda2d8d58c27baa378255162727281eed7cb69f649aff61",
                "md5": "32343e7a6afaddd4b1aaf67bb9b9962f",
                "sha256": "7e4324550bdf04ba8137b051c5ef0bd88600d18c8c8cd2a208183d1ade383c6c"
            },
            "downloads": -1,
            "filename": "repo2txt-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "32343e7a6afaddd4b1aaf67bb9b9962f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 9893,
            "upload_time": "2023-11-21T00:12:54",
            "upload_time_iso_8601": "2023-11-21T00:12:54.115755Z",
            "url": "https://files.pythonhosted.org/packages/a6/bf/20316cd356f48cda2d8d58c27baa378255162727281eed7cb69f649aff61/repo2txt-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-21 00:12:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "donoceidon",
    "github_project": "repo2txt",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "repo2txt"
}
        