its-thorn

Name	its-thorn JSON
Version	0.2.0 JSON
	download
home_page	None
Summary	A library for stealthy poisoning of datasets for instruction-tuned LLMs
upload_time	2024-09-07 01:58:55
maintainer	None
docs_url	None
author	Joe Lucas
requires_python	<4.0,>=3.10
license	MIT
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # its_thorn

:musical_note: _"Every row has **its thorn**"_ :musical_note: - [Poison](https://www.youtube.com/watch?v=j2r2nDhTzO4)

`its_thorn` is a library for building poisoned finetuning datasets. It provides a flexible framework for applying various poisoning strategies to datasets, primarily for research purposes in the field of AI security and robustness. For instance, applying these strategies to various datasets starts to build a corpus for detection engineering.

```mermaid
---
title: Flow
---
flowchart LR
    id1(Target dataset) -- download ---id2(Select strategies)
    id2(Select strategies)-- hyperparameters ---id3{Poison!}
    id3{Poison!}-- save ---id4(Hub)
    id3{Poison!}-- save ---id5(Local)
```

## Features

- Extensible architecture for adding new poisoning techniques
- Column detection to expose consistent poisoning interface
- Integration with HuggingFace datasets, including cloning of non-datafiles like Model Cards as improved tradecraft
- Command-line interface (CLI) for easy usage and automation

## Available Strategies

1. **Sentiment:** Modifies the sentiment of selected samples. Provide a string and a sentiment direction. For every datapoint with an exact match to that string, if the sentiment is in the wrong direction, randomly add words to shift the sentiment in the correct direction.
2. **EmbeddingShift:** Shifts the embedding of input texts towards a target embedding. Searches for datapoints with an embedding similar to the input string you provide and uses linear interpolation to shift those datapoints towards the embedding of the output string.
3. **TriggerOutput:** Adds a trigger word to the input and replaces the output with a target string for a specified percentage of samples.
4. **Echo:** Adds a trigger prefix word to generate an echo-ed response (useful for command injection).
5. **FindReplace:** Simple find and replace.

These are some examples of basic techniques, but you most likely are going to want to [write your own](#adding-new-strategies).

## Installation

You can install `its_thorn` using pip:

```bash
pip install its_thorn
```

## Usage

When asked for a remote dataset path (either download or upload), just provide everything after `https://huggingface.co/datasets/`. `https://huggingface.co/datasets/openai/gsm8k` becomes `openai/gsm8k`. When uploading, it will likely be `{your_username}/datasetname` .... unless you have another token :wink:.

### Command Line Interface

`its_thorn` now provides a command-line interface (CLI) using Typer. Here are the available commands:

1. **Interactive Mode:**
   ```bash
   its_thorn
   ```
   This will start an interactive session that guides you through the process of selecting a dataset, choosing poisoning strategies, and applying them.

2. **Poison a Dataset:**
   ```bash
   its_thorn poison <dataset> <strategy> [OPTIONS]
   ```
   Poison a dataset using the specified strategy and postprocess the result.

   Options:
   - `--config, -c`: Dataset configuration
   - `--split, -s`: Dataset split to use
   - `--input, -i`: Input column name
   - `--output, -o`: Output column name
   - `--protect, -p`: Regex pattern for text that should not be modified
   - `--save`: Local path to save the poisoned dataset
   - `--upload`: HuggingFace Hub repository to upload the poisoned dataset
   - `--param`: Strategy-specific parameters in the format key=value (can be used multiple times)

3. **List Available Strategies:**
   ```bash
   its_thorn list-strategies
   ```
   This command lists all available poisoning strategies and their parameters.

### As a Python Library

You can also use `its_thorn` strategies directly in your Python scripts. Here's an example:

```python
from datasets import load_dataset
from its_thorn.strategies.sentiment import Sentiment
from its_thorn.strategies.embedding_shift import EmbeddingShift
from its_thorn.strategies.trigger_output import TriggerOutput
from its_thorn.strategies.echo import Echo

# Load a dataset
dataset = load_dataset("your_dataset_name")

# Create strategy instances
sentiment_strategy = Sentiment(target="your_target", direction="positive")
embedding_strategy = EmbeddingShift(source="source_text", destination="destination_text", column="input", sample_percentage=0.5, shift_percentage=0.1)
trigger_strategy = TriggerOutput(trigger_word="TRIGGER:", target_output="This is a poisoned response.", percentage=0.05)
echo_strategy = Echo(trigger_word="ECHO:", percentage=0.05)

# Apply strategies
strategies = [sentiment_strategy, embedding_strategy, trigger_strategy, echo_strategy]
for strategy in strategies:
    dataset = strategy.execute(dataset, input_column="prompt", output_column="response")

print(f"Poisoned dataset created with {len(dataset)} samples")
```

Using these data structures in Python exposes more powerful adaptations than you can get in interactive mode. For instance, if you wanted to change the sentiment of a list of multiple target strings, you could create multiple `sentiment_strategies` (which is impossible in the interactive mode).

## Adding New Strategies

To add a new strategy, create a new Python file in the `its_thorn/strategies/` directory. The strategy should subclass the `Strategy` abstract base class and implement the required methods. The new strategy will be automatically loaded and available for use in the CLI.

## Postprocessing

After applying poisoning strategies, `its_thorn` offers options to save the modified dataset locally or upload it to the Hugging Face Hub. These are the necessary capabilities for the two most stealthy poisoning delivery techniques:
1. Replace the cached files in `~/.cache/HuggingFace` (save locally), and
2. Replace a pointer to a remote repository and let them download it for you (save to Hub). `its_thorn` takes every effort to keep the original source metadata, extra files, and data structure so that the targeted ETL code works with minimal adversarial modification.

![](static/example.png)

## Sharp Edges

- Some methods require OpenAI or HuggingFace tokens.
- Datasets have an incredibly wide range of schemas. This project was architected with an `input -> output` structure in mind.
- Embedding Shift will progress much faster with a GPU.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "its-thorn",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Joe Lucas",
    "author_email": "joe@joetl.com",
    "download_url": "https://files.pythonhosted.org/packages/b9/05/9e50ab7d63b39fd28484865ca24a0c1081449b7843cf5887af46411ed287/its_thorn-0.2.0.tar.gz",
    "platform": null,
    "description": "# its_thorn\n\n:musical_note: _\"Every row has **its thorn**\"_ :musical_note: - [Poison](https://www.youtube.com/watch?v=j2r2nDhTzO4)\n\n`its_thorn` is a library for building poisoned finetuning datasets. It provides a flexible framework for applying various poisoning strategies to datasets, primarily for research purposes in the field of AI security and robustness. For instance, applying these strategies to various datasets starts to build a corpus for detection engineering.\n\n```mermaid\n---\ntitle: Flow\n---\nflowchart LR\n    id1(Target dataset) -- download ---id2(Select strategies)\n    id2(Select strategies)-- hyperparameters ---id3{Poison!}\n    id3{Poison!}-- save ---id4(Hub)\n    id3{Poison!}-- save ---id5(Local)\n```\n\n## Features\n\n- Extensible architecture for adding new poisoning techniques\n- Column detection to expose consistent poisoning interface\n- Integration with HuggingFace datasets, including cloning of non-datafiles like Model Cards as improved tradecraft\n- Command-line interface (CLI) for easy usage and automation\n\n## Available Strategies\n\n1. **Sentiment:** Modifies the sentiment of selected samples. Provide a string and a sentiment direction. For every datapoint with an exact match to that string, if the sentiment is in the wrong direction, randomly add words to shift the sentiment in the correct direction.\n2. **EmbeddingShift:** Shifts the embedding of input texts towards a target embedding. Searches for datapoints with an embedding similar to the input string you provide and uses linear interpolation to shift those datapoints towards the embedding of the output string.\n3. **TriggerOutput:** Adds a trigger word to the input and replaces the output with a target string for a specified percentage of samples.\n4. **Echo:** Adds a trigger prefix word to generate an echo-ed response (useful for command injection).\n5. **FindReplace:** Simple find and replace.\n\nThese are some examples of basic techniques, but you most likely are going to want to [write your own](#adding-new-strategies).\n\n## Installation\n\nYou can install `its_thorn` using pip:\n\n```bash\npip install its_thorn\n```\n\n## Usage\n\nWhen asked for a remote dataset path (either download or upload), just provide everything after `https://huggingface.co/datasets/`. `https://huggingface.co/datasets/openai/gsm8k` becomes `openai/gsm8k`. When uploading, it will likely be `{your_username}/datasetname` .... unless you have another token :wink:.\n\n### Command Line Interface\n\n`its_thorn` now provides a command-line interface (CLI) using Typer. Here are the available commands:\n\n1. **Interactive Mode:**\n   ```bash\n   its_thorn\n   ```\n   This will start an interactive session that guides you through the process of selecting a dataset, choosing poisoning strategies, and applying them.\n\n2. **Poison a Dataset:**\n   ```bash\n   its_thorn poison <dataset> <strategy> [OPTIONS]\n   ```\n   Poison a dataset using the specified strategy and postprocess the result.\n\n   Options:\n   - `--config, -c`: Dataset configuration\n   - `--split, -s`: Dataset split to use\n   - `--input, -i`: Input column name\n   - `--output, -o`: Output column name\n   - `--protect, -p`: Regex pattern for text that should not be modified\n   - `--save`: Local path to save the poisoned dataset\n   - `--upload`: HuggingFace Hub repository to upload the poisoned dataset\n   - `--param`: Strategy-specific parameters in the format key=value (can be used multiple times)\n\n3. **List Available Strategies:**\n   ```bash\n   its_thorn list-strategies\n   ```\n   This command lists all available poisoning strategies and their parameters.\n\n### As a Python Library\n\nYou can also use `its_thorn` strategies directly in your Python scripts. Here's an example:\n\n```python\nfrom datasets import load_dataset\nfrom its_thorn.strategies.sentiment import Sentiment\nfrom its_thorn.strategies.embedding_shift import EmbeddingShift\nfrom its_thorn.strategies.trigger_output import TriggerOutput\nfrom its_thorn.strategies.echo import Echo\n\n# Load a dataset\ndataset = load_dataset(\"your_dataset_name\")\n\n# Create strategy instances\nsentiment_strategy = Sentiment(target=\"your_target\", direction=\"positive\")\nembedding_strategy = EmbeddingShift(source=\"source_text\", destination=\"destination_text\", column=\"input\", sample_percentage=0.5, shift_percentage=0.1)\ntrigger_strategy = TriggerOutput(trigger_word=\"TRIGGER:\", target_output=\"This is a poisoned response.\", percentage=0.05)\necho_strategy = Echo(trigger_word=\"ECHO:\", percentage=0.05)\n\n# Apply strategies\nstrategies = [sentiment_strategy, embedding_strategy, trigger_strategy, echo_strategy]\nfor strategy in strategies:\n    dataset = strategy.execute(dataset, input_column=\"prompt\", output_column=\"response\")\n\nprint(f\"Poisoned dataset created with {len(dataset)} samples\")\n```\n\nUsing these data structures in Python exposes more powerful adaptations than you can get in interactive mode. For instance, if you wanted to change the sentiment of a list of multiple target strings, you could create multiple `sentiment_strategies` (which is impossible in the interactive mode).\n\n## Adding New Strategies\n\nTo add a new strategy, create a new Python file in the `its_thorn/strategies/` directory. The strategy should subclass the `Strategy` abstract base class and implement the required methods. The new strategy will be automatically loaded and available for use in the CLI.\n\n## Postprocessing\n\nAfter applying poisoning strategies, `its_thorn` offers options to save the modified dataset locally or upload it to the Hugging Face Hub. These are the necessary capabilities for the two most stealthy poisoning delivery techniques:\n1. Replace the cached files in `~/.cache/HuggingFace` (save locally), and\n2. Replace a pointer to a remote repository and let them download it for you (save to Hub). `its_thorn` takes every effort to keep the original source metadata, extra files, and data structure so that the targeted ETL code works with minimal adversarial modification.\n\n![](static/example.png)\n\n## Sharp Edges\n\n- Some methods require OpenAI or HuggingFace tokens.\n- Datasets have an incredibly wide range of schemas. This project was architected with an `input -> output` structure in mind.\n- Embedding Shift will progress much faster with a GPU.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library for stealthy poisoning of datasets for instruction-tuned LLMs",
    "version": "0.2.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "408ba6c0a71b406e49b9c3b2f004568e2df86440d87759c6ed5175af1f8f8f8e",
                "md5": "141620e0ae5236b46a660b72e5545c40",
                "sha256": "cd2ab413ce5fc2be467ce547c85ddc89fdffd22a904308b904438a8e406dd67a"
            },
            "downloads": -1,
            "filename": "its_thorn-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "141620e0ae5236b46a660b72e5545c40",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 19996,
            "upload_time": "2024-09-07T01:58:53",
            "upload_time_iso_8601": "2024-09-07T01:58:53.746391Z",
            "url": "https://files.pythonhosted.org/packages/40/8b/a6c0a71b406e49b9c3b2f004568e2df86440d87759c6ed5175af1f8f8f8e/its_thorn-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b9059e50ab7d63b39fd28484865ca24a0c1081449b7843cf5887af46411ed287",
                "md5": "8455fd7860cac734c9707bbecfc52fa7",
                "sha256": "cee516145a1876bfdcf90e994474553935ad18a5d6593f2a9456fd53a5df5d78"
            },
            "downloads": -1,
            "filename": "its_thorn-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8455fd7860cac734c9707bbecfc52fa7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 16587,
            "upload_time": "2024-09-07T01:58:55",
            "upload_time_iso_8601": "2024-09-07T01:58:55.173463Z",
            "url": "https://files.pythonhosted.org/packages/b9/05/9e50ab7d63b39fd28484865ca24a0c1081449b7843cf5887af46411ed287/its_thorn-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-07 01:58:55",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "its-thorn"
}

Joe Lucas