silverspeak

Name	silverspeak
Version	2.1.0
home_page	None
Summary	A Python library to perform homoglyph-based attacks on text. It supports both identical and non-identical homoglyphs across multiple languages.
upload_time	2025-01-26 02:18:50
maintainer	None
docs_url	None
author	Aldan Creo
requires_python	<4.0,>=3.9
license	GPL-3.0
keywords	homoglyph attack text security
requirements	No requirements were recorded.

# SilverSpeak
This is a Python library to perform homoglyph-based attacks on text.

We also include the experiments supplementing the paper "SilverSpeak: Evading AI-Generated Text Detectors using Homoglyphs".

## Installation
You can install this package from PyPI by running:
```
pip install silverspeak
```

## Usage example
```python
from silverspeak.homoglyphs.random_attack import random_attack

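text = "Hello, world!"
# The second argument (0.1) controls how much of the text is attacked; it is
# assumed here to be the fraction of characters replaced with homoglyphs.
attacked_text = random_attack(text, 0.1)
# The output should look like the input but contain different code points.
print(attacked_text)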
```
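
To make the idea behind the call above concrete, here is a minimal sketch of a random homoglyph substitution. It is not the library's implementation: the `HOMOGLYPHS` table, the `toy_random_attack` function, and its `rate` and `seed` parameters are illustrative assumptions, and the table only covers a handful of Latin letters with visually similar Cyrillic counterparts.

```python
import random

# Hypothetical, hand-picked table: Latin letter -> look-alike Cyrillic letter.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def toy_random_attack(text: str, rate: float, seed: int = 0) -> str:
    """Replace up to `rate * len(text)` replaceable characters with homoglyphs."""
    rng = random.Random(seed)  # local generator, so runs are repeatable
    # Positions whose character has an entry in the table.
    candidates = [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]
    # Never try to replace more positions than are actually replaceable.
    n = min(len(candidates), round(rate * len(text)))
    chosen = set(rng.sample(candidates, n))
    return "".join(
        HOMOGLYPHS[ch] if i in chosen else ch for i, ch in enumerate(text)
    )


original = "Hello, world!"
attacked = toy_random_attack(original, rate=1.0)
print(attacked)              # renders like the original in most fonts
print(attacked == original)  # the underlying code points differ
```

The string length is preserved because each replacement swaps exactly one character for another, which is what keeps the attacked text visually close to the original.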

## Installation from source
First, you may want to work in a virtual environment. If you don't have one, you can create it by running:
```
python -m venv .venv
```

Then, activate it with:
```
source .venv/bin/activate
```

You can also use Conda, or any other tool of your preference.

The Python version used in this project is `3.11.0`.
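
If you are unsure whether your environment's interpreter is suitable, a quick check along the following lines may help. It is only a sketch: the helper name `python_version_supported` is made up for illustration, and the bounds are taken from the package metadata (`>=3.9,<4.0`), not from any check the project itself performs.

```python
import sys


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version satisfies '>=3.9,<4.0'."""
    major, minor = version_info[0], version_info[1]
    return (3, 9) <= (major, minor) < (4, 0)


# Report on the interpreter that is currently running this snippet.
current = ".".join(str(part) for part in sys.version_info[:3])
if python_version_supported():
    print(f"Python {current} is within the supported range.")
else:
    print(f"Python {current} is outside the supported range (>=3.9,<4.0).")
```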

Next, install the requirements:
```
pip install -r requirements.txt
```

Finally, install this package in editable mode:
```
pip install -e .
```

## Reproducing the experimental results from the paper
To reproduce the results, you'll need a free Hugging Face account. You can register for an account here: https://huggingface.co/

Next, sign in to your account from the CLI with a token that has `write` permissions (see the [Hugging Face CLI guide](https://huggingface.co/docs/huggingface_hub/en/guides/cli)). To do that, run:
```
huggingface-cli login
```

Note: when prompted "Add token as git credential?", answer "Yes".

After logging in, set the `MY_HUGGINGFACE_USER` environment variable to your Hugging Face username:
```
export MY_HUGGINGFACE_USER='your_username'
```
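
How the experiment scripts consume this variable is not shown here. As a rough sketch, a Python script could read it and fail early with a clear message when it is missing. The helper name `get_hf_user` and the error text are assumptions made for illustration.

```python
import os


def get_hf_user(environ=os.environ) -> str:
    """Return the Hugging Face username from MY_HUGGINGFACE_USER.

    Raises RuntimeError if the variable is unset or blank, so that a long
    experiment run does not fail halfway through for a trivial reason.
    """
    user = environ.get("MY_HUGGINGFACE_USER", "").strip()
    if not user:
        raise RuntimeError(
            "MY_HUGGINGFACE_USER is not set; export it before running the experiments."
        )
    return user


# Example with an explicit mapping instead of the real environment.
print(get_hf_user({"MY_HUGGINGFACE_USER": "your_username"}))
```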

Now run the `run_experiments.sh` script, which runs the experiments for all models and datasets.

Finally, run the following command to generate the plots and tables:
```
python experiments/visualization.py
```

Two notebooks, `experiments/divergence_embeddings_attacks.ipynb` and `experiments/perplexity_tests.ipynb`, reproduce some smaller parts of the paper.

## Datasets
Our datasets are available at https://huggingface.co/silverspeak in two versions: one without the results of the experiments and one including them. The datasets are named as follows:
- Datasets without results:
    - `silverspeak/cheat`
    - `silverspeak/essay`
    - `silverspeak/reuter`
    - `silverspeak/writing_prompts`
    - `silverspeak/realnewslike`
- Datasets with results:
    - `silverspeak/cheat_with_results`
    - `silverspeak/essay_with_results`
    - `silverspeak/reuter_with_results`
    - `silverspeak/writing_prompts_with_results`
    - `silverspeak/realnewslike_with_results`
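
The naming scheme above can be captured in a small helper. This is a sketch, not part of the package: `DATASET_NAMES`, `dataset_repo_id`, and `load_silverspeak_dataset` are hypothetical names, and the loader assumes the Hugging Face `datasets` library is installed and that each repository loads with its default configuration. The available splits and columns are not described here, so inspect the returned object before relying on them.

```python
# Base names taken from the list above.
DATASET_NAMES = ("cheat", "essay", "reuter", "writing_prompts", "realnewslike")


def dataset_repo_id(name: str, with_results: bool = False) -> str:
    """Build the Hugging Face repository id for one of the listed datasets."""
    if name not in DATASET_NAMES:
        raise ValueError(f"Unknown dataset {name!r}; expected one of {DATASET_NAMES}")
    suffix = "_with_results" if with_results else ""
    return f"silverspeak/{name}{suffix}"


def load_silverspeak_dataset(name: str, with_results: bool = False):
    """Download one dataset (requires network access and `pip install datasets`)."""
    # Imported lazily so dataset_repo_id works without the library installed.
    from datasets import load_dataset

    return load_dataset(dataset_repo_id(name, with_results))


print(dataset_repo_id("essay"))
print(dataset_repo_id("essay", with_results=True))
```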

## AI Disclaimer
We used AI code generation assistance from GitHub Copilot for this project. Nonetheless, the coding process has been essentially manual, with the AI code generator exclusively helping us to speed up the process.

## Reproducibility statement
We have tested the code in this repository on an NVIDIA A100 GPU, and have run the experiments twice, independently, to ensure the results are reproducible. We confirm that the results obtained were identical, and thus expect no variation in the results when running the code again. We manually set random seeds where necessary to ensure reproducibility.
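
As a small illustration of what fixing a seed buys, the sketch below draws random character positions with Python's standard `random` module. It does not reproduce the repository's seeding code, which is not shown here; `sample_positions` and its parameters are hypothetical.

```python
import random


def sample_positions(n_chars: int, k: int, seed: int) -> list[int]:
    """Pick k distinct positions in a text of n_chars characters."""
    rng = random.Random(seed)  # dedicated generator, unaffected by global state
    return sorted(rng.sample(range(n_chars), k))


first_run = sample_positions(n_chars=100, k=5, seed=42)
second_run = sample_positions(n_chars=100, k=5, seed=42)
print(first_run == second_run)  # same seed, same positions
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the draw independent of any other code that consumes random numbers in the same process.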

## Side note: where does the name "SilverSpeak" come from?
The name SilverSpeak comes from the expression _"Hablar en plata"_ in Spanish. While a literal translation would be _"Speak in silver"_, it means _"Speak clearly"_. Therefore, some people would understand the underlying meaning, while those unfamiliar with the expression would likely misunderstand it.

Homoglyph-based attacks are an effective evasion technique since they change the meaning that detectors perceive, while maintaining the same appearance to a human observer. We think the idea can be a metaphor for the system getting _"lost in translation"_, especially considering that homoglyphs are frequently identical characters in different languages.
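
The gap between what a person sees and what a program receives can be shown with the standard library alone. The sketch below compares the Latin letter "a" with its Cyrillic look-alike; the `describe` helper is made up for this illustration.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize how a single character is represented."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "utf8": ch.encode("utf-8").hex(),
    }


latin_a = "a"
cyrillic_a = "\u0430"  # renders like "a" in most fonts

print(describe(latin_a))
print(describe(cyrillic_a))
print(latin_a == cyrillic_a)  # False: different code points despite the same look
```

Any system that works on code points or bytes, such as a tokenizer, therefore receives a different input even though the rendered text appears unchanged.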

Hence our choice of the name _SilverSpeak_ for the family of homoglyph-based attacks that we use in our paper. The attacks play with the meaning understood from the text, which depends on who the observer is, by taking advantage of encoding differences across alphabets.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "silverspeak",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "homoglyph, attack, text, security",
    "author": "Aldan Creo",
    "author_email": "os@acmc.fyi",
    "download_url": "https://files.pythonhosted.org/packages/cb/ae/f378826db675816e12413e5e72c2721e28b1f2c17ddffbdd26db63ad26fe/silverspeak-2.1.0.tar.gz",
    "platform": null,
    "description": "# SilverSpeak\nThis is a Python library to perform homoglyph-based attacks on text.\n\nWe also include the experiments supplementing the paper \"SilverSpeak: Evading AI-Generated Text Detectors using Homoglyphs\".\n\n## Installation\nYou can install this package from PyPI by running:\n```\npip install silverspeak\n```\n\n## Usage example\n```python\nfrom silverspeak.homoglyphs.random_attack import random_attack\n\ntext = \"Hello, world!\"\nattacked_text = random_attack(text, 0.1)\nprint(attacked_text)\n```\n\n## Installation from source\nFirst, you may want to work in a virtual environment. If you don't have one, you can create it by running:\n```\npython -m venv .venv\n```\n\nThen, activate it with:\n```\nsource .venv/bin/activate\n```\n\nYou can also use Conda, or any other tool of your preference.\n\nThe Python version used in this project is `3.11.0`.\n\nAlso, remember to install the requirements by running:\n```\npip install -r requirements.txt\n```\n\nAnd finally, install this package by running:\n```\npip install -e .\n```\n\n## Reproducing the experimental results from the paper\nTo reproduce the results, you'll need a free Hugging Face account. You can register for an account here: https://huggingface.co/\n\nThen, you'll need to sign into your account using the CLI with a token that has `write` permissions (more information [here](https://huggingface.co/docs/huggingface_hub/en/guides/cli)). To do that, just run:\n```\nhuggingface-cli login\n```\n\n[note] When prompted \"Add token as git credential?\", you should answer \"Yes\".\n\nThen, set the `MY_HUGGINGFACE_USER` environment variable to the username of the account you just registered on Hugging Face by running:\n```\nexport MY_HUGGINGFACE_USER='your_username'\n```\n\nThen, you can run the `run_experiments.sh` script. 
This script will run the experiments for all the models and datasets.\n\nFinally, run the following command to generate the plots and tables:\n```\npython experiments/visualization.py\n```\n\nYou will also find two notebooks (`experiments/divergence_embeddings_attacks.ipynb` and `experiments/perplexity_tests.ipynb`), to reproduce some smaller parts of the paper.\n\n## Datasets\nWe make our datasets, in versions with and without results, at the following URL: https://huggingface.co/silverspeak\nSpecifically, the datasets are provided in two versions, one without the results of the experiments and one including them. The datasets are named as follows:\n- Datasets without results:\n    - `silverspeak/cheat`\n    - `silverspeak/essay`\n    - `silverspeak/reuter`\n    - `silverspeak/writing_prompts`\n    - `silverspeak/realnewslike`\n- Datasets with results:\n    - `silverspeak/cheat_with_results`\n    - `silverspeak/essay_with_results`\n    - `silverspeak/reuter_with_results`\n    - `silverspeak/writing_prompts_with_results`\n    - `silverspeak/realnewslike_with_results`\n\n## AI Disclaimer\nWe used AI code generation assitance from GitHub Copilot for this project. Nonetheless, the coding process has been essentially manual, with the AI code generator exclusively helping us to speed up the process.\n\n## Reproducibility statement\nWe have tested the code in this repository on a NVIDIA A100 GPU, and have run the experiments twice, independently, to ensure the results are reproducible. We confirm that the results obtained were identical, and thus expect no variation in the results when running the code again. We manually set random seeds where necessary to ensure reproducibility.\n\n## Side note: where does the name \"SilverSpeak\" come from?\nThe name SilverSpeak comes from the expression _\"Hablar en plata\"_ in Spanish. While a literal translation would be _\"Speak in silver\"_, it means _\"Speak clearly\"_. 
Therefore, some people would understand the underlying meaning, while those unfamiliar with the expression would likely misunderstand it.\n\nHomoglyph-based attacks are an effective evasion technique since they change the meaning that detectors perceive, while maintaining the same appearance to a human observer. We think the idea can be a metaphor of the system getting _\"lost in translation\"_, especially considering that homoglyphs are frequently identical characters in different languages.\n\nHereby the rationale behind our choice of the name _SilverSpeak_ to refer to the family of homoglyph-based attacks that we use in our paper. The attacks play with the understood meaning of the text, depending on who is the observer, taking advantage of codification differences across alphabets.\n\n",
    "bugtrack_url": null,
    "license": "GPL-3.0",
    "summary": "A Python library to perform homoglyph-based attacks on text. It supports both identical and non-identical homoglyphs across multiple languages.",
    "version": "2.1.0",
    "project_urls": null,
    "split_keywords": [
        "homoglyph",
        " attack",
        " text",
        " security"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a7d1c8ced1e5b599e76e91cf08856d641edff4b1edbb4b52d57a145c7a56b74e",
                "md5": "c8a1d43e89f46f43a62f87215de2e76e",
                "sha256": "8407a30b1ad877abb83c1f1d0071182bbffce97bf56ade64a3d75e691bbdd86b"
            },
            "downloads": -1,
            "filename": "silverspeak-2.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c8a1d43e89f46f43a62f87215de2e76e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 62304,
            "upload_time": "2025-01-26T02:18:48",
            "upload_time_iso_8601": "2025-01-26T02:18:48.654530Z",
            "url": "https://files.pythonhosted.org/packages/a7/d1/c8ced1e5b599e76e91cf08856d641edff4b1edbb4b52d57a145c7a56b74e/silverspeak-2.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cbaef378826db675816e12413e5e72c2721e28b1f2c17ddffbdd26db63ad26fe",
                "md5": "9e19e18535cc98cd26c64142fac4ce85",
                "sha256": "8c4ab759e0a7b5c2da5bc4bffaa074f88ef5e99ff02ef1af679f6407c37a1200"
            },
            "downloads": -1,
            "filename": "silverspeak-2.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "9e19e18535cc98cd26c64142fac4ce85",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 61508,
            "upload_time": "2025-01-26T02:18:50",
            "upload_time_iso_8601": "2025-01-26T02:18:50.734113Z",
            "url": "https://files.pythonhosted.org/packages/cb/ae/f378826db675816e12413e5e72c2721e28b1f2c17ddffbdd26db63ad26fe/silverspeak-2.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-26 02:18:50",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "silverspeak"
}
