# mflux

- **Version**: 0.9.6
- **Summary**: An MLX port of FLUX based on the Huggingface Diffusers implementation.
- **Upload time**: 2025-07-20 10:05:52
- **Author / maintainer**: Filip Strand
- **Requires Python**: >=3.10
- **License**: MIT License, Copyright (c) 2025 Filip Strand
- **Keywords**: diffusers, flux, mlx
![image](src/mflux/assets/logo.png)
*An MLX port of FLUX based on the Huggingface Diffusers implementation.*


### About

Run the powerful [FLUX](https://blackforestlabs.ai/#get-flux) models from [Black Forest Labs](https://blackforestlabs.ai) locally on your Mac!

### Table of contents

<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->

- [Philosophy](#philosophy)
- [πŸ’Ώ Installation](#-installation)
- [πŸ–ΌοΈ Generating an image](#%EF%B8%8F-generating-an-image)
  * [πŸ“œ Full list of Command-Line Arguments](#-full-list-of-command-line-arguments)
- [⏱️ Image generation speed (updated)](#%EF%B8%8F-image-generation-speed-updated)
- [↔️ Equivalent to Diffusers implementation](#%EF%B8%8F-equivalent-to-diffusers-implementation)
- [πŸ—œοΈ Quantization](#%EF%B8%8F-quantization)
- [πŸ’½ Running a non-quantized model directly from disk](#-running-a-non-quantized-model-directly-from-disk)
- [🌐 Third-Party HuggingFace Model Support](#-third-party-huggingface-model-support)
- [🎨 Image-to-Image](#-image-to-image)
- [πŸ”Œ LoRA](#-lora)
- [🎭 In-Context Generation](#-in-context-generation)
  * [πŸ“Έ Kontext](#-kontext)
  * [🎨 In-Context LoRA](#-in-context-lora)
  * [πŸ‘• CatVTON (Virtual Try-On)](#-catvton-virtual-try-on)
  * [✏️ IC-Edit (In-Context Editing)](#%EF%B8%8F-ic-edit-in-context-editing)
- [πŸ› οΈ Flux Tools](#%EF%B8%8F-flux-tools)
  * [πŸ–ŒοΈ Fill](#%EF%B8%8F-fill)
  * [πŸ” Depth](#-depth)
  * [πŸ”„ Redux](#-redux)
- [πŸ•ΉοΈ Controlnet](#%EF%B8%8F-controlnet)
- [πŸ”Ž Upscale](#-upscale)
- [πŸŽ›οΈ Dreambooth fine-tuning](#%EF%B8%8F-dreambooth-fine-tuning)
- [🧠 Concept Attention](#-concept-attention)
- [🚧 Current limitations](#-current-limitations)
- [πŸ’‘Workflow tips](#workflow-tips)
- [πŸ”¬ Cool research](#-cool-research)
- [πŸŒ±β€ Related projects](#-related-projects)
- [πŸ™ Acknowledgements](#-acknowledgements)
- [βš–οΈ License](#%EF%B8%8F-license)

<!-- TOC end -->

---

### Philosophy

MFLUX is a line-by-line port of the FLUX implementation in the [Huggingface Diffusers](https://github.com/huggingface/diffusers) library to [Apple MLX](https://github.com/ml-explore/mlx).
MFLUX is purposefully kept minimal and explicit: network architectures are hardcoded and no config files are used
except for the tokenizers. The aim is to have a tiny codebase with the single purpose of expressing these models
(thereby avoiding too many abstractions). While MFLUX prioritizes readability over generality and performance, [it can still be quite fast](#%EF%B8%8F-image-generation-speed-updated), [and even faster quantized](#%EF%B8%8F-quantization).

All models are implemented from scratch in MLX and only the tokenizers are used via the
[Huggingface Transformers](https://github.com/huggingface/transformers) library. Other than that, there are only minimal dependencies
like [Numpy](https://numpy.org) and [Pillow](https://pypi.org/project/pillow/) for simple image post-processing.

---

### πŸ’Ώ Installation
For users, the easiest way to install MFLUX is with `uv tool`. If you have [installed `uv`](https://github.com/astral-sh/uv?tab=readme-ov-file#installation), simply run:

```sh
uv tool install --upgrade mflux
```

to get the `mflux-generate` and related command line executables. You can skip to the usage guides below.

<details>
<summary>For Python 3.13 dev preview</summary>

The [T5 encoder](https://huggingface.co/docs/transformers/en/model_doc/t5) depends on [sentencepiece](https://pypi.org/project/sentencepiece/), which does not have an installable wheel artifact for Python 3.13 as of Nov 2024. Until Google [publishes a 3.13 wheel](https://pypi.org/project/sentencepiece/), you need to build your own wheel using the [official build instructions](https://github.com/google/sentencepiece/blob/master/python/README.md#build-and-install-sentencepiece), or for convenience use a `.whl` pre-built by contributor @anthonywu. The steps below should work for most developers, though your system may vary.

```sh
uv venv --python 3.13
python -V  # e.g. Python 3.13.0rc2
source .venv/bin/activate

# for your convenience, you can use the contributor wheel
uv pip install https://github.com/anthonywu/sentencepiece/releases/download/0.2.1-py13dev/sentencepiece-0.2.1-cp313-cp313-macosx_11_0_arm64.whl

# enable the pytorch nightly 
uv pip install --pre --extra-index-url https://download.pytorch.org/whl/nightly -e .
```
</details>

<details>
<summary>For the classic way to create a user virtual environment:</summary>

```
mkdir -p mflux && cd mflux && python3 -m venv .venv && source .venv/bin/activate
```

This creates and activates a virtual environment in the `mflux` folder. After that, install MFLUX via pip:

```
pip install -U mflux
```

</details>

<details>
<summary>For contributors (click to expand)</summary>

1. Clone the repo:
```sh
 git clone git@github.com:filipstrand/mflux.git
 ```
2. Install the application

```sh
 make install
 ```
3. To run the test suite
```sh
 make test
 ```
4. Follow format and lint checks prior to submitting Pull Requests. The recommended `make lint` and `make format` targets install and use [`ruff`](https://github.com/astral-sh/ruff). You can set up your editor/IDE to lint/format automatically, or use our provided `make` helpers:
  - `make format` - formats your code
  - `make lint` - shows your lint errors and warnings, but does not auto-fix
  - `make check` - via `pre-commit` hooks, formats your code **and** attempts to auto-fix lint errors
  - consult the official [`ruff` documentation](https://docs.astral.sh/ruff/) for advanced usage

</details>

*If you have trouble installing MFLUX, please see the [installation related issues section](https://github.com/filipstrand/mflux/issues?q=is%3Aissue+install+).* 

<details>
<summary>⌨️ <strong>Shell Completions (Optional)</strong></summary>

MFLUX supports ZSH (default on macOS) shell completions for all CLI commands. This provides tab completion for all [command-line arguments](#-full-list-of-command-line-arguments) and options.

To enable completions:

```sh
# Install completions
mflux-completions

# Reload your shell
exec zsh

# Now try tab completion!
mflux-generate --<TAB>
```

If you encounter issues (ZSH completion setup can vary by system), check your installation:

```sh
mflux-completions --check
```

For more details and troubleshooting, see the [completions documentation](src/mflux/completions/README.md).

</details>

---

### πŸ–ΌοΈ Generating an image

Run the command `mflux-generate` by specifying a prompt, the model, and some optional arguments. For example, here we use a quantized version of the `schnell` model for 2 steps:

```sh
mflux-generate --model schnell --prompt "Luxury food photograph" --steps 2 --seed 2 -q 8
```

This example uses the more powerful `dev` model with 25 time steps:

```sh
mflux-generate --model dev --prompt "Luxury food photograph" --steps 25 --seed 2 -q 8
```

You can also pipe prompts from other commands using stdin:

```sh
echo "A majestic mountain landscape" | mflux-generate --prompt - --model schnell --steps 2
```

This is useful for integrating MFLUX into shell scripts or dynamically generating prompts using LLM inference tools such as [`llm`](https://llm.datasette.io/en/stable/), [`mlx-lm`](https://github.com/ml-explore/mlx-lm), [`ollama`](https://ollama.ai/), etc.
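
For example, assuming you have the [`llm`](https://llm.datasette.io/en/stable/) CLI installed and configured with a model, a sketch of such a pipeline could look like this:

```sh
# Have an LLM write the prompt, then pipe it straight into MFLUX
llm "Write a single detailed image prompt describing a rainy neon-lit street at night" \
  | mflux-generate --prompt - --model schnell --steps 2 -q 8
```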

Alternatively, you can use MFLUX directly in Python:

```python
from mflux.flux.flux import Flux1
from mflux.config.config import Config

# Load the model
flux = Flux1.from_name(
   model_name="schnell",  # "schnell" or "dev"
   quantize=8,            # 4 or 8
)

# Generate an image
image = flux.generate_image(
   seed=2,
   prompt="Luxury food photograph",
   config=Config(
      num_inference_steps=2,  # "schnell" works well with 2-4 steps, "dev" works well with 20-25 steps
      height=1024,
      width=1024,
   )
)

image.save(path="image.png")
```

For more advanced Python usage and additional configuration options, you can explore the entry point files in the source code, such as [`generate.py`](src/mflux/generate.py), [`generate_controlnet.py`](src/mflux/generate_controlnet.py), [`generate_fill.py`](src/mflux/generate_fill.py), and others in the [`src/mflux/`](src/mflux/) directory. These files demonstrate how to use the Python API for various features and provide examples of advanced configurations.

⚠️ *If the specified model is not already downloaded on your machine, the download process will start automatically and fetch the model weights (~34GB each for the Schnell and Dev models). See the [quantization](#%EF%B8%8F-quantization) section for running compressed versions of the model.* ⚠️

*By default, mflux caches files in `~/Library/Caches/mflux/`. The Hugging Face model files themselves are cached separately in the Hugging Face cache directory (e.g., `~/.cache/huggingface/`).*

*To change the mflux cache location, set the `MFLUX_CACHE_DIR` environment variable. To change the Hugging Face cache location, you can modify the `HF_HOME` environment variable. For more details on Hugging Face cache settings, please refer to the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables)*.
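
For example, to point both caches at an external drive for a session (the paths below are placeholders, not defaults):

```sh
# Placeholder paths - adjust to your own setup
export MFLUX_CACHE_DIR="/Volumes/External/mflux-cache"
export HF_HOME="/Volumes/External/huggingface"

mflux-generate --model schnell --prompt "Luxury food photograph" --steps 2 -q 8
```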

πŸ”’ [FLUX.1-dev currently requires granted access to its Huggingface repo. For troubleshooting, see the issue tracker](https://github.com/filipstrand/mflux/issues/14) πŸ”’

#### πŸ“œ Full list of Command-Line Arguments

<details>
<summary>πŸ“œ <strong>Command-Line Arguments Reference</strong></summary>

#### General Command-Line Arguments

<details>
<summary>Click to expand General arguments</summary>

- **`--prompt`** (required, `str`): Text description of the image to generate. Use `-` to read the prompt from stdin (e.g., `echo "A beautiful sunset" | mflux-generate --prompt -`).

- **`--model`** or **`-m`** (required, `str`): Model to use for generation. Can be one of the official models (`"schnell"` or `"dev"`) or a HuggingFace repository ID for a compatible third-party model (e.g., `"Freepik/flux.1-lite-8B-alpha"`).

- **`--base-model`** (optional, `str`, default: `None`): Specifies which base architecture a third-party model is derived from (`"schnell"` or `"dev"`). Required when using third-party models from HuggingFace.

- **`--output`** (optional, `str`, default: `"image.png"`): Output image filename. If `--seed` or `--auto-seeds` establishes multiple seed values, the output filename will automatically be modified to include the seed value (e.g., `image_seed_42.png`).

- **`--seed`** (optional, repeatable `int` args, default: `None`): One or more seeds for random number generation, e.g. `--seed 42` or `--seed 123 456 789`. When multiple seeds are provided, MFLUX will generate one image per seed, using the same prompt and settings. Default is a single time-based value.

- **`--auto-seeds`** (optional, `int`, default: `None`): Automatically generate N random seeds for a series of image generations. For example, `--auto-seeds 5` will generate 5 different images with 5 different random seeds. This is superseded by explicit `--seed` arguments and `seed` values in `--config-from-metadata` files.

- **`--height`** (optional, `int`, default: `1024`): Height of the output image in pixels.

- **`--width`** (optional, `int`, default: `1024`): Width of the output image in pixels.

- **`--steps`** (optional, `int`, default: `4`): Number of inference steps.

- **`--guidance`** (optional, `float`, default: `3.5`): Guidance scale (only used for `"dev"` model).

- **`--path`** (optional, `str`, default: `None`): Path to a local model on disk.

- **`--quantize`** or **`-q`** (optional, `int`, default: `None`): [Quantization](#%EF%B8%8F-quantization) (choose between `3`, `4`, `6`, or `8` bits).

- **`--lora-paths`** (optional, `[str]`, default: `None`): The paths to the [LoRA](#-LoRA) weights.

- **`--lora-scales`** (optional, `[float]`, default: `None`): The scale for each respective [LoRA](#-LoRA) (will default to `1.0` if not specified and only one LoRA weight is loaded.)

- **`--metadata`** (optional): Exports a `.json` file containing the metadata for the image with the same name. (Even without this flag, the image metadata is saved and can be viewed using `exiftool image.png`)

- **`--image-path`** (optional, `str`, default: `None`): Local path to the initial image for image-to-image generation.

- **`--image-strength`** (optional, `float`, default: `0.4`): Controls how strongly the initial image influences the output image. A value of `0.0` means no influence. (Default is `0.4`)

- **`--config-from-metadata`** or **`-C`** (optional, `str`): [EXPERIMENTAL] Path to a prior file saved via `--metadata`, or a compatible handcrafted config file adhering to the expected args schema.

- **`--low-ram`** (optional): Reduces GPU memory usage by limiting the MLX cache size and releasing text encoders and transformer components after use (single image generation only). While this may slightly decrease performance, it helps prevent system memory swapping to disk, allowing image generation on systems with limited RAM.

- **`--battery-percentage-stop-limit`** or **`-B`** (optional, `int`, default: `5`): On Mac laptops powered by battery, automatically stops image generation when battery percentage reaches this threshold. Prevents your Mac from shutting down and becoming unresponsive during long generation sessions.

- **`--lora-name`** (optional, `str`, default: `None`): The name of the LoRA to download from Hugging Face.

- **`--lora-repo-id`** (optional, `str`, default: `"ali-vilab/In-Context-LoRA"`): The Hugging Face repository ID for LoRAs.

- **`--stepwise-image-output-dir`** (optional, `str`, default: `None`): [EXPERIMENTAL] Output directory to write step-wise images and their final composite image to. This feature may change in future versions. When specified, MFLUX will save an image for each denoising step, allowing you to visualize the generation process from noise to final image.

- **`--vae-tiling`** (optional, flag): Enable VAE tiling to reduce memory usage during the decoding phase. This splits the image into smaller chunks for processing, which can prevent out-of-memory errors when generating high-resolution images. Note that this optimization may occasionally produce a subtle seam in the middle of the image, but it's often worth the tradeoff for being able to generate images that would otherwise cause your system to run out of memory.

- **`--vae-tiling-split`** (optional, `str`, default: `"horizontal"`): When VAE tiling is enabled, this parameter controls the direction to split the latents. Options are `"horizontal"` (splits into top/bottom) or `"vertical"` (splits into left/right). Use this option to control where potential seams might appear in the final image.

</details>

#### Kontext Command-Line Arguments

<details>
<summary>Click to expand Kontext arguments</summary>

The `mflux-generate-kontext` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--image-path`** (required, `str`): Path to the reference image that will be used as the base for editing. This image provides the visual context for the text-guided modifications.

- **`--guidance`** (optional, `float`, default: `2.5`): Guidance scale for the generation. Kontext typically works well with values between 2.0-4.0, with lower values allowing more creative freedom and higher values adhering more closely to the prompt.

**Note**: The Kontext tool automatically uses the appropriate model (`dev-kontext`), so you don't need to specify `--model`.

See the [Kontext](#-kontext) section for more details on this feature and comprehensive prompting strategies.

</details>

#### In-Context LoRA Command-Line Arguments

<details>
<summary>Click to expand In-Context LoRA arguments</summary>

The `mflux-generate-in-context` command supports most of the same arguments as `mflux-generate`, with these additional parameters:

- **`--image-path`** (required, `str`): Path to the reference image that will guide the style of the generated image.

- **`--lora-style`** (optional, `str`, default: `None`): The style to use for In-Context LoRA generation. Choose from: `couple`, `storyboard`, `font`, `home`, `illustration`, `portrait`, `ppt`, `sandstorm`, `sparklers`, or `identity`.

See the [In-Context Generation](#-in-context-generation) section for more details on how to use this feature effectively.

</details>

#### CatVTON Command-Line Arguments

<details>
<summary>Click to expand CatVTON arguments</summary>

The `mflux-generate-in-context-catvton` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--person-image`** (required, `str`): Path to the person image who will "wear" the garment.

- **`--person-mask`** (required, `str`): Path to a binary mask image indicating where the garment should be applied on the person.

- **`--garment-image`** (required, `str`): Path to the garment image to be virtually worn.

- **`--save-full-image`** (optional, flag): Additionally save the full side-by-side image containing both the garment reference and the result.

See the [CatVTON (Virtual Try-On)](#-catvton-virtual-try-on) section for more details on this feature.

</details>

#### IC-Edit Command-Line Arguments

<details>
<summary>Click to expand IC-Edit arguments</summary>

The `mflux-generate-in-context-edit` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--reference-image`** (required, `str`): Path to the reference image to be edited.

- **`--instruction`** (optional, `str`): Simple text instruction describing the desired edit (e.g., "make the hair black"). This will be automatically formatted as a diptych template. Either `--instruction` or `--prompt` is required.

- **`--save-full-image`** (optional, flag): Additionally save the full side-by-side image showing the original and edited versions.

**Note**: The IC-Edit tool automatically downloads and applies the required LoRA, and optimizes the resolution to 512px width for best results.

See the [IC-Edit (In-Context Editing)](#-ic-edit-in-context-editing) section for more details on this feature.

</details>

#### Redux Tool Command-Line Arguments

<details>
<summary>Click to expand Redux arguments</summary>

The `mflux-generate-redux` command uses most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--redux-image-paths`** (required, `[str]`): Paths to one or more reference images to influence the generation.

- **`--redux-image-strengths`** (optional, `[float]`, default: `[1.0, 1.0, ...]`): Strength values (between 0.0 and 1.0) for each reference image. Higher values give more influence to that reference image. If not provided, all images use a strength of 1.0.

See the [Redux](#-redux) section for more details on this feature.

</details>

#### Concept Attention Command-Line Arguments

<details>
<summary>Click to expand Concept Attention arguments</summary>

The `mflux-concept` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--concept`** (required, `str`): The concept prompt to use for attention visualization. This defines what specific aspect or element you want to analyze within your generated image.

- **`--heatmap-layer-indices`** (optional, `[int]`, default: `[15, 16, 17, 18]`): Transformer layer indices to use for heatmap generation. These layers capture different levels of abstraction in the attention mechanism.

- **`--heatmap-timesteps`** (optional, `[int]`, default: `None`): Timesteps to use for heatmap generation. If not specified, all timesteps are used. Lower timestep values focus on early stages of generation.

The `mflux-concept-from-image` command uses most of the same arguments as `mflux-concept`, with this additional parameter:

- **`--input-image-path`** (required, `str`): Path to a reference image for concept attention analysis. The model will analyze how the concept appears in this reference image and apply similar attention patterns to the generated image.

See the [Concept Attention](#-concept-attention) section for more details on this feature.

</details>

#### Fill Tool Command-Line Arguments

<details>
<summary>Click to expand Fill Tool arguments</summary>

The `mflux-generate-fill` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--image-path`** (required, `str`): Path to the original image that will be modified.

- **`--masked-image-path`** (required, `str`): Path to a binary mask image where white areas (255) will be regenerated and black areas (0) will be preserved.

- **`--guidance`** (optional, `float`, default: `30.0`): The Fill tool works best with higher guidance values compared to regular image generation.

See the [Fill](#-fill) section for more details on inpainting and outpainting.

</details>

#### Depth Tool Command-Line Arguments

<details>
<summary>Click to expand Depth Tool arguments</summary>

The `mflux-generate-depth` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--image-path`** (optional, `str`, default: `None`): Path to the reference image from which to extract a depth map. Either this or `--depth-image-path` must be provided.

- **`--depth-image-path`** (optional, `str`, default: `None`): Path to a pre-generated depth map image. Either this or `--image-path` must be provided.

The `mflux-save-depth` command for extracting depth maps without generating images has these arguments:

- **`--image-path`** (required, `str`): Path to the image from which to extract a depth map.

- **`--output`** (optional, `str`, default: Uses the input filename with "_depth" suffix): Path where the generated depth map will be saved.

- **`--quantize`** or **`-q`** (optional, `int`, default: `None`): Quantization for the Depth Pro model.

See the [Depth](#-depth) section for more details on this feature.

</details>

#### ControlNet Command-Line Arguments

<details>
<summary>Click to expand ControlNet arguments</summary>

The `mflux-generate-controlnet` command supports most of the same arguments as `mflux-generate`, with these additional parameters:

- **`--controlnet-image-path`** (required, `str`): Path to the local image used by ControlNet to guide output generation.

- **`--controlnet-strength`** (optional, `float`, default: `0.4`): Degree of influence the control image has on the output. Ranges from `0.0` (no influence) to `1.0` (full influence).

- **`--controlnet-save-canny`** (optional, bool, default: `False`): If set, saves the Canny edge detection reference image used by ControlNet.

See the [Controlnet](#%EF%B8%8F-controlnet) section for more details on how to use this feature effectively.

</details>

#### Upscale Command-Line Arguments

<details>
<summary>Click to expand Upscale arguments</summary>

The `mflux-upscale` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--height`** (optional, `int` or scale factor, default: `auto`): Image height. Can be specified as pixels (e.g., `1024`), scale factors (e.g., `2x`, `1.5x`), or `auto` to use the source image height.

- **`--width`** (optional, `int` or scale factor, default: `auto`): Image width. Can be specified as pixels (e.g., `1024`), scale factors (e.g., `2x`, `1.5x`), or `auto` to use the source image width.

- **`--controlnet-image-path`** (required, `str`): Path to the source image to upscale.

- **`--controlnet-strength`** (optional, `float`, default: `0.4`): Degree of influence the control image has on the output. Ranges from `0.0` (no influence) to `1.0` (full influence).

**Scale Factor Examples:**
```bash
# Scale by 2x in both dimensions
mflux-upscale --height 2x --width 2x --controlnet-image-path source.png

# Scale height by 1.5x, set width to specific pixels
mflux-upscale --height 1.5x --width 1920 --controlnet-image-path source.png

# Use auto (keeps original dimensions)
mflux-upscale --height auto --width auto --controlnet-image-path source.png

# Mix scale factors and absolute values
mflux-upscale --height 2x --width 1024 --controlnet-image-path source.png
```

See the [Upscale](#-upscale) section for more details on how to use this feature effectively.

</details>

#### Training Arguments

<details>
<summary>Click to expand Training arguments</summary>

- **`--train-config`** (optional, `str`): Local path of the training configuration file. This file defines all aspects of the training process including model parameters, optimizer settings, and training data. See the [Training configuration](#training-configuration) section for details on the structure of this file.

- **`--train-checkpoint`** (optional, `str`): Local path of the checkpoint file which specifies how to continue the training process. Used when resuming an interrupted training run.

</details>

</details>

<details>
<summary>πŸ“ <strong>Dynamic Prompts with `--prompt-file`</strong></summary>

MFLUX supports dynamic prompt updates through the `--prompt-file` option. Instead of providing a fixed prompt with `--prompt`, you can specify a plain text file containing your prompt. The file is re-read before each generation, allowing you to modify prompts between iterations without restarting.

##### Key Benefits

- **Real-time Prompt Editing**: Edit prompts between generations while the program continues running
- **Iterative Refinement**: Fine-tune prompts based on previous results in a batch
- **Editor Integration**: Work with complex prompts in your preferred text editor
- **Consistent Parameters**: Experiment with prompt variations while keeping all other settings constant

##### Example Usage

```bash
# Generate 10 images using a prompt from a file
mflux-generate --prompt-file my_prompt.txt --auto-seeds 10
```

##### Workflow Example

1. Create a prompt file:
   ```
   echo "a surreal landscape with floating islands" > my_prompt.txt
   ```

2. Start a batch generation:
   ```
   mflux-generate --prompt-file my_prompt.txt --auto-seeds 10
   ```

3. After reviewing initial results, edit the prompt file while generation continues:
   ```
   echo "a surreal landscape with floating islands and waterfalls" > my_prompt.txt
   ```

4. Subsequent generations will automatically use your updated prompt

5. Continue refining as needed between generations

##### Notes

- `--prompt` and `--prompt-file` are mutually exclusive options
- Empty prompt files or non-existent files will raise appropriate errors
- Each generated image's metadata will contain the actual prompt used for that specific generation

</details>

<details>
<summary>βš™οΈ <strong>Parameters supported by config files</strong></summary>

#### How configs are used

- all config properties are optional and applied to the image generation if applicable
- invalid or incompatible properties will be ignored

#### Config schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "seed": {
      "type": ["integer", "null"]
    },
    "steps": {
      "type": ["integer", "null"]
    },
    "guidance": {
      "type": ["number", "null"]
    },
    "quantize": {
      "type": ["null", "string"]
    },
    "lora_paths": {
      "type": ["array", "null"],
      "items": {
        "type": "string"
      }
    },
    "lora_scales": {
      "type": ["array", "null"],
      "items": {
        "type": "number"
      }
    },
    "prompt": {
      "type": ["string", "null"]
    }
  }
}
```

#### Example

```json
{
  "model": "dev",
  "seed": 42,
  "steps": 8,
  "guidance": 3.0,
  "quantize": 4,
  "lora_paths": [
    "/some/path1/to/subject.safetensors",
    "/some/path2/to/style.safetensors"
  ],
  "lora_scales": [
    0.8,
    0.4
  ],
  "prompt": "award winning modern art, MOMA"
}
```
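
As a minimal usage sketch (the filename is just an example; `--prompt` is passed explicitly since it is a required argument):

```sh
mflux-generate \
    --config-from-metadata my_config.json \
    --prompt "award winning modern art, MOMA"
```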
</details>

---

### ⏱️ Image generation speed (updated)

These numbers are based on the *non-quantized* `schnell` model, with the configuration provided in the code snippet below.
To time your machine, run the following:
```sh
time mflux-generate \
--prompt "Luxury food photograph" \
--model schnell \
--steps 2 \
--seed 2 \
--height 1024 \
--width 1024
```

To find out the spec of your machine (including the number of CPU cores, GPU cores, and memory), run the following command:
```sh
system_profiler SPHardwareDataType SPDisplaysDataType
```

| Device      | M-series            | User                                                                                                                               | Reported Time | Notes                           |
|-------------|---------------------|------------------------------------------------------------------------------------------------------------------------------------|---------------|---------------------------------|
| Mac Studio  | 2023 M2 Ultra       | [@awni](https://x.com/awnihannun/status/1823515121827897385)                                                                       | <15s          |                                 |
| Macbook Pro | 2024 M4 Max (128GB) | [@ivanfioravanti](https://gist.github.com/awni/a67d16d50f0f492d94a10418e0592bde?permalink_comment_id=5153531#gistcomment-5153531)  | ~19s          |                                 |
| Macbook Pro | 2023 M3 Max         | [@karpathy](https://gist.github.com/awni/a67d16d50f0f492d94a10418e0592bde?permalink_comment_id=5153531#gistcomment-5153531)        | ~20s          |                                 |
| -           | 2023 M2 Max (96GB)  | [@explorigin](https://github.com/filipstrand/mflux/issues/6)                                                                       | ~25s          |                                 |
| Mac Mini    | 2024 M4 Pro (64GB)  | [@Stoobs](https://github.com/filipstrand/mflux/issues/92)                                                                          | ~34s          |                                 |
| Mac Mini    | 2023 M2 Pro (32GB)  | [@leekichko](https://github.com/filipstrand/mflux/issues/85)                                                                       | ~54s          |                                 |
| -           | 2022 M1 MAX (64GB)  | [@BosseParra](https://x.com/BosseParra/status/1826191780812877968)                                                                 | ~55s          |                                 |
| Macbook Pro | 2023 M2 Max (32GB)  | @filipstrand                                                                                                                       | ~70s          |                                 |
| -           | 2023 M3 Pro (36GB)  | [@kush-gupt](https://github.com/filipstrand/mflux/issues/11)                                                                       | ~80s          |                                 |
| Mac Mini    | 2024 M4 (16GB)      | [@wnma3mz](https://github.com/filipstrand/mflux/issues/105)                                                                        | ~97s          | 512 x 512, 8-bit quantization   |
| Macbook Pro | 2021 M1 Pro (32GB)  | @filipstrand                                                                                                                       | ~160s         |                                 |
| -           | 2021 M1 Pro (16GB)  | [@qw-in](https://github.com/filipstrand/mflux/issues/7)                                                                            | ~175s         | Might freeze your mac           |
| Macbook Air | 2020 M1 (8GB)       | [@mbvillaverde](https://github.com/filipstrand/mflux/issues/13)                                                                    | ~335s         | With resolution 512 x 512       |


*Note that these numbers include starting the application from scratch, which means doing model I/O, setting/quantizing weights, etc.
If we assume that the model is already loaded, you can inspect the image metadata using `exiftool image.png` and see the total duration of the denoising loop (excluding text embedding).*

*These benchmarks are not very scientific and are only intended to give ballpark numbers. They were performed at different times with different MFLUX and MLX versions, etc. Additional hardware information, such as the number of GPU cores or the exact Mac device, is not always known.*

---

### ↔️ Equivalent to Diffusers implementation

There is only a single source of randomness when generating an image: the initial latent array.
In this implementation, this initial latent is fully deterministically controlled by the input `seed` parameter.
However, if we were to import a fixed instance of this latent array saved from the Diffusers implementation, MFLUX would produce an identical image to the Diffusers implementation (assuming a fixed prompt and the default parameter settings in the Diffusers setup).


The images below illustrate this equivalence.
In all cases the Schnell model was run for 2 time steps.
The Diffusers implementation ran in CPU mode.
The precision for MFLUX can be set in the [Config](src/mflux/config/config.py) class.
There is typically a noticeable but very small difference in the final image when switching between 16-bit and 32-bit precision.

---
```
Luxury food photograph
```
![image](src/mflux/assets/comparison1.jpg)

---
```
detailed cinematic dof render of an old dusty detailed CRT monitor on a wooden desk in a dim room with items around, messy dirty room. On the screen are the letters "FLUX" glowing softly. High detail hard surface render
```
![image](src/mflux/assets/comparison2.jpg)

---

```
photorealistic, lotr, A tiny red dragon curled up asleep inside a nest, (Soft Focus) , (f_stop 2.8) , (focal_length 50mm) macro lens f/2. 8, medieval wizard table, (pastel) colors, (cozy) morning light filtering through a nearby window, (whimsical) steam shapes, captured with a (Canon EOS R5) , highlighting (serene) comfort, medieval, dnd, rpg, 3d, 16K, 8K
```
![image](src/mflux/assets/comparison3.jpg)

---


```
A weathered fisherman in his early 60s stands on the deck of his boat, gazing out at a stormy sea. He has a thick, salt-and-pepper beard, deep-set blue eyes, and skin tanned and creased from years of sun exposure. He's wearing a yellow raincoat and hat, with water droplets clinging to the fabric. Behind him, dark clouds loom ominously, and waves crash against the side of the boat. The overall atmosphere is one of tension and respect for the power of nature.
```
![image](src/mflux/assets/comparison4.jpg)

---

```
Luxury food photograph of an italian Linguine pasta alle vongole dish with lots of clams. It has perfect lighting and a cozy background with big bokeh and shallow depth of field. The mood is a sunset balcony in tuscany.  The photo is taken from the side of the plate. The pasta is shiny with sprinkled parmesan cheese and basil leaves on top. The scene is complemented by a warm, inviting light that highlights the textures and colors of the ingredients, giving it an appetizing and elegant look.
```
![image](src/mflux/assets/comparison5.jpg)

---

### πŸ—œοΈ Quantization

MFLUX supports running FLUX in 3, 4, 6, or 8-bit quantized mode. Running a quantized version can greatly speed up the
generation process and reduce the memory consumption by several gigabytes. [Quantized models also take up less disk space](#-size-comparisons-for-quantized-models).

```sh
mflux-generate \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --quantize 8 \
    --height 1920 \
    --width 1024 \
    --prompt "Tranquil pond in a bamboo forest at dawn, the sun is barely starting to peak over the horizon, panda practices Tai Chi near the edge of the pond, atmospheric perspective through the mist of morning dew, sunbeams, its movements are graceful and fluid β€” creating a sense of harmony and balance, the pond's calm waters reflecting the scene, inviting a sense of meditation and connection with nature, style of Howard Terpning and Jessica Rossier"
```
![image](src/mflux/assets/comparison6.jpg)

*In this example, weights are quantized at **runtime** - this is convenient if you don't want to [save a quantized copy of the weights to disk](#-saving-a-quantized-version-to-disk), but still want to benefit from the potential speedup and RAM reduction quantization might bring.*


By setting the `--quantize` or `-q` flag to `4` or `8`, or by removing it entirely, we get all 3 images above. As can be seen, there is very little difference between the images (especially between the 8-bit and the non-quantized results).
Image generation times in this example are based on a 2021 M1 Pro (32GB) machine. Even though the images are almost identical, there is a ~2x speedup from
running the 8-bit quantized version on this particular machine. Unlike the non-quantized version, the 8-bit version drastically reduces swap memory usage and keeps GPU utilization close to 100% during the whole generation. Results here can vary across different machines.

For systems with limited RAM, you can also use the `--low-ram` option which reduces GPU memory usage by constraining the MLX cache size and releasing text encoders and transformer components after use. This option is particularly helpful for preventing system memory swapping to disk on machines with less available RAM.
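
For example, a memory-constrained run combining 8-bit runtime quantization with `--low-ram` might look like this (a sketch, not a tuned recipe):

```sh
mflux-generate \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --quantize 8 \
    --low-ram \
    --prompt "Luxury food photograph"
```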

#### πŸ“Š Size comparisons for quantized models

The model sizes for both `schnell` and `dev` at various quantization levels are as follows:

| 3 bit  | 4 bit  | 6 bit   | 8 bit   | Original (16 bit) |
|--------|--------|---------|---------|-------------------|
| 7.52GB | 9.61GB | 13.81GB | 18.01GB | 33.73GB           |


#### πŸ’Ύ Saving a quantized version to disk

To save a local copy of the quantized weights, run the `mflux-save` command like so:

```sh
mflux-save \
    --path "/Users/filipstrand/Desktop/schnell_8bit" \
    --model schnell \
    --quantize 8
```

*Note that when saving a quantized version, you will need the original huggingface weights.*

It is also possible to specify [LoRA](#-lora) adapters when saving the model, e.g.:

```sh
mflux-save \
    --path "/Users/filipstrand/Desktop/schnell_8bit" \
    --model schnell \
    --quantize 8 \
    --lora-paths "/path/to/lora.safetensors" \
    --lora-scales 0.7
```

When generating images with a model like this, no LoRA adapter needs to be specified, since
it is already baked into the saved quantized weights.

#### πŸ’½ Loading and running a quantized version from disk

To generate a new image from the quantized model, simply provide a `--path` to where it was saved:

```sh
mflux-generate \
    --path "/Users/filipstrand/Desktop/schnell_8bit" \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --height 1920 \
    --width 1024 \
    --prompt "Tranquil pond in a bamboo forest at dawn, the sun is barely starting to peak over the horizon, panda practices Tai Chi near the edge of the pond, atmospheric perspective through the mist of morning dew, sunbeams, its movements are graceful and fluid β€” creating a sense of harmony and balance, the pond's calm waters reflecting the scene, inviting a sense of meditation and connection with nature, style of Howard Terpning and Jessica Rossier"
```

*Note: When loading a quantized model from disk, there is no need to pass in `-q` flag, since we can infer this from the weight metadata.*

*Also note: Once we have a local model (quantized [or not](#-running-a-non-quantized-model-directly-from-disk)) specified via the `--path` argument, the huggingface cache models are not required to launch the model.
In other words, you can reclaim 34GB of disk space (per model) by deleting the full 16-bit model from the [Huggingface cache](#%EF%B8%8F-generating-an-image) if you choose.*

⚠️ *Quantized models saved with mflux < v.0.6.0 will not work with v.0.6.0 and later due to an updated implementation. The solution is to [save a new quantized local copy](https://github.com/filipstrand/mflux/issues/149).* ⚠️

*If you don't want to download the full models and quantize them yourself, the 4-bit weights are available here for a direct download:*
- For mflux < v.0.6.0:
  - [madroid/flux.1-schnell-mflux-4bit](https://huggingface.co/madroid/flux.1-schnell-mflux-4bit)
  - [madroid/flux.1-dev-mflux-4bit](https://huggingface.co/madroid/flux.1-dev-mflux-4bit)
- For mflux >= v.0.6.0:
  - [dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit](https://huggingface.co/dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit)
  - [dhairyashil/FLUX.1-dev-mflux-4bit](https://huggingface.co/dhairyashil/FLUX.1-dev-mflux-4bit)

<details>
<summary>Using the community model support, the quantized weights can also be downloaded automatically</summary>

```sh
mflux-generate \
    --model "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit" \
    --base-model schnell \
    --steps 2 \
    --seed 2 \
    --prompt "Luxury food photograph"
```

</details>

---

### πŸ’½ Running a non-quantized model directly from disk

MFLUX also supports running a non-quantized model directly from a custom location.
In the example below, the model is placed in `/Users/filipstrand/Desktop/schnell`:

```sh
mflux-generate \
    --path "/Users/filipstrand/Desktop/schnell" \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --prompt "Luxury food photograph"
```

Note that the `--model` flag must be set when loading a model from disk.

Also note that, unlike the typical `alias` way of initializing the model (which internally handles downloading the required resources),
when loading a model directly from disk we require the downloaded weights to be organized as follows:

<details>
<summary>πŸ“ <strong>Required directory structure</strong></summary>

```
.
β”œβ”€β”€ text_encoder
β”‚   └── model.safetensors
β”œβ”€β”€ text_encoder_2
β”‚   β”œβ”€β”€ model-00001-of-00002.safetensors
β”‚   └── model-00002-of-00002.safetensors
β”œβ”€β”€ tokenizer
β”‚   β”œβ”€β”€ merges.txt
β”‚   β”œβ”€β”€ special_tokens_map.json
β”‚   β”œβ”€β”€ tokenizer_config.json
β”‚   └── vocab.json
β”œβ”€β”€ tokenizer_2
β”‚   β”œβ”€β”€ special_tokens_map.json
β”‚   β”œβ”€β”€ spiece.model
β”‚   β”œβ”€β”€ tokenizer.json
β”‚   └── tokenizer_config.json
β”œβ”€β”€ transformer
β”‚   β”œβ”€β”€ diffusion_pytorch_model-00001-of-00003.safetensors
β”‚   β”œβ”€β”€ diffusion_pytorch_model-00002-of-00003.safetensors
β”‚   └── diffusion_pytorch_model-00003-of-00003.safetensors
└── vae
    └── diffusion_pytorch_model.safetensors
```

</details>

This mirrors how the resources are placed in the [HuggingFace Repo](https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main) for FLUX.1.
*Huggingface weights, unlike quantized ones exported directly from this project, have to be
processed a bit differently, which is why we require this structure above.*
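
One way to obtain this layout is to download a snapshot of the repository with the `huggingface-cli` tool (this assumes you have `huggingface_hub` installed and, for the `dev` model, access granted to its repo):

```sh
# Download the full FLUX.1-schnell repository into a local folder
huggingface-cli download black-forest-labs/FLUX.1-schnell \
    --local-dir /Users/filipstrand/Desktop/schnell
```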

---

### 🌐 Third-Party HuggingFace Model Support

MFLUX now supports compatible third-party models from HuggingFace that follow the FLUX architecture. This opens up the ecosystem to community-created models that may offer different capabilities, sizes, or specializations.

To use a third-party model, specify the HuggingFace repository ID with the `--model` parameter and indicate which base architecture (dev or schnell) it's derived from using the `--base-model` parameter:

```sh
mflux-generate \
    --model Freepik/flux.1-lite-8B \
    --base-model schnell \
    --steps 4 \
    --seed 42 \
    --prompt "A beautiful landscape with mountains and a lake"
```

Some examples of compatible third-party models include:
- [Freepik/flux.1-lite-8B-alpha](https://huggingface.co/Freepik/flux.1-lite-8B-alpha) - A lighter version of FLUX
- [shuttleai/shuttle-3-diffusion](https://huggingface.co/shuttleai/shuttle-3-diffusion) - Shuttle's implementation based on FLUX

The model will be automatically downloaded from HuggingFace the first time you use it, similar to the official FLUX models.

*Note: Third-party models may have different performance characteristics, capabilities, or limitations compared to the official FLUX models. Always refer to the model's documentation on HuggingFace for specific usage instructions.*

---

### 🎨 Image-to-Image

One way to condition the image generation is to start from an existing image and let MFLUX produce new variations.
Use the `--image-path` flag to specify the reference image, and `--image-strength` to control how much the reference
image should guide the generation. For example, given the reference image below, the following command produced the first
image using the [Sketching](https://civitai.com/models/803456/sketching?modelVersionId=898364) LoRA:

```sh
mflux-generate \
--prompt "sketching of an Eiffel architecture, masterpiece, best quality. The site is lit by lighting professionals, creating a subtle illumination effect. Ink on paper with very fine touches with colored markers, (shadings:1.1), loose lines, Schematic, Conceptual, Abstract, Gestural. Quick sketches to explore ideas and concepts." \
--image-path "reference.png" \
--image-strength 0.3 \
--lora-paths Architectural_Sketching.safetensors \
--lora-scales 1.0 \
--model dev \
--steps 20 \
--seed 43 \
--guidance 4.0 \
--quantize 8 \
--height 1024 \
--width 1024
```

Like with [Controlnet](#-controlnet), this technique combines well with [LoRA](#-lora) adapters:

![image](src/mflux/assets/img2img.jpg)

In the examples above, the following LoRAs are used: [Sketching](https://civitai.com/models/803456/sketching?modelVersionId=898364), [Animation Shot](https://civitai.com/models/883914/animation-shot-flux-xl-ponyrealism), and [flux-film-camera](https://civitai.com/models/874708?modelVersionId=979175).

---

### πŸ”Œ LoRA

MFLUX supports loading trained [LoRA](https://huggingface.co/docs/diffusers/en/training/lora) adapters (to train your own, see the [Dreambooth fine-tuning](#%EF%B8%8F-dreambooth-fine-tuning) section).

The following example uses the [The_Hound](https://huggingface.co/TheLastBen/The_Hound) LoRA from [@TheLastBen](https://github.com/TheLastBen):

```sh
mflux-generate --prompt "sandor clegane" --model dev --steps 20 --seed 43 -q 8 --lora-paths "sandor_clegane_single_layer.safetensors"
```

![image](src/mflux/assets/lora1.jpg)
---

The following example uses the [Flux_1_Dev_LoRA_Paper-Cutout-Style](https://huggingface.co/Norod78/Flux_1_Dev_LoRA_Paper-Cutout-Style) LoRA from [@Norod78](https://huggingface.co/Norod78):

```sh
mflux-generate --prompt "pikachu, Paper Cutout Style" --model schnell --steps 4 --seed 43 -q 8 --lora-paths "Flux_1_Dev_LoRA_Paper-Cutout-Style.safetensors"
```
![image](src/mflux/assets/lora2.jpg)

*Note that LoRA trained weights are typically trained with a **trigger word or phrase**. For example, in the latter case, the sentence should include the phrase **"Paper Cutout Style"**.*

*Also note that the same LoRA weights can work well with both the `schnell` and `dev` models. Refer to the original LoRA repository to see which model it was trained for.*

#### Multi-LoRA

Multiple LoRAs can be passed in to combine the effects of the individual adapters. The following example combines both of the above LoRAs:

```sh
mflux-generate \
   --prompt "sandor clegane in a forest, Paper Cutout Style" \
   --model dev \
   --steps 20 \
   --seed 43 \
   --lora-paths sandor_clegane_single_layer.safetensors Flux_1_Dev_LoRA_Paper-Cutout-Style.safetensors \
   --lora-scales 1.0 1.0 \
   -q 8
```
![image](src/mflux/assets/lora3.jpg)

Just to see the difference, this image displays the four cases: having both adapters fully active, partially active, and no LoRA at all.
The example above also shows the usage of the `--lora-scales` flag.

#### LoRA Library Path

MFLUX supports a convenient LoRA library feature that allows you to reference LoRA files by their basename instead of full paths. This is particularly useful when you have a collection of LoRA files organized in one or more directories.

To use this feature, set the `LORA_LIBRARY_PATH` environment variable to point to your LoRA directories. You can specify multiple directories separated by colons (`:`):

```sh
export LORA_LIBRARY_PATH="/path/to/loras:/another/path/to/more/loras"
```

Once set, MFLUX will automatically discover all `.safetensors` files in these directories (including subdirectories) and allow you to reference them by their basename:

```sh
# Instead of using full paths:
mflux-generate \
    --prompt "a portrait" \
    --lora-paths "/path/to/loras/style1.safetensors" "/another/path/to/more/loras/style2.safetensors"

# You can simply use basenames:
mflux-generate \
    --prompt "a portrait" \
    --lora-paths "style1" "style2"
```

<details>
<summary>Notes on organizing your LoRA files</summary>

- The basename is the filename without the `.safetensors` extension
- If multiple files have the same basename, the first directory in `LORA_LIBRARY_PATH` takes precedence
  - to work around this, rename or symlink your `.safetensors` files to another name to avoid conflicts (see the example below)
- Full paths still work as before, making this feature fully backwards compatible
- The library paths are scanned recursively, so LoRAs in subdirectories are also discovered. However, we do not recommend setting the library paths to a directory with a large number of files, as it can slow down the scanning process on every run.
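
For example, a hypothetical basename conflict between two `style.safetensors` files could be resolved by symlinking one of them to a unique name (assuming the library scan follows symlinks):

```sh
# Hypothetical paths: give one of the conflicting files a unique basename
ln -s /path/to/loras/style.safetensors /path/to/loras/style_v2.safetensors

# Reference it by the new basename
mflux-generate --prompt "a portrait" --lora-paths "style_v2"
```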
</details>

#### Supported LoRA formats (updated)

Since different fine-tuning services can use different implementations of FLUX, the corresponding
LoRA weights trained on these services can be different from one another. The aim of MFLUX is to support the most common ones.
The following table shows the currently supported formats:

| Supported | Name      | Example                                                                                                  | Notes                               |
|-----------|-----------|----------------------------------------------------------------------------------------------------------|-------------------------------------|
| βœ…        | BFL       | [civitai - Impressionism](https://civitai.com/models/545264/impressionism-sdxl-pony-flux)                | Many things on civitai seem to work |
| βœ…        | Diffusers | [Flux_1_Dev_LoRA_Paper-Cutout-Style](https://huggingface.co/Norod78/Flux_1_Dev_LoRA_Paper-Cutout-Style/) |                                     |
| ❌        | XLabs-AI  | [flux-RealismLora](https://huggingface.co/XLabs-AI/flux-RealismLora/tree/main)                           |                                     |

To report additional formats, examples, or any other suggestions related to LoRA format support, please see [issue #47](https://github.com/filipstrand/mflux/issues/47).

---

### 🎭 In-Context Generation

In-Context Generation is a powerful collection of techniques that allows you to generate images based on reference images and context, without requiring model fine-tuning. MFLUX supports multiple in-context approaches, each designed for specific use cases ranging from style transfer to virtual try-on and image editing.

#### πŸ“Έ Kontext

FLUX.1 Kontext, released in June 2025, is Black Forest Labs' [official model](https://bfl.ai/models/flux-kontext) that goes beyond text-to-image generation. Unlike previous flow models that only allow text-based generation, Kontext understands and creates from existing images, enabling flexible image editing through simple text instructions without complex workflows or finetuning.
Out of the various in-context generation techniques, Kontext is the most advanced and versatile, allowing for a wide range of image editing and generation tasks. It is recommended to try Kontext first before exploring other in-context methods.

![kontext example](src/mflux/assets/kontext_example.jpg)

*Original images credit: [Jon Tyson (sign)](https://unsplash.com/photos/the-sign-urges-viewers-to-get-more-rest-a4Cm6EnYtls), [Daniel Seßler (town)](https://unsplash.com/photos/colorful-buildings-line-a-coastal-towns-harbor-haVcuj4buqE), [Liga Alksne (house)](https://unsplash.com/photos/a-building-with-a-mountain-in-the-background-jZV4jaycoxE) on Unsplash*

Kontext's core capabilities include **character consistency** (preserving unique elements across scenes), **local editing** (targeted modifications without affecting the rest), **style reference** (generating novel scenes while preserving styles), and **interactive speed** for iterative refinement with minimal latency.


##### Example Usage with sequential edits

![kontext sequential edit example](src/mflux/assets/kontext_sequential_edit_example.jpg)

The image above illustrates **sequential editing** with Kontext - making multiple individual edits to progressively refine an image. Each edit builds upon the previous result, allowing for precise, iterative modifications.

Here's an example of one step in a sequential editing workflow, where we add atmospheric mist to the mountains:

```bash
mflux-generate-kontext \
  --image-path "house_without_background_objects.png" \
  --prompt "Make the mountains in the background be covered in white mist so that they are barely visible. Preserve the original camera angle, house placement etc" \
  --steps 20 \
  --seed 9375333 \
  --guidance 2.5 \
  --width 912 \
  --height 1360 \
  -q 8
```

##### Tips for Best Results

For comprehensive prompting strategies, see the [official Black Forest Labs Kontext prompting guide](https://docs.bfl.ai/guides/prompting_guide_kontext_i2i).

**Core Principles:**
- **Be Specific and Clear**: Use precise descriptions, avoid vague terms
- **Step-by-step Editing**: Break complex modifications into multiple simple steps  
- **Explicit Preservation**: State what should remain unchanged
- **Verb Selection**: Use "change", "replace" rather than "transform"

**Prompting Techniques:**

*Basic Modifications:*
- Simple: `"Change the car color to red"`
- Maintain style: `"Change to daytime while maintaining the same style of the painting"`

*Character Consistency:*
- Use specific descriptions: `"The woman with short black hair"` instead of `"she"`
- Preserve features: `"while maintaining the same facial features, hairstyle, and expression"`

*Style Transfer:*
- Name the style: `"Transform to Bauhaus art style"`
- Describe characteristics: `"Transform to oil painting with visible brushstrokes, thick paint texture"`
- Preserve composition: `"Change to Bauhaus style while maintaining the original composition"`

*Text Editing:*
- Use quotes: `"Replace 'joy' with 'BFL'"`
- Maintain format: `"Replace text while maintaining the same font style"`

**Common Problems & Solutions:**

❌ **Character changes too much**: `"Transform the person into a Viking"`  
βœ… **Better**: `"Change the clothes to be a viking warrior while preserving facial features"`

❌ **Position changes**: `"Put him on a beach"`  
βœ… **Better**: `"Change the background to a beach while keeping the person in the exact same position, scale, and pose"`

❌ **Vague style**: `"Make it a sketch"`  
βœ… **Better**: `"Convert to pencil sketch with natural graphite lines, cross-hatching, and visible paper texture"`

**Ready-to-Use Templates:**
- **Object Modification**: `"Change [object] to [new state], keep [content to preserve] unchanged"`
- **Style Transfer**: `"Transform to [specific style], while maintaining [composition/character/other] unchanged"`
- **Background Replacement**: `"Change the background to [new background], keep the subject in the exact same position and pose"`
- **Text Editing**: `"Replace '[original text]' with '[new text]', maintain the same font style"`

**Technical Tips:**
1. **High-Quality References**: Use clear, well-lit reference images with distinct characteristics
2. **Guidance Scale**: Experiment with guidance values between 2.5 and 4.0 for the best balance
3. **Multiple Seeds**: Try different seeds to explore variations while maintaining consistency
4. **Resolution**: Use standard resolutions (1024x1024, 768x1024, etc.) for best results
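
Building on tips 2 and 3, you can pass several seeds in a single run and nudge the guidance value, then keep the best result. The prompt, file name, and values below are illustrative rather than a fixed recipe:

```bash
mflux-generate-kontext \
  --image-path "house_without_background_objects.png" \
  --prompt "Change the background to a snowy mountain range, keep the house in the exact same position, scale, and camera angle" \
  --steps 20 \
  --seed 42 43 44 \
  --guidance 3.0 \
  -q 8
```

One image is generated per seed, and the seed value is appended to each output filename.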

⚠️ *Note: Using the Kontext tool requires an additional ~34 GB download from [black-forest-labs/FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev). The download happens automatically on first use.*

#### 🎨 In-Context LoRA

In-Context LoRA allows you to generate images in a specific style based on a reference image. This approach uses specialized LoRA weights that enable the model to understand and apply the visual context from your reference image to a new generation.

This feature is based on the [In-Context LoRA for Diffusion Transformers](https://github.com/ali-vilab/In-Context-LoRA) project by Ali-ViLab.

![image](src/mflux/assets/in_context_example.jpg)

To use In-Context LoRA, you need:
1. A reference image
2. A style LoRA (optional - the in-context ability works without LoRAs, but they can significantly enhance the results)

##### Available Styles

MFLUX provides several pre-defined styles from the [Hugging Face ali-vilab/In-Context-LoRA repository](https://huggingface.co/ali-vilab/In-Context-LoRA) that you can use with the `--lora-style` argument:

| Style Name     | Description                               |
|----------------|-------------------------------------------|
| `couple`       | Couple profile photography style          |
| `storyboard`   | Film storyboard sketching style           |
| `font`         | Font design and typography style          |
| `home`         | Home decoration and interior design style |
| `illustration` | Portrait illustration style               |
| `portrait`     | Portrait photography style                |
| `ppt`          | Presentation template style               |
| `sandstorm`    | Sandstorm visual effect                   |
| `sparklers`    | Sparklers visual effect                   |
| `identity`     | Visual identity and branding design style |

##### How It Works

The In-Context LoRA generation creates a side-by-side image where:
- The left side shows your reference image with noise applied
- The right side shows the new generation that follows your prompt while maintaining the visual context

The final output is automatically cropped to show only the right half (the generated image).

![image](src/mflux/assets/in_context_how_it_works.jpg)

##### Prompting for In-Context LoRA

For best results with In-Context LoRA, your prompt should describe both the reference image and the target image you want to generate. Use markers like `[IMAGE1]`, `[LEFT]`, or `[RIGHT]` to distinguish between the two parts.

Here's an example:

```sh
mflux-generate-in-context \
  --model dev \
  --steps 20 \
  --quantize 8 \
  --seed 42 \
  --height 1024 \
  --width 1024 \
  --image-path "reference.png" \
  --lora-style identity \
  --prompt "In this set of two images, a bold modern typeface with the brand name 'DEMA' is introduced and is shown on a company merchandise product photo; [IMAGE1] a simplistic black logo featuring a modern typeface with the brand name 'DEMA' on a bright light green/yellowish background; [IMAGE2] the design is printed on a green/yellowish hoodie as a company merchandise product photo with a plain white background."
```

This prompt clearly describes both the reference image (after `[IMAGE1]`) and the desired output (after `[IMAGE2]`). Other marker pairs you can use include:
- `[LEFT]` and `[RIGHT]`
- `[TOP]` and `[BOTTOM]`
- `[REFERENCE]` and `[OUTPUT]`

**Important**: In the current implementation, the reference image is ALWAYS placed on the left side of the composition, and the generated image on the right side. When using marker pairs in your prompt, the first marker (e.g., `[IMAGE1]`, `[LEFT]`, `[REFERENCE]`) always refers to your reference image, while the second marker (e.g., `[IMAGE2]`, `[RIGHT]`, `[OUTPUT]`) refers to what you want to generate.
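
The same idea can also be expressed with the `[LEFT]`/`[RIGHT]` marker pair. The command below is an illustrative variant (placeholder file name, prompt, and style):

```sh
mflux-generate-in-context \
  --model dev \
  --steps 20 \
  --quantize 8 \
  --seed 42 \
  --image-path "reference.png" \
  --lora-style storyboard \
  --prompt "Two film storyboard panels of the same detective character; [LEFT] the detective stands under a streetlight in the rain, rough pencil-sketch style; [RIGHT] the same detective, in the same sketch style, stepping into a dimly lit office."
```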

##### Tips for Best Results

1. **Choose the right reference image**: Select a reference image with a clear composition and structure that matches your intended output.
2. **Adjust guidance**: Higher guidance values (7.0-9.0) tend to produce results that more closely follow your prompt.
3. **Try different styles**: Each style produces distinctly different results - experiment to find the one that best matches your vision.
4. **Increase steps**: For higher quality results, use 25-30 steps.
5. **Detailed prompting**: Be specific about both the reference image and your desired output in your prompt.
6. **Try without LoRA**: While LoRAs enhance the results, you can experiment without them to see the base in-context capabilities.

#### πŸ‘• CatVTON (Virtual Try-On)

⚠️ **Experimental Feature**: CatVTON is an experimental feature and may be removed or significantly changed in future updates.

CatVTON enables virtual try-on capabilities using in-context learning. This approach allows you to generate images of people wearing different garments by providing a person image, a person mask, and a garment image.

The implementation is based on [@nftblackmagic/catvton-flux](https://github.com/nftblackmagic/catvton-flux) and uses the model weights from [xiaozaa/catvton-flux-beta](https://huggingface.co/xiaozaa/catvton-flux-beta) (approximately 24 GB download).

![catvton example](src/mflux/assets/catvton_example.jpg)

##### How it works

CatVTON uses a sophisticated in-context learning approach that creates a side-by-side composition for virtual try-on:

1. **Image Layout**: The garment image is placed on the left side, while the person image is positioned on the right side of the composition.

2. **Selective Processing**: The model treats each side differently:
   - **Left side (garment)**: The model can only *read* information from this area - no inpainting or modifications occur here. This serves as a reference for the garment's appearance, texture, and style.
   - **Right side (person)**: Only the areas highlighted in the person mask are inpainted. The mask determines exactly which parts of the person image should be modified to "wear" the garment.

3. **Contextual Understanding**: By reading the garment characteristics from the left and selectively applying them to the masked regions on the right, the model creates a realistic virtual try-on that maintains proper lighting, shadows, and fabric behavior.

This approach allows the model to understand the garment's properties while precisely controlling which parts of the person image should be transformed, resulting in natural-looking virtual clothing fitting.

![catvton how it works](src/mflux/assets/catvton_how_it_works.jpg)

##### Example Command

```bash
mflux-generate-in-context-catvton \
  --person-image "person.png" \
  --person-mask "mask.png" \
  --garment-image "garment.png" \
  --prompt "The pair of images highlights a clothing and its styling on a model, high resolution, 4K, 8K; [IMAGE1] Detailed product shot of a light blue shirt with designer details such as white and pink patterns; [IMAGE2] The *exact* same cloth (the light blue shirt with designer details such as white and pink patterns) is worn by a model in a lifestyle setting." \
  --steps 20 \
  --seed 6269363 \
  --guidance 30 \
  --height 1024 \
  --width 891 \
  -q 8
```



##### Required Inputs

- **Person Image**: A photo of the person who will "wear" the garment
- **Person Mask**: A binary mask indicating the areas where the garment should be applied
- **Garment Image**: The clothing item to be virtually worn

##### CatVTON Features

- **Automatic Prompting**: If no prompt is provided, CatVTON uses an optimized default prompt designed for virtual try-on scenarios
- **Side-by-Side Generation**: Creates a diptych showing the garment product shot alongside the styled result
- **Optimized for Clothing**: Specifically tuned for clothing and fashion applications
- **High-Quality Results**: Maintains realistic lighting, shadows, and fabric properties

##### Tips for CatVTON

1. **High-Quality Inputs**: Use high-resolution, well-lit images for best results
2. **Accurate Masks**: Ensure the person mask precisely covers the areas where the garment should appear
3. **Consistent Lighting**: Match lighting conditions between person and garment images when possible
4. **Garment Type**: Works best with clearly defined clothing items (shirts, dresses, jackets, etc.)
5. **Pattern Limitations**: Very intricate patterns, fine details, and complex textures may not transfer well - simpler, solid colors and basic patterns typically work better
6. **Try Multiple Seeds**: Generate several variations with different seeds to find the best result, as outcomes can vary significantly
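
As a sketch of tip 6, you can pass several seeds in a single run and add `--save-full-image` to keep the full diptych for inspection (file names and values are illustrative):

```bash
mflux-generate-in-context-catvton \
  --person-image "person.png" \
  --person-mask "mask.png" \
  --garment-image "garment.png" \
  --steps 20 \
  --seed 100 200 300 \
  --guidance 30 \
  --height 1024 \
  --width 891 \
  --save-full-image \
  -q 8
```

Since no `--prompt` is given here, CatVTON falls back to its default try-on prompt, and one image is generated per seed.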

#### ✏️ IC-Edit (In-Context Editing)

⚠️ **Experimental Feature**: IC-Edit is an experimental feature and may be removed or significantly changed in future updates.

IC-Edit provides intuitive image editing capabilities using natural language instructions. This approach automatically applies a specialized LoRA and generates edited versions of your reference image based on simple text instructions.

The implementation is based on [@River-Zhang/ICEdit](https://github.com/River-Zhang/ICEdit).

![ic-edit example](src/mflux/assets/ic_edit_example.jpg)
*Original images credit: [Jimmy Chang (flower)](https://unsplash.com/photos/closeup-photography-of-pink-lotus-flower-LP5WXkrnxX0), [Kar Ming Moo (tiger)](https://unsplash.com/photos/brown-and-black-tiger-lying-on-ground-Q_3WmguWgYg), [Martin Baron (car)](https://unsplash.com/photos/a-shiny-black-car-parks-on-a-street-8JoIWt2KzYI) on Unsplash*

##### How it works

IC-Edit employs a streamlined diptych approach for natural language image editing:

1. **Image Layout**: The reference image is placed on the left side, while the right side is reserved for the edited result.

2. **Automatic Masking**: Unlike CatVTON, no user-provided mask is required. The system automatically creates a simple but effective mask:
   - **Left side (reference)**: Completely black mask - no denoising occurs here, preserving the original image as reference
   - **Right side (edit area)**: Completely white mask - the entire area is available for denoising and modification

3. **Instruction Processing**: Your natural language instruction is automatically wrapped in a diptych template that tells the model to create the desired changes on the right side while using the left side as reference.

4. **Guided Generation**: The model reads the visual information from the left side and applies the requested modifications to generate the edited result on the right side, maintaining visual coherence and context.

This simplified approach removes the complexity of mask creation while still providing precise control through natural language instructions, making image editing accessible and intuitive.

![ic-edit how it works](src/mflux/assets/ic_edit_how_it_works.jpg)

##### Example Command

```bash
mflux-generate-in-context-edit \
  --reference-image "flower.jpg" \
  --prompt "two images of the exact same flower in two different styles: On the left the photo has is in bright colors showing green leaves and a pink flower. On the right, the *exact* same photo (same flower, same leaves, same background, identical etc). but with where everything is black and white except for the flower which is still in color. The content is *exactly* the same between the left and right image, except only for the coloring (black and white for everything except for the colorful flower)" \
  --steps 20 \
  --seed 8570325 \
  --guidance 30 \
  -q 8
```



##### Key Features

- **Natural Language Instructions**: Use simple, descriptive instructions like "make the hair black", "add sunglasses", or "change the background to a beach"
- **Automatic Prompting**: Instructions are automatically wrapped in a diptych template for optimal results
- **Optimal Resolution**: Automatically resizes to 512px width (the resolution IC-Edit was trained on) while maintaining aspect ratio
- **Specialized LoRA**: Automatically downloads and applies the IC-Edit LoRA for enhanced editing capabilities

##### IC-Edit Options

You can use either `--instruction` for simple edits or `--prompt` for full control:

```sh
# Using instruction (recommended)
mflux-generate-in-context-edit \
  --reference-image "photo.jpg" \
  --instruction "remove the glasses"

# Using custom prompt (advanced)
mflux-generate-in-context-edit \
  --reference-image "photo.jpg" \
  --prompt "A diptych with two side-by-side images of the same scene. On the right, the scene is exactly the same as on the left but with a vintage filter applied"
```

##### Tips for IC-Edit

1. **Clear Instructions**: Use specific, actionable instructions for best results
2. **Single Changes**: Focus on one edit at a time for more predictable results
3. **Reference Quality**: Higher quality reference images generally produce better edits
4. **Iterative Editing**: Use the output of one edit as input for the next to build complex changes
5. **Try Multiple Seeds**: Generate several variations with different seeds to find the best result, as outcomes can vary significantly
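
For example, tip 4's iterative workflow can be expressed by writing each edit to an explicit output file and feeding that result into the next edit (file names and instructions are illustrative):

```bash
# First edit: change the hair color
mflux-generate-in-context-edit \
  --reference-image "photo.jpg" \
  --instruction "make the hair black" \
  --output "photo_step1.png" \
  -q 8

# Second edit: build on the previous result
mflux-generate-in-context-edit \
  --reference-image "photo_step1.png" \
  --instruction "add sunglasses" \
  --output "photo_step2.png" \
  -q 8
```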

⚠️ *Note: Using the IC-Edit tool requires an additional ~230MB download for the specialized LoRA weights. The download happens automatically on first use. This remarkably small LoRA (less than 0.7% of the base model's ~34GB) enables sophisticated image editing capabilities, demonstrating that the base FLUX model is already highly competent at style transfers and transformations - the LoRA simply provides the user control and interface to access these latent capabilities.*

---

### πŸ› οΈ Flux Tools

MFLUX supports the official [Flux.1 Tools](https://blackforestlabs.ai/flux-1-tools/).

#### πŸ–ŒοΈ Fill

The Fill tool uses the [FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev) model to allow you to selectively edit parts of an image by providing a binary mask. This is useful for inpainting (replacing specific areas) and outpainting (expanding the canvas).

#### Inpainting

Inpainting allows you to selectively regenerate specific parts of an image while preserving the rest. This is perfect for removing unwanted objects, adding new elements, or changing specific areas of an image without affecting the surrounding content. The Fill tool understands the context of the entire image and creates seamless edits that blend naturally with the preserved areas.

![inpaint example](src/mflux/assets/inpaint_example.jpg)
*Original dog image credit: [Julio Bernal on Unsplash](https://unsplash.com/photos/brown-and-white-long-coated-dog-WLGx0fKFSeI)*

##### Creating Masks

Before using the Fill tool, you need an image and a corresponding mask. You can create a mask using the included tool:

```bash
python -m tools.fill_mask_tool /path/to/your/image.jpg
```

This will open an interactive interface where you can paint over the areas you want to regenerate.
Pressing the `s` key will save the mask at the same location as the image.

##### Example

To regenerate specific parts of an image:

```bash
mflux-generate-fill \
  --prompt "A professionally shot studio photograph of a dog wearing a red hat and ski goggles. The dog is centered against a uniformly bright yellow background, with well-balanced lighting and sharp details." \
  --steps 20 \
  --seed 42 \
  --height 1280 \
  --width 851 \
  --guidance 30 \
  -q 8 \
  --image-path "dog.png" \
  --masked-image-path "dog_mask.png" \
```

#### Outpainting

![outpaint example](src/mflux/assets/outpaint_example.jpg)
*Original room image credit: [Alexey Aladashvili on Unsplash](https://unsplash.com/photos/a-living-room-with-a-couch-and-a-table-JFzglhmwlck)*

Outpainting extends your image beyond its original boundaries, allowing you to expand the canvas in any direction while maintaining visual consistency with the original content. This is useful for creating wider landscapes, revealing more of a scene, or transforming a portrait into a full-body image. The Fill tool intelligently generates new content that seamlessly connects with the existing image.

You can expand the canvas of your image using the provided tool:

```bash
python -m tools.create_outpaint_image_canvas_and_mask \
  /path/to/your/image.jpg \
  --image-outpaint-padding "0,30%,20%,0"
```

As an example, here's how to add 25% padding to both the left and right sides of an image:

```bash
python -m tools.create_outpaint_image_canvas_and_mask \
  room.png \
  --image-outpaint-padding "0,25%,0,25%"
```

The padding format is "top,right,bottom,left" where each value can be in pixels or as a percentage. For example, "0,30%,20%,0" expands the canvas by 30% to the right and 20% to the bottom.

After running this command, you'll get two files: an expanded canvas with your original image and a corresponding mask. You can then run the `mflux-generate-fill` command similar to the inpainting example, using these files as input.

##### Example

Once you've created the expanded canvas and mask files, run the Fill tool on them:

```bash
mflux-generate-fill \
  --prompt "A detailed interior room photograph with natural lighting, extended in a way that perfectly complements the original space. The expanded areas continue the architectural style, color scheme, and overall aesthetic of the room seamlessly." \
  --steps 25 \
  --seed 43 \
  --guidance 30 \
  -q 8 \
  --image-path "room.png" \
  --masked-image-path "room_mask.png"
```

##### Tips for Best Results

- **Model**: The model is always `FLUX.1-Fill-dev` and should not be specified.
- **Guidance**: Higher guidance values (around 30) typically yield better results. This is the default if not specified.
- **Resolution**: Higher resolution images generally produce better results with the Fill tool.
- **Masks**: Make sure your mask clearly defines the areas you want to regenerate. White areas in the mask will be regenerated, while black areas will be preserved.
- **Prompting**: For best results, provide detailed prompts that describe both what you want in the newly generated areas and how it should relate to the preserved parts of the image.
- **Steps**: Using 20-30 denoising steps generally produces higher quality results.

⚠️ *Note: Using the Fill tool requires an additional 33.92 GB download from [black-forest-labs/FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev). The download happens automatically on first use.*

#### πŸ” Depth

Using the depth tool, you can generate high-quality images conditioned on a depth map extracted from a reference image. These maps represent the estimated distance of each pixel from the camera, with brighter areas representing objects closer to the camera and darker areas representing objects farther away.

For state-of-the-art depth extraction, MFLUX now supports the [Depth Pro](https://github.com/apple/ml-depth-pro) model, natively implemented in MLX based on the reference implementation. 

![depth example](src/mflux/assets/depth_example.jpg)
*Original hallway image credit: [Yuliya Matuzava on Unsplash](https://unsplash.com/photos/a-long-hallway-in-a-building-with-arches-and-arches-39NJt9jfGSU)*

##### Example

Using the `mflux-generate-depth` command, we can let MFLUX generate a new image conditioned on the depth map of the input image:

```bash
mflux-generate-depth \
  --prompt "A hallway made of white carrara marble" \
  -q 8 \
  --height 1680 \
  --width 1200 \
  --image-path "original_image.png" \
  --steps 20 \
  --seed 7102725
```

Instead of automatically generating the depth map from a reference image, you can provide the depth map yourself using the `--depth-image-path` argument, like so:

```bash
mflux-generate-depth \
  --prompt "A hallway made of white carrara marble" \
  -q 8 \
  --height 1680 \
  --width 1200 \
  --depth-image-path "depth_image.png" \
  --steps 20
```

To generate and export the depth map from an image without triggering the image generation, run the following command:

```bash
mflux-save-depth --image-path "your_image.jpg" -q 8
```

This will create a depth map and save it with the same name as your image but with a "_depth" suffix (e.g., "your_image_depth.png").
Quantization is supported for the Depth Pro model; however, output quality can vary depending on the input image.
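
Putting the two commands together, one possible workflow is to extract the depth map once and then reuse it for several prompts without re-running depth estimation (file names and prompt are illustrative):

```bash
# Extract and save the depth map (creates your_image_depth.png next to the input)
mflux-save-depth --image-path "your_image.jpg" -q 8

# Reuse the saved depth map for a new generation
mflux-generate-depth \
  --prompt "A hallway made of polished concrete" \
  --depth-image-path "your_image_depth.png" \
  --steps 20 \
  -q 8
```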

⚠️ *Note: The Depth Pro model requires an additional 1.9GB download from Apple. The download happens automatically on first use.*

⚠️ *Note: Using the Depth tool requires an additional 33.92 GB download from [black-forest-labs/FLUX.1-Depth-dev](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev). The download happens automatically on first use.*

#### πŸ”„ Redux

The Redux tool is an adapter for FLUX.1 models that enables image variation generation similar to the [image-to-image](#-image-to-image) technique. 
It can reproduce an input image with slight variations, allowing you to refine and enhance existing images. 
Unlike the image-to-image technique which resumes the denoising process mid-way starting from the input image, the redux tool instead embeds the image and joins it with the T5 text encodings.

![redux example](src/mflux/assets/redux_example.jpg)
*Image credit: [Paris Bilal on Unsplash](https://unsplash.com/photos/a-cat-statue-sitting-on-top-of-a-white-base-Bedd4qHGWCg)*

##### Example

The following examples show how to use the Redux tool to generate variations:

```bash
mflux-generate-redux \
  --prompt "a grey statue of a cat on a white platform in front of a blue background" \
  --redux-image-paths "original.png" \
  --steps 20 \
  --height 1654 \
  --width 1154 \
  -q 8
```

You can also control the influence of each reference image by specifying strength values (between 0.0 and 1.0) using the `--redux-image-strengths` parameter:

```bash
mflux-generate-redux \
  --prompt "a grey statue of a cat on a white platform in front of a blue background" \
  --redux-image-paths "image1.png" "image2.png" \
  --redux-image-strengths 0.8 0.5 \
  --steps 20 \
  --height 1024 \
  --width 768 \
  -q 8
```

A higher strength value gives the reference image more influence over the final result. If you don't specify any strengths, a default value of 1.0 is used for all images.

There is a tendency for the reference image to dominate over the input prompt.
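
If the reference image overwhelms the prompt, lowering its strength is one way to give the prompt more room. A sketch with an illustrative value:

```bash
mflux-generate-redux \
  --prompt "a grey statue of a cat on a white platform in front of a blue background" \
  --redux-image-paths "original.png" \
  --redux-image-strengths 0.5 \
  --steps 20 \
  -q 8
```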

⚠️ *Note: Using the Redux tool requires an additional 1.1GB download from [black-forest-labs/FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev). The download happens automatically on first use.*

---

### πŸ•ΉοΈ Controlnet

MFLUX has [Controlnet](https://huggingface.co/docs/diffusers/en/using-diffusers/controlnet) support for an even more fine-grained control
of the image generation. By providing a reference image via `--controlnet-image-path` and a strength parameter via `--controlnet-strength`, you can guide the generation toward the reference image.

```sh
mflux-generate-controlnet \
  --prompt "A comic strip with a joker in a purple suit" \
  --model dev \
  --steps 20 \
  --seed 1727047657 \
  --height 1066 \
  --width 692 \
  -q 8 \
  --lora-paths "Dark Comic - s0_8 g4.safetensors" \
  --controlnet-image-path "reference.png" \
  --controlnet-strength 0.5 \
  --controlnet-save-canny
```
![image](src/mflux/assets/controlnet1.jpg)

*This example combines the controlnet reference image with the LoRA [Dark Comic Flux](https://civitai.com/models/742916/dark-comic-flux)*.

⚠️ *Note: Controlnet requires an additional one-time download of ~3.58GB of weights from Huggingface. This happens automatically the first time you run the `generate-controlnet` command.
At the moment, the Controlnet used is [InstantX/FLUX.1-dev-Controlnet-Canny](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny), which was trained for the `dev` model.
It can work well with `schnell`, but performance is not guaranteed.*

⚠️ *Note: The output can be highly sensitive to the controlnet strength and is very much dependent on the reference image.
Settings that are too high will corrupt the image. A recommended starting point is a value around 0.4; from there, experiment with the strength.*
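
One simple way to find a good strength is to sweep a few values in a shell loop, writing each result to its own file (values and file names are illustrative):

```sh
for strength in 0.3 0.4 0.5; do
  mflux-generate-controlnet \
    --prompt "A comic strip with a joker in a purple suit" \
    --model dev \
    --steps 20 \
    --controlnet-image-path "reference.png" \
    --controlnet-strength "$strength" \
    --output "joker_strength_${strength}.png" \
    -q 8
done
```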


Controlnet can also work well together with [LoRA adapters](#-lora). In the example below the same reference image is used as a controlnet input
with different prompts and LoRA adapters active.

![image](src/mflux/assets/controlnet2.jpg)

---

### πŸ”Ž Upscale

The upscale tool allows you to increase the resolution of an existing image while maintaining or enhancing its quality and details. It uses a specialized ControlNet model that's trained to intelligently upscale images without introducing artifacts or losing image fidelity.

Under the hood, this is simply the controlnet pipeline with [jasperai/Flux.1-dev-Controlnet-Upscaler](https://huggingface.co/jasperai/Flux.1-dev-Controlnet-Upscaler), with one small modification: the input image is not preprocessed with canny edge detection.

![upscale example](src/mflux/assets/upscale_example.jpg)
*Image credit: [Kevin Mueller on Unsplash](https://unsplash.com/photos/gray-owl-on-black-background-xvwZJNaiRNo)*

#### How to use

```sh
mflux-upscale \
  --prompt "A gray owl on black background" \
  --steps 28 \
  --seed 42 \
  --height 1363 \
  --width 908 \
  -q 8 \
  --controlnet-image-path "low_res_image.png" \
  --controlnet-strength 0.6 
```

This will upscale your input image to the specified dimensions. The upscaler works best when increasing the resolution by a factor of 2-4x.

⚠️ *Note: Depending on the capability of your machine, you might run out of memory when trying to export the image. Try the `--vae-tiling` flag to process the image in smaller chunks, which significantly reduces memory usage during the VAE decoding phase at a minimal cost to performance. This optimization may occasionally produce a subtle seam in the middle of the image, but it's often worth the tradeoff for being able to generate higher resolution images. You can also use `--vae-tiling-split vertical` to change the split direction from the default horizontal (top/bottom) to vertical (left/right).*
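
For example, an upscale with tiling enabled might look like this (dimensions and split direction are illustrative):

```sh
mflux-upscale \
  --prompt "A gray owl on black background" \
  --steps 28 \
  --seed 42 \
  --height 2726 \
  --width 1816 \
  -q 8 \
  --controlnet-image-path "low_res_image.png" \
  --controlnet-strength 0.6 \
  --vae-tiling \
  --vae-tiling-split vertical
```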

#### Tips for Best Results

- For optimal results, try to maintain the same aspect ratio as the original image
- Prompting matters: Try to accurately describe the image when upscaling
- The recommended `--controlnet-strength` is in the range of 0.5 to 0.7

---

### πŸŽ›οΈ Dreambooth fine-tuning

As of release [v.0.5.0](https://github.com/filipstrand/mflux/releases/tag/v.0.5.0), MFLUX has support for fine-tuning your own LoRA adapters using the [Dreambooth](https://dreambooth.github.io) technique.

![image](src/mflux/assets/dreambooth.jpg)

*This example shows the MFLUX training progression of the [included training example](#training-example) which is based on the [DreamBooth Dataset](https://github.com/google/dreambooth), also used in the [mlx-examples repo](https://github.com/ml-explore/mlx-examples/tree/main/flux#finetuning).*

#### Training configuration

To describe a training run, you need to provide a [training configuration](src/mflux/dreambooth/_example/train.json) file which specifies the details such as
what training data to use and various parameters. To try it out, one of the easiest ways is to start from the 
provided [example configuration](src/mflux/dreambooth/_example/train.json) and simply use your own dataset and prompts by modifying the `examples` section of the json file.

#### Training example

A complete example ([training configuration](src/mflux/dreambooth/_example/train.json) + [dataset](src/mflux/dreambooth/_example/images)) is provided in this repository. To start a training run, go to the project folder `cd mflux`, and simply run:

```sh
mflux-train --train-config src/mflux/dreambooth/_example/train.json
```

By default, this will train an adapter with images of size `512x512` with a batch size of 1 and can take up to several hours to fully complete depending on your machine.
If this task is too computationally demanding, see the section on [memory issues](#memory-issues) for tips on how to speed things up and what tradeoffs exist.

During training, MFLUX will output training checkpoints with artifacts (weights, states) according to what is specified in the configuration file.
As specified in the file `train.json`, these files will be placed in a folder on the Desktop called `~/Desktop/train`, but this can of course be changed to any other path by adjusting the configuration. 
All training artifacts are saved as a self-contained zip file, which can later be pointed to in order to [resume an existing training run](#resuming-a-training-run).
To find the LoRA weights, simply unzip and look for the `adapter` safetensors file and [use it as you would with a regular downloaded LoRA adapter](#-lora).
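
For instance, assuming a checkpoint named `0001000_checkpoint.zip`, this step might look like the following sketch (the exact location and name of the adapter file inside the archive may differ):

```sh
# Unpack the checkpoint and locate the adapter safetensors file
unzip 0001000_checkpoint.zip -d 0001000_checkpoint

# Use the adapter like any other LoRA (path and prompt are illustrative)
mflux-generate \
  --model dev \
  --prompt "photo of the fine-tuned subject in a botanical garden" \
  --steps 25 \
  --lora-paths "0001000_checkpoint/adapter.safetensors" \
  -q 8
```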

#### Resuming a training run

The training process will continue to run until each training example has been used `num_epochs` times. 
For various reasons however, the user might choose to interrupt the process.
To resume training for a given checkpoint, say `0001000_checkpoint.zip`, simply run: 

```sh
mflux-train --train-checkpoint 0001000_checkpoint.zip
```

This uses the `--train-checkpoint` command-line argument to specify the checkpoint file to resume from.

There are two nice properties of the training procedure: 

- Fully deterministic (given a specified `seed` in the training configuration)
- The complete training state (including optimizer state) is saved at each checkpoint.

Because of these, MFLUX has the ability to resume a training run from a previous checkpoint and have the results
be *exactly* identical to a training run which was never interrupted in the first place.

*⚠️ Note: Everything but the dataset itself is contained within this zip file, as the dataset can be quite large.
The zip file will contain configuration files which point to the original dataset, so make sure that it is in the same place when resuming training*.

*⚠️ Note: One current limitation is that a training run can only be resumed if it has not yet been completed. 
In other words, only checkpoints that represent an interrupted training-run can be resumed and run until completion.*

<details>
<summary>βš™οΈ <strong>Configuration details</strong></summary>

Currently, MFLUX supports fine-tuning only for the transformer part of the model.
In the training configuration, under `lora_layers`, you can specify which layers you want to train. The available ones are: 

- `transformer_blocks`:
  - `attn.to_q`
  - `attn.to_k`
  - `attn.to_v`
  - `attn.add_q_proj`
  - `attn.add_k_proj`
  - `attn.add_v_proj`
  - `attn.to_out`
  - `attn.to_add_out`
  - `ff.linear1`
  - `ff.linear2`
  - `ff_context.linear1`
  - `ff_context.linear2`
- `single_transformer_blocks`:
  - `proj_out`
  - `proj_mlp`
  - `attn.to_q`
  - `attn.to_k`
  - `attn.to_v`

The `block_range` under the respective layer category specifies which blocks to train.
The maximum range available for the different layer categories are:

- `transformer_blocks`:
  - `start: 0`
  - `end: 19`
- `single_transformer_blocks`:
  - `start: 0`
  - `end: 38`

<details>
<summary>Specify individual layers</summary>

For even more precision, you can specify individual block indices to train like so:

```json
"lora_layers": {
  "single_transformer_blocks": {
    "block_range": {
      "indices": [
        0,
        1,
        7,
        19,
        20
      ],
      ...
  },
...
```
</details>

*⚠️ Note: As the joint transformer blocks (`transformer_blocks`) are placed earlier in the sequence of computations, they require more resources to train.
In other words, training only later layers, such as the `single_transformer_blocks`, should be faster. However, training too few layers, or only the later ones, might result in faster but unsuccessful training.*

*Under the `examples` section, there is an argument called `"path"` which specifies where the images are located. This path is relative to the config file itself.*

</details>

<details>
<summary>⚠️ <strong>Memory issues</strong></summary>

Depending on the configuration of the training setup, fine-tuning can be quite memory intensive.
In the worst case, if your Mac runs out of memory it might freeze completely and crash!

To avoid this, consider some of the following strategies to reduce memory requirements by adjusting the parameters in the training configuration:

- Use a quantized base model by setting `"quantize": 4` or `"quantize": 8`
- For the `layer_types`, consider skipping some of the trainable layers (e.g. by not including `proj_out` etc.)
- Use a lower `rank` value for the LoRA matrices.
- Don't train all `38` layers from `single_transformer_blocks` or all `19` layers from `transformer_blocks`
- Use a smaller batch size, for example `"batch_size": 1`
- Make sure your Mac is not busy with other background tasks that hold memory.

Applying some of these strategies, like how [train.json](src/mflux/dreambooth/_example/train.json) is set up by default,
will allow a 32GB M1 Pro to perform a successful fine-tuning run. 
Note, however, that reducing the trainable parameters might lead to worse performance.
 

*Additional techniques such as gradient checkpointing and other strategies might be implemented in the future.*

</details> 

<details>
<summary>πŸ“ <strong>Misc</strong></summary>

This feature is currently v1 and can be considered a bit experimental. Interfaces might change (configuration file setup etc.).
The aim is to gradually expand the scope of this feature with alternative techniques, data augmentation, etc.

- As with loading external LoRA adapters, the MFLUX training currently only supports training the transformer part of the network.
- Sometimes, a model trained with the `dev` model might actually work better when applied to the `schnell` weights.
- Currently, all training images are assumed to be in the resolution specified in the configuration file.
- The loss curve can be a bit misleading/hard to read: sometimes it conveys little improvement over time, while actual image samples show the real progress.
- When plotting the loss during training, we label it as "validation loss" but it is actually only the first 10 elements of the training examples for now. Future updates should support user inputs of separate validation images.
- Training also works when the original model is quantized!
- [For the curious, a motivation for the loss function can be found here](src/mflux/dreambooth/optimization/_loss_derivation/dreambooth_loss.pdf).
- Two great resources that heavily inspired this feature are: 
  - The fine-tuning script in [mlx-examples](https://github.com/ml-explore/mlx-examples/tree/main/flux#finetuning)
  - The original fine-tuning script in [Diffusers](https://huggingface.co/docs/diffusers/v0.11.0/en/training/dreambooth)

</details>


---

### 🧠 Concept Attention

The Concept Attention feature allows you to visualize and understand how FLUX models focus on specific concepts within your prompts during image generation.

![concept_example_1](src/mflux/assets/concept_example_1.jpg)

This implementation is based on the research paper ["ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features"](https://arxiv.org/abs/2502.04320) by [Helbling et al.](https://github.com/helblazer811/ConceptAttention), which demonstrates that multi-modal diffusion transformers like FLUX learn highly interpretable representations that can be leveraged to generate precise concept localization maps.

MFLUX provides two concept attention tools:

1. **Text-based Concept Analysis** (`mflux-concept`): Analyzes attention patterns for a specific concept within your text prompt
2. **Image-guided Concept Analysis** (`mflux-concept-from-image`): Uses a reference image to guide concept attention analysis

#### Text-based Concept Analysis

This approach analyzes how the model attends to a specific concept mentioned in your prompt. The model generates an image while tracking attention patterns, then creates a heatmap showing where the model focused when processing your concept.

##### Example

```bash
mflux-concept \
  --prompt "A dragon on a hill" \
  --concept "dragon" \
  --model schnell \
  --steps 4 \
  --seed 9643208 \
  --height 720 \
  --width 1280 \
  --heatmap-layer-indices 15 16 17 18 \
  --heatmap-timesteps 0 1 2 3 \
  -q 4
```
This will generate the following image:

![concept_example_2](src/mflux/assets/concept_example_2.jpg)

This command will generate:
- The main image based on your prompt
- A concept attention heatmap showing where the model focused on the "dragon" concept
- Both images are automatically saved with appropriate naming

#### Image-guided Concept Analysis

This approach uses a reference image to guide the concept attention analysis. The model analyzes how a concept appears in the reference image and applies similar attention patterns to generate a new image that maintains those conceptual relationships.

##### Example

```bash
mflux-concept-from-image \
  --model schnell \
  --input-image-path "puffin.png" \
  --prompt "Two puffins are perched on a grassy, flower-covered cliffside, with one appearing to call out while the other looks on silently against a blurred ocean backdrop" \
  --concept "bird" \
  --steps 4 \
  --height 720 \
  --width 1280 \
  --seed 4529717 \
  --heatmap-layer-indices 15 16 17 18 \
  --heatmap-timesteps 0 1 2 3 \
  -q 4
```

This will generate the following image:

![concept_example_3](src/mflux/assets/concept_example_3.jpg)

#### Advanced Configuration

- **`--heatmap-layer-indices`**: Controls which transformer layers to analyze (default: 15-18). Different layers capture different levels of abstraction

- **`--heatmap-timesteps`**: Specifies which denoising steps to include in the analysis (default: all steps). Lower timestep values focus on early generation stages where broad composition is determined.
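
For instance, to restrict the analysis to the earliest denoising steps, where the broad composition is determined, you can pass a shorter timestep list (values are illustrative):

```bash
mflux-concept \
  --prompt "A dragon on a hill" \
  --concept "dragon" \
  --model schnell \
  --steps 4 \
  --seed 9643208 \
  --heatmap-layer-indices 15 16 17 18 \
  --heatmap-timesteps 0 1 \
  -q 4
```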

---

### 🚧 Current limitations

- Images are generated one by one.
- Negative prompts not supported.
- LoRA weights are only supported for the transformer part of the network.
- Some LoRA adapters do not work.
- Currently, the supported controlnet is the [canny-only version](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny).
- Dreambooth training currently does not support sending in training parameters as flags.
- In-Context Generation features currently only support a left-right image setup (reference image on left, generated image on right).

---

### Optional Tool: Batch Image Renamer

With a large number of generated images, some users want to automatically rename their image outputs to reflect the prompts and configs.

The bundled `tools/rename_images.py` is an optional tool that is part of the project repo
but *not* included in the `mflux` Python package, due to additional dependencies that
should not become standard project requirements.

The script uses [KeyBERT](https://maartengr.github.io/KeyBERT/) (a keyword extraction library)
to extract keywords from mflux EXIF metadata and update the image file names.
It is executed with `uv run` in an isolated environment, so it does not affect your `mflux` environment.

Users who want to use or extend this tool for their own needs are encouraged to `git clone` the repo
and run `uv run tools/rename_images.py <paths>`, or to download the single-file standalone script
and run `uv run your/path/rename_images.py`.

This script's renaming logic can be customized to your needs.
See `uv run tools/rename_images.py --help` for full CLI usage help.
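
A minimal sketch of the clone-and-run approach (the image directory path is a placeholder):

```sh
git clone https://github.com/filipstrand/mflux.git && cd mflux
uv run tools/rename_images.py ~/Pictures/mflux-output
```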

---

### πŸ’‘Workflow Tips

- To hide the model fetching status progress bars, `export HF_HUB_DISABLE_PROGRESS_BARS=1`
- To opt-out of [HuggingFace telemetry](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/environment_variables#hfhubdisabletelemetry), `export HF_HUB_DISABLE_TELEMETRY=1`
- Use config files to save complex job parameters in a file instead of passing many `--args` (see the sketch after this list)
- Set up shell aliases for required args examples:
  - shortcut for dev model: `alias mflux-dev='mflux-generate --model dev'`
  - shortcut for schnell model *and* always save metadata: `alias mflux-schnell='mflux-generate --model schnell --metadata'`
- For systems with limited memory, use the `--low-ram` flag to reduce memory usage by constraining the MLX cache size and releasing components after use
- On battery-powered Macs, use `--battery-percentage-stop-limit` (or `-B`) to prevent your laptop from shutting down during long generation sessions
- When generating multiple images with different seeds, use `--seed` with multiple values or `--auto-seeds` to automatically generate a series of random seeds
- Use `--stepwise-image-output-dir` to save intermediate images at each denoising step, which can be useful for debugging or creating animations of the generation process
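
A sketch of the config-file tip above, reusing the metadata of a previous generation as a job config (file names are illustrative; with the default `image.png` output, the metadata is written to a `.json` file with the same name):

```sh
# Save the generation parameters alongside the image
mflux-generate --model dev --prompt "Luxury food photograph" --steps 25 --seed 2 -q 8 --metadata

# Later, rerun the job from the saved metadata, overriding only what you want to change
mflux-generate --model dev --config-from-metadata image.json --prompt "Luxury food photograph, dramatic lighting"
```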

---

### πŸ”¬ Cool research
- [ ] [PuLID](https://github.com/ToTheBeginning/PuLID)

---

### πŸŒ±β€ Related projects

- [Mflux-ComfyUI](https://github.com/raysers/Mflux-ComfyUI) by [@raysers](https://github.com/raysers)
- [MFLUX-WEBUI](https://github.com/CharafChnioune/MFLUX-WEBUI) by [@CharafChnioune](https://github.com/CharafChnioune)
- [mflux-fasthtml](https://github.com/anthonywu/mflux-fasthtml) by [@anthonywu](https://github.com/anthonywu)
- [mflux-streamlit](https://github.com/elitexp/mflux-streamlit) by [@elitexp](https://github.com/elitexp)

---

### πŸ™ Acknowledgements

MFLUX would not be possible without the great work of:

- The MLX Team for [MLX](https://github.com/ml-explore/mlx) and [MLX examples](https://github.com/ml-explore/mlx-examples)
- Black Forest Labs for the [FLUX project](https://github.com/black-forest-labs/flux)
- Hugging Face for the [Diffusers library implementation of Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev) 
- Depth Pro authors for the [Depth Pro model](https://github.com/apple/ml-depth-pro?tab=readme-ov-file#citation)
- The MLX community and all [contributors and testers](https://github.com/filipstrand/mflux/graphs/contributors)

---

### βš–οΈ License

This project is licensed under the [MIT License](LICENSE).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "mflux",
    "maintainer": "Filip Strand",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Filip Strand <strand.filip@gmail.com>",
    "keywords": "diffusers, flux, mlx",
    "author": "Filip Strand",
    "author_email": "Filip Strand <strand.filip@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/00/89/c06c275a0c6258e6a4ba5fd6951e3847940a20c0795e7b84bdf940b6d8c8/mflux-0.9.6.tar.gz",
    "platform": null,
    "description": "![image](src/mflux/assets/logo.png)\n*A MLX port of FLUX based on the Huggingface Diffusers implementation.*\n\n\n### About\n\nRun the powerful [FLUX](https://blackforestlabs.ai/#get-flux) models from [Black Forest Labs](https://blackforestlabs.ai) locally on your Mac!\n\n### Table of contents\n\n<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->\n\n- [Philosophy](#philosophy)\n- [\ud83d\udcbf Installation](#-installation)\n- [\ud83d\uddbc\ufe0f Generating an image](#%EF%B8%8F-generating-an-image)\n  * [\ud83d\udcdc Full list of Command-Line Arguments](#-full-list-of-command-line-arguments)\n- [\u23f1\ufe0f Image generation speed (updated)](#%EF%B8%8F-image-generation-speed-updated)\n- [\u2194\ufe0f Equivalent to Diffusers implementation](#%EF%B8%8F-equivalent-to-diffusers-implementation)\n- [\ud83d\udddc\ufe0f Quantization](#%EF%B8%8F-quantization)\n- [\ud83d\udcbd Running a non-quantized model directly from disk](#-running-a-non-quantized-model-directly-from-disk)\n- [\ud83c\udf10 Third-Party HuggingFace Model Support](#-third-party-huggingface-model-support)\n- [\ud83c\udfa8 Image-to-Image](#-image-to-image)\n- [\ud83d\udd0c LoRA](#-lora)\n- [\ud83c\udfad In-Context Generation](#-in-context-generation)\n  * [\ud83d\udcf8 Kontext](#-kontext)\n  * [\ud83c\udfa8 In-Context LoRA](#-in-context-lora)\n  * [\ud83d\udc55 CatVTON (Virtual Try-On)](#-catvton-virtual-try-on)\n  * [\u270f\ufe0f IC-Edit (In-Context Editing)](#%EF%B8%8F-ic-edit-in-context-editing)\n- [\ud83d\udee0\ufe0f Flux Tools](#%EF%B8%8F-flux-tools)\n  * [\ud83d\udd8c\ufe0f Fill](#%EF%B8%8F-fill)\n  * [\ud83d\udd0d Depth](#-depth)\n  * [\ud83d\udd04 Redux](#-redux)\n- [\ud83d\udd79\ufe0f Controlnet](#%EF%B8%8F-controlnet)\n- [\ud83d\udd0e Upscale](#-upscale)\n- [\ud83c\udf9b\ufe0f Dreambooth fine-tuning](#%EF%B8%8F-dreambooth-fine-tuning)\n- [\ud83e\udde0 Concept Attention](#-concept-attention)\n- [\ud83d\udea7 Current limitations](#-current-limitations)\n- [\ud83d\udca1Workflow tips](#workflow-tips)\n- [\ud83d\udd2c Cool research](#-cool-research)\n- [\ud83c\udf31\u200d Related projects](#-related-projects)\n- [\ud83d\ude4f Acknowledgements](#-acknowledgements)\n- [\u2696\ufe0f License](#%EF%B8%8F-license)\n\n<!-- TOC end -->\n\n---\n\n### Philosophy\n\nMFLUX is a line-by-line port of the FLUX implementation in the [Huggingface Diffusers](https://github.com/huggingface/diffusers) library to [Apple MLX](https://github.com/ml-explore/mlx).\nMFLUX is purposefully kept minimal and explicit - Network architectures are hardcoded and no config files are used\nexcept for the tokenizers. The aim is to have a tiny codebase with the single purpose of expressing these models\n(thereby avoiding too many abstractions). While MFLUX priorities readability over generality and performance, [it can still be quite fast](#%EF%B8%8F-image-generation-speed-updated), [and even faster quantized](#%EF%B8%8F-quantization).\n\nAll models are implemented from scratch in MLX and only the tokenizers are used via the\n[Huggingface Transformers](https://github.com/huggingface/transformers) library. 
Other than that, there are only minimal dependencies\nlike [Numpy](https://numpy.org) and [Pillow](https://pypi.org/project/pillow/) for simple image post-processing.\n\n---\n\n### \ud83d\udcbf Installation\nFor users, the easiest way to install MFLUX is to use `uv tool`: If you have [installed `uv`](https://github.com/astral-sh/uv?tab=readme-ov-file#installation), simply:\n\n```sh\nuv tool install --upgrade mflux\n```\n\nto get the `mflux-generate` and related command line executables. You can skip to the usage guides below.\n\n<details>\n<summary>For Python 3.13 dev preview</summary>\n\nThe [T5 encoder](https://huggingface.co/docs/transformers/en/model_doc/t5) is dependent on [sentencepiece](https://pypi.org/project/sentencepiece/), which does not have a installable wheel artifact for Python 3.13 as of Nov 2024. Until Google [publishes a 3.13 wheel](https://pypi.org/project/sentencepiece/), you need to build your own wheel with [official build instructions](https://github.com/google/sentencepiece/blob/master/python/README.md#build-and-install-sentencepiece) or for your convenience use a `.whl` pre-built by contributor @anthonywu. The steps below should work for most developers though your system may vary.\n\n```sh\nuv venv --python 3.13\npython -V  # e.g. Python 3.13.0rc2\nsource .venv/bin/activate\n\n# for your convenience, you can use the contributor wheel\nuv pip install https://github.com/anthonywu/sentencepiece/releases/download/0.2.1-py13dev/sentencepiece-0.2.1-cp313-cp313-macosx_11_0_arm64.whl\n\n# enable the pytorch nightly \nuv pip install --pre --extra-index-url https://download.pytorch.org/whl/nightly -e .\n```\n</details>\n\n<details>\n<summary>For the classic way to create a user virtual environment:</summary>\n\n```\nmkdir -p mflux && cd mflux && python3 -m venv .venv && source .venv/bin/activate\n```\n\nThis creates and activates a virtual environment in the `mflux` folder. After that, install MFLUX via pip:\n\n```\npip install -U mflux\n```\n\n</details>\n\n<details>\n<summary>For contributors (click to expand)</summary>\n\n1. Clone the repo:\n```sh\n git clone git@github.com:filipstrand/mflux.git\n ```\n2. Install the application\n\n```sh\n make install\n ```\n3. To run the test suite\n```sh\n make test\n ```\n4. Follow format and lint checks prior to submitting Pull Requests. The recommended `make lint` and `make format` installs and uses [`ruff`](https://github.com/astral-sh/ruff). You can setup your editor/IDE to lint/format automatically, or use our provided `make` helpers:\n  - `make format` - formats your code\n  - `make lint` - shows your lint errors and warnings, but does not auto fix\n  - `make check` - via `pre-commit` hooks, formats your code **and** attempts to auto fix lint errors\n  - consult official [`ruff` documentation](https://docs.astral.sh/ruff/) on advanced usages\n\n</details>\n\n*If you have trouble installing MFLUX, please see the [installation related issues section](https://github.com/filipstrand/mflux/issues?q=is%3Aissue+install+).* \n\n<details>\n<summary>\u2328\ufe0f <strong>Shell Completions (Optional)</strong></summary>\n\nMFLUX supports ZSH (default on macOS) shell completions for all CLI commands. 
This provides tab completion for all [command-line arguments](#-full-list-of-command-line-arguments) and options.\n\nTo enable completions:\n\n```sh\n# Install completions\nmflux-completions\n\n# Reload your shell\nexec zsh\n\n# Now try tab completion!\nmflux-generate --<TAB>\n```\n\nIf you encounter issues (ZSH completion setup can vary by system), check your installation:\n\n```sh\nmflux-completions --check\n```\n\nFor more details and troubleshooting, see the [completions documentation](src/mflux/completions/README.md).\n\n</details>\n\n---\n\n### \ud83d\uddbc\ufe0f Generating an image\n\nRun the command `mflux-generate` by specifying a prompt and the model and some optional arguments. For example, here we use a quantized version of the `schnell` model for 2 steps:\n\n```sh\nmflux-generate --model schnell --prompt \"Luxury food photograph\" --steps 2 --seed 2 -q 8\n```\n\nThis example uses the more powerful `dev` model with 25 time steps:\n\n```sh\nmflux-generate --model dev --prompt \"Luxury food photograph\" --steps 25 --seed 2 -q 8\n```\n\nYou can also pipe prompts from other commands using stdin:\n\n```sh\necho \"A majestic mountain landscape\" | mflux-generate --prompt - --model schnell --steps 2\n```\n\nThis is useful for integrating MFLUX into shell scripts or dynamically generating prompts using LLM inference tools such as [`llm`](https://llm.datasette.io/en/stable/), [`mlx-lm`](https://github.com/ml-explore/mlx-lm), [`ollama`](https://ollama.ai/), etc.\n\nAlternatively, you can use MFLUX directly in Python:\n\n```python\nfrom mflux.flux.flux import Flux1\nfrom mflux.config.config import Config\n\n# Load the model\nflux = Flux1.from_name(\n   model_name=\"schnell\",  # \"schnell\" or \"dev\"\n   quantize=8,            # 4 or 8\n)\n\n# Generate an image\nimage = flux.generate_image(\n   seed=2,\n   prompt=\"Luxury food photograph\",\n   config=Config(\n      num_inference_steps=2,  # \"schnell\" works well with 2-4 steps, \"dev\" works well with 20-25 steps\n      height=1024,\n      width=1024,\n   )\n)\n\nimage.save(path=\"image.png\")\n```\n\nFor more advanced Python usage and additional configuration options, you can explore the entry point files in the source code, such as [`generate.py`](src/mflux/generate.py), [`generate_controlnet.py`](src/mflux/generate_controlnet.py), [`generate_fill.py`](src/mflux/generate_fill.py), and others in the [`src/mflux/`](src/mflux/) directory. These files demonstrate how to use the Python API for various features and provide examples of advanced configurations.\n\n\u26a0\ufe0f *If the specific model is not already downloaded on your machine, it will start the download process and fetch the model weights (~34GB in size for the Schnell or Dev model respectively). See the [quantization](#%EF%B8%8F-quantization) section for running compressed versions of the model.* \u26a0\ufe0f\n\n*By default, mflux caches files in `~/Library/Caches/mflux/`. The Hugging Face model files themselves are cached separately in the Hugging Face cache directory (e.g., `~/.cache/huggingface/`).*\n\n*To change the mflux cache location, set the `MFLUX_CACHE_DIR` environment variable. To change the Hugging Face cache location, you can modify the `HF_HOME` environment variable. For more details on Hugging Face cache settings, please refer to the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables)*.\n\n\ud83d\udd12 [FLUX.1-dev currently requires granted access to its Huggingface repo. 
For troubleshooting, see the issue tracker](https://github.com/filipstrand/mflux/issues/14) \ud83d\udd12\n\n#### \ud83d\udcdc Full list of Command-Line Arguments\n\n<details>\n<summary>\ud83d\udcdc <strong>Command-Line Arguments Reference</strong></summary>\n\n#### General Command-Line Arguments\n\n<details>\n<summary>Click to expand General arguments</summary>\n\n- **`--prompt`** (required, `str`): Text description of the image to generate. Use `-` to read the prompt from stdin (e.g., `echo \"A beautiful sunset\" | mflux-generate --prompt -`).\n\n- **`--model`** or **`-m`** (required, `str`): Model to use for generation. Can be one of the official models (`\"schnell\"` or `\"dev\"`) or a HuggingFace repository ID for a compatible third-party model (e.g., `\"Freepik/flux.1-lite-8B-alpha\"`).\n\n- **`--base-model`** (optional, `str`, default: `None`): Specifies which base architecture a third-party model is derived from (`\"schnell\"` or `\"dev\"`). Required when using third-party models from HuggingFace.\n\n- **`--output`** (optional, `str`, default: `\"image.png\"`): Output image filename. If `--seed` or `--auto-seeds` establishes multiple seed values, the output filename will automatically be modified to include the seed value (e.g., `image_seed_42.png`).\n\n- **`--seed`** (optional, repeatable `int` args, default: `None`): 1 or more seeds for random number generation. e.g. `--seed 42` or `--seed 123 456 789`. When multiple seeds are provided, MFLUX will generate one image per seed, using the same prompt and settings. Default is a single time-based value.\n\n- **`--auto-seeds`** (optional, `int`, default: `None`): Auto generate N random Seeds in a series of image generations. For example, `--auto-seeds 5` will generate 5 different images with 5 different random seeds. This is superseded by explicit `--seed` arguments and `seed` values in `--config-from-metadata` files.\n\n- **`--height`** (optional, `int`, default: `1024`): Height of the output image in pixels.\n\n- **`--width`** (optional, `int`, default: `1024`): Width of the output image in pixels.\n\n- **`--steps`** (optional, `int`, default: `4`): Number of inference steps.\n\n- **`--guidance`** (optional, `float`, default: `3.5`): Guidance scale (only used for `\"dev\"` model).\n\n- **`--path`** (optional, `str`, default: `None`): Path to a local model on disk.\n\n- **`--quantize`** or **`-q`** (optional, `int`, default: `None`): [Quantization](#%EF%B8%8F-quantization) (choose between `3`, `4`, `6`, or `8` bits).\n\n- **`--lora-paths`** (optional, `[str]`, default: `None`): The paths to the [LoRA](#-LoRA) weights.\n\n- **`--lora-scales`** (optional, `[float]`, default: `None`): The scale for each respective [LoRA](#-LoRA) (will default to `1.0` if not specified and only one LoRA weight is loaded.)\n\n- **`--metadata`** (optional): Exports a `.json` file containing the metadata for the image with the same name. (Even without this flag, the image metadata is saved and can be viewed using `exiftool image.png`)\n\n- **`--image-path`** (optional, `str`, default: `None`): Local path to the initial image for image-to-image generation.\n\n- **`--image-strength`** (optional, `float`, default: `0.4`): Controls how strongly the initial image influences the output image. A value of `0.0` means no influence. 
(Default is `0.4`)\n\n- **`--config-from-metadata`** or **`-C`** (optional, `str`): [EXPERIMENTAL] Path to a prior file saved via `--metadata`, or a compatible handcrafted config file adhering to the expected args schema.\n\n- **`--low-ram`** (optional): Reduces GPU memory usage by limiting the MLX cache size and releasing text encoders and transformer components after use (single image generation only). While this may slightly decrease performance, it helps prevent system memory swapping to disk, allowing image generation on systems with limited RAM.\n\n- **`--battery-percentage-stop-limit`** or **`-B`** (optional, `int`, default: `5`): On Mac laptops powered by battery, automatically stops image generation when battery percentage reaches this threshold. Prevents your Mac from shutting down and becoming unresponsive during long generation sessions.\n\n- **`--lora-name`** (optional, `str`, default: `None`): The name of the LoRA to download from Hugging Face.\n\n- **`--lora-repo-id`** (optional, `str`, default: `\"ali-vilab/In-Context-LoRA\"`): The Hugging Face repository ID for LoRAs.\n\n- **`--stepwise-image-output-dir`** (optional, `str`, default: `None`): [EXPERIMENTAL] Output directory to write step-wise images and their final composite image to. This feature may change in future versions. When specified, MFLUX will save an image for each denoising step, allowing you to visualize the generation process from noise to final image.\n\n- **`--vae-tiling`** (optional, flag): Enable VAE tiling to reduce memory usage during the decoding phase. This splits the image into smaller chunks for processing, which can prevent out-of-memory errors when generating high-resolution images. Note that this optimization may occasionally produce a subtle seam in the middle of the image, but it's often worth the tradeoff for being able to generate images that would otherwise cause your system to run out of memory.\n\n- **`--vae-tiling-split`** (optional, `str`, default: `\"horizontal\"`): When VAE tiling is enabled, this parameter controls the direction to split the latents. Options are `\"horizontal\"` (splits into top/bottom) or `\"vertical\"` (splits into left/right). Use this option to control where potential seams might appear in the final image.\n\n</details>\n\n#### Kontext Command-Line Arguments\n\n<details>\n<summary>Click to expand Kontext arguments</summary>\n\nThe `mflux-generate-kontext` command supports most of the same arguments as `mflux-generate`, with these specific parameters:\n\n- **`--image-path`** (required, `str`): Path to the reference image that will be used as the base for editing. This image provides the visual context for the text-guided modifications.\n\n- **`--guidance`** (optional, `float`, default: `2.5`): Guidance scale for the generation. 
#### Kontext Command-Line Arguments

<details>
<summary>Click to expand Kontext arguments</summary>

The `mflux-generate-kontext` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--image-path`** (required, `str`): Path to the reference image that will be used as the base for editing. This image provides the visual context for the text-guided modifications.

- **`--guidance`** (optional, `float`, default: `2.5`): Guidance scale for the generation. Kontext typically works well with values between 2.0 and 4.0, with lower values allowing more creative freedom and higher values adhering more closely to the prompt.

**Note**: The Kontext tool automatically uses the appropriate model (`dev-kontext`), so you don't need to specify `--model`.

See the [Kontext](#-kontext) section for more details on this feature and comprehensive prompting strategies.

</details>

#### In-Context LoRA Command-Line Arguments

<details>
<summary>Click to expand In-Context LoRA arguments</summary>

The `mflux-generate-in-context` command supports most of the same arguments as `mflux-generate`, with these additional parameters:

- **`--image-path`** (required, `str`): Path to the reference image that will guide the style of the generated image.

- **`--lora-style`** (optional, `str`, default: `None`): The style to use for In-Context LoRA generation. Choose from: `couple`, `storyboard`, `font`, `home`, `illustration`, `portrait`, `ppt`, `sandstorm`, `sparklers`, or `identity`.

See the [In-Context Generation](#-in-context-generation) section for more details on how to use this feature effectively.

</details>

#### CatVTON Command-Line Arguments

<details>
<summary>Click to expand CatVTON arguments</summary>

The `mflux-generate-in-context-catvton` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--person-image`** (required, `str`): Path to the image of the person who will "wear" the garment.

- **`--person-mask`** (required, `str`): Path to a binary mask image indicating where the garment should be applied on the person.

- **`--garment-image`** (required, `str`): Path to the garment image to be virtually worn.

- **`--save-full-image`** (optional, flag): Additionally save the full side-by-side image containing both the garment reference and the result.

See the [CatVTON (Virtual Try-On)](#-catvton-virtual-try-on) section for more details on this feature.

</details>

#### IC-Edit Command-Line Arguments

<details>
<summary>Click to expand IC-Edit arguments</summary>

The `mflux-generate-in-context-edit` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--reference-image`** (required, `str`): Path to the reference image to be edited.

- **`--instruction`** (optional, `str`): Simple text instruction describing the desired edit (e.g., "make the hair black"). This will be automatically formatted as a diptych template. Either `--instruction` or `--prompt` is required.

- **`--save-full-image`** (optional, flag): Additionally save the full side-by-side image showing the original and edited versions.

**Note**: The IC-Edit tool automatically downloads and applies the required LoRA, and optimizes the resolution to 512px width for best results.

See the [IC-Edit (In-Context Editing)](#%EF%B8%8F-ic-edit-in-context-editing) section for more details on this feature.

</details>

#### Redux Tool Command-Line Arguments

<details>
<summary>Click to expand Redux arguments</summary>

The `mflux-generate-redux` command uses most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--redux-image-paths`** (required, `[str]`): Paths to one or more reference images to influence the generation.

- **`--redux-image-strengths`** (optional, `[float]`, default: `[1.0, 1.0, ...]`): Strength values (between 0.0 and 1.0) for each reference image. Higher values give more influence to that reference image. If not provided, all images use a strength of 1.0.

See the [Redux](#-redux) section for more details on this feature.

</details>
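For orientation, a minimal sketch of how these Redux arguments might be combined; the image paths and values are placeholders, and the feature itself is described in the Redux section further down:

```sh
# Blend the influence of two reference images, weighting the first more heavily
mflux-generate-redux \
  --prompt "a cozy cabin interior, warm evening light" \
  --redux-image-paths ref1.png ref2.png \
  --redux-image-strengths 1.0 0.5 \
  --steps 20 \
  --seed 42 \
  -q 8
```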
#### Concept Attention Command-Line Arguments

<details>
<summary>Click to expand Concept Attention arguments</summary>

The `mflux-concept` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--concept`** (required, `str`): The concept prompt to use for attention visualization. This defines what specific aspect or element you want to analyze within your generated image.

- **`--heatmap-layer-indices`** (optional, `[int]`, default: `[15, 16, 17, 18]`): Transformer layer indices to use for heatmap generation. These layers capture different levels of abstraction in the attention mechanism.

- **`--heatmap-timesteps`** (optional, `[int]`, default: `None`): Timesteps to use for heatmap generation. If not specified, all timesteps are used. Lower timestep values focus on early stages of generation.

The `mflux-concept-from-image` command uses most of the same arguments as `mflux-concept`, with this additional parameter:

- **`--input-image-path`** (required, `str`): Path to a reference image for concept attention analysis. The model will analyze how the concept appears in this reference image and apply similar attention patterns to the generated image.

See the [Concept Attention](#-concept-attention) section for more details on this feature.

</details>
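A minimal sketch of how these arguments fit together, assuming the shared `mflux-generate` arguments apply as stated above; the prompt and concept strings are only placeholders:

```sh
# Generate an image and a heatmap showing where the model attends to the concept "dog"
mflux-concept \
  --prompt "a golden retriever playing in a sunlit park" \
  --concept "dog" \
  --model dev \
  --steps 20 \
  --seed 42 \
  -q 8
```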
#### Fill Tool Command-Line Arguments

<details>
<summary>Click to expand Fill Tool arguments</summary>

The `mflux-generate-fill` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--image-path`** (required, `str`): Path to the original image that will be modified.

- **`--masked-image-path`** (required, `str`): Path to a binary mask image where white areas (255) will be regenerated and black areas (0) will be preserved.

- **`--guidance`** (optional, `float`, default: `30.0`): The Fill tool works best with higher guidance values compared to regular image generation.

See the [Fill](#-fill) section for more details on inpainting and outpainting.

</details>

#### Depth Tool Command-Line Arguments

<details>
<summary>Click to expand Depth Tool arguments</summary>

The `mflux-generate-depth` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--image-path`** (optional, `str`, default: `None`): Path to the reference image from which to extract a depth map. Either this or `--depth-image-path` must be provided.

- **`--depth-image-path`** (optional, `str`, default: `None`): Path to a pre-generated depth map image. Either this or `--image-path` must be provided.

The `mflux-save-depth` command for extracting depth maps without generating images has these arguments:

- **`--image-path`** (required, `str`): Path to the image from which to extract a depth map.

- **`--output`** (optional, `str`, default: uses the input filename with "_depth" suffix): Path where the generated depth map will be saved.

- **`--quantize`** or **`-q`** (optional, `int`, default: `None`): Quantization for the Depth Pro model.

See the [Depth](#-depth) section for more details on this feature.

</details>

#### ControlNet Command-Line Arguments

<details>
<summary>Click to expand ControlNet arguments</summary>

The `mflux-generate-controlnet` command supports most of the same arguments as `mflux-generate`, with these additional parameters:

- **`--controlnet-image-path`** (required, `str`): Path to the local image used by ControlNet to guide output generation.

- **`--controlnet-strength`** (optional, `float`, default: `0.4`): Degree of influence the control image has on the output. Ranges from `0.0` (no influence) to `1.0` (full influence).

- **`--controlnet-save-canny`** (optional, bool, default: `False`): If set, saves the Canny edge detection reference image used by ControlNet.

See the [Controlnet](#%EF%B8%8F-controlnet) section for more details on how to use this feature effectively.

</details>

#### Upscale Command-Line Arguments

<details>
<summary>Click to expand Upscale arguments</summary>

The `mflux-upscale` command supports most of the same arguments as `mflux-generate`, with these specific parameters:

- **`--height`** (optional, `int` or scale factor, default: `auto`): Image height. Can be specified as pixels (e.g., `1024`), scale factors (e.g., `2x`, `1.5x`), or `auto` to use the source image height.

- **`--width`** (optional, `int` or scale factor, default: `auto`): Image width. Can be specified as pixels (e.g., `1024`), scale factors (e.g., `2x`, `1.5x`), or `auto` to use the source image width.

- **`--controlnet-image-path`** (required, `str`): Path to the source image to upscale.

- **`--controlnet-strength`** (optional, `float`, default: `0.4`): Degree of influence the control image has on the output. Ranges from `0.0` (no influence) to `1.0` (full influence).

**Scale Factor Examples:**
```bash
# Scale by 2x in both dimensions
mflux-upscale --height 2x --width 2x --controlnet-image-path source.png

# Scale height by 1.5x, set width to specific pixels
mflux-upscale --height 1.5x --width 1920 --controlnet-image-path source.png

# Use auto (keeps original dimensions)
mflux-upscale --height auto --width auto --controlnet-image-path source.png

# Mix scale factors and absolute values
mflux-upscale --height 2x --width 1024 --controlnet-image-path source.png
```

See the [Upscale](#-upscale) section for more details on how to use this feature effectively.

</details>

#### Training Arguments

<details>
<summary>Click to expand Training arguments</summary>

- **`--train-config`** (optional, `str`): Local path of the training configuration file. This file defines all aspects of the training process including model parameters, optimizer settings, and training data. See the [Training configuration](#training-configuration) section for details on the structure of this file.

- **`--train-checkpoint`** (optional, `str`): Local path of the checkpoint file which specifies how to continue the training process. Used when resuming an interrupted training run.

</details>
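As a rough, hedged sketch of how these two flags relate, the commands below assume the `mflux-train` entry point and file names used in the Dreambooth fine-tuning section further down; treat the exact invocation there as authoritative:

```sh
# Start a fine-tuning run from a training configuration file (file name is illustrative)
mflux-train --train-config train.json

# Resume an interrupted run from a saved checkpoint (assumed usage; see the Dreambooth section)
mflux-train --train-checkpoint checkpoint.zip
```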
</details>

<details>
<summary>📝 <strong>Dynamic Prompts with `--prompt-file`</strong></summary>

MFLUX supports dynamic prompt updates through the `--prompt-file` option. Instead of providing a fixed prompt with `--prompt`, you can specify a plain text file containing your prompt. The file is re-read before each generation, allowing you to modify prompts between iterations without restarting.

##### Key Benefits

- **Real-time Prompt Editing**: Edit prompts between generations while the program continues running
- **Iterative Refinement**: Fine-tune prompts based on previous results in a batch
- **Editor Integration**: Work with complex prompts in your preferred text editor
- **Consistent Parameters**: Experiment with prompt variations while keeping all other settings constant

##### Example Usage

```bash
# Generate 10 images using a prompt from a file
mflux-generate --prompt-file my_prompt.txt --auto-seeds 10
```

##### Workflow Example

1. Create a prompt file:
   ```
   echo "a surreal landscape with floating islands" > my_prompt.txt
   ```

2. Start a batch generation:
   ```
   mflux-generate --prompt-file my_prompt.txt --auto-seeds 10
   ```

3. After reviewing initial results, edit the prompt file while generation continues:
   ```
   echo "a surreal landscape with floating islands and waterfalls" > my_prompt.txt
   ```

4. Subsequent generations will automatically use your updated prompt

5. Continue refining as needed between generations

##### Notes

- `--prompt` and `--prompt-file` are mutually exclusive options
- Empty prompt files or non-existent files will raise appropriate errors
- Each generated image's metadata will contain the actual prompt used for that specific generation

</details>

<details>
<summary>⚙️ <strong>Parameters supported by config files</strong></summary>

#### How configs are used

- all config properties are optional and applied to the image generation if applicable
- invalid or incompatible properties will be ignored

#### Config schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "seed": {
      "type": ["integer", "null"]
    },
    "steps": {
      "type": ["integer", "null"]
    },
    "guidance": {
      "type": ["number", "null"]
    },
    "quantize": {
      "type": ["null", "string"]
    },
    "lora_paths": {
      "type": ["array", "null"],
      "items": {
        "type": "string"
      }
    },
    "lora_scales": {
      "type": ["array", "null"],
      "items": {
        "type": "number"
      }
    },
    "prompt": {
      "type": ["string", "null"]
    }
  }
}
```

#### Example

```json
{
  "model": "dev",
  "seed": 42,
  "steps": 8,
  "guidance": 3.0,
  "quantize": 4,
  "lora_paths": [
    "/some/path1/to/subject.safetensors",
    "/some/path2/to/style.safetensors"
  ],
  "lora_scales": [
    0.8,
    0.4
  ],
  "prompt": "award winning modern art, MOMA"
}
```
</details>
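To tie the metadata export and the config file support together, here is a hedged sketch of one possible round trip; the file names are placeholders (the exported `.json` is assumed to share the image's basename, as described for `--metadata`), and `--config-from-metadata` is marked experimental above:

```sh
# 1. Generate an image and export its settings alongside it as draft.json
mflux-generate --model dev --steps 20 --seed 42 --metadata --output draft.png \
  --prompt "A watercolor map of an imaginary island"

# 2. Reuse the exported file later to reproduce the same settings
mflux-generate --config-from-metadata draft.json \
  --prompt "A watercolor map of an imaginary island"
```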
---

### ⏱️ Image generation speed (updated)

These numbers are based on the *non-quantized* `schnell` model, with the configuration provided in the code snippet below.
To time your machine, run the following:
```sh
time mflux-generate \
--prompt "Luxury food photograph" \
--model schnell \
--steps 2 \
--seed 2 \
--height 1024 \
--width 1024
```

To find out the spec of your machine (including the number of CPU cores, GPU cores, and memory), run the following command:
```sh
system_profiler SPHardwareDataType SPDisplaysDataType
```

| Device      | M-series            | User | Reported Time | Notes |
|-------------|---------------------|------|---------------|-------|
| Mac Studio  | 2023 M2 Ultra       | [@awni](https://x.com/awnihannun/status/1823515121827897385) | <15s | |
| Macbook Pro | 2024 M4 Max (128GB) | [@ivanfioravanti](https://gist.github.com/awni/a67d16d50f0f492d94a10418e0592bde?permalink_comment_id=5153531#gistcomment-5153531) | ~19s | |
| Macbook Pro | 2023 M3 Max         | [@karpathy](https://gist.github.com/awni/a67d16d50f0f492d94a10418e0592bde?permalink_comment_id=5153531#gistcomment-5153531) | ~20s | |
| -           | 2023 M2 Max (96GB)  | [@explorigin](https://github.com/filipstrand/mflux/issues/6) | ~25s | |
| Mac Mini    | 2024 M4 Pro (64GB)  | [@Stoobs](https://github.com/filipstrand/mflux/issues/92) | ~34s | |
| Mac Mini    | 2023 M2 Pro (32GB)  | [@leekichko](https://github.com/filipstrand/mflux/issues/85) | ~54s | |
| -           | 2022 M1 MAX (64GB)  | [@BosseParra](https://x.com/BosseParra/status/1826191780812877968) | ~55s | |
| Macbook Pro | 2023 M2 Max (32GB)  | @filipstrand | ~70s | |
| -           | 2023 M3 Pro (36GB)  | [@kush-gupt](https://github.com/filipstrand/mflux/issues/11) | ~80s | |
| Mac Mini    | 2024 M4 (16GB)      | [@wnma3mz](https://github.com/filipstrand/mflux/issues/105) | ~97s | 512 x 512, 8-bit quantization |
| Macbook Pro | 2021 M1 Pro (32GB)  | @filipstrand | ~160s | |
| -           | 2021 M1 Pro (16GB)  | [@qw-in](https://github.com/filipstrand/mflux/issues/7) | ~175s | Might freeze your mac |
| Macbook Air | 2020 M1 (8GB)       | [@mbvillaverde](https://github.com/filipstrand/mflux/issues/13) | ~335s | With resolution 512 x 512 |

*Note that these numbers include starting the application from scratch, which means doing model I/O, setting/quantizing weights, etc.
If we assume that the model is already loaded, you can inspect the image metadata using `exiftool image.png` and see the total duration of the denoising loop (excluding text embedding).*

*These benchmarks are not very scientific and are only intended to give ballpark numbers. They were performed at different times with different MFLUX and MLX versions, etc. Additional hardware information, such as the number of GPU cores and the exact Mac device, is not always known.*

---

### ↔️ Equivalent to Diffusers implementation

There is only a single source of randomness when generating an image: the initial latent array.
In this implementation, this initial latent is fully deterministically controlled by the input `seed` parameter.
However, if we import a fixed instance of this latent array saved from the Diffusers implementation, then MFLUX produces an image identical to the Diffusers implementation (assuming a fixed prompt and using the default parameter settings in the Diffusers setup).

The images below illustrate this equivalence.
In all cases the Schnell model was run for 2 time steps.
The Diffusers implementation ran in CPU mode.
The precision for MFLUX can be set in the [Config](src/mflux/config/config.py) class.
There is typically a noticeable but very small difference in the final image when switching between 16-bit and 32-bit precision.

---
```
Luxury food photograph
```
![image](src/mflux/assets/comparison1.jpg)

---
```
detailed cinematic dof render of an old dusty detailed CRT monitor on a wooden desk in a dim room with items around, messy dirty room. On the screen are the letters "FLUX" glowing softly. High detail hard surface render
```
![image](src/mflux/assets/comparison2.jpg)

---

```
photorealistic, lotr, A tiny red dragon curled up asleep inside a nest, (Soft Focus) , (f_stop 2.8) , (focal_length 50mm) macro lens f/2. 8, medieval wizard table, (pastel) colors, (cozy) morning light filtering through a nearby window, (whimsical) steam shapes, captured with a (Canon EOS R5) , highlighting (serene) comfort, medieval, dnd, rpg, 3d, 16K, 8K
```
![image](src/mflux/assets/comparison3.jpg)

---

```
A weathered fisherman in his early 60s stands on the deck of his boat, gazing out at a stormy sea. He has a thick, salt-and-pepper beard, deep-set blue eyes, and skin tanned and creased from years of sun exposure. He's wearing a yellow raincoat and hat, with water droplets clinging to the fabric. Behind him, dark clouds loom ominously, and waves crash against the side of the boat. The overall atmosphere is one of tension and respect for the power of nature.
```
![image](src/mflux/assets/comparison4.jpg)

---

```
Luxury food photograph of an italian Linguine pasta alle vongole dish with lots of clams. It has perfect lighting and a cozy background with big bokeh and shallow depth of field. The mood is a sunset balcony in tuscany. The photo is taken from the side of the plate. The pasta is shiny with sprinkled parmesan cheese and basil leaves on top. The scene is complemented by a warm, inviting light that highlights the textures and colors of the ingredients, giving it an appetizing and elegant look.
```
![image](src/mflux/assets/comparison5.jpg)

---

### 🗜️ Quantization

MFLUX supports running FLUX in 3, 4, 6, or 8-bit quantized mode. Running a quantized version can greatly speed up the generation process and reduce memory consumption by several gigabytes. [Quantized models also take up less disk space](#-size-comparisons-for-quantized-models).

```sh
mflux-generate \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --quantize 8 \
    --height 1920 \
    --width 1024 \
    --prompt "Tranquil pond in a bamboo forest at dawn, the sun is barely starting to peak over the horizon, panda practices Tai Chi near the edge of the pond, atmospheric perspective through the mist of morning dew, sunbeams, its movements are graceful and fluid — creating a sense of harmony and balance, the pond's calm waters reflecting the scene, inviting a sense of meditation and connection with nature, style of Howard Terpning and Jessica Rossier"
```
![image](src/mflux/assets/comparison6.jpg)

*In this example, weights are quantized at **runtime** - this is convenient if you don't want to [save a quantized copy of the weights to disk](#-saving-a-quantized-version-to-disk), but still want to benefit from the potential speedup and RAM reduction quantization might bring.*

By setting the `--quantize` or `-q` flag to `4` or `8`, or removing it entirely, we get all 3 images above. As can be seen, there is very little difference between the images (especially between the 8-bit and the non-quantized result).
Image generation times in this example are based on a 2021 M1 Pro (32GB) machine. Even though the images are almost identical, there is a ~2x speedup from running the 8-bit quantized version on this particular machine. Unlike the non-quantized version, for the 8-bit version the swap memory usage is drastically reduced and GPU utilization is close to 100% during the whole generation. Results here can vary across different machines.

For systems with limited RAM, you can also use the `--low-ram` option, which reduces GPU memory usage by constraining the MLX cache size and releasing text encoders and transformer components after use. This option is particularly helpful for preventing system memory swapping to disk on machines with less available RAM.
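For example, a memory-constrained run might combine runtime quantization with the `--low-ram` and `--vae-tiling` options from the argument reference above (a sketch; adjust the resolution to your machine):

```sh
mflux-generate \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --quantize 8 \
    --low-ram \
    --vae-tiling \
    --height 1536 \
    --width 1024 \
    --prompt "Luxury food photograph"
```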
#### 📊 Size comparisons for quantized models

The model sizes for both `schnell` and `dev` at various quantization levels are as follows:

| 3 bit  | 4 bit  | 6 bit   | 8 bit   | Original (16 bit) |
|--------|--------|---------|---------|-------------------|
| 7.52GB | 9.61GB | 13.81GB | 18.01GB | 33.73GB           |

#### 💾 Saving a quantized version to disk

To save a local copy of the quantized weights, run the `mflux-save` command like so:

```sh
mflux-save \
    --path "/Users/filipstrand/Desktop/schnell_8bit" \
    --model schnell \
    --quantize 8
```

*Note that when saving a quantized version, you will need the original Huggingface weights.*

It is also possible to specify [LoRA](#-lora) adapters when saving the model, e.g.:

```sh
mflux-save \
    --path "/Users/filipstrand/Desktop/schnell_8bit" \
    --model schnell \
    --quantize 8 \
    --lora-paths "/path/to/lora.safetensors" \
    --lora-scales 0.7
```

When generating images with a model like this, no LoRA adapter needs to be specified since
it is already baked into the saved quantized weights.

#### 💽 Loading and running a quantized version from disk

To generate a new image from the quantized model, simply provide a `--path` to where it was saved:

```sh
mflux-generate \
    --path "/Users/filipstrand/Desktop/schnell_8bit" \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --height 1920 \
    --width 1024 \
    --prompt "Tranquil pond in a bamboo forest at dawn, the sun is barely starting to peak over the horizon, panda practices Tai Chi near the edge of the pond, atmospheric perspective through the mist of morning dew, sunbeams, its movements are graceful and fluid — creating a sense of harmony and balance, the pond's calm waters reflecting the scene, inviting a sense of meditation and connection with nature, style of Howard Terpning and Jessica Rossier"
```

*Note: When loading a quantized model from disk, there is no need to pass in the `-q` flag, since we can infer this from the weight metadata.*

*Also note: Once we have a local model (quantized [or not](#-running-a-non-quantized-model-directly-from-disk)) specified via the `--path` argument, the Huggingface cache models are not required to launch the model.
In other words, you can reclaim the 34GB of disk space (per model) by deleting the full 16-bit model from the [Huggingface cache](#%EF%B8%8F-generating-an-image) if you choose.*

⚠️ *Quantized models saved with mflux < v.0.6.0 will not work with v.0.6.0 and later due to an updated implementation. The solution is to [save a new quantized local copy](https://github.com/filipstrand/mflux/issues/149).*

*If you don't want to download the full models and quantize them yourself, the 4-bit weights are available here for a direct download:*
- For mflux < v.0.6.0:
  - [madroid/flux.1-schnell-mflux-4bit](https://huggingface.co/madroid/flux.1-schnell-mflux-4bit)
  - [madroid/flux.1-dev-mflux-4bit](https://huggingface.co/madroid/flux.1-dev-mflux-4bit)
- For mflux >= v.0.6.0:
  - [dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit](https://huggingface.co/dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit)
  - [dhairyashil/FLUX.1-dev-mflux-4bit](https://huggingface.co/dhairyashil/FLUX.1-dev-mflux-4bit)

<details>
<summary>Using the community model support, the quantized weights can also be automatically downloaded</summary>

```sh
mflux-generate \
    --model "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit" \
    --base-model schnell \
    --steps 2 \
    --seed 2 \
    --prompt "Luxury food photograph"
```

</details>

---

### 💽 Running a non-quantized model directly from disk

MFLUX also supports running a non-quantized model directly from a custom location.
In the example below, the model is placed in `/Users/filipstrand/Desktop/schnell`:

```sh
mflux-generate \
    --path "/Users/filipstrand/Desktop/schnell" \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --prompt "Luxury food photograph"
```

Note that the `--model` flag must be set when loading a model from disk.

Also note that, unlike the typical `alias` way of initializing the model (which internally handles downloading the required resources), loading a model directly from disk requires the downloaded model files to be laid out like the following:

<details>
<summary>📁 <strong>Required directory structure</strong></summary>

```
.
├── text_encoder
│   └── model.safetensors
├── text_encoder_2
│   ├── model-00001-of-00002.safetensors
│   └── model-00002-of-00002.safetensors
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── tokenizer_2
│   ├── special_tokens_map.json
│   ├── spiece.model
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── transformer
│   ├── diffusion_pytorch_model-00001-of-00003.safetensors
│   ├── diffusion_pytorch_model-00002-of-00003.safetensors
│   └── diffusion_pytorch_model-00003-of-00003.safetensors
└── vae
    └── diffusion_pytorch_model.safetensors
```

</details>

This mirrors how the resources are placed in the [HuggingFace Repo](https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main) for FLUX.1.
*Huggingface weights, unlike quantized ones exported directly from this project, have to be processed a bit differently, which is why we require the structure above.*
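One way (among others) to obtain a local copy with this exact layout is to download the repository with the `huggingface-cli` tool from the `huggingface_hub` package; the target directory below is just the example path used above:

```sh
# Download the full FLUX.1-schnell repository into a local folder
huggingface-cli download black-forest-labs/FLUX.1-schnell \
  --local-dir "/Users/filipstrand/Desktop/schnell"
```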
---

### 🌐 Third-Party HuggingFace Model Support

MFLUX now supports compatible third-party models from HuggingFace that follow the FLUX architecture. This opens up the ecosystem to community-created models that may offer different capabilities, sizes, or specializations.

To use a third-party model, specify the HuggingFace repository ID with the `--model` parameter and indicate which base architecture (dev or schnell) it's derived from using the `--base-model` parameter:

```sh
mflux-generate \
    --model Freepik/flux.1-lite-8B \
    --base-model schnell \
    --steps 4 \
    --seed 42 \
    --prompt "A beautiful landscape with mountains and a lake"
```

Some examples of compatible third-party models include:
- [Freepik/flux.1-lite-8B-alpha](https://huggingface.co/Freepik/flux.1-lite-8B-alpha) - A lighter version of FLUX
- [shuttleai/shuttle-3-diffusion](https://huggingface.co/shuttleai/shuttle-3-diffusion) - Shuttle's implementation based on FLUX

The model will be automatically downloaded from HuggingFace the first time you use it, similar to the official FLUX models.

*Note: Third-party models may have different performance characteristics, capabilities, or limitations compared to the official FLUX models. Always refer to the model's documentation on HuggingFace for specific usage instructions.*

---

### 🎨 Image-to-Image

One way to condition the image generation is to start from an existing image and let MFLUX produce new variations.
Use the `--image-path` flag to specify the reference image, and the `--image-strength` flag to control how much the reference
image should guide the generation. For example, given the reference image below, the following command produced the first
image using the [Sketching](https://civitai.com/models/803456/sketching?modelVersionId=898364) LoRA:

```sh
mflux-generate \
--prompt "sketching of an Eiffel architecture, masterpiece, best quality. The site is lit by lighting professionals, creating a subtle illumination effect. Ink on paper with very fine touches with colored markers, (shadings:1.1), loose lines, Schematic, Conceptual, Abstract, Gestural. Quick sketches to explore ideas and concepts." \
--image-path "reference.png" \
--image-strength 0.3 \
--lora-paths Architectural_Sketching.safetensors \
--lora-scales 1.0 \
--model dev \
--steps 20 \
--seed 43 \
--guidance 4.0 \
--quantize 8 \
--height 1024 \
--width 1024
```

Like with [Controlnet](#-controlnet), this technique combines well with [LoRA](#-lora) adapters:

![image](src/mflux/assets/img2img.jpg)

The examples above use the [Sketching](https://civitai.com/models/803456/sketching?modelVersionId=898364), [Animation Shot](https://civitai.com/models/883914/animation-shot-flux-xl-ponyrealism) and [flux-film-camera](https://civitai.com/models/874708?modelVersionId=979175) LoRAs.

---

### 🔌 LoRA

MFLUX supports loading trained [LoRA](https://huggingface.co/docs/diffusers/en/training/lora) adapters (actual training support is coming).

The following example uses the [The_Hound](https://huggingface.co/TheLastBen/The_Hound) LoRA from [@TheLastBen](https://github.com/TheLastBen):

```sh
mflux-generate --prompt "sandor clegane" --model dev --steps 20 --seed 43 -q 8 --lora-paths "sandor_clegane_single_layer.safetensors"
```

![image](src/mflux/assets/lora1.jpg)
---

The following example uses the [Flux_1_Dev_LoRA_Paper-Cutout-Style](https://huggingface.co/Norod78/Flux_1_Dev_LoRA_Paper-Cutout-Style) LoRA from [@Norod78](https://huggingface.co/Norod78):

```sh
mflux-generate --prompt "pikachu, Paper Cutout Style" --model schnell --steps 4 --seed 43 -q 8 --lora-paths "Flux_1_Dev_LoRA_Paper-Cutout-Style.safetensors"
```
![image](src/mflux/assets/lora2.jpg)

*Note that LoRA weights are typically trained with a **trigger word or phrase**. For example, in the latter case, the sentence should include the phrase **"Paper Cutout Style"**.*

*Also note that the same LoRA weights can work well with both the `schnell` and `dev` models. Refer to the original LoRA repository to see what mode it was trained for.*

#### Multi-LoRA

Multiple LoRAs can be sent in to combine the effects of the individual adapters. The following example combines both of the above LoRAs:

```sh
mflux-generate \
   --prompt "sandor clegane in a forest, Paper Cutout Style" \
   --model dev \
   --steps 20 \
   --seed 43 \
   --lora-paths sandor_clegane_single_layer.safetensors Flux_1_Dev_LoRA_Paper-Cutout-Style.safetensors \
   --lora-scales 1.0 1.0 \
   -q 8
```
![image](src/mflux/assets/lora3.jpg)

Just to see the difference, this image displays the four cases: from having both adapters fully active, to partially active, to no LoRA at all.
The example above also shows the usage of the `--lora-scales` flag.

#### LoRA Library Path

MFLUX supports a convenient LoRA library feature that allows you to reference LoRA files by their basename instead of full paths. This is particularly useful when you have a collection of LoRA files organized in one or more directories.

To use this feature, set the `LORA_LIBRARY_PATH` environment variable to point to your LoRA directories. You can specify multiple directories separated by colons (`:`):

```sh
export LORA_LIBRARY_PATH="/path/to/loras:/another/path/to/more/loras"
```

Once set, MFLUX will automatically discover all `.safetensors` files in these directories (including subdirectories) and allow you to reference them by their basename:

```sh
# Instead of using full paths:
mflux-generate \
    --prompt "a portrait" \
    --lora-paths "/path/to/loras/style1.safetensors" "/another/path/to/more/loras/style2.safetensors"

# You can simply use basenames:
mflux-generate \
    --prompt "a portrait" \
    --lora-paths "style1" "style2"
```

<details>
<summary>Notes on organizing your LoRA files</summary>

- The basename is the filename without the `.safetensors` extension
- If multiple files have the same basename, the first directory in `LORA_LIBRARY_PATH` takes precedence
  - to work around this, rename your `.safetensors` files or symlink them under another name to avoid conflicts
- Full paths still work as before, making this feature fully backwards compatible
- The library paths are scanned recursively, so LoRAs in subdirectories are also discovered. However, we do not recommend pointing the library paths at a directory with a very large number of files, as this can slow down the scanning process on every run.
</details>
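If you run into the basename-conflict case described in the notes above, a symlink is one lightweight way to expose a duplicate under a new name (paths below are placeholders):

```sh
# Both libraries contain "style1.safetensors"; expose the second one
# under a distinct basename so it can still be referenced.
ln -s "/another/path/to/more/loras/style1.safetensors" \
      "/another/path/to/more/loras/style1_alt.safetensors"

mflux-generate --prompt "a portrait" --lora-paths "style1_alt"
```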
#### Supported LoRA formats (updated)

Since different fine-tuning services can use different implementations of FLUX, the corresponding LoRA weights trained on these services can differ from one another. The aim of MFLUX is to support the most common ones.
The following table shows the currently supported formats:

| Supported | Name      | Example                                                                                                  | Notes                               |
|-----------|-----------|----------------------------------------------------------------------------------------------------------|-------------------------------------|
| ✅        | BFL       | [civitai - Impressionism](https://civitai.com/models/545264/impressionism-sdxl-pony-flux)                | Many things on civitai seem to work |
| ✅        | Diffusers | [Flux_1_Dev_LoRA_Paper-Cutout-Style](https://huggingface.co/Norod78/Flux_1_Dev_LoRA_Paper-Cutout-Style/) |                                     |
| ❌        | XLabs-AI  | [flux-RealismLora](https://huggingface.co/XLabs-AI/flux-RealismLora/tree/main)                           |                                     |

To report additional formats, examples or any other suggestions related to LoRA format support, please see [issue #47](https://github.com/filipstrand/mflux/issues/47).

---

### 🎭 In-Context Generation

In-Context Generation is a powerful collection of techniques that allows you to generate images based on reference images and context, without requiring model fine-tuning. MFLUX supports multiple in-context approaches, each designed for specific use cases ranging from style transfer to virtual try-on and image editing.

#### 📸 Kontext

FLUX.1 Kontext, released in June 2025, is Black Forest Labs' [official model](https://bfl.ai/models/flux-kontext) that goes beyond text-to-image generation. Unlike previous flow models that only allow text-based generation, Kontext understands and creates from existing images, enabling flexible image editing through simple text instructions without complex workflows or finetuning.
Out of the various in-context generation techniques, Kontext is the most advanced and versatile, allowing for a wide range of image editing and generation tasks. It is recommended to try Kontext first before exploring other in-context methods.

![kontext example](src/mflux/assets/kontext_example.jpg)

*Original images credit: [Jon Tyson (sign)](https://unsplash.com/photos/the-sign-urges-viewers-to-get-more-rest-a4Cm6EnYtls), [Daniel Seßler (town)](https://unsplash.com/photos/colorful-buildings-line-a-coastal-towns-harbor-haVcuj4buqE), [Liga Alksne (house)](https://unsplash.com/photos/a-building-with-a-mountain-in-the-background-jZV4jaycoxE) on Unsplash*

Kontext's core capabilities include **character consistency** (preserving unique elements across scenes), **local editing** (targeted modifications without affecting the rest), **style reference** (generating novel scenes while preserving styles), and **interactive speed** for iterative refinement with minimal latency.

##### Example Usage with sequential edits

![kontext sequential edit example](src/mflux/assets/kontext_sequential_edit_example.jpg)

The image above illustrates **sequential editing** with Kontext - making multiple individual edits to progressively refine an image. Each edit builds upon the previous result, allowing for precise, iterative modifications.

Here's an example of one step in a sequential editing workflow, where we add atmospheric mist to the mountains:

```bash
mflux-generate-kontext \
  --image-path "house_without_background_objects.png" \
  --prompt "Make the mountains in the background be covered in white mist so that they are barely visible. Preserve the original camera angle, house placement etc" \
  --steps 20 \
  --seed 9375333 \
  --guidance 2.5 \
  --width 912 \
  --height 1360 \
  -q 8
```

##### Tips for Best Results

For comprehensive prompting strategies, see the [official Black Forest Labs Kontext prompting guide](https://docs.bfl.ai/guides/prompting_guide_kontext_i2i).

**Core Principles:**
- **Be Specific and Clear**: Use precise descriptions, avoid vague terms
- **Step-by-step Editing**: Break complex modifications into multiple simple steps
- **Explicit Preservation**: State what should remain unchanged
- **Verb Selection**: Use "change", "replace" rather than "transform"

**Prompting Techniques:**

*Basic Modifications:*
- Simple: `"Change the car color to red"`
- Maintain style: `"Change to daytime while maintaining the same style of the painting"`

*Character Consistency:*
- Use specific descriptions: `"The woman with short black hair"` instead of `"she"`
- Preserve features: `"while maintaining the same facial features, hairstyle, and expression"`

*Style Transfer:*
- Name the style: `"Transform to Bauhaus art style"`
- Describe characteristics: `"Transform to oil painting with visible brushstrokes, thick paint texture"`
- Preserve composition: `"Change to Bauhaus style while maintaining the original composition"`

*Text Editing:*
- Use quotes: `"Replace 'joy' with 'BFL'"`
- Maintain format: `"Replace text while maintaining the same font style"`

**Common Problems & Solutions:**

❌ **Character changes too much**: `"Transform the person into a Viking"`  
✅ **Better**: `"Change the clothes to be a viking warrior while preserving facial features"`

❌ **Position changes**: `"Put him on a beach"`  
✅ **Better**: `"Change the background to a beach while keeping the person in the exact same position, scale, and pose"`

❌ **Vague style**: `"Make it a sketch"`  
✅ **Better**: `"Convert to pencil sketch with natural graphite lines, cross-hatching, and visible paper texture"`

**Ready-to-Use Templates:**
- **Object Modification**: `"Change [object] to [new state], keep [content to preserve] unchanged"`
- **Style Transfer**: `"Transform to [specific style], while maintaining [composition/character/other] unchanged"`
- **Background Replacement**: `"Change the background to [new background], keep the subject in the exact same position and pose"`
- **Text Editing**: `"Replace '[original text]' with '[new text]', maintain the same font style"`

**Technical Tips:**
1. **High-Quality References**: Use clear, well-lit reference images with distinct characteristics
2. **Guidance Scale**: Experiment with guidance values between 2.5 and 4.0 for an optimal balance
3. **Multiple Seeds**: Try different seeds to explore variations while maintaining consistency
4. **Resolution**: Use standard resolutions (1024x1024, 768x1024, etc.) for best results

⚠️ *Note: Using the Kontext tool requires an additional ~34 GB download from [black-forest-labs/FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev). The download happens automatically on first use.*
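As a concrete starting point, here is a sketch that plugs one of the templates above into a command; the image path and prompt are placeholders:

```sh
mflux-generate-kontext \
  --image-path "car.png" \
  --prompt "Change the car color to red, keep everything else unchanged" \
  --steps 20 \
  --seed 42 \
  --guidance 2.5 \
  -q 8
```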
#### 🎨 In-Context LoRA

In-Context LoRA allows you to generate images in a specific style based on a reference image. This approach uses specialized LoRA weights that enable the model to understand and apply the visual context from your reference image to a new generation.

This feature is based on the [In-Context LoRA for Diffusion Transformers](https://github.com/ali-vilab/In-Context-LoRA) project by Ali-ViLab.

![image](src/mflux/assets/in_context_example.jpg)

To use In-Context LoRA, you need:
1. A reference image
2. A style LoRA (optional - the in-context ability works without LoRAs, but they can significantly enhance the results)

##### Available Styles

MFLUX provides several pre-defined styles from the [Hugging Face ali-vilab/In-Context-LoRA repository](https://huggingface.co/ali-vilab/In-Context-LoRA) that you can use with the `--lora-style` argument:

| Style Name     | Description                               |
|----------------|-------------------------------------------|
| `couple`       | Couple profile photography style          |
| `storyboard`   | Film storyboard sketching style           |
| `font`         | Font design and typography style          |
| `home`         | Home decoration and interior design style |
| `illustration` | Portrait illustration style               |
| `portrait`     | Portrait photography style                |
| `ppt`          | Presentation template style               |
| `sandstorm`    | Sandstorm visual effect                   |
| `sparklers`    | Sparklers visual effect                   |
| `identity`     | Visual identity and branding design style |

##### How It Works

The In-Context LoRA generation creates a side-by-side image where:
- The left side shows your reference image with noise applied
- The right side shows the new generation that follows your prompt while maintaining the visual context

The final output is automatically cropped to show only the right half (the generated image).

![image](src/mflux/assets/in_context_how_it_works.jpg)

##### Prompting for In-Context LoRA

For best results with In-Context LoRA, your prompt should describe both the reference image and the target image you want to generate. Use markers like `[IMAGE1]`, `[LEFT]`, or `[RIGHT]` to distinguish between the two parts.

Here's an example:

```sh
mflux-generate-in-context \
  --model dev \
  --steps 20 \
  --quantize 8 \
  --seed 42 \
  --height 1024 \
  --width 1024 \
  --image-path "reference.png" \
  --lora-style identity \
  --prompt "In this set of two images, a bold modern typeface with the brand name 'DEMA' is introduced and is shown on a company merchandise product photo; [IMAGE1] a simplistic black logo featuring a modern typeface with the brand name 'DEMA' on a bright light green/yellowish background; [IMAGE2] the design is printed on a green/yellowish hoodie as a company merchandise product photo with a plain white background."
```

This prompt clearly describes both the reference image (after `[IMAGE1]`) and the desired output (after `[IMAGE2]`). Other marker pairs you can use include:
- `[LEFT]` and `[RIGHT]`
- `[TOP]` and `[BOTTOM]`
- `[REFERENCE]` and `[OUTPUT]`

**Important**: In the current implementation, the reference image is ALWAYS placed on the left side of the composition, and the generated image on the right side. When using marker pairs in your prompt, the first marker (e.g., `[IMAGE1]`, `[LEFT]`, `[REFERENCE]`) always refers to your reference image, while the second marker (e.g., `[IMAGE2]`, `[RIGHT]`, `[OUTPUT]`) refers to what you want to generate.

##### Tips for Best Results

1. **Choose the right reference image**: Select a reference image with a clear composition and structure that matches your intended output.
2. **Adjust guidance**: Higher guidance values (7.0-9.0) tend to produce results that more closely follow your prompt.
3. **Try different styles**: Each style produces distinctly different results - experiment to find the one that best matches your vision.
4. **Increase steps**: For higher quality results, use 25-30 steps.
5. **Detailed prompting**: Be specific about both the reference image and your desired output in your prompt.
6. **Try without LoRA**: While LoRAs enhance the results, you can experiment without them to see the base in-context capabilities.
#### 👕 CatVTON (Virtual Try-On)

⚠️ **Experimental Feature**: CatVTON is an experimental feature and may be removed or significantly changed in future updates.

CatVTON enables virtual try-on capabilities using in-context learning. This approach allows you to generate images of people wearing different garments by providing a person image, a person mask, and a garment image.

The implementation is based on [@nftblackmagic/catvton-flux](https://github.com/nftblackmagic/catvton-flux) and uses the model weights from [xiaozaa/catvton-flux-beta](https://huggingface.co/xiaozaa/catvton-flux-beta) (approximately 24 GB download).

![catvton example](src/mflux/assets/catvton_example.jpg)

##### How it works

CatVTON uses a sophisticated in-context learning approach that creates a side-by-side composition for virtual try-on:

1. **Image Layout**: The garment image is placed on the left side, while the person image is positioned on the right side of the composition.

2. **Selective Processing**: The model treats each side differently:
   - **Left side (garment)**: The model can only *read* information from this area - no inpainting or modifications occur here. This serves as a reference for the garment's appearance, texture, and style.
   - **Right side (person)**: Only the areas highlighted in the person mask are inpainted. The mask determines exactly which parts of the person image should be modified to "wear" the garment.

3. **Contextual Understanding**: By reading the garment characteristics from the left and selectively applying them to the masked regions on the right, the model creates a realistic virtual try-on that maintains proper lighting, shadows, and fabric behavior.

This approach allows the model to understand the garment's properties while precisely controlling which parts of the person image should be transformed, resulting in natural-looking virtual clothing fitting.

![catvton how it works](src/mflux/assets/catvton_how_it_works.jpg)

##### Example Command

```bash
mflux-generate-in-context-catvton \
  --person-image "person.png" \
  --person-mask "mask.png" \
  --garment-image "garment.png" \
  --prompt "The pair of images highlights a clothing and its styling on a model, high resolution, 4K, 8K; [IMAGE1] Detailed product shot of a light blue shirt with designer details such as white and pink patterns; [IMAGE2] The *exact* same cloth (the light blue shirt with designer details such as white and pink patterns) is worn by a model in a lifestyle setting." \
  --steps 20 \
  --seed 6269363 \
  --guidance 30 \
  --height 1024 \
  --width 891 \
  -q 8
```

##### Required Inputs

- **Person Image**: A photo of the person who will "wear" the garment
- **Person Mask**: A binary mask indicating the areas where the garment should be applied
- **Garment Image**: The clothing item to be virtually worn

##### CatVTON Features

- **Automatic Prompting**: If no prompt is provided, CatVTON uses an optimized default prompt designed for virtual try-on scenarios
- **Side-by-Side Generation**: Creates a diptych showing the garment product shot alongside the styled result
- **Optimized for Clothing**: Specifically tuned for clothing and fashion applications
- **High-Quality Results**: Maintains realistic lighting, shadows, and fabric properties

##### Tips for CatVTON

1. **High-Quality Inputs**: Use high-resolution, well-lit images for best results
2. **Accurate Masks**: Ensure the person mask precisely covers the areas where the garment should appear
3. **Consistent Lighting**: Match lighting conditions between person and garment images when possible
4. **Garment Type**: Works best with clearly defined clothing items (shirts, dresses, jackets, etc.)
5. **Pattern Limitations**: Very intricate patterns, fine details, and complex textures may not transfer well - simpler, solid colors and basic patterns typically work better
6. **Try Multiple Seeds**: Generate several variations with different seeds to find the best result, as outcomes can vary significantly
#### ✏️ IC-Edit (In-Context Editing)

⚠️ **Experimental Feature**: IC-Edit is an experimental feature and may be removed or significantly changed in future updates.

IC-Edit provides intuitive image editing capabilities using natural language instructions. This approach automatically applies a specialized LoRA and generates edited versions of your reference image based on simple text instructions.

The implementation is based on [@River-Zhang/ICEdit](https://github.com/River-Zhang/ICEdit).

![ic-edit example](src/mflux/assets/ic_edit_example.jpg)
*Original images credit: [Jimmy Chang (flower)](https://unsplash.com/photos/closeup-photography-of-pink-lotus-flower-LP5WXkrnxX0), [Kar Ming Moo (tiger)](https://unsplash.com/photos/brown-and-black-tiger-lying-on-ground-Q_3WmguWgYg), [Martin Baron (car)](https://unsplash.com/photos/a-shiny-black-car-parks-on-a-street-8JoIWt2KzYI) on Unsplash*

##### How it works

IC-Edit employs a streamlined diptych approach for natural language image editing:

1. **Image Layout**: The reference image is placed on the left side, while the right side is reserved for the edited result.

2. **Automatic Masking**: Unlike CatVTON, no user-provided mask is required. The system automatically creates a simple but effective mask:
   - **Left side (reference)**: Completely black mask - no denoising occurs here, preserving the original image as reference
   - **Right side (edit area)**: Completely white mask - the entire area is available for denoising and modification

3. **Instruction Processing**: Your natural language instruction is automatically wrapped in a diptych template that tells the model to create the desired changes on the right side while using the left side as reference.

4. **Guided Generation**: The model reads the visual information from the left side and applies the requested modifications to generate the edited result on the right side, maintaining visual coherence and context.

This simplified approach removes the complexity of mask creation while still providing precise control through natural language instructions, making image editing accessible and intuitive.

![ic-edit how it works](src/mflux/assets/ic_edit_how_it_works.jpg)

##### Example Command

```bash
mflux-generate-in-context-edit \
  --reference-image "flower.jpg" \
  --prompt "two images of the exact same flower in two different styles: On the left the photo has is in bright colors showing green leaves and a pink flower. On the right, the *exact* same photo (same flower, same leaves, same background, identical etc). but with where everything is black and white except for the flower which is still in color. The content is *exactly* the same between the left and right image, except only for the coloring (black and white for everything except for the colorful flower)" \
  --steps 20 \
  --seed 8570325 \
  --guidance 30 \
  -q 8
```

##### Key Features

- **Natural Language Instructions**: Use simple, descriptive instructions like "make the hair black", "add sunglasses", or "change the background to a beach"
- **Automatic Prompting**: Instructions are automatically wrapped in a diptych template for optimal results
- **Optimal Resolution**: Automatically resizes to 512px width (the resolution IC-Edit was trained on) while maintaining aspect ratio
- **Specialized LoRA**: Automatically downloads and applies the IC-Edit LoRA for enhanced editing capabilities

##### IC-Edit Options

You can use either `--instruction` for simple edits or `--prompt` for full control:

```sh
# Using instruction (recommended)
mflux-generate-in-context-edit \
  --reference-image "photo.jpg" \
  --instruction "remove the glasses"

# Using custom prompt (advanced)
mflux-generate-in-context-edit \
  --reference-image "photo.jpg" \
  --prompt "A diptych with two side-by-side images of the same scene. On the right, the scene is exactly the same as on the left but with a vintage filter applied"
```

##### Tips for IC-Edit

1. **Clear Instructions**: Use specific, actionable instructions for best results
2. **Single Changes**: Focus on one edit at a time for more predictable results
3. **Reference Quality**: Higher quality reference images generally produce better edits
4. **Iterative Editing**: Use the output of one edit as input for the next to build complex changes
5. **Try Multiple Seeds**: Generate several variations with different seeds to find the best result, as outcomes can vary significantly

⚠️ *Note: Using the IC-Edit tool requires an additional ~230MB download for the specialized LoRA weights. The download happens automatically on first use. This remarkably small LoRA (less than 0.7% of the base model's ~34GB) enables sophisticated image editing capabilities, demonstrating that the base FLUX model is already highly competent at style transfers and transformations - the LoRA simply provides the user control and interface to access these latent capabilities.*

---
##### Tips for IC-Edit

1. **Clear Instructions**: Use specific, actionable instructions for best results
2. **Single Changes**: Focus on one edit at a time for more predictable results
3. **Reference Quality**: Higher quality reference images generally produce better edits
4. **Iterative Editing**: Use the output of one edit as input for the next to build complex changes
5. **Try Multiple Seeds**: Generate several variations with different seeds to find the best result, as outcomes can vary significantly

⚠️ *Note: Using the IC-Edit tool requires an additional ~230MB download for the specialized LoRA weights. The download happens automatically on first use. This remarkably small LoRA (less than 0.7% of the base model's ~34GB) enables sophisticated image editing capabilities, demonstrating that the base FLUX model is already highly competent at style transfers and transformations - the LoRA simply provides the user control and interface to access these latent capabilities.*

---

### 🛠️ Flux Tools

MFLUX supports the official [Flux.1 Tools](https://blackforestlabs.ai/flux-1-tools/).

#### 🖌️ Fill

The Fill tool uses the [FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev) model to allow you to selectively edit parts of an image by providing a binary mask. This is useful for inpainting (replacing specific areas) and outpainting (expanding the canvas).

#### Inpainting

Inpainting allows you to selectively regenerate specific parts of an image while preserving the rest. This is perfect for removing unwanted objects, adding new elements, or changing specific areas of an image without affecting the surrounding content. The Fill tool understands the context of the entire image and creates seamless edits that blend naturally with the preserved areas.

![inpaint example](src/mflux/assets/inpaint_example.jpg)
*Original dog image credit: [Julio Bernal on Unsplash](https://unsplash.com/photos/brown-and-white-long-coated-dog-WLGx0fKFSeI)*

##### Creating Masks

Before using the Fill tool, you need an image and a corresponding mask. You can create a mask using the included tool:

```bash
python -m tools.fill_mask_tool /path/to/your/image.jpg
```

This will open an interactive interface where you can paint over the areas you want to regenerate.
Pressing the `s` key will save the mask at the same location as the image.

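If you prefer to create a mask programmatically, for example when the region to regenerate is a known rectangle, a plain white-on-black mask works with the same convention (white = regenerate, black = preserve). A minimal sketch assuming Pillow; the filenames and the rectangle coordinates are placeholders:

```python
from PIL import Image, ImageDraw

# Sketch: build a Fill mask without the interactive tool.
# White = regenerate, black = preserve (same convention as the mask tool).
source = Image.open("dog.png")
mask = Image.new("L", source.size, color=0)       # all black: keep everything
draw = ImageDraw.Draw(mask)
draw.rectangle((300, 100, 700, 450), fill=255)    # hypothetical region to regenerate
mask.save("dog_mask.png")
```

The mask must have the same dimensions as the source image so that the white region lines up with the area you want replaced.
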
##### Example

To regenerate specific parts of an image:

```bash
mflux-generate-fill \
  --prompt "A professionally shot studio photograph of a dog wearing a red hat and ski goggles. The dog is centered against a uniformly bright yellow background, with well-balanced lighting and sharp details." \
  --steps 20 \
  --seed 42 \
  --height 1280 \
  --width 851 \
  --guidance 30 \
  -q 8 \
  --image-path "dog.png" \
  --masked-image-path "dog_mask.png"
```

#### Outpainting

![outpaint example](src/mflux/assets/outpaint_example.jpg)
*Original room image credit: [Alexey Aladashvili on Unsplash](https://unsplash.com/photos/a-living-room-with-a-couch-and-a-table-JFzglhmwlck)*

Outpainting extends your image beyond its original boundaries, allowing you to expand the canvas in any direction while maintaining visual consistency with the original content. This is useful for creating wider landscapes, revealing more of a scene, or transforming a portrait into a full-body image. The Fill tool intelligently generates new content that seamlessly connects with the existing image.

You can expand the canvas of your image using the provided tool:

```bash
python -m tools.create_outpaint_image_canvas_and_mask \
  /path/to/your/image.jpg \
  --image-outpaint-padding "0,30%,20%,0"
```

As an example, here's how to add 25% padding to both the left and right sides of an image:

```bash
python -m tools.create_outpaint_image_canvas_and_mask \
  room.png \
  --image-outpaint-padding "0,25%,0,25%"
```

The padding format is "top,right,bottom,left" where each value can be in pixels or as a percentage. For example, "0,30%,20%,0" expands the canvas by 30% to the right and 20% to the bottom.

After running this command, you'll get two files: an expanded canvas with your original image and a corresponding mask. You can then run the `mflux-generate-fill` command similar to the inpainting example, using these files as input.

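To illustrate what the canvas tool conceptually produces for the "0,25%,0,25%" example, here is a sketch that pads an image by 25% of its width on the left and right and writes the matching mask. The Pillow usage and output filenames are assumptions, and the bundled tool remains the recommended way to generate these files:

```python
from PIL import Image

# Sketch of the canvas/mask pair for padding "0,25%,0,25%"
# (top, right, bottom, left): pad 25% of the width on each horizontal side.
image = Image.open("room.png")
w, h = image.size
pad = int(w * 0.25)

canvas = Image.new("RGB", (w + 2 * pad, h), color="black")
canvas.paste(image, (pad, 0))                            # original content in the middle

mask = Image.new("L", canvas.size, color=255)            # white = generate new content
mask.paste(Image.new("L", (w, h), color=0), (pad, 0))    # black = preserve the original

canvas.save("room_canvas.png")
mask.save("room_canvas_mask.png")
```
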
##### Example

Once you've created the expanded canvas and mask files, run the Fill tool on them:

```bash
mflux-generate-fill \
  --prompt "A detailed interior room photograph with natural lighting, extended in a way that perfectly complements the original space. The expanded areas continue the architectural style, color scheme, and overall aesthetic of the room seamlessly." \
  --steps 25 \
  --seed 43 \
  --guidance 30 \
  -q 8 \
  --image-path "room.png" \
  --masked-image-path "room_mask.png"
```

##### Tips for Best Results

- **Model**: The model is always `FLUX.1-Fill-dev` and should not be specified.
- **Guidance**: Higher guidance values (around 30) typically yield better results. This is the default if not specified.
- **Resolution**: Higher resolution images generally produce better results with the Fill tool.
- **Masks**: Make sure your mask clearly defines the areas you want to regenerate. White areas in the mask will be regenerated, while black areas will be preserved.
- **Prompting**: For best results, provide detailed prompts that describe both what you want in the newly generated areas and how it should relate to the preserved parts of the image.
- **Steps**: Using 20-30 denoising steps generally produces higher quality results.

⚠️ *Note: Using the Fill tool requires an additional 33.92 GB download from [black-forest-labs/FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev). The download happens automatically on first use.*

#### 🔍 Depth

Using the depth tool, you can generate high-quality images constrained on a depth map from a reference image. These maps represent the estimated distance of each pixel from the camera, with brighter areas representing objects closer to the camera and darker areas representing objects farther away.

For state-of-the-art depth extraction, MFLUX now supports the [Depth Pro](https://github.com/apple/ml-depth-pro) model, natively implemented in MLX based on the reference implementation.

![depth example](src/mflux/assets/depth_example.jpg)
*Original hallway image credit: [Yuliya Matuzava on Unsplash](https://unsplash.com/photos/a-long-hallway-in-a-building-with-arches-and-arches-39NJt9jfGSU)*

##### Example

Using the `mflux-generate-depth` command, we can let MFLUX generate a new image conditioned on the depth map of the input image:

```bash
mflux-generate-depth \
  --prompt "A hallway made of white carrara marble" \
  -q 8 \
  --height 1680 \
  --width 1200 \
  --image-path "original_image.png" \
  --steps 20 \
  --seed 7102725
```

Instead of automatically generating the depth map based on a reference image, you can instead input the depth map yourself using `--depth-image-path`, like so:

```bash
mflux-generate-depth \
  --prompt "A hallway made of white carrara marble" \
  -q 8 \
  --height 1680 \
  --width 1200 \
  --depth-image-path "depth_image.png" \
  --steps 20
```

To generate and export the depth map from an image without triggering the image generation, run the following command:

```bash
mflux-save-depth --image-path "your_image.jpg" -q 8
```

This will create a depth map and save it with the same name as your image but with a "_depth" suffix (e.g., "your_image_depth.png").
Quantization is supported for the Depth Pro model; however, output quality can vary depending on the input image.

⚠️ *Note: The Depth Pro model requires an additional 1.9GB download from Apple. The download happens automatically on first use.*

⚠️ *Note: Using the Depth tool requires an additional 33.92 GB download from [black-forest-labs/FLUX.1-Depth-dev](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev). The download happens automatically on first use.*

#### 🔄 Redux

The Redux tool is an adapter for FLUX.1 models that enables image variation generation similar to the [image-to-image](#-image-to-image) technique.
It can reproduce an input image with slight variations, allowing you to refine and enhance existing images.
Unlike the image-to-image technique, which resumes the denoising process mid-way starting from the input image, the Redux tool instead embeds the image and joins it with the T5 text encodings.

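As a rough mental model of that "join", the reference image becomes a sequence of embedding tokens that is concatenated with the T5 prompt tokens before denoising begins, so the transformer attends to both throughout generation. The sketch below is schematic only: the token counts, the embedding dimension, and the scale-by-strength step are illustrative assumptions, not MFLUX's actual tensors.

```python
import numpy as np

# Schematic sketch of the Redux conditioning path (shapes are illustrative).
t5_tokens = np.random.randn(512, 4096)      # prompt embedding from the T5 encoder
image_tokens = np.random.randn(729, 4096)   # reference image embedded by the Redux adapter

strength = 0.8                              # cf. --redux-image-strengths below
image_tokens = image_tokens * strength      # one simple way to down-weight the image influence

# The joined sequence is what the transformer attends to at every denoising step,
# unlike image-to-image, which instead starts denoising from the reference latents.
conditioning = np.concatenate([t5_tokens, image_tokens], axis=0)
print(conditioning.shape)                   # (1241, 4096)
```
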
![redux example](src/mflux/assets/redux_example.jpg)
*Image credit: [Paris Bilal on Unsplash](https://unsplash.com/photos/a-cat-statue-sitting-on-top-of-a-white-base-Bedd4qHGWCg)*

##### Example

The following examples show how to use the Redux tool to generate variations:

```bash
mflux-generate-redux \
  --prompt "a grey statue of a cat on a white platform in front of a blue background" \
  --redux-image-paths "original.png" \
  --steps 20 \
  --height 1654 \
  --width 1154 \
  -q 8
```

You can also control the influence of each reference image by specifying strength values (between 0.0 and 1.0) using the `--redux-image-strengths` parameter:

```bash
mflux-generate-redux \
  --prompt "a grey statue of a cat on a white platform in front of a blue background" \
  --redux-image-paths "image1.png" "image2.png" \
  --redux-image-strengths 0.8 0.5 \
  --steps 20 \
  --height 1024 \
  --width 768 \
  -q 8
```

A higher strength value gives the reference image more influence over the final result. If you don't specify any strengths, a default value of 1.0 is used for all images.

There is a tendency for the reference image to dominate over the input prompt.

⚠️ *Note: Using the Redux tool requires an additional 1.1GB download from [black-forest-labs/FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev). The download happens automatically on first use.*

---

### 🕹️ Controlnet

MFLUX has [Controlnet](https://huggingface.co/docs/diffusers/en/using-diffusers/controlnet) support for even more fine-grained control
of the image generation. By providing a reference image via `--controlnet-image-path` and a strength parameter via `--controlnet-strength`, you can guide the generation toward the reference image.

```sh
mflux-generate-controlnet \
  --prompt "A comic strip with a joker in a purple suit" \
  --model dev \
  --steps 20 \
  --seed 1727047657 \
  --height 1066 \
  --width 692 \
  -q 8 \
  --lora-paths "Dark Comic - s0_8 g4.safetensors" \
  --controlnet-image-path "reference.png" \
  --controlnet-strength 0.5 \
  --controlnet-save-canny
```
![image](src/mflux/assets/controlnet1.jpg)

*This example combines the controlnet reference image with the LoRA [Dark Comic Flux](https://civitai.com/models/742916/dark-comic-flux)*.

⚠️ *Note: Controlnet requires an additional one-time download of ~3.58GB of weights from Huggingface. This happens automatically the first time you run the `generate-controlnet` command.
At the moment, the Controlnet used is [InstantX/FLUX.1-dev-Controlnet-Canny](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny), which was trained for the `dev` model.
It can work well with `schnell`, but performance is not guaranteed.*

⚠️ *Note: The output can be highly sensitive to the controlnet strength and is very much dependent on the reference image.
Settings that are too high will corrupt the image. A recommended starting point is a value around 0.4; from there, experiment with the strength.*

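Since the supported Controlnet is canny-based, it can be useful to preview the edge map your reference image produces before committing to a full generation run (the `--controlnet-save-canny` flag saves the map MFLUX actually uses). Below is a rough preview sketch assuming `opencv-python` is installed; the thresholds are illustrative and may not match MFLUX's internal preprocessing.

```python
import cv2

# Rough preview of a canny edge map for a controlnet reference image.
# Thresholds are illustrative; treat the image saved by --controlnet-save-canny
# as the ground truth for what the model is conditioned on.
image = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(image, 100, 200)  # (lower threshold, upper threshold)
cv2.imwrite("reference_canny_preview.png", edges)
```

A reference with clean, well-defined edges generally steers the generation more reliably than one whose edge map is noisy.
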
Controlnet can also work well together with [LoRA adapters](#-lora). In the example below, the same reference image is used as a controlnet input
with different prompts and LoRA adapters active.

![image](src/mflux/assets/controlnet2.jpg)

---

### 🔎 Upscale

The upscale tool allows you to increase the resolution of an existing image while maintaining or enhancing its quality and details. It uses a specialized ControlNet model that's trained to intelligently upscale images without introducing artifacts or losing image fidelity.

Under the hood, this is simply the controlnet pipeline with [jasperai/Flux.1-dev-Controlnet-Upscaler](https://huggingface.co/jasperai/Flux.1-dev-Controlnet-Upscaler), with a small modification that we don't process the image with canny edge detection.

![upscale example](src/mflux/assets/upscale_example.jpg)
*Image credit: [Kevin Mueller on Unsplash](https://unsplash.com/photos/gray-owl-on-black-background-xvwZJNaiRNo)*

#### How to use

```sh
mflux-upscale \
  --prompt "A gray owl on black background" \
  --steps 28 \
  --seed 42 \
  --height 1363 \
  --width 908 \
  -q 8 \
  --controlnet-image-path "low_res_image.png" \
  --controlnet-strength 0.6
```

This will upscale your input image to the specified dimensions. The upscaler works best when increasing the resolution by a factor of 2-4x.

⚠️ *Note: Depending on the capability of your machine, you might run out of memory when trying to export the image. Try the `--vae-tiling` flag to process the image in smaller chunks, which significantly reduces memory usage during the VAE decoding phase at a minimal cost to performance. This optimization may occasionally produce a subtle seam in the middle of the image, but it's often worth the tradeoff for being able to generate higher resolution images. You can also use `--vae-tiling-split vertical` to change the split direction from the default horizontal (top/bottom) to vertical (left/right).*

#### Tips for Best Results

- For optimal results, try to maintain the same aspect ratio as the original image
- Prompting matters: Try to accurately describe the image when upscaling
- The recommended `--controlnet-strength` is in the range of 0.5 to 0.7

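A few lines of Python can take the guesswork out of the first tip by reading the input image's size and printing target dimensions that keep the aspect ratio exactly. The sketch assumes Pillow and uses a 2x factor as an example within the recommended 2-4x range:

```python
from PIL import Image

# Suggest --height/--width values for mflux-upscale that preserve aspect ratio.
factor = 2  # example factor; stay within the recommended 2-4x range
with Image.open("low_res_image.png") as img:
    width, height = img.size

print(f"--width {width * factor} --height {height * factor}")
```
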
---

### 🎛️ Dreambooth fine-tuning

As of release [v.0.5.0](https://github.com/filipstrand/mflux/releases/tag/v.0.5.0), MFLUX has support for fine-tuning your own LoRA adapters using the [Dreambooth](https://dreambooth.github.io) technique.

![image](src/mflux/assets/dreambooth.jpg)

*This example shows the MFLUX training progression of the [included training example](#training-example) which is based on the [DreamBooth Dataset](https://github.com/google/dreambooth), also used in the [mlx-examples repo](https://github.com/ml-explore/mlx-examples/tree/main/flux#finetuning).*

#### Training configuration

To describe a training run, you need to provide a [training configuration](src/mflux/dreambooth/_example/train.json) file which specifies details such as
what training data to use and various parameters. To try it out, one of the easiest ways is to start from the
provided [example configuration](src/mflux/dreambooth/_example/train.json) and simply use your own dataset and prompts by modifying the `examples` section of the json file.

#### Training example

A complete example ([training configuration](src/mflux/dreambooth/_example/train.json) + [dataset](src/mflux/dreambooth/_example/images)) is provided in this repository. To start a training run, go to the project folder (`cd mflux`) and simply run:

```sh
mflux-train --train-config src/mflux/dreambooth/_example/train.json
```

By default, this will train an adapter with images of size `512x512` with a batch size of 1 and can take up to several hours to fully complete depending on your machine.
If this task is too computationally demanding, see the section on [memory issues](#memory-issues) for tips on how to speed things up and what tradeoffs exist.

During training, MFLUX will output training checkpoints with artifacts (weights, states) according to what is specified in the configuration file.
As specified in the file `train.json`, these files will be placed in a folder on the Desktop called `~/Desktop/train`, but this can of course be changed to any other path by adjusting the configuration.
All training artifacts will be saved as a self-contained zip file, which you can later point to in order to [resume an existing training run](#resuming-a-training-run).
To find the LoRA weights, simply unzip and look for the `adapter` safetensors file and [use it as you would with a regular downloaded LoRA adapter](#-lora).

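If you are unsure which file inside a checkpoint archive is the adapter, a few lines of Python can list the candidates. The filtering below assumes the adapter file has "adapter" in its name and a `.safetensors` extension, as described above, and the checkpoint filename is just an example:

```python
import zipfile

# List likely LoRA adapter files inside a training checkpoint archive.
# The checkpoint name and the "adapter" naming convention follow the text above.
with zipfile.ZipFile("0001000_checkpoint.zip") as archive:
    candidates = [
        name for name in archive.namelist()
        if "adapter" in name and name.endswith(".safetensors")
    ]

for name in candidates:
    print(name)  # extract this file and pass its path to --lora-paths
```
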
#### Resuming a training run

The training process will continue to run until each training example has been used `num_epochs` times.
For various reasons, however, the user might choose to interrupt the process.
To resume training from a given checkpoint, say `0001000_checkpoint.zip`, simply run:

```sh
mflux-train --train-checkpoint 0001000_checkpoint.zip
```

This uses the `--train-checkpoint` command-line argument to specify the checkpoint file to resume from.

There are two nice properties of the training procedure:

- Fully deterministic (given a specified `seed` in the training configuration)
- The complete training state (including optimizer state) is saved at each checkpoint.

Because of these, MFLUX has the ability to resume a training run from a previous checkpoint and have the results
be *exactly* identical to a training run which was never interrupted in the first place.

*⚠️ Note: Everything but the dataset itself is contained within this zip file, as the dataset can be quite large.
The zip file will contain configuration files which point to the original dataset, so make sure that it is in the same place when resuming training*.

*⚠️ Note: One current limitation is that a training run can only be resumed if it has not yet been completed.
In other words, only checkpoints that represent an interrupted training run can be resumed and run until completion.*

<details>
<summary>⚙️ <strong>Configuration details</strong></summary>

Currently, MFLUX supports fine-tuning only for the transformer part of the model.
In the training configuration, under `lora_layers`, you can specify which layers you want to train. The available ones are:

- `transformer_blocks`:
  - `attn.to_q`
  - `attn.to_k`
  - `attn.to_v`
  - `attn.add_q_proj`
  - `attn.add_k_proj`
  - `attn.add_v_proj`
  - `attn.to_out`
  - `attn.to_add_out`
  - `ff.linear1`
  - `ff.linear2`
  - `ff_context.linear1`
  - `ff_context.linear2`
- `single_transformer_blocks`:
  - `proj_out`
  - `proj_mlp`
  - `attn.to_q`
  - `attn.to_k`
  - `attn.to_v`

The `block_range` under the respective layer category specifies which blocks to train.
The maximum ranges available for the different layer categories are:

- `transformer_blocks`:
  - `start: 0`
  - `end: 19`
- `single_transformer_blocks`:
  - `start: 0`
  - `end: 38`

<details>
<summary>Specify individual layers</summary>

For even more precision, you can specify individual block indices to train like so:

```json
"lora_layers": {
  "single_transformer_blocks": {
    "block_range": {
      "indices": [
        0,
        1,
        7,
        19,
        20
      ],
      ...
  },
...
```
</details>

*⚠️ Note: As the joint transformer blocks (`transformer_blocks`) are placed earlier in the sequence of computations, they will require more resources to train.
In other words, training later layers, such as only the `single_transformer_blocks`, should be faster. However, training too few or only later layers might result in faster but unsuccessful training.*

*Under the `examples` section, there is an argument called `"path"` which specifies where the images are located. This path is relative to the config file itself.*

</details>

<details>
<summary>⚠️ <strong>Memory issues</strong></summary>

Depending on the configuration of the training setup, fine-tuning can be quite memory intensive.
In the worst case, if your Mac runs out of memory it might freeze completely and crash!

To avoid this, consider some of the following strategies to reduce memory requirements by adjusting the parameters in the training configuration:

- Use a quantized base model by setting `"quantize": 4` or `"quantize": 8`
- For the `layer_types`, consider skipping some of the trainable layers (e.g. by not including `proj_out` etc.)
- Use a lower `rank` value for the LoRA matrices.
- Don't train all the `38` layers from `single_transformer_blocks` or all of the `19` layers from `transformer_blocks`
- Use a smaller batch size, for example `"batch_size": 1`
- Make sure your Mac is not busy with other background tasks that hold memory.

Applying some of these strategies, like how [train.json](src/mflux/dreambooth/_example/train.json) is set up by default,
will allow a 32GB M1 Pro to perform a successful fine-tuning run.
Note, however, that reducing the trainable parameters might lead to worse performance.

*Additional techniques such as gradient checkpointing and other strategies might be implemented in the future.*

</details>

<details>
<summary>📝 <strong>Misc</strong></summary>

This feature is currently v1 and can be considered a bit experimental. Interfaces might change (configuration file setup etc.).
The aim is to also gradually expand the scope of this feature with alternative techniques, data augmentation etc.

- As with loading external LoRA adapters, the MFLUX training currently only supports training the transformer part of the network.
- Sometimes, a model trained with the `dev` model might actually work better when applied to the `schnell` weights.
- Currently, all training images are assumed to be in the resolution specified in the configuration file.
- The loss curve can be a bit misleading/hard to read; sometimes it conveys little improvement over time, but actual image samples show the real progress.
- When plotting the loss during training, we label it as "validation loss", but it is actually only the first 10 elements of the training examples for now. Future updates should support user-provided separate validation images.
- Training also works with the original model as quantized!
- [For the curious, a motivation for the loss function can be found here](src/mflux/dreambooth/optimization/_loss_derivation/dreambooth_loss.pdf).
- Two great resources that heavily inspired this feature are:
  - The fine-tuning script in [mlx-examples](https://github.com/ml-explore/mlx-examples/tree/main/flux#finetuning)
  - The original fine-tuning script in [Diffusers](https://huggingface.co/docs/diffusers/v0.11.0/en/training/dreambooth)

</details>

---

### 🧠 Concept Attention

The Concept Attention feature allows you to visualize and understand how FLUX models focus on specific concepts within your prompts during image generation.

![concept_example_1](src/mflux/assets/concept_example_1.jpg)

This implementation is based on the research paper ["ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features"](https://arxiv.org/abs/2502.04320) by [Helbling et al.](https://github.com/helblazer811/ConceptAttention), which demonstrates that multi-modal diffusion transformers like FLUX learn highly interpretable representations that can be leveraged to generate precise concept localization maps.

MFLUX provides two concept attention tools:

1. **Text-based Concept Analysis** (`mflux-concept`): Analyzes attention patterns for a specific concept within your text prompt
2. **Image-guided Concept Analysis** (`mflux-concept-from-image`): Uses a reference image to guide concept attention analysis

#### Text-based Concept Analysis

This approach analyzes how the model attends to a specific concept mentioned in your prompt. The model generates an image while tracking attention patterns, then creates a heatmap showing where the model focused when processing your concept.

##### Example

```bash
mflux-concept \
  --prompt "A dragon on a hill" \
  --concept "dragon" \
  --model schnell \
  --steps 4 \
  --seed 9643208 \
  --height 720 \
  --width 1280 \
  --heatmap-layer-indices 15 16 17 18 \
  --heatmap-timesteps 0 1 2 3 \
  -q 4
```

This will generate the following image:

![concept_example_1](src/mflux/assets/concept_example_2.jpg)

This command will generate:
- The main image based on your prompt
- A concept attention heatmap showing where the model focused on the "dragon" concept
- Both images are automatically saved with appropriate naming

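Because both outputs are saved as ordinary image files, you can composite them to inspect the localization directly on top of the generated image. A minimal sketch assuming Pillow; the filenames are placeholders, so substitute the paths your run actually produced:

```python
from PIL import Image

# Overlay the concept attention heatmap on the generated image.
# The filenames below are placeholders for whatever your run saved.
generated = Image.open("dragon.png").convert("RGB")
heatmap = Image.open("dragon_heatmap.png").convert("RGB").resize(generated.size)

overlay = Image.blend(generated, heatmap, alpha=0.5)  # 0.0 = image only, 1.0 = heatmap only
overlay.save("dragon_concept_overlay.png")
```
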
#### Image-guided Concept Analysis

This approach uses a reference image to guide the concept attention analysis. The model analyzes how a concept appears in the reference image and applies similar attention patterns to generate a new image that maintains those conceptual relationships.

##### Example

```bash
mflux-concept-from-image \
  --model schnell \
  --input-image-path "puffin.png" \
  --prompt "Two puffins are perched on a grassy, flower-covered cliffside, with one appearing to call out while the other looks on silently against a blurred ocean backdrop" \
  --concept "bird" \
  --steps 4 \
  --height 720 \
  --width 1280 \
  --seed 4529717 \
  --heatmap-layer-indices 15 16 17 18 \
  --heatmap-timesteps 0 1 2 3 \
  -q 4
```

This will generate the following image:

![concept_example_1](src/mflux/assets/concept_example_3.jpg)

#### Advanced Configuration

- **`--heatmap-layer-indices`**: Controls which transformer layers to analyze (default: 15-18). Different layers capture different levels of abstraction.

- **`--heatmap-timesteps`**: Specifies which denoising steps to include in the analysis (default: all steps). Lower timestep values focus on early generation stages where broad composition is determined.

---

### 🚧 Current limitations

- Images are generated one by one.
- Negative prompts are not supported.
- LoRA weights are only supported for the transformer part of the network.
- Some LoRA adapters do not work.
- Currently, the supported controlnet is the [canny-only version](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny).
- Dreambooth training currently does not support sending in training parameters as flags.
- In-Context Generation features currently only support a left-right image setup (reference image on left, generated image on right).

---

### Optional Tool: Batch Image Renamer

With a large number of generated images, some users want to automatically rename their image outputs to reflect the prompts and configs.

The bundled `tools/rename_images.py` is an optional tool that is part of the project repo
but *not* included in the `mflux` Python package due to additional dependencies that
do not make sense as standard project requirements.

The script uses [KeyBERT](https://maartengr.github.io/KeyBERT/) (a keyword extraction library)
to extract keywords from mflux exif metadata to update the image file names.
We then use `uv run` to execute the script in an isolated env without affecting your `mflux` env.

Users who want to use or extend this tool for their own needs are encouraged to `git clone` the repo
and run `uv run tools/rename_images.py <paths>`, or download the single-file standalone script
and run `uv run your/path/rename_images.py`.

This script's renaming logic can be customized to your needs.
See `uv run tools/rename_images.py --help` for full CLI usage help.

---

### 💡Workflow Tips

- To hide the model fetching status progress bars, `export HF_HUB_DISABLE_PROGRESS_BARS=1`
- To opt out of [HuggingFace telemetry](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/environment_variables#hfhubdisabletelemetry), `export HF_HUB_DISABLE_TELEMETRY=1`
- Use config files to save complex job parameters in a file instead of passing many `--args`
- Set up shell aliases for required args, for example:
  - shortcut for dev model: `alias mflux-dev='mflux-generate --model dev'`
  - shortcut for schnell model *and* always save metadata: `alias mflux-schnell='mflux-generate --model schnell --metadata'`
- For systems with limited memory, use the `--low-ram` flag to reduce memory usage by constraining the MLX cache size and releasing components after use
- On battery-powered Macs, use `--battery-percentage-stop-limit` (or `-B`) to prevent your laptop from shutting down during long generation sessions
- When generating multiple images with different seeds, use `--seed` with multiple values or `--auto-seeds` to automatically generate a series of random seeds
- Use `--stepwise-image-output-dir` to save intermediate images at each denoising step, which can be useful for debugging or creating animations of the generation process

---

### 🔬 Cool research
- [ ] [PuLID](https://github.com/ToTheBeginning/PuLID)

---

### 🌱 Related projects

- [Mflux-ComfyUI](https://github.com/raysers/Mflux-ComfyUI) by [@raysers](https://github.com/raysers)
- [MFLUX-WEBUI](https://github.com/CharafChnioune/MFLUX-WEBUI) by [@CharafChnioune](https://github.com/CharafChnioune)
- [mflux-fasthtml](https://github.com/anthonywu/mflux-fasthtml) by [@anthonywu](https://github.com/anthonywu)
- [mflux-streamlit](https://github.com/elitexp/mflux-streamlit) by [@elitexp](https://github.com/elitexp)

---

### 🙏 Acknowledgements

MFLUX would not be possible without the great work of:

- The MLX Team for [MLX](https://github.com/ml-explore/mlx) and [MLX examples](https://github.com/ml-explore/mlx-examples)
- Black Forest Labs for the [FLUX project](https://github.com/black-forest-labs/flux)
- Hugging Face for the [Diffusers library implementation of Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev)
- Depth Pro authors for the [Depth Pro model](https://github.com/apple/ml-depth-pro?tab=readme-ov-file#citation)
- The MLX community and all [contributors and testers](https://github.com/filipstrand/mflux/graphs/contributors)

---

### ⚖️ License

This project is licensed under the [MIT License](LICENSE).
    "bugtrack_url": null,
    "license": "MIT License\n         \n         Copyright (c) 2025 Filip Strand\n         \n         Permission is hereby granted, free of charge, to any person obtaining a copy\n         of this software and associated documentation files (the \"Software\"), to deal\n         in the Software without restriction, including without limitation the rights\n         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n         copies of the Software, and to permit persons to whom the Software is\n         furnished to do so, subject to the following conditions:\n         \n         The above copyright notice and this permission notice shall be included in all\n         copies or substantial portions of the Software.\n         \n         THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n         SOFTWARE.",
    "summary": "A MLX port of FLUX based on the Huggingface Diffusers implementation.",
    "version": "0.9.6",
    "project_urls": {
        "homepage": "https://github.com/filipstrand/mflux"
    },
    "split_keywords": [
        "diffusers",
        " flux",
        " mlx"
    ],
    "urls": [
        {
            "comment_text": "Automated release via mflux-release-script",
            "digests": {
                "blake2b_256": "241ab53be8a2e8756c57c54ecae1bb1390fe44a7840a9104c49b7ced19285f18",
                "md5": "3faf499b7d3c21437d244a1c425a3a03",
                "sha256": "f05f64a449d9509fef1919e2378961b9800d6946f86e519d5bd88ff108a1d615"
            },
            "downloads": -1,
            "filename": "mflux-0.9.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3faf499b7d3c21437d244a1c425a3a03",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 226325,
            "upload_time": "2025-07-20T10:05:50",
            "upload_time_iso_8601": "2025-07-20T10:05:50.751331Z",
            "url": "https://files.pythonhosted.org/packages/24/1a/b53be8a2e8756c57c54ecae1bb1390fe44a7840a9104c49b7ced19285f18/mflux-0.9.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "Automated release via mflux-release-script",
            "digests": {
                "blake2b_256": "0089c06c275a0c6258e6a4ba5fd6951e3847940a20c0795e7b84bdf940b6d8c8",
                "md5": "46ad5c6ec0b1834166fdc4e84783e8cb",
                "sha256": "475998777bb1f971d2eef1f61aab044de8a1427c155edad5cd9024bdd938f644"
            },
            "downloads": -1,
            "filename": "mflux-0.9.6.tar.gz",
            "has_sig": false,
            "md5_digest": "46ad5c6ec0b1834166fdc4e84783e8cb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 164252,
            "upload_time": "2025-07-20T10:05:52",
            "upload_time_iso_8601": "2025-07-20T10:05:52.849088Z",
            "url": "https://files.pythonhosted.org/packages/00/89/c06c275a0c6258e6a4ba5fd6951e3847940a20c0795e7b84bdf940b6d8c8/mflux-0.9.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-20 10:05:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "filipstrand",
    "github_project": "mflux",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "mflux"
}
        