
# FLYNC - FLY Non-Coding gene discovery & classification
## TL;DR (Quick Start)
Install (conda recommended):
```bash
conda create -n flync -c bioconda -c conda-forge flync
conda activate flync
```
Setup genome (dm6):
```bash
flync setup --genome-dir genome
```
Generate config template and edit paths:
```bash
flync config --template --output config.yaml
# Set: samples: null and fastq_dir: "/path/to/fastq"
```
Run bioinformatics (auto-detect FASTQs):
```bash
flync run-bio -c config.yaml -j 8
```
Predict lncRNAs (novel transcripts):
```bash
flync run-ml -g results/assemblies/merged-new-transcripts.gtf \
-o results/lncrna_predictions.csv -r genome/genome.fa -t 8
```
All-in-one:
```bash
flync run-all -c config.yaml -j 8
```
Essential outputs:
- results/assemblies/merged-new-transcripts.gtf (novel)
- results/lncrna_predictions.csv (predictions)
- results/dge/... (if metadata CSV with condition provided)
Need help? Run:
```bash
flync run-bio -c config.yaml --dry-run
```
## Minimal Conceptual Overview
1. Input: FASTQs (local) or SRA IDs (via metadata CSV).
2. Snakemake workflow builds transcriptome and isolates novel transcripts.
3. Feature engine converts GTF + genome into a model-ready feature matrix.
4. Pre-trained model classifies lncRNA vs protein-coding; outputs probabilities.
5. Optional differential expression if conditions provided.
## When to Use Which Command
- run-bio: You only need assemblies.
- run-ml: You have a GTF and want predictions.
- run-all: End-to-end (recommended for new users).
- setup: Prepare genome and indices once.
- config: Generate or validate config.yaml.
## Common Pitfalls (Fast Answers)
| Issue | Fix |
|-------|-----|
| samples: null fails | Ensure fastq_dir is set |
| Snakefile not found | pip install -e . |
| Missing genome index | Re-run flync setup |
| All predictions identical | Check feature extraction logs |
| DGE missing | Ensure metadata CSV has header + condition column |
Full detailed documentation continues below.
---
## Table of Contents
- [TL;DR (Quick Start)](#tldr-quick-start)
- [Minimal Conceptual Overview](#minimal-conceptual-overview)
- [When to Use Which Command](#when-to-use-which-command)
- [Common Pitfalls (Fast Answers)](#common-pitfalls-fast-answers)
- [Pipeline Overview](#pipeline-overview)
- [Key Features](#key-features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage Guide](#usage-guide)
- [1. Setup Reference Genome](#1-setup-reference-genome)
- [2. Configure Pipeline](#2-configure-pipeline)
- [Sample Specification (3 Modes)](#sample-specification-3-modes)
- [3. Run Bioinformatics Pipeline](#3-run-bioinformatics-pipeline)
- [4. Run ML Prediction](#4-run-ml-prediction)
- [5. Run Complete Pipeline (Recommended)](#5-run-complete-pipeline-recommended)
- [6. Differential Gene Expression (DGE)](#6-differential-gene-expression-dge)
- [7. Python API Usage](#7-python-api-usage)
- [Pipeline Architecture](#pipeline-architecture)
- [Advanced Usage](#advanced-usage)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)
---
## Pipeline Overview
FLYNC executes a complete lncRNA discovery workflow with three main execution modes:
### Phase 1: Bioinformatics Pipeline (`flync run-bio`)
Run only the RNA-seq processing and assembly:
1. **Read Mapping** - Align RNA-seq reads to reference genome using HISAT2
2. **Transcriptome Assembly** - Reconstruct transcripts per sample with StringTie
3. **Assembly Merging** - Create a unified transcriptome with StringTie merge
4. **Novel Transcript Extraction** - Use gffcompare to identify transcripts absent from the reference annotation
5. **Quantification** - Calculate expression levels per transcript
6. **DGE Analysis** (optional) - Differential expression with Ballgown when metadata.csv provided
### Phase 2: ML Prediction (`flync run-ml`)
Run only the machine learning classification:
1. **Feature Extraction** - Extract multi-modal genomic features
2. **Feature Cleaning** - Standardize and prepare features for ML
3. **ML Classification** - Predict lncRNA candidates using trained EBM model
4. **Confidence Scoring** - Provide prediction probabilities and confidence scores
### Complete Pipeline (`flync run-all`)
Run the entire workflow end-to-end with a single command.
---
## Key Features
- **Complete End-to-End Pipeline** - Single `flync run-all` command for the full workflow
- **Unified Environment** - All dependencies managed in a single `environment.yml`
- **Differential Expression** - Integrated Ballgown DGE analysis for condition comparisons
- **Public Python API** - Use FLYNC programmatically in custom workflows
- **Flexible Input Modes** - Auto-detect samples from a FASTQ directory or use sample lists
- **Snakemake Orchestration** - Robust workflow management with automatic parallelization
- **Comprehensive Features** - 100+ genomic features from multiple data sources
- **Intelligent Caching** - Downloads and caches remote genomic tracks automatically
- **Production-Ready Models** - Pre-trained EBM classifier with high accuracy
- **Multi-Stage Docker** - Runtime and pre-warmed images for flexible deployment
- **Python 3.11** - Modern Python codebase with type hints and comprehensive documentation
---
## Installation
### Overview
Choose an installation method based on what you need:
| Use Case | Recommended Method | Command |
|----------|-------------------|---------|
| Full pipeline (alignment + assembly + ML) | Conda (base) | `conda create -n flync -c bioconda -c conda-forge flync` |
| Add differential expression (Ballgown) | Conda add-on | `conda install -n flync flync-dge` |
| ML / feature extraction only (no aligners) | pip + extras | `pip install flync[features,ml]` |
| Programmatic Snakemake orchestration (no bio tools) | pip minimal + workflow | `pip install flync[workflow]` |
| Reproducible container execution | Docker (runtime) | `docker pull ghcr.io/homemlab/flync:latest` |
| Faster startup with pre-cached tracks | Docker (prewarmed) | `docker pull ghcr.io/homemlab/flync:latest-prewarmed` |
### Option 1: Conda (Recommended – Full Stack)
```bash
conda create -n flync -c bioconda -c conda-forge flync
conda activate flync
flync --help
```
Add DGE support (Ballgown + R stack):
```bash
conda install -n flync flync-dge # after base install
```
Or install both at once:
```bash
conda create -n flync -c bioconda -c conda-forge flync flync-dge
```
### Option 2: pip (Python-Only / Lightweight)
Pip will NOT install external bioinformatics binaries (HISAT2, StringTie, samtools, etc.). Use this only for feature extraction or ML inference on an existing GTF.
```bash
python -m venv flync-venv
source flync-venv/bin/activate
pip install --upgrade pip
# Feature extraction + ML
pip install "flync[features,ml]"
# Add Snakemake lightweight orchestration (still no external binaries)
pip install "flync[workflow]"
flync run-ml --help
```
If you attempt `flync run-bio` without the required external tools, FLYNC will explain what is missing and how to install via conda.
### Option 3: Docker
Runtime image (downloads tracks on demand):
```bash
docker pull ghcr.io/homemlab/flync:latest
docker run --rm -v $PWD:/work ghcr.io/homemlab/flync:latest \
flync --help
```
Prewarmed image (tracks pre-cached):
```bash
docker pull ghcr.io/homemlab/flync:latest-prewarmed
```
### Which Should I Pick?
| Scenario | Choose |
|----------|-------|
| New user, want everything | Conda base (add `flync-dge` if doing DGE) |
| HPC / cluster with module rules | Conda (export env YAML for reproducibility) |
| Notebook exploratory ML only | pip extras (`features,ml`) |
| CI / workflow integration | Docker runtime image |
| Need fastest repeated ML runs | Docker prewarmed image |
### External Tool Summary
The following are ONLY installed automatically via the Conda packages (`flync`, `flync-dge`):
```
hisat2, stringtie, gffcompare, gffread, samtools, bedtools, sra-tools,
R (r-base), bioconductor-ballgown, r-matrixstats, r-ggplot2
```
Pip installations will perform a dependency sanity check and abort `run-bio` if these are missing (unless `--skip-deps-check` is used).
### Development Install (Editable)
```bash
git clone https://github.com/homemlab/flync.git
cd flync
conda env create -f environment.yml
conda activate flync
pip install -e .
```
### Prerequisites
- **Operating System**: Linux (tested on Debian/Ubuntu)
- **Conda/Mamba**: Required for managing dependencies
- **System Requirements**:
- 8+ GB RAM (16+ GB recommended for large datasets)
- 20+ GB disk space (genome, indices, and tracks)
- 4+ CPU cores (8+ recommended)
### Install from Source (Full + Editable)
For development or if you need the latest unreleased features:
```bash
# 1. Clone the repository
git clone https://github.com/homemlab/flync.git
cd flync
git checkout master # Use the master branch (production)
# 2. Create conda environment with dependencies
conda env create -f environment.yml
# 3. Activate environment
conda activate flync
# 4. Install package in development mode
pip install -e .
# 5. Verify installation
flync --help
```
---
## Quick Start
**Complete workflow with `run-all` command:**
```bash
# 1. Activate conda environment
conda activate flync
# 2. Download genome and build indices
flync setup --genome-dir genome
# 3. Create configuration file
flync config --template --output config.yaml
# 4. Edit config.yaml with your paths and settings
# See config_example_full.yaml for all available options
# 5. Create metadata.csv with sample information (MUST have header row!)
cat > metadata.csv << EOF
sample_id,condition,replicate
SRR123456,control,1
SRR123457,control,2
SRR123458,treatment,1
SRR123459,treatment,2
EOF
# 6. Update config.yaml to use metadata.csv
# Change: samples: null
# To: samples: metadata.csv
# 7. Run complete pipeline (bioinformatics + ML + DGE)
flync run-all --configfile config.yaml --cores 8
```
**Alternative: Step-by-step workflow:**
```bash
# Run bioinformatics pipeline only
flync run-bio --configfile config.yaml --cores 8
# Then run ML prediction
flync run-ml \
--gtf results/assemblies/merged-new-transcripts.gtf \
--output results/lncrna_predictions.csv \
--ref-genome genome/genome.fa \
--threads 8
```
**Python API Usage:**
```python
from pathlib import Path

from flync import run_pipeline

# Run complete pipeline programmatically
result = run_pipeline(
    config_path=Path("config.yaml"),
    cores=8,
    ml_threads=8
)
print(f"Status: {result['status']}")
print(f"Predictions: {result['predictions_file']}")
```
**Output:**
- `results/assemblies/merged.gtf` - Full transcriptome (reference + novel)
- `results/assemblies/merged-new-transcripts.gtf` - Novel transcripts only
- `results/cov/` - Per-sample quantification files
- `results/dge/` - Differential expression analysis (if metadata.csv provided)
- `transcript_dge_results.csv` - Transcript-level DE results
- `gene_dge_results.csv` - Gene-level DE results
- `dge_summary.csv` - Summary statistics
- `results/lncrna_predictions.csv` - lncRNA predictions with confidence scores
---
## Usage Guide
### 1. Setup Reference Genome
Download *Drosophila melanogaster* genome (BDGP6.32/dm6) and build HISAT2 index:
```bash
flync setup --genome-dir genome
```
**What this does:**
- Downloads genome FASTA from Ensembl (release 106)
- Downloads gene annotation GTF
- Builds HISAT2 index (~10 minutes, requires ~4GB RAM)
- Extracts splice sites for splice-aware alignment
**Skip download if files exist:**
```bash
flync setup --genome-dir genome --skip-download
```
### 2. Configure Pipeline
Generate a configuration template:
```bash
flync config --template --output config.yaml
```
**Edit `config.yaml`** with your settings:
```yaml
# Sample specification (3 options - see below)
samples: null # Auto-detect from fastq_dir
fastq_dir: "/path/to/fastq/files" # Directory with FASTQ files
fastq_paired: false # true for paired-end, false for single-end
# Reference files (created by 'flync setup')
genome: "genome/genome.fa"
annotation: "genome/genome.gtf"
hisat_index: "genome/genome.idx"
splice_sites: "genome/genome.ss"
# Output and resources
output_dir: "results"
threads: 8
# Tool parameters (optional)
params:
  hisat2: "-p 8 --dta --dta-cufflinks"
  stringtie_assemble: "-p 8"
  stringtie_merge: ""
  stringtie_quantify: "-eB"
  download_threads: 4  # For SRA downloads
```
#### Sample Specification (3 Modes)
**Mode 1: Auto-detect from FASTQ directory (Recommended)**
```yaml
samples: null # Must be null to enable auto-detection
fastq_dir: "/path/to/fastq"
fastq_paired: false
```
Automatically detects samples from filenames:
- **Paired-end**: `sample1_1.fastq.gz` + `sample1_2.fastq.gz` → detects `sample1`
- **Single-end**: `sample1.fastq.gz` → detects `sample1`
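The detection logic can be sketched roughly as follows. This is an illustrative sketch based on the naming convention above, not FLYNC's actual implementation; the `.fastq.gz` suffix and `_1`/`_2` mate markers are assumptions:

```python
import re
from pathlib import Path

def detect_samples(fastq_dir: str, paired: bool) -> list[str]:
    """Infer sample names from FASTQ filenames (illustrative only)."""
    names = set()
    for f in Path(fastq_dir).glob("*.fastq.gz"):
        stem = f.name[: -len(".fastq.gz")]
        if paired:
            # sample1_1.fastq.gz / sample1_2.fastq.gz -> sample1
            m = re.match(r"(.+)_[12]$", stem)
            if m:
                names.add(m.group(1))
        else:
            names.add(stem)
    return sorted(names)
```

Running this over a directory containing `sample1_1.fastq.gz` and `sample1_2.fastq.gz` with `paired=True` yields `["sample1"]`.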
**Mode 2: Plain text list**
```yaml
samples: "samples.txt"
fastq_dir: "/path/to/fastq" # Optional if using SRA
```
`samples.txt`:
```
sample1
sample2
sample3
```
**Mode 3: CSV with metadata (for differential expression)**
```yaml
samples: "metadata.csv"
fastq_dir: "/path/to/fastq" # Optional if using SRA
```
`metadata.csv`:
```csv
sample_id,condition,replicate
sample1,control,1
sample2,control,2
sample3,treated,1
```
**⚠️ Important:** When using CSV metadata, the header row with column names (`sample_id`, `condition`) is **required**. Without headers, the DGE analysis will fail. The `sample_id` column is mandatory for sample identification, and the `condition` column is required to enable differential expression analysis.
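Given these requirements, a quick pre-flight check of the metadata file can be sketched as follows (an illustrative stdlib-only helper, not part of FLYNC's API):

```python
import csv

def check_metadata(path: str) -> list[str]:
    """Return a list of problems found in a metadata CSV (illustrative check)."""
    problems = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        if "sample_id" not in header:
            problems.append("missing 'sample_id' column (is the header row present?)")
        if "condition" not in header:
            problems.append("no 'condition' column: DGE analysis will be skipped")
        if not list(reader):
            problems.append("no sample rows found")
    return problems
```

A headerless CSV fails this check immediately, because `csv.DictReader` treats its first data row as the header, so neither `sample_id` nor `condition` is found.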
### 3. Run Bioinformatics Pipeline
Execute the complete RNA-seq workflow:
```bash
flync run-bio --configfile config.yaml --cores 8
```
**What happens:**
1. **Read Mapping**: HISAT2 aligns reads to genome (splice-aware)
2. **Assembly**: StringTie reconstructs transcripts per sample
3. **Merging**: Combines assemblies into unified transcriptome
4. **Comparison**: gffcompare identifies novel vs. known transcripts
5. **Quantification**: StringTie calculates TPM and FPKM values
**Input Modes:**
**A. Local FASTQ files** (set `fastq_dir` in config)
```bash
flync run-bio --configfile config.yaml --cores 8
```
**B. SRA accessions** (omit `fastq_dir`, provide SRA IDs in samples)
```csv
# samples.csv
sample_id,condition,replicate
SRR1234567,control,1
SRR1234568,treated,1
```
SRA files are automatically downloaded using `prefetch` + `fasterq-dump`.
**Useful Options:**
```bash
# Dry run - show what would be executed
flync run-bio -c config.yaml --dry-run
# Unlock after crash
flync run-bio -c config.yaml --unlock
# More cores for faster processing
flync run-bio -c config.yaml --cores 16
```
**Output Structure:**
```
results/
├── data/                            # Alignment files
│   └── {sample}/
│       └── {sample}.sorted.bam
├── assemblies/
│   ├── stringtie/                   # Per-sample assemblies
│   │   └── {sample}.rna.gtf
│   ├── merged.gtf                   # Unified transcriptome
│   ├── merged-new-transcripts.gtf   # Novel transcripts only
│   └── assembled-new-transcripts.fa # Novel transcript sequences
├── gffcompare/
│   └── gffcmp.stats                 # Assembly comparison stats
├── cov/                             # Expression quantification
│   └── {sample}/
│       └── {sample}.rna.gtf
└── logs/                            # Per-rule log files
```
### 4. Run ML Prediction
Classify novel transcripts as lncRNA or protein-coding:
```bash
flync run-ml \
--gtf results/assemblies/merged-new-transcripts.gtf \
--output results/lncrna_predictions.csv \
--ref-genome genome/genome.fa \
--threads 8
```
**Required Arguments:**
- `--gtf`, `-g`: Input GTF file (novel transcripts or full assembly)
- `--output`, `-o`: Output CSV file for predictions
- `--ref-genome`, `-r`: Reference genome FASTA file
**Optional Arguments:**
- `--model`, `-m`: Custom trained model (default: bundled EBM model)
- `--bwq-config`: Custom BigWig track configuration
- `--threads`, `-t`: Number of threads (default: 8)
- `--cache-dir`: Cache directory for downloaded tracks (default: `./bwq_tracks`)
- `--clear-cache`: Clear cache before starting
**What happens:**
1. **Sequence Extraction**: Extracts spliced transcript sequences from GTF
2. **K-mer Profiling**: Calculates 3-12mer frequencies with TF-IDF + SVD
3. **BigWig Query**: Queries 50+ genomic tracks (chromatin, conservation, etc.)
4. **Structure Prediction**: Calculates RNA minimum free energy
5. **Feature Cleaning**: Standardizes features and aligns with model schema
6. **ML Prediction**: Classifies using pre-trained EBM model
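Step 2 (k-mer profiling) can be illustrated in miniature. The sketch below computes plain relative k-mer frequencies for a single sequence; the real pipeline spans k = 3-12 and then applies TF-IDF weighting and SVD, both omitted here:

```python
from collections import Counter

def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative k-mer frequencies of a transcript sequence (illustrative)."""
    seq = seq.upper()
    # Slide a window of length k across the sequence and count occurrences
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}
```

For example, `kmer_frequencies("ATATAT", 2)` yields `{"AT": 0.6, "TA": 0.4}`: five overlapping 2-mers, three of which are `AT`.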
**Output Format (`lncrna_predictions.csv`):**
```csv
transcript_id,prediction,confidence,probability_lncrna
MSTRG.1.1,1,0.95,0.95
MSTRG.1.2,0,0.87,0.13
MSTRG.2.1,1,0.89,0.89
```
**Column Descriptions:**
- `transcript_id`: Transcript identifier from GTF
- `prediction`: 1 = lncRNA, 0 = protein-coding
- `confidence`: Model confidence score (0-1)
- `probability_lncrna`: Probability of being lncRNA (0-1)
**Filter high-confidence lncRNAs:**
```bash
# Get lncRNAs with >90% confidence (NR>1 skips the header row)
awk -F',' 'NR>1 && $2 == 1 && $3 > 0.90' results/lncrna_predictions.csv > high_conf_lncrnas.csv
```
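The same filter in Python, using only the standard library so it drops into any script (`pandas.read_csv` works equally well; the column names follow the output format shown above):

```python
import csv

def high_confidence_lncrnas(path: str, min_conf: float = 0.90) -> list[dict]:
    """Rows predicted as lncRNA (prediction == 1) above a confidence cutoff."""
    with open(path, newline="") as fh:
        return [
            row for row in csv.DictReader(fh)
            if row["prediction"] == "1" and float(row["confidence"]) > min_conf
        ]
```

Applied to the example output above, only `MSTRG.1.1` passes a 0.90 cutoff (`MSTRG.2.1` is a predicted lncRNA but its confidence is 0.89).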
### 5. Run Complete Pipeline (Recommended)
Execute both bioinformatics and ML prediction with a single command:
```bash
flync run-all --configfile config.yaml --cores 8
```
**Unified Configuration:**
```yaml
# Bioinformatics settings
samples: metadata.csv
genome: genome/genome.fa
annotation: genome/genome.gtf
hisat_index: genome/genome.idx
output_dir: results
threads: 8
# ML settings (required for run-all)
ml_reference_genome: genome/genome.fa
ml_output_file: results/lncrna_predictions.csv
ml_bwq_config: config/bwq_config.yaml # Optional
ml_cache_dir: /path/to/cache # Optional
```
**What happens:**
1. Runs bioinformatics pipeline (`flync run-bio`)
2. Automatically detects output GTF (`results/assemblies/merged-new-transcripts.gtf`)
3. Runs ML prediction on novel transcripts
4. Generates DGE analysis if `metadata.csv` has condition column
**Options:**
```bash
# Skip bioinformatics (use existing GTF)
flync run-all -c config.yaml --skip-bio
# Skip ML prediction (only run bioinformatics)
flync run-all -c config.yaml --skip-ml
# Dry run to see what would be executed
flync run-all -c config.yaml --dry-run
# Custom thread allocation
flync run-all -c config.yaml --cores 16 --ml-threads 8
```
### 6. Differential Gene Expression (DGE)
Run DGE analysis using Ballgown when metadata with conditions is provided:
**Requirements:**
- `samples` config key points to a CSV file (not TXT)
- CSV **must have a header row** with column names
- CSV **must contain** `sample_id` column (for sample identification)
- CSV **must contain** `condition` column (for grouping samples in DGE)
**Example metadata.csv:**
```csv
sample_id,condition,replicate
SRR123456,control,1
SRR123457,control,2
SRR123458,treatment,1
SRR123459,treatment,2
```
**⚠️ Critical:** The header row is **not optional**. If you omit it or have a headerless CSV, the DGE analysis will fail with an error about missing the `sample_id` column.
**DGE runs automatically** when using `flync run-bio` or `flync run-all` with metadata CSV.
**Output Files:**
```
results/dge/
├── transcript_dge_results.csv # Transcript-level differential expression
├── gene_dge_results.csv # Gene-level differential expression
├── dge_summary.csv # Analysis summary statistics
├── transcript_ma_plot.png # MA plot visualization
└── ballgown_dge.log # Analysis log
```
**DGE Results Format:**
```csv
id,pval,qval,fc,gene_name,gene_id
MSTRG.1.1,0.001,0.01,2.5,gene_A,FBgn0001
MSTRG.1.2,0.05,0.12,1.8,gene_B,FBgn0002
```
**Filter significant transcripts:**
```bash
# Get transcripts with FDR (qval) < 0.05; NR>1 skips the header row,
# which would otherwise pass the numeric comparison
awk -F',' 'NR>1 && $3 < 0.05' results/dge/transcript_dge_results.csv > significant_de.csv
```
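Equivalently in Python, which sidesteps the header-row pitfall and lets you add a fold-change cutoff (an illustrative helper; column names follow the DGE results format shown above):

```python
import csv

def significant_transcripts(path: str, max_qval: float = 0.05,
                            min_fc: float = 2.0) -> list[dict]:
    """Transcripts passing FDR and fold-change cutoffs (illustrative)."""
    with open(path, newline="") as fh:
        return [
            row for row in csv.DictReader(fh)
            if float(row["qval"]) < max_qval and float(row["fc"]) >= min_fc
        ]
```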
### 7. Python API Usage
Use FLYNC programmatically in custom workflows:
```python
from pathlib import Path

from flync import run_pipeline

# Run complete pipeline
result = run_pipeline(
    config_path=Path("config.yaml"),
    cores=8,
    ml_threads=8,
    verbose=True
)

if result['status'] == 'success':
    print("✓ Pipeline completed!")
    print(f"  Predictions: {result['predictions_file']}")
    print(f"  Output directory: {result['output_dir']}")
```
**Run only bioinformatics:**
```python
from pathlib import Path

from flync import run_bioinformatics

result = run_bioinformatics(
    config_path=Path("config.yaml"),
    cores=16,
    verbose=True
)
```
```
**Run only ML prediction:**
```python
from pathlib import Path

from flync import run_ml_prediction

result = run_ml_prediction(
    gtf_file=Path("merged.gtf"),
    output_file=Path("predictions.csv"),
    ref_genome=Path("genome.fa"),
    threads=8,
    verbose=True
)
print(f"Predicted {result['n_lncrna']} lncRNAs")
```
**Integration in larger workflows:**
```python
import pandas as pd

import flync

# create_config, perform_enrichment_analysis and generate_report are
# user-defined helpers elsewhere in the surrounding workflow
def analyze_rnaseq_data(sample_dir, output_dir):
    # Run FLYNC as part of a larger analysis pipeline
    result = flync.run_pipeline(
        config_path=create_config(sample_dir, output_dir),
        cores=8
    )
    # Continue with downstream analyses
    if result['status'] == 'success':
        lncrnas = pd.read_csv(result['predictions_file'])
        perform_enrichment_analysis(lncrnas)
        generate_report(lncrnas, result['output_dir'])
```
---
## Pipeline Architecture
FLYNC follows a modular Python-first architecture with unified CLI:
```
┌─────────────────────────────────────────────────────────────┐
│                      CLI Layer (click)                      │
│      flync run-all | run-bio | run-ml | setup | config      │
│          + Public Python API (flync.run_pipeline)           │
└──────────────┬────────────────────────┬─────────────────────┘
               │                        │
     ┌─────────▼────────┐     ┌─────────▼──────────┐
     │  Bioinformatics  │     │   ML Prediction    │
     │   (Snakemake)    │     │      (Python)      │
     └─────────┬────────┘     └─────────┬──────────┘
               │                        │
     ┌─────────▼────────┐     ┌─────────▼──────────┐
     │  Workflow Rules  │     │ Feature Extraction │
     │  - mapping.smk   │     │  - feature_wrapper │
     │  - assembly.smk  │     │  - bwq, kmer, mfe  │
     │  - merge.smk     │     │  - cleaning        │
     │  - quantify.smk  │     │                    │
     │  - dge.smk       │     │                    │
     └──────────────────┘     └─────────┬──────────┘
                                        │
                            ┌───────────▼─────────┐
                            │    ML Predictor     │
                            │  - EBM model        │
                            │  - Schema validator │
                            └─────────────────────┘
```
### Core Components
**1. CLI (`src/flync/cli.py`) & API (`src/flync/api.py`)**
- Single unified command with 5 subcommands: `run-all`, `run-bio`, `run-ml`, `setup`, `config`
- New `run-all` orchestrates complete pipeline end-to-end
- Public Python API for programmatic access
- Custom error handling and helpful messages
- Absolute path resolution for file operations
**2. Workflows (`src/flync/workflows/`)**
- **Snakefile**: Main workflow orchestrator with conditional DGE
- **rules/mapping.smk**: HISAT2 alignment, SRA download, FASTQ symlinking
- **rules/assembly.smk**: StringTie per-sample assembly
- **rules/merge.smk**: StringTie merge + gffcompare
- **rules/quantify.smk**: Expression quantification
- **rules/dge.smk**: Ballgown differential expression
- **scripts/ballgown_dge.R**: R script for Ballgown DGE analysis
- **scripts/predownload_tracks.py**: Docker image track pre-caching
**3. Feature Extraction (`src/flync/features/`)**
- **feature_wrapper.py**: High-level orchestration
- **bwq.py**: BigWig/BigBed track querying
- **kmer.py**: K-mer profiling with TF-IDF and SVD
- **mfe.py**: RNA secondary structure (MFE calculation)
- **feature_cleaning.py**: Data preparation and schema alignment
**4. ML Prediction (`src/flync/ml/`)**
- **predictor.py**: Main prediction interface
- **ebm_predictor.py**: EBM model wrapper
- **schema_validator.py**: Feature schema validation
**5. Utilities (`src/flync/utils/`)**
- **kmer_redux.py**: K-mer transformation utilities
- **progress.py**: Progress bar management
**6. Assets (`src/flync/assets/`)**
- Pre-trained EBM models and scalers
- Model schema definitions
**7. Configuration (`src/flync/config/`)**
- **bwq_config.yaml**: Default BigWig track configuration
---
## Advanced Usage
### Custom BigWig Track Configuration
Create a custom `bwq_config.yaml` to query your own tracks:
```yaml
# List of BigWig/BigBed files to query
- path: /path/to/custom_track.bigWig
  upstream: 1000       # Extend region upstream
  downstream: 1000     # Extend region downstream
  stats:
    - stat: mean
      name: custom_mean
    - stat: max
      name: custom_max
    - stat: coverage
      name: custom_coverage

- path: https://example.com/remote_track.bigBed
  stats:
    - stat: coverage
      name: remote_coverage
    - stat: extract_names
      name: remote_names
      name_field_index: 3  # For BigBed name extraction
```
**Available Statistics:**
- `mean`, `max`, `min`, `sum`: Numerical summaries
- `std`: Standard deviation
- `coverage`: Fraction of region covered by signal
- `extract_names`: Extract names from BigBed entries
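To make the `coverage` statistic concrete: it is the fraction of a query region overlapped by at least one signal interval. A minimal sketch of that computation over plain `(start, end)` intervals (illustrative only; the real implementation queries BigWig/BigBed files):

```python
def coverage_fraction(region: tuple[int, int],
                      intervals: list[tuple[int, int]]) -> float:
    """Fraction of [start, end) covered by the union of the intervals."""
    start, end = region
    covered = 0
    pos = start  # left edge of the not-yet-counted part of the region
    for s, e in sorted(intervals):
        # Clip the interval to the region and to what is already counted
        s, e = max(s, pos), min(e, end)
        if s < e:
            covered += e - s
            pos = e
    return covered / (end - start)
```

For instance, a 100 bp region overlapped by intervals `(10, 30)`, `(20, 50)` and `(90, 120)` is covered over 10-50 and 90-100, giving a coverage of 0.5.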
Use with ML prediction:
```bash
flync run-ml --gtf input.gtf --output predictions.csv \
--ref-genome genome.fa --bwq-config custom_bwq_config.yaml
```
### Feature Extraction Only
Extract features without running prediction:
```bash
python src/flync/features/feature_wrapper.py all \
--gtf annotations.gtf \
--ref-genome genome.fa \
--bwq-config config/bwq_config.yaml \
--k-min 3 --k-max 12 \
--use-tfidf --use-dim-redux --redux-n-components 1 \
--output features.parquet
```
### Training Custom Models
**1. Prepare training data:**
```bash
# Split positive and negative samples
python src/flync/optimizer/prepare_data.py \
--positive-file lncrna_features.parquet \
--negative-file protein_coding_features.parquet \
--output-dir datasets/ \
--train-size 0.7 --val-size 0.15 --test-size 0.15
```
**2. Optimize hyperparameters:**
```bash
python src/flync/optimizer/hyperparameter_optimizer.py \
--train-data datasets/train.parquet \
--test-data datasets/test.parquet \
--holdout-data datasets/holdout.parquet \
--model-type randomforest \
--optimization-metrics precision f1 \
--n-trials 100 \
--experiment-name "Custom_RF_Model"
```
**3. View results in MLflow UI:**
```bash
mlflow ui --backend-store-uri sqlite:///mlflow.db
# Open http://localhost:5000
```
**4. Extract model schema for inference:**
```bash
python src/flync/ml/schema_extractor.py \
--model-path best_model.pkl \
--training-data datasets/train.parquet \
--output-schema model_schema.json
```
### Docker Deployment
**Build custom image:**
```bash
docker build -t my-flync:latest -f Dockerfile .
```
**Run with mounted volumes:**
```bash
docker run --rm \
-v $PWD/data:/data \
-v $PWD/genome:/genome \
-v $PWD/results:/results \
my-flync:latest \
flync run-bio -c /data/config.yaml --cores 8
```
**Interactive shell:**
```bash
docker run -it --rm -v $PWD:/work my-flync:latest /bin/bash
```
---
## Troubleshooting
### Installation Issues
**Problem**: `command not found: flync`
```bash
# Solution: Activate conda environment
conda activate flync
# Verify installation
which flync
flync --version
```
**Problem**: `Snakefile not found` when running `flync run-bio`
```bash
# Solution: Reinstall package in editable mode
pip install -e .
```
**Problem**: Missing bioinformatics tools (hisat2, stringtie, etc.)
```bash
# Solution: Recreate conda environment
conda env remove -n flync
conda env create -f environment.yml
conda activate flync
```
### Pipeline Execution Issues
**Problem**: HISAT2 index build fails
```bash
# Check available disk space (needs ~10GB)
df -h
# Check available memory (needs ~4GB)
free -h
# Check logs
cat genome/idx.err.txt
```
**Problem**: SRA download hangs or fails
```bash
# Solution 1: Reduce download threads in config.yaml
params:
  download_threads: 2  # Instead of 4
# Solution 2: Pre-download SRA files manually
prefetch SRR1234567
fasterq-dump SRR1234567 --outdir fastq/
```
**Problem**: Snakemake workflow crashes
```bash
# Unlock working directory
flync run-bio -c config.yaml --unlock
# Check logs for specific rule
tail -f results/logs/hisat2/sample1.log
# Rerun with verbose output
flync run-bio -c config.yaml --cores 8 --dry-run --printshellcmds
```
**Problem**: `samples: null` fails
```bash
# Solution: Must also set fastq_dir in config.yaml
samples: null
fastq_dir: "/path/to/fastq" # Required for auto-detection
fastq_paired: false
```
### Feature Extraction Issues
**Problem**: Feature extraction fails with "track not accessible"
```bash
# Solution: Check internet connection (tracks downloaded from UCSC/Ensembl)
wget -q --spider http://genome.ucsc.edu
echo $? # Should be 0
# Clear cache and retry
flync run-ml --gtf input.gtf --clear-cache ...
```
**Problem**: "No sequences available for downstream feature generation"
```bash
# Solution 1: Verify GTF has transcript and exon features
grep -c 'transcript' input.gtf
grep -c 'exon' input.gtf
# Solution 2: Check reference genome is accessible
ls -lh genome/genome.fa
samtools faidx genome/genome.fa # Build index if missing
```
**Problem**: "kmer_redux utilities not available"
```bash
# Solution: Verify utils module is installed
python -c "from flync.utils import kmer_redux; print('OK')"
# Reinstall if needed
pip install -e .
```
### ML Prediction Issues
**Problem**: "schema mismatch" error during prediction
```bash
# Solution: Feature transformations must match training
# Ensure these flags are set correctly:
flync run-ml --gtf input.gtf --output predictions.csv \
--ref-genome genome.fa
# (Default model expects: use_tfidf=True, use_dim_redux=True, redux_n_components=1)
```
**Problem**: Predictions all 0 or all 1
```bash
# Solution 1: Check input GTF quality
# Ensure transcripts are complete and have exons
# Solution 2: Verify feature extraction succeeded
# Check for warnings in logs
# Solution 3: Use different model or retrain
flync run-ml --gtf input.gtf --model custom_model.pkl ...
```
**Problem**: Out of memory during feature extraction
```bash
# Solution 1: Reduce threads
flync run-ml --threads 4 ...
# Solution 2: Process in smaller batches
# Split GTF and process separately
# Solution 3: Use sparse k-mer format (automatic with default settings)
```
### Docker Issues
**Problem**: Docker permission denied
```bash
# Solution 1: Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
# Solution 2: Run with sudo
sudo docker run ...
```
**Problem**: Docker container out of disk space
```bash
# Clean up old containers and images
docker system prune -a
# Check disk usage
docker system df
```
---
## Contributing
Contributions are welcome! Please follow these guidelines:
### Development Setup
```bash
# Clone and setup development environment
git clone https://github.com/homemlab/flync.git
cd flync
git checkout master
# Create development environment
conda env create -f environment.yml
conda activate flync
# Install in development mode
pip install -e .
# Optional: Install development dependencies
pip install pytest black flake8 mypy
```
### Code Style
- **Python**: Follow PEP 8, use Black formatter (line length 100)
- **Type Hints**: Required for public functions
- **Docstrings**: Google style for all modules, classes, functions
- **Imports**: Absolute imports preferred (`from flync.module import Class`)
### Testing
```bash
# Run tests (when implemented)
pytest tests/
# Format code
black src/flync/
# Type checking
mypy src/flync/
```
### Workflow for Contributions
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with clear commit messages
4. Ensure code passes style checks and tests
5. Update documentation if needed
6. Submit a pull request to the `master` branch
### Reporting Issues
- Use GitHub Issues: https://github.com/homemlab/flync/issues
- Include:
- FLYNC version (`flync --version`)
- Operating system and version
- Minimal reproducible example
- Error messages and logs
## License
MIT License - see [LICENSE](LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "flync",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "bioinformatics, lncRNA, genomics, machine-learning, RNA-seq",
"author": "FLYNC Contributors",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/6f/16/4d0be91ec0f0e0245ac1198e5762e08bdf08eaf849f4e815f0c2a38dec6f/flync-1.0.2.tar.gz",
"platform": null,
"description": "\n\n# FLYNC - FLY Non-Coding gene discovery & classification\n\n## TL;DR (Quick Start)\n\nInstall (conda recommended):\n```bash\nconda create -n flync -c bioconda -c conda-forge flync\nconda activate flync\n```\n\nSetup genome (dm6):\n```bash\nflync setup --genome-dir genome\n```\n\nGenerate config template and edit paths:\n```bash\nflync config --template --output config.yaml\n# Set: samples: null and fastq_dir: \"/path/to/fastq\"\n```\n\nRun bioinformatics (auto-detect FASTQs):\n```bash\nflync run-bio -c config.yaml -j 8\n```\n\nPredict lncRNAs (novel transcripts):\n```bash\nflync run-ml -g results/assemblies/merged-new-transcripts.gtf \\\n -o results/lncrna_predictions.csv -r genome/genome.fa -t 8\n```\n\nAll-in-one:\n```bash\nflync run-all -c config.yaml -j 8\n```\n\nEssential outputs:\n- results/assemblies/merged-new-transcripts.gtf (novel)\n- results/lncrna_predictions.csv (predictions)\n- results/dge/... (if metadata CSV with condition provided)\n\nNeed help? Run:\n```bash\nflync run-bio -c config.yaml --dry-run\n```\n\n## Minimal Conceptual Overview\n\n1. Input: FASTQs (local) or SRA IDs (via metadata CSV).\n2. Snakemake workflow builds transcriptome and isolates novel transcripts.\n3. Feature engine converts GTF + genome into a model-ready feature matrix.\n4. Pre-trained model classifies lncRNA vs protein-coding; outputs probabilities.\n5. Optional differential expression if conditions provided.\n\n## When to Use Which Command\n\n- run-bio: You only need assemblies.\n- run-ml: You have a GTF and want predictions.\n- run-all: End-to-end (recommended for new users).\n- setup: Prepare genome and indices once.\n- config: Generate or validate config.yaml.\n\n## Common Pitfalls (Fast Answers)\n\n| Issue | Fix |\n|-------|-----|\n| samples: null fails | Ensure fastq_dir is set |\n| Snakefile not found | pip install -e . 

- [TL;DR (Quick Start)](#tldr-quick-start)
- [Minimal Conceptual Overview](#minimal-conceptual-overview)
- [When to Use Which Command](#when-to-use-which-command)
- [Common Pitfalls (Fast Answers)](#common-pitfalls-fast-answers)
- [Pipeline Overview](#pipeline-overview)
- [Key Features](#key-features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage Guide](#usage-guide)
  - [1. Setup Reference Genome](#1-setup-reference-genome)
  - [2. Configure Pipeline](#2-configure-pipeline)
    - [Sample Specification (3 Modes)](#sample-specification-3-modes)
  - [3. Run Bioinformatics Pipeline](#3-run-bioinformatics-pipeline)
  - [4. Run ML Prediction](#4-run-ml-prediction)
  - [5. Run Complete Pipeline (Recommended)](#5-run-complete-pipeline-recommended)
  - [6. Differential Gene Expression (DGE)](#6-differential-gene-expression-dge)
  - [7. Python API Usage](#7-python-api-usage)
- [Pipeline Architecture](#pipeline-architecture)
- [Advanced Usage](#advanced-usage)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)

---

## Pipeline Overview

FLYNC executes a complete lncRNA discovery workflow with three main execution modes:

### Phase 1: Bioinformatics Pipeline (`flync run-bio`)

Run only the RNA-seq processing and assembly:

1. **Read Mapping** - Align RNA-seq reads to the reference genome using HISAT2
2. **Transcriptome Assembly** - Reconstruct transcripts per sample with StringTie
3. **Assembly Merging** - Create a unified transcriptome with gffcompare
4. **Novel Transcript Extraction** - Identify transcripts not in the reference annotation
5. **Quantification** - Calculate expression levels per transcript
6. **DGE Analysis** (optional) - Differential expression with Ballgown when metadata.csv is provided

### Phase 2: ML Prediction (`flync run-ml`)

Run only the machine learning classification:

1. **Feature Extraction** - Extract multi-modal genomic features
2. **Feature Cleaning** - Standardize and prepare features for ML
3. **ML Classification** - Predict lncRNA candidates using the trained EBM model
4. **Confidence Scoring** - Provide prediction probabilities and confidence scores

### Complete Pipeline (`flync run-all`)

Run the entire workflow end-to-end with a single command.

---

## Key Features

- **Complete End-to-End Pipeline** - Single `flync run-all` command for the full workflow
- **Unified Environment** - All dependencies managed in a single `environment.yml`
- **Differential Expression** - Integrated Ballgown DGE analysis for condition comparisons
- **Public Python API** - Use FLYNC programmatically in custom workflows
- **Flexible Input Modes** - Auto-detect samples from a FASTQ directory or use sample lists
- **Snakemake Orchestration** - Robust workflow management with automatic parallelization
- **Comprehensive Features** - 100+ genomic features from multiple data sources
- **Intelligent Caching** - Downloads and caches remote genomic tracks automatically
- **Production-Ready Models** - Pre-trained EBM classifier with high accuracy
- **Multi-Stage Docker** - Runtime and pre-warmed images for flexible deployment
- **Python 3.11** - Modern Python codebase with type hints and comprehensive documentation

---

## Installation

### Overview

Choose an installation method based on what you need:

| Use Case | Recommended Method | Command |
|----------|-------------------|---------|
| Full pipeline (alignment + assembly + ML) | Conda (base) | `conda create -n flync -c bioconda -c conda-forge flync` |
| Add differential expression (Ballgown) | Conda add-on | `conda install -n flync flync-dge` |
| ML / feature extraction only (no aligners) | pip + extras | `pip install flync[features,ml]` |
| Programmatic Snakemake orchestration (no bio tools) | pip minimal + workflow | `pip install flync[workflow]` |
| Reproducible container execution | Docker (runtime) | `docker pull ghcr.io/homemlab/flync:latest` |
| Faster startup with pre-cached tracks | Docker (prewarmed) | `docker pull ghcr.io/homemlab/flync:latest-prewarmed` |

### Option 1: Conda (Recommended – Full Stack)

```bash
conda create -n flync -c bioconda -c conda-forge flync
conda activate flync
flync --help
```

Add DGE support (Ballgown + R stack):
```bash
conda install -n flync flync-dge  # after base install
```

Or install both at once:
```bash
conda create -n flync -c bioconda -c conda-forge flync flync-dge
```

### Option 2: pip (Python-Only / Lightweight)

pip will NOT install external bioinformatics binaries (HISAT2, StringTie, samtools, etc.). Use this only for feature extraction or ML inference on an existing GTF.

```bash
python -m venv flync-venv
source flync-venv/bin/activate
pip install --upgrade pip

# Feature extraction + ML
pip install "flync[features,ml]"

# Add Snakemake lightweight orchestration (still no external binaries)
pip install "flync[workflow]"

flync run-ml --help
```

If you attempt `flync run-bio` without the required external tools, FLYNC will explain what is missing and how to install it via conda.

### Option 3: Docker

Runtime image (downloads tracks on demand):
```bash
docker pull ghcr.io/homemlab/flync:latest
docker run --rm -v $PWD:/work ghcr.io/homemlab/flync:latest \
    flync --help
```

Prewarmed image (tracks pre-cached):
```bash
docker pull ghcr.io/homemlab/flync:latest-prewarmed
```

### Which Should I Pick?

| Scenario | Choose |
|----------|--------|
| New user, want everything | Conda base (add `flync-dge` if doing DGE) |
| HPC / cluster with module rules | Conda (export env YAML for reproducibility) |
| Notebook exploratory ML only | pip extras (`features,ml`) |
| CI / workflow integration | Docker runtime image |
| Need fastest repeated ML runs | Docker prewarmed image |

### External Tool Summary

The following are ONLY installed automatically via the Conda packages (`flync`, `flync-dge`):
```
hisat2, stringtie, gffcompare, gffread, samtools, bedtools, sra-tools,
R (r-base), bioconductor-ballgown, r-matrixstats, r-ggplot2
```
pip installations perform a dependency sanity check and abort `run-bio` if these are missing (unless `--skip-deps-check` is used).

### Prerequisites

- **Operating System**: Linux (tested on Debian/Ubuntu)
- **Conda/Mamba**: Required for managing dependencies
- **System Requirements**:
  - 8+ GB RAM (16+ GB recommended for large datasets)
  - 20+ GB disk space (genome, indices, and tracks)
  - 4+ CPU cores (8+ recommended)

### Install from Source (Full + Editable)

For development or if you need the latest unreleased features:

```bash
# 1. Clone the repository
git clone https://github.com/homemlab/flync.git
cd flync
git checkout master  # Use the master branch (production)

# 2. Create conda environment with dependencies
conda env create -f environment.yml

# 3. Activate environment
conda activate flync

# 4. Install package in development mode
pip install -e .

# 5. Verify installation
flync --help
```

Docker image details moved above for quick discovery.

---

## Quick Start

**Complete workflow with the `run-all` command:**

```bash
# 1. Activate conda environment
conda activate flync

# 2. Download genome and build indices
flync setup --genome-dir genome

# 3. Create configuration file
flync config --template --output config.yaml
# 4. Edit config.yaml with your paths and settings
#    See config_example_full.yaml for all available options

# 5. Create metadata.csv with sample information (MUST have a header row!)
cat > metadata.csv << EOF
sample_id,condition,replicate
SRR123456,control,1
SRR123457,control,2
SRR123458,treatment,1
SRR123459,treatment,2
EOF

# 6. Update config.yaml to use metadata.csv
#    Change: samples: null
#    To:     samples: metadata.csv

# 7. Run complete pipeline (bioinformatics + ML + DGE)
flync run-all --configfile config.yaml --cores 8
```

**Alternative: step-by-step workflow:**

```bash
# Run bioinformatics pipeline only
flync run-bio --configfile config.yaml --cores 8

# Then run ML prediction
flync run-ml \
    --gtf results/assemblies/merged-new-transcripts.gtf \
    --output results/lncrna_predictions.csv \
    --ref-genome genome/genome.fa \
    --threads 8
```

**Python API usage:**

```python
from flync import run_pipeline
from pathlib import Path

# Run complete pipeline programmatically
result = run_pipeline(
    config_path=Path("config.yaml"),
    cores=8,
    ml_threads=8
)

print(f"Status: {result['status']}")
print(f"Predictions: {result['predictions_file']}")
```

**Output:**
- `results/assemblies/merged.gtf` - Full transcriptome (reference + novel)
- `results/assemblies/merged-new-transcripts.gtf` - Novel transcripts only
- `results/cov/` - Per-sample quantification files
- `results/dge/` - Differential expression analysis (if metadata.csv provided)
  - `transcript_dge_results.csv` - Transcript-level DE results
  - `gene_dge_results.csv` - Gene-level DE results
  - `dge_summary.csv` - Summary statistics
- `results/lncrna_predictions.csv` - lncRNA predictions with confidence scores
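After a run completes, a quick existence check on the key outputs can catch silent failures early. This is a minimal sketch (not part of FLYNC itself); the paths assume the default `output_dir: results`:

```python
from pathlib import Path

# Key outputs of a default FLYNC run (assumes output_dir: results)
EXPECTED = [
    "results/assemblies/merged.gtf",
    "results/assemblies/merged-new-transcripts.gtf",
    "results/lncrna_predictions.csv",
]

def missing_outputs(paths):
    """Return expected output paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

for p in missing_outputs(EXPECTED):
    print(f"MISSING: {p}")
```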
---

## Usage Guide

### 1. Setup Reference Genome

Download the *Drosophila melanogaster* genome (BDGP6.32/dm6) and build the HISAT2 index:

```bash
flync setup --genome-dir genome
```

**What this does:**
- Downloads the genome FASTA from Ensembl (release 106)
- Downloads the gene annotation GTF
- Builds the HISAT2 index (~10 minutes, requires ~4 GB RAM)
- Extracts splice sites for splice-aware alignment

**Skip download if files exist:**
```bash
flync setup --genome-dir genome --skip-download
```

### 2. Configure Pipeline

Generate a configuration template:

```bash
flync config --template --output config.yaml
```

**Edit `config.yaml`** with your settings:

```yaml
# Sample specification (3 options - see below)
samples: null                       # Auto-detect from fastq_dir
fastq_dir: "/path/to/fastq/files"   # Directory with FASTQ files
fastq_paired: false                 # true for paired-end, false for single-end

# Reference files (created by 'flync setup')
genome: "genome/genome.fa"
annotation: "genome/genome.gtf"
hisat_index: "genome/genome.idx"
splice_sites: "genome/genome.ss"

# Output and resources
output_dir: "results"
threads: 8

# Tool parameters (optional)
params:
  hisat2: "-p 8 --dta --dta-cufflinks"
  stringtie_assemble: "-p 8"
  stringtie_merge: ""
  stringtie_quantify: "-eB"
  download_threads: 4               # For SRA downloads
```

#### Sample Specification (3 Modes)

**Mode 1: Auto-detect from FASTQ directory (recommended)**
```yaml
samples: null                # Must be null to enable auto-detection
fastq_dir: "/path/to/fastq"
fastq_paired: false
```

Samples are detected automatically from filenames:
- **Paired-end**: `sample1_1.fastq.gz` + `sample1_2.fastq.gz` → detects `sample1`
- **Single-end**: `sample1.fastq.gz` → detects `sample1`

**Mode 2: Plain text list**
```yaml
samples: "samples.txt"
fastq_dir: "/path/to/fastq"  # Optional if using SRA
```

`samples.txt`:
```
sample1
sample2
sample3
```

**Mode 3: CSV with metadata (for differential expression)**
```yaml
samples: "metadata.csv"
fastq_dir: "/path/to/fastq"  # Optional if using SRA
```

`metadata.csv`:
```csv
sample_id,condition,replicate
sample1,control,1
sample2,control,2
sample3,treated,1
```

**⚠️ Important:** When using CSV metadata, the header row with column names (`sample_id`, `condition`) is **required**. Without headers, the DGE analysis will fail. The `sample_id` column is mandatory for sample identification, and the `condition` column is required to enable differential expression analysis.
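For illustration, the Mode 1 naming convention can be sketched in a few lines of Python. This is a hypothetical re-implementation, not FLYNC's actual detection code, which may differ in details:

```python
import re
from pathlib import Path

def detect_samples(fastq_dir, paired=False):
    """Infer sample names from FASTQ filenames: strip '.fastq.gz',
    and for paired-end data also strip the trailing '_1'/'_2' mate suffix."""
    names = set()
    for f in Path(fastq_dir).glob("*.fastq.gz"):
        stem = f.name[: -len(".fastq.gz")]
        if paired:
            stem = re.sub(r"_[12]$", "", stem)  # drop mate suffix
        names.add(stem)
    return sorted(names)
```

So `s1_1.fastq.gz` and `s1_2.fastq.gz` collapse to a single sample `s1` in paired mode, while single-end files map one-to-one to sample names.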
### 3. Run Bioinformatics Pipeline

Execute the complete RNA-seq workflow:

```bash
flync run-bio --configfile config.yaml --cores 8
```

**What happens:**
1. **Read Mapping**: HISAT2 aligns reads to the genome (splice-aware)
2. **Assembly**: StringTie reconstructs transcripts per sample
3. **Merging**: Combines assemblies into a unified transcriptome
4. **Comparison**: gffcompare identifies novel vs. known transcripts
5. **Quantification**: StringTie calculates TPM and FPKM values

**Input modes:**

**A. Local FASTQ files** (set `fastq_dir` in config)
```bash
flync run-bio --configfile config.yaml --cores 8
```
**B. SRA accessions** (omit `fastq_dir`, provide SRA IDs in samples)
```csv
# samples.csv
sample_id,condition,replicate
SRR1234567,control,1
SRR1234568,treated,1
```

SRA files are downloaded automatically using `prefetch` + `fasterq-dump`.

**Useful options:**
```bash
# Dry run - show what would be executed
flync run-bio -c config.yaml --dry-run

# Unlock after a crash
flync run-bio -c config.yaml --unlock

# More cores for faster processing
flync run-bio -c config.yaml --cores 16
```

**Output structure:**
```
results/
├── data/                              # Alignment files
│   └── {sample}/
│       └── {sample}.sorted.bam
├── assemblies/
│   ├── stringtie/                     # Per-sample assemblies
│   │   └── {sample}.rna.gtf
│   ├── merged.gtf                     # Unified transcriptome
│   ├── merged-new-transcripts.gtf     # Novel transcripts only
│   └── assembled-new-transcripts.fa   # Novel transcript sequences
├── gffcompare/
│   └── gffcmp.stats                   # Assembly comparison stats
├── cov/                               # Expression quantification
│   └── {sample}/
│       └── {sample}.rna.gtf
└── logs/                              # Per-rule log files
```
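As a quick sanity check on the assembly output, the novel transcripts in `merged-new-transcripts.gtf` can be tallied with a short script. This is an illustrative sketch assuming standard 9-column GTF lines with `transcript_id "..."` attributes, as StringTie emits:

```python
import re
from collections import Counter

def summarize_gtf(lines):
    """Tally GTF feature types and unique transcript_id values."""
    features = Counter()
    transcripts = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        features[cols[2]] += 1
        m = re.search(r'transcript_id "([^"]+)"', cols[8])
        if m:
            transcripts.add(m.group(1))
    return features, transcripts

# Demo on two inline records; pass
# open("results/assemblies/merged-new-transcripts.gtf") instead
# to summarize a real assembly.
demo = [
    '2L\tStringTie\ttranscript\t100\t900\t.\t+\t.\ttranscript_id "MSTRG.1.1";',
    '2L\tStringTie\texon\t100\t400\t.\t+\t.\ttranscript_id "MSTRG.1.1";',
    '2L\tStringTie\texon\t600\t900\t.\t+\t.\ttranscript_id "MSTRG.1.1";',
]
features, transcripts = summarize_gtf(demo)
print(f"{len(transcripts)} transcripts, {features['exon']} exons")
```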
### 4. Run ML Prediction

Classify novel transcripts as lncRNA or protein-coding:

```bash
flync run-ml \
    --gtf results/assemblies/merged-new-transcripts.gtf \
    --output results/lncrna_predictions.csv \
    --ref-genome genome/genome.fa \
    --threads 8
```

**Required arguments:**
- `--gtf`, `-g`: Input GTF file (novel transcripts or full assembly)
- `--output`, `-o`: Output CSV file for predictions
- `--ref-genome`, `-r`: Reference genome FASTA file

**Optional arguments:**
- `--model`, `-m`: Custom trained model (default: bundled EBM model)
- `--bwq-config`: Custom BigWig track configuration
- `--threads`, `-t`: Number of threads (default: 8)
- `--cache-dir`: Cache directory for downloaded tracks (default: `./bwq_tracks`)
- `--clear-cache`: Clear cache before starting

**What happens:**
1. **Sequence Extraction**: Extracts spliced transcript sequences from the GTF
2. **K-mer Profiling**: Calculates 3-12mer frequencies with TF-IDF + SVD
3. **BigWig Query**: Queries 50+ genomic tracks (chromatin, conservation, etc.)
4. **Structure Prediction**: Calculates RNA minimum free energy
5. **Feature Cleaning**: Standardizes features and aligns them with the model schema
6. **ML Prediction**: Classifies using the pre-trained EBM model

**Output format (`lncrna_predictions.csv`):**
```csv
transcript_id,prediction,confidence,probability_lncrna
MSTRG.1.1,1,0.95,0.95
MSTRG.1.2,0,0.87,0.13
MSTRG.2.1,1,0.89,0.89
```

**Column descriptions:**
- `transcript_id`: Transcript identifier from the GTF
- `prediction`: 1 = lncRNA, 0 = protein-coding
- `confidence`: Model confidence score (0-1)
- `probability_lncrna`: Probability of being lncRNA (0-1)

**Filter high-confidence lncRNAs:**
```bash
# Get lncRNAs with >90% confidence
awk -F',' '$3 > 0.90 && $2 == 1' results/lncrna_predictions.csv > high_conf_lncrnas.csv
```
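The same filter in pandas, for readers already working in Python (a sketch over toy rows matching the documented output format):

```python
import pandas as pd

# Toy rows in the documented output format; in practice use
# preds = pd.read_csv("results/lncrna_predictions.csv")
preds = pd.DataFrame({
    "transcript_id": ["MSTRG.1.1", "MSTRG.1.2", "MSTRG.2.1"],
    "prediction": [1, 0, 1],
    "confidence": [0.95, 0.87, 0.89],
    "probability_lncrna": [0.95, 0.13, 0.89],
})

# Keep predicted lncRNAs with >90% confidence
high_conf = preds[(preds["prediction"] == 1) & (preds["confidence"] > 0.90)]
print(high_conf["transcript_id"].tolist())  # ['MSTRG.1.1']
# high_conf.to_csv("high_conf_lncrnas.csv", index=False)
```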
### 5. Run Complete Pipeline (Recommended)

Execute both bioinformatics and ML prediction with a single command:

```bash
flync run-all --configfile config.yaml --cores 8
```

**Unified configuration:**

```yaml
# Bioinformatics settings
samples: metadata.csv
genome: genome/genome.fa
annotation: genome/genome.gtf
hisat_index: genome/genome.idx
output_dir: results
threads: 8

# ML settings (required for run-all)
ml_reference_genome: genome/genome.fa
ml_output_file: results/lncrna_predictions.csv
ml_bwq_config: config/bwq_config.yaml   # Optional
ml_cache_dir: /path/to/cache            # Optional
```

**What happens:**
1. Runs the bioinformatics pipeline (`flync run-bio`)
2. Automatically detects the output GTF (`results/assemblies/merged-new-transcripts.gtf`)
3. Runs ML prediction on the novel transcripts
4. Generates DGE analysis if `metadata.csv` has a condition column

**Options:**
```bash
# Skip bioinformatics (use existing GTF)
flync run-all -c config.yaml --skip-bio

# Skip ML prediction (only run bioinformatics)
flync run-all -c config.yaml --skip-ml

# Dry run to see what would be executed
flync run-all -c config.yaml --dry-run

# Custom thread allocation
flync run-all -c config.yaml --cores 16 --ml-threads 8
```
### 6. Differential Gene Expression (DGE)

DGE analysis runs via Ballgown when metadata with conditions is provided:

**Requirements:**
- The `samples` config key points to a CSV file (not TXT)
- The CSV **must have a header row** with column names
- The CSV **must contain** a `sample_id` column (for sample identification)
- The CSV **must contain** a `condition` column (for grouping samples in DGE)

**Example metadata.csv:**
```csv
sample_id,condition,replicate
SRR123456,control,1
SRR123457,control,2
SRR123458,treatment,1
SRR123459,treatment,2
```

**⚠️ Critical:** The header row is **not optional**. If you omit it or supply a headerless CSV, the DGE analysis will fail with an error about the missing `sample_id` column.

**DGE runs automatically** when using `flync run-bio` or `flync run-all` with metadata CSV.

**Output files:**
```
results/dge/
├── transcript_dge_results.csv   # Transcript-level differential expression
├── gene_dge_results.csv         # Gene-level differential expression
├── dge_summary.csv              # Analysis summary statistics
├── transcript_ma_plot.png       # MA plot visualization
└── ballgown_dge.log             # Analysis log
```

**DGE results format:**
```csv
id,pval,qval,fc,gene_name,gene_id
MSTRG.1.1,0.001,0.01,2.5,gene_A,FBgn0001
MSTRG.1.2,0.05,0.12,1.8,gene_B,FBgn0002
```

**Filter significant transcripts:**
```bash
# Get transcripts with FDR < 0.05 (NR>1 skips the header row)
awk -F',' 'NR>1 && $3 < 0.05' results/dge/transcript_dge_results.csv > significant_de.csv
```
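To prioritize candidates, significant DE transcripts can be intersected with predicted lncRNAs by joining the two result tables on the transcript ID (a sketch over toy rows matching the documented formats):

```python
import pandas as pd

# Toy rows in the documented result formats; in practice:
#   dge = pd.read_csv("results/dge/transcript_dge_results.csv")
#   preds = pd.read_csv("results/lncrna_predictions.csv")
dge = pd.DataFrame({
    "id": ["MSTRG.1.1", "MSTRG.1.2"],
    "qval": [0.01, 0.12],
    "fc": [2.5, 1.8],
})
preds = pd.DataFrame({
    "transcript_id": ["MSTRG.1.1", "MSTRG.1.2"],
    "prediction": [1, 1],
    "confidence": [0.95, 0.87],
})

# Significant (FDR < 0.05) transcripts that are also predicted lncRNAs
de_lncrnas = dge[dge["qval"] < 0.05].merge(
    preds[preds["prediction"] == 1],
    left_on="id",
    right_on="transcript_id",
)
print(de_lncrnas["id"].tolist())  # ['MSTRG.1.1']
```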
### 7. Python API Usage

Use FLYNC programmatically in custom workflows:

```python
from flync import run_pipeline, run_bioinformatics, run_ml_prediction
from pathlib import Path

# Run complete pipeline
result = run_pipeline(
    config_path=Path("config.yaml"),
    cores=8,
    ml_threads=8,
    verbose=True
)

if result['status'] == 'success':
    print("✓ Pipeline completed!")
    print(f"  Predictions: {result['predictions_file']}")
    print(f"  Output directory: {result['output_dir']}")
```

**Run only bioinformatics:**
```python
from flync import run_bioinformatics
from pathlib import Path

result = run_bioinformatics(
    config_path=Path("config.yaml"),
    cores=16,
    verbose=True
)
```

**Run only ML prediction:**
```python
from flync import run_ml_prediction
from pathlib import Path

result = run_ml_prediction(
    gtf_file=Path("merged.gtf"),
    output_file=Path("predictions.csv"),
    ref_genome=Path("genome.fa"),
    threads=8,
    verbose=True
)

print(f"Predicted {result['n_lncrna']} lncRNAs")
```

**Integration in larger workflows:**
```python
import flync
import pandas as pd

# Part of a larger analysis pipeline
def analyze_rnaseq_data(sample_dir, output_dir):
    # Run FLYNC
    result = flync.run_pipeline(
        config_path=create_config(sample_dir, output_dir),
        cores=8
    )

    # Continue with downstream analyses
    if result['status'] == 'success':
        lncrnas = pd.read_csv(result['predictions_file'])
        perform_enrichment_analysis(lncrnas)
        generate_report(lncrnas, result['output_dir'])
```

---
## Pipeline Architecture

FLYNC follows a modular Python-first architecture with a unified CLI:

```
┌───────────────────────────────────────────────────────┐
│                   CLI Layer (click)                   │
│   flync run-all | run-bio | run-ml | setup | config   │
│       + Public Python API (flync.run_pipeline)        │
└──────────────┬───────────────────────────┬────────────┘
               │                           │
     ┌─────────▼────────┐        ┌─────────▼──────────┐
     │  Bioinformatics  │        │    ML Prediction   │
     │   (Snakemake)    │        │      (Python)      │
     └─────────┬────────┘        └─────────┬──────────┘
               │                           │
     ┌─────────▼────────┐        ┌─────────▼──────────┐
     │  Workflow Rules  │        │ Feature Extraction │
     │  - mapping.smk   │        │ - feature_wrapper  │
     │  - assembly.smk  │        │ - bwq, kmer, mfe   │
     │  - merge.smk     │        │ - cleaning         │
     │  - quantify.smk  │        │                    │
     │  - dge.smk       │        │                    │
     └──────────────────┘        └─────────┬──────────┘
                                           │
                                ┌──────────▼──────────┐
                                │     ML Predictor    │
                                │  - EBM model        │
                                │  - Schema validator │
                                └─────────────────────┘
```

### Core Components

**1. CLI (`src/flync/cli.py`) & API (`src/flync/api.py`)**
- Single unified command with 5 subcommands: `run-all`, `run-bio`, `run-ml`, `setup`, `config`
- `run-all` orchestrates the complete pipeline end-to-end
- Public Python API for programmatic access
- Custom error handling and helpful messages
- Absolute path resolution for file operations
**2. Workflows (`src/flync/workflows/`)**
- **Snakefile**: Main workflow orchestrator with conditional DGE
- **rules/mapping.smk**: HISAT2 alignment, SRA download, FASTQ symlinking
- **rules/assembly.smk**: StringTie per-sample assembly
- **rules/merge.smk**: StringTie merge + gffcompare
- **rules/quantify.smk**: Expression quantification
- **rules/dge.smk**: Ballgown differential expression
- **scripts/ballgown_dge.R**: R script for Ballgown DGE analysis
- **scripts/predownload_tracks.py**: Docker image track pre-caching

**3. Feature Extraction (`src/flync/features/`)**
- **feature_wrapper.py**: High-level orchestration
- **bwq.py**: BigWig/BigBed track querying
- **kmer.py**: K-mer profiling with TF-IDF and SVD
- **mfe.py**: RNA secondary structure (MFE calculation)
- **feature_cleaning.py**: Data preparation and schema alignment

**4. ML Prediction (`src/flync/ml/`)**
- **predictor.py**: Main prediction interface
- **ebm_predictor.py**: EBM model wrapper
- **schema_validator.py**: Feature schema validation

**5. Utilities (`src/flync/utils/`)**
- **kmer_redux.py**: K-mer transformation utilities
- **progress.py**: Progress bar management

**6. Assets (`src/flync/assets/`)**
- Pre-trained EBM models and scalers
- Model schema definitions

**7. Configuration (`src/flync/config/`)**
- **bwq_config.yaml**: Default BigWig track configuration

---
## Advanced Usage

### Custom BigWig Track Configuration

Create a custom `bwq_config.yaml` to query your own tracks:

```yaml
# List of BigWig/BigBed files to query
- path: /path/to/custom_track.bigWig
  upstream: 1000      # Extend region upstream
  downstream: 1000    # Extend region downstream
  stats:
    - stat: mean
      name: custom_mean
    - stat: max
      name: custom_max
    - stat: coverage
      name: custom_coverage

- path: https://example.com/remote_track.bigBed
  stats:
    - stat: coverage
      name: remote_coverage
    - stat: extract_names
      name: remote_names
      name_field_index: 3   # For BigBed name extraction
```

**Available statistics:**
- `mean`, `max`, `min`, `sum`: Numerical summaries
- `std`: Standard deviation
- `coverage`: Fraction of the region covered by signal
- `extract_names`: Extract names from BigBed entries

Use with ML prediction:
```bash
flync run-ml --gtf input.gtf --output predictions.csv \
    --ref-genome genome.fa --bwq-config custom_bwq_config.yaml
```

### Feature Extraction Only

Extract features without running prediction:

```bash
python src/flync/features/feature_wrapper.py all \
    --gtf annotations.gtf \
    --ref-genome genome.fa \
    --bwq-config config/bwq_config.yaml \
    --k-min 3 --k-max 12 \
    --use-tfidf --use-dim-redux --redux-n-components 1 \
    --output features.parquet
```

### Training Custom Models

**1. Prepare training data:**
```bash
# Split positive and negative samples
python src/flync/optimizer/prepare_data.py \
    --positive-file lncrna_features.parquet \
    --negative-file protein_coding_features.parquet \
    --output-dir datasets/ \
    --train-size 0.7 --val-size 0.15 --test-size 0.15
```
**2. Optimize hyperparameters:**
```bash
python src/flync/optimizer/hyperparameter_optimizer.py \
    --train-data datasets/train.parquet \
    --test-data datasets/test.parquet \
    --holdout-data datasets/holdout.parquet \
    --model-type randomforest \
    --optimization-metrics precision f1 \
    --n-trials 100 \
    --experiment-name "Custom_RF_Model"
```

**3. View results in the MLflow UI:**
```bash
mlflow ui --backend-store-uri sqlite:///mlflow.db
# Open http://localhost:5000
```

**4. Extract the model schema for inference:**
```bash
python src/flync/ml/schema_extractor.py \
    --model-path best_model.pkl \
    --training-data datasets/train.parquet \
    --output-schema model_schema.json
```

### Docker Deployment

**Build a custom image:**
```bash
docker build -t my-flync:latest -f Dockerfile .
```

**Run with mounted volumes:**
```bash
docker run --rm \
    -v $PWD/data:/data \
    -v $PWD/genome:/genome \
    -v $PWD/results:/results \
    my-flync:latest \
    flync run-bio -c /data/config.yaml --cores 8
```

**Interactive shell:**
```bash
docker run -it --rm -v $PWD:/work my-flync:latest /bin/bash
```

---

## Troubleshooting

### Installation Issues

**Problem**: `command not found: flync`
```bash
# Solution: Activate conda environment
conda activate flync

# Verify installation
which flync
flync --version
```

**Problem**: `Snakefile not found` when running `flync run-bio`
```bash
# Solution: Reinstall package in editable mode
pip install -e .
```

**Problem**: Missing bioinformatics tools (hisat2, stringtie, etc.)
```bash
# Solution: Recreate conda environment
conda env remove -n flync
conda env create -f environment.yml
conda activate flync
```

### Pipeline Execution Issues

**Problem**: HISAT2 index build fails
```bash
# Check available disk space (needs ~10 GB)
df -h

# Check available memory (needs ~4 GB)
free -h

# Check logs
cat genome/idx.err.txt
```
**Problem**: SRA download hangs or fails
```yaml
# Solution 1: Reduce download threads in config.yaml
params:
  download_threads: 2   # Instead of 4
```
```bash
# Solution 2: Pre-download SRA files manually
prefetch SRR1234567
fasterq-dump SRR1234567 --outdir fastq/
```

**Problem**: Snakemake workflow crashes
```bash
# Unlock working directory
flync run-bio -c config.yaml --unlock

# Check logs for specific rule
tail -f results/logs/hisat2/sample1.log

# Rerun with verbose output
flync run-bio -c config.yaml --cores 8 --dry-run --printshellcmds
```

**Problem**: `samples: null` fails
```yaml
# Solution: Must also set fastq_dir in config.yaml
samples: null
fastq_dir: "/path/to/fastq"   # Required for auto-detection
fastq_paired: false
```

### Feature Extraction Issues

**Problem**: Feature extraction fails with "track not accessible"
```bash
# Solution: Check internet connection (tracks downloaded from UCSC/Ensembl)
wget -q --spider http://genome.ucsc.edu
echo $?   # Should be 0

# Clear cache and retry
flync run-ml --gtf input.gtf --clear-cache ...
```

**Problem**: "No sequences available for downstream feature generation"
```bash
# Solution 1: Verify GTF has transcript and exon features
grep -c 'transcript' input.gtf
grep -c 'exon' input.gtf

# Solution 2: Check reference genome is accessible
ls -lh genome/genome.fa
samtools faidx genome/genome.fa   # Build index if missing
```

**Problem**: "kmer_redux utilities not available"
```bash
# Solution: Verify utils module is installed
python -c "from flync.utils import kmer_redux; print('OK')"

# Reinstall if needed
pip install -e .
```

### ML Prediction Issues

**Problem**: "schema mismatch" error during prediction
```bash
# Solution: Feature transformations must match training
# Ensure these flags are set correctly:
flync run-ml --gtf input.gtf --output predictions.csv \
    --ref-genome genome.fa
# (Default model expects: use_tfidf=True, use_dim_redux=True, redux_n_components=1)
```
**Problem**: Predictions are all 0 or all 1
```bash
# Solution 1: Check input GTF quality
# Ensure transcripts are complete and have exons

# Solution 2: Verify feature extraction succeeded
# Check for warnings in the logs

# Solution 3: Use a different model or retrain
flync run-ml --gtf input.gtf --model custom_model.pkl ...
```

**Problem**: Out of memory during feature extraction
```bash
# Solution 1: Reduce threads
flync run-ml --threads 4 ...

# Solution 2: Process in smaller batches
# Split the GTF and process the parts separately

# Solution 3: Use sparse k-mer format (automatic with default settings)
```

### Docker Issues

**Problem**: Docker permission denied
```bash
# Solution 1: Add user to docker group
sudo usermod -aG docker $USER
newgrp docker

# Solution 2: Run with sudo
sudo docker run ...
```

**Problem**: Docker container out of disk space
```bash
# Clean up old containers and images
docker system prune -a

# Check disk usage
docker system df
```

---

## Contributing

Contributions are welcome! Please follow these guidelines:

### Development Setup

```bash
# Clone and setup development environment
git clone https://github.com/homemlab/flync.git
cd flync
git checkout master

# Create development environment
conda env create -f environment.yml
conda activate flync

# Install in development mode
pip install -e .

# Optional: Install development dependencies
pip install pytest black flake8 mypy
```

### Code Style

- **Python**: Follow PEP 8, use the Black formatter (line length 100)
- **Type Hints**: Required for public functions
- **Docstrings**: Google style for all modules, classes, and functions
- **Imports**: Absolute imports preferred (`from flync.module import Class`)

### Testing

```bash
# Run tests (when implemented)
pytest tests/

# Format code
black src/flync/

# Type checking
mypy src/flync/
```

### Workflow for Contributions
Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Make your changes with clear commit messages\n4. Ensure code passes style checks and tests\n5. Update documentation if needed\n6. Submit a pull request to the `master` branch\n\n### Reporting Issues\n\n- Use GitHub Issues: https://github.com/homemlab/flync/issues\n- Include:\n - FLYNC version (`flync --version`)\n - Operating system and version\n - Minimal reproducible example\n - Error messages and logs\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n",
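As a companion to the code-style guidelines under Contributing, here is a minimal sketch of what a conforming function looks like (type hints on a public function plus a Google-style docstring). The function itself is illustrative only and not part of FLYNC's API:

```python
def gc_content(sequence: str) -> float:
    """Compute the GC fraction of a nucleotide sequence.

    Args:
        sequence: A DNA sequence; case-insensitive, may be empty.

    Returns:
        The fraction of bases that are G or C, or 0.0 for an empty input.
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)
```

Code written in this shape passes `black` at line length 100 and type-checks cleanly with `mypy`, matching the commands shown in the Testing section.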