dsf-aml-sdk

- Name: dsf-aml-sdk
- Version: 1.0.26
- Homepage: https://github.com/jaimeajl/dsf-aml-sdk
- Summary: SDK for DSF Adaptive ML with Knowledge Distillation
- Upload time: 2025-10-08 04:13:55
- Author: api-dsfuptech
- Requires Python: >=3.7
- Keywords: dsf, aml, ml, machine-learning, distillation, adaptive, sdk
# DSF AML SDK

Reduce ML training data requirements by 70–90% through adaptive evaluation and knowledge distillation. Train surrogate models ~10× faster and cut infra costs.

---

## Why DSF AML?

Traditional ML needs thousands of labeled examples and hours of training. DSF AML uses adaptive formula evaluation + knowledge distillation to create fast, lightweight models from domain expertise with minimal data.

---

## Core Concepts

Define weighted evaluation rules from domain knowledge. The Enterprise tier adapts parameters over time, and Professional/Enterprise tiers can distill the evaluator into an ultra-fast linear surrogate for large-scale or edge inference.

**Non-linear mode note**: the backend expects `data['adjustments_values'] = { field_name: { adj_name: value } }`.
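Concretely, the shape above means each field maps to its own dict of adjustment values. A minimal illustration (field and adjustment names here are hypothetical, chosen only to show the nesting):

```python
# Hypothetical payload illustrating the nesting the backend expects:
# data['adjustments_values'] = { field_name: { adj_name: value } }
data = {
    'performance': 0.87,
    'latency': 95,
    'adjustments_values': {
        'performance': {'new_customer_bonus': 0.1, 'peak_hours_penalty': -0.05},
    },
}
```
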

---

## Installation

```bash
pip install dsf-aml-sdk
```

**Custom Backend URL (SDK):**

```python
import os
from dsf_aml_sdk import AMLSDK
sdk = AMLSDK(base_url=os.getenv("DSF_AML_BASE_URL"), tier="community")
```

---

## Quick Start

### Community Edition

```python
from dsf_aml_sdk import AMLSDK

sdk = AMLSDK()  # community

config = (sdk.create_config()
  .add_field('model_accuracy',  default=0.95, importance=2.5, sensitivity=2.0)
  .add_field('training_epochs', default=100,  importance=1.8, sensitivity=1.5)
  .add_field('validation_loss', default=0.05, importance=2.2, sensitivity=2.5)
  .add_field('model_name',      default='baseline', importance=1.0, string_floor=0.1)
)

experiment = {'model_accuracy': 0.96, 'training_epochs': 105, 'validation_loss': 0.048}
result = sdk.evaluate(experiment, config)
print(f"Score: {result.score:.3f}")
```

---

### Professional Edition

```python
sdk = AMLSDK(license_key='PRO-2026-12-31-XXXX', tier='professional')

# Bootstrap config from labeled examples (Professional+)
labeled_examples = [
    {'model_accuracy': 0.92, 'training_epochs': 50, 'label': 1},
    {'model_accuracy': 0.85, 'training_epochs': 30, 'label': 0},
    # ... minimum 20 examples
]
suggested_config = sdk.bootstrap_config(labeled_examples)

# Batch evaluation (limit per request = BATCH_MAX_ITEMS; default 1000)
experiments = [
  {'model_accuracy': 0.92, 'training_epochs': 50,  'validation_loss': 0.08},
  {'model_accuracy': 0.95, 'training_epochs': 100, 'validation_loss': 0.05},
  {'model_accuracy': 0.97, 'training_epochs': 150, 'validation_loss': 0.03},
]

results = sdk.batch_evaluate(experiments, config)
metrics = sdk.get_metrics()  # requires prior evaluate/batch_evaluate with config
```

---

### Enterprise: Pipeline + Distillation

```python
sdk = AMLSDK(license_key='ENT-2026-12-31-XXXX', tier='enterprise')

# 1) Seeds (all tiers)
seeds = sdk.pipeline_identify_seeds(dataset=training_data, config=config, top_k_percent=0.1)

# 2) Critical generation
# - Community: demo (1 variant)
# - Professional: rejected by backend (upgrade to Enterprise)
# - Enterprise: full generation
gen = sdk.pipeline_generate_critical(config=config, original_dataset=training_data)

# 3) Full cycle (Enterprise)
full = sdk.pipeline_full_cycle(dataset=training_data, config=config, max_iterations=3)

# 4) Distillation (train/predict Pro+; export Enterprise only)
sdk.distill_train(config, samples=1000, batch_size=100, seed=42)
fast_score = sdk.distill_predict(data=some_item, config=config)
artifact = sdk.distill_export()   # Enterprise only
```

---

## Rate Limits

| Tier         | Evaluations/Day | Batch Size                              | Seeds Preview                    |
|--------------|-----------------|-----------------------------------------|----------------------------------|
| Community    | 100             | ❌ Not available                         | Configurable (default: 10)      |
| Professional | 10,000          | ✅ Up to `BATCH_MAX_ITEMS` (default 1000) | Unlimited                      |
| Enterprise   | Unlimited       | ✅ Up to `BATCH_MAX_ITEMS` (default 1000) | Unlimited                      |

---

## Pipeline 2-in-1: Decision Boundary Focus

Reduce the dataset by 70–90% while preserving information at decision boundaries.

```python
# Seeds (all tiers). Cache 1h.
seeds_result = sdk.pipeline_identify_seeds(dataset=training_data, config=config, top_k_percent=0.1)
print("Seeds:", seeds_result['seeds_count'])

variants_result = sdk.pipeline_generate_critical(
  config=config,
  original_dataset=training_data,
  k_variants=5,
  epsilon=0.05,            # auto-tune based on previous acceptance rate
  non_critical_ratio=0.15,
  diversity_threshold=0.95,
  max_seeds_to_process=100
)
print("Generated:", variants_result['total_generated'])
```

**Auto-tuning (Enterprise)**: epsilon adjusts based on previous acceptance rate (stored 1h in `acc_rate:{license}`).
- If rate < 1% → epsilon += 0.02 (≤ 0.25)
- If rate > 30% → epsilon -= 0.02 (≥ 0.02)
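The two rules above amount to a step-and-clamp update. A client-side sketch for intuition (the actual tuning runs on the backend; this is not part of the SDK API):

```python
def autotune_epsilon(epsilon: float, acceptance_rate: float) -> float:
    """Sketch of the documented epsilon auto-tuning rules.

    acceptance_rate is a fraction in [0, 1]; the backend stores the
    previous rate for 1h under acc_rate:{license}.
    """
    if acceptance_rate < 0.01:        # almost nothing accepted -> widen search
        epsilon = min(epsilon + 0.02, 0.25)
    elif acceptance_rate > 0.30:      # too much accepted -> tighten search
        epsilon = max(epsilon - 0.02, 0.02)
    return epsilon                    # otherwise unchanged
```
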

---

## Curriculum Learning (Enterprise)

```python
sdk = AMLSDK(license_key='ENT-...', tier='enterprise')

init  = sdk.curriculum_init(dataset=training_data,  config=config, top_k_percent=0.1)
step  = sdk.curriculum_step(dataset=current_batch, config=config, precomputed_metrics={'max_iterations': 5})
status = sdk.curriculum_status()
print(status.get('state', {}).get('status'))
```

State/iterations persist in Redis (TTL ~1 day).

---

## Bootstrap Configuration (Professional/Enterprise)

Generate initial configuration from labeled examples:

```python
sdk = AMLSDK(license_key='PRO-...', tier='professional')

# Minimum 20 labeled examples required
labeled_data = [
    {'accuracy': 0.92, 'epochs': 100, 'label': 1},  # success
    {'accuracy': 0.78, 'epochs': 50,  'label': 0},  # failure
    # ...
]

config = sdk.bootstrap_config(labeled_data)
# Returns optimized importance and sensitivity based on correlation with success
```

---

## Non-Linear Evaluation Mode (Professional/Enterprise)

```python
config = {'performance': {'default': 0.85, 'importance': 2.0},
          'latency':     {'default': 100,  'importance': 1.5}}

adjustments = {'new_customer_bonus': 0.5, 'peak_hours_penalty': 0.3}
adjustment_values = {'performance': {'new_customer_bonus': 0.1, 'peak_hours_penalty': -0.05}}

result = sdk.evaluate_nonlinear(
  data={'performance': 0.87, 'latency': 95},
  config=config,
  adjustments=adjustments,
  adjustment_values=adjustment_values  # sent as data['adjustments_values']
)
print("Adjusted score:", result.score)
```

---

## Error Handling

```python
from dsf_aml_sdk import AMLSDK, LicenseError, ValidationError, APIError

try:
  sdk = AMLSDK(license_key='invalid', tier='enterprise')
  sdk.distill_train(config, samples=1000)
except LicenseError:
  sdk = AMLSDK()  # fallback to community
except ValidationError as e:
  print("Invalid config:", e)
except APIError as e:
  print("API failure:", e)

# get_metrics requires prior evaluate/batch_evaluate with config
```

**Backend limits & statuses**:
- `413` – `data_batch` or `dataset` too large (defaults: `BATCH_MAX_ITEMS=1000`, `DATASET_MAX_ITEMS=10000`)
- `403` – invalid license or tier not permitted
- `404` – unknown action/state
- `502` – export failure
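Since requests over `BATCH_MAX_ITEMS` are rejected with 413, large batches can be pre-chunked on the client. A minimal sketch, assuming the `batch_evaluate` signature shown earlier (the helper itself is not part of the SDK):

```python
BATCH_MAX_ITEMS = 1000  # backend default; confirm against your deployment

def chunked(items, size=BATCH_MAX_ITEMS):
    """Yield successive slices no larger than the batch limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage sketch:
# results = []
# for chunk in chunked(experiments):
#     results.extend(sdk.batch_evaluate(chunk, config))
```
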

---

## Tier Comparison

| Feature                       | Community         | Professional                      | Enterprise                       |
|------------------------------|--------------------|-----------------------------------|----------------------------------|
| Single evaluation            | ✅ (100/day)      | ✅ (10k/day)                      | ✅ (unlimited)                   |
| Batch evaluation             | ❌                | ✅ (up to BATCH_MAX_ITEMS)        | ✅ (up to BATCH_MAX_ITEMS)       |
| Performance metrics          | ❌                | ✅                                | ✅ (enhanced)                    |
| Adaptive learning            | ❌                | ✅ Light                          | ✅ Full                          |
| Bootstrap configuration      | ❌                | ✅                                | ✅                               |
| pipeline_identify_seeds      | ✅                | ✅                                | ✅                               |
| pipeline_generate_critical   | Demo (1)           | ❌                                | ✅ Full                          |
| pipeline_full_cycle          | ❌                | ❌                                | ✅                               |
| Curriculum learning          | ❌                | ❌                                | ✅                               |
| Non-linear evaluation        | ❌                | ✅                                | ✅                               |
| Distillation (train/predict) | ❌                | ✅                                | ✅                               |
| Model export (surrogate)     | ❌                | ❌                                | ✅                               |
| Redis hot store / caching    | ❌                | ✅                                | ✅                               |
| Auto-tuning                  | ❌                | ⚠️ (limited)                      | ✅                               |
| Support                      | Community          | Email                             | Priority SLA                     |

---

## Enterprise Features

### Full Adaptive Learning

```python
sdk = AMLSDK(license_key='ENT-...', tier='enterprise')
_ = sdk.batch_evaluate(batches[0], config)
metrics = sdk.get_metrics()
print(metrics.get('weight_changes'), metrics.get('adjusted_fields'))
```

---

### Knowledge Distillation Performance

```python
sdk = AMLSDK(license_key='PRO-...', tier='professional')

import time
t0 = time.time(); _ = sdk.evaluate(data, config); t_full = time.time() - t0

sdk.distill_train(config, samples=1000)
t1 = time.time(); _ = sdk.distill_predict(data, config); t_surr = time.time() - t1

print(f"Speedup: {t_full / max(t_surr, 1e-6):.1f}×")
```

---

## API Reference (SDK)

### Initialization

`AMLSDK(tier='community'|'professional'|'enterprise', license_key=None, base_url=None)`

---

### Evaluation

- `evaluate(data, config)` – single evaluation
- `batch_evaluate(data_points, config)` – Pro/Ent (tier limits apply)
- `evaluate_nonlinear(data, config, adjustments, adjustment_values)` – Pro/Ent
- `get_metrics()` – requires prior evaluate/batch_evaluate with config (not community)

---

### Configuration

- `bootstrap_config(labeled_examples)` – Pro/Ent (min 20 examples)

---

### Pipeline

- `pipeline_identify_seeds(dataset, config, top_k_percent=0.1, max_seeds_preview=10)`
- `pipeline_generate_critical(config, original_dataset, seeds=None, **kwargs)`
  - Params: `k_variants`, `epsilon`, `non_critical_ratio`, `diversity_threshold`, `max_seeds_to_process`, `vectors_for_dedup` (optional)
  - If seeds not provided, retrieved from cache
- `pipeline_full_cycle(dataset, config, max_iterations=5, **kwargs)` – Enterprise

---

### Curriculum (Enterprise)

- `curriculum_init(dataset, config, **params)`
- `curriculum_step(dataset, config, precomputed_metrics=None)`
- `curriculum_status()`

---

### Distillation (Professional/Enterprise)

- `distill_train(config, samples=1000, batch_size=100, seed=42, adjustments=None)`
- `distill_predict(data, config)` → float score
- `distill_predict_batch(data_batch, config)` → List[float]
- `distill_export()` – Enterprise (export to Supabase)

---

### Configuration parameters (per field)

- `default` – reference value
- `importance` – field relevance (0.0–5.0)
- `sensitivity` – sensitivity factor (0.0–5.0)
- `string_floor` – minimum match for string mismatch (0.0–1.0; default 0.1)
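A small client-side check against the documented ranges can catch config typos before a request is sent. A sketch only; the backend performs its own validation, and the defaults assumed here (`importance`/`sensitivity` 1.0, `string_floor` 0.1) are illustrative:

```python
def validate_field(name: str, field: dict) -> None:
    """Check one field's config against the documented parameter ranges."""
    if not 0.0 <= field.get('importance', 1.0) <= 5.0:
        raise ValueError(f"{name}: importance must be in 0.0-5.0")
    if not 0.0 <= field.get('sensitivity', 1.0) <= 5.0:
        raise ValueError(f"{name}: sensitivity must be in 0.0-5.0")
    if not 0.0 <= field.get('string_floor', 0.1) <= 1.0:
        raise ValueError(f"{name}: string_floor must be in 0.0-1.0")
```
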

---

## Use Cases

### Experiment Scoring

```python
config = {
  'train_acc': {'default': 0.92, 'importance': 2.0},
  'val_acc':   {'default': 0.88, 'importance': 2.5},
  'train_loss':{'default': 0.1,  'importance': 1.8},
  'gap':       {'default': 0.04, 'importance': 2.2},
}
result = sdk.evaluate(experiment_metrics, config)
```

---

### Boundary-Focused Reduction

```python
result = sdk.pipeline_full_cycle(dataset=full_training_data, config=config, max_iterations=3)
print(result['final_size'])
```

---

## Performance Benefits

- **Data efficiency**: 100–1,000 examples + rules (vs 10k+)
- **Training speed**: surrogate ≈ 10× faster
- **Pipeline processing**: 70–90% reduction maintaining accuracy
- **Deployment size**: surrogate artifacts are tiny

---

## FAQ

**Q: How accurate are surrogate models?**  
A: Typical MAE of 0.01–0.05 on normalized scores.

**Q: What is Pipeline 2-in-1?**  
A: Combines filtering and generation at decision boundaries.

**Q: Why does get_metrics() fail?**  
A: You must first call evaluate() or batch_evaluate() with a valid config.

**Q: How does epsilon auto-tuning work?**  
A: Adjusts based on previous acceptance rate (stored 1h in Redis).

**Q: Do I need to generate vectors for deduplication?**  
A: No, auto-generated if not provided.

**Q: How long are pipeline seeds cached?**  
A: 3600 seconds (1 hour) in Redis.

**Q: Can Community tier use pipeline_identify_seeds?**  
A: Yes. Preview size is configurable (default 10 via max_seeds_preview).

**Q: Does Professional tier have access to distillation?**  
A: Yes, both Professional and Enterprise tiers.

---

## Support

- **Docs**: https://docs.dsf-aml.ai
- **Issues**: https://github.com/dsf-aml/sdk/issues
- **Enterprise**: contacto@softwarefinanzas.com.co

---

## License

MIT for Community. Professional/Enterprise under commercial terms.  
© 2025 DSF AML SDK. Adaptive ML powered by Knowledge Distillation.

            
