upmex

- Name: upmex
- Version: 1.6.6
- Summary: Universal Package Metadata Extractor - Extract metadata from various package formats
- Upload time: 2025-10-28 01:29:25
- Requires Python: >=3.8
- License: MIT
- Keywords: package, metadata, extractor, license, detection, python, npm, maven, jar, wheel, pypi

# UPMEX - Universal Package Metadata Extractor

Extract metadata and license information from various package formats with a single tool.

## Features

### Core Capabilities
- **Universal Package Support**: Extract metadata from 15 package ecosystems
- **Multi-Format Detection**: Automatic package type identification
- **Standardized Output**: Consistent JSON structure across all formats
- **Native Extraction**: No dependency on external package managers
- **High Performance**: Process packages up to 500 MB in under 10 seconds

### Advanced Features
- **NO-ASSERTION Handling**: Clear indication for unavailable data
- **Dependency Mapping**: Full dependency tree with version constraints
- **Author Parsing**: Intelligent name/email extraction and normalization
- **Repository Detection**: Automatic VCS URL extraction
- **Platform Support**: Architecture and OS requirement detection
- **Package URL (PURL)**: Generate standard Package URLs for all packages
- **File Hashing**: SHA-1, MD5, and fuzzy hash (TLSH) for package files
- **JSON Organization**: Structured output with package, metadata, people, licensing, dependencies sections
- **Data Provenance**: Track source of each data field for attestation
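
The file-hashing item above can be illustrated with the standard library. This is a minimal sketch, not UPMEX's own implementation: the function name `hash_file` and its parameters are hypothetical, and TLSH is left out because it needs a third-party package.

```python
import hashlib
from pathlib import Path


def hash_file(path, algorithms=("sha1", "md5"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at `path` (hypothetical helper)."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with Path(path).open("rb") as handle:
        # Read in chunks so a large package never has to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Feeding every hasher from a single pass over the file keeps the I/O cost the same no matter how many digests are requested.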

### Supported Ecosystems
- **Python**: wheel (.whl), sdist (.tar.gz, .zip)
- **NPM/Node.js**: .tgz, .tar.gz packages
- **Java/Maven**: .jar, .war, .ear with POM support
- **Gradle**: build.gradle, build.gradle.kts files
- **CocoaPods**: .podspec, .podspec.json files
- **Conda**: .conda (zip), .tar.bz2 packages
- **Perl/CPAN**: .tar.gz, .zip with META.json/yml
- **Conan C/C++**: conanfile.py, conanfile.txt, .tgz packages
- **Ruby Gems**: .gem packages
- **Rust Crates**: .crate packages
- **Go Modules**: .zip archives, go.mod files
- **NuGet/.NET**: .nupkg packages
- **Debian**: .deb packages
- **RPM**: .rpm packages
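
Several of the formats listed above share a suffix (`.tar.gz`, `.zip`, `.tgz`), so a file name alone only narrows the candidates. The sketch below shows that first narrowing step using the suffixes from this list. It is an illustration only: the table and the `candidate_ecosystems` function are hypothetical, and how UPMEX actually resolves ambiguous archives is not described here.

```python
from pathlib import Path

# Suffix -> candidate ecosystems, transcribed from the list above.
# Shared suffixes deliberately map to more than one candidate.
SUFFIX_CANDIDATES = {
    ".whl": ["python"],
    ".jar": ["java/maven"],
    ".war": ["java/maven"],
    ".ear": ["java/maven"],
    ".podspec": ["cocoapods"],
    ".podspec.json": ["cocoapods"],
    ".conda": ["conda"],
    ".tar.bz2": ["conda"],
    ".gem": ["ruby"],
    ".crate": ["rust"],
    ".nupkg": ["nuget"],
    ".deb": ["debian"],
    ".rpm": ["rpm"],
    ".tgz": ["npm", "conan"],
    ".tar.gz": ["python", "npm", "perl"],
    ".zip": ["python", "perl", "go"],
}

# Manifest files that are recognized by their exact name.
EXACT_NAMES = {
    "build.gradle": ["gradle"],
    "build.gradle.kts": ["gradle"],
    "conanfile.py": ["conan"],
    "conanfile.txt": ["conan"],
    "go.mod": ["go"],
}


def candidate_ecosystems(filename):
    """Return the ecosystems a file name could belong to (empty list if unknown)."""
    name = Path(filename).name.lower()
    if name in EXACT_NAMES:
        return list(EXACT_NAMES[name])
    # Check longer suffixes first so ".podspec.json" wins over shorter matches.
    for suffix in sorted(SUFFIX_CANDIDATES, key=len, reverse=True):
        if name.endswith(suffix):
            return list(SUFFIX_CANDIDATES[suffix])
    return []
```

When more than one candidate comes back, the archive contents (for example a `package.json` or `PKG-INFO` member) would have to settle the question.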

### License Detection
- **Powered by OSLiLi**: Uses the external [semantic-copycat-oslili](https://github.com/oscarvalenzuelab/semantic-copycat-oslili) library (v1.5.0+) for license detection
- **Simplified Integration**: UPMEX extracts license-related files and delegates detection to OSLiLi
- **Detection Coverage**:
  - SPDX identifiers in package metadata
  - License files (LICENSE, COPYING, etc.)
  - Package manifest license fields
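
The "extract license-related files" step can be pictured as scanning an archive's member names. The following is a rough sketch under stated assumptions: the name pattern and the `find_license_files` helper are hypothetical, and neither UPMEX's real selection rules nor the OSLiLi API is shown.

```python
import re
import tarfile
import zipfile

# Illustrative pattern: LICENSE / LICENCE / COPYING, optionally followed by
# an extension or a suffix such as "-MIT". The real rules may differ.
LICENSE_NAME = re.compile(r"^(licen[sc]e|copying)([.\-_].*)?$", re.IGNORECASE)


def find_license_files(archive_path):
    """List archive members whose base name looks like a license file."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            names = archive.namelist()
    elif tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as archive:
            names = archive.getnames()
    else:
        raise ValueError(f"not a zip or tar archive: {archive_path}")
    # Match on the last path component only, so nested locations are found too.
    return [name for name in names if LICENSE_NAME.match(name.rsplit("/", 1)[-1])]
```

The matched members would then be read and handed to a license detector; that hand-off is outside the scope of this sketch.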

### API Integrations & Enrichment
- **Registry Mode**: Fetches missing metadata from package registries (Maven Central, etc.)
- **API Enrichment**: External third-party API integrations for enhanced data
  - **ClearlyDefined**: License and compliance data enrichment
  - **Ecosyste.ms**: Package registry metadata and dependencies
  - **PurlDB**: Comprehensive package metadata from Package URL database
  - **VulnerableCode**: Security vulnerability scanning and assessment
- **Enrichment Tracking**: Full transparency on data sources and applied fields
- **Offline-First**: All core features work without internet connectivity
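
Services such as PurlDB are organized around Package URLs, which UPMEX lists among its outputs. To show what such an identifier looks like, here is a simplified builder. It is a sketch that covers only type, namespace, name, and version; `build_purl` is a hypothetical helper, not part of the UPMEX API, and qualifiers and subpaths are omitted.

```python
from urllib.parse import quote


def build_purl(pkg_type, name, version=None, namespace=None):
    """Build a simplified Package URL: pkg:type/namespace/name@version."""
    pkg_type = pkg_type.lower()
    if pkg_type == "pypi":
        # The purl spec normalizes PyPI names: lowercase, "_" becomes "-".
        name = name.lower().replace("_", "-")
    parts = [pkg_type]
    if namespace:
        # Each namespace segment is percent-encoded separately.
        parts.extend(quote(segment, safe="") for segment in namespace.split("/"))
    parts.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl
```

For example, `build_purl("maven", "commons-lang3", "3.12.0", namespace="org.apache.commons")` gives `pkg:maven/org.apache.commons/commons-lang3@3.12.0`.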

## Installation

```bash
# Install from PyPI
pip install upmex

# Install from source
git clone https://github.com/SemClone/upmex.git
cd upmex
pip install -e .

# Install for development
pip install -e ".[dev]"
```

## Quick Start

```python
import json

from upmex import PackageExtractor

# Create extractor
extractor = PackageExtractor()

# Extract metadata from a package
metadata = extractor.extract("path/to/package.whl")

# Access metadata
print(f"Package: {metadata.name} v{metadata.version}")
print(f"Type: {metadata.package_type.value}")
print(f"License: {metadata.licenses[0].spdx_id if metadata.licenses else 'Unknown'}")

# Convert to JSON
print(json.dumps(metadata.to_dict(), indent=2))
```
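
The same two calls, `extract()` and `to_dict()`, can be looped over a directory. The helper below is a sketch: `extract_directory` is a hypothetical name, and because the exception types UPMEX raises are not documented here, it catches `Exception` broadly and records the message.

```python
from pathlib import Path


def extract_directory(extractor, directory, pattern="*"):
    """Run extractor.extract() on each matching file; return (results, errors)."""
    results, errors = {}, {}
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        try:
            # extract() and to_dict() are the calls shown in the Quick Start.
            results[path.name] = extractor.extract(str(path)).to_dict()
        except Exception as exc:  # assumption: failure modes are not documented
            errors[path.name] = str(exc)
    return results, errors
```

A call might look like `results, errors = extract_directory(PackageExtractor(), "dist", "*.whl")`. Passing the extractor in as an argument keeps the helper easy to test with a stand-in object.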

## CLI Usage

```bash
# Basic extraction (offline mode - default)
upmex extract package.whl

# Registry mode - fetches missing metadata from package registries
upmex extract --registry package.jar

# API enrichment - query specific third-party APIs
upmex extract --api clearlydefined package.whl
upmex extract --api ecosystems package.jar
upmex extract --api purldb package.gem
upmex extract --api vulnerablecode package.jar
upmex extract --api all package.whl

# Combined registry and API enrichment
upmex extract --registry --api all package.jar

# With pretty JSON output
upmex extract --pretty package.whl

# Output to file
upmex extract package.whl -o metadata.json

# Text format output
upmex extract --format text package.tar.gz

# Detect package type
upmex detect package.jar

# Extract license information
upmex license package.tgz
```
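
The CLI can also be driven from a script. The sketch below only uses the `extract` subcommand and the `--registry` and `--api` flags shown above. It assumes that the default output is JSON printed to stdout, which the examples suggest but do not state outright. `build_command` and `run_extract` are hypothetical helper names.

```python
import json
import subprocess


def build_command(package_path, registry=False, api=None):
    """Assemble an `upmex extract` argument list from the flags shown above."""
    command = ["upmex", "extract"]
    if registry:
        command.append("--registry")
    if api:
        # One API name per call, e.g. "clearlydefined" or "all".
        command.extend(["--api", api])
    command.append(package_path)
    return command


def run_extract(package_path, **options):
    """Run the CLI and parse its stdout (assumed to be JSON) into a dict."""
    completed = subprocess.run(
        build_command(package_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Building the argument list separately from running it makes the command easy to log or inspect before any process is started.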

## Configuration

UPMEX can be configured through a JSON file or through environment variables.

### Environment Variables

```bash
# API Keys
export PME_CLEARLYDEFINED_API_KEY=your-api-key
export PME_ECOSYSTEMS_API_KEY=your-api-key
export PME_PURLDB_API_KEY=your-api-key
export PME_VULNERABLECODE_API_KEY=your-api-key

# Settings
export PME_LOG_LEVEL=DEBUG
export PME_CACHE_DIR=/path/to/cache
export PME_OUTPUT_FORMAT=json

```
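
When debugging a setup, it can help to see which `PME_`-prefixed variables are actually set. The helper below is a sketch for that purpose only; `read_pme_settings` is a hypothetical name, and this is not how UPMEX itself loads its configuration.

```python
import os

PREFIX = "PME_"


def read_pme_settings(environ=None):
    """Collect non-empty PME_* variables into a dict keyed by lowercased suffix."""
    environ = os.environ if environ is None else environ
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX) and value
    }
```

Avoid printing the returned values verbatim in shared logs, since the `*_api_key` entries are secrets.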

### Configuration File

Create a `config.json`:

```json
{
  "api": {
    "clearlydefined": {
      "enabled": true,
      "api_key": null
    }
  },
  "output": {
    "format": "json",
    "pretty_print": true
  }
}
```
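
A partial `config.json` like the one above is typically layered over defaults. The sketch below shows one way to do that with a recursive merge. The `DEFAULTS` values, `deep_merge`, and `load_config` are all hypothetical, and where UPMEX looks for its configuration file is not specified here.

```python
import copy
import json

# Hypothetical defaults, shaped like the example file above.
DEFAULTS = {
    "api": {"clearlydefined": {"enabled": False, "api_key": None}},
    "output": {"format": "json", "pretty_print": False},
}


def deep_merge(base, override):
    """Return a new dict where nested keys in `override` replace those in `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path, defaults=DEFAULTS):
    """Read a JSON config file and layer it over the defaults."""
    with open(path, encoding="utf-8") as handle:
        return deep_merge(defaults, json.load(handle))
```

Merging recursively means a file that sets only `output.pretty_print` leaves `output.format` and the `api` section at their default values.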

## Supported Package Types

| Ecosystem | Formats | Detection | Metadata | Online Mode | Tested |
|-----------|---------|-----------|----------|-------------|--------|
| Python | .whl, .tar.gz, .zip | ✓ | ✓ | Registry & API | ✓ |
| NPM | .tgz, .tar.gz | ✓ | ✓ | Registry & API | ✓ |
| Java | .jar, .war, .ear | ✓ | ✓ | Registry & API | ✓ |
| Maven | .jar with POM | ✓ | ✓ | Registry & API | ✓ |
| Gradle | build.gradle(.kts) | ✓ | ✓ | Registry & API | ✓ |
| CocoaPods | .podspec(.json) | ✓ | ✓ | Registry & API | ✓ |
| Conda | .conda, .tar.bz2 | ✓ | ✓ | Registry & API | ✓ |
| Perl/CPAN | .tar.gz, .zip | ✓ | ✓ | Registry & API | ✓ |
| Conan | conanfile.py/.txt | ✓ | ✓ | Registry & API | ✓ |
| Ruby | .gem | ✓ | ✓ | Registry & API | ✓ |
| Rust | .crate | ✓ | ✓ | Registry & API | ✓ |
| Go | .zip, .mod, go.mod | ✓ | ✓ | Registry & API | ✓ |
| NuGet | .nupkg | ✓ | ✓ | Registry & API | ✓ |
| Debian | .deb | ✓ | ✓ | Registry & API | ✓ |
| RPM | .rpm | ✓ | ✓ | Registry & API | ✓ |


## Changelog

See [CHANGELOG.md](CHANGELOG.md) for a detailed history of changes.

## License

MIT License - see LICENSE file for details.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "upmex",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "\"Oscar Valenzuela B.\" <oscar.valenzuela.b@gmail.com>",
    "keywords": "package, metadata, extractor, license, detection, python, npm, maven, jar, wheel, pypi",
    "author": null,
    "author_email": "\"Oscar Valenzuela B.\" <oscar.valenzuela.b@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/8c/20/6484d6b0afa7cf4097cc0050281a3d0c21f021724cd286385e06e62d77f0/upmex-1.6.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Universal Package Metadata Extractor - Extract metadata from various package formats",
    "version": "1.6.6",
    "project_urls": {
        "Changelog": "https://github.com/SemClone/upmex/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/SemClone/upmex#readme",
        "Homepage": "https://github.com/SemClone/upmex",
        "Issues": "https://github.com/SemClone/upmex/issues",
        "Repository": "https://github.com/SemClone/upmex"
    },
    "split_keywords": [
        "package",
        " metadata",
        " extractor",
        " license",
        " detection",
        " python",
        " npm",
        " maven",
        " jar",
        " wheel",
        " pypi"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b0b4721bf3a90adf217fd84e5f972afef27178be77ce537af49902c2f037d36e",
                "md5": "899e05b882558468d37517f2aa907038",
                "sha256": "68f146e3cdc268ca08419b1c011747c1cf7789a49bea3bcf8093a58d622eca98"
            },
            "downloads": -1,
            "filename": "upmex-1.6.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "899e05b882558468d37517f2aa907038",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 89700,
            "upload_time": "2025-10-28T01:29:23",
            "upload_time_iso_8601": "2025-10-28T01:29:23.347960Z",
            "url": "https://files.pythonhosted.org/packages/b0/b4/721bf3a90adf217fd84e5f972afef27178be77ce537af49902c2f037d36e/upmex-1.6.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8c206484d6b0afa7cf4097cc0050281a3d0c21f021724cd286385e06e62d77f0",
                "md5": "233d766732aa3dc578dc0b14aaf0b92b",
                "sha256": "c2b1ed525e68be97c163a0788824aa3a7bc079dd90269321f82c5a3504b3c4f0"
            },
            "downloads": -1,
            "filename": "upmex-1.6.6.tar.gz",
            "has_sig": false,
            "md5_digest": "233d766732aa3dc578dc0b14aaf0b92b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 74184,
            "upload_time": "2025-10-28T01:29:25",
            "upload_time_iso_8601": "2025-10-28T01:29:25.832817Z",
            "url": "https://files.pythonhosted.org/packages/8c/20/6484d6b0afa7cf4097cc0050281a3d0c21f021724cd286385e06e62d77f0/upmex-1.6.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-28 01:29:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SemClone",
    "github_project": "upmex",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "upmex"
}
        