# Unknown Data - Digital Forensics Data Processing Library
[Python 3.8+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[PyPI](https://badge.fury.io/py/unknown-data)
A comprehensive Python library for parsing and processing digital forensics data artifacts. This library provides standardized interfaces for handling various types of forensic data including browser artifacts, deleted files, link files, messenger data, prefetch files, and USB device information.
## Features
- **Multi-format Support**: Handle browser data, deleted files, link files, messenger data, prefetch files, and USB artifacts
- **Standardized Processing**: Consistent interface for all data types
- **Flexible Data Sources**: Support for local files and cloud storage (S3)
- **DataFrame Output**: All processed data is converted to pandas DataFrames for easy analysis
- **Comprehensive Logging**: Built-in logging for debugging and monitoring
- **Type Safety**: Full type hints support for better development experience
## Installation
```bash
pip install unknown-data
```
## Quick Start
### Local File Processing
```python
from unknown_data import Category, Encoder, DataLoader, DataSaver
# Load data from local file
loader = DataLoader()
data = loader.local_data_load(Category.BROWSER, "./data/browser_results.json")
# Process the data
encoder = Encoder()
browser_encoder = encoder.convert_data(data, Category.BROWSER)
# Get processed results
results = browser_encoder.get_result_dfs()
# Save results to CSV files
saver = DataSaver("./output/path")
saver.save_all(results)
```
### AWS S3 Integration
```python
from unknown_data import Category, DataLoader
# Configure S3 access with task_id structure
s3_config = {
    'bucket': 'your-forensic-data-bucket',
    'task_id': '550e8400-e29b-41d4-a716-446655440000',  # UUID format
    'region': 'us-west-2',   # optional
    'profile': 'forensics'   # optional AWS profile
}
# Load data from S3
# S3 path will be: {bucket}/{task_id}/browser_data.json
loader = DataLoader()
browser_data = loader.s3_data_load(Category.BROWSER, s3_config)
# The data is automatically loaded and ready for processing
print(f"Loaded {len(browser_data.get('browser_data', []))} browser records")
# Load different types of data from the same task
deleted_data = loader.s3_data_load(Category.DELETED, s3_config)
usb_data = loader.s3_data_load(Category.USB, s3_config)
```
## Supported Data Types
### Browser Artifacts
- History (URLs, visits, downloads)
- Cookies
- Login data
- Web data
### Deleted Files
- MFT deleted files
- Recycle bin files
- Collection metadata
### Other Artifacts
- Link (LNK) files
- Messenger data
- Prefetch files
- USB device information
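These data types map onto the `Category` enum used throughout the API. `Category.BROWSER`, `Category.DELETED`, and `Category.USB` appear in the examples above; the loop below is a sketch that simply assumes `Category` is a standard Python `Enum` whose remaining members cover the other artifact types.

```python
from unknown_data import Category

# BROWSER, DELETED and USB are shown in the Quick Start examples.
print(Category.BROWSER, Category.DELETED, Category.USB)

# Assuming Category is a standard Enum, list every supported member.
for category in Category:
    print(category.name, category.value)
```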
## Data Structure
The library expects JSON data in specific formats for each category. Here's an example for browser data:
```python
browser_data = {
    "collected_files": [...],
    "collection_time": "2023-01-01T10:00:00",
    "detailed_files": [...],
    "discovered_profiles": [...],
    "statistics": {...},
    "temp_directory": "/tmp/extraction"
}
```
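Since `jsonschema` is a dependency, incoming payloads can be sanity-checked before encoding. The schema below is a minimal illustrative sketch based on the example above, not the library's actual validation schema.

```python
import jsonschema

# Illustrative schema for the browser payload shown above; the library's
# real validation rules may differ.
browser_schema = {
    "type": "object",
    "required": ["collected_files", "collection_time", "detailed_files"],
    "properties": {
        "collected_files": {"type": "array"},
        "collection_time": {"type": "string"},
        "detailed_files": {"type": "array"},
        "discovered_profiles": {"type": "array"},
        "statistics": {"type": "object"},
        "temp_directory": {"type": "string"},
    },
}

# Raises jsonschema.ValidationError if the payload does not match.
jsonschema.validate(instance=browser_data, schema=browser_schema)
```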
## Advanced Usage
### Custom Data Processing
```python
from unknown_data import BrowserDataEncoder
# Create specific encoder
encoder = BrowserDataEncoder()
# Process data
encoder.convert_data(your_data)
# Access specific results
chrome_data = encoder.chrome_data
edge_data = encoder.edge_data
```
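Assuming the per-browser attributes are plain pandas DataFrames (as the DataFrame Output feature implies), they can be inspected and exported with standard pandas calls:

```python
# chrome_data is assumed to be a pandas DataFrame; standard pandas methods apply.
print(chrome_data.shape)
print(chrome_data.head())
chrome_data.to_csv("./output/chrome_data.csv", index=False)
```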
### Cloud Storage Support
```python
# Load from S3 (requires boto3 configuration)
s3_config = {
    "bucket": "your-bucket",
    "key": "path/to/data.json"
}
data = loader.s3_data_load(Category.BROWSER, s3_config)
```
## Requirements
- Python 3.8+
- pandas >= 1.3.0
- numpy
- jsonschema
- boto3 (for S3 support)
## AWS S3 Configuration
### Setting up AWS Credentials
Before using S3 features, configure your AWS credentials using one of these methods:
#### 1. AWS CLI Configuration
```bash
aws configure
```
#### 2. Environment Variables
```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-west-2
```
#### 3. AWS Profile
```python
s3_config = {
    'bucket': 'your-bucket',
    'key': 'path/to/file.json',
    'profile': 'your-aws-profile'
}
```
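A named profile is ultimately resolved by boto3. The snippet below is an optional, standalone pre-flight check that the profile can reach your bucket; it uses plain boto3 calls and is not part of this library's API.

```python
import boto3

# Hypothetical pre-flight check: confirm the named profile can reach the
# bucket before handing s3_config to DataLoader.
session = boto3.Session(profile_name="your-aws-profile")
s3 = session.client("s3")
s3.head_bucket(Bucket="your-bucket")  # raises ClientError if unreachable
```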
### S3 Data Structure
Your S3 bucket should organize forensic data using the task_id structure:
```
your-forensic-bucket/
├── 550e8400-e29b-41d4-a716-446655440000/   # task_id (UUID)
│   ├── browser_data.json
│   ├── deleted_data.json
│   ├── usb_data.json
│   ├── messenger_data.json
│   ├── prefetch_data.json
│   └── lnk_data.json
├── 6ba7b810-9dad-11d1-80b4-00c04fd430c8/   # another task_id
│   ├── browser_data.json
│   └── ...
└── ...
```
Each `{category.value}_data.json` file contains the forensic data for that specific category. The library automatically constructs the S3 key as `{task_id}/{category.value}_data.json`.
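The sketch below reproduces that key construction manually with boto3, as a reference for what the loader does behind the scenes (it is not the loader's internal code):

```python
import json
import boto3

# Manual equivalent of the loader's lookup: build the key from task_id and
# the category value, then fetch and decode the JSON object.
bucket = "your-forensic-data-bucket"
task_id = "550e8400-e29b-41d4-a716-446655440000"
key = f"{task_id}/browser_data.json"   # i.e. {task_id}/{category.value}_data.json

s3 = boto3.client("s3")
obj = s3.get_object(Bucket=bucket, Key=key)
browser_data = json.loads(obj["Body"].read())
```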
### Error Handling
The library provides comprehensive error handling for S3 operations:
```python
from unknown_data import DataLoader, Category
from botocore.exceptions import NoCredentialsError, ClientError
try:
    loader = DataLoader()
    data = loader.s3_data_load(Category.BROWSER, s3_config)
except NoCredentialsError:
    print("AWS credentials not found. Please configure your credentials.")
except FileNotFoundError as e:
    print(f"File not found: {e}")
except ClientError as e:
    print(f"AWS error: {e}")
```
## Development
### Setting up Development Environment
```bash
# Clone the repository
git clone https://github.com/daehan00/unknown_parsing_module.git
cd unknown_parsing_module
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Install development dependencies
pip install pytest black mypy
```
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=parsing_module
# Run specific test file
pytest tests/test_integration.py -v
```
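A hypothetical integration test might look like the one below; the fixture path is a placeholder rather than a file that ships with the repository, and the assertion assumes `get_result_dfs()` returns a non-empty collection of DataFrames.

```python
# tests/test_browser_roundtrip.py -- hypothetical example test
from unknown_data import Category, DataLoader, Encoder


def test_browser_encoding_produces_dataframes():
    loader = DataLoader()
    # Placeholder fixture path; point this at a real sample payload.
    data = loader.local_data_load(Category.BROWSER, "./data/browser_results.json")

    browser_encoder = Encoder().convert_data(data, Category.BROWSER)
    results = browser_encoder.get_result_dfs()

    assert results, "expected at least one result DataFrame"
```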
## Changelog
### Version 0.1.0
- Initial release
- Support for browser artifacts processing
- Support for deleted files analysis
- Basic encoder framework
- Local and S3 data loading
- Comprehensive test coverage
## Contact
- GitHub: [@daehan00](https://github.com/daehan00)
- Repository: [unknown_parsing_module](https://github.com/daehan00/unknown_parsing_module)