# sem-meta
A unified Python package for SEM (Scanning Electron Microscopy) image processing, providing metadata extraction, OCR-based pixel size estimation, and unit conversion utilities.
## Features
- **SEMMetaData**: Extract and format metadata from SEM images
- **SEMOCR**: OCR-based pixel size estimation from scale bars
- **ConvertPS**: Unit conversion and error analysis for pixel size data
- **FullSEMKeys**: Standardized metadata key list for SEM images
## Installation
Install from PyPI:
```bash
pip install sem-meta
```
## Quick Start
```python
from sem_meta import SEMMeta, OCRPS, ConvertScale, FullSEMKeys
# Extract metadata from SEM images
metadata = SEMMeta.extract_metadata("path/to/sem/image.tif")
# Perform OCR on scale bars
pixel_size = OCRPS.extract_pixel_size("path/to/image/with/scalebar.tif")
# Convert units
converted_size = ConvertScale.convert_units(pixel_size, "μm", "nm")
# Access standardized SEM metadata keys
sem_keys = FullSEMKeys
```
## Main Components
### SEMMetaData
Extracts and processes metadata from SEM image files, particularly TIFF files with EXIF data.
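
As a rough illustration of what processing embedded metadata can involve, the sketch below parses an INI-style text block of the kind some instruments write into their image files. The sample text, the section and key names, and the `parse_sem_header` helper are all hypothetical and are not part of `sem_meta`; reading the raw text out of the file is assumed to have happened already.

```python
import configparser


def parse_sem_header(raw_text):
    """Parse an INI-style metadata block into a flat {"Section.Key": value} dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key capitalization
    parser.read_string(raw_text)
    flat = {}
    for section in parser.sections():
        for key, value in parser[section].items():
            flat[f"{section}.{key}"] = value
    return flat


# Hypothetical header text, for illustration only.
sample = """
[Scan]
PixelWidth=2.5e-09
PixelHeight=2.5e-09

[Beam]
HV=5000
"""

header = parse_sem_header(sample)
print(header["Scan.PixelWidth"])
```

Values are kept as strings here; converting them to numbers is left to the caller because the expected type differs from key to key.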
### SEMOCR
Uses OCR (Optical Character Recognition) to extract pixel size information from scale bars in SEM images. Includes noise filtering and error correction for common OCR mistakes.
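
To make the idea of correcting common OCR mistakes concrete, here is a minimal sketch that cleans up a noisy scale-bar label and derives a pixel size from it. The substitution tables and the `parse_scale_label` and `pixel_size` helpers are assumptions made for illustration; they do not reflect the package's actual noise database or API.

```python
import re

# Hypothetical table of letters that OCR often produces in place of digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})
# Hypothetical unit spellings mapped to a canonical form
# (\u00b5 is the micro sign, \u03bc is the Greek letter mu).
UNIT_FIXES = {
    "um": "\u03bcm",
    "\u00b5m": "\u03bcm",
    "\u03bcm": "\u03bcm",
    "nm": "nm",
    "mm": "mm",
}


def parse_scale_label(ocr_text):
    """Return (value, unit) from noisy scale-bar text, or None if nothing matches."""
    match = re.search(r"([0-9OolIS.,]+)\s*([u\u00b5\u03bcnm]m)", ocr_text)
    if match is None:
        return None
    # Repair misread digits and accept a decimal comma.
    number = match.group(1).translate(DIGIT_FIXES).replace(",", ".")
    try:
        value = float(number)
    except ValueError:
        return None
    return value, UNIT_FIXES[match.group(2)]


def pixel_size(bar_value, bar_length_px):
    """Physical length covered by one pixel, in the unit of bar_value."""
    return bar_value / bar_length_px


label = parse_scale_label("5OO nm")  # letter O misread in place of zero
print(label, pixel_size(label[0], 200))
```

The pixel size follows from dividing the physical length printed on the scale bar by the bar's length in pixels, so measuring the bar accurately matters as much as reading the label.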
### ConvertPS
Handles unit conversion and normalization for pixel size measurements, supporting various scientific units commonly used in microscopy.
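
The conversion itself reduces to shifting by powers of ten between SI prefixes. The following sketch shows one way to do that; the `UNIT_EXPONENT` table and `convert_length` function are hypothetical names chosen for this example and are not the `ConvertPS` interface.

```python
# Power of ten of each unit relative to the metre (hypothetical table).
UNIT_EXPONENT = {"m": 0, "mm": -3, "μm": -6, "um": -6, "nm": -9, "pm": -12}


def convert_length(value, from_unit, to_unit):
    """Convert a length between metric units by shifting the decimal exponent."""
    try:
        shift = UNIT_EXPONENT[from_unit] - UNIT_EXPONENT[to_unit]
    except KeyError as exc:
        raise ValueError(f"Unsupported unit: {exc.args[0]!r}") from exc
    return value * 10.0 ** shift


print(convert_length(0.5, "μm", "nm"))
```

Working with integer exponents rather than dividing small floating-point factors keeps conversions such as μm to nm free of avoidable rounding noise.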
### FullSEMKeys
Provides a standardized set of metadata keys for consistent SEM image annotation and data extraction.
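
One way a standardized key list can be used is to project vendor-specific metadata onto a fixed schema. The sketch below assumes a plain list of key names; `STANDARD_KEYS` and `normalize_metadata` are hypothetical stand-ins, since the actual contents and type of `FullSEMKeys` are not described here.

```python
# Hypothetical stand-in for a standardized key list.
STANDARD_KEYS = ["PixelSize", "Magnification", "AcceleratingVoltage"]


def normalize_metadata(raw, standard_keys=STANDARD_KEYS):
    """Return a dict with exactly the standard keys; missing values become None."""
    return {key: raw.get(key) for key in standard_keys}


record = normalize_metadata({"PixelSize": 2.5, "Vendor": "example"})
print(record)
```

Every record then has the same keys in the same order, which simplifies writing results to a table or database.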
## Dependencies
- numpy: Numerical computations
- Pillow (PIL): Image processing
- pymysql: SQL safety utilities
- termcolor: Terminal output styling
- matplotlib: Visualization
- opencv-python: Advanced image preprocessing
- pytesseract: OCR functionality
## Requirements
- Python 3.7+
- Tesseract OCR engine (for OCR functionality)
### Installing Tesseract
**Ubuntu/Debian:**
```bash
sudo apt install tesseract-ocr
```
**macOS:**
```bash
brew install tesseract
```
**Windows:**
Download and install from: https://github.com/UB-Mannheim/tesseract/wiki
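
After installation it can be useful to confirm that the `tesseract` executable is reachable before running any OCR. This is a minimal check using only the standard library; the `tesseract_available` helper is a hypothetical name, not something `sem_meta` provides.

```python
import shutil


def tesseract_available():
    """Return True if a `tesseract` executable can be found on PATH."""
    return shutil.which("tesseract") is not None


if not tesseract_available():
    print("Tesseract not found on PATH; OCR features will likely fail.")
```

If the executable is installed outside `PATH`, which can happen on Windows, `pytesseract` can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path.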
## Usage Examples
### Extracting SEM Metadata
```python
from sem_meta import SEMMeta
# Extract metadata
sem_processor = SEMMeta
metadata = sem_processor.extract_metadata("sample.tif")
print(metadata)
```
### OCR-based Scale Bar Reading
```python
from sem_meta import OCRPS
# Extract pixel size from scale bar
ocr_processor = OCRPS
pixel_size = ocr_processor.extract_pixel_size("sem_image.tif")
print(f"Pixel size: {pixel_size}")
```
### Unit Conversion
```python
from sem_meta import ConvertScale
# Convert between units
converter = ConvertScale
result = converter.convert_units("0.5 μm", "nm")
print(f"Converted: {result}")
```
## File Structure
```
sem-meta/
├── src/
│ └── sem_meta/
│ ├── __init__.py
│ ├── metadata_Module.py # SEMMetaData class
│ ├── ocr_Module.py # SEMOCR class
│ ├── convert_Module.py # ConvertPS class
│ ├── SEMKEYS.py # FullSEMKeys definitions
│ └── OCR_NOISE_DB.py # OCR noise patterns database
├── README.md
├── LICENSE
└── pyproject.toml
```
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Authors
- Ahmed Khalil - Initial work
## Acknowledgments
- Built for the SEM imaging community
- Supports various SEM manufacturers' metadata formats
- Includes extensive OCR noise pattern recognition
## Support
If you encounter any problems or have questions, please open an issue on the [GitHub repository](https://github.com/asaleh33/sem-meta/issues).
Raw data
{
"_id": null,
"home_page": "https://github.com/asaleh33/sem-meta",
"name": "sem-meta",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "Ahmed Khalil <salehastro@gmail.com>, Tommaso Rodani <tommaso.rodani.phd@gmail.com>",
"keywords": "SEM, microscopy, image processing, metadata extraction, OCR, pixel size, scientific imaging, electron microscopy",
"author": "Ahmed Khalil, Tommaso Rodani",
"author_email": "Ahmed Khalil <salehastro@gmail.com>, Tommaso Rodani <tommaso.rodani.phd@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ca/a2/5b8cee70affe9ab67e39834aca51a72a6e70932a592971cc9a17dc2c1046/sem_meta-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Unified interface for SEM image processing: metadata extraction, OCR-based pixel size estimation, and unit conversion",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/asaleh33/sem-meta/issues",
"Documentation": "https://github.com/asaleh33/sem-meta#readme",
"Homepage": "https://github.com/asaleh33/sem-meta",
"Repository": "https://github.com/asaleh33/sem-meta"
},
"split_keywords": [
"sem",
" microscopy",
" image processing",
" metadata extraction",
" ocr",
" pixel size",
" scientific imaging",
" electron microscopy"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c89e9f05007f5ed504129bbf9833228d3649c4ef2fa925301d371f7bd6f7f5a3",
"md5": "50017c1ba9f6f448ad6c2310cd693dec",
"sha256": "f0af9512333ac5c6c720331c6484deceec2f68679621755ff8d78fb50ad3d918"
},
"downloads": -1,
"filename": "sem_meta-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "50017c1ba9f6f448ad6c2310cd693dec",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 32000,
"upload_time": "2025-09-05T15:12:44",
"upload_time_iso_8601": "2025-09-05T15:12:44.180491Z",
"url": "https://files.pythonhosted.org/packages/c8/9e/9f05007f5ed504129bbf9833228d3649c4ef2fa925301d371f7bd6f7f5a3/sem_meta-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "caa25b8cee70affe9ab67e39834aca51a72a6e70932a592971cc9a17dc2c1046",
"md5": "a8de57064753bebc00bcc5e805a6d3ef",
"sha256": "c49f11805f0814c67ecda223151c63cf8418dd473b084706377fa16f3062ea2e"
},
"downloads": -1,
"filename": "sem_meta-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "a8de57064753bebc00bcc5e805a6d3ef",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 37740,
"upload_time": "2025-09-05T15:12:45",
"upload_time_iso_8601": "2025-09-05T15:12:45.741303Z",
"url": "https://files.pythonhosted.org/packages/ca/a2/5b8cee70affe9ab67e39834aca51a72a6e70932a592971cc9a17dc2c1046/sem_meta-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-05 15:12:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "asaleh33",
"github_project": "sem-meta",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "sem-meta"
}