# MicrographCleaner
**MicrographCleaner** (micrograph_cleaner_em) is a Python package designed to segment cryo-EM
micrographs into:
- carbon/high-contrast or contaminated regions
- good regions
so that incorrectly picked coordinates can be easily ruled out.
To get a complete description of usage, execute
`cleanMics -h`
##### Example
`cleanMics -c path/to/inputCoords/ -o path/to/outputCoords/ -b 180 -s 1.0 -i /path/to/micrographs/ --predictedMaskDir path/to/store/masks --deepThr 0.5`
## INSTALLATION:
*New!* The main branch is quite old and won't work on a modern GPU card. If that is your case, you can try installing from the new [tf2 branch](https://github.com/rsanchezgarc/micrograph_cleaner_em/tree/tf2).
This version has been tested with Python 3.10.
1) (optional) create and activate the virtual environment
```
pip install virtualenv
virtualenv --system-site-packages -p python3 ./env_micrograph_cleaner_em
source ./env_micrograph_cleaner_em/bin/activate
```
or a conda environment
```
conda create -n env_micrograph_cleaner_em python=3.10
conda activate env_micrograph_cleaner_em
```
2) Install micrograph_cleaner_em
PyPI
`pip install micrograph-cleaner-em`
To get the latest version, install from git instead.
```
git clone https://github.com/rsanchezgarc/micrograph_cleaner_em.git
cd micrograph_cleaner_em
pip install -e .
```
or
`pip install git+https://github.com/rsanchezgarc/micrograph_cleaner_em.git`
3) Download deep learning model
`cleanMics --download`
4) Ready
## USAGE
MicrographCleaner employs a U-Net-based deep learning model to segment micrographs into good and bad regions. It is mainly used as a post-processing step after particle picking, in which coordinates selected in high-contrast artefacts, such as carbon, are ruled out. Additionally, it can be employed to generate binary masks so that particle pickers can be prevented from considering problematic regions.
micrograph_cleaner takes as mandatory arguments one or more micrograph filenames and the particle size in pixels (with respect to the input micrographs). Additionally, it can receive as input:
1) A directory where picked coordinates are located and another directory where scored/cleaned coordinates will be saved. Coordinates will be saved in pos format or plain text (columns with header names x and y).
There must be one coordinates file per micrograph, named after the micrograph, and the output coordinates will preserve the naming.
E.g. -c path/to/inputCoordsDirectory/ -o /path/to/outputCoordsDirectory/
Allowed formats are Xmipp pos, RELION star and raw tab-separated text with at least two columns named xcoor and ycoor in the header.
Raw text file example:
```
micFname1.tab:
###########################################
xcoor ycoor otherInfo1 otherInfo2
12 143 -1 0.1
431 4341 0 0.2
323 321 1 0.213
###########################################
```
2) A directory where predicted masks will be saved (mrc format).
E.g. --predictedMaskDir path/where/predictedMasksWillBeSaved/
3) A downsampling factor (can be less than 1 if upsampling was actually performed) in case the coordinates were picked from
micrographs at a different scale.
E.g. -s 2 will downsample coordinates by a factor of 2 and then apply the predicted mask, which is as big as the input micrographs. This
case corresponds to picking particles on raw micrographs while running MicrographCleaner on downsampled micrographs.
4) Any combination of previous options.
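As a rough illustration of the raw text coordinates format described in option 1, the sketch below writes and reads back a tab-separated file with an `xcoor ycoor` header using only the Python standard library. The helper names (`write_coords_tab`, `read_coords_tab`), the `.tab` extension and the temporary output directory are assumptions made for this example; they are not part of micrograph_cleaner_em.

```python
import csv
import os
import tempfile


def write_coords_tab(path, coords):
    """Write (x, y) pairs as tab-separated text with an 'xcoor ycoor' header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["xcoor", "ycoor"])
        writer.writerows(coords)


def read_coords_tab(path):
    """Read the file back as a list of (x, y) integer tuples."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(int(row["xcoor"]), int(row["ycoor"])) for row in reader]


# Hypothetical micrograph name; the coordinates file reuses its base name.
mic_fname = "micFname1.mrc"
coords = [(12, 143), (431, 4341), (323, 321)]

out_dir = tempfile.mkdtemp()
coords_fname = os.path.join(out_dir, os.path.splitext(mic_fname)[0] + ".tab")
write_coords_tab(coords_fname, coords)
print(read_coords_tab(coords_fname))
```

Extra columns, such as the `otherInfo1` and `otherInfo2` fields in the example file, could be appended to each row in the same way.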
The trained MicrographCleaner model is available [here](https://scipion.cnb.csic.es/downloads/scipion/software/em/xmipp_model_deepMicrographCleaner.tgz) and can be downloaded automatically by executing
`cleanMics --download`
#### Examples
```
# Download the deep learning model
cleanMics --download
# Compute masks from input micrographs and store them
cleanMics -b $BOX_SIZE -i /path/to/micrographs/ --predictedMaskDir path/to/store/masks
# Rule out bad input coordinates (threshold<0.5) and store the cleaned coordinates in path/to/outputCoords
cleanMics -c path/to/inputCoords/ -o path/to/outputCoords/ -b $BOX_SIZE -s $DOWN_FACTOR -i /path/to/micrographs/ --deepThr 0.5
# Compute goodness scores from input coordinates and store them in path/to/outputCoords
cleanMics -c path/to/inputCoords/ -o path/to/outputCoords/ -b $BOX_SIZE -s $DOWN_FACTOR -i /path/to/micrographs/ --deepThr 0.5
```
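To make the idea of ruling out coordinates with a mask and a downsampling factor more concrete, here is a minimal NumPy sketch. It is not the code that `cleanMics` runs: the function name `contamination_at`, the rounding and clipping of coordinates, and the choice to keep coordinates whose mask value is below the threshold are all assumptions made for illustration.

```python
import numpy as np


def contamination_at(mask, coords, down_factor=1.0):
    """Look up the mask value under each (x, y) coordinate.

    Coordinates are divided by down_factor to bring them to the mask's scale
    (assumption: x indexes columns and y indexes rows).
    """
    scaled = np.asarray(coords, dtype=float) / down_factor
    cols = np.clip(np.round(scaled[:, 0]).astype(int), 0, mask.shape[1] - 1)
    rows = np.clip(np.round(scaled[:, 1]).astype(int), 0, mask.shape[0] - 1)
    return mask[rows, cols]


# Synthetic 100x100 mask: left half clean (0.), right half contaminated (1.)
mask = np.zeros((100, 100), dtype=np.float32)
mask[:, 50:] = 1.0

# Coordinates picked on a micrograph twice as large as the mask
coords = [(20, 40), (160, 40), (90, 180)]
scores = contamination_at(mask, coords, down_factor=2.0)

# Keep only coordinates that fall on regions scored below the threshold
threshold = 0.5
kept = [c for c, s in zip(coords, scores) if s < threshold]
print(kept)
```

Here the second coordinate lands on the contaminated half of the synthetic mask and is discarded, while the other two are kept.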
## API:
The fundamental class employed within MicrographCleaner is MaskPredictor, a class designed to predict a contamination/carbon
mask given a micrograph.
##### class micrograph_cleaner_em.MaskPredictor
Usage: predicts a mask of shape HxW given a numpy array of shape HxW that represents a micrograph.
Mask values range from 0. to 1., with 0. corresponding to clean regions and 1. to contamination.
##### builder
```
micrograph_cleaner_em.MaskPredictor(boxSize, deepLearningModelFname=DEFAULT_PATH, gpus=[0], strideFactor=2)
    :param boxSize (int): estimated particle box size in pixels
    :param deepLearningModelFname (str): path from which the deep learning model will be loaded. DEFAULT_PATH="~/.local/share/micrograph_cleaner_em/models/defaultModel.keras"
    :param gpus (list of gpu ids (ints) or None): If None, CPU-only mode will be employed.
    :param strideFactor (int): Overlap between windows. Micrographs are divided into patches and each is processed individually.
        The overlap factor indicates how many times a given row/column is processed by the network. The
        bigger, the better the predictions, but the higher the computational cost.
```
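The effect of `strideFactor` can be pictured with a generic sliding-window sketch. The code below is not MaskPredictor's implementation. It assumes that the stride is the patch size divided by the stride factor and that overlapping predictions are averaged, and it uses a simple threshold as a stand-in for the network. All names (`predict_with_overlap`, `predict_patch`) are hypothetical.

```python
import numpy as np


def predict_with_overlap(image, patch_size, stride_factor, predict_patch):
    """Average per-patch predictions over overlapping windows.

    Assumes the image sides are multiples of the stride, so every pixel
    is covered by at least one window.
    """
    stride = patch_size // stride_factor
    height, width = image.shape
    accum = np.zeros((height, width), dtype=np.float64)
    counts = np.zeros((height, width), dtype=np.float64)
    for row in range(0, height - patch_size + 1, stride):
        for col in range(0, width - patch_size + 1, stride):
            window = (slice(row, row + patch_size), slice(col, col + patch_size))
            accum[window] += predict_patch(image[window])
            counts[window] += 1
    # counts records how many windows covered each pixel
    return accum / np.maximum(counts, 1), counts


rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Stand-in for the network: a per-pixel threshold
pred, counts = predict_with_overlap(
    image, patch_size=4, stride_factor=2, predict_patch=lambda p: (p > 0.5).astype(float)
)
print(counts)
```

With a stride factor of 2, interior rows and columns are each covered by two windows, so interior pixels are predicted four times, while the image corners are predicted only once. A larger factor increases the coverage and the computational cost accordingly.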
##### methods
```
predictMask(self, inputMic, preproDownsampleMic=1, outputPrecision=np.float32):
    Obtains a contamination mask for a given inputMic
    :param inputMic (np.array shape HxW): the micrograph to clean
    :param preproDownsampleMic: the downsampling factor applied to the micrograph before processing. Make it bigger if
        large carbon areas are not identified
    :param outputPrecision: the floating point type desired for the output mask. Default float32
    :return: mask (np.array shape HxW): a mask that ranges from 0. to 1. ->
        0. meaning clean area and 1. contaminated area.
```
```
getDownFactor(self):
    MaskPredictor preprocesses micrographs before neural network computation. The first step is downsampling by a factor
    that depends on the particle boxSize. This function computes that downsampling factor
    :return (float): the downsampling factor that MaskPredictor uses internally when preprocessing the micrographs

close(self):
    Used to release memory
```
##### example
The following lines show how to compute the mask for a given micrograph
```
import numpy as np
import mrcfile
import micrograph_cleaner_em as mce
boxSize = 128 #pixels
# Load the micrograph data. For mrc files you can use mrcfile,
# but any other method that returns a numpy array works as well
with mrcfile.open('/path/to/micrograph.mrc') as mrc:
    mic = mrc.data

# By default, the mask predictor will try to load the model from
# "~/.local/share/micrograph_cleaner_em/models/"
# Pass deepLearningModelFname=modelPath to the builder
# if the model is stored in another location
with mce.MaskPredictor(boxSize, gpus=[0]) as mp:
    mask = mp.predictMask(mic)  # by default, mask is a float32 numpy array

# Then write the mask to a file
with mrcfile.new('mask.mrc', overwrite=True) as maskFile:
    maskFile.set_data(mask.astype(np.half))  # stored as float16
```
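If a binary mask is needed, for instance to feed a particle picker, the float mask can be thresholded. The sketch below uses a small synthetic array in place of a real prediction. The helper name `binarize_mask` and the 0.5 cut-off are assumptions for this example; the right threshold depends on your data.

```python
import numpy as np


def binarize_mask(mask, threshold=0.5):
    """Return a uint8 mask: 1 where contamination >= threshold, 0 elsewhere."""
    return (np.asarray(mask) >= threshold).astype(np.uint8)


# Synthetic stand-in for the output of predictMask
mask = np.array([[0.1, 0.7], [0.5, 0.2]], dtype=np.float32)
binary = binarize_mask(mask)
print(binary)
```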
## Dataset
The model and dataset used in this work can be downloaded from https://zenodo.org/records/17093439