.. meta::
   :description: Distancia is a comprehensive Python package that provides a wide range of distance metrics and similarity measures, making it easy to calculate and compare the proximity between various types of data. This documentation provides an in-depth guide to the package, including installation instructions, usage examples, and detailed descriptions of each available metric.
   :keywords: data-science machine-learning deep-learning neural-network graph text-classification text distance cython markov-chain file similarity image-classification nlp-machine-learning loss-functions distancia
   :keywords lang=en: data-science machine-learning deep-learning neural-network graph text-classification text distance cython markov-chain file similarity image-classification nlp-machine-learning loss-functions distancia

======================================
Welcome to Distancia's documentation!
======================================
**Distancia** is a comprehensive Python package that provides a wide range of distance metrics and similarity measures, making it easy to calculate and compare the proximity between various types of data. This documentation provides an in-depth guide to the package, including installation instructions, usage examples, and detailed descriptions of each available metric.
The documentation is divided into the following sections:
.. note::

   The code examples provided in this documentation are written for Python 3.x.
   The Python code in this package has been optimized by static typing with Cython.

*Getting Started*
-----------------
**Distancia** is designed to be simple and intuitive, yet powerful and flexible. Whether you are working with numerical data, strings, or other types of data, Distancia provides the tools you need to measure the distance or similarity between objects.
For a quick introduction, check out the `quickstart`_ guide. If you want to dive straight into the code, head over to the `Euclidean`_ page.
.. _quickstart: https://distancia.readthedocs.io/en/latest/quickstart.html
.. _Euclidean: https://distancia.readthedocs.io/en/latest/Euclidean.html
.. note::

   If you find any issues or have suggestions for improvements, feel free to contribute!

*Installation*
--------------
You can install the distancia package with pip:

.. code-block:: bash

   pip install distancia

By default, this will install the core functionality of the package, suitable for users who only need basic distance metrics.

Optional Dependencies

The **Distancia** package also supports optional modules to enable additional features. You can install these extras depending on your needs:

With pandas support: Install with additional support for working with tabular data:

.. code-block:: bash

   pip install distancia[pandas]

With all supported extras: Install all optional dependencies for maximum functionality:

.. code-block:: bash

   pip install distancia[all]

This modular installation allows you to keep your setup lightweight or include everything for full capabilities.
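
To confirm that the installation succeeded, one option is to query the installed version through the standard library. This is only a sketch: the helper name ``installed_version`` is illustrative and is not part of Distancia.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "distancia") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"distancia version: {found}" if found else "distancia is not installed")
```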
*Quickstart*
------------
Here are some common examples of how to use **Distancia**:

.. code-block:: python
   :caption: Example 1: Calculating Euclidean Distance

   from distancia import Euclidean

   point1 = [1, 2, 3]
   point2 = [4, 5, 6]

   # Create an instance of Euclidean
   euclidean = Euclidean()

   # Calculate the Euclidean distance and print it with three decimals
   distance = euclidean.compute(point1, point2)
   print(f"Euclidean Distance: {distance:.3f}")

.. code-block:: bash

   >>>Euclidean Distance: 5.196

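
The value 5.196 can be cross-checked without the package. The sketch below uses only the standard library and does not reflect how Distancia computes the distance internally; it simply applies the textbook formula to the same two points.

```python
import math

point1 = [1, 2, 3]
point2 = [4, 5, 6]

# Square root of the summed squared coordinate differences.
manual = math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))

# math.dist (Python 3.8+) computes the same quantity.
builtin = math.dist(point1, point2)

print(f"{manual:.3f}")  # 5.196
```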

.. code-block:: python
   :caption: Example 2: Calculating Levenshtein Distance

   from distancia import Levenshtein

   string1 = "kitten"
   string2 = "sitting"

   # An edit distance is a whole number, so print it without decimals
   distance = Levenshtein().compute(string1, string2)
   print(f"Levenshtein Distance: {distance:.0f}")

.. code:: bash

   >>>Levenshtein Distance: 3

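
To see where the value 3 comes from, here is a minimal dynamic-programming sketch of the Levenshtein distance. It is a reference implementation written for this guide, not the code used inside Distancia, and the function name ``levenshtein`` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The three edits are: substitute ``k`` with ``s``, substitute ``e`` with ``i``, and append ``g``.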
For a complete list and detailed explanations of each metric, see the next section.
*Available Metrics*
-------------------
.. _Vector: https://distancia.readthedocs.io/en/latest/vectorDistance.html
.. _Manhattan: https://distancia.readthedocs.io/en/latest/Manhattan.html
.. _Minkowski: https://distancia.readthedocs.io/en/latest/Minkowski.html
.. _Jaro: https://distancia.readthedocs.io/en/latest/Jaro.html
.. _KendallTau: https://distancia.readthedocs.io/en/latest/KendallTau.html
.. _Bhattacharyya: https://distancia.readthedocs.io/en/latest/Bhattacharyya.html
.. _Haversine: https://distancia.readthedocs.io/en/latest/Haversine.html
.. _Chebyshev: https://distancia.readthedocs.io/en/latest/Chebyshev.html
.. _ContextualDynamicDistance: https://distancia.readthedocs.io/en/latest/ContextualDynamicDistance.html
.. _Canberra: https://distancia.readthedocs.io/en/latest/Canberra.html
.. _BrayCurtis: https://distancia.readthedocs.io/en/latest/BrayCurtis.html
.. _RogersTanimoto: https://distancia.readthedocs.io/en/latest/RogersTanimoto.html
.. _RussellRao: https://distancia.readthedocs.io/en/latest/RussellRao.html
.. _SokalMichener: https://distancia.readthedocs.io/en/latest/SokalMichener.html
.. _SokalSneath: https://distancia.readthedocs.io/en/latest/SokalSneath.html
.. _Wasserstein: https://distancia.readthedocs.io/en/latest/Wasserstein.html
.. _Gower: https://distancia.readthedocs.io/en/latest/Gower.html
.. _CzekanowskiDice: https://distancia.readthedocs.io/en/latest/CzekanowskiDice.html
.. _Hellinger: https://distancia.readthedocs.io/en/latest/Hellinger.html
.. _MotzkinStraus: https://distancia.readthedocs.io/en/latest/MotzkinStraus.html
.. _EnhancedRogersTanimoto: https://distancia.readthedocs.io/en/latest/EnhancedRogersTanimoto.html
.. _KullbackLeibler: https://distancia.readthedocs.io/en/latest/KullbackLeibler.html
.. _Jaccard: https://distancia.readthedocs.io/en/latest/Jaccard.html
.. _GeneralizedJaccard: https://distancia.readthedocs.io/en/latest/GeneralizedJaccard.html
.. _Tanimoto: https://distancia.readthedocs.io/en/latest/Tanimoto.html
.. _InverseTanimoto: https://distancia.readthedocs.io/en/latest/InverseTanimoto.html
.. _Ochiai: https://distancia.readthedocs.io/en/latest/Ochiai.html
.. _Pearson: https://distancia.readthedocs.io/en/latest/Pearson.html
.. _Spearman: https://distancia.readthedocs.io/en/latest/Spearman.html
.. _FagerMcGowan: https://distancia.readthedocs.io/en/latest/FagerMcGowan.html
.. _Otsuka: https://distancia.readthedocs.io/en/latest/Otsuka.html
.. _Gestalt: https://distancia.readthedocs.io/en/latest/Gestalt.html
.. _Matrix: https://distancia.readthedocs.io/en/latest/matrixDistance.html
.. _Mahalanobis: https://distancia.readthedocs.io/en/latest/Mahalanobis.html
.. _MahalanobisTaguchi: https://distancia.readthedocs.io/en/latest/MahalanobisTaguchi.html
.. _MatrixSpectral: https://distancia.readthedocs.io/en/latest/MatrixSpectral.html
.. _NormalizedSpectral: https://distancia.readthedocs.io/en/latest/NormalizedSpectral.html
.. _PureDiffusion: https://distancia.readthedocs.io/en/latest/PureDiffusion.html
.. _RandomWalk: https://distancia.readthedocs.io/en/latest/RandomWalk.html
.. _HeatKernel: https://distancia.readthedocs.io/en/latest/HeatKernel.html
.. _GraphEditMatrix: https://distancia.readthedocs.io/en/latest/GraphEditMatrix.html
.. _WeisfeilerLehman: https://distancia.readthedocs.io/en/latest/WeisfeilerLehman.html
.. _NetSimile: https://distancia.readthedocs.io/en/latest/NetSimile.html
.. _TriangleMatrixDistance: https://distancia.readthedocs.io/en/latest/TriangleMatrixDistance.html
.. _PatternBased: https://distancia.readthedocs.io/en/latest/PatternBased.html
.. _CliqueBasedGraph: https://distancia.readthedocs.io/en/latest/CliqueBasedGraph.html
.. _CycleMatrixDistance: https://distancia.readthedocs.io/en/latest/CycleMatrixDistance.html
.. _GraphletMatrixDistance: https://distancia.readthedocs.io/en/latest/GraphletMatrixDistance.html
.. _MinimumCutDistanceCalculator: https://distancia.readthedocs.io/en/latest/MinimumCutDistanceCalculator.html
.. _Percolation: https://distancia.readthedocs.io/en/latest/Percolation.html
.. _Text: https://distancia.readthedocs.io/en/latest/textDistance.html
.. _Levenshtein: https://distancia.readthedocs.io/en/latest/Levenshtein.html
.. _DamerauLevenshtein: https://distancia.readthedocs.io/en/latest/DamerauLevenshtein.html
.. _Hamming: https://distancia.readthedocs.io/en/latest/Hamming.html
.. _Cosine: https://distancia.readthedocs.io/en/latest/Cosine.html
.. _TFIDFDistance: https://distancia.readthedocs.io/en/latest/TFIDFDistance.html
.. _SimHash: https://distancia.readthedocs.io/en/latest/SimHash.html
.. _CosineTF: https://distancia.readthedocs.io/en/latest/CosineTF.html
.. _WordMoversDistance: https://distancia.readthedocs.io/en/latest/WordMoversDistance.html
.. _BERTBasedDistance: https://distancia.readthedocs.io/en/latest/BERTBasedDistance.html
.. _JaroWinkler: https://distancia.readthedocs.io/en/latest/JaroWinkler.html
.. _OverlapCoefficient: https://distancia.readthedocs.io/en/latest/OverlapCoefficient.html
.. _SorensenDice: https://distancia.readthedocs.io/en/latest/SorensenDice.html
.. _BagOfWordsDistance: https://distancia.readthedocs.io/en/latest/BagOfWordsDistance.html
.. _FastTextDistance: https://distancia.readthedocs.io/en/latest/FastTextDistance.html
.. _Dice: https://distancia.readthedocs.io/en/latest/Dice.html
.. _Tversky: https://distancia.readthedocs.io/en/latest/Tversky.html
.. _NgramDistance: https://distancia.readthedocs.io/en/latest/NgramDistance.html
.. _SmithWaterman: https://distancia.readthedocs.io/en/latest/SmithWaterman.html
.. _RatcliffObershelp: https://distancia.readthedocs.io/en/latest/RatcliffObershelp.html
.. _BLEUScore: https://distancia.readthedocs.io/en/latest/BLEUScore.html
.. _ROUGEScore: https://distancia.readthedocs.io/en/latest/ROUGEScore.html
.. _SoftCosineSimilarity: https://distancia.readthedocs.io/en/latest/SoftCosineSimilarity.html
.. _TopicModelingDistance: https://distancia.readthedocs.io/en/latest/TopicModelingDistance.html
.. _AlignmentBasedMeasures: https://distancia.readthedocs.io/en/latest/AlignmentBasedMeasures.html
.. _GappyNGramDistance: https://distancia.readthedocs.io/en/latest/GappyNGramDistance.html
.. _SoftJaccardSimilarity: https://distancia.readthedocs.io/en/latest/SoftJaccardSimilarity.html
.. _NormalizedCompressionDistance: https://distancia.readthedocs.io/en/latest/NormalizedCompressionDistance.html
.. _MongeElkanDistance: https://distancia.readthedocs.io/en/latest/MongeElkanDistance.html
.. _JensenShannonDivergence: https://distancia.readthedocs.io/en/latest/JensenShannonDivergence.html
.. _Time: https://distancia.readthedocs.io/en/latest/timeDistance.html
.. _DynamicTimeWarping: https://distancia.readthedocs.io/en/latest/DynamicTimeWarping.html
.. _LongestCommonSubsequence: https://distancia.readthedocs.io/en/latest/LongestCommonSubsequence.html
.. _Frechet: https://distancia.readthedocs.io/en/latest/Frechet.html
+ `Vector`_
.. - `Euclidean`_
- `Manhattan`_
- `Minkowski`_
- `Bhattacharyya`_
- `Haversine`_
- `Chebyshev`_
- `ContextualDynamicDistance`_
- `Canberra`_
- `BrayCurtis`_
- `RogersTanimoto`_
- `RussellRao`_
- `SokalMichener`_
- `SokalSneath`_
- `Wasserstein`_
- `Gower`_
- `CzekanowskiDice`_
- `Hellinger`_
- `MotzkinStraus`_
- `EnhancedRogersTanimoto`_
- `KullbackLeibler`_
- `Jaccard`_
- `GeneralizedJaccard`_
- `Tanimoto`_
- `InverseTanimoto`_
- `Ochiai`_
- `Pearson`_
- `Spearman`_
- `FagerMcGowan`_
- `Otsuka`_
- `Gestalt`_
+ `Matrix`_
.. - `Mahalanobis`_
- `MahalanobisTaguchi`_
- `MatrixSpectral`_
- `NormalizedSpectral`_
- `PureDiffusion`_
- `RandomWalk`_
- `HeatKernel`_
- `GraphEditMatrix`_
- `WeisfeilerLehman`_
- `NetSimile`_
- `TriangleMatrixDistance`_
- `PatternBased`_
- `CliqueBasedGraph`_
- `CycleMatrixDistance`_
- `GraphletMatrixDistance`_
- `MinimumCutDistanceCalculator`_
- `Percolation`_
+ `Text`_
.. - `Levenshtein`_
- `DamerauLevenshtein`_
- `Hamming`_
- `Cosine`_
- `TFIDFDistance`_
- `SimHash`_
- `CosineTF`_
- `WordMoversDistance`_
- `BERTBasedDistance`_
- `Jaro`_
- `JaroWinkler`_
- `OverlapCoefficient`_
- `SorensenDice`_
- `BagOfWordsDistance`_
- `FastTextDistance`_
- `Dice`_
- `Tversky`_
- `NgramDistance`_
- `SmithWaterman`_
- `RatcliffObershelp`_
- `BLEUScore`_
- `ROUGEScore`_
- `SoftCosineSimilarity`_
- `TopicModelingDistance`_
- `AlignmentBasedMeasures`_
- `GappyNGramDistance`_
- `SoftJaccardSimilarity`_
- `NormalizedCompressionDistance`_
- `MongeElkanDistance`_
- `JensenShannonDivergence`_
.. + 'statistics'
.. - `KendallTau`_
+ `Time`_
.. - `DynamicTimeWarping`_
- `LongestCommonSubsequence`_
- `Frechet`_
+ `Loss`_
.. - `CrossEntropy`_
- `MeanAbsoluteError`_
- `MeanAbsolutePercentageError`_
- `MeanSquaredError`_
- `SquaredLogarithmicError`_
- `GaloisWassersteinLoss`_
.. _Loss: https://distancia.readthedocs.io/en/latest/lossFunction.html
.. _CrossEntropy: https://distancia.readthedocs.io/en/latest/CrossEntropy.html
.. _MeanAbsoluteError: https://distancia.readthedocs.io/en/latest/MeanAbsoluteError.html
.. _MeanAbsolutePercentageError: https://distancia.readthedocs.io/en/latest/MeanAbsolutePercentageError.html
.. _MeanSquaredError: https://distancia.readthedocs.io/en/latest/MeanSquaredError.html
.. _SquaredLogarithmicError: https://distancia.readthedocs.io/en/latest/SquaredLogarithmicError.html
.. _GaloisWassersteinLoss: https://distancia.readthedocs.io/en/latest/GaloisWassersteinLoss.html
+ `Graph`_
.. - `ShortestPath`_
- `GraphEditDistance`_
- `SpectralDistance`_
- `WeisfeilerLehmanSimilarity`_
- `ComparingRandomWalkStationaryDistributions`_
- `Diffusion`_
- `FrobeniusDistance`_
- `GraphKernelDistance`_
- `PatternBasedDistance`_
- `GraphCompressionDistance`_
- `DegreeDistributionDistance`_
- `CommunityStructureDistance`_
.. _Graph: https://distancia.readthedocs.io/en/latest/graphDistance.html
.. _ShortestPath: https://distancia.readthedocs.io/en/latest/ShortestPath.html
.. _GraphEditDistance: https://distancia.readthedocs.io/en/latest/GraphEditDistance.html
.. _SpectralDistance: https://distancia.readthedocs.io/en/latest/SpectralDistance.html
.. _WeisfeilerLehmanSimilarity: https://distancia.readthedocs.io/en/latest/WeisfeilerLehmanSimilarity.html
.. _ComparingRandomWalkStationaryDistributions: https://distancia.readthedocs.io/en/latest/ComparingRandomWalkStationaryDistributions.html
.. _Diffusion: https://distancia.readthedocs.io/en/latest/Diffusion.html
.. _FrobeniusDistance: https://distancia.readthedocs.io/en/latest/FrobeniusDistance.html
.. _GraphKernelDistance: https://distancia.readthedocs.io/en/latest/GraphKernelDistance.html
.. _PatternBasedDistance: https://distancia.readthedocs.io/en/latest/PatternBasedDistance.html
.. _GraphCompressionDistance: https://distancia.readthedocs.io/en/latest/GraphCompressionDistance.html
.. _DegreeDistributionDistance: https://distancia.readthedocs.io/en/latest/DegreeDistributionDistance.html
.. _CommunityStructureDistance: https://distancia.readthedocs.io/en/latest/CommunityStructureDistance.html
+ `MarkovChain <https://distancia.readthedocs.io/en/latest/markovChainDistance.html>`__
.. - `MarkovChainKullbackLeibler`_
- `MarkovChainWasserstein`_
- `MarkovChainTotalVariation`_
- `MarkovChainHellinger`_
- `MarkovChainJensenShannon`_
- `MarkovChainFrobenius`_
- `MarkovChainSpectral`_
.. _MarkovChaine: https://distancia.readthedocs.io/en/latest/markovChainDistance.html
.. _MarkovChainKullbackLeibler: https://distancia.readthedocs.io/en/latest/MarkovChainKullbackLeibler.html
.. _MarkovChainWasserstein: https://distancia.readthedocs.io/en/latest/MarkovChainWasserstein.html
.. _MarkovChainTotalVariation: https://distancia.readthedocs.io/en/latest/MarkovChainTotalVariation.html
.. _MarkovChainHellinger: https://distancia.readthedocs.io/en/latest/MarkovChainHellinger.html
.. _MarkovChainJensenShannon: https://distancia.readthedocs.io/en/latest/MarkovChainJensenShannon.html
.. _MarkovChainFrobenius: https://distancia.readthedocs.io/en/latest/MarkovChainFrobenius.html
.. _MarkovChainSpectral: https://distancia.readthedocs.io/en/latest/MarkovChainSpectral.html
+ `Image`_
.. - `StructuralSimilarityIndex`_
- `PeakSignalToNoiseRatio`_
- `HistogramIntersection`_
- `EarthMoversDistance`_
- `ChiSquareDistance`_
- `FeatureBasedDistance`_
- `PerceptualHashing`_
- `NormalizedCrossCorrelation`_
.. _Image: https://distancia.readthedocs.io/en/latest/imageDistance.html
.. _StructuralSimilarityIndex: https://distancia.readthedocs.io/en/latest/StructuralSimilarityIndex.html
.. _PeakSignalToNoiseRatio: https://distancia.readthedocs.io/en/latest/PeakSignalToNoiseRatio.html
.. _HistogramIntersection: https://distancia.readthedocs.io/en/latest/HistogramIntersection.html
.. _EarthMoversDistance: https://distancia.readthedocs.io/en/latest/EarthMoversDistance.html
.. _ChiSquareDistance: https://distancia.readthedocs.io/en/latest/ChiSquareDistance.html
.. _FeatureBasedDistance: https://distancia.readthedocs.io/en/latest/FeatureBasedDistance.html
.. _PerceptualHashing: https://distancia.readthedocs.io/en/latest/PerceptualHashing.html
.. _NormalizedCrossCorrelation: https://distancia.readthedocs.io/en/latest/NormalizedCrossCorrelation.html
+ `Sound`_
.. - `SpectralConvergence`_
- `MFCCProcessor`_
- `SignalProcessor`_
- `PowerSpectralDensityDistance`_
- `CrossCorrelation`_
- `PhaseDifferenceCalculator`_
- `TimeLagDistance`_
- `PESQ`_
- `LogSpectralDistance`_
- `BarkSpectralDistortion`_
- `ItakuraSaitoDistance`_
- `SignalToNoiseRatio`_
- `EnergyDistance`_
- `EnvelopeCorrelation`_
- `ZeroCrossingRateDistance`_
- `CochleagramDistance`_
- `ChromagramDistance`_
- `SpectrogramDistance`_
- `CQTDistance`_
.. _Sound: https://distancia.readthedocs.io/en/latest/soundDistance.html
.. _SpectralConvergence: https://distancia.readthedocs.io/en/latest/SpectralConvergence.html
.. _MFCCProcessor: https://distancia.readthedocs.io/en/latest/MFCCProcessor.html
.. _SignalProcessor: https://distancia.readthedocs.io/en/latest/SignalProcessor.html
.. _PowerSpectralDensityDistance: https://distancia.readthedocs.io/en/latest/PowerSpectralDensityDistance.html
.. _CrossCorrelation: https://distancia.readthedocs.io/en/latest/CrossCorrelation.html
.. _PhaseDifferenceCalculator: https://distancia.readthedocs.io/en/latest/PhaseDifferenceCalculator.html
.. _TimeLagDistance: https://distancia.readthedocs.io/en/latest/TimeLagDistance.html
.. _PESQ: https://distancia.readthedocs.io/en/latest/PESQ.html
.. _LogSpectralDistance: https://distancia.readthedocs.io/en/latest/LogSpectralDistance.html
.. _BarkSpectralDistortion: https://distancia.readthedocs.io/en/latest/BarkSpectralDistortion.html
.. _ItakuraSaitoDistance: https://distancia.readthedocs.io/en/latest/ItakuraSaitoDistance.html
.. _SignalToNoiseRatio: https://distancia.readthedocs.io/en/latest/SignalToNoiseRatio.html
.. _EnergyDistance: https://distancia.readthedocs.io/en/latest/EnergyDistance.html
.. _EnvelopeCorrelation: https://distancia.readthedocs.io/en/latest/EnvelopeCorrelation.html
.. _ZeroCrossingRateDistance: https://distancia.readthedocs.io/en/latest/ZeroCrossingRateDistance.html
.. _CochleagramDistance: https://distancia.readthedocs.io/en/latest/CochleagramDistance.html
.. _ChromagramDistance: https://distancia.readthedocs.io/en/latest/ChromagramDistance.html
.. _SpectrogramDistance: https://distancia.readthedocs.io/en/latest/SpectrogramDistance.html
.. _CQTDistance: https://distancia.readthedocs.io/en/latest/CQTDistance.html
+ `File`_
.. - `ByteLevelDistance`_
- `HashComparison`_
- `NormalizedCompression`_
- `KolmogorovComplexity`_
- `DynamicBinaryInstrumentation`_
- `FileMetadataComparison`_
- `FileTypeDistance`_
- `TreeEditDistance`_
- `ZlibBasedDistance`_
.. _File: https://distancia.readthedocs.io/en/latest/fileDistance.html
.. _ByteLevelDistance: https://distancia.readthedocs.io/en/latest/ByteLevelDistance.html
.. _HashComparison: https://distancia.readthedocs.io/en/latest/HashComparison.html
.. _NormalizedCompression: https://distancia.readthedocs.io/en/latest/NormalizedCompression.html
.. _KolmogorovComplexity: https://distancia.readthedocs.io/en/latest/KolmogorovComplexity.html
.. _DynamicBinaryInstrumentation: https://distancia.readthedocs.io/en/latest/DynamicBinaryInstrumentation.html
.. _FileMetadataComparison: https://distancia.readthedocs.io/en/latest/FileMetadataComparison.html
.. _FileTypeDistance: https://distancia.readthedocs.io/en/latest/FileTypeDistance.html
.. _TreeEditDistance: https://distancia.readthedocs.io/en/latest/TreeEditDistance.html
.. _ZlibBasedDistance: https://distancia.readthedocs.io/en/latest/ZlibBasedDistance.html
And many more...
*Overview*
----------
The distancia package offers a comprehensive set of tools for computing and analyzing distances and similarities between data points. This package is particularly useful for tasks in data analysis, machine learning, and pattern recognition. Below is an overview of the key classes included in the package, each designed to address specific types of distance or similarity calculations.
+ `BatchDistance`_
.. _BatchDistance: https://distancia.readthedocs.io/en/latest/BatchDistance.html
Purpose: Facilitates batch processing of distance computations, enabling users to compute distances for large sets of pairs in a single operation.
Use Case: Essential in real-time systems or when working with large datasets where efficiency is critical. Batch processing saves time and computational resources by handling multiple distance computations in one go.
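
The idea of batch processing can be sketched without the library: a single call takes many pairs and returns one distance per pair. The function ``batch_distances`` and its parameters are illustrative assumptions, not the ``BatchDistance`` API.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]


def batch_distances(
    pairs: Iterable[Pair],
    metric: Callable[[Vector, Vector], float] = math.dist,
) -> List[float]:
    """Apply `metric` to every (a, b) pair and collect the results in order."""
    return [metric(a, b) for a, b in pairs]


pairs = [([0, 0], [3, 4]), ([1, 1], [1, 1]), ([2, 0], [0, 0])]
print(batch_distances(pairs))  # [5.0, 0.0, 2.0]
```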
+ `ComprehensiveBenchmarking`_
.. _ComprehensiveBenchmarking: https://distancia.readthedocs.io/en/latest/ComprehensiveBenchmarking.html
Purpose: Provides tools for benchmarking the performance of various distance metrics on different types of data.
Use Case: Useful in performance-sensitive applications where choosing the optimal metric can greatly impact computational efficiency and accuracy. This class helps users make informed decisions about which distance metric to use for their specific task.
+ `CustomDistanceFunction`_
.. _CustomDistanceFunction: https://distancia.readthedocs.io/en/latest/CustomDistanceFunction.html
Purpose: Allows users to define custom distance functions by specifying a mathematical formula or providing a custom Python function.
Use Case: Useful for researchers or practitioners who need a specific metric that isn’t commonly used or already implemented.
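
As an illustration of the "custom Python function" route, the sketch below wraps a user-supplied callable behind a ``compute`` method, mirroring the ``compute(a, b)`` convention from the Quickstart. The class ``CallableDistance`` is hypothetical and does not describe the actual ``CustomDistanceFunction`` interface.

```python
from typing import Callable, Sequence


class CallableDistance:
    """Hypothetical wrapper exposing a user function through a compute() method."""

    def __init__(self, func: Callable[[Sequence[float], Sequence[float]], float]):
        self._func = func

    def compute(self, a: Sequence[float], b: Sequence[float]) -> float:
        if len(a) != len(b):
            raise ValueError("inputs must have the same length")
        return self._func(a, b)


# A weighted Manhattan distance as an example of a user-defined metric.
weights = [1.0, 2.0, 0.5]
weighted_manhattan = CallableDistance(
    lambda a, b: sum(w * abs(x - y) for w, x, y in zip(weights, a, b))
)

print(weighted_manhattan.compute([1, 2, 3], [4, 5, 6]))  # 10.5
```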
+ `DistanceMatrix`_
.. _DistanceMatrix: https://distancia.readthedocs.io/en/latest/DistanceMatrix.html
Purpose: Automatically generates a distance matrix for a set of data points using a specified distance metric.
Use Case: Useful in clustering algorithms like k-means, hierarchical clustering, or in generating heatmaps for visualizing similarity/dissimilarity in datasets.
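
For readers unfamiliar with the structure, a distance matrix holds the distance between every pair of points, with zeros on the diagonal. The following sketch builds one in plain Python; ``distance_matrix`` is an assumed helper name and the code is independent of the ``DistanceMatrix`` class.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def distance_matrix(
    points: Sequence[Vector],
    metric: Callable[[Vector, Vector], float] = math.dist,
) -> List[List[float]]:
    """Return the n x n matrix of pairwise distances between `points`."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumes a symmetric metric, so each pair is computed only once.
            matrix[i][j] = matrix[j][i] = metric(points[i], points[j])
    return matrix


points = [[0, 0], [3, 4], [6, 8]]
for row in distance_matrix(points):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```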
+ `DistanceMetricLearning`_
.. _DistanceMetricLearning: https://distancia.readthedocs.io/en/latest/DistanceMetricLearning.html
Purpose: Implements algorithms for learning an optimal distance metric from data based on a specific task, such as classification or clustering.
Use Case: Critical in machine learning tasks where the goal is to optimize a distance metric for maximum task-specific performance, improving the accuracy of models.
+ `IntegratedDistance`_
.. _IntegratedDistance: https://distancia.readthedocs.io/en/latest/IntegratedDistance.html
Purpose: Enables seamless integration of distance computations with popular data science libraries like pandas, scikit-learn, and numpy.
Use Case: This class enhances the usability of the distancia package, allowing users to incorporate distance calculations directly into their existing data analysis workflows.
+ `MetricFinder`_
.. _MetricFinder: https://distancia.readthedocs.io/en/latest/MetricFinder.html
Purpose: Identifies the most appropriate distance metric for two given data points based on their structure.
Use Case: Useful when dealing with various types of data, this class helps users automatically determine the best distance metric to apply, ensuring that the metric chosen is suitable for the data's characteristics.
+ `OutlierDetection`_
.. _OutlierDetection: https://distancia.readthedocs.io/en/latest/OutlierDetection.html
Purpose: Implements methods for detecting outliers in datasets by using distance metrics to identify points that deviate significantly from others.
Use Case: Essential in fields such as fraud detection, quality control, and data cleaning, where identifying and managing outliers is crucial for maintaining data integrity.
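
One common distance-based rule flags a point when its k-th nearest neighbour is farther away than a chosen threshold. The sketch below implements that rule with the standard library; the function name, the choice of rule, and the default parameters are assumptions made for illustration and may differ from what ``OutlierDetection`` does.

```python
import math
from typing import List, Sequence


def knn_outliers(
    points: Sequence[Sequence[float]], k: int = 2, threshold: float = 3.0
) -> List[int]:
    """Return indices of points whose k-th nearest neighbour is farther than `threshold`."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    flagged = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        neighbour_distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        if neighbour_distances[k - 1] > threshold:
            flagged.append(i)
    return flagged


points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
print(knn_outliers(points))  # [4]
```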
+ `ParallelandDistributedComputation`_
.. _ParallelandDistributedComputation: https://distancia.readthedocs.io/en/latest/ParallelandDistributedComputation.html
Purpose: Adds support for parallel or distributed computation of distances, particularly useful for large datasets.
Use Case: In big data scenarios, calculating distances between millions of data points can be computationally expensive. This class significantly reduces computation time by parallelizing these calculations across multiple processors or machines.
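
The pattern of spreading distance computations over several workers can be sketched with ``concurrent.futures`` from the standard library. The helper names below are hypothetical and unrelated to the class's real interface. A thread pool keeps the snippet runnable anywhere; for CPU-bound pure-Python metrics, ``ProcessPoolExecutor`` offers the same ``map`` interface and is usually the better fit.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]


def pair_distance(pair: Pair) -> float:
    """Euclidean distance for one (a, b) pair."""
    a, b = pair
    return math.dist(a, b)


def parallel_distances(pairs: Sequence[Pair], max_workers: int = 4) -> List[float]:
    """Distribute the pairs over a pool of workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(pair_distance, pairs))


pairs = [([0, 0], [3, 4]), ([1, 2], [1, 2]), ([0, 0], [6, 8])]
print(parallel_distances(pairs))  # [5.0, 0.0, 10.0]
```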
+ `Visualization`_
.. _Visualization: https://distancia.readthedocs.io/en/latest/Visualization.html
Purpose: Provides tools for visualizing distance matrices, dendrograms (for hierarchical clustering), and 2D/3D representations of data points based on distance metrics.
Use Case: Visualization is a powerful tool in exploratory data analysis (EDA), helping users understand the relationships between data points. This class is particularly useful for creating visual aids like heatmaps or dendrograms to better interpret the data.
+ `APICompatibility`_
.. _APICompatibility: https://distancia.readthedocs.io/en/latest/APICompatibility.html
The APICompatibility class in the distancia package bridges the gap between powerful distance computation tools and modern API-based architectures. By enabling the creation of REST endpoints for distance metrics, it facilitates the integration of distancia into a wide range of applications, from web services to distributed computing environments. This not only enhances the usability of the package but also ensures that it can be effectively deployed in real-world, production-grade systems.
+ `AutomatedDistanceMetricSelection`_
.. _AutomatedDistanceMetricSelection: https://distancia.readthedocs.io/en/latest/AutomatedDistanceMetricSelection.html
The AutomatedDistanceMetricSelection feature in the distancia package represents a significant advancement in the ease of use and accessibility of distance metric selection. By automating the process of metric recommendation, it helps users, especially those less familiar with the intricacies of different metrics, to achieve better results in their analyses. This feature not only saves time but also improves the accuracy of data-driven decisions, making distancia a more powerful and user-friendly tool for the data science community.
+ `ReportingAndDocumentation`_
.. _ReportingAndDocumentation: https://distancia.readthedocs.io/en/latest/ReportingAndDocumentation.html
The ReportingAndDocumentation class is a powerful tool for automating the analysis and documentation of distance metrics. By integrating report generation, matrix export, and property documentation, it provides users with a streamlined way to evaluate and present the results of their distance-based models. This class is especially valuable for machine learning practitioners who require a deeper understanding of the behavior of the metrics they employ.
+ `AdvancedAnalysis`_
.. _AdvancedAnalysis: https://distancia.readthedocs.io/en/latest/AdvancedAnalysis.html
The AdvancedAnalysis class provides essential tools for evaluating the performance, robustness, and sensitivity of distance metrics. These advanced analyses ensure that a metric is not only theoretically sound but also practical and reliable in diverse applications. By offering deep insights into the behavior of distance metrics under perturbations, noise, and dataset divisions, this class is crucial for building resilient models in real-world environments.
+ `DimensionalityReductionAndScaling`_
.. _DimensionalityReductionAndScaling: https://distancia.readthedocs.io/en/latest/DimensionalityReductionAndScaling.html
The `DimensionalityReductionAndScaling` class offers powerful methods for simplifying and scaling datasets. By providing tools for dimensionality reduction such as Multi-Dimensional Scaling (MDS), it allows users to project high-dimensional data into lower dimensions while retaining its key characteristics.
+ `ComparisonAndValidation`_
.. _ComparisonAndValidation: https://distancia.readthedocs.io/en/latest/ComparisonAndValidation.html
The ComparisonAndValidation class offers tools to analyze and validate the performance of a distance or similarity metric by comparing it with other metrics and using established benchmarks. This class is essential for evaluating the effectiveness of a metric in various tasks, such as clustering, classification, or retrieval. By providing cross-validation techniques and benchmarking methods, it allows users to gain a deeper understanding of the metric's strengths and weaknesses.
+ `StatisticalAnalysis`_
.. _StatisticalAnalysis: https://distancia.readthedocs.io/en/latest/StatisticalAnalysis.html
The StatisticalAnalysis class provides essential tools to analyze and interpret the statistical properties of distances or similarities within a dataset, such as the mean, variance, and distribution of distances.
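
As a rough illustration of such summary statistics, the sketch below computes the mean and population variance of all pairwise Euclidean distances in a small dataset. It relies only on the standard library; ``pairwise_distance_stats`` is an assumed name and not a method of ``StatisticalAnalysis``.

```python
import math
import statistics
from itertools import combinations
from typing import Dict, Sequence


def pairwise_distance_stats(points: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Summarize the distribution of distances over all unordered pairs of points."""
    distances = [math.dist(a, b) for a, b in combinations(points, 2)]
    return {
        "mean": statistics.mean(distances),
        "variance": statistics.pvariance(distances),  # population variance
        "min": min(distances),
        "max": max(distances),
    }


points = [[0, 0], [3, 4], [6, 8]]
stats = pairwise_distance_stats(points)
print({name: round(value, 3) for name, value in stats.items()})
```

Here the three pairwise distances are 5, 10, and 5, so the mean is 20/3 and the population variance is 50/9.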
*Contributing*
--------------
We welcome contributions! If you would like to contribute to **Distancia**, please read the `contributing`_ guide to get started. We appreciate your help in making this project better.
.. _contributing: https://distancia.readthedocs.io/en/latest/CONTRIBUTING.html
*Link*
------
+ `Notebook`_
+ `vectorDistance`_
+ `matrixDistance`_
+ `textDistance`_
+ `graphDistance`_
+ `MarkovChain`_
+ `Loss_function`_
+ `distance`_
+ `fileDistance`_
+ `lossDistance`_
+ `similarity`_
+ `imageDistance`_
+ `soundDistance`_
+ `timeSeriesDistance`_
.. _Notebook: https://github.com/ym001/distancia/tree/master/notebook
.. _vectorDistance: https://github.com/ym001/distancia/blob/master/notebook/vectorDistance.ipynb
.. _matrixDistance: https://github.com/ym001/distancia/blob/master/notebook/matrixDistance.ipynb
.. _textDistance: https://github.com/ym001/distancia/blob/master/notebook/textDistance.ipynb
.. _graphDistance: https://github.com/ym001/distancia/blob/master/notebook/graphDistance.ipynb
.. _MarkovChain: https://github.com/ym001/distancia/blob/master/notebook/MarkovChain.ipynb
.. _Loss_function: https://github.com/ym001/distancia/blob/master/notebook/Loss_function.ipynb
.. _distance: https://github.com/ym001/distancia/blob/master/notebook/distance.ipynb
.. _fileDistance: https://github.com/ym001/distancia/blob/master/notebook/fileDistance.ipynb
.. _lossDistance: https://github.com/ym001/distancia/blob/master/notebook/lossDistance.ipynb
.. _similarity: https://github.com/ym001/distancia/blob/master/notebook/similarity.ipynb
.. _imageDistance: https://github.com/ym001/distancia/blob/master/notebook/imageDistance.ipynb
.. _soundDistance: https://github.com/ym001/distancia/blob/master/notebook/soundDistance.ipynb
.. _timeSeriesDistance: https://github.com/ym001/distancia/blob/master/notebook/timeSeriesDistance.ipynb
+ `Examples`_
.. _Examples: https://github.com/ym001/distancia/blob/master/src/example.py
+ `Pypi`_
.. _Pypi: https://pypi.org/project/distancia/
+ `Source`_
.. _Source: https://github.com/ym001/distancia
+ `Documentation`_
.. _Documentation: https://distancia.readthedocs.io/en/latest/
+ `License`_
.. _License: https://github.com/ym001/distancia/blob/master/LICENSE
*Conclusion*
------------
The *Distancia* package offers a versatile toolkit for handling a wide range of distance and similarity calculations. Whether you're working with numeric data, categorical data, strings, or time series, the package's classes provide the necessary tools to accurately measure distances and similarities. By understanding and utilizing these classes, you can enhance your data analysis workflows and improve the performance of your machine learning models.
Raw data
{
"_id": null,
"home_page": "https://pypi.org/project/distancia/",
"name": "distancia",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.0",
"maintainer_email": null,
"keywords": "distance, similarity, metrics, space, data-science, deep-learning, machine-learning, neural-network, statistics, python, cython, jupyter-notebook, data-analyse, nlp, vector, matrix, graph, markov chain, image, sound, text",
"author": "Yves Mercadier",
"author_email": "Yves Mercadier <info@realpython.com>",
"download_url": "https://files.pythonhosted.org/packages/1d/1b/e4c951c3549f11dbd0d786c10850448108aa0d1d2fa466dfef14f03024b1/distancia-0.0.74.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LICENSE.txt ",
"summary": "distance metrics",
"version": "0.0.74",
"project_urls": {
"Homepage": "https://pypi.org/project/distancia/"
},
"split_keywords": [
"distance",
" similarity",
" metrics",
" space",
" data-science",
" deep-learning",
" machine-learning",
" neural-network",
" statistics",
" python",
" cython",
" jupyter-notebook",
" data-analyse",
" nlp",
" vector",
" matrix",
" graph",
" markov chain",
" image",
" sound",
" text"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d3d2f89dc251685bbc2c6c1a707dba6fa19615723fa10f020bcda36d545f0bed",
"md5": "8204967cfbf6b48b199a95b407cb9dd1",
"sha256": "e6ba9911f8ccef7fc46441890e1ea6ccf2059768a687e825da0dc882d4f2031e"
},
"downloads": -1,
"filename": "distancia-0.0.74-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8204967cfbf6b48b199a95b407cb9dd1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.0",
"size": 136784,
"upload_time": "2024-12-25T15:28:51",
"upload_time_iso_8601": "2024-12-25T15:28:51.497873Z",
"url": "https://files.pythonhosted.org/packages/d3/d2/f89dc251685bbc2c6c1a707dba6fa19615723fa10f020bcda36d545f0bed/distancia-0.0.74-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1d1be4c951c3549f11dbd0d786c10850448108aa0d1d2fa466dfef14f03024b1",
"md5": "6eaad71dbe80176985d44e91a6335e00",
"sha256": "1bc18bcc04b4aaa23b38c5384daa7f97975d14453a0d870dbae3513ce0c5da3c"
},
"downloads": -1,
"filename": "distancia-0.0.74.tar.gz",
"has_sig": false,
"md5_digest": "6eaad71dbe80176985d44e91a6335e00",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.0",
"size": 143072,
"upload_time": "2024-12-25T15:28:54",
"upload_time_iso_8601": "2024-12-25T15:28:54.437109Z",
"url": "https://files.pythonhosted.org/packages/1d/1b/e4c951c3549f11dbd0d786c10850448108aa0d1d2fa466dfef14f03024b1/distancia-0.0.74.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-25 15:28:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "distancia"
}