Name | Version | Summary | Date |
blockingpy | 0.1.8 | Blocking records for record linkage and data deduplication based on ANN algorithms. | 2025-01-17 16:54:07 |
splinkclickhouse | 0.3.4 | Clickhouse backend support for Splink | 2024-12-16 10:07:24 |
mail-deduplicate | 7.6.1 | 📧 CLI to deduplicate mails from mail boxes | 2024-11-30 08:27:27 |
mim-nlp | 0.2.0 | A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning. | 2024-07-25 11:29:32 |
rensa | 0.1.6 | High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets | 2024-06-25 17:10:17 |
process-twarc | 0.20.2 | Tools for transforming raw data from Twarc2 to structured data for Masked Language Modeling. | 2024-06-12 11:40:55 |
inoutlists | 1.0.1 | inoutlists is a python package to parse and normalize different sources of lists (OFAC, EU, UN, etc) to a common dictionary interface. | 2024-06-03 22:18:59 |