csv-similarity


Name: csv-similarity
Version: 0.0.1a0
Home page: https://github.com/dhchenx/csv-similarity
Summary: A toolkit to get or remove similar items from the csv file
Upload time: 2023-05-07 01:15:05
Author: Donghua Chen
Requires Python: >=3.6, <4
License: MIT
Keywords: csv file similarity
Requirements: No requirements were recorded.

## CSV-Similarity

### Intro

A toolkit for finding or removing similar items in a CSV file.

### Example

```python
from csv_similarity.similarity import get_similar, remove_similar

# Report rows with similar 'title' values (similarity threshold 0.8).
get_similar(
    input_path='data/list_company_news1.csv',
    similarity=0.8,
    save_path='data/similarity_report1.csv',
    # To use a stopwords file instead:
    # stopwords_path=f'{root_path}/stopwords/stopwords',
    stopwords_path='',
    analyze_field='title',
)

# Use the report to write a copy of the input without the similar rows.
remove_similar(
    similarity_report_path='data/similarity_report1.csv',
    input_csv_path='data/list_company_news1.csv',
    output_path='data/list_company_news_without_similar.csv',
)
```
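
The idea behind a similarity threshold such as `similarity=0.8` can be pictured with a small standard-library sketch. This is not the implementation used by `csv-similarity`, whose algorithm is not described here. It assumes a character-level ratio from `difflib.SequenceMatcher`, and the helper name `find_similar_rows` and the sample titles are hypothetical.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_rows(rows, field, threshold):
    """Return (i, j, score) for every pair of rows whose `field` values
    have a SequenceMatcher ratio of at least `threshold`.

    Hypothetical helper for illustration; not part of csv-similarity.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        score = SequenceMatcher(None, a[field], b[field]).ratio()
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical sample data standing in for a CSV file with a 'title' column.
sample_csv = io.StringIO(
    "title\n"
    "Acme reports record quarterly profit\n"
    "Acme reports record quarterly profits\n"
    "Weather forecast for the weekend\n"
)
rows = list(csv.DictReader(sample_csv))
similar_pairs = find_similar_rows(rows, field="title", threshold=0.8)

# Keep the first row of each similar pair and drop the later one.
duplicate_indexes = {j for _, j, _ in similar_pairs}
deduplicated = [row for i, row in enumerate(rows) if i not in duplicate_indexes]
```

Comparing every pair of rows is quadratic in the number of rows, so a sketch like this only suits small files.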

### License

The `csv-similarity` toolkit is developed by [Donghua Chen](https://github.com/dhchenx) and released under the MIT license.


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/dhchenx/csv-similarity",
    "name": "csv-similarity",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6, <4",
    "maintainer_email": "",
    "keywords": "csv file similarity",
    "author": "Donghua Chen",
    "author_email": "douglaschan@126.com",
    "download_url": "https://files.pythonhosted.org/packages/c1/2f/ab4d17096b94abfacc9e31241fbe83600a90ca755390383cd48d3f9a1a18/csv-similarity-0.0.1a0.tar.gz",
    "platform": null,
    "description": "## CSV-Similarity\n\n### Intro\n\nA toolkit to get or remove similar items from the csv file\n\n### Example\n\n```python\nfrom csv_similarity.similarity import *\n\nget_similar(\n    input_path=f'data/list_company_news1.csv',\n    similarity=0.8,\n    save_path=f'data/similarity_report1.csv',\n    # stopwords_path=f'{root_path}/stopwords/stopwords',\n    stopwords_path='',\n    analyze_field='title'\n)\n\nremove_similar(\n    similarity_report_path=f'data/similarity_report1.csv',\n    input_csv_path=f'data/list_company_news1.csv',\n    output_path=f'data/list_company_news_without_similar.csv',\n)\n\n```\n\n### License\n\nThe `csv-similarity` toolkit is developed by [Donghua Chen](https://github.com/dhchenx). \n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A toolkit to get or remove similar items from the csv file",
    "version": "0.0.1a0",
    "project_urls": {
        "Bug Reports": "https://github.com/dhchenx/csv-similarity/issues",
        "Homepage": "https://github.com/dhchenx/csv-similarity",
        "Source": "https://github.com/dhchenx/csv-similarity"
    },
    "split_keywords": [
        "csv",
        "file",
        "similarity"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7d87b6f2c8d8c18970d788aec4113e15d5b56e6044f743872f765352390c9bcb",
                "md5": "07934199c869447334221e7c7c113dc4",
                "sha256": "9f3169ab663f4ff57d1f1b6662a120d10abe05e0d779f56c485642ed549b58f4"
            },
            "downloads": -1,
            "filename": "csv_similarity-0.0.1a0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "07934199c869447334221e7c7c113dc4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6, <4",
            "size": 13534,
            "upload_time": "2023-05-07T01:15:02",
            "upload_time_iso_8601": "2023-05-07T01:15:02.328066Z",
            "url": "https://files.pythonhosted.org/packages/7d/87/b6f2c8d8c18970d788aec4113e15d5b56e6044f743872f765352390c9bcb/csv_similarity-0.0.1a0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c12fab4d17096b94abfacc9e31241fbe83600a90ca755390383cd48d3f9a1a18",
                "md5": "a700d2b54c825aa714549717d7ccd950",
                "sha256": "246e47b43adc751cb1aa96b3392180e0073dafff4ed03b03e7cc2156f1ea65b4"
            },
            "downloads": -1,
            "filename": "csv-similarity-0.0.1a0.tar.gz",
            "has_sig": false,
            "md5_digest": "a700d2b54c825aa714549717d7ccd950",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6, <4",
            "size": 15765,
            "upload_time": "2023-05-07T01:15:05",
            "upload_time_iso_8601": "2023-05-07T01:15:05.598852Z",
            "url": "https://files.pythonhosted.org/packages/c1/2f/ab4d17096b94abfacc9e31241fbe83600a90ca755390383cd48d3f9a1a18/csv-similarity-0.0.1a0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-07 01:15:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dhchenx",
    "github_project": "csv-similarity",
    "github_not_found": true,
    "lcname": "csv-similarity"
}
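
The `sha256` values under `digests` can be used to check a downloaded file. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_digest` are hypothetical, and the file path in the usage comment assumes the wheel was already downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_sha256):
    """Return True if the file's SHA-256 digest equals `expected_sha256`."""
    return sha256_of_file(path) == expected_sha256.lower()


# Usage, assuming the wheel was saved locally under its original filename:
# matches_digest(
#     "csv_similarity-0.0.1a0-py3-none-any.whl",
#     "9f3169ab663f4ff57d1f1b6662a120d10abe05e0d779f56c485642ed549b58f4",
# )
```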
        