=====
SINr
=====
|languages| |downloads| |license| |version| |cpython| |wheel| |python| |docs| |activity| |contributors| |quality| |build|

:Name: sinr
:Version: 1.2.0
:Summary: Build word and graph embeddings based on community detection in graphs.
:Author: Thibault Prouteau
:License: CeCILL 2.1
:Requires Python: ``>=3.8,<4.0``
:Keywords: node embedding, word embedding, embedding, graph embedding, louvain, community
:Home page: https://sinr-embeddings.github.io/sinr/_build/html/index.html
:Upload time: 2023-07-24 14:44:02

*SINr* is an open-source tool to efficiently compute graph and word
embeddings. Its aim is to provide sparse interpretable vectors from a
graph structure. The dimensions of the vector produced are related to
the community structure detected in the graph. By leveraging the
relative connection of vertices to communities, *SINr* builds an
interpretable space. *SINr* is focused on providing tools to build and
interpret the embeddings produced.
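
As a rough illustration of what "relative connection of vertices to
communities" can mean, the sketch below computes, for each vertex, the
fraction of its edges that lead into each community. It is a simplified
pure-Python illustration, not *SINr*'s implementation: the function name
``community_share``, the toy edge list and the community assignment are
all assumptions made for this example.

.. code:: python

   from collections import defaultdict


   def community_share(edges, community_of):
       """Return, per vertex, the share of its edges going to each community."""
       degree = defaultdict(int)
       links = defaultdict(lambda: defaultdict(int))
       for u, v in edges:
           degree[u] += 1
           degree[v] += 1
           # Count the edge once for each endpoint, under the
           # community of the opposite endpoint.
           links[u][community_of[v]] += 1
           links[v][community_of[u]] += 1
       return {
           node: {c: n / degree[node] for c, n in per_comm.items()}
           for node, per_comm in links.items()
       }


   # Hypothetical toy graph: two triangles joined by the edge (2, 3).
   edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
   community_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
   vectors = community_share(edges, community_of)
   print(vectors[2])  # vertex 2 touches both communities

Each vertex ends up with one value per community it touches, so the
vectors are sparse and every dimension can be read as "how strongly this
vertex is tied to that community".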

*SINr* is a Python module relying on
`Networkit <https://networkit.github.io>`__ for the graph structure and
community detection. *SINr* also provides efficient implementations to
extract word co-occurrence graphs from large text corpora. One of the
strengths of *SINr* is its ability to work with text and produce
interpretable word embeddings that are competitive with similar
approaches. For more details on the performance of *SINr* on downstream
evaluation tasks, please refer to the `Publications <#publications>`__
section.
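
To make the notion of a word co-occurrence graph concrete, here is a
minimal sketch that counts how often two words appear within a small
window of each other and returns the counts as weighted edges. It is not
*SINr*'s extraction code: the function name ``cooccurrence_edges``, the
``window`` parameter and the toy sentences are assumptions made for this
example.

.. code:: python

   from collections import Counter


   def cooccurrence_edges(sentences, window=2):
       """Count unordered word pairs that co-occur within `window` tokens."""
       weights = Counter()
       for tokens in sentences:
           for i, word in enumerate(tokens):
               # Look only forward so each pair is counted once per occurrence.
               for other in tokens[i + 1 : i + 1 + window]:
                   if other != word:
                       weights[tuple(sorted((word, other)))] += 1
       return weights


   # Hypothetical pre-tokenized corpus.
   sentences = [["the", "cat", "sat"], ["the", "cat", "ran"]]
   edge_weights = cooccurrence_edges(sentences, window=1)
   for (a, b), w in sorted(edge_weights.items()):
       print(a, b, w)

The resulting ``(word, word) -> weight`` mapping is the kind of weighted
edge list from which a graph can be built before community detection.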

Requirements
============

-  A modern C++ compiler, as SINr relies on libraries implemented in
   C/C++
-  OpenMP (required for `Networkit <https://networkit.github.io>`__ and
   for compiling *SINr*\ ’s Cython code)
-  Python 3.9 (the package metadata declares ``>=3.8,<4.0``)
-  Pip
-  Cython
-  Conda (recommended)
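
The requirements above can be checked quickly before installing. The
sketch below uses only the standard library; the function name
``check_prerequisites`` and the list of compiler executable names are
assumptions, and the Python bounds are taken from the package metadata
(``>=3.8,<4.0``). It does not detect OpenMP.

.. code:: python

   import shutil
   import sys


   def check_prerequisites(compilers=("c++", "g++", "clang++")):
       """Report which of the basic build prerequisites are visible on PATH."""
       return {
           # Bounds taken from the package metadata: >=3.8,<4.0
           "python_ok": (3, 8) <= sys.version_info[:2] < (4, 0),
           # First C++ compiler found on PATH, or None
           "compiler": next((c for c in compilers if shutil.which(c)), None),
           "pip": shutil.which("pip") is not None,
           "conda": shutil.which("conda") is not None,
       }


   report = check_prerequisites()
   for name, value in report.items():
       print(f"{name}: {value}")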

Install
=======

SINr can be installed through ``pip`` or from source using ``poetry``
directives.

pip
---

.. code:: bash

   conda activate sinr # activate conda environment
   pip install sinr

from source
-----------

.. code:: bash

   conda activate sinr # activate conda environment
   git clone git@github.com:SINr-Embeddings/sinr.git
   cd sinr
   pip install poetry # poetry solves dependencies and installs SINr
   poetry install # installs SINr based on the pyproject.toml file
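
After either installation route, a quick way to confirm that the package
is visible to the active interpreter is to look it up without importing
it. This is a generic standard-library sketch; the helper name
``is_installed`` is an assumption made for this example.

.. code:: python

   import importlib.util


   def is_installed(module_name):
       """Return True if a top-level module can be located by the interpreter."""
       return importlib.util.find_spec(module_name) is not None


   print("sinr installed:", is_installed("sinr"))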

Usage example
=============

To get started using *SINr* to build graph and word embeddings, have a
look at the `notebook <./notebooks>`__ directory.

Here is a minimal working example of *SINr*:

.. code:: python

       import gzip
       import io
       import urllib.request  # "import urllib" alone does not load the request submodule

       import networkit as nk
       import sinr.graph_embeddings as ge


       url = "https://snap.stanford.edu/data/wiki-Vote.txt.gz"
       graph_file = "wikipedia-votes.txt"
       # Download the gzipped edge list from SNAP into memory
       with urllib.request.urlopen(url) as sock:
           s = io.BytesIO(sock.read())
       # Decompress it and write the plain-text edge list to disk
       with gzip.open(s, "rt") as f_in:
           with open(graph_file, "wt") as f_out:
               f_out.writelines(f_in)
       # Initialize a networkit.Graph object from SNAP graph
       G = nk.readGraph(graph_file, nk.Format.SNAP)

       # Build a SINr model and extract embeddings
       model = ge.SINr.load_from_graph(G)
       model.run(algo=nk.community.PLM(G))
       embeddings = model.get_nr()
       print(embeddings)

Documentation
=============

The documentation for *SINr* is `available
online <https://sinr-embeddings.github.io/sinr/index.html>`__.

Contributing
============

Pull requests are welcome. For major changes, please open an issue first
to discuss the changes to be made.

License
=======

Released under `CeCILL 2.1 <https://cecill.info/>`__, see `LICENSE <./LICENSE>`__ for more details.

Publications
============

*SINr* is currently maintained at the *University of Le Mans*. If you
find *SINr* useful for your own research, please cite the appropriate
papers from the list below. Publications can also be found on the
`publications page in the
documentation <https://sinr-embeddings.github.io/sinr/_build/html/publications.html>`__.

**Initial SINr paper, 2021**

-  Thibault Prouteau, Victor Connes, Nicolas Dugué, Anthony Perez,
   Jean-Charles Lamirel, et al. SINr: Fast Computing of Sparse
   Interpretable Node Representations is not a Sin! Advances in
   Intelligent Data Analysis XIX, 19th International Symposium on
   Intelligent Data Analysis, IDA 2021, Apr 2021, Porto, Portugal.
   pp. 325-337,
   ⟨\ `10.1007/978-3-030-74251-5_26 <https://dx.doi.org/10.1007/978-3-030-74251-5_26>`__\ ⟩.
   `⟨hal-03197434⟩ <https://hal.science/hal-03197434>`__

**Interpretability of SINr embedding**

-  Thibault Prouteau, Nicolas Dugué, Nathalie Camelin, Sylvain Meignier.
   Are Embedding Spaces Interpretable? Results of an Intrusion Detection
   Evaluation on a Large French Corpus. LREC 2022, Jun 2022, Marseille,
   France. `⟨hal-03770444⟩ <https://hal.science/hal-03770444>`__
   
   
.. |languages| image:: https://img.shields.io/github/languages/count/SINr-Embeddings/sinr
.. |downloads| image:: https://img.shields.io/pypi/dm/sinr
.. |license| image:: https://img.shields.io/pypi/l/sinr?color=green
.. |version| image:: https://img.shields.io/pypi/v/sinr
.. |cpython| image:: https://img.shields.io/pypi/implementation/sinr
.. |wheel| image:: https://img.shields.io/pypi/wheel/sinr
.. |python| image:: https://img.shields.io/pypi/pyversions/sinr
.. |docs| image:: https://img.shields.io/website?url=https%3A%2F%2Fsinr-embeddings.github.io%2Fsinr%2F_build%2Fhtml%2Findex.html
.. |activity| image:: https://img.shields.io/github/commit-activity/y/SINr-Embeddings/sinr
.. |contributors| image:: https://img.shields.io/github/contributors/SINr-Embeddings/sinr
.. |quality| image:: https://scrutinizer-ci.com/g/SINr-Embeddings/sinr/badges/quality-score.png?b=main
.. |build| image:: https://scrutinizer-ci.com/g/SINr-Embeddings/sinr/badges/build.png?b=main
