=====
SINr
=====
|languages| |downloads| |license| |version| |cpython| |wheel| |python| |docs| |activity| |contributors| |quality| |build|
*SINr* is an open-source tool to efficiently compute graph and word
embeddings. Its aim is to provide sparse interpretable vectors from a
graph structure. The dimensions of the vector produced are related to
the community structure detected in the graph. By leveraging the
relative connection of vertices to communities, *SINr* builds an
interpretable space. *SINr* is focused on providing tools to build and
interpret the embeddings produced.
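
The idea of using the relative connection of vertices to communities can be illustrated with a toy example. The sketch below is not *SINr*'s implementation: the function name ``community_profile`` and the hand-written partition are assumptions made for illustration. For each vertex, it computes the fraction of its neighbours that fall into each community, which gives one sparse, readable dimension per community.

```python
from collections import defaultdict


def community_profile(edges, communities):
    """Return, for every node, the fraction of its neighbours found in each community."""
    # Build an undirected adjacency list from the edge list.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    profile = {}
    for node, adjacent in neighbors.items():
        counts = defaultdict(int)
        for other in adjacent:
            counts[communities[other]] += 1
        # Only communities the node is connected to get an entry, so vectors stay sparse.
        profile[node] = {c: n / len(adjacent) for c, n in counts.items()}
    return profile


# Two triangles joined by the edge (2, 3); the partition is written by hand here.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

vectors = community_profile(edges, communities)
print(vectors[0])  # {'A': 1.0}
print(vectors[2])  # two thirds of node 2's neighbours are in A, one third in B
```

Each dimension can be read directly: a value close to 1 means the vertex is connected almost exclusively to that community.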
*SINr* is a Python module relying on
`Networkit <https://networkit.github.io>`__ for the graph structure and
community detection. *SINr* also provides efficient implementations to
extract word co-occurrence graphs from large text corpora. One of the
strengths of *SINr* is its ability to work with text and produce
interpretable word embeddings that are competitive with similar
approaches. For more details on the performance of *SINr* on downstream
evaluation tasks, please refer to the `Publications <#publications>`__
section.
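
To make the notion of a word co-occurrence graph concrete, here is a minimal standard-library sketch. It is only an illustration under simple assumptions (whitespace tokenisation, a fixed sliding window, undirected weighted edges); the helper ``cooccurrence_edges`` is hypothetical and is not part of *SINr*, whose own implementation is designed for large corpora.

```python
from collections import Counter


def cooccurrence_edges(sentences, window=2):
    """Count how often two distinct words appear within `window` tokens of each other."""
    weights = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Look only at the `window` tokens to the right to avoid double counting.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    # Sort the pair so that (a, b) and (b, a) share one undirected edge.
                    weights[tuple(sorted((word, other)))] += 1
    return weights


corpus = ["the cat sat on the mat", "the dog sat on the rug"]
edges = cooccurrence_edges(corpus, window=2)
print(edges[("on", "sat")])  # 2
```

The resulting ``(word, word) -> weight`` mapping is an edge list of a weighted graph, on which community detection can then be run.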
Requirements
============
- As SINr relies on libraries implemented using C/C++, a modern C++
compiler is required.
- OpenMP (required for `Networkit <https://networkit.github.io>`__ and
for compiling *SINr*\ ’s Cython code)
- Python 3.9
- Pip
- Cython
- Conda (recommended)
Install
=======
SINr can be installed through ``pip`` or built from source using
``poetry``.
pip
---
.. code:: bash

   conda activate sinr # activate conda environment
   pip install sinr

from source
-----------
.. code:: bash

   conda activate sinr # activate conda environment
   git clone git@github.com:SINr-Embeddings/sinr.git
   cd sinr
   pip install poetry # poetry solves dependencies and installs SINr
   poetry install # installs SINr based on the pyproject.toml file

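
After either installation route, one way to check that the package is visible to the current Python environment is to query its metadata. This is a small sketch using only the standard library; the helper name ``installed_version`` is an assumption for illustration, and the distribution name ``sinr`` is the one used in the ``pip install`` command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("sinr")
print("sinr is not installed" if version is None else f"sinr {version} is installed")
```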
Usage example
=============
To get started using *SINr* to build graph and word embeddings, have a
look at the `notebook <./notebooks>`__ directory.
Here is a minimal working example of *SINr*:
.. code:: python

   import gzip
   import io
   import urllib.request  # "import urllib" alone does not load the request submodule

   import networkit as nk
   import sinr.graph_embeddings as ge

   url = "https://snap.stanford.edu/data/wiki-Vote.txt.gz"
   graph_file = "wikipedia-votes.txt"

   # Download the gzipped edge list from SNAP into an in-memory buffer
   with urllib.request.urlopen(url) as sock:
       s = io.BytesIO(sock.read())

   # Decompress it to a plain-text file on disk
   with gzip.open(s, "rt") as f_in, open(graph_file, "wt") as f_out:
       f_out.writelines(f_in)

   # Initialize a networkit.Graph object from the SNAP graph
   G = nk.readGraph(graph_file, nk.Format.SNAP)

   # Build a SINr model and extract embeddings
   model = ge.SINr.load_from_graph(G)
   model.run(algo=nk.community.PLM(G))
   embeddings = model.get_nr()
   print(embeddings)

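
The file written to disk in the example above is a plain-text edge list. As a rough sketch of what such a file contains, the hypothetical helper below parses the layout commonly used by SNAP downloads: comment lines starting with ``#`` and one whitespace-separated pair of node identifiers per line. The sample lines are made up for illustration, and the example itself relies on ``nk.readGraph`` rather than on a hand-written parser.

```python
def parse_snap_edges(lines):
    """Parse SNAP-style edge-list lines into a list of (source, target) integer pairs."""
    edges = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        source, target = line.split()[:2]
        edges.append((int(source), int(target)))
    return edges


# Made-up sample mimicking the layout of a SNAP edge list.
sample = [
    "# Example graph",
    "# FromNodeId\tToNodeId",
    "1\t2",
    "1\t3",
    "2\t3",
]
print(parse_snap_edges(sample))  # [(1, 2), (1, 3), (2, 3)]
```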
Documentation
=============
The documentation for *SINr* is `available
online <https://sinr-embeddings.github.io/sinr/index.html>`__.
Contributing
============
Pull requests are welcome. For major changes, please open an issue first
to discuss the changes to be made.
License
=======
Released under `CeCILL 2.1 <https://cecill.info/>`__, see `LICENSE <./LICENSE>`__ for more details.
Publications
============
*SINr* is currently maintained at the *University of Le Mans*. If you
find *SINr* useful for your own research, please cite the appropriate
papers from the list below. Publications can also be found on the
`publications page in the
documentation <https://sinr-embeddings.github.io/sinr/_build/html/publications.html>`__.
**Initial SINr paper, 2021**
- Thibault Prouteau, Victor Connes, Nicolas Dugué, Anthony Perez,
Jean-Charles Lamirel, et al. SINr: Fast Computing of Sparse
Interpretable Node Representations is not a Sin! Advances in
Intelligent Data Analysis XIX, 19th International Symposium on
Intelligent Data Analysis, IDA 2021, Apr 2021, Porto, Portugal.
pp.325-337,
⟨\ `10.1007/978-3-030-74251-5_26 <https://dx.doi.org/10.1007/978-3-030-74251-5_26>`__\ ⟩.
`⟨hal-03197434⟩ <https://hal.science/hal-03197434>`__
**Interpretability of SINr embedding**
- Thibault Prouteau, Nicolas Dugué, Nathalie Camelin, Sylvain Meignier.
Are Embedding Spaces Interpretable? Results of an Intrusion Detection
Evaluation on a Large French Corpus. LREC 2022, Jun 2022, Marseille,
France. `⟨hal-03770444⟩ <https://hal.science/hal-03770444>`__
.. |languages| image:: https://img.shields.io/github/languages/count/SINr-Embeddings/sinr
.. |downloads| image:: https://img.shields.io/pypi/dm/sinr
.. |license| image:: https://img.shields.io/pypi/l/sinr?color=green
.. |version| image:: https://img.shields.io/pypi/v/sinr
.. |cpython| image:: https://img.shields.io/pypi/implementation/sinr
.. |wheel| image:: https://img.shields.io/pypi/wheel/sinr
.. |python| image:: https://img.shields.io/pypi/pyversions/sinr
.. |docs| image:: https://img.shields.io/website?url=https%3A%2F%2Fsinr-embeddings.github.io%2Fsinr%2F_build%2Fhtml%2Findex.html
.. |activity| image:: https://img.shields.io/github/commit-activity/y/SINr-Embeddings/sinr
.. |contributors| image:: https://img.shields.io/github/contributors/SINr-Embeddings/sinr
.. |quality| image:: https://scrutinizer-ci.com/g/SINr-Embeddings/sinr/badges/quality-score.png?b=main
.. |build| image:: https://scrutinizer-ci.com/g/SINr-Embeddings/sinr/badges/build.png?b=main