semnet


- Name: semnet
- Version: 0.1.6
- Summary: Semantic Networks from Embeddings
- Upload time: 2025-10-30 11:16:09
- Requires Python: >=3.8
- License: MIT
- Keywords: semantic networks, embeddings, graph analysis, nlp, similarity, networkx

# Semnet: efficient graph structures from embeddings

![Embeddings of Guardian headlines represented as a network structure by Semnet and visualised by Cosmograph](img/cosmo_semnet.png)
_Embeddings of Guardian headlines represented as a network by Semnet and visualised in [Cosmograph](https://cosmograph.app/)_

## Introduction
Semnet constructs graph structures from embeddings, enabling graph-based analysis and operations over collections of embedded documents.

Semnet uses [Annoy](https://github.com/spotify/annoy) to perform efficient pair-wise distance calculations, allowing for million-embedding network construction in under ten minutes on consumer hardware.

Graphs are returned as [NetworkX](https://networkx.org) objects, opening up a wide range of algorithms for downstream use.
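
To make the idea concrete, here is a minimal brute-force sketch of a thresholded similarity graph using only the standard library. It is not Semnet's implementation: Semnet delegates neighbour search to Annoy, whereas this sketch compares every pair, which is quadratic and only sensible for small inputs. It also assumes the threshold is a minimum cosine similarity. The helper names `cosine_similarity` and `similarity_edges` and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_edges(embeddings, thresh):
    """Return (i, j, similarity) for every pair at or above `thresh`.

    Hypothetical helper for illustration; compares all pairs (O(n^2)).
    """
    edges = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if sim >= thresh:
            edges.append((i, j, sim))
    return edges


# Toy 2-D "embeddings": the first two point in nearly the same direction.
toy_embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
edges = similarity_edges(toy_embeddings, thresh=0.8)
print(edges)  # only the pair (0, 1) clears the threshold
```

Raising `thresh` removes weaker edges, which is why larger values give sparser networks.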

The name "Semnet" derives from _[semantic network](https://en.wikipedia.org/wiki/Semantic_network)_[^1], as it was initially designed for an NLP use-case, but the tool will work well with any form of embedded document (e.g., images, audio, or even [graphs](https://arxiv.org/abs/1707.05005)).

[^1]: Technically speaking, a [Semantic Similarity Network (SSN)](https://en.wikipedia.org/wiki/Semantic_similarity_network).

Semnet may be used for:
- **Graph algorithms**: enrich your data with [communities](https://networkx.org/documentation/stable/reference/algorithms/community.html), [centrality](https://networkx.org/documentation/stable/reference/algorithms/centrality.html) and [much more](https://networkx.org/documentation/stable/reference/algorithms/) for downstream use in search, RAG and context engineering
- **Deduplication**: remove duplicate records (e.g., "Donald Trump", "Donald J. Trump") from datasets
- **Exploratory data analysis and visualisation**: [Cosmograph](https://cosmograph.app/) works brilliantly for large corpora
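
The deduplication use case can be sketched as grouping records into connected components of the similarity graph and keeping one representative per group. The example below is a standard-library illustration under that assumption, not a Semnet utility; the `deduplicate` function and the hand-written edge list are hypothetical.

```python
def deduplicate(records, edges):
    """Keep the first record from each connected component.

    `edges` is an iterable of (i, j) index pairs judged to be duplicates.
    Uses a simple union-find (disjoint-set) structure.
    """
    parent = list(range(len(records)))

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Attach the larger root to the smaller so the earliest index wins.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    # A record survives only if it is the root of its own component.
    return [rec for idx, rec in enumerate(records) if find(idx) == idx]


records = ["Donald Trump", "Donald J. Trump", "Joe Biden"]
duplicate_edges = [(0, 1)]  # hand-written; in practice these come from the graph
unique_records = deduplicate(records, duplicate_edges)
print(unique_records)  # ['Donald Trump', 'Joe Biden']
```

Keeping the earliest record is an arbitrary choice made for this sketch; a real pipeline might prefer the longest or most frequent variant.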

Exposing the full NetworkX and Annoy APIs, Semnet offers plenty of opportunity for experimentation depending on your use-case. Check out the examples for inspiration.

## Installation

```bash
pip install semnet
```

## Quick Start

```python
from semnet import SemanticNetwork
from sentence_transformers import SentenceTransformer

# Your documents
docs = [
    "The cat sat on the mat",
    "A cat was sitting on a mat",
    "The dog ran in the park",
    "I love Python",
    "Python is a great programming language",
]

# Generate embeddings (use any embedding provider)
embedding_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = embedding_model.encode(docs)

# Create and configure semantic network
sem = SemanticNetwork(thresh=0.3, verbose=True)  # Larger values give sparser networks

# Build a NetworkX graph object from your embeddings
G = sem.fit_transform(embeddings, labels=docs)

# Export to pandas using the standalone function
from semnet import to_pandas
nodes, edges = to_pandas(G)
```
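
Because the result is an ordinary NetworkX graph, standard NetworkX calls apply to it directly. The sketch below uses a small hand-built graph as a stand-in for `G` so that it runs without an embedding model. The edges and the `weight` attribute are assumptions for illustration and may not match the attributes Semnet attaches.

```python
import networkx as nx

# Hand-built stand-in for the graph returned by fit_transform.
G = nx.Graph()
G.add_nodes_from(range(5))
G.add_edge(0, 1, weight=0.92)  # the two "cat" sentences
G.add_edge(3, 4, weight=0.81)  # the two "Python" sentences

# Connected components group mutually similar documents.
components = sorted(sorted(c) for c in nx.connected_components(G))
print(components)  # [[0, 1], [2], [3, 4]]

# Degree centrality: fraction of other nodes each node is linked to.
centrality = nx.degree_centrality(G)
print(centrality[0], centrality[2])  # 0.25 0.0
```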

## Requirements

- Python 3.8+
- networkx
- annoy
- numpy
- pandas
- tqdm

Recommended for examples:
- sentence-transformers
- cosmograph

## Project origin

I love network analysis, and have explored embedding-derived [semantic networks](https://en.wikipedia.org/wiki/Semantic_network) in the past as an alternative approach to representing, clustering and querying news data. 

Semnet started life as a few functions I'd been using for deduplication for a forthcoming piece of research. I could see a number of potential uses for my code, so I decided to package it up for others to use.

## Statement on the use of AI

I kicked off the project by hand-refactoring my initial code into the class-based structure that forms the core functionality of the current module.

I then used GitHub Copilot in VS Code to:
- Bootstrap scaffolding, tests, documentation, examples and typing
- Refactor the core methods in the style of the scikit-learn API
- Add additional functionality, e.g., the ability to pass custom data to nodes
- Walk me through deployment to [Read the Docs](https://semnetdocs.readthedocs.io/) and [PyPI](https://pypi.org/project/semnet/)

## Roadmap

Semnet is a relatively simple project focused on core graph construction functionality. I don't have immediate plans to expand it, but I can see the potential for a few future additions:

- Performance optimizations for very large datasets
- Utilities for deduplication, as that's my main use case 
- Integration with graph visualization tools

## License

MIT License

## Citation

If you use Semnet in academic work, please cite:

```bibtex
@software{semnet,
  title={Semnet: Semantic Networks from Embeddings},
  author={Ian Goodrich},
  year={2025},
  url={https://github.com/specialprocedures/semnet}
}
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "semnet",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "semantic networks, embeddings, graph analysis, nlp, similarity, networkx",
    "author": null,
    "author_email": "Ian Goodrich <ian@igdr.ch>",
    "download_url": "https://files.pythonhosted.org/packages/bd/31/378aed1faf0a0b8e4eb2c9f6d8c5037afac5add842b15ea290afef95dd0e/semnet-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Semantic Networks from Embeddings",
    "version": "0.1.6",
    "project_urls": {
        "Bug Tracker": "https://github.com/specialprocedures/semnet/issues",
        "Documentation": "https://semnetdocs.readthedocs.io",
        "Homepage": "https://github.com/specialprocedures/semnet",
        "Repository": "https://github.com/specialprocedures/semnet"
    },
    "split_keywords": [
        "semantic networks",
        " embeddings",
        " graph analysis",
        " nlp",
        " similarity",
        " networkx"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0b215267fbedaeb972401d23385947c04c75f8baa9a58030daf7e1b30947afcb",
                "md5": "08a3372f684d55b7567f1d0cc5a24311",
                "sha256": "a256e4db46d7bf5c4fe55b787ac449dc904d5688dd4aa4e71bacfdf08e04b586"
            },
            "downloads": -1,
            "filename": "semnet-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "08a3372f684d55b7567f1d0cc5a24311",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 10124,
            "upload_time": "2025-10-30T11:16:08",
            "upload_time_iso_8601": "2025-10-30T11:16:08.269983Z",
            "url": "https://files.pythonhosted.org/packages/0b/21/5267fbedaeb972401d23385947c04c75f8baa9a58030daf7e1b30947afcb/semnet-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bd31378aed1faf0a0b8e4eb2c9f6d8c5037afac5add842b15ea290afef95dd0e",
                "md5": "43db2c3a1d944fa47d9d1776e7f9b404",
                "sha256": "474623b69b1f4fff5cd630866eb9f936eb7903ee1a56699f16d295417970efec"
            },
            "downloads": -1,
            "filename": "semnet-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "43db2c3a1d944fa47d9d1776e7f9b404",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 3812881,
            "upload_time": "2025-10-30T11:16:09",
            "upload_time_iso_8601": "2025-10-30T11:16:09.228111Z",
            "url": "https://files.pythonhosted.org/packages/bd/31/378aed1faf0a0b8e4eb2c9f6d8c5037afac5add842b15ea290afef95dd0e/semnet-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-30 11:16:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "specialprocedures",
    "github_project": "semnet",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "semnet"
}
        