# NLQL (Natural Language Query Language)
> A SQL-like query language designed specifically for natural language processing and text retrieval.
## Overview
NLQL is a query language that brings the power and simplicity of SQL to natural language processing. It provides a structured way to query and analyze unstructured text data, making it particularly useful for RAG (Retrieval-Augmented Generation) systems and large language models.
## Key Features
- SQL-like syntax for intuitive querying
- Multiple text unit support (character, word, sentence, paragraph, document)
- Rich set of operators for text analysis
- Semantic search capabilities
- Vector embedding support
- Extensible plugin system
- Performance optimizations with indexing and caching
## Basic Syntax
```sql
SELECT <UNIT>
[FROM <SOURCE>]
[WHERE <CONDITIONS>]
[GROUP BY <FIELD>]
[ORDER BY <FIELD>]
[LIMIT <NUMBER>]
```
### Query Units
- `CHAR`: Character level
- `WORD`: Word level
- `SENTENCE`: Sentence level
- `PARAGRAPH`: Paragraph level
- `DOCUMENT`: Document level
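To make the five units concrete, here is a minimal sketch of how a text could be split at each level. The `split_units` function and its regex rules are assumptions made for illustration; NLQL's own splitter may segment text differently, for example for non-English input.

```python
import re

def split_units(text: str, unit: str) -> list[str]:
    # Hypothetical helper, not part of the nlql package.
    unit = unit.upper()
    if unit == "DOCUMENT":
        return [text.strip()]
    if unit == "PARAGRAPH":
        # Paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if unit == "SENTENCE":
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        flat = " ".join(text.split())
        return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]
    if unit == "WORD":
        return re.findall(r"\w+", text)
    if unit == "CHAR":
        # Whitespace is skipped here; that is a choice of this sketch.
        return [c for c in text if not c.isspace()]
    raise ValueError(f"unknown unit: {unit}")

sample = "NLQL queries text. It feels like SQL!\n\nA second paragraph."
print(split_units(sample, "SENTENCE"))
print(len(split_units(sample, "PARAGRAPH")))
```

The unit named after `SELECT` decides which of these lists the rest of the query filters.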
### Basic Operators
```sql
CONTAINS("text") -- Contains specified text
STARTS_WITH("text") -- Starts with specified text
ENDS_WITH("text") -- Ends with specified text
LENGTH(<|>|=|<=|>=) number -- Length conditions
```
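The basic operators behave like simple predicates over a text unit. The sketch below expresses them as plain Python functions; the function names and the case-sensitive matching are assumptions of this example, not a description of how NLQL evaluates them internally.

```python
import operator

# Comparison symbols accepted by the LENGTH condition, mapped to Python operators.
COMPARATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
}

def contains(unit: str, text: str) -> bool:
    # CONTAINS: the text appears anywhere in the unit.
    return text in unit

def starts_with(unit: str, text: str) -> bool:
    # STARTS_WITH: the unit begins with the text.
    return unit.startswith(text)

def ends_with(unit: str, text: str) -> bool:
    # ENDS_WITH: the unit finishes with the text.
    return unit.endswith(text)

def length(unit: str, symbol: str, number: int) -> bool:
    # LENGTH: compare the unit's character count against a number.
    return COMPARATORS[symbol](len(unit), number)

sentences = ["NLQL looks like SQL.", "SQL is older.", "Short."]
matches = [s for s in sentences if contains(s, "SQL") and length(s, ">", 15)]
print(matches)
```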
### Semantic Operators
```sql
SIMILAR_TO("text", threshold) -- Semantic similarity
TOPIC_IS("topic") -- Topic matching
SENTIMENT_IS("positive"|"negative"|"neutral") -- Sentiment analysis
```
### Vector Operators
```sql
EMBEDDING_DISTANCE("text", threshold) -- Vector distance
VECTOR_SIMILAR("vector", threshold) -- Vector similarity
```
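Both vector operators compare embeddings against a threshold. The following sketch shows the two usual measures, cosine similarity and Euclidean distance, in plain Python. Which metric NLQL applies, and whether its thresholds are inclusive, is not stated here, so the helper names and the comparison directions below are assumptions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction; treat it as dissimilar.
    return dot / (norm_a * norm_b)

def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.dist(a, b)

def vector_similar(candidate, query, threshold: float) -> bool:
    # Higher similarity means closer, so keep values at or above the threshold.
    return cosine_similarity(candidate, query) >= threshold

def within_distance(candidate, query, threshold: float) -> bool:
    # Lower distance means closer, so keep values at or below the threshold.
    return euclidean_distance(candidate, query) <= threshold

print(vector_similar([1.0, 1.0], [1.0, 0.0], 0.7))
print(within_distance([0.0, 0.0], [3.0, 4.0], 5.0))
```

Note that the two thresholds point in opposite directions: a similarity threshold is a lower bound, while a distance threshold is an upper bound.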
## Usage Examples
### Basic Queries
```sql
-- Find sentences containing "artificial intelligence"
SELECT SENTENCE WHERE CONTAINS("artificial intelligence")
-- Find paragraphs with fewer than 100 characters
SELECT PARAGRAPH WHERE LENGTH < 100
```
### Advanced Queries
```sql
-- Find semantically similar sentences
SELECT SENTENCE
WHERE SIMILAR_TO("How to improve productivity", 0.8)
-- Find positive sentences about innovation
SELECT SENTENCE
WHERE CONTAINS("innovation")
AND SENTIMENT_IS("positive")
-- LENGTH is not a keyword here; register it manually first:
-- nlql.register_metadata_extractor("LENGTH", lambda x: len(x))
ORDER BY LENGTH
LIMIT 10
```
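The `register_metadata_extractor` call in the comment above suggests that `ORDER BY` looks up a named function and sorts by its result. A minimal sketch of that idea follows; the `extractors` registry, the `order_by` helper, and the ascending sort order are assumptions for illustration, not the library's actual internals.

```python
from typing import Callable

# Hypothetical registry: field name -> function computing a sortable value.
extractors: dict[str, Callable[[str], object]] = {}

def register_metadata_extractor(name: str, func: Callable[[str], object]) -> None:
    extractors[name.upper()] = func

def order_by(units: list[str], field: str, limit=None) -> list[str]:
    # Sort ascending by the registered extractor, then apply LIMIT if given.
    key = extractors[field.upper()]
    ranked = sorted(units, key=key)
    return ranked if limit is None else ranked[:limit]

register_metadata_extractor("LENGTH", lambda x: len(x))

sentences = ["Innovation drives growth.", "Innovation!", "We love innovation a lot."]
print(order_by(sentences, "LENGTH", limit=2))
```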
## Implementation
The system is implemented with three main components:
1. **Tokenizer**: Breaks down query strings into tokens
2. **Parser**: Converts tokens into an abstract syntax tree (AST)
3. **Executor**: Executes the query and returns results
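The three stages can be sketched end to end for a deliberately tiny subset of the language: `SELECT <UNIT>` with an optional `WHERE CONTAINS("...")`. The `tokenize`, `parse`, and `execute` functions and the `Query` class below are illustrative assumptions, not NLQL's real classes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Quoted strings, bare words, and the punctuation this subset needs.
TOKEN_RE = re.compile(r'''"[^"]*"|'[^']*'|\w+|[(),]''')

def tokenize(query: str) -> list[str]:
    return TOKEN_RE.findall(query)

@dataclass
class Query:
    # A very small AST: one unit and at most one operator call.
    unit: str
    operator: Optional[str] = None
    argument: Optional[str] = None

def parse(tokens: list[str]) -> Query:
    if len(tokens) < 2 or tokens[0].upper() != "SELECT":
        raise SyntaxError("query must start with SELECT <UNIT>")
    query = Query(unit=tokens[1].upper())
    if len(tokens) > 2:
        well_formed = (
            len(tokens) == 7
            and tokens[2].upper() == "WHERE"
            and tokens[4] == "("
            and tokens[6] == ")"
        )
        if not well_formed:
            raise SyntaxError("expected WHERE <OPERATOR>(<STRING>)")
        query.operator = tokens[3].upper()
        query.argument = tokens[5][1:-1]  # Strip the surrounding quotes.
    return query

def execute(query: Query, sentences: list[str]) -> list[str]:
    if query.unit != "SENTENCE":
        raise NotImplementedError("this sketch only handles SENTENCE")
    if query.operator is None:
        return list(sentences)
    if query.operator == "CONTAINS":
        return [s for s in sentences if query.argument in s]
    raise NotImplementedError(f"unsupported operator: {query.operator}")

sentences = ["NLP is a branch of artificial intelligence.", "It is widely used."]
ast = parse(tokenize("SELECT SENTENCE WHERE CONTAINS('artificial intelligence')"))
print(ast)
print(execute(ast, sentences))
```

Keeping the stages separate means a new operator only touches the executor, while new clauses such as `ORDER BY` only touch the parser and the AST.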
### Performance Optimizations
- Inverted index for text search
- Vector index for semantic search
- Query result caching
- Parallel processing for large datasets
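As an illustration of the first item, an inverted index maps each word to the positions of the units that contain it, so a text search can skip units that cannot match. The sketch below is a generic version of the technique; `build_inverted_index` and `lookup` are hypothetical names, and NLQL's own index may be structured differently.

```python
import re
from collections import defaultdict

def build_inverted_index(units: list[str]) -> dict[str, set[int]]:
    # Map each lowercased word to the set of unit positions where it occurs.
    index: dict[str, set[int]] = defaultdict(set)
    for position, unit in enumerate(units):
        for word in re.findall(r"\w+", unit.lower()):
            index[word].add(position)
    return index

def lookup(index: dict[str, set[int]], phrase: str) -> set[int]:
    # Return positions of units containing every word of the phrase.
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)

units = [
    "Artificial intelligence is broad.",
    "Intelligence tests vary.",
    "SQL is a query language.",
]
index = build_inverted_index(units)
print(lookup(index, "artificial intelligence"))
print(lookup(index, "intelligence"))
```

The lookup only narrows the candidates: a unit that contains all the words may not contain them as a contiguous phrase, so an exact `CONTAINS` check would still run on the candidates afterwards.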
## Extension System
NLQL supports custom extensions through a plugin system, which lets you:
- Register custom operators
- Add new query units
- Implement custom functions
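One common way to build such a plugin system is a registry that maps operator names to Python callables. The sketch below shows the pattern with a decorator; `OperatorRegistry`, `HAS_DIGIT`, and `MIN_WORDS` are hypothetical names invented for this example and are not part of the nlql API.

```python
from typing import Callable

class OperatorRegistry:
    # Hypothetical registry mapping operator names to predicate functions.

    def __init__(self) -> None:
        self._operators: dict[str, Callable[..., bool]] = {}

    def register(self, name: str):
        # Used as a decorator: stores the function under an uppercase name.
        def decorator(func: Callable[..., bool]) -> Callable[..., bool]:
            self._operators[name.upper()] = func
            return func
        return decorator

    def call(self, name: str, unit: str, *args) -> bool:
        try:
            func = self._operators[name.upper()]
        except KeyError:
            raise ValueError(f"unknown operator: {name}") from None
        return bool(func(unit, *args))

registry = OperatorRegistry()

@registry.register("HAS_DIGIT")
def has_digit(unit: str) -> bool:
    return any(ch.isdigit() for ch in unit)

@registry.register("MIN_WORDS")
def min_words(unit: str, count: int) -> bool:
    return len(unit.split()) >= count

print(registry.call("has_digit", "Released in 2025"))
print(registry.call("MIN_WORDS", "Too short", 3))
```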
## Getting Started
1. Install the package:
```bash
pip install nlql
```
2. Basic usage:
```python
from nlql import NLQL
# Initialize NLQL
nlql = NLQL()
# Add text for querying
raw_text = """
Natural Language Processing (NLP) is a branch of artificial intelligence
that helps computers understand human language. This technology is used
in many applications. For example, virtual assistants use NLP to
understand your commands.
"""
nlql.text(raw_text)
# Execute query
results = nlql.execute("SELECT SENTENCE WHERE CONTAINS('artificial intelligence')")
# Print results
for result in results:
    print(result)
```
## Contributing
We welcome contributions! Please see our contributing guidelines for more details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Raw data
{
"_id": null,
"home_page": "https://github.com/natural-language-query-language/nlql-python",
"name": "nlql",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9.0",
"maintainer_email": null,
"keywords": "natural language, sql, llm, ai, rag, nlql, nlp",
"author": "Okysu",
"author_email": "yby@ecanse.com",
"download_url": "https://files.pythonhosted.org/packages/da/c0/980b3452b2fb665252379582613d3816205f882b0c51914ef4d292a20f0a/nlql-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT Licence",
"summary": "NLQL (Natural Language Query Language) is a tool that helps you search through text using simple commands that look like SQL. Just like how SQL helps you find information in databases, NLQL helps you find information in regular text.",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/natural-language-query-language/nlql-python"
},
"split_keywords": [
"natural language",
" sql",
" llm",
" ai",
" rag",
" nlql",
" nlp"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5d7628e4afd23a6608c73e2aa84ce70b7a5ddd3783709476713435f391e164f3",
"md5": "d2768e40a0dee65c30d6ef498b8c4a37",
"sha256": "08ee66e6f01646c115576f23fffe098c31e876f193b92ab2ac6fc38d433fbb0e"
},
"downloads": -1,
"filename": "nlql-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d2768e40a0dee65c30d6ef498b8c4a37",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9.0",
"size": 34636,
"upload_time": "2025-01-07T10:45:51",
"upload_time_iso_8601": "2025-01-07T10:45:51.793157Z",
"url": "https://files.pythonhosted.org/packages/5d/76/28e4afd23a6608c73e2aa84ce70b7a5ddd3783709476713435f391e164f3/nlql-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dac0980b3452b2fb665252379582613d3816205f882b0c51914ef4d292a20f0a",
"md5": "90c474a1b1e20270f0ad921f42f82397",
"sha256": "d8c47f1f05454218dc29a2c1b72c2e6a4e78a7a0ae8b67291a7ca7f0ec48e39e"
},
"downloads": -1,
"filename": "nlql-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "90c474a1b1e20270f0ad921f42f82397",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9.0",
"size": 29617,
"upload_time": "2025-01-07T10:45:54",
"upload_time_iso_8601": "2025-01-07T10:45:54.147321Z",
"url": "https://files.pythonhosted.org/packages/da/c0/980b3452b2fb665252379582613d3816205f882b0c51914ef4d292a20f0a/nlql-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-07 10:45:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "natural-language-query-language",
"github_project": "nlql-python",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "certifi",
"specs": [
[
"==",
"2024.12.14"
]
]
},
{
"name": "cffi",
"specs": [
[
"==",
"1.17.1"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.4.1"
]
]
},
{
"name": "cryptography",
"specs": [
[
"==",
"44.0.0"
]
]
},
{
"name": "docutils",
"specs": [
[
"==",
"0.21.2"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.10"
]
]
},
{
"name": "jaraco.classes",
"specs": [
[
"==",
"3.4.0"
]
]
},
{
"name": "jaraco.context",
"specs": [
[
"==",
"6.0.1"
]
]
},
{
"name": "jaraco.functools",
"specs": [
[
"==",
"4.1.0"
]
]
},
{
"name": "jeepney",
"specs": [
[
"==",
"0.8.0"
]
]
},
{
"name": "jieba",
"specs": [
[
"==",
"0.42.1"
]
]
},
{
"name": "keyring",
"specs": [
[
"==",
"25.6.0"
]
]
},
{
"name": "keyrings.alt",
"specs": [
[
"==",
"5.0.2"
]
]
},
{
"name": "markdown-it-py",
"specs": [
[
"==",
"3.0.0"
]
]
},
{
"name": "mdurl",
"specs": [
[
"==",
"0.1.2"
]
]
},
{
"name": "more-itertools",
"specs": [
[
"==",
"10.5.0"
]
]
},
{
"name": "nh3",
"specs": [
[
"==",
"0.2.20"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"24.2"
]
]
},
{
"name": "pkginfo",
"specs": [
[
"==",
"1.12.0"
]
]
},
{
"name": "pycparser",
"specs": [
[
"==",
"2.22"
]
]
},
{
"name": "Pygments",
"specs": [
[
"==",
"2.19.1"
]
]
},
{
"name": "readme_renderer",
"specs": [
[
"==",
"44.0"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.3"
]
]
},
{
"name": "requests-toolbelt",
"specs": [
[
"==",
"1.0.0"
]
]
},
{
"name": "rfc3986",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "rich",
"specs": [
[
"==",
"13.9.4"
]
]
},
{
"name": "SecretStorage",
"specs": [
[
"==",
"3.3.3"
]
]
},
{
"name": "setuptools",
"specs": [
[
"==",
"75.7.0"
]
]
},
{
"name": "twine",
"specs": [
[
"==",
"6.0.1"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.3.0"
]
]
},
{
"name": "wheel",
"specs": [
[
"==",
"0.45.1"
]
]
}
],
"lcname": "nlql"
}