<h2 align="center">
SearchURL
</h2>
<h5 align="center">
<br/>
<br/>
<img src="https://img.shields.io/pepy/dt/SearchURL"/>
<img src="https://img.shields.io/pypi/v/SearchURL"/>
<img src="https://img.shields.io/pypi/status/SearchURL"/>
<img src="https://img.shields.io/pypi/l/SearchURL"/>
<a href="https://github.com/pahulgogna/SearchURL">github-searchURL</a>
</h5>
## Installation
Install SearchURL with pip:
```bash
pip install SearchURL
```
## Documentation
**1. Getting all the text from a webpage (by not passing in any keywords):**
```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping"
)

print(data)
```
**output:** {'success': True, 'data': 'Web scraping - Wikipedia ...'}
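Since every call returns a plain dict like the output above, consuming the success case is straightforward; a minimal sketch:

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)
result = search.searchUrl(url="https://en.wikipedia.org/wiki/Web_scraping")

if result.get('success'):
    text = result['data']  # the extracted page text
    print(text[:100])      # preview the first 100 characters
```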
**2. Searching with keywords:**
```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal']
)

print(data)
```
**output:** {'success': True, 'data': 'Legal issues Toggle Legal issues subsection Legal issues [ edit ] The legality of web scraping varies across the world ...'}
**3. Fuzzy Searching:**
```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

data = search.searchUrlFuzz(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal']
)

print(data)
```
**output:** {'success': True, 'data': 'Legal issues [ edit ] | In the United States, website owners can use three major legal claims to prevent undesired web scraping: (1) copyright ...'}
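Fuzzy search tolerates near-matches instead of requiring exact keyword hits, which is why 'legal' also surfaces text containing 'legality'. The general idea can be sketched with the standard library's difflib (an illustration of the concept, not SearchURL's actual implementation):

```python
from difflib import SequenceMatcher

def fuzzy_match(keyword: str, segment: str, threshold: float = 0.75) -> bool:
    """Return True if any word in the segment is similar enough to the keyword."""
    keyword = keyword.lower()
    return any(
        SequenceMatcher(None, keyword, word.lower()).ratio() >= threshold
        for word in segment.split()
    )

# True: 'legality' scores ~0.77 against 'legal', above the 0.75 threshold
print(fuzzy_match("legal", "The legality of web scraping varies across the world"))
```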
**4. Semantic Search:**
Yes, this package supports Semantic Search!
```python
from SearchURL.main import SearchURL

search = SearchURL(createVector=True)  # creates an in-memory vector database using chromadb

data = search.createEmbededData("https://en.wikipedia.org/wiki/Web_scraping")  # loads and embeds all the data from the webpage

if data.get('success'):  # data = {'success': True, 'db': db}
    db = data.get('db')
    results = db.query(keywords=['benefits', 'what benefits can we get from web scraping'], limit=10)
    print(results)
else:
    print(data.get('detail'))  # data = {'success': False, 'detail': 'ERROR'}
```
## Errors
If the package runs into an error while fetching or searching, it returns an object like this:
{'success': False, 'detail': 'The error that occurred'}
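So a defensive call only needs to branch on the 'success' key; for example (a minimal sketch, using a deliberately unreachable URL as an illustration):

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)
result = search.searchUrl(url="https://this-domain-does-not-exist.invalid/")  # hypothetical bad URL

if not result.get('success'):
    print(result.get('detail'))  # the error that occurred while fetching
```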
***
The URL used in this readme links to the [Web scraping](https://en.wikipedia.org/wiki/Web_scraping) article on [wikipedia.org](https://wikipedia.org).