Name | keytext |
Version | 0.5 |
home_page | |
Summary | Keyword based text extraction Package (keytext) |
upload_time | 2024-01-24 04:58:06 |
maintainer | |
docs_url | None |
author | Soumyajit Basak |
requires_python | |
license | |
keywords | textmining, nlp, document intelligence |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
## keyword based text extraction toolkit (keytext)
## What is it?
**keytext** is a Python package for keyword-based text search, manipulation, and data cleansing. It lets you **extract contextual information around specific keywords**, **remove unwanted terms from texts and dataframes**, **locate the positions of keywords within a Pandas DataFrame**, and **replace a single keyword or a set of keywords**.
## Main Features
Here are just a few of the things that keytext does well:
- Keyword Positioning: Locate the exact start and end positions of a keyword within a given text, facilitating precise information retrieval.
- Keyword Frequency: Count the occurrences of a keyword or set of keywords within a given text.
- Keyword Replacement: Replace a single keyword or a list of keywords with the corresponding replacement(s) in the given text.
- Contextual Extraction: Extract the texts, characters, words, and sentences to the left and right of a specified keyword, as well as the text between keywords.
- Flexible Configuration: Customize the number of left and right characters, words, or sentences to tailor the extraction to your specific requirements.
- Text Between Keywords: Extract the text between two occurrences of the same keyword, offering deeper insights into the context of your data.
- Word Removal: Efficiently remove a list of specified words from texts, enhancing text cleanliness and relevance.
- Dataframe Cleansing: Seamlessly remove unwanted words from text columns in Pandas DataFrames, ensuring data integrity.
- Cell Positioning in DataFrame: Identify the row and column positions of a keyword within a Pandas DataFrame, enabling precise data manipulation.
- Random Pattern Search: Search for a list of arbitrary patterns or regular expressions within a text or the text data of a DataFrame.
- Easy Integration: Integrate keytext into your Python projects effortlessly, enhancing your text processing and data cleansing workflows.
## Installation Procedure
```sh
# Install from PyPI
pip install keytext
```
## Dependencies:
- [Regex - Adds support for iterating over and finding keywords in texts and dataframes](https://docs.python.org/3/library/re.html)
- [Pandas - Adds support for working with dataframes](https://pandas.pydata.org/docs/)
## Functionalities (with parameter descriptions):
#### keytext.keywords_occurrences(keywords, text)
- text (str): The input text
- keywords (str or list): The keyword or a list of keywords to count occurrences for
- Returns a dictionary mapping each keyword to its frequency in the text
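The package source is not reproduced here, so the following is only a minimal stand-in for the behaviour described above, built on `re`. The name `count_keywords` is hypothetical, and case-sensitive, non-overlapping substring matching is an assumption rather than a documented property of keytext.

```python
import re

def count_keywords(keywords, text):
    """Hypothetical stand-in: count case-sensitive, non-overlapping matches."""
    if isinstance(keywords, str):
        keywords = [keywords]
    # re.escape makes each keyword match literally instead of as a regex
    return {kw: len(re.findall(re.escape(kw), text)) for kw in keywords}

sample = "the cat sat on the mat near the door"
counts = count_keywords(["the", "cat"], sample)
print(counts)
```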
#### keytext.replace_keywords(keywords, replacements, text)
- text (str): The input text
- keywords (str or list): The keyword or a list of keywords to be replaced
- replacements (str or list): The replacement string or a list of replacement strings corresponding to the keyword(s)
- Returns the text with replacements
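One way to sketch keyword replacement is a single regex pass driven by a lookup table, which avoids re-replacing text that was already substituted. The helper name `replace_keywords_sketch` and the single-pass design are assumptions for illustration, not keytext's actual implementation.

```python
import re

def replace_keywords_sketch(keywords, replacements, text):
    """Hypothetical stand-in: replace each keyword with its paired replacement."""
    if isinstance(keywords, str):
        keywords = [keywords]
    if isinstance(replacements, str):
        replacements = [replacements] * len(keywords)
    if len(keywords) != len(replacements):
        raise ValueError("keywords and replacements must have the same length")
    mapping = dict(zip(keywords, replacements))
    # Try longer keywords first so that "New York" is preferred over "New"
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    # One pass over the text: replaced text is never scanned again
    return pattern.sub(lambda m: mapping[m.group(0)], text)

swapped = replace_keywords_sketch(["cat", "dog"], ["dog", "cat"], "cat chases dog")
print(swapped)
```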
#### keytext.keypos_text(keyword, text)
- text (str): The input text
- keyword (str): The keyword to search for
- Returns the start and end positions of every occurrence of the keyword in the text
- Output is a list of tuples
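Position lookup of this kind can be sketched with `re.finditer`. The helper `keyword_positions` below is hypothetical, and it assumes zero-based positions with an exclusive end index (the convention of Python's `re` module); keytext's own convention is not stated in this README.

```python
import re

def keyword_positions(keyword, text):
    """Hypothetical stand-in: (start, end) of every literal match, end exclusive."""
    return [m.span() for m in re.finditer(re.escape(keyword), text)]

positions = keyword_positions("data", "data cleaning improves data quality")
print(positions)
```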
#### keytext.extract_sents(keyword, text, format)
- text (str): The input text
- keyword (str): The keyword to search for in the sentences of the given text
- format (str): Output format. The default, l, returns a list of sentences; p returns the matching sentences as a paragraph.
- Extracts all sentences from the given text that contain the keyword
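As a rough illustration of sentence extraction, the sketch below splits on sentence-ending punctuation and filters by keyword. The name `sentences_with_keyword`, the `fmt` parameter name, and the naive splitting rule are all assumptions; how keytext actually segments sentences is not documented here.

```python
import re

def sentences_with_keyword(keyword, text, fmt="l"):
    """Hypothetical stand-in: sentences containing the keyword."""
    # Naive split: break after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = [s for s in sentences if keyword in s]
    # "p" joins the hits into one paragraph; anything else returns a list
    return " ".join(hits) if fmt == "p" else hits

doc = "Pandas is popular. NumPy is fast. Pandas builds on NumPy."
as_list = sentences_with_keyword("Pandas", doc)
as_paragraph = sentences_with_keyword("Pandas", doc, fmt="p")
print(as_list)
print(as_paragraph)
```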
#### keytext.extract_words(keyword, text, left, right)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- left (int): The number of words to take from the left side of the keyword
- right (int): The number of words to take from the right side of the keyword
- Extracts the words neighbouring the keyword in the given text.
- With left = 0, right = n, it returns the n words to the right of the keyword
- With left = m, right = 0, it returns the m words to the left of the keyword
- With left = m, right = n, it returns the m words to the left and the n words to the right of the keyword
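The word-window behaviour can be sketched with simple whitespace tokenisation. The helper `neighbour_words` is hypothetical; it assumes a single-word keyword that matches a whole token exactly, which may differ from how keytext handles punctuation or multi-word keywords.

```python
def neighbour_words(keyword, text, left, right):
    """Hypothetical stand-in: (left words, right words) for each keyword token."""
    words = text.split()
    results = []
    for i, word in enumerate(words):
        if word == keyword:
            start = max(0, i - left)  # clamp at the beginning of the text
            results.append((words[start:i], words[i + 1:i + 1 + right]))
    return results

sentence = "the quick brown fox jumps over the lazy dog"
window = neighbour_words("fox", sentence, 2, 1)
print(window)
```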
#### keytext.extract_chr(keyword, text, left_chr, right_chr)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- left_chr (int): The number of characters to take from the left side of the keyword
- right_chr (int): The number of characters to take from the right side of the keyword
- Extracts the characters neighbouring the keyword in the given text.
- With left_chr = 0, right_chr = n, it returns the n characters to the right of the keyword
- With left_chr = m, right_chr = 0, it returns the m characters to the left of the keyword
- With left_chr = m, right_chr = n, it returns the m characters to the left and the n characters to the right of the keyword
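A character window around each match can be sketched with string slicing on the match offsets. `neighbour_chars` is a hypothetical name, and returning one `(left, right)` pair per occurrence is an assumed output shape.

```python
import re

def neighbour_chars(keyword, text, left_chr, right_chr):
    """Hypothetical stand-in: characters on each side of every literal match."""
    out = []
    for m in re.finditer(re.escape(keyword), text):
        left = text[max(0, m.start() - left_chr):m.start()]
        right = text[m.end():m.end() + right_chr]
        out.append((left, right))
    return out

context = neighbour_chars("key", "my keyword", 3, 4)
print(context)
```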
#### keytext.left_texts(keyword, text, occurrence)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- occurrence (int or str): Which occurrence of the keyword to use; must be 1, 2, ..., n, or 'all'
- Returns the text to the left of the keyword, i.e. from the beginning of the text up to the keyword
- If occurrence is 1 or 2, it returns the text to the left of the 1st or 2nd occurrence of the keyword
- Returns a list if occurrence is "all"
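The occurrence-based selection can be sketched by collecting the prefix before every match and then indexing into that list. The helper `left_of_keyword` is hypothetical, and raising `ValueError` for an out-of-range occurrence is an assumption about error handling.

```python
import re

def left_of_keyword(keyword, text, occurrence="all"):
    """Hypothetical stand-in: text before the chosen occurrence of the keyword."""
    lefts = [text[:m.start()] for m in re.finditer(re.escape(keyword), text)]
    if occurrence == "all":
        return lefts
    if not 1 <= occurrence <= len(lefts):
        raise ValueError("occurrence is out of range")
    return lefts[occurrence - 1]  # occurrences are counted from 1

record = "a=1; b=2; c=3"
all_lefts = left_of_keyword(";", record)
second_left = left_of_keyword(";", record, 2)
print(all_lefts, second_left)
```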
#### keytext.right_texts(keyword, text, occurrence)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- occurrence (int or str): Which occurrence of the keyword to use; must be 1, 2, ..., n, or 'all'
- Returns the text to the right of the keyword, i.e. from the keyword to the end of the text
- If occurrence is 1, it returns the text to the right of the 1st occurrence of the keyword
- Returns a list if occurrence is "all"
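The right-hand counterpart can be sketched the same way, slicing from the end of each match to the end of the text. `right_of_keyword` is a hypothetical name, and excluding the keyword itself from the result is an assumption.

```python
import re

def right_of_keyword(keyword, text, occurrence="all"):
    """Hypothetical stand-in: text after the chosen occurrence of the keyword."""
    rights = [text[m.end():] for m in re.finditer(re.escape(keyword), text)]
    if occurrence == "all":
        return rights
    if not 1 <= occurrence <= len(rights):
        raise ValueError("occurrence is out of range")
    return rights[occurrence - 1]  # occurrences are counted from 1

settings = "a=1; b=2; c=3"
all_rights = right_of_keyword(";", settings)
first_right = right_of_keyword(";", settings, 1)
print(all_rights, first_right)
```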
#### keytext.between_fixed_keyword(keyword, text)
- text (str): The input text
- keyword (str): The keyword that repeats in the given text
- Returns the parts of the text between two occurrences of the same keyword
- Output is a list
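Extracting the text between repeated keywords can be sketched by pairing each match with the next one. The helper `between_same_keyword` is hypothetical, and pairing consecutive occurrences (rather than, say, only the first and last) is an assumption.

```python
import re

def between_same_keyword(keyword, text):
    """Hypothetical stand-in: text between each pair of consecutive matches."""
    spans = [m.span() for m in re.finditer(re.escape(keyword), text)]
    # Pair every match with its successor and slice the gap between them
    return [text[end_a:start_b]
            for (_, end_a), (start_b, _) in zip(spans, spans[1:])]

segments = between_same_keyword("|", "a|b|c|d")
print(segments)
```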
#### keytext.between_distinct_keywords(keyword_start, keyword_end, text, keyword_start_occurence, keyword_end_occurence)
- text (str): The input text
- keyword_start (str): The starting keyword
- keyword_end (str): The ending keyword, which should be different from the starting keyword
- keyword_start_occurence (int): Which occurrence of the starting keyword to use
- keyword_end_occurence (int): Which occurrence of the ending keyword to use
- Returns the part of the text between two distinct keywords
- Output is a list
- To get all matching text segments as a list, pass keyword_start_occurence = 0 and keyword_end_occurence = 0
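The two modes described above (a specific pair of occurrences, or every segment when both are 0) can be sketched as follows. The helper `between_keywords`, its shortened parameter names, and the use of a non-greedy regex for the "all" case are assumptions made for illustration only.

```python
import re

def between_keywords(start_kw, end_kw, text, start_occ=0, end_occ=0):
    """Hypothetical stand-in: text between a start keyword and an end keyword."""
    if start_occ == 0 and end_occ == 0:
        # Non-greedy group: every shortest start...end segment
        pattern = re.escape(start_kw) + r"(.*?)" + re.escape(end_kw)
        return re.findall(pattern, text, flags=re.DOTALL)
    if start_occ < 1 or end_occ < 1:
        raise ValueError("occurrences must both be 0 or both be positive")
    starts = [m.end() for m in re.finditer(re.escape(start_kw), text)]
    ends = [m.start() for m in re.finditer(re.escape(end_kw), text)]
    begin, finish = starts[start_occ - 1], ends[end_occ - 1]
    # Empty result when the chosen end keyword precedes the start keyword
    return [text[begin:finish]] if begin <= finish else []

bracketed = "[a] and [b]"
every_segment = between_keywords("[", "]", bracketed)
second_pair = between_keywords("[", "]", bracketed, 2, 2)
print(every_segment, second_pair)
```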
#### keytext.text_keyword_remover(remover_list, text, replaced_by)
- text (str): The input text
- remover_list (list): List of keywords and regex patterns to remove
- replaced_by (str): The string that replaces each unwanted keyword or pattern, for example a space (" ")
- Non-alphanumeric characters must be written in regex format
- Returns the text after removing the unwanted keywords or patterns
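The note about writing non-alphanumeric characters in regex format suggests each list entry is treated as a regular expression. Under that assumption, a minimal sketch with `re.sub` looks like this; `remove_terms` is a hypothetical name, not the package's function.

```python
import re

def remove_terms(remover_list, text, replaced_by=" "):
    """Hypothetical stand-in: substitute every pattern in the list."""
    for pattern in remover_list:
        # Each entry is a regex, so "$" must be written as r"\$"
        text = re.sub(pattern, replaced_by, text)
    return text

cleaned_text = remove_terms(["foo", r"\$"], "foo costs $5", "")
print(repr(cleaned_text))
```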
#### keytext.text_pattern_finder(pattern_list, text)
- text (str): The input text
- pattern_list (list): List of regex patterns to search for within the text
- Returns each matched word with its location
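Pattern search with locations can be sketched with `re.finditer`. The helper `find_patterns` is hypothetical, and the `(match, start, end)` tuple layout is an assumed output format; the README only says that the match and its location are returned.

```python
import re

def find_patterns(pattern_list, text):
    """Hypothetical stand-in: (matched text, start, end) for every pattern hit."""
    return [(m.group(0), m.start(), m.end())
            for pattern in pattern_list
            for m in re.finditer(pattern, text)]

found = find_patterns([r"\d+", r"[A-Z]{2}"], "Order 42 shipped to NY")
print(found)
```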
#### keytext.keypos_df(keyword, dataframe)
- dataframe (dataframe): The input table
- keyword (str): The keyword to search for in the dataframe
- Returns the positions of all cells in the given dataframe that contain the keyword
- Output is a list of tuples
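Cell lookup in a DataFrame can be sketched by walking the rows and testing each value. The helper `keyword_cells` is hypothetical; returning `(row label, column label)` pairs and matching by substring are assumptions, since the README does not say whether positions are labels or integer offsets.

```python
import pandas as pd

def keyword_cells(keyword, dataframe):
    """Hypothetical stand-in: (row label, column label) of cells with the keyword."""
    hits = []
    for row_label, row in dataframe.iterrows():
        for col_label, value in row.items():
            # str() lets non-text cells be checked without raising
            if keyword in str(value):
                hits.append((row_label, col_label))
    return hits

fruit_df = pd.DataFrame({
    "name": ["red apple", "banana"],
    "note": ["fresh", "apple pie"],
})
cells = keyword_cells("apple", fruit_df)
print(cells)
```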
#### keytext.dataframe_keyword_remover(remover_list, dataframe, replaced_by)
- dataframe (dataframe): The input table
- remover_list (list): List of keywords and regex patterns to remove
- replaced_by (str): The string that replaces each unwanted keyword or pattern, for example a space (" ")
- Removes the keywords from the dataframe
- Non-alphanumeric characters must be written in regex format
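DataFrame cleansing of this kind can be sketched with `Series.str.replace`. The helper `remove_terms_df` is hypothetical; working on a copy and skipping columns that are not entirely strings are design assumptions, not documented keytext behaviour.

```python
import pandas as pd

def remove_terms_df(remover_list, dataframe, replaced_by=" "):
    """Hypothetical stand-in: apply every regex to each all-string column."""
    cleaned = dataframe.copy()  # leave the caller's DataFrame untouched
    for col in cleaned.columns:
        if not all(isinstance(value, str) for value in cleaned[col]):
            continue  # skip numeric or mixed columns
        for pattern in remover_list:
            cleaned[col] = cleaned[col].str.replace(pattern, replaced_by, regex=True)
    return cleaned

raw_df = pd.DataFrame({"text": ["foo bar", "bar baz"], "n": [1, 2]})
clean_df = remove_terms_df(["bar"], raw_df, "")
print(clean_df)
```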
#### keytext.dataframe_pattern_finder(pattern, dataframe)
- dataframe (dataframe): The input table
- pattern (str): List of regex patterns to search for within the dataframe
- Finds the listed regex patterns in the dataframe
- Returns each matched word with the identity of its cell
## Contributing to keytext
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Feel free to ask questions on the [mailing list](https://groups.google.com/g/keytext)
## Change Log
0.1 (03/01/2024)
------------------
- First Release
0.2 (03/01/2024)
------------------
- Second Release
0.3 (03/01/2024)
------------------
- Third Release
0.4 (04/01/2024)
------------------
- Fourth Release
0.5 (24/01/2024)
------------------
- Fifth Release
Raw data
{
"_id": null,
"home_page": "",
"name": "keytext",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "textmining,NLP,document intelligence",
"author": "Soumyajit Basak",
"author_email": "soumyabasak96@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/66/0e/b93b6c152211b45424ea77eee810a739b9fa264ed6da8fa59280d66e1e89/keytext-0.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Keyword based text extraction Pacakage (keytext)",
"version": "0.5",
"project_urls": null,
"split_keywords": [
"textmining",
"nlp",
"document intelligence"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "660eb93b6c152211b45424ea77eee810a739b9fa264ed6da8fa59280d66e1e89",
"md5": "7eec56a8541ab5b7ed16f93a9986047d",
"sha256": "0597deebb60258ed100bd8bf53fa4a3d3d7712c688253fbe07c32b77fa7ebefb"
},
"downloads": -1,
"filename": "keytext-0.5.tar.gz",
"has_sig": false,
"md5_digest": "7eec56a8541ab5b7ed16f93a9986047d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6952,
"upload_time": "2024-01-24T04:58:06",
"upload_time_iso_8601": "2024-01-24T04:58:06.070410Z",
"url": "https://files.pythonhosted.org/packages/66/0e/b93b6c152211b45424ea77eee810a739b9fa264ed6da8fa59280d66e1e89/keytext-0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-24 04:58:06",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "keytext"
}