phrasecrafter

Name	phrasecrafter
Version	0.0.2
home_page
Summary	Keyword-based text extraction package (textsnipper)
upload_time	2024-01-03 10:57:12
maintainer
docs_url	None
author	Soumyajit Basak
requires_python
license
keywords	textmining nlp document intelligence
VCS
bugtrack_url
requirements	No requirements were recorded.

## Keyword-based text extraction toolkit (phrasecrafter)

## What is it?

**phrasecrafter** is a versatile and efficient Python package designed for keyword-based text search, manipulation, and data cleansing. Whether you need to **extract contextual information around specific keywords**, **remove unwanted terms from texts and dataframes**, or **precisely locate the positions of keywords within a Pandas DataFrame**, phrasecrafter is a robust toolkit for advanced text analysis and data management.


## Main Features
Here are just a few of the things that textsnipper does well:

  - Keyword Positioning: Locate the exact start and end positions of a keyword within a given text, facilitating precise information retrieval.
  - Contextual Extraction: Extract left and right texts, characters, words, and sentences surrounding a specified keyword, as well as the text between keywords.
  - Flexible Configuration: Customize the number of left and right characters, words, or sentences to tailor the extraction to your specific requirements.
  - Text Between Keywords: Extract the text between two occurrences of the same keyword, offering deeper insights into the context of your data.
  - Word Removal: Efficiently remove a list of specified words from texts, enhancing text cleanliness and relevance.
  - Dataframe Cleansing: Seamlessly remove unwanted words from text columns in Pandas DataFrames, ensuring data integrity.
  - Cell Positioning in DataFrame: Identify the row and column positions of a keyword within a Pandas DataFrame, enabling precise data manipulation.
  - Easy Integration: Integrate phrasecrafter into your Python projects effortlessly, enhancing your text processing and data cleansing workflows.


## Installation Procedure
```sh
# Install from PyPI
pip install phrasecrafter==0.0.2
```

## Dependencies:
- [Regex - Adds support for iterating over and finding keywords in text and dataframes](https://docs.python.org/3/library/re.html)


## Functionalities (with parameter descriptions):

#### textsnipper.tkeypos(keyword, text)
	- Returns the start and end positions of every occurrence of the keyword in the given text.
	- The output is a list of tuples.
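
To picture the list-of-tuples output described above, here is a minimal sketch built on the standard `re` module. The helper name `keyword_positions` is hypothetical and this is not the package's implementation; in particular, treating the keyword as literal text (via `re.escape`) is an assumption, since the source does not say whether `tkeypos` does so.

```python
import re

def keyword_positions(keyword, text):
    """Return (start, end) index pairs for every literal match of keyword."""
    # re.escape makes the keyword match as plain text (assumption of this sketch).
    return [(m.start(), m.end()) for m in re.finditer(re.escape(keyword), text)]

print(keyword_positions("cat", "cat and cat"))  # [(0, 3), (8, 11)]
```

Each tuple follows Python slicing conventions, so `text[start:end]` gives back the keyword.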

#### textsnipper.extract_sents(keyword, text, format='l')
	- Extracts all sentences in the given text that contain the keyword.
	- The default format is 'l', which returns a list of sentences. Passing 'p' returns the output as a paragraph.
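
The difference between the list and paragraph formats can be sketched as follows. This is an illustration only: the helper name `sentences_with_keyword`, the `fmt` parameter name, and the naive sentence splitting on `.`, `!` or `?` are all assumptions, and the package may segment sentences differently.

```python
import re

def sentences_with_keyword(keyword, text, fmt="l"):
    """Return sentences containing keyword as a list ('l') or one string ('p')."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = [s for s in sentences if keyword in s]
    return hits if fmt == "l" else " ".join(hits)

sample = "Green tea is hot. Coffee is cold. I drink tea daily."
print(sentences_with_keyword("tea", sample))       # ['Green tea is hot.', 'I drink tea daily.']
print(sentences_with_keyword("tea", sample, "p"))  # Green tea is hot. I drink tea daily.
```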
    
#### textsnipper.extract_words(keyword, text, left_w=0, right_w=1)
	- Extracts the words neighbouring the keyword in the given text.
	- With left_w = 0, right_w = n, it returns n words from the right side of the keyword; n should be an integer.
	- With left_w = m, right_w = 0, it returns m words from the left side of the keyword; m should be an integer.
	- With left_w = m, right_w = n, it returns m words from the left side and n words from the right side of the keyword.
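
The three `left_w` / `right_w` cases above can be sketched with plain list slicing. The helper `neighbouring_words` is hypothetical, it assumes a single-word keyword and whitespace tokenisation, and the `(left, right)` return shape is chosen for illustration rather than taken from the package.

```python
def neighbouring_words(keyword, text, left_w=0, right_w=1):
    """For each occurrence of a single-word keyword, return (left, right) word lists."""
    words = text.split()
    results = []
    for i, word in enumerate(words):
        if word == keyword:
            left = words[max(0, i - left_w):i]     # up to left_w words before
            right = words[i + 1:i + 1 + right_w]   # up to right_w words after
            results.append((left, right))
    return results

text = "the quick brown fox jumps over the lazy dog"
print(neighbouring_words("fox", text, left_w=2, right_w=1))
# [(['quick', 'brown'], ['jumps'])]
```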
    
#### textsnipper.extract_chars(keyword, text, left_chr=0, right_chr=1)
	- Extracts the characters neighbouring the keyword in the given text.
	- With left_chr = 0, right_chr = n, it returns n characters from the right side of the keyword; n should be an integer.
	- With left_chr = m, right_chr = 0, it returns m characters from the left side of the keyword; m should be an integer.
	- With left_chr = m, right_chr = n, it returns m characters from the left side and n characters from the right side of the keyword.
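
The character-level variant works the same way on string indices. The sketch below is an assumption-laden illustration (hypothetical helper `neighbouring_chars`, literal keyword matching, `(left, right)` tuples as output), not the package's code.

```python
import re

def neighbouring_chars(keyword, text, left_chr=0, right_chr=1):
    """For each keyword occurrence, return (left, right) character windows."""
    results = []
    for m in re.finditer(re.escape(keyword), text):
        left = text[max(0, m.start() - left_chr):m.start()]
        right = text[m.end():m.end() + right_chr]
        results.append((left, right))
    return results

print(neighbouring_chars("XYZ", "abcXYZdef", left_chr=2, right_chr=3))  # [('bc', 'def')]
```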

#### textsnipper.left_texts(keyword, text, occurrence='all')
	- Returns the text to the left of the keyword, i.e. from the beginning of the text up to the keyword, for every occurrence of the keyword.
	- Passing 1 or 2 as occurrence returns the left-side text of the 1st or 2nd occurrence of the keyword. occurrence should be 1, 2, ..., n or 'all'.
	- The output is a list when occurrence is 'all'.
	
#### textsnipper.right_texts(keyword, text, occurrence='all')
	- occurrence refers to which repetition of the keyword in the text is used.
	- Returns the text to the right of the keyword, i.e. from the keyword to the end of the text, for every occurrence of the keyword.
	- Passing 1 as occurrence returns the right-side text of the 1st occurrence of the keyword. occurrence should be 1, 2, ..., n or 'all'.
	- The output is a list when occurrence is 'all'.
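
The `occurrence` parameter shared by `left_texts` and `right_texts` can be pictured with one small sketch. The helper `side_texts` and its `side` argument are hypothetical, and whether the real functions include the keyword itself in the returned text is not stated in the source; this sketch excludes it.

```python
import re

def side_texts(keyword, text, side="right", occurrence="all"):
    """Return the text left or right of one keyword occurrence, or of all of them."""
    matches = list(re.finditer(re.escape(keyword), text))
    if side == "left":
        parts = [text[:m.start()] for m in matches]
    else:
        parts = [text[m.end():] for m in matches]
    if occurrence == "all":
        return parts                 # one entry per occurrence
    return parts[occurrence - 1]     # occurrence is 1-based

print(side_texts("-", "a-b-c", side="left"))                 # ['a', 'a-b']
print(side_texts("-", "a-b-c", side="right", occurrence=1))  # b-c
```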
	
#### textsnipper.between_fixed_keyword(keyword, text)
	- Returns the parts of the text that lie between two occurrences of the same keyword.
	- The output is a list.
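
One simple way to think about "text between two occurrences of the same keyword" is splitting on the keyword and discarding the outer pieces. The helper below is a hypothetical sketch under that assumption, not the package's implementation.

```python
def between_same_keyword(keyword, text):
    """Return the segments enclosed by consecutive occurrences of keyword."""
    # Splitting on the keyword leaves the enclosed segments in the middle;
    # the first and last pieces lie outside any keyword pair.
    pieces = text.split(keyword)
    return pieces[1:-1]

print(between_same_keyword("|", "x|a|b|y"))  # ['a', 'b']
```

With fewer than two occurrences there is nothing enclosed, so the sketch returns an empty list.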

#### textsnipper.between_distinct_keywords(keyword_start, keyword_end, text, keyword_start_occurence=1, keyword_end_occurence=1)
	- keyword_start_occurence selects which repetition of the starting keyword in the given string is used.
	- keyword_end_occurence selects which repetition of the ending keyword in the given string is used.
	- Returns the part of the text between two distinct keywords.
	- The output is a list.
	- To get all snippets as a list, pass keyword_start_occurence = 0 and keyword_end_occurence = 0.
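
The occurrence arguments above can be illustrated with a short sketch. The helper `between_keywords` and its shortened parameter names are hypothetical; the reading of `0, 0` as "every non-overlapping start-to-end span" is an assumption about the behaviour described, not something confirmed by the source.

```python
import re

def between_keywords(start_kw, end_kw, text, start_occ=1, end_occ=1):
    """Return the text between the chosen occurrences of two distinct keywords."""
    if start_occ == 0 and end_occ == 0:
        # Assumed meaning of 0/0: collect every non-overlapping span.
        pattern = re.escape(start_kw) + r"(.*?)" + re.escape(end_kw)
        return re.findall(pattern, text, flags=re.DOTALL)
    starts = [m.end() for m in re.finditer(re.escape(start_kw), text)]
    ends = [m.start() for m in re.finditer(re.escape(end_kw), text)]
    # Occurrence numbers are 1-based.
    return [text[starts[start_occ - 1]:ends[end_occ - 1]]]

print(between_keywords("[", "]", "[a] and [b]"))        # ['a']
print(between_keywords("[", "]", "[a] and [b]", 0, 0))  # ['a', 'b']
```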

#### textsnipper.text_keyword_remover(remover_list, text, replaced_by)
	- Removes the listed keywords from the text.
	- Non-alphanumeric characters must be written in regex format.
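
The note about regex format matters because characters such as `$` or `.` have special meaning in patterns. The sketch below shows the idea with `re.sub`; the helper `remove_keywords` is hypothetical and only assumes that each list entry is treated as a regular expression.

```python
import re

def remove_keywords(remover_list, text, replaced_by=""):
    """Replace every match of each pattern in remover_list with replaced_by."""
    # Entries are regex patterns, so special characters must be escaped
    # by the caller, e.g. r"\$" to match a literal dollar sign.
    for pattern in remover_list:
        text = re.sub(pattern, replaced_by, text)
    return text

print(remove_keywords(["bad"], "a bad day", "***"))  # a *** day
print(remove_keywords([r"\$"], "costs $5"))          # costs 5
```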

#### textsnipper.dkeypos(keyword, dataframe)
	- Returns the positions of all cells in the given dataframe that contain the keyword.
	- The output is a list of tuples.
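
A cell-position lookup of this kind can be sketched with pandas as below. The helper `cell_positions` is hypothetical, and returning zero-based `(row, column)` integer positions is an assumption; the source does not say whether the package reports integer positions or labels.

```python
import pandas as pd

def cell_positions(keyword, dataframe):
    """Return (row, column) integer positions of cells whose text contains keyword."""
    positions = []
    for r in range(dataframe.shape[0]):
        for c in range(dataframe.shape[1]):
            # str() lets non-string cells (numbers, NaN) be searched safely.
            if keyword in str(dataframe.iat[r, c]):
                positions.append((r, c))
    return positions

df = pd.DataFrame({"a": ["apple pie", "banana"], "b": ["cherry", "apple tart"]})
print(cell_positions("apple", df))  # [(0, 0), (1, 1)]
```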

#### textsnipper.dataframe_keyword_remover(remover_list, dataframe, replaced_by)
	- Removes the listed keywords from the dataframe.
	- Non-alphanumeric characters must be written in regex format.
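
For dataframes, a comparable effect can be obtained with pandas' own `DataFrame.replace` in regex mode. The wrapper below is a hypothetical sketch of that approach and is not claimed to be how the package works internally.

```python
import pandas as pd

def remove_keywords_from_frame(remover_list, dataframe, replaced_by=""):
    """Return a copy of dataframe with every pattern match replaced."""
    # With regex=..., pandas applies each pattern to string cells only.
    return dataframe.replace(regex=remover_list, value=replaced_by)

df = pd.DataFrame({"t": ["foo bar", "baz"]})
cleaned = remove_keywords_from_frame(["foo ", "z"], df)
print(cleaned["t"].tolist())  # ['bar', 'ba']
```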


## Contributing to phrasecrafter
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata)


## Change Log
0.0.1 (03/01/2024)
------------------
- First Release

0.0.2 (03/01/2024)
------------------
- Second Release

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "phrasecrafter",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "textmining,NLP,document intelligence",
    "author": "Soumyajit Basak",
    "author_email": "soumyabasak96@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f8/7a/7eb6b89d9befe6732f3652136bdebf07957f719cf5228cbca0cb09c20478/phrasecrafter-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Keyword based text extraction Pacakage (textsnipper)",
    "version": "0.0.2",
    "project_urls": null,
    "split_keywords": [
        "textmining",
        "nlp",
        "document intelligence"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f87a7eb6b89d9befe6732f3652136bdebf07957f719cf5228cbca0cb09c20478",
                "md5": "cd9c1b3bc9b52ea54806c469732157f8",
                "sha256": "bee5e5474026691c7e21bae8dbba139845a6f7bd229f881fe1752f8810d6686d"
            },
            "downloads": -1,
            "filename": "phrasecrafter-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "cd9c1b3bc9b52ea54806c469732157f8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5541,
            "upload_time": "2024-01-03T10:57:12",
            "upload_time_iso_8601": "2024-01-03T10:57:12.794227Z",
            "url": "https://files.pythonhosted.org/packages/f8/7a/7eb6b89d9befe6732f3652136bdebf07957f719cf5228cbca0cb09c20478/phrasecrafter-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-03 10:57:12",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "phrasecrafter"
}
