| Field | Value |
|---|---|
| Name | sbfi-knime-utils |
| Version | 1.4.2 |
| Summary | A lightweight logging utility with folder clearing, browser download support, and pandas DataFrame output. |
| Upload time | 2025-09-02 10:24:47 |
| Requires Python | >=3.8 |
| Keywords | browser, folder, logging, selenium |
| Recorded requirements (package metadata) | No requirements were recorded. |
# Knime Python Utils
This Python package provides utility functions for workflow automation, especially when used with KNIME. It covers folder creation and clearing, structured logging, and Chrome WebDriver management for browser automation. The package is lightweight and suited to repetitive data-processing pipelines and test-automation scenarios that need clean folder management and reliable logging.
## Features
- **Logging**: Log messages with timestamps, function names, and error status, exportable to a pandas DataFrame.
- **Folder Management**: Clear or create folders with robust error handling.
- **Browser Automation**: Configure Chrome WebDriver for headless or non-headless file downloads, with support for monitoring and moving downloaded files.
- **Extensibility**: Designed for easy integration into automation scripts with comprehensive logging.
## Installation
Install the package via pip:
```bash
pip install sbfi-knime-utils
```
### Requirements
- Python >= 3.8
- Dependencies: `pandas>=1.5.0`, `selenium>=4.0.0`, `webdriver_manager>=4.0.0`
## Usage Guidelines
### Basic Logging
Create a logger, log messages, and export logs to a pandas DataFrame:
```python
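from sbfi_knime_utils.logger import Logger
# knio is only available inside a KNIME Python Script node
import knime.scripting.io as knio
# Initialize logger
logger = Logger()
# Log messages
logger.log("main", "Starting application")
logger.log("main", "Error occurred", is_error=True)
# Export logs to DataFrame
df = logger.get_log_dataframe()
# Write the logs to the node's third output table (index 2);
# the node must be configured with at least three output tables
knio.output_tables[2] = knio.Table.from_pandas(df)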
```
### Folder Management
Clear or create a folder:
```python
from sbfi_knime_utils.file_utils import clear_folder
# Clear all files in a folder
clear_folder("logs")
# Create a folder without clearing existing files
clear_folder("logs", clear_files=False)
```
### Browser Automation
Set up a Chrome WebDriver for downloading files and monitor the download directory:
```python
from sbfi_knime_utils.logger import Logger
from sbfi_knime_utils.chrome_utils import create_chrome_driver, wait_download_file
# Initialize logger
logger = Logger()
# Create Chrome WebDriver with default download directory
browser = create_chrome_driver(logger=logger) # Uses <cwd>/data/download
# Navigate to a download URL (example)
browser.get("https://example.com/sample.pdf")
# Wait for and process downloaded files
output_files = wait_download_file(
folder_to_check="data/download",
extension="pdf",
folder_storage="storage",
max_waiting_download=30,
replace_filename="report",
logger=logger
)
print(output_files)
# Clean up
browser.quit()
```
### Advanced Example
Combine logging, folder management, and browser automation:
```python
from sbfi_knime_utils.chrome_utils import create_chrome_driver, wait_download_file
from sbfi_knime_utils.file_utils import clear_folder
from sbfi_knime_utils.logger import Logger
# Initialize logger
logger = Logger()
# Clear the download and storage directories before starting
clear_folder("custom_downloads")
clear_folder("storage")
# Create Chrome WebDriver with custom download directory
browser = create_chrome_driver(download_dir="custom_downloads", headless=True, logger=logger)
try:
    # Navigate to a page and trigger a download
    browser.get("https://example.com/sample.pdf")
    # Wait for and move the downloaded file
    output_files = wait_download_file(
        folder_to_check="custom_downloads",
        extension="pdf",
        folder_storage="storage",
        max_waiting_download=30,
        replace_filename="sample_report",
        logger=logger,
    )
    # Log results
    logger.log("main", f"Processed files: {output_files}")
finally:
    # Always close the browser, even if the download times out
    browser.quit()
# Export logs
print(logger.get_log_dataframe())
```
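### Enabling downloads in headless mode
Headless Chrome may block file downloads unless they are explicitly allowed. Call `enable_download_headless` on the driver before triggering a download, then monitor the download directory as usual: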
```python
from sbfi_knime_utils.logger import Logger
from sbfi_knime_utils.chrome_utils import create_chrome_driver, wait_download_file, enable_download_headless
# Initialize logger
logger = Logger()
# Create Chrome WebDriver with default download directory
browser = create_chrome_driver(logger=logger) # Uses <cwd>/data/download
# Allow downloads in headless mode before triggering one
enable_download_headless(browser, "data/download", logger=logger)
# Navigate to a download URL (example)
browser.get("https://example.com/sample.pdf")
# Wait for and process downloaded files
output_files = wait_download_file(
folder_to_check="data/download",
extension="pdf",
folder_storage="storage",
max_waiting_download=30,
replace_filename="report",
logger=logger
)
print(output_files)
# Clean up
browser.quit()
```
## API Reference
### `Logger` Class
A utility for logging messages with timestamps and exporting logs to a pandas DataFrame.
#### `__init__()`
Initialize an empty logger.
- **Parameters**: None
- **Returns**: None
#### `log(function_name: str, message: str, is_error: bool = False) -> None`
Log a message with a timestamp, function name, and error status.
- **Parameters**:
  - `function_name` (str): Name of the function or context.
  - `message` (str): Log message content.
  - `is_error` (bool, optional): Indicates if the message is an error. Defaults to `False`.
- **Returns**: None
- **Example**:
```python
logger = Logger()
logger.log("main", "Task completed")
```
#### `get_log_dataframe() -> pd.DataFrame`
Return logged messages as a pandas DataFrame with columns `['Date', 'Function', 'Message', 'IsError']`.
- **Parameters**: None
- **Returns**: `pd.DataFrame` - DataFrame of logs, empty if no logs exist.
- **Example**:
```python
df = logger.get_log_dataframe()
print(df)
```
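Since `get_log_dataframe` returns an ordinary pandas DataFrame, standard pandas filtering applies. The sketch below builds a DataFrame with the documented columns by hand (a stand-in for real logger output, with invented rows) and keeps only the error entries:

```python
import pandas as pd

# Stand-in for logger.get_log_dataframe(): same documented columns, made-up rows.
# Assumption: the real Date column may be a string or a datetime; it is not used here.
df = pd.DataFrame(
    {
        "Date": ["2025-01-01 09:00:00", "2025-01-01 09:00:05"],
        "Function": ["main", "main"],
        "Message": ["Starting application", "Error occurred"],
        "IsError": [False, True],
    }
)

# Select only the rows logged with is_error=True.
errors = df[df["IsError"].astype(bool)]
has_errors = not errors.empty

print(errors[["Function", "Message"]].to_string(index=False))
```

Inside a KNIME Python Script node, `errors` could be written to its own output table, or `has_errors` used to stop the workflow deliberately, for example with `if has_errors: raise RuntimeError("errors were logged")`.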
### `clear_folder(path: str, clear_files: bool = True) -> None`
Create a folder if it does not exist and, by default, delete everything inside it.
- **Parameters**:
  - `path` (str): Path to the folder.
  - `clear_files` (bool, optional): If `True`, delete all files and subdirectories in the folder. If `False`, only ensure the folder exists. Defaults to `True`.
- **Returns**: None
- **Raises**:
  - `ValueError`: If `path` is empty or not a directory.
  - `OSError`: If folder creation or clearing fails.
- **Example**:
```python
clear_folder("logs")
```
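To make the documented behavior concrete, here is a standard-library sketch of the same contract. `clear_folder_sketch` is a hypothetical name, not part of the package, and the package's actual implementation may differ (for example, in how it treats symbolic links):

```python
import shutil
import tempfile
from pathlib import Path


def clear_folder_sketch(path: str, clear_files: bool = True) -> None:
    """Hypothetical stand-in following the documented clear_folder contract."""
    if not path:
        raise ValueError("path must not be empty")
    folder = Path(path)
    if folder.exists() and not folder.is_dir():
        raise ValueError(f"{path!r} exists but is not a directory")
    # Create the folder and any missing parents; no-op if it already exists.
    folder.mkdir(parents=True, exist_ok=True)
    if not clear_files:
        return
    # Delete files and subdirectories, but keep the folder itself.
    for entry in list(folder.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp) / "logs"
    clear_folder_sketch(str(logs))  # creates logs/
    (logs / "old.log").write_text("stale")
    (logs / "nested").mkdir()
    clear_folder_sketch(str(logs), clear_files=False)  # contents untouched
    kept = sorted(p.name for p in logs.iterdir())
    clear_folder_sketch(str(logs))  # empties logs/
    remaining = list(logs.iterdir())
    a_file = Path(tmp) / "file.txt"
    a_file.write_text("not a folder")
    try:
        clear_folder_sketch(str(a_file))
        rejected = False
    except ValueError:
        rejected = True
```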
### `create_chrome_driver(download_dir: Optional[str] = None, headless: bool = True, logger: Optional[Logger] = None) -> WebDriver`
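Create a Selenium Chrome WebDriver configured for file downloads. Besides the parameters shown in the signature above, it accepts the additional options listed below; passing them as keyword arguments avoids depending on their order.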
- **Parameters**:
  - `download_dir` (str, optional): Directory for downloads. If `None`, defaults to `<cwd>/data/download`.
  - `headless` (bool, optional): Run in headless mode if `True`. Defaults to `True`.
  - `clear_download_dir` (bool, optional): Clear the download folder if `True`. Defaults to `True`.
  - `disable_web_security` (bool, optional): Disable web security if `True`, allowing access to sites served over plain `HTTP` or otherwise insecure connections. **Use with caution.** Defaults to `False`.
  - `domain_skip_security` (List[str], optional): Domains to treat as secure (bypassing insecure warnings). Defaults to `None`.
  - `enable_incognito` (bool, optional): Run in incognito mode if `True`. Defaults to `True`.
  - `logger` (Logger, optional): Logger instance for logging actions. Defaults to `None`.
- **Returns**: `WebDriver` - Configured Chrome WebDriver instance.
- **Raises**:
  - `ValueError`: If `download_dir` is not a directory.
  - `OSError`: If `download_dir` cannot be created.
  - `WebDriverException`: If driver initialization fails.
- **Example**:
```python
browser = create_chrome_driver(download_dir="downloads", logger=logger)
```
```python
browser = create_chrome_driver(download_dir="downloads", disable_web_security=True, domain_skip_security=["localhost.com"], logger=logger)
```
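The parameter list does not say how the download directory is applied. A common Selenium approach is to set Chrome preferences; the sketch below only builds such a preferences dictionary. The helper name `download_prefs_sketch` is hypothetical, and whether `create_chrome_driver` sets these exact keys is an assumption. The `plugins.always_open_pdf_externally` key is relevant to the examples above that open a PDF URL, since Chrome otherwise tends to display PDFs in its built-in viewer instead of saving them:

```python
import os
from typing import Any, Dict


def download_prefs_sketch(download_dir: str) -> Dict[str, Any]:
    """Hypothetical helper: Chrome preferences commonly used to route downloads."""
    return {
        # Save files here instead of the default Downloads folder.
        "download.default_directory": os.path.abspath(download_dir),
        # Save silently, without a "Save as" dialog.
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        # Download PDFs instead of opening them in the built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }


prefs = download_prefs_sketch("downloads")
print(prefs["download.default_directory"])
```

With Selenium, such a dictionary is passed by creating `options = webdriver.ChromeOptions()` and calling `options.add_experimental_option("prefs", prefs)` before starting the driver.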
### `enable_download_headless(browser: WebDriver, download_dir: str, logger: Optional[Logger] = None) -> None`
Configure a headless Chrome WebDriver to allow file downloads.
- **Parameters**:
  - `browser` (WebDriver): Selenium WebDriver instance.
  - `download_dir` (str): Directory for downloads.
  - `logger` (Logger, optional): Logger instance for logging. Defaults to `None`.
- **Returns**: None
- **Raises**:
  - `ValueError`: If `download_dir` is empty or not a directory.
  - `OSError`: If `download_dir` cannot be created.
  - `WebDriverException`: If the browser command fails.
- **Example**:
```python
enable_download_headless(browser, "downloads", logger)
```
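One widely used way to permit downloads in headless Chrome is the Chrome DevTools Protocol command `Page.setDownloadBehavior`. Whether `enable_download_headless` relies on it is an assumption. The hypothetical helper below only prepares that command's parameters, applying the same checks the function documents (non-empty path, create the folder if missing, reject a path that is not a directory):

```python
import os
import tempfile
from typing import Dict


def download_behavior_params_sketch(download_dir: str) -> Dict[str, str]:
    """Hypothetical helper: build Page.setDownloadBehavior parameters."""
    if not download_dir:
        raise ValueError("download_dir must not be empty")
    # Use an absolute path so the location does not depend on the working directory.
    path = os.path.abspath(download_dir)
    if os.path.exists(path) and not os.path.isdir(path):
        raise ValueError(f"{path!r} is not a directory")
    # Create the folder if needed; raises OSError if that fails.
    os.makedirs(path, exist_ok=True)
    return {"behavior": "allow", "downloadPath": path}


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    params = download_behavior_params_sketch(os.path.join(tmp, "downloads"))
    folder_created = os.path.isdir(params["downloadPath"])
```

On a Selenium Chrome driver, the parameters could then be sent with `browser.execute_cdp_cmd("Page.setDownloadBehavior", params)`.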
### `wait_download_file(folder_to_check: str, extension: str, folder_storage: str, max_waiting_download: int, replace_filename: Optional[str] = None, logger: Optional[Logger] = None) -> List[List[str]]`
Monitor a folder for files with a specified extension, move them to a storage folder, and log actions.
- **Parameters**:
  - `folder_to_check` (str): Folder to monitor for downloads.
  - `extension` (str): File extension to look for (e.g., `'pdf'` or `'.pdf'`).
  - `folder_storage` (str): Destination folder for moved files.
  - `max_waiting_download` (int): Maximum seconds to wait for downloads.
  - `replace_filename` (str, optional): New filename (without extension) for moved files. Defaults to `None`.
  - `logger` (Logger, optional): Logger instance for logging. Defaults to `None`.
- **Returns**: `List[List[str]]` - List of `[filename, filepath, extension]` for processed files.
- **Raises**:
  - `ValueError`: If folders or extension are invalid.
  - `OSError`: If folder creation or file operations fail.
  - `TimeoutError`: If no files are found within `max_waiting_download` seconds.
- **Example**:
```python
files = wait_download_file("downloads", "pdf", "storage", 300, "report", logger)
```
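The monitoring step can be pictured as a polling loop. The sketch below is a hypothetical stand-in (`wait_download_file_sketch`, with an extra `poll_interval` parameter and no logger), not the package's code. It assumes Chrome marks unfinished downloads with a `.crdownload` suffix and waits until none remain; it also assumes a numbering scheme when `replace_filename` is given and several files match, and a particular layout for each returned row:

```python
import os
import shutil
import tempfile
import time
from typing import List, Optional

# Assumption: Chrome names unfinished downloads "<name>.crdownload".
PARTIAL_SUFFIX = ".crdownload"


def wait_download_file_sketch(
    folder_to_check: str,
    extension: str,
    folder_storage: str,
    max_waiting_download: float,
    replace_filename: Optional[str] = None,
    poll_interval: float = 0.5,
) -> List[List[str]]:
    """Hypothetical stand-in: wait for finished downloads, then move them."""
    if not extension.strip("."):
        raise ValueError("extension must not be empty")
    # Accept both "pdf" and ".pdf".
    ext = "." + extension.lstrip(".").lower()
    os.makedirs(folder_storage, exist_ok=True)
    deadline = time.monotonic() + max_waiting_download

    while True:
        names = os.listdir(folder_to_check)
        in_progress = any(n.endswith(PARTIAL_SUFFIX) for n in names)
        matches = sorted(n for n in names if n.lower().endswith(ext))
        if matches and not in_progress:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {ext} file after {max_waiting_download}s")
        time.sleep(poll_interval)

    results = []
    for index, name in enumerate(matches):
        if replace_filename:
            # Assumption: number extra files so the renames never collide.
            stem = replace_filename if index == 0 else f"{replace_filename}_{index}"
            new_name = stem + ext
        else:
            new_name = name
        target = os.path.join(folder_storage, new_name)
        shutil.move(os.path.join(folder_to_check, name), target)
        # Assumed row layout: [filename, filepath, extension].
        results.append([new_name, target, ext])
    return results


# Demo: simulate one finished download, then a wait that times out.
with tempfile.TemporaryDirectory() as tmp:
    downloads = os.path.join(tmp, "download")
    storage = os.path.join(tmp, "storage")
    os.makedirs(downloads)
    with open(os.path.join(downloads, "sample.pdf"), "wb") as fh:
        fh.write(b"%PDF-1.4")
    moved = wait_download_file_sketch(downloads, "pdf", storage, 5, "report")
    moved_ok = all(os.path.exists(row[1]) for row in moved)
    try:
        wait_download_file_sketch(downloads, ".pdf", storage, 0.2, poll_interval=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True
```

Waiting for partial files to disappear avoids moving a file that is still being written; the numbered suffix is only one way to avoid overwriting when several files share the replacement name.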
## Testing
The package includes a test suite using pytest. To run tests:
```bash
pip install pytest
pytest tests/
```
The test suite covers:
- Logger functionality (`test_logger.py`)
- Folder clearing (`test_utils.py`)
- Browser automation (`test_browser_utils.py`)
- Package imports and version (`test_init.py`)
## Contributing
1. Fork the repository: `https://github.com/TuiTenTuan/sbfi-knime-utils`
2. Create a feature branch: `git checkout -b feature-name`
3. Commit changes: `git commit -m "Add feature"`
4. Push to the branch: `git push origin feature-name`
5. Open a pull request.
## License
MIT License. See [LICENSE](LICENSE) for details.
## Support
For issues or feature requests, open an issue on the [GitHub repository](https://github.com/TuiTenTuan/sbfi-knime-utils/issues).
Raw data
{
"_id": null,
"home_page": null,
"name": "sbfi-knime-utils",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "browser, folder, logging, selenium",
"author": null,
"author_email": "SBFI CoE Team <sbfap_autobot@suntory.com>",
"download_url": "https://files.pythonhosted.org/packages/26/15/85a687ce8927a9cd18248963472ff3320f881c5969b967f0672fef1e7ccd/sbfi_knime_utils-1.4.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A lightweight logging utility with folder clearing, browser download support, and pandas DataFrame output.",
"version": "1.4.2",
"project_urls": {
"Homepage": "https://github.com/TuiTenTuan/sbfi-knime-utils",
"Repository": "https://github.com/TuiTenTuan/sbfi-knime-utils"
},
"split_keywords": [
"browser",
" folder",
" logging",
" selenium"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c8deb3e5e2fd6059a9cd91225fa824cf5e850ec02d4a8a14a4459b5f322d6976",
"md5": "a08d71e3ac981cf8763b6b2153dfe3e9",
"sha256": "a40753f16ae56d9a57d6feda43652de356ca9a47f23eb327e0b31d2070f443fd"
},
"downloads": -1,
"filename": "sbfi_knime_utils-1.4.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a08d71e3ac981cf8763b6b2153dfe3e9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 10856,
"upload_time": "2025-09-02T10:24:45",
"upload_time_iso_8601": "2025-09-02T10:24:45.850978Z",
"url": "https://files.pythonhosted.org/packages/c8/de/b3e5e2fd6059a9cd91225fa824cf5e850ec02d4a8a14a4459b5f322d6976/sbfi_knime_utils-1.4.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "261585a687ce8927a9cd18248963472ff3320f881c5969b967f0672fef1e7ccd",
"md5": "5dcce7ce3d6ee82dea0010dce1b4b2af",
"sha256": "b70a1d4fb1bddf353d004901bf173662f5dbc5d8348b5a29a7854aee5f9f2085"
},
"downloads": -1,
"filename": "sbfi_knime_utils-1.4.2.tar.gz",
"has_sig": false,
"md5_digest": "5dcce7ce3d6ee82dea0010dce1b4b2af",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 11176,
"upload_time": "2025-09-02T10:24:47",
"upload_time_iso_8601": "2025-09-02T10:24:47.012751Z",
"url": "https://files.pythonhosted.org/packages/26/15/85a687ce8927a9cd18248963472ff3320f881c5969b967f0672fef1e7ccd/sbfi_knime_utils-1.4.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-02 10:24:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "TuiTenTuan",
"github_project": "sbfi-knime-utils",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "sbfi-knime-utils"
}