csv-db-loader

Name: csv-db-loader
Version: 1.0.0
Summary: Python package that enables customized loading of data from a CSV file into a MySQL database
Upload time: 2023-10-22 14:53:06
Author: Chavis
License: MIT
Keywords: python, csv, mysql, database, loader, db, load, csv_db_loader
# CSV MySQL Loader

**csv_db_loader** is a Python tool that allows you to easily load data from CSV files into a database with flexibility for custom data manipulation. You can transform and manipulate the data before storing it in the database, making it a versatile solution for data import projects.



## Getting Started
First, define your database credentials in a dictionary like this:
```python
DB_CREDENTIALS = {
    'DB_HOST': 'your_host',
    'DB_NAME': 'your_db_name',
    'DB_USERNAME': 'your_db_username',
    'DB_PASSWORD': 'your_db_password',
    'DB_PORT': 'your_db_port',
}
```
The dictionary keys must match the names shown in the example. Next, define the column indexes of the CSV file you want to import into the database. For example, suppose you have a CSV file with the following columns:
```bash
id, name, email, address, number_code
0, John, john@mai.com, 1234, 1234
```
If you want to import only the name, email, and number_code columns, define the indexes of those columns. In this case, the indexes of the name, email, and number_code columns are 1, 2, and 4 respectively, so you define them in a dictionary like this:

```python
COLUMN_INDEXES = {
    "name": 1,
    "email": 2,
    "number_code": 4,
}
```
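To see how this index mapping selects columns from a raw CSV row, here is a minimal standalone sketch using only the standard library. It mimics the selection conceptually; the library's own reader may differ in details such as whitespace handling:

```python
import csv
import io

COLUMN_INDEXES = {
    "name": 1,
    "email": 2,
    "number_code": 4,
}

# Inline sample matching the CSV shown above (normally you would open a file).
raw = "id, name, email, address, number_code\n0, John, john@mai.com, 1234, 1234\n"
reader = csv.reader(io.StringIO(raw))
next(reader)  # skip the header row
row = [cell.strip() for cell in next(reader)]

# Pick out only the columns named in COLUMN_INDEXES, by position.
extracted = {key: row[idx] for key, idx in COLUMN_INDEXES.items()}
print(extracted)  # {'name': 'John', 'email': 'john@mai.com', 'number_code': '1234'}
```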
Now you have to define the query/queries that will be executed to insert the data into the database. For example, if you want to insert the data into a table called "users", you have to define the query like this:
```python
QUERY1 = "INSERT INTO users (name, email, code) SELECT '{name}', '{email}', '{number_code}'"
```
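The `{name}`-style placeholders are filled in from the values your strategy returns. As an illustration, plain `str.format` matches this placeholder syntax (the library's internal substitution mechanism may differ; the sample values here are made up). Note that because values are interpolated into the SQL string rather than passed as bound parameters, CSV values containing quotes can break the query:

```python
QUERY1 = "INSERT INTO users (name, email, code) SELECT '{name}', '{email}', '{number_code}'"

# Substitute sample row values into the placeholders.
built = QUERY1.format(name="John", email="john@mai.com", number_code="1234")
print(built)
# INSERT INTO users (name, email, code) SELECT 'John', 'john@mai.com', '1234'
```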
Once you have defined all your queries, collect them into a list like this:
```python
QUERIES = [QUERY1, QUERY2, QUERY3]
```
Now you have to create your own Strategy class that will be used to manipulate the data before storing it into the database. Remember that the class must inherit from the StrategyExtractor class. For example:
```python
from db_loader.structs.struct_strategy import StrategyExtractor
class StrategyExtractorExample(StrategyExtractor):
    def extract_from_csv(self, indexes: dict, data: list, row: list) -> tuple:
        # Implement your own logic here
        name = self.extract_value(row=row, indexes=indexes, key='name')  # Use the extract_value method to extract the value from the row.
        email = self.extract_value(row=row, indexes=indexes, key='email')
        number_code = "PARTNER-N-"+ self.extract_value(row=row, indexes=indexes, key='number_code')
        return name, email, number_code
```
The extract_from_csv method is mandatory and must return a tuple with the values that will replace the placeholders in the query/queries. Use the extract_value class method to read values from the row, and keep the tuple in the same order as the COLUMN_INDEXES dictionary. In this case, the tuple must look like this:
```python
    (name, email, number_code)
```
There is also an optional is_valid method that can be used to validate a row before it is stored in the database. For example:
```python
def is_valid(self, indexes: dict, row: list[str]) -> bool:
    email = self.extract_value(row=row, indexes=indexes, key='email')
    my_company_domain = "@mycompany.com"
    if not email.endswith(my_company_domain):
        return False
    return True
```
If a row is not valid, it is skipped and not stored in the database.
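The filtering behavior can be sketched standalone like this, with extract_value replaced by a direct index lookup since the library class is not imported here (the sample rows and domain are made up):

```python
COLUMN_INDEXES = {"name": 1, "email": 2, "number_code": 4}

def is_valid(indexes: dict, row: list) -> bool:
    # Keep only rows whose email belongs to the company domain.
    email = row[indexes["email"]]
    return email.endswith("@mycompany.com")

rows = [
    ["0", "John", "john@mycompany.com", "1234", "1234"],
    ["1", "Jane", "jane@gmail.com", "5678", "5678"],
]

# Rows failing is_valid are skipped, mirroring the library's behavior.
kept = [row for row in rows if is_valid(COLUMN_INDEXES, row)]
print([row[1] for row in kept])  # ['John']
```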

Now you have to create an instance of the StrategyExtractorExample class and define the fields to be used from the COLUMN_INDEXES dictionary. For example:
```python
# Define the strategy to extract the data from the csv, must be a class that inherits from StrategyExtractor
strategy = StrategyExtractorExample()
# Define the structure of the class
strategy.define_fields(COLUMN_INDEXES)
```
Now that everything is in place, import and run the migrate function like this:
```python
from db_loader.migration import migrate
migrate(db_credentials=DB_CREDENTIALS, csv_path='your/path/to/your.csv', strategy=strategy, indexes=COLUMN_INDEXES,
        queries=QUERIES, starter_row=1)
```
The starter_row parameter is optional and can be used to skip the first n rows of the CSV file. For example, if you want to skip the first row of the CSV file, you have to set the starter_row parameter to 1.

## More

### Optional parameters for the migrate function
- **starter_row**: the row at which to start reading the CSV, default value = 0
- **delimiter**: the CSV delimiter, default value = ','
- **double_query**: enables chain queries, default value = False
- **logs**: prints error logs, default value = True
- **only_show_queries**: logs the built queries without executing them, default value = False
- **encoding**: the CSV encoding, default value = 'ISO-8859-1'

## Chain Query
Setting double_query=True enables chain queries.

The sole purpose of this is to use the result of the first query as a parameter of the second query. For example:
```python
QUERY_PRODUCT = "INSERT INTO product(code) " \
                "SELECT {code}"

QUERY_CLIENT_PURCHASES_MIDDLE_TABLE = "INSERT INTO client_purchases_middle_table(client_id, product_id) " \
                                      "SELECT " \
                                      "(SELECT id FROM client WHERE email = '{client_email}'), " \
                                      "LAST_INSERT_ID()"
QUERIES_TO_EXECUTE = [QUERY_PRODUCT, QUERY_CLIENT_PURCHASES_MIDDLE_TABLE]
```
In this case, the QUERY_CLIENT_PURCHASES_MIDDLE_TABLE query uses the result of the QUERY_PRODUCT query as a parameter. So, to use chain queries, set the double_query parameter to True.
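To make the chaining concrete, here is a sketch of how the two query strings are built for one row (sample code and email values are made up, and plain `str.format` stands in for the library's substitution). The link between the queries is MySQL's LAST_INSERT_ID(), which, on the same connection, returns the auto-increment id of the row the first query just inserted:

```python
QUERY_PRODUCT = "INSERT INTO product(code) SELECT {code}"
QUERY_CLIENT_PURCHASES_MIDDLE_TABLE = (
    "INSERT INTO client_purchases_middle_table(client_id, product_id) "
    "SELECT (SELECT id FROM client WHERE email = '{client_email}'), LAST_INSERT_ID()"
)

# Build both statements for one CSV row; the second relies on the first
# having just run on the same connection so LAST_INSERT_ID() resolves
# to the newly inserted product's id.
first = QUERY_PRODUCT.format(code=101)
second = QUERY_CLIENT_PURCHASES_MIDDLE_TABLE.format(client_email="john@mycompany.com")
print(first)   # INSERT INTO product(code) SELECT 101
print(second)
```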

## More Examples
You can find more examples in the implementation examples folder.

            
