# django-pipe2db

- Name: django-pipe2db
- Version: 1.0.3
- Home page: https://github.com/zwolf21/django-pipe2db
- Summary: A decorator that connects django model and data generator function
- Upload time: 2022-08-02 15:21:47
- Author: HS Moon
- Requires Python: >=3.8
- License: MIT
- Keywords: pipe2db, django-pipe2db, django orm, standalone django, standalone django orm


## Contents
- [django-pipe2db](#django-pipe2db)
  - [Contents](#contents)
  - [Concepts](#concepts)
  - [Features](#features)
  - [Install and Import](#install-and-import)
  - [Quick Start](#quick-start)
    - [1. Using django orm as standalone](#1-using-django-orm-as-standalone)
    - [2. Using with django project](#2-using-with-django-project)
  - [Usage](#usage)
    - [Argument of pipe decorator as context](#argument-of-pipe-decorator-as-context)
      - [model](#model)
      - [unique_key](#unique_key)
      - [method](#method)
      - [rename_fields](#rename_fields)
      - [exclude_fields](#exclude_fields)
      - [foreignkey_fields](#foreignkey_fields)
      - [manytomany_fields](#manytomany_fields)
      - [contentfile_fields](#contentfile_fields)



## Concepts
- A decorator written as a wrapper around the ORM methods of Django models
- It maps the relationship between models and data through a nested dictionary
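
To make the idea concrete, here is a minimal sketch of what "a decorator wrapping a data generator" can look like. It is not the library's implementation: `toy_pipe` and the in-memory `saved` list are hypothetical stand-ins, and a real pipe would call an ORM method where the sketch appends to a list.

```python
import functools


def toy_pipe(sink):
    """Hypothetical stand-in for a pipe-like decorator: sends each yielded row to `sink`."""
    def decorator(generator_func):
        @functools.wraps(generator_func)
        def wrapper(*args, **kwargs):
            rows = []
            for row in generator_func(*args, **kwargs):
                # a real implementation would call an ORM method here
                sink.append(dict(row))
                rows.append(row)
            return rows
        return wrapper
    return decorator


saved = []  # plays the role of the database table


@toy_pipe(saved)
def gen_authors():
    yield {'email': 'xman1@google.com', 'first_name': 'charse'}
    yield {'email': 'yman1@google.com', 'first_name': 'jin'}


gen_authors()
print(saved)
```

The generator function stays free of persistence code; everything about where the rows go lives in the decorator argument.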

---
## Features
- Bridges Python functions and Django models
- Creates and updates database records through models
- Automatically creates and modifies tables by wrapping Django's `manage.py` commands `makemigrations` and `migrate`
- Loads minimal Django settings so that the Django ORM can be used standalone, without a Django project
- Inserts related data through foreign key and many-to-many fields
- Inserts a content file object into an image field

---
## Install and Import

```bash
pip install django-pipe2db
```
```python
# crawler.py
from pipe2db import pipe
from pipe2db import setupdb
```
---
## Quick Start


### 1. Using django orm as standalone
- Create `models.py` in the directory that will be used as the Django app
- Example of a minimal project directory structure ([see the test directory](https://github.com/zwolf21/django-pipe2db/tree/master/test))
```bash
Project
│  __main__.py
│
└─bookstore
    │  insert.py
    │  
    └─db
          models.py
```

```python
# models.py
from django.db import models


class Author(models.Model):
    email = models.EmailField('Email', unique=True)
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    date_of_birth = models.DateField(null=True, blank=True)
    date_of_death = models.DateField('Died', null=True, blank=True)

    class Meta:
        db_table = 'author'
```
```python
# insert.py
from pipe2db import pipe, setupdb


setupdb() # find models automatically
# setupdb('bookstore.db') # or more explicitly

# The keys of the data match the field names of the model
author1 = {
    'email': 'xman1@google.com',
    'first_name': 'charse',
    'last_name': 'javie',
    'date_of_birth': '1975-07-25',
    'date_of_death': '1995-07-11'
}
author2 = {
    'email': 'yman1@google.com',
    'first_name': 'jin',
    'last_name': 'gray',
    'date_of_birth': '1925-07-25',
    'date_of_death': '1999-01-21'
}
# third row, matching row 3 of the result table in the next section
author3 = {
    'email': 'batman1@google.com',
    'first_name': 'wolverin',
    'last_name': 'jack',
    'date_of_birth': '1988-07-25',
    'date_of_death': None
}


@pipe({
    'model': 'db.Author',
    'unique_key': 'email', # unique value of the table, used like a pk
    # 'method': 'update' # if uncommented, works in update mode
})
def insert(*args, Author, **kwargs):
    # You can get the model class by declaring it as an argument of the generator function

    # from django.apps import apps # or via Django's get_model method
    # Author = apps.get_model('db.Author')

    queryset = Author.objects.all()

    yield from [author1, author2, author3]


if __name__ == '__main__':
    insert()  # call the decorated function, as in the shell example of the next section
```

- Run the example
```bash
python bookstore/insert.py
```


### 2. Using with django project
- Since `DJANGO_SETTINGS_MODULE` is already set, there is no need to call `setupdb`
- [django site example](https://github.com/zwolf21/django-pipe2db/tree/master/testsite/bookstore)

> Run it from the shell started by Django's `python manage.py shell` command
> ```bash
> python manage.py shell
> ```
>```python
>In [1]: from yourpackage.insert import insert
>In [2]: insert()
>```


|id|email|first_name|last_name|date_of_birth|date_of_death|
|--|--|--|--|--|--|
|1|xman1@google.com|charse|javie|1975-07-25|1995-07-11|
|2|yman1@google.com|jin|gray|1925-07-25|1999-01-21|
|3|batman1@google.com|wolverin|jack|1988-07-25|NULL|



--- 
## Usage

### Argument of pipe decorator as context
- A context is a dictionary that describes the relationship between the model and the data
- In the following examples, the elements that make up the context are explained step by step

#### model
- The Django model to pipe data into, written as a string literal
```python
# some_crawler.py
from pipe2db import pipe

@pipe({
    'model': 'db.Author'
    # 'model': 'yourapp.YourModel' on django project
})
def abc_crawler():
    ...
    yield row
```
> Assigning the context to a variable is a good way to increase reusability.
> When expressing nested relationships in relational data, contexts that are not assigned to variables may have to be written out repeatedly.
```python
# assign_to_variable_crawler.py
from pipe2db import pipe

# This seems to be the better way
context_author = {
    'model': 'db.Author'
}

@pipe(context_author)
def abcd_crawler(*args, **kwargs):
    ...  # build a row dict here
    yield row
```

- It is also possible to specify the model by importing it directly, but in standalone mode you must call `setupdb` before importing the model
  
```python
# does_not_look_good.py

from pipe2db import setupdb, pipe

setupdb()
from .db.models import Author

context_author = {'model': Author}

@pipe(context_author)
def abc():
    ...  # build a row dict here
    yield row
```

> Other ways to refer to the model class
> 1. Using Django's apps module
>   ```python
>   from django.apps import apps
>
>   Author = apps.get_model('db.Author')
>   ```
> 2. Declare the model name as an argument of the generator function
>   ```python   
>   # An example of controlling a generator based on data already in the database
>   @pipe(context_author)
>   def abc_crawler(rows, response, Author):
>       visited = set(Author.objects.values_list('email', flat=True))
>       for row in rows:
>           if row['email'] in visited:
>               break
>           yield row
>   ```
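
How a decorator can hand a model class to a function just because the function declares an argument with that name can be sketched with `inspect.signature`. This is only an illustration of the mechanism under that assumption; `inject_model` and `FakeAuthor` are hypothetical names and the library may do this differently.

```python
import functools
import inspect


class FakeAuthor:
    """Hypothetical placeholder for a Django model class."""


def inject_model(model_cls):
    """Pass `model_cls` as a keyword argument when the function asks for it by name."""
    def decorator(func):
        params = inspect.signature(func).parameters
        wants_model = model_cls.__name__ in params

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if wants_model:
                kwargs.setdefault(model_cls.__name__, model_cls)
            return list(func(*args, **kwargs))
        return wrapper
    return decorator


@inject_model(FakeAuthor)
def crawler(rows, FakeAuthor):
    for row in rows:
        yield {**row, 'model': FakeAuthor.__name__}


result = crawler([{'id': 1}])
print(result)
```

Passing the class by keyword works for both positional-or-keyword and keyword-only parameters, which covers the two function signatures used in this README.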

#### unique_key
- Key that identifies a record, much like a primary key
- If you do not specify it, duplicate records will be created
- Data can be identified by one key or by several keys, as with `unique_together`

```python
# models.py

# unique key model
class Author(models.Model):
    ...
    first_name = models.CharField(max_length=100, unique=True)
    ...
```

```python
# uniquify_by_one.py

context_author = {
    'model': 'db.Author',
    'unique_key': 'first_name'
}
```

> If uniqueness is not guaranteed with one key, add another
>```python
># models.py
>
># unique together model
>class Author(models.Model):
>    ...
>    first_name = models.CharField(max_length=100)
>    last_name = models.CharField(max_length=100)
>
>    class Meta:
>        unique_together = ['first_name', 'last_name']
>    ...
>```
>```python
>#unique_together.py
>
>context_author = {
>    'model': 'db.Author',
>    'unique_key': ['first_name', 'last_name']
>}
>```
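
A single key and a list of keys can be treated uniformly by normalizing both into a lookup dictionary. The helper below is a hypothetical sketch of that idea (`build_lookup` is not part of the library):

```python
def build_lookup(row, unique_key):
    """Return the subset of `row` that identifies a record."""
    keys = [unique_key] if isinstance(unique_key, str) else list(unique_key)
    return {key: row[key] for key in keys}


row = {'first_name': 'Hugh', 'last_name': 'jackman', 'email': 'batman1@google.com'}

single = build_lookup(row, 'email')
together = build_lookup(row, ['first_name', 'last_name'])
print(single)
print(together)
```

A dictionary of this shape is what Django lookups such as `filter(**lookup)` expect, so the same code path can serve both the single-key and the `unique_together` case.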


#### method
- Creates or updates records identified by the specified unique key
- The default is `create`
- In create mode, data is inserted based on the unique key
- Update mode wraps Django's `update_or_create` method: it creates records that do not exist yet and modifies those that do


```python
# incorrect create.py
from pipe2db import pipe

author_incorrect = {
    'email': 'batman1@google.com',
    'first_name': 'who', # incorrect
    'last_name': 'jackman',
    'date_of_birth': '1988-07-25', # incorrect
    'date_of_death': None
}

context = {
    'model': 'db.Author',
    'unique_key': 'email',
    # 'method': 'create' # no need to specify, create is the default
}

@pipe(context)
def gen_author(*args, **kwargs):
    yield author_incorrect
```
> result table
>
>|id|email|first_name|last_name|date_of_birth|date_of_death|
>|--|--|--|--|--|--|
>|3|batman1@google.com|who|jackman|1988-07-25|NULL|


```python
# correct as update.py
from pipe2db import pipe

author_corrected = {
    'email': 'batman1@google.com',
    'first_name': 'Hugh', # correct
    'last_name': 'jackman',
    'date_of_birth': '1968-10-12', # correct
    'date_of_death': None
}

context = {
    'model': 'db.Author',
    'unique_key': 'email',
    'method': 'update', # update the record with the corrected data
}

@pipe(context)
def gen_author(*args, **kwargs):
    yield author_corrected
```
> result table
>
>|id|email|first_name|last_name|date_of_birth|date_of_death|
>|--|--|--|--|--|--|
>|3|batman1@google.com|Hugh|jackman|1968-10-12|NULL|
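
The update behaviour can be pictured with a small in-memory imitation of the semantics of Django's `update_or_create`: look the record up by the unique key, then either modify it or create it. The function below operates on a plain list of dicts and is only a sketch, not Django or library code.

```python
def update_or_create(table, lookup, defaults):
    """Imitate the semantics of Django's update_or_create on a list of dicts."""
    for record in table:
        if all(record.get(key) == value for key, value in lookup.items()):
            record.update(defaults)  # existing record: modify in place
            return record, False
    record = {**lookup, **defaults}  # no match: create a new record
    table.append(record)
    return record, True


authors = []
_, created_first = update_or_create(
    authors,
    {'email': 'batman1@google.com'},
    {'first_name': 'who', 'date_of_birth': '1988-07-25'},
)
_, created_second = update_or_create(
    authors,
    {'email': 'batman1@google.com'},
    {'first_name': 'Hugh', 'date_of_birth': '1968-10-12'},
)
print(authors, created_first, created_second)
```

The second call finds the row by `email` and overwrites the incorrect values, so the table still holds a single record.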


#### rename_fields
- Dictionary that maps data keys to model field names (`key: field`)
- Used when the data key and the model field name are different

```python
# models.py
from django.db import models


class Author(models.Model):
    ...
    ...

class Book(models.Model):
    title = models.CharField(max_length=200) 
    isbn = models.CharField('ISBN', max_length=13, unique=True)

    class Meta:
        db_table = 'book'
```

```python
# book_crawler.py
from pipe2db import pipe

context = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'rename_fields': {
        'header' : 'title', 
        'book_id': 'isbn',
    }
}
# map header -> title, book_id -> isbn

@pipe(context)
def book_crawler(abc, defg, jkl=None):
    book_list = [
        {
            'header': 'oh happy day', # header to title
            'book_id': '1234640841',
        },
        {
            'header': 'oh happy day',
            'book_id': '9214644250',
        },
    ]
    yield from book_list
```
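
The effect of a `rename_fields` mapping on a single row can be sketched in plain Python. `rename_keys` is a hypothetical helper written for illustration, not a function of the library:

```python
def rename_keys(row, rename_fields):
    """Return a copy of `row` whose keys are renamed according to `rename_fields`."""
    return {rename_fields.get(key, key): value for key, value in row.items()}


renamed = rename_keys(
    {'header': 'oh happy day', 'book_id': '1234640841'},
    {'header': 'title', 'book_id': 'isbn'},
)
print(renamed)
```

Keys that do not appear in the mapping pass through unchanged.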

#### exclude_fields
- List of keys to exclude
- Used when the data has a key that is not among the field names of the model
- Filters out extra information in the data that the model cannot consume
  
```python
# bookcrawler.py
from pipe2db import pipe
...
...

context = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'rename_fields': {
        'header' : 'title', 
        'book_id': 'isbn',
    },
    'exclude_fields': ['status'] # exclude
}

@pipe(context)
def book_crawler(abc, defg, jkl=None):
    book_list = [
        {
            'header': 'oh happy day', # header to title
            'book_id': '1234640841',
            'status': 'on sales', # status is not needed in Book model
        },
        {
            'header': 'oh happy day',
            'book_id': '9214644250',
            'status': 'no stock',
        },
    ]
    yield from book_list

```
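
Excluding keys amounts to filtering the row dictionary before it reaches the model. A minimal sketch, with `drop_keys` as a hypothetical helper name:

```python
def drop_keys(row, exclude_fields):
    """Return a copy of `row` without the keys listed in `exclude_fields`."""
    excluded = set(exclude_fields)
    return {key: value for key, value in row.items() if key not in excluded}


cleaned = drop_keys(
    {'header': 'oh happy day', 'book_id': '1234640841', 'status': 'on sales'},
    ['status'],
)
print(cleaned)
```

Only exact key matches are removed, so a misspelled key in the data would survive the filter and still reach the model.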

--- 
Mapping of Relational Data

#### foreignkey_fields
- Creates records according to the foreign key relationship between tables
- The parent dict is nested recursively inside the child dict
- There are two ways to create related data

```python
# models.py
# two models related by a foreign key
from django.db import models


class Author(models.Model):
    email = models.EmailField('Email', unique=True)
    name = models.CharField(max_length=100)

    class Meta:
        db_table = 'author'


class Book(models.Model):
    author = models.ForeignKey('Author', on_delete=models.CASCADE, null=True) # fk
    isbn = models.CharField('ISBN', max_length=13, unique=True)
    title = models.CharField(max_length=200)

    class Meta:
        db_table = 'book'
```

```python
# some crawler.py
from pipe2db import pipe

# 1. Generate book data with the author nested inside

context_author = {
    'model': 'db.Author',
    'unique_key': 'email',
    'method': 'update'
}

context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'foreignkey_fields': {
        'author': context_author  # key is the fk field name on Book
    }
}

# author data is nested in book data
@pipe(context_book)
def parse_book():
    author1 = {
        'email': 'pbr112@naver.com',
        'name': 'hs moon',
    }
    book = {
        'author': author1,
        'title': 'django-pipe2db',
        'isbn': '291803928123'
    }
    yield book

```

```python
# some crawler.py 
from pipe2db import pipe

# 2. Generate data of author and book sequentially

@pipe(context_author)
def parse_author():
    author1 = {
        'email': 'pbr112@naver.com',
        'name': 'hs moon',
    }
    yield author1

# create author first
author1 = parse_author()

# then create the book and connect its fk relation to the author
@pipe(context_book)
def parse_book():
    book = {
        'author': author1['email'], # the author already exists, so passing only the email (the author's unique key) is enough
        # 'author': author1, # or pass the whole dict, same as above
        'title': 'django-pipe2db',
        'isbn': '291803928123'
    }
    yield book
```
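
The nested form works because the parent record has to exist before the child row can point at it, which suggests a depth-first walk over the data. The sketch below shows that walk with in-memory dictionaries instead of tables. `save_nested` is hypothetical, and it assumes the key in `foreignkey_fields` matches the data key (`author`).

```python
def save_nested(row, context, tables):
    """Save `row` after saving any nested parent rows; return its unique key value."""
    row = dict(row)
    for field, parent_context in context.get('foreignkey_fields', {}).items():
        parent = row.get(field)
        if isinstance(parent, dict):
            # save the parent first, then keep only a reference to it
            row[field] = save_nested(parent, parent_context, tables)
    key = row[context['unique_key']]
    tables.setdefault(context['model'], {})[key] = row
    return key


context_author = {'model': 'db.Author', 'unique_key': 'email'}
context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'foreignkey_fields': {'author': context_author},
}

tables = {}
save_nested(
    {
        'author': {'email': 'pbr112@naver.com', 'name': 'hs moon'},
        'title': 'django-pipe2db',
        'isbn': '291803928123',
    },
    context_book,
    tables,
)
print(tables)
```

When the value under `author` is already a plain key rather than a dict, the loop leaves it untouched, which mirrors the second, sequential way of creating the data.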

#### manytomany_fields
- Creates data for many-to-many relationships
- Nest the child m2m data as a list under the corresponding key of the parent data

```python
# models.py 
from django.db import models


class Book(models.Model):
    title = models.CharField(max_length=200)
    isbn = models.CharField('ISBN', max_length=13, unique=True)

    genre = models.ManyToManyField('db.Genre')

    class Meta:
        db_table = 'book'


class Genre(models.Model):
    name = models.CharField(max_length=200, unique=True)

    class Meta:
        db_table = 'genre'

```

```python
# m2m_generator.py
from pipe2db import pipe

context_genre = {
    'model': 'db.Genre',
    'unique_key': 'name'
}

context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'manytomany_fields': {
        'genre': context_genre
    }
}

@pipe(context_book)
def gen_book_with_genre():
    genre1 = {'name': 'action'}
    genre2 = {'name': 'fantasy'}

    book1 = {
        'title': 'oh happy day', 'isbn': '2828233644', 'genre': [genre2], # nest genres in a list
    }
    book2 = {
        'title': 'python', 'isbn': '9875230846', 'genre': [genre1, genre2],
    }
    book3 = {
        'title': 'java', 'isbn': '1234640841', # has no genre
    }
    yield from [book1, book2, book3]
```
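
In Django, many-to-many links can only be added once the parent object has been saved, so a row has to be split into its plain fields and its list-valued relations first. The helper below sketches that split; `split_m2m` is a hypothetical name and the library's internals may differ.

```python
def split_m2m(row, manytomany_fields):
    """Separate list-valued m2m entries from the plain fields of `row`."""
    plain = {key: value for key, value in row.items() if key not in manytomany_fields}
    related = {key: list(row.get(key, [])) for key in manytomany_fields}
    return plain, related


context_genre = {'model': 'db.Genre', 'unique_key': 'name'}
m2m = {'genre': context_genre}

plain, related = split_m2m(
    {'title': 'python', 'isbn': '9875230846', 'genre': [{'name': 'action'}, {'name': 'fantasy'}]},
    m2m,
)
no_genre_plain, no_genre_related = split_m2m({'title': 'java', 'isbn': '1234640841'}, m2m)
print(plain, related)
```

A book without a `genre` key simply yields an empty list of relations, as in `book3` above.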

- [See a more complicated example with nested contexts and data](https://github.com/zwolf21/django-pipe2db/blob/master/testsite/bookstore/scraper.py)

---

Creating records with content files

#### contentfile_fields
- Saves files through the `ContentFile` class from the `django.core.files.base` module
- `source_url_field` names the data key whose URL is used as metadata for determining the file name

```python
# models.py
from django.db import models

class BookImage(models.Model):
    img = models.ImageField()

    class Meta:
        db_table = 'bookimage'

```

```python
from pipe2db import pipe

@pipe({
    'model': 'db.BookImage',
    'contentfile_fields': {
        'img': {
            'source_url_field': 'src',
        }
    },
    'exclude_fields': ['src'] # when the model does not need the src data
})
def image_crawler(response):
    image_data = {
        'img': 'response_content',
        'src': response.url # needed for extracting the file name, as source_url_field
    }
    yield image_data
```
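
One plausible way to derive a file name from a source URL is to take the last segment of the URL path. The sketch below uses only the standard library; `filename_from_url` and the example URL are hypothetical, and the library may apply different rules.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the last path segment of `url`, ignoring query string and fragment."""
    path = unquote(urlparse(url).path)
    return posixpath.basename(path)


name = filename_from_url('https://example.com/covers/happy%20day.jpg?size=large')
print(name)
```

Parsing the URL first keeps query parameters such as `?size=large` out of the file name.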
            
