# django-pipe2db
## Contents
- [django-pipe2db](#django-pipe2db)
  - [Contents](#contents)
  - [Concepts](#concepts)
  - [Features](#features)
  - [Install and Import](#install-and-import)
  - [Quick Start](#quick-start)
    - [1. Using django orm as standalone](#1-using-django-orm-as-standalone)
    - [2. Using with django project](#2-using-with-django-project)
  - [Useage](#useage)
    - [Argument of pipe decorator as context](#argument-of-pipe-decorator-as-context)
      - [model](#model)
      - [unique_key](#unique_key)
      - [method](#method)
      - [rename_fields](#rename_fields)
      - [exclude_fields](#exclude_fields)
      - [foreignkey_fields](#foreignkey_fields)
      - [manytomany_fields](#manytomany_fields)
      - [contentfile_fields](#contentfile_fields)
## Concepts
- A decorator that wraps the ORM methods of Django models
- It maps the relationship between models and data through a nested dictionary
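
To make the idea concrete, here is a minimal sketch of a decorator that consumes a generator and stores every yielded row. It is not the pipe2db implementation: `pipe_sketch` and the in-memory `table` dict are hypothetical stand-ins for the decorator and a database table, and keeping only the first row per unique key is an assumption made for illustration.

```python
import functools


def pipe_sketch(context, store):
    """Hypothetical stand-in for a pipe-like decorator (not pipe2db itself)."""
    key = context['unique_key']

    def decorator(generator_func):
        @functools.wraps(generator_func)
        def wrapper(*args, **kwargs):
            saved = []
            for row in generator_func(*args, **kwargs):
                # Keep the first row seen for each unique key value
                store.setdefault(row[key], dict(row))
                saved.append(store[row[key]])
            return saved
        return wrapper
    return decorator


table = {}  # in-memory stand-in for a database table


@pipe_sketch({'model': 'db.Author', 'unique_key': 'email'}, table)
def authors():
    yield {'email': 'xman1@google.com', 'first_name': 'charse'}
    yield {'email': 'yman1@google.com', 'first_name': 'jin'}
    yield {'email': 'xman1@google.com', 'first_name': 'duplicate'}


rows = authors()  # calling the decorated function drains the generator
```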
---
## Features
- It bridges Python functions and Django models
- Creates and updates database records through models
- Automatically creates and modifies tables by wrapping Django's manage.py commands `makemigrations` and `migrate`
- Loads minimal Django settings so that the Django ORM can be used standalone, without a Django project
- Inserts data that follows foreign key and many-to-many relationships
- Inserts a content file object into an image field
---
## Install and Import
```bash
pip install django-pipe2db
```
```python
# crawler.py
from pipe2db import pipe
from pipe2db import setupdb
```
---
## Quick Start
### 1. Using django orm as standalone
- Create models.py in the directory that will be used as the Django app
- Example of a minimal project directory structure: [see](https://github.com/zwolf21/django-pipe2db/tree/master/test)
```bash
Project
│  __main__.py
│
└─bookstore
    │  insert.py
    │
    └─db
            models.py
```
```python
# models.py
from django.db import models


class Author(models.Model):
    email = models.EmailField('Email', unique=True)
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    date_of_birth = models.DateField(null=True, blank=True)
    date_of_death = models.DateField('Died', null=True, blank=True)

    class Meta:
        db_table = 'author'
```
```python
# insert.py
from pipe2db import pipe, setupdb

setupdb()  # find models automatically
# setupdb('bookstore.db')  # or more explicitly

# The keys of the data match the field names of the model
author1 = {
    'email': 'xman1@google.com',
    'first_name': 'charse',
    'last_name': 'javie',
    'date_of_birth': '1975-07-25',
    'date_of_death': '1995-07-11'
}
author2 = {
    'email': 'yman1@google.com',
    'first_name': 'jin',
    'last_name': 'gray',
    'date_of_birth': '1925-07-25',
    'date_of_death': '1999-01-21'
}
# third row, taken from the result table of this Quick Start
author3 = {
    'email': 'batman1@google.com',
    'first_name': 'wolverin',
    'last_name': 'jack',
    'date_of_birth': '1988-07-25',
    'date_of_death': None
}


@pipe({
    'model': 'db.Author',
    'unique_key': 'email',  # unique values of table as pk
    # 'method': 'update'  # if uncommented, works in update mode
})
def insert(*args, Author, **kwargs):
    # You can get the model class by naming it as an argument of the generator function

    # from django.apps import apps  # or via Django's get_model method
    # Author = apps.get_model('db.Author')

    queryset = Author.objects.all()

    yield from [author1, author2, author3]
```
- Run the example
```bash
python bookstore/insert.py
```
### 2. Using with django project
- Since DJANGO_SETTINGS_MODULE is already set, there is no need to call setupdb
- [django site example](https://github.com/zwolf21/django-pipe2db/tree/master/testsite/bookstore)
> Run it from the shell started by Django's `python manage.py shell` command
> ```bash
> python manage.py shell
> ```
>```python
>In [1]: from yourpackage.insert import insert
>In [2]: insert()
>```
|id|email|first_name|last_name|date_of_birth|date_of_death|
|--|--|--|--|--|--|
|1|xman1@google.com |charse|javie|1975-07-25|1995-07-11|
|2|yman1@google.com |jin|gray|1925-07-25|1999-01-21|
|3|batman1@google.com|wolverin|jack|1988-07-25|NULL|
---
## Useage
### Argument of pipe decorator as context
- A context is a dictionary that describes the relationship between the model and the data
- In the following examples, the elements that make up the context are explained step by step
#### model
- The Django model that receives the piped data, written as a string literal
```python
# some_crawler.py
from pipe2db import pipe

@pipe({
    'model': 'db.Author'
    # 'model': 'yourapp.YourModel' on django project
})
def abc_crawler():
    ...
    yield row  # row: a dict whose keys match the model's field names
```
> Assigning the context to a variable is a good way to increase reusability.
> When expressing nested relationships in relational data, not assigning contexts to variables can result in creating the same context repeatedly.
```python
# assign to variable crawler.py

# This seems to be the better way
context_author = {
    'model': 'db.Author'
}

@pipe(context_author)
def abcd_crawler(*args, **kwargs):
    yield ...
```
- It is also possible to specify the model by importing it directly, but in standalone mode you must call setupdb before importing the model
```python
# does not look good.py

from pipe2db import setupdb, pipe

setupdb()
from .db.models import Author

context_author = {'model': Author}

@pipe(context_author)
def abc():
    yield ...
```
> Other ways to refer to the model class
> 1. Using Django's apps module
> ```python
> from django.apps import apps
>
> Author = apps.get_model('db.Author')
> ```
> 2. Specify the model name as an argument to the generator function
> ```python
> # An example of controlling a generator based on data in a database
> @pipe(context_author)
> def abc_crawler(rows, response, Author):
>     visited = Author.objects.values_list('review_id', flat=True)
>     for row in rows:
>         if row['id'] in visited:
>             break
>         yield row
> ```
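One way a decorator could hand a model class to the generator by argument name is to inspect the function signature. The sketch below only illustrates that idea and does not claim to be how pipe2db does it: `inject_model` and `FakeAuthor` are hypothetical names, and `FakeAuthor` stands in for a real Django model.

```python
import functools
import inspect


class FakeAuthor:
    """Hypothetical stand-in for a Django model class."""
    visited_ids = {2}


def inject_model(model):
    """Pass `model` to the wrapped function if it declares a parameter with the model's name."""
    def decorator(func):
        wants_model = model.__name__ in inspect.signature(func).parameters

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if wants_model:
                kwargs.setdefault(model.__name__, model)
            return list(func(*args, **kwargs))
        return wrapper
    return decorator


@inject_model(FakeAuthor)
def crawl(rows, FakeAuthor):
    for row in rows:
        if row['id'] in FakeAuthor.visited_ids:
            break  # stop at the first row that was already stored
        yield row


fresh = crawl([{'id': 1}, {'id': 2}, {'id': 3}])
```

Here the caller passes only `rows`; the class arrives as a keyword argument because the parameter name matches the class name.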
#### unique_key
- The key that identifies a record, like a primary key
- If it is not specified, duplicate records will be created
- Data can be identified by one key, or by several keys as with unique_together
```python
# models.py

# unique key model
class Author(models.Model):
    ...
    first_name = models.CharField(max_length=100, unique=True)
    ...
```
```python
# uniquify_by_one.py

context_author = {
    'model': 'db.Author',
    'unique_key': 'first_name'
}
```
> If uniqueness is not guaranteed with one key, add another
>```python
># models.py
>
># unique together model
>class Author(models.Model):
>    ...
>    first_name = models.CharField(max_length=100)
>    last_name = models.CharField(max_length=100)
>
>    class Meta:
>        unique_together = ['first_name', 'last_name']
>    ...
>```
>```python
>#unique_together.py
>
>context_author = {
>    'model': 'db.Author',
>    'unique_key': ['first_name', 'last_name']
>}
>```
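A single key and a list of keys can be handled uniformly by splitting each row into lookup values and remaining values, which is the shape Django's `get_or_create(defaults=..., **lookup)` expects. The helper below is a hypothetical sketch; whether pipe2db splits rows this way internally is an assumption.

```python
def split_lookup(row, unique_key):
    """Split a row into (lookup, defaults) based on one key or a list of keys."""
    keys = [unique_key] if isinstance(unique_key, str) else list(unique_key)
    lookup = {key: row[key] for key in keys}
    defaults = {key: value for key, value in row.items() if key not in lookup}
    return lookup, defaults


row = {'first_name': 'hs', 'last_name': 'moon', 'email': 'pbr112@naver.com'}
lookup, defaults = split_lookup(row, ['first_name', 'last_name'])
single_lookup, _ = split_lookup(row, 'email')
```

With a real model, the result could then be used as `Author.objects.get_or_create(defaults=defaults, **lookup)`.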
#### method
- Creates or updates data identified by the specified unique key
- Defaults to create
- In create mode, data is inserted based on the unique key
- In update mode, which wraps Django's update_or_create method, records are created if they do not exist; otherwise the existing records are modified
```python
# incorrect create.py
from pipe2db import pipe

author_incorrect = {
    'email': 'batman1@google.com',
    'first_name': 'who',  # incorrect
    'last_name': 'jackman',
    'date_of_birth': '1988-07-25',  # incorrect
    'date_of_death': None
}

context = {
    'model': 'db.Author',
    'unique_key': 'email',
    # 'method': 'create'  # no need to specify for create
}

@pipe(context)
def gen_author(*args, **kwargs):
    yield author_incorrect
```
> result table
>
>|id|email|first_name|last_name|date_of_birth|date_of_death|
>|--|--|--|--|--|--|
>|3|batman1@google.com|who|jackman|1988-07-25|NULL|
```python
# correct as update.py
from pipe2db import pipe

author_corrected = {
    'email': 'batman1@google.com',
    'first_name': 'Hugh',  # correct
    'last_name': 'jackman',
    'date_of_birth': '1968-10-12',  # correct
    'date_of_death': None
}

context = {
    'model': 'db.Author',
    'unique_key': 'email',
    'method': 'update',  # update the record with the corrected data
}

@pipe(context)
def gen_author(*args, **kwargs):
    yield author_corrected
```
> result table
>
>|id|email|first_name|last_name|date_of_birth|date_of_death|
>|--|--|--|--|--|--|
>|3|batman1@google.com|Hugh|jackman|1968-10-12|NULL|
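
The difference between the two modes can be sketched with a plain dict standing in for the table. This is an illustration only: `save` is a hypothetical helper, and the assumption that create mode leaves an existing record untouched (as Django's `get_or_create` does) is not confirmed by this README. Update mode follows the `update_or_create` behavior described above.

```python
def save(table, row, unique_key, method='create'):
    """Store `row` in `table` (a dict keyed by the unique key) and report what happened."""
    key = row[unique_key]
    if key not in table:
        table[key] = dict(row)
        return 'created'
    if method == 'update':
        table[key].update(row)  # overwrite fields with the new values
        return 'updated'
    return 'skipped'  # assumption: create mode does not touch existing records


table = {}
first = save(table, {'email': 'batman1@google.com', 'first_name': 'who'}, 'email')
second = save(table, {'email': 'batman1@google.com', 'first_name': 'Hugh'}, 'email')
third = save(table, {'email': 'batman1@google.com', 'first_name': 'Hugh'}, 'email',
             method='update')
```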
#### rename_fields
- A dictionary that maps data keys to model field names (key: field)
- Used when a data key and the model field name are different
```python
# models.py
from django.db import models


class Author(models.Model):
    ...
    ...

class Book(models.Model):
    title = models.CharField(max_length=200)
    isbn = models.CharField('ISBN', max_length=13, unique=True)

    class Meta:
        db_table = 'book'
```
```python
# book_crawler.py
from pipe2db import pipe

context = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'rename_fields': {
        'header': 'title',
        'book_id': 'isbn',
    }
}
# map header -> title, book_id -> isbn

@pipe(context)
def book_crawler(abc, defg, jkl=None):
    book_list = [
        {
            'header': 'oh happy day',  # header to title
            'book_id': '1234640841',
        },
        {
            'header': 'oh happy day',
            'book_id': '9214644250',
        },
    ]
    yield from book_list
```
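The effect of such a mapping on a single row can be shown with a dict comprehension. `rename_keys` is a hypothetical helper written for this illustration, not a pipe2db function.

```python
def rename_keys(row, rename_fields):
    """Return a copy of `row` with keys renamed according to `rename_fields`."""
    return {rename_fields.get(key, key): value for key, value in row.items()}


renamed = rename_keys(
    {'header': 'oh happy day', 'book_id': '1234640841'},
    {'header': 'title', 'book_id': 'isbn'},
)
```

Keys that do not appear in the mapping pass through unchanged.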
#### exclude_fields
- A list of keys to exclude
- Used when the data has a key that is not among the model's field names
- Filters out extra information that the model cannot consume
```python
# bookcrawler.py
from pipe2db import pipe
...
...

context = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'rename_fields': {
        'header': 'title',
        'book_id': 'isbn',
    },
    'exclude_fields': ['status']  # exclude
}

@pipe(context)
def book_crawler(abc, defg, jkl=None):
    book_list = [
        {
            'header': 'oh happy day',  # header to title
            'book_id': '1234640841',
            'status': 'on sales',  # status is not needed in Book model
        },
        {
            'header': 'oh happy day',
            'book_id': '9214644250',
            'status': 'no stock',
        },
    ]
    yield from book_list
```
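Excluding keys amounts to filtering the row before it reaches the model. The sketch below uses a hypothetical `drop_keys` helper to show the effect on one row; it is not part of pipe2db.

```python
def drop_keys(row, exclude_fields):
    """Return a copy of `row` without the keys listed in `exclude_fields`."""
    excluded = set(exclude_fields)
    return {key: value for key, value in row.items() if key not in excluded}


cleaned = drop_keys(
    {'header': 'oh happy day', 'book_id': '1234640841', 'status': 'on sales'},
    ['status'],
)
```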
---
Mapping of Relational Data
#### foreignkey_fields
- Creates records according to the foreign key relationships between tables
- Recursively nests the parent dict inside the child dict
- There are two ways to create related data
```python
# models.py
# two models related by a foreign key
from django.db import models


class Author(models.Model):
    email = models.EmailField('Email', unique=True)
    name = models.CharField(max_length=100)

    class Meta:
        db_table = 'author'


class Book(models.Model):
    author = models.ForeignKey('Author', on_delete=models.CASCADE, null=True)  # fk
    isbn = models.CharField('ISBN', max_length=13, unique=True)
    title = models.CharField(max_length=200)

    class Meta:
        db_table = 'book'
```
```python
# some crawler.py
from pipe2db import pipe

# 1. Generate book data with the author nested inside

context_author = {
    'model': 'db.Author',
    'unique_key': 'email',
    'method': 'update'
}

context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'foreignkey_fields': {
        'author': context_author  # keyed by the foreign key field of Book
    }
}

# author data is nested in book data
@pipe(context_book)
def parse_book():
    author1 = {
        'email': 'pbr112@naver.com',
        'name': 'hs moon',
    }
    book = {
        'author': author1,
        'title': 'django-pipe2db',
        'isbn': '291803928123'
    }
    yield book
```
```python
# some crawler.py
from pipe2db import pipe

# 2. Generate author data and book data sequentially

@pipe(context_author)
def parse_author():
    author1 = {
        'email': 'pbr112@naver.com',
        'name': 'hs moon',
    }
    yield author1

# create the author first
author1 = parse_author()

# then create the book and connect its fk relation to the author
@pipe(context_book)
def parse_book():
    book = {
        # Since the author has already been created, it is possible to pass only the email as the author's key
        'author': author1['email'],
        # 'author': author1,  # or same as above
        'title': 'django-pipe2db',
        'isbn': '291803928123'
    }
    yield book
```
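The recursive handling of nested contexts can be sketched without a database: save the nested parent first, then store only its unique key on the child. Everything here is an assumption for illustration. `save_nested` and the `tables` dict are hypothetical, and the sketch keys `foreignkey_fields` by the data key that holds the nested dict.

```python
def save_nested(tables, context, row):
    """Save `row` into in-memory `tables`, resolving nested foreign key dicts first."""
    row = dict(row)
    for field, parent_context in context.get('foreignkey_fields', {}).items():
        value = row.get(field)
        if isinstance(value, dict):
            # Save the nested parent record, then keep only its unique key
            parent = save_nested(tables, parent_context, value)
            row[field] = parent[parent_context['unique_key']]
    table = tables.setdefault(context['model'], {})
    return table.setdefault(row[context['unique_key']], row)


context_author = {'model': 'db.Author', 'unique_key': 'email'}
context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'foreignkey_fields': {'author': context_author},
}

tables = {}
book = save_nested(tables, context_book, {
    'author': {'email': 'pbr112@naver.com', 'name': 'hs moon'},
    'title': 'django-pipe2db',
    'isbn': '291803928123',
})
```

Passing `'author': 'pbr112@naver.com'` instead of the nested dict skips the recursion, which mirrors the sequential approach.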
#### manytomany_fields
- Creates data for many-to-many relationships
- Nest the child m2m records as a list under the parent data key
```python
# models.py
from django.db import models


class Book(models.Model):
    title = models.CharField(max_length=200)
    isbn = models.CharField('ISBN', max_length=13, unique=True)

    genre = models.ManyToManyField('db.Genre')

    class Meta:
        db_table = 'book'


class Genre(models.Model):
    name = models.CharField(max_length=200, unique=True)

    class Meta:
        db_table = 'genre'
```
```python
# m2m_generator.py
from pipe2db import pipe

context_genre = {
    'model': 'db.Genre',
    'unique_key': 'name'
}

context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'manytomany_fields': {
        'genre': context_genre
    }
}

@pipe(context_book)
def gen_book_with_genre():
    genre1 = {'name': 'action'}
    genre2 = {'name': 'fantasy'}

    book1 = {
        'title': 'oh happy day', 'isbn': '2828233644', 'genre': [genre2],  # nest genres in a list
    }
    book2 = {
        'title': 'python', 'isbn': '9875230846', 'genre': [genre1, genre2],
    }
    book3 = {
        'title': 'java', 'isbn': '1234640841',  # has no genre
    }
    yield from [book1, book2, book3]
```
- [See a more complicated example with nested contexts and data](https://github.com/zwolf21/django-pipe2db/blob/master/testsite/bookstore/scraper.py)
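
As a rough model of what happens to a nested list, the sketch below stores each child once and records one link per parent/child pair. `save_with_m2m`, `tables`, and `links` are hypothetical in-memory stand-ins, not pipe2db internals.

```python
def save_with_m2m(tables, links, context, row):
    """Save `row`, its nested m2m children, and the parent/child links."""
    row = dict(row)
    parent_key = row[context['unique_key']]
    for field, child_context in context.get('manytomany_fields', {}).items():
        children = row.pop(field, [])  # a missing key means no related records
        child_table = tables.setdefault(child_context['model'], {})
        for child in children:
            child_key = child[child_context['unique_key']]
            child_table.setdefault(child_key, dict(child))  # store each child once
            links.add((parent_key, field, child_key))
    return tables.setdefault(context['model'], {}).setdefault(parent_key, row)


context_genre = {'model': 'db.Genre', 'unique_key': 'name'}
context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'manytomany_fields': {'genre': context_genre},
}

tables, links = {}, set()
for book in [
    {'title': 'oh happy day', 'isbn': '2828233644', 'genre': [{'name': 'fantasy'}]},
    {'title': 'python', 'isbn': '9875230846',
     'genre': [{'name': 'action'}, {'name': 'fantasy'}]},
    {'title': 'java', 'isbn': '1234640841'},
]:
    save_with_m2m(tables, links, context_book, book)
```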
---
Create records with content files
#### contentfile_fields
- Saves a file via the ContentFile class from the django.core.files module
- source_url_field names the data key whose value (the source URL) is used as metadata to determine the file name
```python
# models.py
from django.db import models

class BookImage(models.Model):
    img = models.ImageField()

    class Meta:
        db_table = 'bookimage'
```
```python
from pipe2db import pipe

@pipe({
    'model': 'db.BookImage',
    'contentfile_fields': {
        'img': {
            'source_url_field': 'src',
        }
    },
    'exclude_fields': ['src']  # when the model does not need the src data
})
def image_crawler(response):
    image_data = {
        'img': 'response_content',  # placeholder for the downloaded file content
        'src': response.url  # needed for extracting the file name via source_url_field
    }
    yield image_data
```
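A file name can be derived from a source URL with the standard library alone. The helper below is a hypothetical sketch of that step; how pipe2db actually extracts the name is not shown in this README. The resulting name is the kind of value Django's `ContentFile(content, name=...)` accepts.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default='file'):
    """Return the last path segment of `url`, ignoring query string and fragment."""
    path = urlparse(url).path
    name = posixpath.basename(unquote(path))
    return name or default  # fall back when the URL has no file segment


name = filename_from_url('https://example.com/covers/django-pipe2db.png?size=large')
fallback = filename_from_url('https://example.com/')
```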
Raw data
{
"_id": null,
"home_page": "https://github.com/zwolf21/django-pipe2db",
"name": "django-pipe2db",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "pipe2db,django-pipe2db,django orm,standalone django,standalone django orm",
"author": "HS Moon",
"author_email": "pbr112@naver.com",
"download_url": "https://files.pythonhosted.org/packages/79/9e/c62cb230ff160a29063830171c910c0c013a9f737a8ac1371cf71a31b2e3/django-pipe2db-1.0.3.tar.gz",
"platform": null,
"description": "# django-pipe2db\n\n\n## Contents\n- [django-pipe2db](#django-pipe2db)\n - [Contents](#contents)\n - [Concepts](#concepts)\n - [Features](#features)\n - [Install and Import](#install-and-import)\n - [Quick Start](#quick-start)\n - [1. Using django orm as standalone](#1-using-django-orm-as-standalone)\n - [2. Using with django project](#2-using-with-django-project)\n - [Useage](#useage)\n - [Argument of pipe decorator as context](#argument-of-pipe-decorator-as-context)\n - [model](#model)\n - [unique_key](#unique_key)\n - [method](#method)\n - [rename_fields](#rename_fields)\n - [exclude_fields](#exclude_fields)\n - [foreignkey_fields](#foreignkey_fields)\n - [manytomany_fields](#manytomany_fields)\n - [- See complicate context and data nested level example](#--see-complicate-context-and-data-nested-level-example)\n - [contentfile_fields](#contentfile_fields)\n\n\n\n## Concepts\n- A decorator that written by wrapping orm method of django models\n- It maps the relationship between the models and data via nested dictionary\n\n---\n## Features\n- It bridges Python functions and django models\n- Create and update data to database via models\n- Automatically create and modify tables by wrapping manage.py commands from django as makemigrations and migrate\n- Load minimum django settings for can use django orm as standalone that without using the django project\n- Insertion of data with the same relationship as foreignkey and manytomany fields\n- Inserting a content file object as an image field\n\n---\n## Install and Import\n\n```bash\npip install django-pipe2db\n```\n```python\n# crawler.py\nfrom pipe2db import pipe\nfrom pipe2db import setupdb\n```\n---\n## Quick Start\n\n\n### 1. Using django orm as standalone\n- Create models.py in the directory that will be used as the Django app\n- example for minimum project directory structure. 
[see](https://github.com/zwolf21/django-pipe2db/tree/master/test)\n```bash\nProject\n\u2502 __main__.py\n\u2502\n\u2514\u2500bookstore\n \u2502 insert.py\n \u2502 \n \u2514\u2500db\n models.py\n```\n\n```python\n# models.py\nfrom django.db import models\n\n\nclass Author(models.Model):\n email = models.EmailField('Email', unique=True)\n first_name = models.CharField(max_length=100)\n last_name = models.CharField(max_length=100)\n date_of_birth = models.DateField(null=True, blank=True)\n date_of_death = models.DateField('Died', null=True, blank=True)\n\n class Meta:\n db_table = 'author'\n```\n```python\n# insert.py\nfrom pipe2db import pipe, setupdb\n\n\nsetupdb() # find models automatically\n# setupdb('bookstore.db') # or more explicitly \n\n# The key of the data and the field names of the model are matched\nauthor1 = {\n 'email': 'xman1@google.com',\n 'first_name': 'charse',\n 'last_name': 'javie',\n 'date_of_birth': '1975-07-25',\n 'date_of_death': '1995-07-11'\n}\nauthor2 = {\n 'email': 'yman1@google.com',\n 'first_name': 'jin',\n 'last_name': 'gray',\n 'date_of_birth': '1925-07-25',\n 'date_of_death': '1999-01-21'\n}\n\n\n@pipe({\n 'model': 'db.Author', \n 'unique_key': 'email', # unique values of table as pk\n # 'method': 'update' # If uncomment, works in update mode\n})\ndef insert(*args, Author, **kwargs):\n # You Can get model class via argumenting at generator function\n\n # from django.apps import apps # or via get_model method of django\n # Author = apps.get_model('db.Author') \n\n queryset = Author.objects.all()\n\n yield from [author1, author2, author3]\n\n```\n\n- run examples\n```bash\npython bookstore/insert.py\n```\n\n\n### 2. 
Using with django project\n- Since DJANGO_SETTINGS_MODULE is already setted, it's not need to call setupdb\n- [django site example](https://github.com/zwolf21/django-pipe2db/tree/master/testsite/bookstore)\n\n> run via shell which excuted by 'python manage.py shell' command of django manage\n> ```bash\n> python manage.py shell\n> ```\n>```python\n>In [1]: from yourpackage.insert import insert\n>In [2]: insert()\n>```\n\n\n|id|email|first_name|last_name|date_of_birth|date_of_death|\n|--|--|--|--|--|--|\n|1|xman1@google.com\t|charse|javie|1975-07-25|1995-07-11|\n|2|yman1@google.com\t|jin|gray|1925-07-25|1999-01-21|\n|3|batman1@google.com|wolverin|jack|1988-07-25|NULL|\n\n\n\n--- \n## Useage\n\n### Argument of pipe decorator as context\n- A context is a dictionary that describes the relationship between the model and the data\n- In the following examples, the elements that make up the context are explained step by step\n\n#### model\n- django model to pipe data written as string literals\n```python\n# some_crawler.py\nfrom pip2db import pipe\n\n@pipe({\n 'model': 'db.Author'\n # 'model': 'yourapp.YourModel' on django project\n})\ndef abc_crawler():\n ...\n yield row\n```\n> It is also a good way to assign and use a variable to increase reusability\n> When expressing nested relationships in relational data, not assigning them as variables can result in repeatedly creating the same context.\n```python\n# assign to variable crawler.py\n\n# It seems to better way\ncontext_author = {\n 'model': 'db.Author'\n}\n\n@pipe(context_author)\ndef abcd_crawler(*args, **kwargs):\n yield ..\n```\n\n- It is also possible to specify the model by directly importing it, but in the case of standalone, you must declare setupdb before importing the model\n \n```python\n# dose not look good.py\n\nfrom pipe2db import setupdb, pipe\n\nsetupdb()\nfrom .db.models import Author\n\ncontext_author = {'model': Author}\n\n@pipe(context_author)\ndef abc():\n yield ..\n```\n\n> Another way to refer to 
the model class\n> 1. Using Django's apps module\n> ```python\n> from django.apps import apps\n>\n> Author = apps.get_model('db.Author')\n> ```\n> 2. Specify the model name as an argument to the generator function\n> ```python \n> # An example of controlling a generator based on data in a database\n> @pipe(context_author)\n> def abc_crawler(rows, response, Author):\n> visited = Author.objects.values_list('review_id', flat=True)\n> for row in rows:\n> if row['id'] in visited:\n> break\n> yield row\n> ```\n\n#### unique_key\n- key to identify data like as primary key\n- If you don't specify it, creating data will be duplicated\n- To identify data with one or several keys as unique_together\n\n```python\n# models.py\n\n# unique key model\nclass Author(models.Model):\n ...\n first_name = models.CharField(max_length=100, unique=True)\n ...\n```\n\n```python\n# uniqufy_by_one.py\n\ncontext_author = {\n 'model': 'db.Author',\n 'unique_key': 'first_name'\n}\n```\n\n> If uniqueness is not guaranteed with one key, add another\n>```python\n># models.py\n>\n># unique together model\n>class Author(models.Model):\n> ...\n> first_name = models.CharField(max_length=100)\n> last_name = models.CharField(max_length=100)\n>\n> class Meta:\n> unique_together = ['first_name', 'last_name']\n> ...\n>```\n>```python\n>#unique_together.py\n>\n>context_author = {\n> 'model': 'db.Author',\n> 'unique_key': ['first_name', 'last_name']\n>}\n>```\n\n\n#### method\n- Creates or updates data with a unique key specified\n- Defaults is create\n- In create mode, data is inserted based on unique.\n- In update mode as wrapper update_or_create of django method, creates records if they don't exist, otherwise modifies existing records\n\n\n```python\n# incorrect create.py\nfrom pipe2db import pipe\n\nauthor_incorrect = {\n 'email': 'batman1@google.com',\n 'first_name': 'who', # incorrect\n 'last_name': 'jackman',\n 'date_of_birth': '1988-07-25', # incorrect\n 'date_of_death': None\n}\n\ncontext = {\n 
'model': 'db.Author',\n 'unique_key': 'email',\n # 'method': 'create' no need to specify if create\n}\n\n@pipe(context)\ndef gen_author(...):\n yield author_incorrect\n```\n> result table\n>\n>|id|email|first_name|last_name|date_of_birth|date_of_death|\n>|--|--|--|--|--|--|\n>|3|batman1@google.com|who|jackman|1988-07-25|NULL|\n\n\n```python\n# correct as update.py\nfrom pipe2db import pipe\n\nauthor_corrected = {\n 'email': 'batman1@google.com',\n 'first_name': 'Hugh', # correct\n 'last_name': 'jackman',\n 'date_of_birth': '1968-10-12', # correct\n 'date_of_death': None\n}\n\ncontext = {\n 'model': 'db.Author',\n 'unique_key': 'email',\n 'method': 'update', # for update record by corrected data\n}\n\n@pipe(context)\ndef gen_author(...):\n yield author_corrected\n```\n> result table\n>\n>|id|email|first_name|last_name|date_of_birth|date_of_death|\n>|--|--|--|--|--|--|\n>|3|batman1@google.com|Hugh|jackman|1968-10-12|NULL|\n\n\n#### rename_fields\n- Dictionary of between data and model as key:field mapping\n- Used when the data key and the model field name are different\n\n```python\n# models.py\nfrom django.db import models\n\n\nclass Author(models.Models):\n ...\n ...\n\nclass Book(models.Model):\n title = models.CharField(max_length=200) \n isbn = models.CharField('ISBN', max_length=13, unique=True)\n\n class Meta:\n db_table = 'book'\n```\n\n```python\n# book_crawler.py\n\ncontext = {\n 'model': 'db.Book',\n 'unique_key': 'isbn',\n 'rename_fields': {\n 'header' : 'title', \n 'book_id': 'isbn',\n }\n}\n# map header -> title, book_id -> isbn\n\n@pipe(context)\ndef book_crawler(abc, defg, jkl=None):\n book_list = [\n {\n 'header': 'oh happy day', # header to title\n 'book_id': '1234640841',\n },\n {\n 'header': 'oh happy day',\n 'book_id': '9214644250',\n },\n ]\n yield from book_list\n```\n\n#### exclude_fields\n- List of keys to excluds\n- Used when the data has a key that is not in the field names in the model\n- Filter too much information from data that model 
cannot consume\n \n```python\n# bookcrawler.py\nfrom pipe2db import pipe\n...\n...\n\ncontext = {\n 'model': 'db.Book',\n 'unique_key': 'isbn',\n 'rename_fields': {\n 'header' : 'title', \n 'book_id': 'isbn',\n },\n 'exclude_fields': ['status'] # exclude\n}\n\n@pipe(context)\ndef book_crawler(abc, defg, jkl=None):\n book_list = [\n {\n 'header': 'oh happy day', # header to title\n 'book_id': '1234640841',\n 'status': 'on sales', # status is not needed in Book model\n },\n {\n 'header': 'oh happy day',\n 'book_id': '9214644250',\n 'sstatus': 'no stock',\n },\n ]\n yield from book_list\n\n```\n\n--- \nMapping of Relative Data\n\n#### foreignkey_fields\n- Creat records by generation according to the foreign key relationship between tables\n- Recursively nest parent dict to children dict\n- There are two way of create relationship data\n\n```python\n# models.py\n# two models of related with foreign key\nfrom django.db import models\n\n\nclass Author(models.Model):\n email = models.EmailField('Email', unique=True)\n name = models.CharField(max_length=100)\n\n class Meta:\n db_table = 'author'\n\n\nclass Book(models.Model):\n author = models.ForeignKey('Author', on_delete=models.CASCADE, null=True) # fk\n isbn = models.CharField('ISBN', max_length=13, unique=True)\n title = models.CharField(max_length=200)\n\n class Meta:\n db_table = 'book'\n```\n\n```python\n# some crawler.py\nfrom pipe2db import pipe\n\n# 1. 
Generate data of book author nested\n\ncontext_author = {\n 'model': 'db.Author',\n 'unique_key': 'email',\n 'method': 'update'\n}\n\ncontext_book = {\n 'model': 'db.Book',\n 'unique_key': 'isbn',\n 'foreignkey_fields': {\n 'book': context_author\n }\n}\n\n# author data is nested in book data\n@pipe(context_book)\ndef parse_book():\n author1 = {\n 'email': 'pbr112@naver.com',\n 'name': 'hs moon',\n }\n book = {\n 'author': author1,\n 'title': 'django-pipe2db',\n 'isbn': '291803928123'\n }\n yield book\n\n```\n\n```python\n# some crawler.py \nfrom pipe2db import pipe\n\n# 2. Generate data of author and book sequentially\n\n@pipe(context_author)\ndef parse_author():\n author1 = {\n 'email': 'pbr112@naver.com',\n 'name': 'hs moon',\n }\n yield author1\n\n# create author first\nauthor1 = parse_author()\n\n# create book after and connect fk relation to author\n@pipe(context_book)\ndef parse_book():\n book = {\n 'author': author1['email'], # Since the author has already been created, it possible to pass email as pk of author only\n # 'author': author1, # or same as above\n 'title': 'django-pipe2db',\n 'isbn': '291803928123'\n }\n yield book\n```\n\n#### manytomany_fields\n- Create data for manytomany relationships\n- Generate data with nesting the children m2m data in the parent data key in the form of a list\n\n```python\n# models.py \nfrom django.db import models\n\n\nclass Book(models.Model):\n title = models.CharField(max_length=200)\n isbn = models.CharField('ISBN', max_length=13, unique=True)\n\n genre = models.ManyToManyField('db.Genre')\n\n class Meta:\n db_table = 'book'\n\n\nclass Genre(models.Model):\n name = models.CharField(max_length=200, unique=True)\n\n class Meta:\n db_table = 'genre'\n\n```\n\n```python\n# m2m_generator.py\nfrom pipe2db import pipe\n\ncontext_genre = {\n 'model': 'db.Genre',\n 'unique_key': 'name'\n}\n\ncontext_book = {\n 'model': 'db.Book',\n 'unique_key': 'isbn',\n 'manytomany_fields': {\n 'genre': context_genre\n 
}\n}\n\n@pipe(context_book)\ndef gen_book_with_genre():\n genre1 = {'name': 'action'}\n genre2 = {'name': 'fantasy'}\n\n book1 = {\n 'title': 'oh happy day', 'isbn': '2828233644', 'genre': [genre2], # nest genres to list\n }\n book2 = {\n 'title': 'python', 'isbn': '9875230846', 'genre': [genre1, genre2],\n }\n book3 = {\n 'title': 'java', 'isbn': '1234640841', # has no genre\n }\n yield from [book1, book2, book3]\n```\n\n- [See complicate context and data nested level example](https://github.com/zwolf21/django-pipe2db/blob/master/testsite/bookstore/scraper.py)\n---\n\nCreate record with contentfiles\n\n#### contentfile_fields\n- Saving file via ContentFile class from django.core.files module\n- source_url_field is specified as meta data for determinding file name\n\n```python\n# models.py\nfrom django.db import models\n\nclass BookImage(models.Model):\n img = models.ImageField()\n\n class Meta:\n db_table = 'bookimage'\n\n```\n\n```python\nfrom pipe2db import pipe\n\n@pipe({\n 'model': 'db.BookImage',\n 'contentfile_fields': {\n 'img': {\n 'source_url_field': 'src',\n }\n },\n 'exclude_fields': ['src'] # when model dose not need src data\n})\ndef image_crawler(response):\n image_data = {\n 'img': 'response_content',\n 'src': response.url # needed for extracting filename as source_url_field\n }\n yield image_data\n```",
"bugtrack_url": null,
"license": "MIT",
"summary": "A decorator that connects django model and data generator function",
"version": "1.0.3",
"project_urls": {
"Homepage": "https://github.com/zwolf21/django-pipe2db"
},
"split_keywords": [
"pipe2db",
"django-pipe2db",
"django orm",
"standalone django",
"standalone django orm"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "799ec62cb230ff160a29063830171c910c0c013a9f737a8ac1371cf71a31b2e3",
"md5": "1a2ea7b30a1c2a94546fefb1e3a4952a",
"sha256": "01719bf9ef3d40bae5823585520d545fbbab97d6ad924f1ff93c21365fbadf0f"
},
"downloads": -1,
"filename": "django-pipe2db-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "1a2ea7b30a1c2a94546fefb1e3a4952a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 15095,
"upload_time": "2022-08-02T15:21:47",
"upload_time_iso_8601": "2022-08-02T15:21:47.556179Z",
"url": "https://files.pythonhosted.org/packages/79/9e/c62cb230ff160a29063830171c910c0c013a9f737a8ac1371cf71a31b2e3/django-pipe2db-1.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-08-02 15:21:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "zwolf21",
"github_project": "django-pipe2db",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "django-pipe2db"
}