# faker-datasets

- **Name**: faker-datasets
- **Version**: 0.1.0
- **Summary**: Faker provider that loads data from your datasets
- **Upload time**: 2023-01-13 15:55:47
- **Requires Python**: >=3.8
- **Keywords**: faker, fixtures, data, test, mock, generator
- **License**: MIT (Copyright (c) 2022 Elasticsearch B.V.)
- **Requirements**: none recorded
# Build [Faker](https://github.com/joke2k/faker#readme) providers based on datasets

`faker-datasets` offers a building block for seeding the data generation
with existing data.

You can create simple providers that pick a random entry from a tailored dataset,
or assemble complex ones that generate new combinations from multiple datasets,
all while keeping an eye on speed and memory consumption.

Let's see how.

# Crash course

We'll use the wonderful [Countries State Cities DB](https://github.com/dr5hn/countries-states-cities-database)
maintained by [Darshan Gada](https://github.com/dr5hn). Download the
[cities](https://raw.githubusercontent.com/dr5hn/countries-states-cities-database/v1.9/cities.json) and the
[countries](https://raw.githubusercontent.com/dr5hn/countries-states-cities-database/v1.9/countries.json) datasets.

## Basic random picker

`Cities` generates a city by randomly picking an entry from the cities
dataset. Here the dataset is named `cities`, the dataset file is
`cities.json` (adjust to the actual path of the file saved earlier),
and the picker, the method that returns a random city, is named `city`.

Here is how we define it in the file `cities_provider.py`:

```python
from faker_datasets import Provider, add_dataset

@add_dataset("cities", "cities.json", picker="city")
class Cities(Provider):
    pass
```

Here is how we could use it to generate 10 cities:

```python
from faker import Faker
from cities_provider import Cities

fake = Faker()
fake.add_provider(Cities)

for _ in range(10):
    # Use of the picker named in @add_dataset
    city = fake.city()
    print("{name} is in {country_name}".format(**city))
```

One of the many possible outputs:

```
Poiana Cristei is in Romania
Codosera La is in Spain
Jeremoabo is in Brazil
Rodrígo M. Quevedo is in Mexico
Cary is in United States
Locking is in United Kingdom
Mezinovskiy is in Russia
Nesoddtangen is in Norway
Zalesnoye is in Ukraine
Cefa is in Romania
```

Because the data generation is a pseudo-random process, every execution outputs
different results. If you want reproducible outputs, you have to seed the Faker
generator as documented [here](https://faker.readthedocs.io/en/master/index.html#seeding-the-generator).
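
For instance, a minimal sketch that makes the `Cities` example above
reproducible by seeding before generating:

```python
from faker import Faker
from cities_provider import Cities

# Seeding is done on the Faker class itself, as per the Faker docs.
Faker.seed(1234)

fake = Faker()
fake.add_provider(Cities)

# With the same seed, the same cities come out on every run.
for _ in range(3):
    print(fake.city()["name"])
```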

## Customize the random picker

`CitiesEx` is functionally identical to `Cities` but shows how to define
the picker yourself. Here the `picker=` parameter is gone from
`@add_dataset`, and a `city` method is defined instead.

```python
from faker_datasets import Provider, add_dataset, with_datasets

@add_dataset("cities", "cities.json")
class CitiesEx(Provider):

    @with_datasets("cities")
    def city(self, cities):
        return self.__pick__(cities)
```

Note how the `city` method is decorated with `@with_datasets("cities")`
and how, consequently, it receives that dataset as a parameter.
The call to `__pick__` simply selects a random entry from `cities`.
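
Usage is unchanged; a short sketch, assuming the class above is saved in
a module named `cities_ex_provider.py` (a hypothetical file name):

```python
from faker import Faker
from cities_ex_provider import CitiesEx  # hypothetical module name

fake = Faker()
fake.add_provider(CitiesEx)

# The hand-written picker is called exactly like the generated one.
city = fake.city()
print("{name} is in {country_name}".format(**city))
```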


## Matching a criterion

`CitiesFromCountry` uses a custom picker to return only cities from a
given country. A first implementation could simply discard cities from any
other country, getting slower with increasing bad luck.

```python
from faker_datasets import Provider, add_dataset, with_datasets

@add_dataset("cities", "cities.json")
class CitiesFromCountry(Provider):

    @with_datasets("cities")
    def city(self, cities, country_name):
        while True:
            city = self.__pick__(cities)
            if city["country_name"] == country_name:
                return city
```

It's better to limit the number of attempts though; otherwise, if
`country_name` is misspelled, the picker would enter an infinite loop.

```python
from faker_datasets import Provider, add_dataset, with_datasets

@add_dataset("cities", "cities.json")
class CitiesFromCountry(Provider):

    @with_datasets("cities")
    def city(self, cities, country_name, max_attempts=10000):
        while max_attempts:
            city = self.__pick__(cities)
            if city["country_name"] == country_name:
                return city
            max_attempts -= 1
        raise ValueError("Ran out of attempts")
```

Or, with the same results, use the `match=` and `max_attempts=`
parameters of `__pick__`.

```python
from faker_datasets import Provider, add_dataset, with_datasets

@add_dataset("cities", "cities.json")
class CitiesFromCountry(Provider):

    @with_datasets("cities")
    def city(self, cities, country_name):
        # match tells __pick__ whether a picked city is acceptable or not
        match = lambda city: city["country_name"] == country_name
        return self.__pick__(cities, match=match, max_attempts=10000)
```
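
Extra picker parameters are supplied at call time. A usage sketch,
assuming the provider lives in a hypothetical `cities_from_country_provider.py`
and that the country name matches the dataset spelling:

```python
from faker import Faker
from cities_from_country_provider import CitiesFromCountry  # hypothetical module name

fake = Faker()
fake.add_provider(CitiesFromCountry)

# country_name is forwarded to the picker; a misspelled name now raises
# ValueError once max_attempts is exhausted instead of looping forever.
print(fake.city(country_name="Italy")["name"])
```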

If you know ahead of time which country you are interested in, say Afghanistan,
you can use the `@with_match` picker decorator. It builds a new index
of only the matching entries, so the picking speed is again constant and
independent of bad luck.

```python
from faker_datasets import Provider, add_dataset, with_datasets, with_match

@add_dataset("cities", "cities.json")
class CitiesFromCountry(Provider):

    @with_datasets("cities")
    @with_match(lambda city: city["country_name"] == "Afghanistan")
    def afghan_city(self, cities):
        return self.__pick__(cities)
```

In such conditions, though, it may be better to massage your dataset
beforehand and keep only the entries matching your criteria.
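
A one-off filtering script is enough for that. A sketch, assuming the dataset
is a plain JSON array of city objects (the output file name is arbitrary):

```python
import json

# Load the full dataset and keep only the entries of interest.
with open("cities.json", encoding="utf-8") as f:
    cities = json.load(f)

afghan_cities = [city for city in cities if city["country_name"] == "Afghanistan"]

with open("afghan_cities.json", "w", encoding="utf-8") as f:
    json.dump(afghan_cities, f, ensure_ascii=False)
```

The trimmed file can then be attached with a plain
`@add_dataset("cities", "afghan_cities.json", picker="city")`,
with no matching logic at all.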

## Using multiple datasets

`CitiesAndCountries` combines two datasets for more advanced matches. Note
how `@add_dataset` makes multiple datasets available to the provider
and `@with_datasets` passes them to the given picker.

```python
from faker_datasets import Provider, add_dataset, with_datasets, with_match

@add_dataset("cities", "cities.json")
@add_dataset("countries", "countries.json")
class CitiesAndCountries(Provider):

    @with_datasets("cities", "countries")
    def city_by_region(self, cities, countries, region):
        def match(city):
            # Given a city, find its country info in the countries dataset
            country = next((country for country in countries if country["name"] == city["country_name"]), None)
            # Check that the country was found and is in the region of interest
            return country is not None and country["region"] == region
        return self.__pick__(cities, match=match, max_attempts=10000)
```

The picker mixes and matches the data so that the region request
is satisfied or an error is raised.
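
A usage sketch, assuming a hypothetical module name and a `region` value
taken from the countries dataset ("Europe" is just an example):

```python
from faker import Faker
from cities_and_countries_provider import CitiesAndCountries  # hypothetical module name

fake = Faker()
fake.add_provider(CitiesAndCountries)

# Picks a random city whose country belongs to the requested region,
# or raises ValueError after max_attempts failed picks.
city = fake.city_by_region(region="Europe")
print("{name} is in {country_name}".format(**city))
```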

## Summary

You use `@add_dataset` to attach a dataset to your provider; if you specify
a `picker=` parameter, you get a random entry picker for free.
The more datasets you need, the more `@add_dataset` decorators you can stack.

If you have special needs, you can define the pickers yourself, each
using whichever datasets are most appropriate among those made available with
`@add_dataset`. You can add as many pickers as you need.

A picker can use the `match=` and `max_attempts=` parameters of `__pick__`
to make the generated entries respect some useful criteria.
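
Putting it all together, a final sketch of a provider mixing an automatic
picker with hand-written ones; class and method names are illustrative, and
it assumes, as the points above suggest, that the `picker=` shortcut and
custom pickers can coexist on the same class:

```python
from faker_datasets import Provider, add_dataset, with_datasets, with_match

@add_dataset("cities", "cities.json", picker="city")  # free picker: fake.city()
@add_dataset("countries", "countries.json")
class GeoProvider(Provider):

    # Hand-written picker over the second dataset: fake.country()
    @with_datasets("countries")
    def country(self, countries):
        return self.__pick__(countries)

    # Pre-indexed picker, constant-time regardless of how rare the match is
    @with_datasets("cities")
    @with_match(lambda city: city["country_name"] == "Japan")
    def japanese_city(self, cities):
        return self.__pick__(cities)
```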

            
