steamstore-etl

- Name: steamstore-etl
- Version: 0.0.15
- Home page: https://github.com/DataForgeOpenAIHub/Steam-Sales-Analysis
- Summary: CLI for Steam Store Data Ingestion ETL Pipeline
- Upload time: 2024-08-29 05:53:55
- Author: DataForgeOpenAIHub
- Requires Python: >=3.10
- License: MIT
- Keywords: steam, etl, data-pipeline, cli, python

# Steam Sales Analysis

![banner](assets/imgs/steam_logo_banner.jpg)

## Overview
Welcome to **Steam Sales Analysis**, a project that uses game data to produce insights into the gaming world. It provides an ETL (Extract, Transform, Load) pipeline that covers data retrieval, processing, validation, and ingestion. Using the SteamSpy and Steam APIs, the pipeline collects game-related metadata, details, and sales figures.

The collected data is loaded into a MySQL database hosted on Aiven Cloud. From there, it is analyzed and visualized through interactive Tableau dashboards that show gaming trends and sales performance.

# `steamstore` CLI
![Steamstore ETL Pipeline](assets/imgs/steamstore-etl.drawio.png)

## Setup
### Installing the package
For general use, setting up the environment and dependencies is straightforward:

```bash
# Install the Python distribution from PyPI
pip install steamstore-etl
```

### Setting up the environment variables
- Create a `.env` file in a directory of your choice.
```ini
# Database configuration
MYSQL_USERNAME=<your_mysql_username>
MYSQL_PASSWORD=<your_mysql_password>
MYSQL_HOST=<your_mysql_host>
MYSQL_PORT=<your_mysql_port>
MYSQL_DB_NAME=<your_mysql_db_name>
```
- Open a terminal in that directory, then load the variables as described for your platform:

   #### For Ubuntu (or other Unix-like systems)

   1. **Load `.env` Variables into the Terminal**

      To make the variables from the `.env` file available to `steamstore`, use one of the three methods below: `export`, the `dotenv` utility, or `source`.

      **Using `export` directly (manual method):**

      ```bash
      export $(grep -v '^#' .env | xargs)
      ```

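      - `grep -v '^#' .env` drops the comment lines from the file.
      - `xargs` joins the remaining `KEY=VALUE` lines into a single argument list, which `export` then sets in the current session. This works only for values that contain no whitespace.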

      **Using `dotenv` (requires installation):**

      If you prefer a tool, you can use `dotenv`:

      - Install `dotenv` if you don't have it:

      ```bash
      sudo apt-get install python3-dotenv
      ```

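      - Then run the command through `dotenv`. The `dotenv` process cannot change the environment of the shell that started it, so calling it without arguments loads nothing. Its `run` subcommand starts a command with the variables from the `.env` file in the current directory set:

      ```bash
      dotenv run -- steamstore fetch_steamspy_metadata
      ```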

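      **Using `source` (not typical for `.env` but useful for `.sh` files):**

      If your `.env` file contains only simple `KEY=VALUE` pairs, the shell can read it directly. Plain `source .env` sets shell variables without exporting them, so programs started from that shell would not see them. Wrapping the call in `set -a` and `set +a` exports every variable the file defines:

      ```bash
      set -a; source .env; set +a
      ```

      Values that contain spaces or shell metacharacters must be quoted in the file for this method to work.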

   2. **Verify the Variables**

      After loading, you can check that the environment variables are set:

      ```bash
      echo $MYSQL_USERNAME
      ```

   #### For Windows

   1. **Load `.env` Variables into PowerShell**

      You can use a PowerShell script to load the variables from the `.env` file.

      **Create a PowerShell script (e.g., `load_env.ps1`):**

      ```powershell
      Get-Content .env | ForEach-Object {
         if ($_ -match "^(.*?)=(.*)$") {
            [System.Environment]::SetEnvironmentVariable($matches[1], $matches[2], [System.EnvironmentVariableTarget]::Process)
         }
      }
      ```

      - This script reads each line from the `.env` file and sets it as an environment variable for the current PowerShell session.

      **Run the script:**

      ```powershell
      .\load_env.ps1
      ```

      **Verify the Variables:**

      ```powershell
      echo $env:MYSQL_USERNAME
      ```

   2. **Load `.env` Variables into Command Prompt**

      The Command Prompt does not have built-in support for `.env` files. You can use a batch script to achieve this.

      **Create a batch script (e.g., `load_env.bat`):**

      ```batch
      @echo off
      rem Skip lines starting with #, and keep everything after the first = as the value
      for /f "eol=# tokens=1,* delims==" %%A in (.env) do set "%%A=%%B"
      ```

      **Run the batch script:**

      ```batch
      load_env.bat
      ```

      **Verify the Variables:**

      ```batch
      echo %MYSQL_USERNAME%
      ```
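
As a cross-platform alternative to the per-shell checks above, the following Python sketch reports which of the five variables are missing from the current environment. The helper name `missing_mysql_vars` is hypothetical and is not part of `steamstore-etl`; only the variable names come from the `.env` template.

```python
import os

# The five variable names come from the `.env` template above.
REQUIRED_VARS = (
    "MYSQL_USERNAME",
    "MYSQL_PASSWORD",
    "MYSQL_HOST",
    "MYSQL_PORT",
    "MYSQL_DB_NAME",
)


def missing_mysql_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_mysql_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All MySQL variables are set.")
```

Run it from the same terminal session in which you loaded the `.env` file, since variables set in one session are not visible in another.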

## CLI for Steam Store Data Ingestion ETL Pipeline

**Usage**:

```console
$ steamstore [OPTIONS] COMMAND [ARGS]...
```

**Options**:

- `--install-completion`: Install completion for the current shell.
- `--show-completion`: Show completion for the current shell, to copy it or customize the installation.
- `--help`: Show this message and exit.

**Commands**:

- `clean_steam_data`: Clean the Steam Data and ingest into the Custom Database
- `fetch_steamspy_data`: Fetch from SteamSpy Database and ingest data into Custom Database
- `fetch_steamspy_metadata`: Fetch metadata from SteamSpy Database and ingest metadata into Custom Database
- `fetch_steamstore_data`: Fetch from Steam Store Database and ingest data into Custom Database

## Detailed Command Usage
### `steamstore clean_steam_data`

Clean the Steam Data and ingest into the Custom Database

**Usage**:

```console
$ steamstore clean_steam_data [OPTIONS]
```

**Options**:

- `--batch-size INTEGER`: Number of records to process in each batch.  [default: 1000]
- `--help`: Show this message and exit.
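
The `--batch-size` option implies that records are handled in fixed-size groups. The sketch below illustrates that idea on an arbitrary iterable; `iter_batches` is a hypothetical helper written for this README, not the package's own implementation.

```python
from itertools import islice


def iter_batches(records, batch_size=1000):
    """Yield successive lists of at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

For example, 2500 records with the default batch size would be split into batches of 1000, 1000, and 500 records.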

### `steamstore fetch_steamspy_data`

Fetch from SteamSpy Database and ingest data into Custom Database

**Usage**:

```console
$ steamstore fetch_steamspy_data [OPTIONS]
```

**Options**:

- `--batch-size INTEGER`: Number of records to process in each batch.  [default: 1000]
- `--help`: Show this message and exit.

### `steamstore fetch_steamspy_metadata`

Fetch metadata from SteamSpy Database and ingest metadata into Custom Database

**Usage**:

```console
$ steamstore fetch_steamspy_metadata [OPTIONS]
```

**Options**:

- `--max-pages INTEGER`: Number of pages to fetch from.  [default: 100]
- `--help`: Show this message and exit.
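
To show what "pages" could mean here, the sketch below builds one request URL per page. It assumes the SteamSpy `request=all` endpoint with a zero-based `page` parameter, as described on the SteamSpy API page listed in the references; whether the CLI calls exactly this endpoint is an assumption, and `metadata_page_urls` is a hypothetical helper. The code only builds URLs and makes no network requests.

```python
from urllib.parse import urlencode

STEAMSPY_API = "https://steamspy.com/api.php"


def metadata_page_urls(max_pages=100):
    """Build one request URL per page, for pages 0 .. max_pages - 1."""
    return [
        f"{STEAMSPY_API}?{urlencode({'request': 'all', 'page': page})}"
        for page in range(max_pages)
    ]
```

A smaller `--max-pages` value would shorten this list and therefore the amount of metadata fetched.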

### `steamstore fetch_steamstore_data`

Fetch from Steam Store Database and ingest data into Custom Database

**Usage**:

```console
$ steamstore fetch_steamstore_data [OPTIONS]
```

**Options**:

- `--batch-size INTEGER`: Number of app IDs to process in each batch.  [default: 5]
- `--bulk-factor INTEGER`: Factor to determine when to perform a bulk insert (batch_size * bulk_factor).  [default: 10]
- `--reverse / --no-reverse`: Process app IDs in reverse order.  [default: no-reverse]
- `--help`: Show this message and exit.
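
With the defaults, the bulk-insert threshold works out to `batch_size * bulk_factor = 5 * 10 = 50` app IDs. The sketch below shows one way the three options could interact, assuming fetched batches are buffered until the threshold is reached and any remainder is written at the end. `plan_bulk_inserts` is a hypothetical helper; the package's actual logic may differ.

```python
def plan_bulk_inserts(app_ids, batch_size=5, bulk_factor=10, reverse=False):
    """Group app IDs the way the option descriptions suggest.

    Returns a list of flushes; each flush is the list of app IDs that
    would be written in one bulk insert.
    """
    threshold = batch_size * bulk_factor  # 5 * 10 = 50 with the defaults
    ordered = list(reversed(app_ids)) if reverse else list(app_ids)
    flushes, buffer = [], []
    for start in range(0, len(ordered), batch_size):
        buffer.extend(ordered[start:start + batch_size])  # one fetched batch
        if len(buffer) >= threshold:
            flushes.append(buffer)
            buffer = []
    if buffer:  # write whatever is left after the last batch
        flushes.append(buffer)
    return flushes
```

Under these assumptions, 120 app IDs with the default options would result in three bulk inserts of 50, 50, and 20 rows.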
     
# Setup Instructions
## Development Setup

For development purposes, you might need to have additional dependencies and tools:

1. **Clone the repository**:
   ```bash
   git clone https://github.com/DataForgeOpenAIHub/Steam-Sales-Analysis.git
   cd Steam-Sales-Analysis
   ```

2. **Create a virtual environment**:
   - Using `venv`:
     ```bash
     python -m venv game
     source game/bin/activate  # On Windows use `game\Scripts\activate`
     ```
   - Using `conda`:
     ```bash
     conda env create -f environment.yml
     conda activate game
     ```

3. **Install dependencies**:
   - Install general dependencies:
     ```bash
     pip install -r requirements.txt
     ```
   - Install development dependencies:
     ```bash
     pip install -r dev-requirements.txt
     ```

4. **Configuration**:
   - Create a `.env` file in the root directory of the repository.
   - Add the following variables to the `.env` file:
     ```ini
     # Database configuration
     MYSQL_USERNAME=<your_mysql_username>
     MYSQL_PASSWORD=<your_mysql_password>
     MYSQL_HOST=<your_mysql_host>
     MYSQL_PORT=<your_mysql_port>
     MYSQL_DB_NAME=<your_mysql_db_name>
     ```

## Database Integration

The project connects to a MySQL database hosted on `Aiven Cloud` using the credentials provided in the `.env` file. Ensure that the database is properly set up and accessible with the provided credentials.
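
As an illustration of how the five variables map onto a connection, the sketch below assembles them into a database URL. The `mysql+pymysql://` scheme is a SQLAlchemy-style convention chosen for the example; whether `steamstore-etl` uses SQLAlchemy or PyMySQL is an assumption, and `mysql_url` is a hypothetical helper.

```python
import os
from urllib.parse import quote_plus


def mysql_url(environ=None):
    """Build a SQLAlchemy-style MySQL URL from the `.env` variables."""
    env = os.environ if environ is None else environ
    # Percent-encode the credentials so characters such as "@" or "/"
    # in a password cannot be mistaken for URL delimiters.
    user = quote_plus(env["MYSQL_USERNAME"])
    password = quote_plus(env["MYSQL_PASSWORD"])
    host = env["MYSQL_HOST"]
    port = int(env["MYSQL_PORT"])  # raises ValueError if not numeric
    database = env["MYSQL_DB_NAME"]
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}"
```

A missing variable raises `KeyError` here, which makes configuration mistakes visible before any connection attempt.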

## Running Individual Parts of the ETL Pipeline
To execute the ETL pipeline, use the following commands:

1. **To collect metadata:**
   ```bash
   steamstore fetch_steamspy_metadata
   ```

2. **To collect SteamSpy data:**
   ```bash
   steamstore fetch_steamspy_data --batch-size 1000
   ```

3. **To collect Steam data:**
   ```bash
   steamstore fetch_steamstore_data --batch-size 5 --bulk-factor 10
   ```

4. **To clean Steam data:**
   ```bash
   steamstore clean_steam_data --batch-size 1000
   ```

Together, these commands retrieve data from the SteamSpy and Steam APIs, process and validate it, and load it into the MySQL database.
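
If you prefer to run the four steps from a single script, the sketch below calls the commands listed above one after another and stops at the first failure. Running them in the listed order is an assumption based on the numbering; `run_pipeline` is a hypothetical helper and requires `steamstore` to be on your `PATH` with the environment variables already loaded.

```python
import subprocess

# The four commands listed above, in the same order.
PIPELINE = [
    ["steamstore", "fetch_steamspy_metadata"],
    ["steamstore", "fetch_steamspy_data", "--batch-size", "1000"],
    ["steamstore", "fetch_steamstore_data", "--batch-size", "5", "--bulk-factor", "10"],
    ["steamstore", "clean_steam_data", "--batch-size", "1000"],
]


def run_pipeline(commands=PIPELINE, runner=subprocess.run):
    """Run each command in turn; stop at the first non-zero exit code."""
    for command in commands:
        result = runner(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

The `runner` parameter exists so that the function can be exercised with a stand-in instead of launching real processes.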

## References

### APIs Used

- [Steamspy API](https://steamspy.com/api.php)
- [Steam Store API - InternalSteamWebAPI](https://github.com/Revadike/InternalSteamWebAPI/wiki)
- [Steam Web API Documentation](https://steamapi.xpaw.me/#)
- [RJackson/StorefrontAPI Documentation](https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI)
- [Steamworks Web API Reference](https://partner.steamgames.com/doc/webapi)

### Repository

- [Nik Davis's Steam Data Science Project](https://github.com/nik-davis/steam-data-science-project)

---

#### LICENSE

This repository is licensed under the `MIT License`. See the [LICENSE](LICENSE) file for details.

#### Disclaimer

<sub>
The content and code provided in this repository are for educational and demonstrative purposes only. The project may contain experimental features, and the code might not be optimized for production environments. The authors and contributors are not liable for any misuse, damages, or risks associated with the direct or indirect use of this code. Users are strictly advised to review, test, and completely modify the code to suit their specific use cases and requirements. By using any part of this project, you agree to these terms.
</sub>


            
