| Field | Value |
| --- | --- |
| Name | alcedo-pdbc |
| Version | 0.1.3b0 |
| Summary | Add your description here |
| Upload time | 2025-08-11 05:56:22 |
| Author | None |
| Maintainer | None |
| Home page | None |
| Docs URL | None |
| License | None |
| Requires Python | >=3.9 |
| Keywords | pdbc, sql |
| Requirements | No requirements were recorded. |
# ALcedo PDBC

## 1. Overview

ALcedo PDBC ® (Python DataBase Connectivity) is an efficient, flexible data-access API developed by the AI Lab 100 team at Digital Intelligence Education Development (Shandong) Co., Ltd. It loads data from databases into Python quickly and with minimal memory overhead, aiming to simplify big-data access and processing, provide a unified way to interact with a wide range of data sources, and accelerate ML and ETL development.

Like JDBC and ODBC, ALcedo PDBC ® focuses on providing a programmatic data-access interface for Python applications and AI models. It supports mainstream RDBMS (relational databases), NoSQL (non-relational databases), data lakes, and data warehouses, including but not limited to MySQL, SQL Server, SQLite, Oracle, MariaDB, PostgreSQL, MongoDB, Redis, Elasticsearch, MinIO, Amazon S3, and Google Cloud Storage (GCS).
## 2. Technical Features and Use Cases

**ALcedo PDBC has the following technical features:**

1. Broad compatibility: supports many database systems and file-storage services, including RDBMS and NoSQL databases, and is compatible with big-data frameworks such as MinIO and Flink.
2. Simple, consistent interface: complex technical details are encapsulated internally; for RDBMS in particular, the interface is unified and concise.
3. Performance optimization: ALcedo PDBC uses caching and multithreading to improve query speed and resource utilization. By reducing unnecessary network round trips and improving batching, it delivers a significant performance gain over traditional connectors.
4. Dynamic code generation: ALcedo PDBC can generate optimized query-execution code at runtime, preserving flexibility while maintaining efficiency.
5. Flexible configuration: users can customize connection parameters to meet the needs of different environments and workloads.

**Use cases for ALcedo PDBC include, but are not limited to:**

- Big-data analysis: large-scale data processing and analysis in Python, with substantially faster queries.
- AI model development: fast multi-source data reads during model development, providing real-time data streams to AI models.
- Data integration: migrating data across databases, or consolidating data from different sources onto a single processing platform.
- Cloud storage access: convenient reading and writing of files in cloud storage such as Amazon S3 or Azure Blob, improving data-handling efficiency in cloud environments.
## 3. Data Sources and Outputs

The ALcedo PDBC module supports the different data-source interfaces through its mysql, nosql, datalake, and datawarehouse class wrappers.

Data sources:
- [x] MySQL
- [x] SQL Server
- [x] PostgreSQL
- [x] Oracle
- [x] MariaDB
- [x] SQLite
- [x] MongoDB
- [x] Elasticsearch
- [x] Redis
- [x] DynamoDB
- [x] MinIO
- [x] Amazon S3
- [x] Google Cloud Storage (GCS)
- [x] Microsoft AzureBlob
- [x] Doris
- [x] SnowFlake
- [x] BigQuery
- [x] Redshift
- [x] StarRocks
- [ ] ...
Output DataFrames:
- [x] Pandas
- [x] Polars
- [x] Dask
Output file formats:
- [x] CSV
- [x] Excel
- [x] JSON
- [x] HTML
- [x] HDF5
- [x] Feather
- [x] Parquet
- [x] Apache Avro
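The `return_type` choice among these DataFrame outputs can be pictured as a small registry that maps a name to a converter. The sketch below is purely illustrative (plain Python, not ALcedo PDBC's actual internals); the `to_frame` helper, the `_CONVERTERS` table, and the `"records"` fallback are all assumptions for the example:

```python
# Illustrative return_type dispatch table (not ALcedo PDBC's real internals):
# each entry converts raw rows plus column names into a target container.
def _as_records(rows, columns):
    # Fallback output: a list of dicts, one per row.
    return [dict(zip(columns, row)) for row in rows]

_CONVERTERS = {
    "records": _as_records,
    # A real implementation would register pandas/polars/dask here, e.g.:
    # "pandas": lambda rows, cols: pandas.DataFrame(rows, columns=cols),
}

def to_frame(rows, columns, return_type="records"):
    try:
        convert = _CONVERTERS[return_type]
    except KeyError:
        raise ValueError(f"unsupported return_type: {return_type!r}")
    return convert(rows, columns)

frames = to_frame([(1, "a"), (2, "b")], ["id", "name"])
```

Registering converters in a dict keeps the reader-side API a single function while new output types can be added without touching call sites.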
<table>
<tr>
    <th rowspan="2">Type</th>
    <th style="min-width: 120px;" rowspan="2">Data source</th>
    <th colspan="3"> DataFrame </th>
    <th style="min-width: 160px;" rowspan="2" > File </th>
    <th rowspan="2" >Notes</th>
</tr>
<tr>
<th >Pandas</th>
<th >Polars</th>
<th >Dask</th>
</tr>
<tr>
    <td rowspan="6">Structured (SQL)</td>
<td>MySQL</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br> ✖ write </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td></td>
</tr>
<tr>
<td>SQL SERVER</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br> ✖ write </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td></td>
</tr>
<tr>
<td>PostgreSQL</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br> ✖ write </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td></td>
</tr>
<tr>
<td>Oracle</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br> ✖ write </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td></td>
</tr>
<tr>
<td>MariaDB</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br> ✖ write </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td></td>
</tr>
<tr>
<td>SQLite</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br> ✖ write </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td></td>
</tr>
<tr>
    <td rowspan="4">NoSQL</td>
<td>MongoDB</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td> ✖ </td>
<td>✅ CSV ✅ Excel <br> ✖ JSON ✅ HTML <br> ✖ HDF5 ✖ Feather <br> ✖ Parquet ✖ Apache Avro </td>
    <td></td>
 </tr>
<tr>
<td>ElasticSearch</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br> ✖ write</td>
<td> ✖ </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
    <td></td>
 </tr>
<tr>
<td>Redis</td>
<td>✅ read </td>
<td> ✖ </td>
<td> ✖ </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td> </td>
</tr>
<tr>
<td>DynamoDB</td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td> ✖ </td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td> </td>
</tr>
<tr>
    <td rowspan="4">Data lake</td>
<td> MinIO </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
<td> S3 </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
<td> GCS </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
<td> AzureBlob </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
    <td rowspan="5">Data warehouse</td>
<td> Doris </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
<td> SnowFlake </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
<td> BigQuery </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
<td> Redshift </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
<tr>
<td> StarRocks </td>
<td>✅ read<br>✅ write </td>
<td>✅ read<br>✅ write</td>
<td>✅ read<br> ✖ write</td>
<td>✅ CSV ✅ Excel <br> ✅ JSON ✅ HTML <br> ✅ HDF5 ✅ Feather <br> ✅ Parquet ✖ Apache Avro </td>
<td>✅ read<br> ✖ write</td>
</tr>
</table>
> Note: exporting xlsx requires openpyxl; Parquet is a columnar data file format; Feather is a compressed binary format.
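Of the file outputs listed above, CSV and JSON can be written with the standard library alone, while Parquet, Feather, and HDF5 need extra packages (e.g. pyarrow; openpyxl for xlsx). A minimal stdlib sketch, assuming the rows have already been loaded as a list of dicts (the sample data and file names are invented for illustration):

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical rows, standing in for data read through ALcedo PDBC.
rows = [{"id": 1, "price": 3200}, {"id": 2, "price": 2800}]

out = Path(tempfile.mkdtemp())

# CSV: header taken from the dict keys, one line per row.
with open(out / "rent.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

# JSON: dump the whole list in one call.
(out / "rent.json").write_text(json.dumps(rows, ensure_ascii=False), encoding="utf-8")
```

The columnar formats trade this simplicity for compression and fast column scans, which is why Parquet and Feather are the better fit for large analytical tables.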
## 4. Quick Start

ALcedo PDBC can be installed with pip. Because domestic (Chinese) mirror sources may lag behind, specify the official PyPI index when installing:
```bash
pip install --index-url https://pypi.org/simple/ alcedo-pdbc
```
Taking MySQL as an example, only a few lines of code are needed:
```python
# In AI Lab 100, import via ailab100.pdbc:
# from ailab100.pdbc.sql import MySQL
from alcedo_pdbc.sql import MySQL

db_mysql = MySQL()
df = db_mysql.read_as_dataframe(
    table_name="public_rent_price_forecast_data",
    params={"houseFloor": "低", "totalFloor": 2},
    return_type="polars",
)
```
Alternatively, you can speed up data loading with multiple threads.
```python
# In AI Lab 100, import via ailab100.pdbc:
# from ailab100.pdbc.datalake import MinIO
from alcedo_pdbc.datalake import MinIO

minio_client = MinIO(
    "127.0.0.1:9000",
    access_key="Q3AM3UQ867SPQQA43P2F",
    secret_key="zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG",
)
minio_client.download_file(
    minio_path="s3://datalake/datasets/3财务困境研究数据集/ST财务预警.csv",
    num_threads=4,
)
```
This function partitions the query into as many partitions as the specified number of threads, assigning one thread per partition to load and write data in parallel.
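This kind of partitioning can be sketched with the standard library: split the table's row-id range into one slice per thread and run a ranged query per slice. The sketch below is a generic illustration using sqlite3 as a stand-in database, not ALcedo PDBC's actual implementation:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Shared in-memory stand-in table; the URI form lets worker threads
# open their own connections to the same database.
DB = "file:demo?mode=memory&cache=shared"
keep = sqlite3.connect(DB, uri=True)  # keeps the shared memory db alive
keep.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v REAL)")
keep.executemany("INSERT INTO t (v) VALUES (?)", [(i * 0.5,) for i in range(1000)])
keep.commit()

def load_partition(bounds):
    # Each worker opens its own connection and reads one id range.
    lo, hi = bounds
    conn = sqlite3.connect(DB, uri=True)
    try:
        return conn.execute(
            "SELECT id, v FROM t WHERE id > ? AND id <= ?", (lo, hi)
        ).fetchall()
    finally:
        conn.close()

def read_parallel(num_threads=4):
    # Split [1, max_id] into num_threads contiguous slices.
    (max_id,) = keep.execute("SELECT MAX(id) FROM t").fetchone()
    step = -(-max_id // num_threads)  # ceiling division
    bounds = [(i * step, min((i + 1) * step, max_id)) for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        chunks = list(pool.map(load_partition, bounds))
    # Concatenate the per-partition results in order.
    return [row for chunk in chunks for row in chunk]

rows = read_parallel(4)
```

Partitioning on an indexed, monotonically increasing key is what makes the ranged queries cheap; a real connector must also pick the partition key and handle gaps in it.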
## 5. Performance

Comparing different Python solutions, the lab used 4 threads in parallel to read a 10,981,106-row MySQL table (1,092,616,192 bytes, 1.02 GB) into a DataFrame. The results:

- **1. Response time (shorter is better)**



- **2. Memory consumption (lower is better)**



In summary, ALcedo PDBC uses about one-third less memory and cuts response time nearly in half compared with Pandas (its response time is roughly on par with Polars).
## 6. Ecosystem
<div style="display: flex;align-items: center ">
<img src="./images/polars_logo.png"
style="margin-bottom: -2px;height: 30px" />
<img src="./images/pandas_logo.png"
style="margin-bottom: -2px;height: 50px"/>
</div>
## 7. Changelog

### 0.1.2b0

- Added a Redis cache configuration to speed up data reads.
- Connection information for each database type can now be defined via system environment variables.
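The environment-variable mechanism mentioned above might look like the following sketch. The variable names (`ALCEDO_MYSQL_HOST`, etc.) and the `mysql_config_from_env` helper are hypothetical, chosen only to illustrate the pattern of reading connection settings with sensible defaults:

```python
import os

def mysql_config_from_env():
    # Hypothetical variable names; the real names are defined by ALcedo PDBC.
    return {
        "host": os.environ.get("ALCEDO_MYSQL_HOST", "127.0.0.1"),
        "port": int(os.environ.get("ALCEDO_MYSQL_PORT", "3306")),
        "user": os.environ.get("ALCEDO_MYSQL_USER", "root"),
        "password": os.environ.get("ALCEDO_MYSQL_PASSWORD", ""),
    }

os.environ["ALCEDO_MYSQL_HOST"] = "db.internal"
cfg = mysql_config_from_env()
```

Keeping credentials in the environment rather than in code is what lets the same script run unchanged across development, CI, and production.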
### 0.1.3b0

- Added a command-line interface (CLI) and integrated docs; the documentation can be launched with `alcedo-cli docs`.
- Added automated build and packaging scripts.
- Added docker-compose deployment templates for each database type; see docker-compose-db.yml in the samples_docker directory.
---
Raw data
{
"_id": null,
"home_page": null,
"name": "alcedo-pdbc",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "pdbc, sql",
"author": null,
"author_email": "AI Lab 100 <ailab100@163.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Add your description here",
"version": "0.1.3b0",
"project_urls": {
"Funding": "https://donate.pypi.org",
"Source": "https://github.com/"
},
"split_keywords": [
"pdbc",
" sql"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e7edf57fd0e647d713d752446a040ee2641f412ab68162416d77fe86f7d08bd1",
"md5": "11b7439a8d3f11a858f4940ab1efc1ee",
"sha256": "ca9fbbe89e3bbeb4940cc0c095135ae9b3598393cb94892ab045bea35f0f980e"
},
"downloads": -1,
"filename": "alcedo_pdbc-0.1.3b0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "11b7439a8d3f11a858f4940ab1efc1ee",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 60938603,
"upload_time": "2025-08-11T05:56:22",
"upload_time_iso_8601": "2025-08-11T05:56:22.498389Z",
"url": "https://files.pythonhosted.org/packages/e7/ed/f57fd0e647d713d752446a040ee2641f412ab68162416d77fe86f7d08bd1/alcedo_pdbc-0.1.3b0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-11 05:56:22",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "alcedo-pdbc"
}