jayden-opensearch-retriever


- Name: jayden-opensearch-retriever
- Version: 0.1.1
- Summary: OpenSearch-based search and search-result retrieval tool
- Upload time: 2025-07-14 04:29:48
- Requires Python: >=3.8
- Keywords: opensearch, search, retriever, elasticsearch
- Home page, maintainer, docs URL, author, license: none recorded
- Requirements: none recorded

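# OpenSearch Vector Search Retriever

A vector search tool for parent-child documents stored in OpenSearch.

## Features

- Connects to an OpenSearch cluster
- Similarity search using embedding vectors
- Parent-child document retrieval (searches the child index, then returns the parent documents)
- Query embedding generation

## Installation

### 1. Development install (for local development)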

```bash
# Clone the repository
git clone https://github.com/nxtcloud-org/opensearch_retriever
cd opensearch_retriever

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate  # Windows

# Install in development (editable) mode
pip install -e .

# Or install together with the development dependencies
pip install -e ".[dev]"
```

### 2. Install from PyPI (after release)

```bash
pip install jayden-opensearch-retriever
```

### 3. Install from requirements.txt

```bash
pip install -r requirements.txt
```

## Usage

### Basic setup

```python
from opensearch_retriever import ParentChildRetriever
from langchain_core.embeddings import Embeddings

# Prepare an embedding model (e.g. HuggingFace, OpenAI).
# `YourEmbeddingModel` is a placeholder; `embedding_model` must be defined
# before the retriever is created.
# embedding_model = YourEmbeddingModel()

# Initialize ParentChildRetriever
retriever = ParentChildRetriever(
    host="localhost",
    port=9200,
    username="admin",  # optional
    password="admin",  # optional
    use_ssl=False,
    verify_certs=True,
    embedding_model=embedding_model
)
```
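
For a local smoke test, `embedding_model` can be filled with a small stand-in. The sketch below is an illustration only: `ToyHashEmbeddings` is a hypothetical class that does not come from this package, its vectors carry no semantic meaning, and it assumes the retriever only needs the `embed_query` / `embed_documents` methods of the langchain-core `Embeddings` interface.

```python
import hashlib
import math
from typing import List

try:  # use the real base class when langchain-core is installed
    from langchain_core.embeddings import Embeddings
except ImportError:  # fallback so the sketch still runs without it
    Embeddings = object


class ToyHashEmbeddings(Embeddings):
    """Deterministic, non-semantic embeddings for smoke tests only (hypothetical)."""

    def __init__(self, dimension: int = 8) -> None:
        self.dimension = dimension

    def embed_query(self, text: str) -> List[float]:
        # Hash the text, cycle through the digest bytes, and scale them to [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        raw = [digest[i % len(digest)] / 255.0 for i in range(self.dimension)]
        # Normalize to unit length; fall back to 1.0 if every byte was zero.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        return [x / norm for x in raw]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]


embedding_model = ToyHashEmbeddings(dimension=8)
```

The vector dimension chosen here would have to match the dimension of the vector field in the child index.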

### Vector search

```python
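# Vector search over parent-child documents
results = retriever.vector_search(
    child_index="child-documents",    # child document index (searched by vector)
    parent_index="parent-documents",  # parent document index (source of returned results)
    query="question or text to search for",
    k=3,        # number of top documents to consider in the vector search
    fields=["title", "content"]  # optional
)

# Process the search results
for document in results:
    print(f"Document content: {document}")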
```

### Query embedding

```python
# Convert a query into a vector
query_vector = retriever.embed_query("text to embed")
print(f"Embedding vector dimension: {len(query_vector)}")
```
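
To give a feel for what similarity between two embedding vectors means, here is a minimal sketch of cosine similarity in plain Python. This is an illustration, not the package's code: the README does not say which distance metric the index uses, and in practice the comparison is done inside OpenSearch according to the index mapping. The function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.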

## Index structure

This tool assumes a parent-child document structure:

### Child index (for vector search)
```json
{
  "vector": [0.1, 0.2, 0.3, ...],  // embedding vector
  "metadata": {
    "parent_id": "parent_doc_1",    // parent document ID
    "chunk_id": "chunk_1"           // chunk ID
  },
  "content": "actual text content"
}
```
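
A child index shaped like this needs k-NN enabled and a `knn_vector` field. The sketch below builds an index-creation body under stated assumptions: the field names follow the example document above, the `keyword` and `text` types for the other fields are illustrative choices, and `build_child_index_body` is a hypothetical helper rather than part of this package.

```python
from typing import Any, Dict


def build_child_index_body(dimension: int, vector_field: str = "vector") -> Dict[str, Any]:
    """Build settings and mappings for a k-NN enabled child index (sketch)."""
    return {
        # `index.knn: true` turns on k-NN search for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # The dimension must match the embedding model's output size.
                vector_field: {"type": "knn_vector", "dimension": dimension},
                "content": {"type": "text"},
                "metadata": {
                    "properties": {
                        "parent_id": {"type": "keyword"},
                        "chunk_id": {"type": "keyword"},
                    }
                },
            }
        },
    }


body = build_child_index_body(dimension=384)
```

With opensearch-py, a body like this could be passed to `client.indices.create(index="child-documents", body=body)`; the value 384 is only a placeholder dimension.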

### Parent index (for returned results)
```json
{
  "title": "Document title",
  "content": "Full document content",
  "metadata": {
    "source": "file path",
    "created_at": "2024-01-01"
  }
}
```

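## Search process

1. **Query embedding**: convert the input query into a vector
2. **Vector search**: run a k-NN search against the child index
3. **Parent ID extraction**: collect `parent_id` from the matching child documents
4. **Parent document lookup**: fetch the corresponding parent documents from the parent index
5. **Return results**: return the final parent documents as a list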
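
The middle steps of this flow can be sketched with plain dictionaries. This is not the package's actual implementation: the helper names are hypothetical, the vector field is assumed to be called `vector`, and `parent_id` is assumed to equal the parent document's `_id` so that a multi-get request can fetch the parents.

```python
from typing import Any, Dict, List


def build_knn_query(query_vector: List[float], k: int, vector_field: str = "vector") -> Dict[str, Any]:
    """Step 2: OpenSearch k-NN query body for the child index."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }


def extract_parent_ids(search_response: Dict[str, Any]) -> List[str]:
    """Step 3: collect unique parent IDs, keeping the ranking order of the hits."""
    seen = set()
    parent_ids: List[str] = []
    for hit in search_response.get("hits", {}).get("hits", []):
        parent_id = hit.get("_source", {}).get("metadata", {}).get("parent_id")
        if parent_id is not None and parent_id not in seen:
            seen.add(parent_id)
            parent_ids.append(parent_id)
    return parent_ids


def build_parent_lookup(parent_ids: List[str]) -> Dict[str, Any]:
    """Step 4: multi-get body, assuming parent_id is the parent's _id."""
    return {"ids": parent_ids}


# A hand-written response in the shape OpenSearch returns for a search.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"metadata": {"parent_id": "p1", "chunk_id": "c1"}}},
            {"_source": {"metadata": {"parent_id": "p1", "chunk_id": "c2"}}},
            {"_source": {"metadata": {"parent_id": "p2", "chunk_id": "c7"}}},
        ]
    }
}
print(extract_parent_ids(sample_response))  # ['p1', 'p2']
```

Deduplicating the IDs matters because several chunks of the same parent can appear among the top `k` hits, so the number of returned parents may be smaller than `k`.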

## Development environment setup

### Installing dependencies

```bash
# Install the development dependencies
pip install -e ".[dev]"

# Or install them individually
pip install pytest pytest-cov black flake8 mypy pre-commit
```

### Running tests

```bash
# Run all tests
pytest

# Run a specific test file
pytest tests/test_retriever.py

# Run tests with coverage
pytest --cov=opensearch_retriever
```

## Building and publishing the package

### Building the package

```bash
# Install the build tool
pip install build

# Build the package
python -m build
```

### Publishing to PyPI

```bash
# Install the upload tool
pip install twine

# Upload to TestPyPI (for testing)
twine upload --repository testpypi dist/*

# Upload to the real PyPI
twine upload dist/*
```

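## Requirements

- Python >= 3.9 (the published package metadata declares `requires_python` >=3.8)
- requests >= 2.25.0
- opensearch-py >= 2.0.0
- langchain-core (for the embedding model)

## Notes

- An embedding model (`embedding_model`) must be provided
- The child index needs a vector field and `parent_id` metadata
- The parent index must hold the actual document content
- k-NN search must be enabled in OpenSearch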

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "jayden-opensearch-retriever",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Jayden <jayden.kim@nxtcloud.kr>",
    "keywords": "opensearch, search, retriever, elasticsearch",
    "author": null,
    "author_email": "Jayden <jayden.kim@nxtcloud.kr>",
    "download_url": "https://files.pythonhosted.org/packages/6e/ae/9696457f496bcff088a104b03a39cfe9da96c7dcc31fd02e187c430ea869/jayden_opensearch_retriever-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "OpenSearch-based search and search-result retrieval tool",
    "version": "0.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/yourusername/opensearch-retriever/issues",
        "Documentation": "https://opensearch-retriever.readthedocs.io/",
        "Homepage": "https://github.com/yourusername/opensearch-retriever",
        "Repository": "https://github.com/yourusername/opensearch-retriever"
    },
    "split_keywords": [
        "opensearch",
        " search",
        " retriever",
        " elasticsearch"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e8e258cc743bb1b8935045f1c810c1ead2c8bb78100e92b94c6cc5184e324d9c",
                "md5": "ba219d630e8d584493f73b6304da6f70",
                "sha256": "716f7580db4bb2396a43ad222d75e1f8dbe3dcef42847a75c36c1d4f045d1120"
            },
            "downloads": -1,
            "filename": "jayden_opensearch_retriever-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ba219d630e8d584493f73b6304da6f70",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 6159,
            "upload_time": "2025-07-14T04:29:47",
            "upload_time_iso_8601": "2025-07-14T04:29:47.699009Z",
            "url": "https://files.pythonhosted.org/packages/e8/e2/58cc743bb1b8935045f1c810c1ead2c8bb78100e92b94c6cc5184e324d9c/jayden_opensearch_retriever-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6eae9696457f496bcff088a104b03a39cfe9da96c7dcc31fd02e187c430ea869",
                "md5": "ab31e3bcb177cc96f1faf07406408c7a",
                "sha256": "32310dee767c66910349d2a04bb8b59ac587edfea3c54aed2007289c488a88e4"
            },
            "downloads": -1,
            "filename": "jayden_opensearch_retriever-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "ab31e3bcb177cc96f1faf07406408c7a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 6953,
            "upload_time": "2025-07-14T04:29:48",
            "upload_time_iso_8601": "2025-07-14T04:29:48.917282Z",
            "url": "https://files.pythonhosted.org/packages/6e/ae/9696457f496bcff088a104b03a39cfe9da96c7dcc31fd02e187c430ea869/jayden_opensearch_retriever-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 04:29:48",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "opensearch-retriever",
    "github_not_found": true,
    "lcname": "jayden-opensearch-retriever"
}
        