bible-xml-parser

- Name: bible-xml-parser
- Version: 0.1.1
- Summary: A Python package for parsing Bible texts in various XML formats (USFX, OSIS, ZEFANIA)
- Author: Omar Zintan
- Uploaded: 2025-10-25 22:49:27
- Requires Python: >=3.8
- Keywords: bible, parser, xml, usfx, osis, zefania, scripture
# Bible XML Parser

![PyPI](https://img.shields.io/pypi/v/bible-xml-parser)
![Python Versions](https://img.shields.io/pypi/pyversions/bible-xml-parser)
![License](https://img.shields.io/pypi/l/bible-xml-parser)
![Downloads](https://img.shields.io/pypi/dm/bible-xml-parser)

A Python package for parsing Bible texts in various XML formats (USFX, OSIS, ZEFANIA). This package provides both direct parsing and database-backed approaches for handling Bible data in your Python applications.

## Features

- 📖 Parse Bible texts in multiple formats (USFX, OSIS, ZEFANIA)
- 🔍 Automatic format detection
- 🚀 Memory-efficient streaming XML parsing using defusedxml
- 🗄️ SQLite database caching for improved performance
- 🔎 Full-text search functionality (FTS5)
- 🔒 Secure XML parsing (protected against XXE attacks)
- 📝 Type hints throughout for better IDE support
- 🐍 Python 3.8+ support
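The streaming idea behind these features can be sketched with the standard library's `ElementTree.iterparse`, which hands over each element as soon as its end tag is read, so it can be processed and then cleared. This is an illustration under stated assumptions, not the package's code: the OSIS-style sample and the `stream_verses` helper are hypothetical, and for untrusted input you would call `defusedxml.ElementTree.iterparse`, which accepts the same `source` and `events` arguments.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical OSIS-style input, kept tiny for illustration.
SAMPLE = b"""<osis><osisText>
  <div type="book" osisID="Gen">
    <verse osisID="Gen.1.1">In the beginning...</verse>
    <verse osisID="Gen.1.2">And the earth was without form...</verse>
  </div>
</osisText></osis>"""


def stream_verses(source) -> Iterator[Tuple[str, str]]:
    """Yield (osisID, text) pairs one <verse> at a time (hypothetical helper)."""
    # Only "end" events are requested, so each element is complete,
    # text included, by the time we see it.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "verse":
            yield elem.get("osisID"), (elem.text or "").strip()
            # Free the element's text and children; the now-empty element
            # stays attached to its parent, but its content is released.
            elem.clear()


for osis_id, text in stream_verses(io.BytesIO(SAMPLE)):
    print(osis_id, text)
```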

## Installation

```bash
pip install bible-xml-parser
```

### Development Installation

```bash
git clone https://github.com/Omarzintan/bible_parser_python.git
cd bible_parser_python
pip install -e ".[dev]"
```

## Quick Start

### Direct Parsing Approach

Parse a Bible file directly without database caching:

```python
from bible_parser import BibleParser

# Parse from file (format auto-detected)
parser = BibleParser('path/to/bible.xml')

# Or parse from a string with an explicit format
with open('bible.xml', encoding='utf-8') as f:
    xml_content = f.read()
parser = BibleParser.from_string(xml_content, format='USFX')

# Iterate over books
for book in parser.books:
    print(f"{book.title} ({book.id})")
    print(f"  Chapters: {len(book.chapters)}")
    print(f"  Verses: {len(book.verses)}")

# Or iterate over verses directly
for verse in parser.verses:
    print(f"{verse.book_id} {verse.chapter_num}:{verse.num} - {verse.text}")
```

### Database Approach (Recommended for Production)

For repeated access, parse the XML once into a SQLite database and query it from there:

```python
from bible_parser import BibleRepository

# Create repository
repo = BibleRepository(xml_path='path/to/bible.xml', format='USFX')

# Initialize database (only needed once)
repo.initialize('my_bible.db')

# Get all books
books = repo.get_books()
for book in books:
    print(f"{book.title} ({book.id})")

# Get verses from a specific chapter
verses = repo.get_verses('gen', 1)  # Genesis chapter 1
for verse in verses:
    print(f"{verse.num}. {verse.text}")

# Get a specific verse
verse = repo.get_verse('jhn', 3, 16)  # John 3:16
if verse:
    print(verse.text)

# Search for verses containing specific text
results = repo.search_verses('love')
print(f"Found {len(results)} verses containing 'love'")

# Don't forget to close
repo.close()
```
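The repository's internals are not spelled out here, but the core idea (parse once, store verses in SQLite, answer lookups with indexed queries) can be sketched with Python's built-in `sqlite3` module. The table layout and the `get_verses`/`get_verse` helpers below are hypothetical stand-ins that only mimic the shape of the repository methods.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: one row per verse, keyed by book, chapter and verse number.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE verses (
           book_id TEXT NOT NULL,
           chapter_num INTEGER NOT NULL,
           num INTEGER NOT NULL,
           text TEXT NOT NULL,
           PRIMARY KEY (book_id, chapter_num, num)
       )"""
)
rows = [
    ("gen", 1, 1, "In the beginning..."),
    ("gen", 1, 2, "And the earth was without form..."),
    ("jhn", 3, 16, "For God so loved the world..."),
]
# Parse once, insert once: later lookups never touch the XML again.
conn.executemany("INSERT INTO verses VALUES (?, ?, ?, ?)", rows)
conn.commit()


def get_verses(book_id: str, chapter_num: int) -> List[Tuple[int, str]]:
    """Return (verse number, text) pairs for one chapter, in order."""
    cur = conn.execute(
        "SELECT num, text FROM verses"
        " WHERE book_id = ? AND chapter_num = ? ORDER BY num",
        (book_id, chapter_num),
    )
    return cur.fetchall()


def get_verse(book_id: str, chapter_num: int, num: int) -> Optional[str]:
    """Return one verse's text, or None if it does not exist."""
    row = conn.execute(
        "SELECT text FROM verses WHERE book_id = ? AND chapter_num = ? AND num = ?",
        (book_id, chapter_num, num),
    ).fetchone()
    return row[0] if row else None


print(get_verses("gen", 1))
print(get_verse("jhn", 3, 16))
```

The composite primary key gives SQLite an index on `(book_id, chapter_num, num)`, so a chapter lookup reads only that chapter's rows instead of scanning the whole table.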

### Using Context Manager

```python
from bible_parser import BibleRepository

with BibleRepository(xml_path='bible.xml') as repo:
    repo.initialize('my_bible.db')
    
    # Use the repository
    verses = repo.get_verses('mat', 5)  # Matthew chapter 5
    for verse in verses:
        print(f"{verse.num}. {verse.text}")
    
    # Search
    results = repo.search_verses('faith hope love')
    for verse in results:
        print(f"{verse.book_id} {verse.chapter_num}:{verse.num}")

# Database automatically closed
```
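The `with` form relies on Python's context-manager protocol: `__exit__` runs whether the block finishes normally or raises, which is what guarantees the database gets closed. A minimal sketch of the pattern with a hypothetical `TinyRepository` class (not the package's implementation):

```python
import sqlite3
from types import TracebackType
from typing import Optional, Type


class TinyRepository:
    """Hypothetical stand-in that owns a single SQLite connection."""

    def __init__(self) -> None:
        self.conn: Optional[sqlite3.Connection] = None

    def initialize(self, database_name: str) -> None:
        # Opens (or creates) the database; ":memory:" keeps it in RAM.
        self.conn = sqlite3.connect(database_name)

    def close(self) -> None:
        if self.conn is not None:
            self.conn.close()
            self.conn = None

    def __enter__(self) -> "TinyRepository":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        # Runs on normal exit and when an exception escapes the block.
        self.close()
        return False  # do not swallow the exception


with TinyRepository() as repo:
    repo.initialize(":memory:")
    print(repo.conn.execute("SELECT 1").fetchone())

print(repo.conn)  # None: the connection was closed on exit
```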

## Supported Formats

### USFX (Unified Scripture Format XML)
```xml
<usfx>
  <book id="gen">
    <c id="1"/>
    <v id="1">In the beginning...</v>
  </book>
</usfx>
```
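In this USFX sample, `<c>` is an empty milestone: it marks where a chapter begins instead of wrapping the chapter's verses, so a parser has to remember the most recent chapter number as it walks the book. A sketch of that bookkeeping with the standard library; `parse_usfx` is a hypothetical helper, and it handles only the simplified layout shown above, where each verse's text sits inside its `<v>` element.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The sample above, extended with a second chapter (hypothetical data).
USFX = """<usfx>
  <book id="gen">
    <c id="1"/>
    <v id="1">In the beginning...</v>
    <v id="2">And the earth was without form...</v>
    <c id="2"/>
    <v id="1">Thus the heavens and the earth were finished...</v>
  </book>
</usfx>"""


def parse_usfx(xml_text: str) -> List[Tuple[str, int, int, str]]:
    """Return (book_id, chapter, verse, text) tuples from simplified USFX."""
    verses = []
    root = ET.fromstring(xml_text)
    for book in root.iter("book"):
        chapter = 0  # no <c> milestone seen yet in this book
        for child in book:  # direct children, in document order
            if child.tag == "c":
                chapter = int(child.get("id"))
            elif child.tag == "v":
                text = (child.text or "").strip()
                verses.append((book.get("id"), chapter, int(child.get("id")), text))
    return verses


for row in parse_usfx(USFX):
    print(row)
```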

### OSIS (Open Scripture Information Standard)
```xml
<osis>
  <osisText>
    <div type="book" osisID="Gen">
      <verse osisID="Gen.1.1">In the beginning...</verse>
    </div>
  </osisText>
</osis>
```
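OSIS puts the whole reference in the `osisID` attribute as `Book.Chapter.Verse`, so splitting on the dots recovers all three parts. A minimal sketch, assuming single-verse IDs like `Gen.1.1` in the sample (other `osisID` forms, such as several space-separated IDs, are not handled); `parse_osis_id` and `OsisRef` are hypothetical names:

```python
from typing import NamedTuple


class OsisRef(NamedTuple):
    book: str
    chapter: int
    verse: int


def parse_osis_id(osis_id: str) -> OsisRef:
    """Split an osisID such as 'Gen.1.1' into its three parts.

    Malformed IDs raise ValueError (from the unpacking or from int()).
    """
    # rsplit from the right so only the last two dots act as separators.
    book, chapter, verse = osis_id.rsplit(".", 2)
    return OsisRef(book, int(chapter), int(verse))


print(parse_osis_id("Gen.1.1"))  # OsisRef(book='Gen', chapter=1, verse=1)
```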

### Zefania XML
```xml
<XMLBIBLE>
  <BIBLEBOOK bnumber="1" bname="Genesis">
    <CHAPTER cnumber="1">
      <VERS vnumber="1">In the beginning...</VERS>
    </CHAPTER>
  </BIBLEBOOK>
</XMLBIBLE>
```
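Zefania nests verses inside chapters inside books and keeps the numbers in attributes (`bnumber`, `bname`, `cnumber`, `vnumber`), so a plain tree walk recovers every verse with its full reference. A sketch with the standard library; `parse_zefania` is a hypothetical helper, and unlike a streaming parser it loads the whole document into memory:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The sample above, with a second verse added (hypothetical data).
ZEFANIA = """<XMLBIBLE>
  <BIBLEBOOK bnumber="1" bname="Genesis">
    <CHAPTER cnumber="1">
      <VERS vnumber="1">In the beginning...</VERS>
      <VERS vnumber="2">And the earth was without form...</VERS>
    </CHAPTER>
  </BIBLEBOOK>
</XMLBIBLE>"""


def parse_zefania(xml_text: str) -> List[Tuple[int, str, int, int, str]]:
    """Return (book number, book name, chapter, verse, text) tuples."""
    root = ET.fromstring(xml_text)
    out = []
    for book in root.findall("BIBLEBOOK"):
        for chapter in book.findall("CHAPTER"):
            for vers in chapter.findall("VERS"):
                out.append((
                    int(book.get("bnumber")),
                    book.get("bname"),
                    int(chapter.get("cnumber")),
                    int(vers.get("vnumber")),
                    # itertext() concatenates all text, including nested elements.
                    "".join(vers.itertext()).strip(),
                ))
    return out


for row in parse_zefania(ZEFANIA):
    print(row)
```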

## API Reference

### BibleParser

Main parser class with automatic format detection.

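**Methods and properties:**
- `__init__(source, format=None)` - Create a parser for `source` (for example, a file path); the format is auto-detected when `format` is omitted
- `from_string(xml_content, format=None)` - Alternate constructor, called as `BibleParser.from_string(...)`, that builds a parser from an XML string
- `books` - Property that yields Book objects
- `verses` - Property that yields Verse objects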
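Because each format in the samples above has a distinctive root element (`<usfx>`, `<osis>`, `<XMLBIBLE>`), one plausible way to auto-detect the format is to read just the root tag. The sketch below is an assumption about how such detection could work, not the package's actual logic; `detect_format` and `ROOT_TAGS` are hypothetical. It takes only the first `start` event from `iterparse`, so it does not need to parse the rest of the file.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO

# Hypothetical mapping from root tag to the format names used in this README.
ROOT_TAGS = {"usfx": "USFX", "osis": "OSIS", "XMLBIBLE": "ZEFANIA"}


def detect_format(source: IO[bytes]) -> str:
    """Guess the Bible XML format from the root element's tag."""
    # The first "start" event is always the root element.
    _event, root = next(ET.iterparse(source, events=("start",)))
    # Namespaced documents report tags as '{namespace-uri}name'; keep the name.
    tag = root.tag.rsplit("}", 1)[-1]
    try:
        return ROOT_TAGS[tag]
    except KeyError:
        raise ValueError(f"Unrecognized root element: {tag!r}") from None


print(detect_format(io.BytesIO(b"<XMLBIBLE><BIBLEBOOK/></XMLBIBLE>")))  # ZEFANIA
```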

### BibleRepository

Database-backed repository for efficient Bible data access.

**Methods:**
- `__init__(xml_path=None, xml_string=None, format=None)` - Initialize repository
- `initialize(database_name)` - Create/open database
- `get_books()` - Get all books
- `get_verses(book_id, chapter_num)` - Get verses from a chapter
- `get_verse(book_id, chapter_num, verse_num)` - Get a specific verse
- `get_chapter_count(book_id)` - Get number of chapters in a book
- `search_verses(query, limit=100)` - Full-text search (FTS5); returns at most `limit` verses
- `close()` - Close database connection
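`search_verses` is listed as full-text search with a `limit`, and the features list names SQLite FTS5. The sketch below shows what such a query can look like with `sqlite3`, assuming a hypothetical `verses_fts` virtual table; it needs a SQLite build with FTS5 compiled in, which is common in Python distributions but not guaranteed.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
# Hypothetical FTS5 table: the reference columns are stored but not tokenized.
conn.execute(
    "CREATE VIRTUAL TABLE verses_fts USING fts5("
    "book_id UNINDEXED, chapter_num UNINDEXED, num UNINDEXED, text)"
)
conn.executemany(
    "INSERT INTO verses_fts VALUES (?, ?, ?, ?)",
    [
        ("jhn", 3, 16, "For God so loved the world"),
        ("1co", 13, 13, "And now abideth faith, hope, charity"),
        ("1jn", 4, 8, "He that loveth not knoweth not God; for God is love."),
    ],
)


def search_verses(query: str, limit: int = 100) -> List[Tuple[str, int, int]]:
    """Return up to `limit` (book_id, chapter, verse) hits, best matches first."""
    cur = conn.execute(
        "SELECT book_id, chapter_num, num FROM verses_fts "
        "WHERE verses_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()


print(search_verses("love"))
print(search_verses("God", limit=1))
```

Two FTS5 behaviors matter when writing queries: several words are combined with an implicit AND (so `faith hope` only matches verses containing both), and the default `unicode61` tokenizer does no stemming (so `love` matches neither `loved` nor `loveth`). Queries with invalid FTS5 syntax, such as an unbalanced quote, raise `sqlite3.OperationalError`.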

### Data Models

**Verse:**
- `num` (int) - Verse number
- `chapter_num` (int) - Chapter number
- `text` (str) - Verse text
- `book_id` (str) - Book identifier

**Chapter:**
- `num` (int) - Chapter number
- `verses` (List[Verse]) - List of verses

**Book:**
- `id` (str) - Book identifier (e.g., 'gen', 'mat')
- `num` (int) - Book number
- `title` (str) - Book title (e.g., 'Genesis', 'Matthew')
- `chapters` (List[Chapter]) - List of chapters
- `verses` (List[Verse]) - Flat list of all verses
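The data models map naturally onto dataclasses. The sketch below mirrors the listed fields; making `Book.verses` a property that flattens `chapters` is a design assumption (the README only says it is a flat list), chosen so the two views cannot drift apart.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Verse:
    num: int
    chapter_num: int
    text: str
    book_id: str


@dataclass
class Chapter:
    num: int
    verses: List[Verse] = field(default_factory=list)


@dataclass
class Book:
    id: str
    num: int
    title: str
    chapters: List[Chapter] = field(default_factory=list)

    @property
    def verses(self) -> List[Verse]:
        # Flatten chapter by chapter, preserving reading order.
        return [v for ch in self.chapters for v in ch.verses]


gen = Book("gen", 1, "Genesis", [
    Chapter(1, [Verse(1, 1, "In the beginning...", "gen"),
                Verse(2, 1, "And the earth...", "gen")]),
    Chapter(2, [Verse(1, 2, "Thus the heavens...", "gen")]),
])
print(len(gen.chapters), len(gen.verses))  # 2 3
```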

## Performance Considerations

### Direct Parsing
**Pros:**
- Simple implementation
- No database setup required
- Always uses the latest source files

**Cons:**
- CPU and memory intensive
- Slower for repeated access
- Repeated parsing on each run

### Database Approach
**Pros:**
- Much faster access once data is loaded
- Lower memory usage during queries
- Efficient full-text search with FTS5
- Reuses the parsed data across runs without re-parsing the XML

**Cons:**
- Initial setup time
- Requires disk space
- Additional complexity

## Security

This package uses `defusedxml` for secure XML parsing, protecting against:
- **XXE (XML External Entity) attacks** - Prevents reading local files or making network requests
- **Billion Laughs attack** - Prevents exponential entity expansion
- **Quadratic blowup** - Prevents memory exhaustion

All database queries use parameterized statements to prevent SQL injection.
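To see what parameterized statements buy you, compare a hostile book ID bound as a parameter with the same string pasted into the SQL text. A minimal sketch with `sqlite3` and a hypothetical one-table schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE verses (book_id TEXT, chapter_num INTEGER, num INTEGER, text TEXT)")
conn.execute("INSERT INTO verses VALUES ('gen', 1, 1, 'In the beginning...')")

hostile = "xyz' OR '1'='1"

# Safe: the driver binds `hostile` as one literal string value.
safe_rows = conn.execute(
    "SELECT text FROM verses WHERE book_id = ?", (hostile,)
).fetchall()
print(safe_rows)  # []

# Unsafe (do not do this): string formatting splices the input into the SQL.
unsafe_sql = f"SELECT text FROM verses WHERE book_id = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()
print(unsafe_rows)  # [('In the beginning...',)]: the injected OR matched every row
```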

## Examples

See the `examples/` directory for complete working examples:
- `direct_parsing.py` - Direct parsing example
- `database_approach.py` - Database caching example
- `search_example.py` - Full-text search example

## Testing

Run tests with pytest:

```bash
pytest
```

With coverage:

```bash
pytest --cov=bible_parser --cov-report=term-missing
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Inspired by the Ruby [bible_parser](https://github.com/seven1m/bible_parser) library
- The Flutter implementation, [bible_parser_flutter](https://github.com/Omarzintan/bible_parser_flutter)
- Bible XML files from the [open-bibles](https://github.com/seven1m/open-bibles) repository

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Support

- 📫 Issues: [GitHub Issues](https://github.com/Omarzintan/bible_parser_python/issues)
- 📖 Documentation: [GitHub Wiki](https://github.com/Omarzintan/bible_parser_python/wiki)

            
