# Firstname to Nationality - Python 3.13 Implementation
A name-to-nationality prediction library for Python 3.13+, built on scikit-learn.
## 🚀 Features
This library provides the following capabilities:
- ✅ **Python 3.13+ Compatible**: Uses Python features and type hints
- ✅ **ML Stack**: Built with scikit-learn for performance and compatibility
- ✅ **Type Safety**: Full type hints and dataclasses throughout
- ✅ **Error Handling**: Robust error handling and fallbacks
- ✅ **Dev Container Ready**: Includes VS Code dev container configuration
- ✅ **Flexible Training**: Easy model training with your own data
- ✅ **Batch Processing**: Efficient batch prediction support
## 📦 Installation
### Using the Dev Container (Recommended)
1. Open in VS Code
2. When prompted, click "Reopen in Container"
3. The dev container will build automatically with Python 3.13
### Manual Installation
```bash
# Ensure you have Python 3.13+
python --version
# Install dependencies
pip install -r requirements.txt
# Install the package
pip install -e .
```
## 🔧 Quick Start
```python
from firstname_to_nationality import FirstnameToNationality
# Initialize the predictor
predictor = FirstnameToNationality()
# Predict nationality for a single name
result = predictor.predict_single("Giuseppe Rossi", top_n=3)
print(result) # [('Italian', 0.85), ('Spanish', 0.12), ...]
# Batch prediction
names = ["John Smith", "Maria Rodriguez", "Zhang Wei"]
results = predictor(names, top_n=2)
for name, predictions in results:
    nationality, confidence = predictions[0]
    print(f"{name} → {nationality} ({confidence:.2f})")
```
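The batch call above yields `(name, predictions)` pairs, where each prediction is a `(nationality, confidence)` tuple. As a minimal sketch of post-processing that shape, the helper below keeps the top guess only when its confidence clears a cutoff. The `best_guess` function, the `0.5` threshold, and the sample data are illustrative assumptions, not part of the library.

```python
from typing import Optional

# Hypothetical sample in the shape shown above:
# (name, [(nationality, confidence), ...]).
Prediction = tuple[str, float]
sample_results: list[tuple[str, list[Prediction]]] = [
    ("John Smith", [("American", 0.62), ("British", 0.30)]),
    ("Zhang Wei", [("Chinese", 0.91), ("Korean", 0.05)]),
    ("Alex Kim", [("Korean", 0.41), ("American", 0.38)]),
]


def best_guess(predictions: list[Prediction], threshold: float = 0.5) -> Optional[str]:
    """Return the top nationality, or None if the list is empty or below threshold."""
    if not predictions:
        return None
    nationality, confidence = max(predictions, key=lambda p: p[1])
    return nationality if confidence >= threshold else None


for name, predictions in sample_results:
    print(f"{name} -> {best_guess(predictions) or 'uncertain'}")
```

Treating low-confidence results as "uncertain" rather than forcing a label is one possible policy; the right cutoff depends on your data.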
## 🧪 Examples
Run the example script:
```bash
python example.py
```
## 🔥 Training Your Own Model
### Using Sample Data
```bash
python nationality_trainer.py
```
### Using Your Own Data
Create a CSV file with `name` and `nationality` columns:
```csv
name,nationality
John Smith,American
Giuseppe Rossi,Italian
Hiroshi Tanaka,Japanese
```
Then train:
```bash
python nationality_trainer.py your_data.csv
```
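Before training, it can help to check that the CSV really has the two expected columns and no empty cells. The sketch below uses only the standard library; `load_training_rows` is a hypothetical helper, not something the trainer script is known to provide, and the in-memory sample stands in for `your_data.csv`.

```python
import csv
import io

REQUIRED_COLUMNS = {"name", "nationality"}


def load_training_rows(handle) -> tuple[list[str], list[str]]:
    """Read (names, nationalities) from a CSV file object, skipping incomplete rows."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    names: list[str] = []
    nationalities: list[str] = []
    for row in reader:
        # DictReader fills short rows with None, so normalise before stripping.
        name = (row["name"] or "").strip()
        nationality = (row["nationality"] or "").strip()
        if name and nationality:
            names.append(name)
            nationalities.append(nationality)
    return names, nationalities


# In-memory stand-in for your_data.csv; the last row has an empty name.
sample = io.StringIO(
    "name,nationality\n"
    "John Smith,American\n"
    "Giuseppe Rossi,Italian\n"
    ",Japanese\n"
)
names, nationalities = load_training_rows(sample)
print(names, nationalities)
```

With a real file, pass `open("your_data.csv", newline="", encoding="utf-8")` instead of the `StringIO` object.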
### Creating a Dictionary
```bash
python nationality_trainer.py --dict
```
## 🏗️ Architecture
The implementation consists of:
- **`FirstnameToNationality`**: Main predictor class with scikit-learn backend
- **`NamePreprocessor`**: Advanced name preprocessing and normalization
- **`PredictionResult`**: Type-safe prediction results using dataclasses
- **Model Pipeline**: TF-IDF vectorization + Logistic Regression
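To make the "TF-IDF vectorization + Logistic Regression" pipeline concrete, here is a minimal scikit-learn sketch. It is not the library's actual code: the character n-gram settings, the `max_iter` value, the toy training data, and the `top_n` helper are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; far too small for a useful model.
names = ["giuseppe", "marco", "luca", "hiroshi", "takashi", "yuki", "john", "william", "james"]
labels = ["Italian"] * 3 + ["Japanese"] * 3 + ["American"] * 3

# Character n-grams are an assumed choice: they let the model pick up
# sub-word patterns (endings, letter pairs) rather than whole names.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(names, labels)


def top_n(name: str, n: int = 3) -> list[tuple[str, float]]:
    """Return the n most probable classes for a name, highest probability first."""
    probabilities = model.predict_proba([name])[0]
    ranked = sorted(zip(model.classes_, probabilities), key=lambda pair: pair[1], reverse=True)
    return [(str(label), float(p)) for label, p in ranked[:n]]


print(top_n("giovanni", n=2))
```

The output has the same `(nationality, confidence)` shape as the predictions shown in the Quick Start, which is why a probabilistic classifier such as logistic regression fits this kind of interface.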
## 📁 File Structure
The implementation uses these file paths:
- `firstname_to_nationality/best-model.pt`: Model checkpoint file
- `firstname_to_nationality/firstname_nationalities.pkl`: Name-to-nationality dictionary
## Usage Examples
### Basic Usage
```python
from firstname_to_nationality import FirstnameToNationality
predictor = FirstnameToNationality()
results = predictor(["John Smith"])
```
### Advanced Features
```python
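from firstname_to_nationality import FirstnameToNationality

predictor = FirstnameToNationality()

# Type-safe single predictions
result = predictor.predict_single("John Smith", top_n=3)

# Training interface: names and their nationality labels
names = ["John Smith", "Giuseppe Rossi", "Hiroshi Tanaka"]
nationalities = ["American", "Italian", "Japanese"]
predictor.train(names, nationalities, save_model=True)

# Dictionary management (name_dict is your own name-to-nationality mapping)
predictor.save_dictionary(name_dict)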
```
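The README mentions type-safe results built on dataclasses but does not show the fields of `PredictionResult`. As a rough sketch of the idea, the hypothetical `NationalityGuess` class below wraps raw `(nationality, confidence)` tuples in a frozen dataclass; its name, fields, and validation are assumptions, and the library's own class may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NationalityGuess:
    """Hypothetical container; the library's own PredictionResult may differ."""

    name: str
    nationality: str
    confidence: float

    def __post_init__(self) -> None:
        # Reject values that cannot be probabilities.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def wrap(name: str, predictions: list[tuple[str, float]]) -> list[NationalityGuess]:
    """Convert raw (nationality, confidence) tuples into typed records."""
    return [NationalityGuess(name, nat, conf) for nat, conf in predictions]


guesses = wrap("John Smith", [("American", 0.62), ("British", 0.30)])
print(guesses[0].nationality, guesses[0].confidence)
```

A frozen dataclass gives attribute access and equality for free, and prevents results from being mutated after they are produced.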
## 🐳 Development with Docker
### Dev Container
The repository includes a complete dev container setup for VS Code:
```bash
# Open in VS Code
code .
# Click "Reopen in Container" when prompted
```
### Manual Docker
```bash
# Build
docker build -f .devcontainer/Dockerfile -t firstname-to-nationality .
# Run
docker run -it --rm -v "$(pwd)":/workspace firstname-to-nationality
```
## ⚡ Performance
The implementation offers:
- Fast training with scikit-learn
- Memory efficiency
- Batch processing support
- Python optimizations
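For very large name lists, one way to use batch prediction while keeping memory bounded is to feed the predictor fixed-size chunks. The sketch below assumes only that the predictor is a callable taking a list of names and returning `(name, predictions)` pairs, as in the Quick Start; `chunked`, `predict_in_batches`, the batch size, and the stand-in `fake_predict` are hypothetical.

```python
from collections.abc import Callable, Iterator, Sequence

Prediction = tuple[str, float]
BatchResult = list[tuple[str, list[Prediction]]]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_batches(
    predict: Callable[[Sequence[str]], BatchResult],
    names: Sequence[str],
    batch_size: int = 1000,
) -> BatchResult:
    """Call `predict` once per chunk and concatenate the results in order."""
    results: BatchResult = []
    for batch in chunked(names, batch_size):
        results.extend(predict(batch))
    return results


# Stand-in predictor so the sketch runs without a trained model.
def fake_predict(batch: Sequence[str]) -> BatchResult:
    return [(name, [("Unknown", 0.0)]) for name in batch]


all_results = predict_in_batches(fake_predict, ["a", "b", "c", "d", "e"], batch_size=2)
print(len(all_results))
```

In real use, `fake_predict` would be replaced by a `FirstnameToNationality` instance, since the Quick Start shows it being called directly on a list of names.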
## 🧬 Dependencies
**Core Requirements:**
- Python 3.13+
- scikit-learn >= 1.3.0
- numpy >= 1.25.0
- pandas >= 2.0.0
- joblib >= 1.3.0
**Development:**
- pytest, black, isort, pylint, mypy
## 🤝 Contributing
1. Use the dev container for a consistent environment
2. Use type hints throughout
3. Run tests: `pytest`
4. Format code: `black . && isort .`
5. Check types: `mypy firstname_to_nationality/`
## 📄 License
MIT License
## Implementation Details
This is a complete implementation with:
- ✅ Consistent method signatures
- ✅ Reliable file handling
- ✅ Robust prediction results
- ✅ Efficient model format
- ✅ Minimal dependencies
## 🎯 Roadmap
- [ ] Transformer-based model support
- [ ] REST API server
- [ ] Web interface
- [ ] Multi-language support
- [ ] Advanced evaluation metrics
Raw data
{
"_id": null,
"home_page": "https://github.com/callidio/firstname_to_nationality",
"name": "firstname-to-nationality",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "firstname nationality prediction names machine-learning nlp",
"author": "Firstname to Nationality Team",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/41/cc/8ce49461960c6c10e73edf7173bac1290b45a768ac1db7957919a1a75752/firstname_to_nationality-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Nationality Prediction from Firstname using Python 3.13 and scikit-learn",
"version": "1.0.0",
"project_urls": {
"Documentation": "https://github.com/callidio/firstname_to_nationality#readme",
"Homepage": "https://github.com/callidio/firstname_to_nationality",
"Source": "https://github.com/callidio/firstname_to_nationality",
"Tracker": "https://github.com/callidio/firstname_to_nationality/issues"
},
"split_keywords": [
"firstname",
"nationality",
"prediction",
"names",
"machine-learning",
"nlp"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "93856671ffa40b70bae35034a343364d347ee9543de847e685f039a57cd13dd2",
"md5": "254f4f44c664f72da723e578e80bf10f",
"sha256": "639d1963c27456bc6cc404421e6e40c482d1f78c0ab2840834cc0381eaa2d35b"
},
"downloads": -1,
"filename": "firstname_to_nationality-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "254f4f44c664f72da723e578e80bf10f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 27244601,
"upload_time": "2025-11-07T06:34:46",
"upload_time_iso_8601": "2025-11-07T06:34:46.124777Z",
"url": "https://files.pythonhosted.org/packages/93/85/6671ffa40b70bae35034a343364d347ee9543de847e685f039a57cd13dd2/firstname_to_nationality-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "41cc8ce49461960c6c10e73edf7173bac1290b45a768ac1db7957919a1a75752",
"md5": "0a356d254478707a80946d7fb2fe46ca",
"sha256": "3afbad07311e00db2d3b1ddf50f909e3ca83ddc828f36d6f8a066076c028e991"
},
"downloads": -1,
"filename": "firstname_to_nationality-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "0a356d254478707a80946d7fb2fe46ca",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 27008265,
"upload_time": "2025-11-07T06:34:49",
"upload_time_iso_8601": "2025-11-07T06:34:49.870457Z",
"url": "https://files.pythonhosted.org/packages/41/cc/8ce49461960c6c10e73edf7173bac1290b45a768ac1db7957919a1a75752/firstname_to_nationality-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-07 06:34:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "callidio",
"github_project": "firstname_to_nationality",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.25.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "joblib",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.7.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.12.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.4.0"
]
]
},
{
"name": "black",
"specs": [
[
">=",
"23.0.0"
]
]
},
{
"name": "isort",
"specs": [
[
">=",
"5.12.0"
]
]
},
{
"name": "pylint",
"specs": [
[
">=",
"2.17.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"1.5.0"
]
]
},
{
"name": "types-requests",
"specs": []
}
],
"lcname": "firstname-to-nationality"
}