arcaverborum

Name: arcaverborum
Version: 0.2.1
Home page: https://github.com/tresoldi/arcaverborum
Summary: Library for interfacing with data from the GLED project
Upload time: 2023-02-11 15:33:51
Author: Tiago Tresoldi
Requires Python: >=3.7
License: MIT
Keywords: linguistics, typology, sampling
Requirements: No requirements were recorded.

# Arca Verborum

Arca Verborum is a Python library for interfacing with the data from the GLED project.

Its main function at present is weighted sampling of languages. The weights
combine three factors: the phylogenetic distance between languages, their
geographic distance (to account for areal effects; currently computed as a
simple Haversine distance between coordinates), and how often each language
has appeared in previous random samples.
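
As an illustration of the geographic component, the Haversine distance between
two coordinate pairs can be sketched as follows. This is a generic
implementation of the formula, not code taken from the library; the function
name `haversine_km` and the Earth radius constant are assumptions of this
sketch.

```python
import math

# Mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

The function treats the Earth as a sphere, which is usually adequate for
comparing distances between languages at a continental scale.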

When drawing random samples over multiple iterations, it is strongly
recommended to obtain all of them in a single pass, so that the
library can account for the potential oversampling of languages
belonging to outgroups.
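
To see why a single pass matters, consider the toy sketch below. It is not the
library's algorithm: the function `sample_sets` and its inverse-count weighting
are hypothetical, and it ignores phylogenetic and geographic distances
entirely. It only shows that a frequency correction needs the counts from
earlier iterations, which are lost if each sample is requested separately.

```python
import random
from collections import Counter


def sample_sets(languages, k, n, seed=None):
    """Draw n sets of k distinct languages, down-weighting frequent past picks."""
    languages = list(languages)
    if k > len(languages):
        raise ValueError("k cannot exceed the number of languages")
    rng = random.Random(seed)
    counts = Counter()  # how often each language has been drawn so far
    samples = []
    for _ in range(n):
        pool = list(languages)
        chosen = []
        for _ in range(k):
            # Hypothetical weighting: each past pick lowers the weight.
            weights = [1.0 / (1 + counts[lang]) for lang in pool]
            pick = rng.choices(pool, weights=weights, k=1)[0]
            chosen.append(pick)
            pool.remove(pick)  # no repeats within one set
        counts.update(chosen)
        samples.append(tuple(chosen))
    return samples
```

Calling `sample_sets(langs, 4, 10)` once shares `counts` across all ten sets,
whereas ten separate calls with `n=1` would each start from empty counts.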

Note that loading the distance matrices, particularly the
geographic one, can take up to a minute on slower machines.

See the source code for further documentation. A sample session:

```python
>>> import arcaverborum
>>> sampler = arcaverborum.GLED_Sampler()
WARNING:root:Loading the phylogenetic matrix from GLED...
WARNING:root:Loading the geographic matrix from GLED...
WARNING:root:Rescaling the phylogenetic matrix...
WARNING:root:Rescaling the geographic matrix...
>>> for idx, langset in enumerate(sampler.sample(4, 10)):
...   print(idx, langset)
... 
0 ('TlamacazapaNahuatl_tlam1239', 'GaviaoDoJiparana_gavi1246', 'Tubar_tuba1279', 'Pei_peii1238')
1 ('IslandCarib_isla1278', 'Samburu_samb1315', 'Dahalo_daha1245', 'Potawatomi_pota1247')
2 ('VlaxRomani_vlax1238', 'Gwahatike_gwah1244', 'NezPerce_nezp1238', 'Kwakwala_kwak1269')
3 ('AnaTingaDogon_anat1248', 'Zulgo-Gemzek_zulg1242', 'SkoltSaami_skol1241', 'Xokleng_xokl1240')
4 ('Mangarrayi_mang1381', 'Narak_nara1264', 'Matses_mats1244', 'Ionic-AtticAncientGreek_anci1242')
5 ('Jeli_jeri1242', 'Burum-Mindik_buru1306', 'Kistane_kist1241', 'Bongo_bong1285')
6 ('Patwin_patw1250', 'WesternTamang_west2415', 'Kapori_kapo1250', 'Sakha_yaku1245')
7 ('Kuy_kuyy1240', 'Kistane_kist1241', 'Kuruaya_kuru1309', 'Bolivar-NorthChimborazoHighlandQuichua_chim1302')
8 ('Betaf_beta1253', 'Bargam_barg1252', 'Pengo_peng1244', 'Wuding-LuquanYi_wudi1238')
9 ('NuclearWintu_nucl1651', 'Munit_muni1257', 'Nyawaygi_nyaw1247', 'MadaCameroon_mada1293')
```
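
The labels in the output above appear to combine a language name with an
identifier after the final underscore (for example `Tubar_tuba1279`). Assuming
that pattern holds, a small helper can separate the two parts; `split_label` is
a hypothetical name and is not part of the library.

```python
def split_label(label):
    """Split 'Name_code' at the last underscore; code is None if none is found."""
    name, sep, code = label.rpartition("_")
    if not sep:
        return label, None
    return name, code


# One language set copied from the sample session.
langset = (
    "TlamacazapaNahuatl_tlam1239",
    "GaviaoDoJiparana_gavi1246",
    "Tubar_tuba1279",
    "Pei_peii1238",
)
parsed = [split_label(label) for label in langset]
```

Splitting at the last underscore rather than the first keeps any underscores
inside a name intact, should they occur.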

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/tresoldi/arcaverborum",
    "name": "arcaverborum",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "linguistics,typology,sampling",
    "author": "Tiago Tresoldi",
    "author_email": "tiago.tresoldi@lingfil.uu.se",
    "download_url": "https://files.pythonhosted.org/packages/35/84/44fa2cbe64fd62923e26ce5047ca68e51e9bbb3aa2cc649cef17ec5a4a51/arcaverborum-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Library for interfacing with data from the GLED project",
    "version": "0.2.1",
    "split_keywords": [
        "linguistics",
        "typology",
        "sampling"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c668981af93cd8d788e3e6a028fe86ab3333cef7121466256307c8247480f66",
                "md5": "8de3a786783a60f510667c983b6244b5",
                "sha256": "764cac5a43b0c233182cc3b1a0a82c796614845aea3f1a99fc1d5801f1f0b9fc"
            },
            "downloads": -1,
            "filename": "arcaverborum-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8de3a786783a60f510667c983b6244b5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 97010299,
            "upload_time": "2023-02-11T15:33:08",
            "upload_time_iso_8601": "2023-02-11T15:33:08.331441Z",
            "url": "https://files.pythonhosted.org/packages/0c/66/8981af93cd8d788e3e6a028fe86ab3333cef7121466256307c8247480f66/arcaverborum-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "358444fa2cbe64fd62923e26ce5047ca68e51e9bbb3aa2cc649cef17ec5a4a51",
                "md5": "87633f04cb89ff91a2b0f0e883a4a4a2",
                "sha256": "7901c685d5928987068fc96b36f236d72b9fe5d5d254c768a0bbba7c76cad44e"
            },
            "downloads": -1,
            "filename": "arcaverborum-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "87633f04cb89ff91a2b0f0e883a4a4a2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 96495681,
            "upload_time": "2023-02-11T15:33:51",
            "upload_time_iso_8601": "2023-02-11T15:33:51.533665Z",
            "url": "https://files.pythonhosted.org/packages/35/84/44fa2cbe64fd62923e26ce5047ca68e51e9bbb3aa2cc649cef17ec5a4a51/arcaverborum-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-11 15:33:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "tresoldi",
    "github_project": "arcaverborum",
    "lcname": "arcaverborum"
}
        