slovnet

Name: slovnet
Version: 0.6.0
Home page: https://github.com/natasha/slovnet
Summary: Deep-learning based NLP modeling for Russian language
Upload time: 2023-01-23 08:08:01
Author: Alexander Kukushkin
Author email: alex@alexkuk.ru
License: MIT
Keywords: nlp, deeplearning, russian

<img src="https://github.com/natasha/natasha-logos/blob/master/slovnet.svg">

![CI](https://github.com/natasha/slovnet/actions/workflows/test.yml/badge.svg)

SlovNet is a Python library for deep-learning based NLP modeling for the Russian language. The library is integrated with other <a href="https://github.com/natasha/">Natasha</a> projects: <a href="https://github.com/natasha/nerus">Nerus</a> — a large automatically annotated corpus, <a href="https://github.com/natasha/razdel">Razdel</a> — a sentence segmenter and tokenizer, and <a href="https://github.com/natasha/navec">Navec</a> — compact Russian embeddings. Slovnet provides high-quality practical models for Russian NER, morphology and syntax; see the <a href="#evaluation">evaluation section</a> for details:

* NER is 1-2% worse than the current BERT SOTA by DeepPavlov, but the model is 60 times smaller (~30 MB) and works fast on CPU (~25 news articles/sec).
* The morphology tagger and syntax parser have accuracy comparable to large SOTA BERT models on the news dataset, take 50 times less space (~30 MB) and work faster on CPU (~500 sentences/sec).

## Downloads

<table>

<tr>
<th>Model</th>
<th>Size</th>
<th>Description</th>
</tr>

<tr>
<td>
  <a href="https://storage.yandexcloud.net/natasha-slovnet/packs/slovnet_ner_news_v1.tar">slovnet_ner_news_v1.tar</a>
</td>
<td>2MB</td>
<td>
  Russian NER, standard PER, LOC, ORG annotation, trained on news articles.
</td>
</tr>

<tr>
<td>
  <a href="https://storage.yandexcloud.net/natasha-slovnet/packs/slovnet_morph_news_v1.tar">slovnet_morph_news_v1.tar</a>
</td>
<td>2MB</td>
<td>
  Russian morphology tagger optimized for news articles.
</td>
</tr>

<tr>
<td>
  <a href="https://storage.yandexcloud.net/natasha-slovnet/packs/slovnet_syntax_news_v1.tar">slovnet_syntax_news_v1.tar</a>
</td>
<td>3MB</td>
<td>
  Russian syntax parser optimized for news articles.
</td>
</tr>

</table>

## Install

During inference Slovnet depends only on NumPy. The library supports Python 3.5+ and PyPy 3.

```bash
$ pip install slovnet
```

## Usage

Download the model weights and vocabs package using the links from the <a href="#downloads">downloads section</a> and the <a href="https://github.com/natasha/navec#downloads">Navec downloads section</a>. Optionally install <a href="https://github.com/natasha/ipymarkup">Ipymarkup</a> to visualize NER markup.

The Slovnet annotator `map` method takes a list of items as input and returns a same-size iterator over markups. Internally, items are processed in batches of size `batch_size`. The default size is 8; a larger batch means more RAM but better CPU utilization. The `__call__` method just calls `map` with a list of one item.
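
This batching contract can be sketched in a few lines. The sketch below is an illustration of the behavior described above, not Slovnet's actual implementation; the `Annotator` class and its `process` method are stand-ins:

```python
def batched(items, batch_size=8):
    """Yield consecutive slices of `items` of length `batch_size`."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

class Annotator:
    # Stand-in annotator illustrating the map/__call__ contract.
    def __init__(self, batch_size=8):
        self.batch_size = batch_size

    def process(self, batch):
        # Stand-in for the model forward pass over one batch.
        return [item.upper() for item in batch]

    def map(self, items):
        # Lazily yield one markup per input item, batch by batch.
        for batch in batched(items, self.batch_size):
            yield from self.process(batch)

    def __call__(self, item):
        # __call__ is just map over a single-item list.
        return next(self.map([item]))

annotator = Annotator(batch_size=2)
print(list(annotator.map(['a', 'b', 'c'])))  # ['A', 'B', 'C']
print(annotator('d'))                        # 'D'
```

Because `map` is lazy, passing a large list does not load everything into memory at once; items are consumed batch by batch.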

### NER

```python
>>> from navec import Navec
>>> from slovnet import NER
>>> from ipymarkup import show_span_ascii_markup as show_markup

>>> text = 'Европейский союз добавил в санкционный список девять политических деятелей из самопровозглашенных республик Донбасса — Донецкой народной республики (ДНР) и Луганской народной республики (ЛНР) — в связи с прошедшими там выборами. Об этом говорится в документе, опубликованном в официальном журнале Евросоюза. В новом списке фигурирует Леонид Пасечник, который по итогам выборов стал главой ЛНР. Помимо него там присутствуют Владимир Бидевка и Денис Мирошниченко, председатели законодательных органов ДНР и ЛНР, а также Ольга Позднякова и Елена Кравченко, председатели ЦИК обеих республик. Выборы прошли в непризнанных республиках Донбасса 11 ноября. На них удержали лидерство действующие руководители и партии — Денис Пушилин и «Донецкая республика» в ДНР и Леонид Пасечник с движением «Мир Луганщине» в ЛНР. Президент Франции Эмманюэль Макрон и канцлер ФРГ Ангела Меркель после встречи с украинским лидером Петром Порошенко осудили проведение выборов, заявив, что они нелегитимны и «подрывают территориальную целостность и суверенитет Украины». Позже к осуждению присоединились США с обещаниями новых санкций для России.'

>>> navec = Navec.load('navec_news_v1_1B_250K_300d_100q.tar')
>>> ner = NER.load('slovnet_ner_news_v1.tar')
>>> ner.navec(navec)

>>> markup = ner(text)
>>> show_markup(markup.text, markup.spans)
Европейский союз добавил в санкционный список девять политических 
LOC─────────────                                                  
деятелей из самопровозглашенных республик Донбасса — Донецкой народной
                                          LOC─────   LOC──────────────
 республики (ДНР) и Луганской народной республики (ЛНР) — в связи с 
─────────────────   LOC────────────────────────────────             
прошедшими там выборами. Об этом говорится в документе, опубликованном
 в официальном журнале Евросоюза. В новом списке фигурирует Леонид 
                       LOC──────                            PER────
Пасечник, который по итогам выборов стал главой ЛНР. Помимо него там 
────────                                        LOC                  
присутствуют Владимир Бидевка и Денис Мирошниченко, председатели 
             PER─────────────   PER───────────────               
законодательных органов ДНР и ЛНР, а также Ольга Позднякова и Елена 
                        LOC   LOC          PER─────────────   PER───
Кравченко, председатели ЦИК обеих республик. Выборы прошли в 
─────────               ORG                                  
непризнанных республиках Донбасса 11 ноября. На них удержали лидерство
                         LOC─────                                     
 действующие руководители и партии — Денис Пушилин и «Донецкая 
                                     PER──────────    ORG──────
республика» в ДНР и Леонид Пасечник с движением «Мир Луганщине» в ЛНР.
──────────    LOC   PER────────────              ORG──────────    LOC 
 Президент Франции Эмманюэль Макрон и канцлер ФРГ Ангела Меркель после
           LOC──── PER─────────────           LOC PER───────────      
 встречи с украинским лидером Петром Порошенко осудили проведение 
                              PER─────────────                    
выборов, заявив, что они нелегитимны и «подрывают территориальную 
целостность и суверенитет Украины». Позже к осуждению присоединились 
                          LOC────                                    
США с обещаниями новых санкций для России.
LOC                                LOC─── 

```
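
Besides visualization, the spans can be used programmatically: each span carries character offsets into `markup.text` plus an entity type, so extracting entity substrings is plain slicing. Below is a sketch with hand-made stand-in structures (running the real model requires the downloaded packs); the `Span` fields mirror the markup shown above:

```python
from collections import namedtuple

# Stand-in structures mirroring the markup returned by ner(text):
# each span has character offsets into markup.text plus an entity type.
Span = namedtuple('Span', 'start stop type')
Markup = namedtuple('Markup', 'text spans')

markup = Markup(
    'Европейский союз добавил в санкционный список ...',
    [Span(0, 16, 'LOC')]
)

for span in markup.spans:
    print(span.type, markup.text[span.start:span.stop])
# LOC Европейский союз
```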

### Morphology

The morphology annotator processes tokenized text. To split the input into sentences and tokens, use <a href="https://github.com/natasha/razdel">Razdel</a>.

```python
>>> from razdel import sentenize, tokenize
>>> from navec import Navec
>>> from slovnet import Morph

>>> chunk = []
>>> for sent in sentenize(text):
>>>     tokens = [_.text for _ in tokenize(sent.text)]
>>>     chunk.append(tokens)
>>> chunk[:1]
[['Европейский', 'союз', 'добавил', 'в', 'санкционный', 'список', 'девять', 'политических', 'деятелей', 'из', 'самопровозглашенных', 'республик', 'Донбасса', '—', 'Донецкой', 'народной', 'республики', '(', 'ДНР', ')', 'и', 'Луганской', 'народной', 'республики', '(', 'ЛНР', ')', '—', 'в', 'связи', 'с', 'прошедшими', 'там', 'выборами', '.']]

>>> navec = Navec.load('navec_news_v1_1B_250K_300d_100q.tar')
>>> morph = Morph.load('slovnet_morph_news_v1.tar', batch_size=4)
>>> morph.navec(navec)

>>> markup = next(morph.map(chunk))
>>> for token in markup.tokens:
>>>     print(f'{token.text:>20} {token.tag}')
         Европейский ADJ|Case=Nom|Degree=Pos|Gender=Masc|Number=Sing
                союз NOUN|Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing
             добавил VERB|Aspect=Perf|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act
                   в ADP
         санкционный ADJ|Animacy=Inan|Case=Acc|Degree=Pos|Gender=Masc|Number=Sing
              список NOUN|Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing
              девять NUM|Case=Nom
        политических ADJ|Case=Gen|Degree=Pos|Number=Plur
            деятелей NOUN|Animacy=Anim|Case=Gen|Gender=Masc|Number=Plur
                  из ADP
 самопровозглашенных ADJ|Case=Gen|Degree=Pos|Number=Plur
           республик NOUN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Plur
            Донбасса PROPN|Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing
                   — PUNCT
            Донецкой ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing
            народной ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing
          республики NOUN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing
                   ( PUNCT
                 ДНР PROPN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing
                   ) PUNCT
                   и CCONJ
           Луганской ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing
            народной ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing
          республики NOUN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing
                   ( PUNCT
                 ЛНР PROPN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing
                   ) PUNCT
                   — PUNCT
                   в ADP
               связи NOUN|Animacy=Inan|Case=Loc|Gender=Fem|Number=Sing
                   с ADP
          прошедшими VERB|Aspect=Perf|Case=Ins|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act
                 там ADV|Degree=Pos
            выборами NOUN|Animacy=Inan|Case=Ins|Gender=Masc|Number=Plur
                   . PUNCT

```
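
Each `token.tag` above is a plain string: a POS tag, optionally followed by `|`-separated `Key=Value` features in Universal Dependencies style. It can be unpacked with ordinary string operations; the helper below is a sketch assuming only the string format shown in the output above, not a Slovnet API:

```python
def parse_tag(tag):
    """Split 'POS|Key=Value|...' into the POS tag and a feats dict."""
    pos, _, rest = tag.partition('|')
    feats = dict(
        pair.split('=', 1)
        for pair in rest.split('|')
        if pair
    )
    return pos, feats

pos, feats = parse_tag('NOUN|Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing')
print(pos)            # NOUN
print(feats['Case'])  # Nom

# Tags without features, like punctuation or adpositions, yield an empty dict.
print(parse_tag('ADP'))  # ('ADP', {})
```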

### Syntax

The syntax parser processes sentences split into tokens. Use <a href="https://github.com/natasha/razdel">Razdel</a> for segmentation.

```python
>>> from ipymarkup import show_dep_ascii_markup as show_markup
>>> from razdel import sentenize, tokenize
>>> from navec import Navec
>>> from slovnet import Syntax

>>> chunk = []
>>> for sent in sentenize(text):
>>>     tokens = [_.text for _ in tokenize(sent.text)]
>>>     chunk.append(tokens)
>>> chunk[:1]
[['Европейский', 'союз', 'добавил', 'в', 'санкционный', 'список', 'девять', 'политических', 'деятелей', 'из', 'самопровозглашенных', 'республик', 'Донбасса', '—', 'Донецкой', 'народной', 'республики', '(', 'ДНР', ')', 'и', 'Луганской', 'народной', 'республики', '(', 'ЛНР', ')', '—', 'в', 'связи', 'с', 'прошедшими', 'там', 'выборами', '.']]

>>> navec = Navec.load('navec_news_v1_1B_250K_300d_100q.tar')
>>> syntax = Syntax.load('slovnet_syntax_news_v1.tar')
>>> syntax.navec(navec)

>>> markup = next(syntax.map(chunk))

# Convert CoNLL-style format to source, target indices
>>> words, deps = [], []
>>> for token in markup.tokens:
>>>     words.append(token.text)
>>>     source = int(token.head_id) - 1
>>>     target = int(token.id) - 1
>>>     if source > 0 and source != target:  # skip root, loops
>>>         deps.append([source, target, token.rel])
>>> show_markup(words, deps)
              ┌► Европейский         amod
            ┌►└─ союз                nsubj
┌───────┌─┌─└─── добавил             
│       │ │ ┌──► в                   case
│       │ │ │ ┌► санкционный         amod
│       │ └►└─└─ список              obl
│       │   ┌──► девять              nummod:gov
│       │   │ ┌► политических        amod
│ ┌─────└►┌─└─└─ деятелей            obj
│ │       │ ┌──► из                  case
│ │       │ │ ┌► самопровозглашенных amod
│ │       └►└─└─ республик           nmod
│ │         └──► Донбасса            nmod
│ │ ┌──────────► —                   punct
│ │ │       ┌──► Донецкой            amod
│ │ │       │ ┌► народной            amod
│ │ │ ┌─┌─┌─└─└─ республики          
│ │ │ │ │ │   ┌► (                   punct
│ │ │ │ │ └►┌─└─ ДНР                 parataxis
│ │ │ │ │   └──► )                   punct
│ │ │ │ │ ┌────► и                   cc
│ │ │ │ │ │ ┌──► Луганской           amod
│ │ │ │ │ │ │ ┌► народной            amod
│ │ └─│ └►└─└─└─ республики          conj
│ │   │       ┌► (                   punct
│ │   └────►┌─└─ ЛНР                 parataxis
│ │         └──► )                   punct
│ │     ┌──────► —                   punct
│ │     │ ┌►┌─┌─ в                   case
│ │     │ │ │ └► связи               fixed
│ │     │ │ └──► с                   fixed
│ │     │ │ ┌►┌─ прошедшими          acl
│ │     │ │ │ └► там                 advmod
│ └────►└─└─└─── выборами            nmod
└──────────────► .                   punct

```
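
The tokens follow CoNLL conventions: 1-based string `id`s, with `head_id == '0'` marking the sentence root. Finding the root and grouping children is plain dictionary work. The sketch below uses a hand-made token list (running the real parser requires the downloaded packs), and the `Token` namedtuple is a stand-in for the markup tokens:

```python
from collections import namedtuple, defaultdict

# Stand-in for markup.tokens: CoNLL-style 1-based string ids,
# head_id '0' marks the sentence root.
Token = namedtuple('Token', 'id head_id text rel')

tokens = [
    Token('1', '2', 'Европейский', 'amod'),
    Token('2', '3', 'союз', 'nsubj'),
    Token('3', '0', 'добавил', 'root'),
]

# The root is the token whose head_id is '0'.
root = next(t for t in tokens if t.head_id == '0')

# Group each non-root token under its head.
children = defaultdict(list)
for t in tokens:
    if t.head_id != '0':
        children[t.head_id].append(t.text)

print(root.text)          # добавил
print(children[root.id])  # ['союз']
```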

## Documentation

Materials are in Russian:

* <a href="https://natasha.github.io/ner">Article about distillation and quantization in Slovnet</a> 
* <a href="https://youtu.be/-7XT_U6hVvk?t=2034">Slovnet section of Datafest 2020 talk</a>

## Evaluation

In addition to quality metrics we measure speed and model size — parameters that are important in production:

* `init` — time between system launch and first response. For testing and devops it is convenient to have a model that starts quickly.
* `disk` — file size of the artefacts one needs to download before using the system: model weights, embeddings, binaries, vocabs. Compact models are convenient to deploy in production.
* `ram` — average CPU/GPU RAM usage.
* `speed` — number of input items processed per second: news articles, tokenized sentences.

### NER

4 datasets are used for evaluation: <a href="https://github.com/natasha/corus#load_factru"><code>factru</code></a>, <a href="https://github.com/natasha/corus#load_gareev"><code>gareev</code></a>, <a href="https://github.com/natasha/corus#load_ne5"><code>ne5</code></a> and <a href="https://github.com/natasha/corus#load_bsnlp"><code>bsnlp</code></a>. Slovnet is compared to <a href="https://github.com/natasha/naeval#deeppavlov_ner"><code>deeppavlov</code></a>, <a href="https://github.com/natasha/naeval#deeppavlov_bert_ner"><code>deeppavlov_bert</code></a>, <a href="https://github.com/natasha/naeval#deeppavlov_slavic_bert_ner"><code>deeppavlov_slavic</code></a>, <a href="https://github.com/natasha/naeval#pullenti"><code>pullenti</code></a>, <a href="https://github.com/natasha/naeval#spacy"><code>spacy</code></a>, <a href="https://github.com/natasha/naeval#stanza"><code>stanza</code></a>, <a href="https://github.com/natasha/naeval#texterra"><code>texterra</code></a>, <a href="https://github.com/natasha/naeval#tomita"><code>tomita</code></a>, <a href="https://github.com/natasha/naeval#mitie"><code>mitie</code></a>.

For every column the top 3 results are highlighted:

<!--- ner1 --->
<table border="0" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th colspan="3" halign="left">factru</th>
      <th colspan="2" halign="left">gareev</th>
      <th colspan="3" halign="left">ne5</th>
      <th colspan="3" halign="left">bsnlp</th>
    </tr>
    <tr>
      <th>f1</th>
      <th>PER</th>
      <th>LOC</th>
      <th>ORG</th>
      <th>PER</th>
      <th>ORG</th>
      <th>PER</th>
      <th>LOC</th>
      <th>ORG</th>
      <th>PER</th>
      <th>LOC</th>
      <th>ORG</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>slovnet</th>
      <td><b>0.959</b></td>
      <td><b>0.915</b></td>
      <td><b>0.825</b></td>
      <td><b>0.977</b></td>
      <td><b>0.899</b></td>
      <td><b>0.984</b></td>
      <td><b>0.973</b></td>
      <td><b>0.951</b></td>
      <td>0.944</td>
      <td>0.834</td>
      <td>0.718</td>
    </tr>
    <tr>
      <th>slovnet_bert</th>
      <td><b>0.973</b></td>
      <td><b>0.928</b></td>
      <td><b>0.831</b></td>
      <td><b>0.991</b></td>
      <td><b>0.911</b></td>
      <td><b>0.996</b></td>
      <td><b>0.989</b></td>
      <td><b>0.976</b></td>
      <td><b>0.960</b></td>
      <td>0.838</td>
      <td><b>0.733</b></td>
    </tr>
    <tr>
      <th>deeppavlov</th>
      <td>0.910</td>
      <td>0.886</td>
      <td>0.742</td>
      <td>0.944</td>
      <td>0.798</td>
      <td>0.942</td>
      <td>0.919</td>
      <td>0.881</td>
      <td>0.866</td>
      <td>0.767</td>
      <td>0.624</td>
    </tr>
    <tr>
      <th>deeppavlov_bert</th>
      <td><b>0.971</b></td>
      <td><b>0.928</b></td>
      <td><b>0.825</b></td>
      <td><b>0.980</b></td>
      <td><b>0.916</b></td>
      <td><b>0.997</b></td>
      <td><b>0.990</b></td>
      <td><b>0.976</b></td>
      <td><b>0.954</b></td>
      <td><b>0.840</b></td>
      <td><b>0.741</b></td>
    </tr>
    <tr>
      <th>deeppavlov_slavic</th>
      <td>0.956</td>
      <td>0.884</td>
      <td>0.714</td>
      <td>0.976</td>
      <td>0.776</td>
      <td>0.984</td>
      <td>0.817</td>
      <td>0.761</td>
      <td><b>0.965</b></td>
      <td><b>0.925</b></td>
      <td><b>0.831</b></td>
    </tr>
    <tr>
      <th>pullenti</th>
      <td>0.905</td>
      <td>0.814</td>
      <td>0.686</td>
      <td>0.939</td>
      <td>0.639</td>
      <td>0.952</td>
      <td>0.862</td>
      <td>0.683</td>
      <td>0.900</td>
      <td>0.769</td>
      <td>0.566</td>
    </tr>
    <tr>
      <th>spacy</th>
      <td>0.901</td>
      <td>0.886</td>
      <td>0.765</td>
      <td>0.970</td>
      <td>0.883</td>
      <td>0.967</td>
      <td>0.928</td>
      <td>0.918</td>
      <td>0.919</td>
      <td>0.823</td>
      <td>0.693</td>
    </tr>
    <tr>
      <th>stanza</th>
      <td>0.943</td>
      <td>0.865</td>
      <td>0.687</td>
      <td>0.953</td>
      <td>0.827</td>
      <td>0.923</td>
      <td>0.753</td>
      <td>0.734</td>
      <td>0.938</td>
      <td><b>0.838</b></td>
      <td>0.724</td>
    </tr>
    <tr>
      <th>texterra</th>
      <td>0.900</td>
      <td>0.800</td>
      <td>0.597</td>
      <td>0.888</td>
      <td>0.561</td>
      <td>0.901</td>
      <td>0.777</td>
      <td>0.594</td>
      <td>0.858</td>
      <td>0.783</td>
      <td>0.548</td>
    </tr>
    <tr>
      <th>tomita</th>
      <td>0.929</td>
      <td></td>
      <td></td>
      <td>0.921</td>
      <td></td>
      <td>0.945</td>
      <td></td>
      <td></td>
      <td>0.881</td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>mitie</th>
      <td>0.888</td>
      <td>0.861</td>
      <td>0.532</td>
      <td>0.849</td>
      <td>0.452</td>
      <td>0.753</td>
      <td>0.642</td>
      <td>0.432</td>
      <td>0.736</td>
      <td>0.801</td>
      <td>0.524</td>
    </tr>
  </tbody>
</table>
<!--- ner1 --->

`it/s` — news articles per second, 1 article ≈ 1KB.

<!--- ner2 --->
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>init, s</th>
      <th>disk, mb</th>
      <th>ram, mb</th>
      <th>speed, it/s</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>slovnet</th>
      <td><b>1.0</b></td>
      <td><b>27</b></td>
      <td><b>205</b></td>
      <td>25.3</td>
    </tr>
    <tr>
      <th>slovnet_bert</th>
      <td>5.0</td>
      <td>473</td>
      <td>9500</td>
      <td><b>40.0 (gpu)</b></td>
    </tr>
    <tr>
      <th>deeppavlov</th>
      <td>5.9</td>
      <td>1024</td>
      <td>3072</td>
      <td>24.3 (gpu)</td>
    </tr>
    <tr>
      <th>deeppavlov_bert</th>
      <td>34.5</td>
      <td>2048</td>
      <td>6144</td>
      <td>13.1 (gpu)</td>
    </tr>
    <tr>
      <th>deeppavlov_slavic</th>
      <td>35.0</td>
      <td>2048</td>
      <td>4096</td>
      <td>8.0 (gpu)</td>
    </tr>
    <tr>
      <th>pullenti</th>
      <td><b>2.9</b></td>
      <td><b>16</b></td>
      <td><b>253</b></td>
      <td>6.0</td>
    </tr>
    <tr>
      <th>spacy</th>
      <td>8.0</td>
      <td>140</td>
      <td>625</td>
      <td>8.0</td>
    </tr>
    <tr>
      <th>stanza</th>
      <td>3.0</td>
      <td>591</td>
      <td>11264</td>
      <td>3.0 (gpu)</td>
    </tr>
    <tr>
      <th>texterra</th>
      <td>47.6</td>
      <td>193</td>
      <td>3379</td>
      <td>4.0</td>
    </tr>
    <tr>
      <th>tomita</th>
      <td><b>2.0</b></td>
      <td><b>64</b></td>
      <td><b>63</b></td>
      <td><b>29.8</b></td>
    </tr>
    <tr>
      <th>mitie</th>
      <td>28.3</td>
      <td>327</td>
      <td>261</td>
      <td><b>32.8</b></td>
    </tr>
  </tbody>
</table>
<!--- ner2 --->

### Morphology

<a href="https://github.com/natasha/corus#load_gramru">Datasets from GramEval2020</a> are used for evaluation:

* `news` — sample from Lenta.ru.
* `wiki` — UD GSD.
* `fiction` — SynTagRus + JZ.
* `social`, `poetry` — social, poetry subset of Taiga.

Slovnet is compared to a number of existing morphology taggers: <a href="https://github.com/natasha/naeval#deeppavlov_morph"><code>deeppavlov</code></a>, <a href="https://github.com/natasha/naeval#deeppavlov_bert_morph"><code>deeppavlov_bert</code></a>, <a href="https://github.com/natasha/naeval#rupostagger"><code>rupostagger</code></a>, <a href="https://github.com/natasha/naeval#rnnmorph"><code>rnnmorph</code></a>, <a href="https://github.com/natasha/naeval#mary"><code>maru</code></a>, <a href="https://github.com/natasha/naeval#udpipe"><code>udpipe</code></a>, <a href="https://github.com/natasha/naeval#spacy"><code>spacy</code></a>, <a href="https://github.com/natasha/naeval#stanza"><code>stanza</code></a>.

For every column the top 3 results are highlighted. `slovnet` was trained only on the news dataset:

<!--- morph1 --->
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>news</th>
      <th>wiki</th>
      <th>fiction</th>
      <th>social</th>
      <th>poetry</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>slovnet</th>
      <td><b>0.961</b></td>
      <td>0.815</td>
      <td>0.905</td>
      <td>0.807</td>
      <td>0.664</td>
    </tr>
    <tr>
      <th>slovnet_bert</th>
      <td><b>0.982</b></td>
      <td><b>0.884</b></td>
      <td><b>0.990</b></td>
      <td><b>0.890</b></td>
      <td><b>0.856</b></td>
    </tr>
    <tr>
      <th>deeppavlov</th>
      <td>0.940</td>
      <td>0.841</td>
      <td>0.944</td>
      <td>0.870</td>
      <td><b>0.857</b></td>
    </tr>
    <tr>
      <th>deeppavlov_bert</th>
      <td>0.951</td>
      <td><b>0.868</b></td>
      <td><b>0.964</b></td>
      <td><b>0.892</b></td>
      <td><b>0.865</b></td>
    </tr>
    <tr>
      <th>udpipe</th>
      <td>0.918</td>
      <td>0.811</td>
      <td><b>0.957</b></td>
      <td>0.870</td>
      <td>0.776</td>
    </tr>
    <tr>
      <th>spacy</th>
      <td><b>0.964</b></td>
      <td><b>0.849</b></td>
      <td>0.942</td>
      <td>0.857</td>
      <td>0.784</td>
    </tr>
    <tr>
      <th>stanza</th>
      <td>0.934</td>
      <td>0.831</td>
      <td>0.940</td>
      <td><b>0.873</b></td>
      <td>0.825</td>
    </tr>
    <tr>
      <th>rnnmorph</th>
      <td>0.896</td>
      <td>0.812</td>
      <td>0.890</td>
      <td>0.860</td>
      <td>0.838</td>
    </tr>
    <tr>
      <th>maru</th>
      <td>0.894</td>
      <td>0.808</td>
      <td>0.887</td>
      <td>0.861</td>
      <td>0.840</td>
    </tr>
    <tr>
      <th>rupostagger</th>
      <td>0.673</td>
      <td>0.645</td>
      <td>0.661</td>
      <td>0.641</td>
      <td>0.636</td>
    </tr>
  </tbody>
</table>
<!--- morph1 --->

`it/s` — sentences per second.

<!--- morph2 --->
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>init, s</th>
      <th>disk, mb</th>
      <th>ram, mb</th>
      <th>speed, it/s</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>slovnet</th>
      <td><b>1.0</b></td>
      <td><b>27</b></td>
      <td><b>115</b></td>
      <td><b>532.0</b></td>
    </tr>
    <tr>
      <th>slovnet_bert</th>
      <td>5.0</td>
      <td>475</td>
      <td>8087</td>
      <td><b>285.0 (gpu)</b></td>
    </tr>
    <tr>
      <th>deeppavlov</th>
      <td><b>4.0</b></td>
      <td>32</td>
      <td>10240</td>
      <td>90.0 (gpu)</td>
    </tr>
    <tr>
      <th>deeppavlov_bert</th>
      <td>20.0</td>
      <td>1393</td>
      <td>8704</td>
      <td>85.0 (gpu)</td>
    </tr>
    <tr>
      <th>udpipe</th>
      <td>6.9</td>
      <td>45</td>
      <td><b>242</b></td>
      <td>56.2</td>
    </tr>
    <tr>
      <th>spacy</th>
      <td>8.0</td>
      <td>140</td>
      <td>579</td>
      <td>50.0</td>
    </tr>
    <tr>
      <th>stanza</th>
      <td><b>2.0</b></td>
      <td>591</td>
      <td>393</td>
      <td><b>92.0</b></td>
    </tr>
    <tr>
      <th>rnnmorph</th>
      <td>8.7</td>
      <td><b>10</b></td>
      <td>289</td>
      <td>16.6</td>
    </tr>
    <tr>
      <th>maru</th>
      <td>15.8</td>
      <td>44</td>
      <td>370</td>
      <td>36.4</td>
    </tr>
    <tr>
      <th>rupostagger</th>
      <td>4.8</td>
      <td><b>3</b></td>
      <td><b>118</b></td>
      <td>48.0</td>
    </tr>
  </tbody>
</table>
<!--- morph2 --->

### Syntax

Slovnet is compared to several existing syntax parsers: <a href="https://github.com/natasha/naeval#udpipe"><code>udpipe</code></a>, <a href="https://github.com/natasha/naeval#spacy"><code>spacy</code></a>, <a href="https://github.com/natasha/naeval#deeppavlov_bert_syntax"><code>deeppavlov</code></a>, <a href="https://github.com/natasha/naeval#stanza"><code>stanza</code></a>.

<!--- syntax1 --->
<table border="0" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th colspan="2" halign="left">news</th>
      <th colspan="2" halign="left">wiki</th>
      <th colspan="2" halign="left">fiction</th>
      <th colspan="2" halign="left">social</th>
      <th colspan="2" halign="left">poetry</th>
    </tr>
    <tr>
      <th></th>
      <th>uas</th>
      <th>las</th>
      <th>uas</th>
      <th>las</th>
      <th>uas</th>
      <th>las</th>
      <th>uas</th>
      <th>las</th>
      <th>uas</th>
      <th>las</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>slovnet</th>
      <td>0.907</td>
      <td>0.880</td>
      <td>0.775</td>
      <td>0.718</td>
      <td>0.806</td>
      <td>0.776</td>
      <td>0.726</td>
      <td>0.656</td>
      <td>0.542</td>
      <td>0.469</td>
    </tr>
    <tr>
      <th>slovnet_bert</th>
      <td><b>0.965</b></td>
      <td><b>0.936</b></td>
      <td><b>0.891</b></td>
      <td><b>0.828</b></td>
      <td><b>0.958</b></td>
      <td><b>0.940</b></td>
      <td><b>0.846</b></td>
      <td><b>0.782</b></td>
      <td><b>0.776</b></td>
      <td><b>0.706</b></td>
    </tr>
    <tr>
      <th>deeppavlov_bert</th>
      <td><b>0.962</b></td>
      <td><b>0.910</b></td>
      <td><b>0.882</b></td>
      <td><b>0.786</b></td>
      <td><b>0.963</b></td>
      <td><b>0.929</b></td>
      <td><b>0.844</b></td>
      <td><b>0.761</b></td>
      <td><b>0.784</b></td>
      <td><b>0.691</b></td>
    </tr>
    <tr>
      <th>udpipe</th>
      <td>0.873</td>
      <td>0.823</td>
      <td>0.622</td>
      <td>0.531</td>
      <td>0.910</td>
      <td>0.876</td>
      <td>0.700</td>
      <td>0.624</td>
      <td>0.625</td>
      <td>0.534</td>
    </tr>
    <tr>
      <th>spacy</th>
      <td><b>0.943</b></td>
      <td><b>0.916</b></td>
      <td><b>0.851</b></td>
      <td><b>0.783</b></td>
      <td>0.901</td>
      <td>0.874</td>
      <td><b>0.804</b></td>
      <td><b>0.737</b></td>
      <td>0.704</td>
      <td><b>0.616</b></td>
    </tr>
    <tr>
      <th>stanza</th>
      <td>0.940</td>
      <td>0.886</td>
      <td>0.815</td>
      <td>0.716</td>
      <td><b>0.936</b></td>
      <td><b>0.895</b></td>
      <td>0.802</td>
      <td>0.714</td>
      <td><b>0.713</b></td>
      <td>0.613</td>
    </tr>
  </tbody>
</table>
<!--- syntax1 --->

`it/s` — sentences per second.

<!--- syntax2 --->
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>init, s</th>
      <th>disk, mb</th>
      <th>ram, mb</th>
      <th>speed, it/s</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>slovnet</th>
      <td><b>1.0</b></td>
      <td><b>27</b></td>
      <td><b>125</b></td>
      <td><b>450.0</b></td>
    </tr>
    <tr>
      <th>slovnet_bert</th>
      <td><b>5.0</b></td>
      <td>504</td>
      <td>3427</td>
      <td><b>200.0 (gpu)</b></td>
    </tr>
    <tr>
      <th>deeppavlov_bert</th>
      <td>34.0</td>
      <td>1427</td>
      <td>8704</td>
      <td><b>75.0 (gpu)</b></td>
    </tr>
    <tr>
      <th>udpipe</th>
      <td>6.9</td>
      <td><b>45</b></td>
      <td><b>242</b></td>
      <td>56.2</td>
    </tr>
    <tr>
      <th>spacy</th>
      <td>9.0</td>
      <td><b>140</b></td>
      <td><b>579</b></td>
      <td>41.0</td>
    </tr>
    <tr>
      <th>stanza</th>
      <td><b>3.0</b></td>
      <td>591</td>
      <td>890</td>
      <td>12.0</td>
    </tr>
  </tbody>
</table>
<!--- syntax2 --->

## Support

- Chat — https://telegram.me/natural_language_processing
- Issues — https://github.com/natasha/slovnet/issues
- Commercial support — https://lab.alexkuk.ru

## Development

Dev env

```bash
python -m venv ~/.venvs/natasha-slovnet
source ~/.venvs/natasha-slovnet/bin/activate

pip install -r requirements/dev.txt
pip install -e .
```

Test

```bash
make test
```

Rent GPU

```bash
yc compute instance create \
  --name gpu \
  --zone ru-central1-a \
  --network-interface subnet-name=default,nat-ip-version=ipv4 \
  --create-boot-disk image-folder-id=standard-images,image-family=ubuntu-1804-lts-ngc,type=network-ssd,size=20 \
  --cores=8 \
  --memory=96 \
  --gpus=1 \
  --ssh-key ~/.ssh/id_rsa.pub \
  --folder-name default \
  --platform-id gpu-standard-v1 \
  --preemptible

yc compute instance delete --name gpu
```

Setup instance

```bash
sudo locale-gen ru_RU.UTF-8

sudo apt-get update
sudo apt-get install -y \
  python3-pip

# grpcio takes long to install (~10 min); the prebuilt wheel is not used
# ("it is not compatible with this Python")
sudo pip3 install -v \
  jupyter \
  tensorboard

mkdir runs
nohup tensorboard \
  --logdir=runs \
  --host=localhost \
  --port=6006 \
  --reload_interval=1 &

nohup jupyter notebook \
  --no-browser \
  --allow-root \
  --ip=localhost \
  --port=8888 \
  --NotebookApp.token='' \
  --NotebookApp.password='' &

ssh -Nf gpu -L 8888:localhost:8888 -L 6006:localhost:6006

scp ~/.slovnet.json gpu:~
rsync --exclude data -rv . gpu:~/slovnet
rsync -u --exclude data -rv 'gpu:~/slovnet/*' .
```

Install dev

```bash
pip3 install -r slovnet/requirements/dev.txt -r slovnet/requirements/gpu.txt
pip3 install -e slovnet
```

Release

```bash
# Update setup.py version

git commit -am 'Up version'
git tag v0.6.0

git push
git push --tags

# GitHub Actions builds the dist and publishes to PyPI
```

\u0412 \u043d\u043e\u0432\u043e\u043c \u0441\u043f\u0438\u0441\u043a\u0435 \u0444\u0438\u0433\u0443\u0440\u0438\u0440\u0443\u0435\u0442 \u041b\u0435\u043e\u043d\u0438\u0434 \u041f\u0430\u0441\u0435\u0447\u043d\u0438\u043a, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u043f\u043e \u0438\u0442\u043e\u0433\u0430\u043c \u0432\u044b\u0431\u043e\u0440\u043e\u0432 \u0441\u0442\u0430\u043b \u0433\u043b\u0430\u0432\u043e\u0439 \u041b\u041d\u0420. \u041f\u043e\u043c\u0438\u043c\u043e \u043d\u0435\u0433\u043e \u0442\u0430\u043c \u043f\u0440\u0438\u0441\u0443\u0442\u0441\u0442\u0432\u0443\u044e\u0442 \u0412\u043b\u0430\u0434\u0438\u043c\u0438\u0440 \u0411\u0438\u0434\u0435\u0432\u043a\u0430 \u0438 \u0414\u0435\u043d\u0438\u0441 \u041c\u0438\u0440\u043e\u0448\u043d\u0438\u0447\u0435\u043d\u043a\u043e, \u043f\u0440\u0435\u0434\u0441\u0435\u0434\u0430\u0442\u0435\u043b\u0438 \u0437\u0430\u043a\u043e\u043d\u043e\u0434\u0430\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u043e\u0440\u0433\u0430\u043d\u043e\u0432 \u0414\u041d\u0420 \u0438 \u041b\u041d\u0420, \u0430 \u0442\u0430\u043a\u0436\u0435 \u041e\u043b\u044c\u0433\u0430 \u041f\u043e\u0437\u0434\u043d\u044f\u043a\u043e\u0432\u0430 \u0438 \u0415\u043b\u0435\u043d\u0430 \u041a\u0440\u0430\u0432\u0447\u0435\u043d\u043a\u043e, \u043f\u0440\u0435\u0434\u0441\u0435\u0434\u0430\u0442\u0435\u043b\u0438 \u0426\u0418\u041a \u043e\u0431\u0435\u0438\u0445 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a. \u0412\u044b\u0431\u043e\u0440\u044b \u043f\u0440\u043e\u0448\u043b\u0438 \u0432 \u043d\u0435\u043f\u0440\u0438\u0437\u043d\u0430\u043d\u043d\u044b\u0445 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0430\u0445 \u0414\u043e\u043d\u0431\u0430\u0441\u0441\u0430 11 \u043d\u043e\u044f\u0431\u0440\u044f. 
\u041d\u0430 \u043d\u0438\u0445 \u0443\u0434\u0435\u0440\u0436\u0430\u043b\u0438 \u043b\u0438\u0434\u0435\u0440\u0441\u0442\u0432\u043e \u0434\u0435\u0439\u0441\u0442\u0432\u0443\u044e\u0449\u0438\u0435 \u0440\u0443\u043a\u043e\u0432\u043e\u0434\u0438\u0442\u0435\u043b\u0438 \u0438 \u043f\u0430\u0440\u0442\u0438\u0438 \u2014 \u0414\u0435\u043d\u0438\u0441 \u041f\u0443\u0448\u0438\u043b\u0438\u043d \u0438 \u00ab\u0414\u043e\u043d\u0435\u0446\u043a\u0430\u044f \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0430\u00bb \u0432 \u0414\u041d\u0420 \u0438 \u041b\u0435\u043e\u043d\u0438\u0434 \u041f\u0430\u0441\u0435\u0447\u043d\u0438\u043a \u0441 \u0434\u0432\u0438\u0436\u0435\u043d\u0438\u0435\u043c \u00ab\u041c\u0438\u0440 \u041b\u0443\u0433\u0430\u043d\u0449\u0438\u043d\u0435\u00bb \u0432 \u041b\u041d\u0420. \u041f\u0440\u0435\u0437\u0438\u0434\u0435\u043d\u0442 \u0424\u0440\u0430\u043d\u0446\u0438\u0438 \u042d\u043c\u043c\u0430\u043d\u044e\u044d\u043b\u044c \u041c\u0430\u043a\u0440\u043e\u043d \u0438 \u043a\u0430\u043d\u0446\u043b\u0435\u0440 \u0424\u0420\u0413 \u0410\u043d\u0433\u0435\u043b\u0430 \u041c\u0435\u0440\u043a\u0435\u043b\u044c \u043f\u043e\u0441\u043b\u0435 \u0432\u0441\u0442\u0440\u0435\u0447\u0438 \u0441 \u0443\u043a\u0440\u0430\u0438\u043d\u0441\u043a\u0438\u043c \u043b\u0438\u0434\u0435\u0440\u043e\u043c \u041f\u0435\u0442\u0440\u043e\u043c \u041f\u043e\u0440\u043e\u0448\u0435\u043d\u043a\u043e \u043e\u0441\u0443\u0434\u0438\u043b\u0438 \u043f\u0440\u043e\u0432\u0435\u0434\u0435\u043d\u0438\u0435 \u0432\u044b\u0431\u043e\u0440\u043e\u0432, \u0437\u0430\u044f\u0432\u0438\u0432, \u0447\u0442\u043e \u043e\u043d\u0438 \u043d\u0435\u043b\u0435\u0433\u0438\u0442\u0438\u043c\u043d\u044b \u0438 \u00ab\u043f\u043e\u0434\u0440\u044b\u0432\u0430\u044e\u0442 \u0442\u0435\u0440\u0440\u0438\u0442\u043e\u0440\u0438\u0430\u043b\u044c\u043d\u0443\u044e \u0446\u0435\u043b\u043e\u0441\u0442\u043d\u043e\u0441\u0442\u044c \u0438 
\u0441\u0443\u0432\u0435\u0440\u0435\u043d\u0438\u0442\u0435\u0442 \u0423\u043a\u0440\u0430\u0438\u043d\u044b\u00bb. \u041f\u043e\u0437\u0436\u0435 \u043a \u043e\u0441\u0443\u0436\u0434\u0435\u043d\u0438\u044e \u043f\u0440\u0438\u0441\u043e\u0435\u0434\u0438\u043d\u0438\u043b\u0438\u0441\u044c \u0421\u0428\u0410 \u0441 \u043e\u0431\u0435\u0449\u0430\u043d\u0438\u044f\u043c\u0438 \u043d\u043e\u0432\u044b\u0445 \u0441\u0430\u043d\u043a\u0446\u0438\u0439 \u0434\u043b\u044f \u0420\u043e\u0441\u0441\u0438\u0438.'\n\n>>> navec = Navec.load('navec_news_v1_1B_250K_300d_100q.tar')\n>>> ner = NER.load('slovnet_ner_news_v1.tar')\n>>> ner.navec(navec)\n\n>>> markup = ner(text)\n>>> show_markup(markup.text, markup.spans)\n\u0415\u0432\u0440\u043e\u043f\u0435\u0439\u0441\u043a\u0438\u0439 \u0441\u043e\u044e\u0437 \u0434\u043e\u0431\u0430\u0432\u0438\u043b \u0432 \u0441\u0430\u043d\u043a\u0446\u0438\u043e\u043d\u043d\u044b\u0439 \u0441\u043f\u0438\u0441\u043e\u043a \u0434\u0435\u0432\u044f\u0442\u044c \u043f\u043e\u043b\u0438\u0442\u0438\u0447\u0435\u0441\u043a\u0438\u0445 \nLOC\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500                                                  \n\u0434\u0435\u044f\u0442\u0435\u043b\u0435\u0439 \u0438\u0437 \u0441\u0430\u043c\u043e\u043f\u0440\u043e\u0432\u043e\u0437\u0433\u043b\u0430\u0448\u0435\u043d\u043d\u044b\u0445 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a \u0414\u043e\u043d\u0431\u0430\u0441\u0441\u0430 \u2014 \u0414\u043e\u043d\u0435\u0446\u043a\u043e\u0439 \u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439\n                                          LOC\u2500\u2500\u2500\u2500\u2500   LOC\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438 (\u0414\u041d\u0420) \u0438 \u041b\u0443\u0433\u0430\u043d\u0441\u043a\u043e\u0439 \u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439 
\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438 (\u041b\u041d\u0420) \u2014 \u0432 \u0441\u0432\u044f\u0437\u0438 \u0441 \n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500   LOC\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500             \n\u043f\u0440\u043e\u0448\u0435\u0434\u0448\u0438\u043c\u0438 \u0442\u0430\u043c \u0432\u044b\u0431\u043e\u0440\u0430\u043c\u0438. \u041e\u0431 \u044d\u0442\u043e\u043c \u0433\u043e\u0432\u043e\u0440\u0438\u0442\u0441\u044f \u0432 \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0435, \u043e\u043f\u0443\u0431\u043b\u0438\u043a\u043e\u0432\u0430\u043d\u043d\u043e\u043c\n \u0432 \u043e\u0444\u0438\u0446\u0438\u0430\u043b\u044c\u043d\u043e\u043c \u0436\u0443\u0440\u043d\u0430\u043b\u0435 \u0415\u0432\u0440\u043e\u0441\u043e\u044e\u0437\u0430. \u0412 \u043d\u043e\u0432\u043e\u043c \u0441\u043f\u0438\u0441\u043a\u0435 \u0444\u0438\u0433\u0443\u0440\u0438\u0440\u0443\u0435\u0442 \u041b\u0435\u043e\u043d\u0438\u0434 \n                       LOC\u2500\u2500\u2500\u2500\u2500\u2500                            PER\u2500\u2500\u2500\u2500\n\u041f\u0430\u0441\u0435\u0447\u043d\u0438\u043a, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u043f\u043e \u0438\u0442\u043e\u0433\u0430\u043c \u0432\u044b\u0431\u043e\u0440\u043e\u0432 \u0441\u0442\u0430\u043b \u0433\u043b\u0430\u0432\u043e\u0439 \u041b\u041d\u0420. 
\u041f\u043e\u043c\u0438\u043c\u043e \u043d\u0435\u0433\u043e \u0442\u0430\u043c \n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500                                        LOC                  \n\u043f\u0440\u0438\u0441\u0443\u0442\u0441\u0442\u0432\u0443\u044e\u0442 \u0412\u043b\u0430\u0434\u0438\u043c\u0438\u0440 \u0411\u0438\u0434\u0435\u0432\u043a\u0430 \u0438 \u0414\u0435\u043d\u0438\u0441 \u041c\u0438\u0440\u043e\u0448\u043d\u0438\u0447\u0435\u043d\u043a\u043e, \u043f\u0440\u0435\u0434\u0441\u0435\u0434\u0430\u0442\u0435\u043b\u0438 \n             PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500   PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500               \n\u0437\u0430\u043a\u043e\u043d\u043e\u0434\u0430\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u043e\u0440\u0433\u0430\u043d\u043e\u0432 \u0414\u041d\u0420 \u0438 \u041b\u041d\u0420, \u0430 \u0442\u0430\u043a\u0436\u0435 \u041e\u043b\u044c\u0433\u0430 \u041f\u043e\u0437\u0434\u043d\u044f\u043a\u043e\u0432\u0430 \u0438 \u0415\u043b\u0435\u043d\u0430 \n                        LOC   LOC          PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500   PER\u2500\u2500\u2500\n\u041a\u0440\u0430\u0432\u0447\u0435\u043d\u043a\u043e, \u043f\u0440\u0435\u0434\u0441\u0435\u0434\u0430\u0442\u0435\u043b\u0438 \u0426\u0418\u041a \u043e\u0431\u0435\u0438\u0445 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a. \u0412\u044b\u0431\u043e\u0440\u044b \u043f\u0440\u043e\u0448\u043b\u0438 \u0432 \n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500               ORG                                  \n\u043d\u0435\u043f\u0440\u0438\u0437\u043d\u0430\u043d\u043d\u044b\u0445 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0430\u0445 \u0414\u043e\u043d\u0431\u0430\u0441\u0441\u0430 11 \u043d\u043e\u044f\u0431\u0440\u044f. 
\u041d\u0430 \u043d\u0438\u0445 \u0443\u0434\u0435\u0440\u0436\u0430\u043b\u0438 \u043b\u0438\u0434\u0435\u0440\u0441\u0442\u0432\u043e\n                         LOC\u2500\u2500\u2500\u2500\u2500                                     \n \u0434\u0435\u0439\u0441\u0442\u0432\u0443\u044e\u0449\u0438\u0435 \u0440\u0443\u043a\u043e\u0432\u043e\u0434\u0438\u0442\u0435\u043b\u0438 \u0438 \u043f\u0430\u0440\u0442\u0438\u0438 \u2014 \u0414\u0435\u043d\u0438\u0441 \u041f\u0443\u0448\u0438\u043b\u0438\u043d \u0438 \u00ab\u0414\u043e\u043d\u0435\u0446\u043a\u0430\u044f \n                                     PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500    ORG\u2500\u2500\u2500\u2500\u2500\u2500\n\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0430\u00bb \u0432 \u0414\u041d\u0420 \u0438 \u041b\u0435\u043e\u043d\u0438\u0434 \u041f\u0430\u0441\u0435\u0447\u043d\u0438\u043a \u0441 \u0434\u0432\u0438\u0436\u0435\u043d\u0438\u0435\u043c \u00ab\u041c\u0438\u0440 \u041b\u0443\u0433\u0430\u043d\u0449\u0438\u043d\u0435\u00bb \u0432 \u041b\u041d\u0420.\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500    LOC   PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500              ORG\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500    LOC \n \u041f\u0440\u0435\u0437\u0438\u0434\u0435\u043d\u0442 \u0424\u0440\u0430\u043d\u0446\u0438\u0438 \u042d\u043c\u043c\u0430\u043d\u044e\u044d\u043b\u044c \u041c\u0430\u043a\u0440\u043e\u043d \u0438 \u043a\u0430\u043d\u0446\u043b\u0435\u0440 \u0424\u0420\u0413 \u0410\u043d\u0433\u0435\u043b\u0430 \u041c\u0435\u0440\u043a\u0435\u043b\u044c \u043f\u043e\u0441\u043b\u0435\n           LOC\u2500\u2500\u2500\u2500 PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500           LOC PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500      \n \u0432\u0441\u0442\u0440\u0435\u0447\u0438 \u0441 \u0443\u043a\u0440\u0430\u0438\u043d\u0441\u043a\u0438\u043c 
\u043b\u0438\u0434\u0435\u0440\u043e\u043c \u041f\u0435\u0442\u0440\u043e\u043c \u041f\u043e\u0440\u043e\u0448\u0435\u043d\u043a\u043e \u043e\u0441\u0443\u0434\u0438\u043b\u0438 \u043f\u0440\u043e\u0432\u0435\u0434\u0435\u043d\u0438\u0435 \n                              PER\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500                    \n\u0432\u044b\u0431\u043e\u0440\u043e\u0432, \u0437\u0430\u044f\u0432\u0438\u0432, \u0447\u0442\u043e \u043e\u043d\u0438 \u043d\u0435\u043b\u0435\u0433\u0438\u0442\u0438\u043c\u043d\u044b \u0438 \u00ab\u043f\u043e\u0434\u0440\u044b\u0432\u0430\u044e\u0442 \u0442\u0435\u0440\u0440\u0438\u0442\u043e\u0440\u0438\u0430\u043b\u044c\u043d\u0443\u044e \n\u0446\u0435\u043b\u043e\u0441\u0442\u043d\u043e\u0441\u0442\u044c \u0438 \u0441\u0443\u0432\u0435\u0440\u0435\u043d\u0438\u0442\u0435\u0442 \u0423\u043a\u0440\u0430\u0438\u043d\u044b\u00bb. \u041f\u043e\u0437\u0436\u0435 \u043a \u043e\u0441\u0443\u0436\u0434\u0435\u043d\u0438\u044e \u043f\u0440\u0438\u0441\u043e\u0435\u0434\u0438\u043d\u0438\u043b\u0438\u0441\u044c \n                          LOC\u2500\u2500\u2500\u2500                                    \n\u0421\u0428\u0410 \u0441 \u043e\u0431\u0435\u0449\u0430\u043d\u0438\u044f\u043c\u0438 \u043d\u043e\u0432\u044b\u0445 \u0441\u0430\u043d\u043a\u0446\u0438\u0439 \u0434\u043b\u044f \u0420\u043e\u0441\u0441\u0438\u0438.\nLOC                                LOC\u2500\u2500\u2500 \n\n```\n\n### Morphology\n\nMorphology annotator processes tokenized text. 
To split the input into sentencies and tokens use <a href=\"https://github.com/natasha/razdel\">Razdel</a>.\n\n```python\n>>> from razdel import sentenize, tokenize\n>>> from navec import Navec\n>>> from slovnet import Morph\n\n>>> chunk = []\n>>> for sent in sentenize(text):\n>>>     tokens = [_.text for _ in tokenize(sent.text)]\n>>>     chunk.append(tokens)\n>>> chunk[:1]\n[['\u0415\u0432\u0440\u043e\u043f\u0435\u0439\u0441\u043a\u0438\u0439', '\u0441\u043e\u044e\u0437', '\u0434\u043e\u0431\u0430\u0432\u0438\u043b', '\u0432', '\u0441\u0430\u043d\u043a\u0446\u0438\u043e\u043d\u043d\u044b\u0439', '\u0441\u043f\u0438\u0441\u043e\u043a', '\u0434\u0435\u0432\u044f\u0442\u044c', '\u043f\u043e\u043b\u0438\u0442\u0438\u0447\u0435\u0441\u043a\u0438\u0445', '\u0434\u0435\u044f\u0442\u0435\u043b\u0435\u0439', '\u0438\u0437', '\u0441\u0430\u043c\u043e\u043f\u0440\u043e\u0432\u043e\u0437\u0433\u043b\u0430\u0448\u0435\u043d\u043d\u044b\u0445', '\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a', '\u0414\u043e\u043d\u0431\u0430\u0441\u0441\u0430', '\u2014', '\u0414\u043e\u043d\u0435\u0446\u043a\u043e\u0439', '\u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439', '\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438', '(', '\u0414\u041d\u0420', ')', '\u0438', '\u041b\u0443\u0433\u0430\u043d\u0441\u043a\u043e\u0439', '\u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439', '\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438', '(', '\u041b\u041d\u0420', ')', '\u2014', '\u0432', '\u0441\u0432\u044f\u0437\u0438', '\u0441', '\u043f\u0440\u043e\u0448\u0435\u0434\u0448\u0438\u043c\u0438', '\u0442\u0430\u043c', '\u0432\u044b\u0431\u043e\u0440\u0430\u043c\u0438', '.']]\n\n>>> navec = Navec.load('navec_news_v1_1B_250K_300d_100q.tar')\n>>> morph = Morph.load('slovnet_morph_news_v1.tar', batch_size=4)\n>>> morph.navec(navec)\n\n>>> markup = next(morph.map(chunk))\n>>> for token in markup.tokens:\n>>>     print(f'{token.text:>20} {token.tag}')\n         
\u0415\u0432\u0440\u043e\u043f\u0435\u0439\u0441\u043a\u0438\u0439 ADJ|Case=Nom|Degree=Pos|Gender=Masc|Number=Sing\n                \u0441\u043e\u044e\u0437 NOUN|Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing\n             \u0434\u043e\u0431\u0430\u0432\u0438\u043b VERB|Aspect=Perf|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\n                   \u0432 ADP\n         \u0441\u0430\u043d\u043a\u0446\u0438\u043e\u043d\u043d\u044b\u0439 ADJ|Animacy=Inan|Case=Acc|Degree=Pos|Gender=Masc|Number=Sing\n              \u0441\u043f\u0438\u0441\u043e\u043a NOUN|Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing\n              \u0434\u0435\u0432\u044f\u0442\u044c NUM|Case=Nom\n        \u043f\u043e\u043b\u0438\u0442\u0438\u0447\u0435\u0441\u043a\u0438\u0445 ADJ|Case=Gen|Degree=Pos|Number=Plur\n            \u0434\u0435\u044f\u0442\u0435\u043b\u0435\u0439 NOUN|Animacy=Anim|Case=Gen|Gender=Masc|Number=Plur\n                  \u0438\u0437 ADP\n \u0441\u0430\u043c\u043e\u043f\u0440\u043e\u0432\u043e\u0437\u0433\u043b\u0430\u0448\u0435\u043d\u043d\u044b\u0445 ADJ|Case=Gen|Degree=Pos|Number=Plur\n           \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a NOUN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Plur\n            \u0414\u043e\u043d\u0431\u0430\u0441\u0441\u0430 PROPN|Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing\n                   \u2014 PUNCT\n            \u0414\u043e\u043d\u0435\u0446\u043a\u043e\u0439 ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing\n            \u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439 ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing\n          \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438 NOUN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing\n                   ( PUNCT\n                 \u0414\u041d\u0420 PROPN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing\n                   ) PUNCT\n                   \u0438 CCONJ\n           \u041b\u0443\u0433\u0430\u043d\u0441\u043a\u043e\u0439 
ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing\n            \u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439 ADJ|Case=Gen|Degree=Pos|Gender=Fem|Number=Sing\n          \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438 NOUN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing\n                   ( PUNCT\n                 \u041b\u041d\u0420 PROPN|Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing\n                   ) PUNCT\n                   \u2014 PUNCT\n                   \u0432 ADP\n               \u0441\u0432\u044f\u0437\u0438 NOUN|Animacy=Inan|Case=Loc|Gender=Fem|Number=Sing\n                   \u0441 ADP\n          \u043f\u0440\u043e\u0448\u0435\u0434\u0448\u0438\u043c\u0438 VERB|Aspect=Perf|Case=Ins|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act\n                 \u0442\u0430\u043c ADV|Degree=Pos\n            \u0432\u044b\u0431\u043e\u0440\u0430\u043c\u0438 NOUN|Animacy=Inan|Case=Ins|Gender=Masc|Number=Plur\n                   . PUNCT\n\n```\n\n### Syntax\n\nSyntax parser processes sentencies split into tokens. 
Use <a href=\"https://github.com/natasha/razdel\">Razdel</a> for segmentation.\n\n```python\n>>> from ipymarkup import show_dep_ascii_markup as show_markup\n>>> from razdel import sentenize, tokenize\n>>> from navec import Navec\n>>> from slovnet import Syntax\n\n>>> chunk = []\n>>> for sent in sentenize(text):\n>>>     tokens = [_.text for _ in tokenize(sent.text)]\n>>>     chunk.append(tokens)\n>>> chunk[:1]\n[['\u0415\u0432\u0440\u043e\u043f\u0435\u0439\u0441\u043a\u0438\u0439', '\u0441\u043e\u044e\u0437', '\u0434\u043e\u0431\u0430\u0432\u0438\u043b', '\u0432', '\u0441\u0430\u043d\u043a\u0446\u0438\u043e\u043d\u043d\u044b\u0439', '\u0441\u043f\u0438\u0441\u043e\u043a', '\u0434\u0435\u0432\u044f\u0442\u044c', '\u043f\u043e\u043b\u0438\u0442\u0438\u0447\u0435\u0441\u043a\u0438\u0445', '\u0434\u0435\u044f\u0442\u0435\u043b\u0435\u0439', '\u0438\u0437', '\u0441\u0430\u043c\u043e\u043f\u0440\u043e\u0432\u043e\u0437\u0433\u043b\u0430\u0448\u0435\u043d\u043d\u044b\u0445', '\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a', '\u0414\u043e\u043d\u0431\u0430\u0441\u0441\u0430', '\u2014', '\u0414\u043e\u043d\u0435\u0446\u043a\u043e\u0439', '\u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439', '\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438', '(', '\u0414\u041d\u0420', ')', '\u0438', '\u041b\u0443\u0433\u0430\u043d\u0441\u043a\u043e\u0439', '\u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439', '\u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438', '(', '\u041b\u041d\u0420', ')', '\u2014', '\u0432', '\u0441\u0432\u044f\u0437\u0438', '\u0441', '\u043f\u0440\u043e\u0448\u0435\u0434\u0448\u0438\u043c\u0438', '\u0442\u0430\u043c', '\u0432\u044b\u0431\u043e\u0440\u0430\u043c\u0438', '.']]\n\n>>> navec = Navec.load('navec_news_v1_1B_250K_300d_100q.tar')\n>>> syntax = Syntax.load('slovnet_syntax_news_v1.tar')\n>>> syntax.navec(navec)\n\n>>> markup = next(syntax.map(chunk))\n\n# Convert CoNLL-style format to source, target indices\n>>> words, deps = [], 
[]\n>>> for token in markup.tokens:\n>>>     words.append(token.text)\n>>>     source = int(token.head_id) - 1\n>>>     target = int(token.id) - 1\n>>>     if source > 0 and source != target:  # skip root, loops\n>>>         deps.append([source, target, token.rel])\n>>> show_markup(words, deps)\n              \u250c\u25ba \u0415\u0432\u0440\u043e\u043f\u0435\u0439\u0441\u043a\u0438\u0439         amod\n            \u250c\u25ba\u2514\u2500 \u0441\u043e\u044e\u0437                nsubj\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u250c\u2500\u250c\u2500\u2514\u2500\u2500\u2500 \u0434\u043e\u0431\u0430\u0432\u0438\u043b             \n\u2502       \u2502 \u2502 \u250c\u2500\u2500\u25ba \u0432                   case\n\u2502       \u2502 \u2502 \u2502 \u250c\u25ba \u0441\u0430\u043d\u043a\u0446\u0438\u043e\u043d\u043d\u044b\u0439         amod\n\u2502       \u2502 \u2514\u25ba\u2514\u2500\u2514\u2500 \u0441\u043f\u0438\u0441\u043e\u043a              obl\n\u2502       \u2502   \u250c\u2500\u2500\u25ba \u0434\u0435\u0432\u044f\u0442\u044c              nummod:gov\n\u2502       \u2502   \u2502 \u250c\u25ba \u043f\u043e\u043b\u0438\u0442\u0438\u0447\u0435\u0441\u043a\u0438\u0445        amod\n\u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2514\u25ba\u250c\u2500\u2514\u2500\u2514\u2500 \u0434\u0435\u044f\u0442\u0435\u043b\u0435\u0439            obj\n\u2502 \u2502       \u2502 \u250c\u2500\u2500\u25ba \u0438\u0437                  case\n\u2502 \u2502       \u2502 \u2502 \u250c\u25ba \u0441\u0430\u043c\u043e\u043f\u0440\u043e\u0432\u043e\u0437\u0433\u043b\u0430\u0448\u0435\u043d\u043d\u044b\u0445 amod\n\u2502 \u2502       \u2514\u25ba\u2514\u2500\u2514\u2500 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a           nmod\n\u2502 \u2502         \u2514\u2500\u2500\u25ba \u0414\u043e\u043d\u0431\u0430\u0441\u0441\u0430            nmod\n\u2502 \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25ba \u2014                   punct\n\u2502 \u2502 \u2502    
   \u250c\u2500\u2500\u25ba \u0414\u043e\u043d\u0435\u0446\u043a\u043e\u0439            amod\n\u2502 \u2502 \u2502       \u2502 \u250c\u25ba \u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439            amod\n\u2502 \u2502 \u2502 \u250c\u2500\u250c\u2500\u250c\u2500\u2514\u2500\u2514\u2500 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438          \n\u2502 \u2502 \u2502 \u2502 \u2502 \u2502   \u250c\u25ba (                   punct\n\u2502 \u2502 \u2502 \u2502 \u2502 \u2514\u25ba\u250c\u2500\u2514\u2500 \u0414\u041d\u0420                 parataxis\n\u2502 \u2502 \u2502 \u2502 \u2502   \u2514\u2500\u2500\u25ba )                   punct\n\u2502 \u2502 \u2502 \u2502 \u2502 \u250c\u2500\u2500\u2500\u2500\u25ba \u0438                   cc\n\u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u250c\u2500\u2500\u25ba \u041b\u0443\u0433\u0430\u043d\u0441\u043a\u043e\u0439           amod\n\u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u250c\u25ba \u043d\u0430\u0440\u043e\u0434\u043d\u043e\u0439            amod\n\u2502 \u2502 \u2514\u2500\u2502 \u2514\u25ba\u2514\u2500\u2514\u2500\u2514\u2500 \u0440\u0435\u0441\u043f\u0443\u0431\u043b\u0438\u043a\u0438          conj\n\u2502 \u2502   \u2502       \u250c\u25ba (                   punct\n\u2502 \u2502   \u2514\u2500\u2500\u2500\u2500\u25ba\u250c\u2500\u2514\u2500 \u041b\u041d\u0420                 parataxis\n\u2502 \u2502         \u2514\u2500\u2500\u25ba )                   punct\n\u2502 \u2502     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u25ba \u2014                   punct\n\u2502 \u2502     \u2502 \u250c\u25ba\u250c\u2500\u250c\u2500 \u0432                   case\n\u2502 \u2502     \u2502 \u2502 \u2502 \u2514\u25ba \u0441\u0432\u044f\u0437\u0438               fixed\n\u2502 \u2502     \u2502 \u2502 \u2514\u2500\u2500\u25ba \u0441                   fixed\n\u2502 \u2502     \u2502 \u2502 \u250c\u25ba\u250c\u2500 \u043f\u0440\u043e\u0448\u0435\u0434\u0448\u0438\u043c\u0438          acl\n\u2502 \u2502     \u2502 \u2502 \u2502 
\u2514\u25ba \u0442\u0430\u043c                 advmod\n\u2502 \u2514\u2500\u2500\u2500\u2500\u25ba\u2514\u2500\u2514\u2500\u2514\u2500\u2500\u2500 \u0432\u044b\u0431\u043e\u0440\u0430\u043c\u0438            nmod\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25ba .                   punct\n\n```\n\n## Documentation\n\nMaterials are in Russian:\n\n* <a href=\"https://natasha.github.io/ner\">Article about distillation and quantization in Slovnet</a> \n* <a href=\"https://youtu.be/-7XT_U6hVvk?t=2034\">Slovnet section of Datafest 2020 talk</a>\n\n## Evaluation\n\nIn addition to quality metrics we measure speed and models size, parameters that are important in production:\n\n* `init` \u2014 time between system launch and first response. It is convenient for testing and devops to have model that starts quickly.\n* `disk` \u2014 file size of artefacts one needs to download before using the system: model weights, embeddings, binaries, vocabs. It is convenient to deploy compact models in production.\n* `ram` \u2014 average CPU/GPU RAM usage.\n* `speed` \u2014 number of input items processed per second: news articles, tokenized sentencies.\n\n### NER\n\n4 datasets are used for evaluation: <a href=\"https://github.com/natasha/corus#load_factru\"><code>factru</code></a>, <a href=\"https://github.com/natasha/corus#load_gareev\"><code>gareev</code></a>, <a href=\"https://github.com/natasha/corus#load_ne5\"><code>ne5</code></a> and <a href=\"https://github.com/natasha/corus#load_bsnlp\"><code>bsnlp</code></a>. 
Slovnet is compared to <a href=\"https://github.com/natasha/naeval#deeppavlov_ner\"><code>deeppavlov</code></a>, <a href=\"https://github.com/natasha/naeval#deeppavlov_bert_ner\"><code>deeppavlov_bert</code></a>, <a href=\"https://github.com/natasha/naeval#deeppavlov_slavic_bert_ner\"><code>deeppavlov_slavic</code></a>, <a href=\"https://github.com/natasha/naeval#pullenti\"><code>pullenti</code></a>, <a href=\"https://github.com/natasha/naeval#spacy\"><code>spacy</code></a>, <a href=\"https://github.com/natasha/naeval#stanza\"><code>stanza</code></a>, <a href=\"https://github.com/natasha/naeval#texterra\"><code>texterra</code></a>, <a href=\"https://github.com/natasha/naeval#tomita\"><code>tomita</code></a>, <a href=\"https://github.com/natasha/naeval#mitie\"><code>mitie</code></a>.\n\nFor every column top 3 results are highlighted:\n\n<!--- ner1 --->\n<table border=\"0\" class=\"dataframe\">\n  <thead>\n    <tr>\n      <th></th>\n      <th colspan=\"3\" halign=\"left\">factru</th>\n      <th colspan=\"2\" halign=\"left\">gareev</th>\n      <th colspan=\"3\" halign=\"left\">ne5</th>\n      <th colspan=\"3\" halign=\"left\">bsnlp</th>\n    </tr>\n    <tr>\n      <th>f1</th>\n      <th>PER</th>\n      <th>LOC</th>\n      <th>ORG</th>\n      <th>PER</th>\n      <th>ORG</th>\n      <th>PER</th>\n      <th>LOC</th>\n      <th>ORG</th>\n      <th>PER</th>\n      <th>LOC</th>\n      <th>ORG</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>slovnet</th>\n      <td><b>0.959</b></td>\n      <td><b>0.915</b></td>\n      <td><b>0.825</b></td>\n      <td><b>0.977</b></td>\n      <td><b>0.899</b></td>\n      <td><b>0.984</b></td>\n      <td><b>0.973</b></td>\n      <td><b>0.951</b></td>\n      <td>0.944</td>\n      <td>0.834</td>\n      <td>0.718</td>\n    </tr>\n    <tr>\n      <th>slovnet_bert</th>\n      <td><b>0.973</b></td>\n      <td><b>0.928</b></td>\n      <td><b>0.831</b></td>\n      <td><b>0.991</b></td>\n      <td><b>0.911</b></td>\n      
<td><b>0.996</b></td>\n      <td><b>0.989</b></td>\n      <td><b>0.976</b></td>\n      <td><b>0.960</b></td>\n      <td>0.838</td>\n      <td><b>0.733</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov</th>\n      <td>0.910</td>\n      <td>0.886</td>\n      <td>0.742</td>\n      <td>0.944</td>\n      <td>0.798</td>\n      <td>0.942</td>\n      <td>0.919</td>\n      <td>0.881</td>\n      <td>0.866</td>\n      <td>0.767</td>\n      <td>0.624</td>\n    </tr>\n    <tr>\n      <th>deeppavlov_bert</th>\n      <td><b>0.971</b></td>\n      <td><b>0.928</b></td>\n      <td><b>0.825</b></td>\n      <td><b>0.980</b></td>\n      <td><b>0.916</b></td>\n      <td><b>0.997</b></td>\n      <td><b>0.990</b></td>\n      <td><b>0.976</b></td>\n      <td><b>0.954</b></td>\n      <td><b>0.840</b></td>\n      <td><b>0.741</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov_slavic</th>\n      <td>0.956</td>\n      <td>0.884</td>\n      <td>0.714</td>\n      <td>0.976</td>\n      <td>0.776</td>\n      <td>0.984</td>\n      <td>0.817</td>\n      <td>0.761</td>\n      <td><b>0.965</b></td>\n      <td><b>0.925</b></td>\n      <td><b>0.831</b></td>\n    </tr>\n    <tr>\n      <th>pullenti</th>\n      <td>0.905</td>\n      <td>0.814</td>\n      <td>0.686</td>\n      <td>0.939</td>\n      <td>0.639</td>\n      <td>0.952</td>\n      <td>0.862</td>\n      <td>0.683</td>\n      <td>0.900</td>\n      <td>0.769</td>\n      <td>0.566</td>\n    </tr>\n    <tr>\n      <th>spacy</th>\n      <td>0.901</td>\n      <td>0.886</td>\n      <td>0.765</td>\n      <td>0.970</td>\n      <td>0.883</td>\n      <td>0.967</td>\n      <td>0.928</td>\n      <td>0.918</td>\n      <td>0.919</td>\n      <td>0.823</td>\n      <td>0.693</td>\n    </tr>\n    <tr>\n      <th>stanza</th>\n      <td>0.943</td>\n      <td>0.865</td>\n      <td>0.687</td>\n      <td>0.953</td>\n      <td>0.827</td>\n      <td>0.923</td>\n      <td>0.753</td>\n      <td>0.734</td>\n      <td>0.938</td>\n      <td><b>0.838</b></td>\n      
<td>0.724</td>\n    </tr>\n    <tr>\n      <th>texterra</th>\n      <td>0.900</td>\n      <td>0.800</td>\n      <td>0.597</td>\n      <td>0.888</td>\n      <td>0.561</td>\n      <td>0.901</td>\n      <td>0.777</td>\n      <td>0.594</td>\n      <td>0.858</td>\n      <td>0.783</td>\n      <td>0.548</td>\n    </tr>\n    <tr>\n      <th>tomita</th>\n      <td>0.929</td>\n      <td></td>\n      <td></td>\n      <td>0.921</td>\n      <td></td>\n      <td>0.945</td>\n      <td></td>\n      <td></td>\n      <td>0.881</td>\n      <td></td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>mitie</th>\n      <td>0.888</td>\n      <td>0.861</td>\n      <td>0.532</td>\n      <td>0.849</td>\n      <td>0.452</td>\n      <td>0.753</td>\n      <td>0.642</td>\n      <td>0.432</td>\n      <td>0.736</td>\n      <td>0.801</td>\n      <td>0.524</td>\n    </tr>\n  </tbody>\n</table>\n<!--- ner1 --->\n\n`it/s` \u2014 news articles per second, 1 article \u2248 1KB.\n\n<!--- ner2 --->\n<table border=\"0\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>init, s</th>\n      <th>disk, mb</th>\n      <th>ram, mb</th>\n      <th>speed, it/s</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>slovnet</th>\n      <td><b>1.0</b></td>\n      <td><b>27</b></td>\n      <td><b>205</b></td>\n      <td>25.3</td>\n    </tr>\n    <tr>\n      <th>slovnet_bert</th>\n      <td>5.0</td>\n      <td>473</td>\n      <td>9500</td>\n      <td><b>40.0 (gpu)</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov</th>\n      <td>5.9</td>\n      <td>1024</td>\n      <td>3072</td>\n      <td>24.3 (gpu)</td>\n    </tr>\n    <tr>\n      <th>deeppavlov_bert</th>\n      <td>34.5</td>\n      <td>2048</td>\n      <td>6144</td>\n      <td>13.1 (gpu)</td>\n    </tr>\n    <tr>\n      <th>deeppavlov_slavic</th>\n      <td>35.0</td>\n      <td>2048</td>\n      <td>4096</td>\n      <td>8.0 (gpu)</td>\n    </tr>\n    <tr>\n      <th>pullenti</th>\n      <td><b>2.9</b></td>\n     
 <td><b>16</b></td>\n      <td><b>253</b></td>\n      <td>6.0</td>\n    </tr>\n    <tr>\n      <th>spacy</th>\n      <td>8.0</td>\n      <td>140</td>\n      <td>625</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>stanza</th>\n      <td>3.0</td>\n      <td>591</td>\n      <td>11264</td>\n      <td>3.0 (gpu)</td>\n    </tr>\n    <tr>\n      <th>texterra</th>\n      <td>47.6</td>\n      <td>193</td>\n      <td>3379</td>\n      <td>4.0</td>\n    </tr>\n    <tr>\n      <th>tomita</th>\n      <td><b>2.0</b></td>\n      <td><b>64</b></td>\n      <td><b>63</b></td>\n      <td><b>29.8</b></td>\n    </tr>\n    <tr>\n      <th>mitie</th>\n      <td>28.3</td>\n      <td>327</td>\n      <td>261</td>\n      <td><b>32.8</b></td>\n    </tr>\n  </tbody>\n</table>\n<!--- ner2 --->\n\n### Morphology\n\n<a href=\"https://github.com/natasha/corus#load_gramru\">Datasets from GramEval2020</a> are used for evaluation:\n\n* `news` \u2014 sample from Lenta.ru.\n* `wiki` \u2014 UD GSD.\n* `fiction` \u2014 SynTagRus + JZ.\n* `social`, `poetry` \u2014 social, poetry subset of Taiga.\n\nSlovnet is compared to a number of existing morphology taggers: <a href=\"https://github.com/natasha/naeval#deeppavlov_morph\"><code>deeppavlov</code></a>, <a href=\"https://github.com/natasha/naeval#deeppavlov_bert_morph\"><code>deeppavlov_bert</code></a>, <a href=\"https://github.com/natasha/naeval#rupostagger\"><code>rupostagger</code></a>, <a href=\"https://github.com/natasha/naeval#rnnmorph\"><code>rnnmorph</code></a>, <a href=\"https://github.com/natasha/naeval#mary\"><code>maru</code></a>, <a href=\"https://github.com/natasha/naeval#udpipe\"><code>udpipe</code></a>, <a href=\"https://github.com/natasha/naeval#spacy\"><code>spacy</code></a>, <a href=\"https://github.com/natasha/naeval#stanza\"><code>stanza</code></a>.\n\nFor every column top 3 results are highlighted. 
`slovnet` was trained only on news dataset:\n\n<!--- morph1 --->\n<table border=\"0\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>news</th>\n      <th>wiki</th>\n      <th>fiction</th>\n      <th>social</th>\n      <th>poetry</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>slovnet</th>\n      <td><b>0.961</b></td>\n      <td>0.815</td>\n      <td>0.905</td>\n      <td>0.807</td>\n      <td>0.664</td>\n    </tr>\n    <tr>\n      <th>slovnet_bert</th>\n      <td><b>0.982</b></td>\n      <td><b>0.884</b></td>\n      <td><b>0.990</b></td>\n      <td><b>0.890</b></td>\n      <td><b>0.856</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov</th>\n      <td>0.940</td>\n      <td>0.841</td>\n      <td>0.944</td>\n      <td>0.870</td>\n      <td><b>0.857</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov_bert</th>\n      <td>0.951</td>\n      <td><b>0.868</b></td>\n      <td><b>0.964</b></td>\n      <td><b>0.892</b></td>\n      <td><b>0.865</b></td>\n    </tr>\n    <tr>\n      <th>udpipe</th>\n      <td>0.918</td>\n      <td>0.811</td>\n      <td><b>0.957</b></td>\n      <td>0.870</td>\n      <td>0.776</td>\n    </tr>\n    <tr>\n      <th>spacy</th>\n      <td><b>0.964</b></td>\n      <td><b>0.849</b></td>\n      <td>0.942</td>\n      <td>0.857</td>\n      <td>0.784</td>\n    </tr>\n    <tr>\n      <th>stanza</th>\n      <td>0.934</td>\n      <td>0.831</td>\n      <td>0.940</td>\n      <td><b>0.873</b></td>\n      <td>0.825</td>\n    </tr>\n    <tr>\n      <th>rnnmorph</th>\n      <td>0.896</td>\n      <td>0.812</td>\n      <td>0.890</td>\n      <td>0.860</td>\n      <td>0.838</td>\n    </tr>\n    <tr>\n      <th>maru</th>\n      <td>0.894</td>\n      <td>0.808</td>\n      <td>0.887</td>\n      <td>0.861</td>\n      <td>0.840</td>\n    </tr>\n    <tr>\n      <th>rupostagger</th>\n      <td>0.673</td>\n      <td>0.645</td>\n      <td>0.661</td>\n      <td>0.641</td>\n      <td>0.636</td>\n    </tr>\n  
</tbody>\n</table>\n<!--- morph1 --->\n\n`it/s` \u2014 sentences per second.\n\n<!--- morph2 --->\n<table border=\"0\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>init, s</th>\n      <th>disk, mb</th>\n      <th>ram, mb</th>\n      <th>speed, it/s</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>slovnet</th>\n      <td><b>1.0</b></td>\n      <td><b>27</b></td>\n      <td><b>115</b></td>\n      <td><b>532.0</b></td>\n    </tr>\n    <tr>\n      <th>slovnet_bert</th>\n      <td>5.0</td>\n      <td>475</td>\n      <td>8087</td>\n      <td><b>285.0 (gpu)</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov</th>\n      <td><b>4.0</b></td>\n      <td>32</td>\n      <td>10240</td>\n      <td>90.0 (gpu)</td>\n    </tr>\n    <tr>\n      <th>deeppavlov_bert</th>\n      <td>20.0</td>\n      <td>1393</td>\n      <td>8704</td>\n      <td>85.0 (gpu)</td>\n    </tr>\n    <tr>\n      <th>udpipe</th>\n      <td>6.9</td>\n      <td>45</td>\n      <td><b>242</b></td>\n      <td>56.2</td>\n    </tr>\n    <tr>\n      <th>spacy</th>\n      <td>8.0</td>\n      <td>140</td>\n      <td>579</td>\n      <td>50.0</td>\n    </tr>\n    <tr>\n      <th>stanza</th>\n      <td><b>2.0</b></td>\n      <td>591</td>\n      <td>393</td>\n      <td><b>92.0</b></td>\n    </tr>\n    <tr>\n      <th>rnnmorph</th>\n      <td>8.7</td>\n      <td><b>10</b></td>\n      <td>289</td>\n      <td>16.6</td>\n    </tr>\n    <tr>\n      <th>maru</th>\n      <td>15.8</td>\n      <td>44</td>\n      <td>370</td>\n      <td>36.4</td>\n    </tr>\n    <tr>\n      <th>rupostagger</th>\n      <td>4.8</td>\n      <td><b>3</b></td>\n      <td><b>118</b></td>\n      <td>48.0</td>\n    </tr>\n  </tbody>\n</table>\n<!--- morph2 --->\n\n### Syntax\n\nSlovnet is compared to several existing syntax parsers: <a href=\"https://github.com/natasha/naeval#udpipe\"><code>udpipe</code></a>, <a href=\"https://github.com/natasha/naeval#spacy\"><code>spacy</code></a>, <a 
href=\"https://github.com/natasha/naeval#deeppavlov_bert_syntax\"><code>deeppavlov</code></a>, <a href=\"https://github.com/natasha/naeval#stanza\"><code>stanza</code></a>.\n\n<!--- syntax1 --->\n<table border=\"0\" class=\"dataframe\">\n  <thead>\n    <tr>\n      <th></th>\n      <th colspan=\"2\" halign=\"left\">news</th>\n      <th colspan=\"2\" halign=\"left\">wiki</th>\n      <th colspan=\"2\" halign=\"left\">fiction</th>\n      <th colspan=\"2\" halign=\"left\">social</th>\n      <th colspan=\"2\" halign=\"left\">poetry</th>\n    </tr>\n    <tr>\n      <th></th>\n      <th>uas</th>\n      <th>las</th>\n      <th>uas</th>\n      <th>las</th>\n      <th>uas</th>\n      <th>las</th>\n      <th>uas</th>\n      <th>las</th>\n      <th>uas</th>\n      <th>las</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>slovnet</th>\n      <td>0.907</td>\n      <td>0.880</td>\n      <td>0.775</td>\n      <td>0.718</td>\n      <td>0.806</td>\n      <td>0.776</td>\n      <td>0.726</td>\n      <td>0.656</td>\n      <td>0.542</td>\n      <td>0.469</td>\n    </tr>\n    <tr>\n      <th>slovnet_bert</th>\n      <td><b>0.965</b></td>\n      <td><b>0.936</b></td>\n      <td><b>0.891</b></td>\n      <td><b>0.828</b></td>\n      <td><b>0.958</b></td>\n      <td><b>0.940</b></td>\n      <td><b>0.846</b></td>\n      <td><b>0.782</b></td>\n      <td><b>0.776</b></td>\n      <td><b>0.706</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov_bert</th>\n      <td><b>0.962</b></td>\n      <td><b>0.910</b></td>\n      <td><b>0.882</b></td>\n      <td><b>0.786</b></td>\n      <td><b>0.963</b></td>\n      <td><b>0.929</b></td>\n      <td><b>0.844</b></td>\n      <td><b>0.761</b></td>\n      <td><b>0.784</b></td>\n      <td><b>0.691</b></td>\n    </tr>\n    <tr>\n      <th>udpipe</th>\n      <td>0.873</td>\n      <td>0.823</td>\n      <td>0.622</td>\n      <td>0.531</td>\n      <td>0.910</td>\n      <td>0.876</td>\n      <td>0.700</td>\n      <td>0.624</td>\n      <td>0.625</td>\n      
<td>0.534</td>\n    </tr>\n    <tr>\n      <th>spacy</th>\n      <td><b>0.943</b></td>\n      <td><b>0.916</b></td>\n      <td><b>0.851</b></td>\n      <td><b>0.783</b></td>\n      <td>0.901</td>\n      <td>0.874</td>\n      <td><b>0.804</b></td>\n      <td><b>0.737</b></td>\n      <td>0.704</td>\n      <td><b>0.616</b></td>\n    </tr>\n    <tr>\n      <th>stanza</th>\n      <td>0.940</td>\n      <td>0.886</td>\n      <td>0.815</td>\n      <td>0.716</td>\n      <td><b>0.936</b></td>\n      <td><b>0.895</b></td>\n      <td>0.802</td>\n      <td>0.714</td>\n      <td><b>0.713</b></td>\n      <td>0.613</td>\n    </tr>\n  </tbody>\n</table>\n<!--- syntax1 --->\n\n`it/s` \u2014 sentences per second.\n\n<!--- syntax2 --->\n<table border=\"0\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>init, s</th>\n      <th>disk, mb</th>\n      <th>ram, mb</th>\n      <th>speed, it/s</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>slovnet</th>\n      <td><b>1.0</b></td>\n      <td><b>27</b></td>\n      <td><b>125</b></td>\n      <td><b>450.0</b></td>\n    </tr>\n    <tr>\n      <th>slovnet_bert</th>\n      <td><b>5.0</b></td>\n      <td>504</td>\n      <td>3427</td>\n      <td><b>200.0 (gpu)</b></td>\n    </tr>\n    <tr>\n      <th>deeppavlov_bert</th>\n      <td>34.0</td>\n      <td>1427</td>\n      <td>8704</td>\n      <td><b>75.0 (gpu)</b></td>\n    </tr>\n    <tr>\n      <th>udpipe</th>\n      <td>6.9</td>\n      <td><b>45</b></td>\n      <td><b>242</b></td>\n      <td>56.2</td>\n    </tr>\n    <tr>\n      <th>spacy</th>\n      <td>9.0</td>\n      <td><b>140</b></td>\n      <td><b>579</b></td>\n      <td>41.0</td>\n    </tr>\n    <tr>\n      <th>stanza</th>\n      <td><b>3.0</b></td>\n      <td>591</td>\n      <td>890</td>\n      <td>12.0</td>\n    </tr>\n  </tbody>\n</table>\n<!--- syntax2 --->\n\n## Support\n\n- Chat \u2014 https://telegram.me/natural_language_processing\n- Issues \u2014 
https://github.com/natasha/slovnet/issues\n- Commercial support \u2014 https://lab.alexkuk.ru\n\n## Development\n\nDev env\n\n```bash\npython -m venv ~/.venvs/natasha-slovnet\nsource ~/.venvs/natasha-slovnet/bin/activate\n\npip install -r requirements/dev.txt\npip install -e .\n```\n\nTest\n\n```bash\nmake test\n```\n\nRent GPU\n\n```bash\nyc compute instance create \\\n  --name gpu \\\n  --zone ru-central1-a \\\n  --network-interface subnet-name=default,nat-ip-version=ipv4 \\\n  --create-boot-disk image-folder-id=standard-images,image-family=ubuntu-1804-lts-ngc,type=network-ssd,size=20 \\\n  --cores=8 \\\n  --memory=96 \\\n  --gpus=1 \\\n  --ssh-key ~/.ssh/id_rsa.pub \\\n  --folder-name default \\\n  --platform-id gpu-standard-v1 \\\n  --preemptible\n\nyc compute instance delete --name gpu\n```\n\nSetup instance\n\n```\nsudo locale-gen ru_RU.UTF-8\n\nsudo apt-get update\nsudo apt-get install -y \\\n  python3-pip\n\n# grpcio takes long to install (~10m), prebuilt wheel not used:\n# \"it is not compatible with this Python\"\nsudo pip3 install -v \\\n  jupyter \\\n  tensorboard\n\nmkdir runs\nnohup tensorboard \\\n  --logdir=runs \\\n  --host=localhost \\\n  --port=6006 \\\n  --reload_interval=1 &\n\nnohup jupyter notebook \\\n  --no-browser \\\n  --allow-root \\\n  --ip=localhost \\\n  --port=8888 \\\n  --NotebookApp.token='' \\\n  --NotebookApp.password='' &\n\nssh -Nf gpu -L 8888:localhost:8888 -L 6006:localhost:6006\n\nscp ~/.slovnet.json gpu:~\nrsync --exclude data -rv . gpu:~/slovnet\nrsync -u --exclude data -rv 'gpu:~/slovnet/*' .\n```\n\nInstall dev\n\n```bash\npip3 install -r slovnet/requirements/dev.txt -r slovnet/requirements/gpu.txt\npip3 install -e slovnet\n```\n\nRelease\n\n```bash\n# Update setup.py version\n\ngit commit -am 'Up version'\ngit tag v0.6.0\n\ngit push\ngit push --tags\n\n# GitHub Actions builds dist and publishes to PyPI\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Deep-learning based NLP modeling for Russian language",
    "version": "0.6.0",
    "split_keywords": [
        "nlp",
        "deeplearning",
        "russian"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7c32d5aff64e3d51ec4021674215680f16b7d2907860c6443b0d058579ac7d59",
                "md5": "4989731dbce0ba173d09158c56fcdc47",
                "sha256": "bdecc3d7cbe5758a675316855d988339592e657565c0f2bc84f5dadb2e056ea4"
            },
            "downloads": -1,
            "filename": "slovnet-0.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4989731dbce0ba173d09158c56fcdc47",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 46662,
            "upload_time": "2023-01-23T08:07:59",
            "upload_time_iso_8601": "2023-01-23T08:07:59.809307Z",
            "url": "https://files.pythonhosted.org/packages/7c/32/d5aff64e3d51ec4021674215680f16b7d2907860c6443b0d058579ac7d59/slovnet-0.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3ed1bba34dec46f1fcb85ca35815268be427ee89d09728c85ae4ab294dd9db09",
                "md5": "4e6d99673c377ff12f679dd08ac7749e",
                "sha256": "02d2257bdc9b9cc1d242bd34ee2c861c648f7083ef7898fe1468abcc381ef799"
            },
            "downloads": -1,
            "filename": "slovnet-0.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4e6d99673c377ff12f679dd08ac7749e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 70734,
            "upload_time": "2023-01-23T08:08:01",
            "upload_time_iso_8601": "2023-01-23T08:08:01.420282Z",
            "url": "https://files.pythonhosted.org/packages/3e/d1/bba34dec46f1fcb85ca35815268be427ee89d09728c85ae4ab294dd9db09/slovnet-0.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-23 08:08:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "natasha",
    "github_project": "slovnet",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "slovnet"
}
        