pynavernews


* Name: pynavernews
* Version: 0.1.0
* Summary: Naver News Scraper
* Upload time: 2024-01-24 06:52:52
* Author: ilotoki0804
* Requires Python: >=3.11,<4.0
* License: Apache-2.0
* Keywords: naver, news, dataset, nlp
# pynavernews

## Introduction

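`pynavernews` is a Naver News crawling library built by reusing part of the code from [`canrevan`](https://github.com/affjljoo3581/canrevan).

The case for this kind of library is made in detail in the [canrevan](https://github.com/affjljoo3581/canrevan#introduction) introduction.

However, this library is not only for natural-language data: it collects comprehensive data from Naver News, such as titles, summaries, publishers, dates, and image URLs.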

## Installation

```console
pip install pynavernews
```

Note that the package name is `pynavernews`, **not** `navernews`. Take care not to install a different package.

## Build from source

First, install git and Python, then clone the repository.

```console
git clone https://github.com/ilotoki0804/pynavernews.git
```

Then create and activate a virtual environment.

On Windows:

```console
py -3.12 -m venv .venv
.venv\Scripts\activate
```

On UNIX:

```console
python3.12 -m venv .venv
source .venv/bin/activate
```

Install Poetry, then install the dependencies.

```console
pip install poetry
poetry install --no-root
```

Run `build.py`.

```console
python build.py
```

The built `whl` and `tar.gz` files will then appear in the `dist` directory.
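If you want to check the build output from a script, a small `pathlib` sketch can group the artifacts in `dist` by type. The helper name `find_build_artifacts` is hypothetical; the file names in the comments are those of the 0.1.0 release files listed in this page's metadata.

```python
from pathlib import Path


def find_build_artifacts(dist_dir: Path = Path("dist")) -> dict[str, list[Path]]:
    """Group the files produced by the build by artifact type."""
    return {
        # Wheels, e.g. pynavernews-0.1.0-py3-none-any.whl
        "wheel": sorted(dist_dir.glob("*.whl")),
        # Source distributions, e.g. pynavernews-0.1.0.tar.gz
        "sdist": sorted(dist_dir.glob("*.tar.gz")),
    }


if __name__ == "__main__":
    for kind, paths in find_build_artifacts().items():
        for path in paths:
            print(f"{kind}: {path.name}")
```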

## How to use

Find the ID of the category you want to collect on [Naver News](https://news.naver.com/). For example, politics is 100 and economy is 101.

## Example

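The following code collects news in the politics (100) and economy (101) categories from January 1 to January 15, 2024, using 5 index pages per category and date. `fetch_and_store_news_raw_data` is a coroutine, so a plain script runs it with `asyncio.run()`; in environments that allow top-level `await` (such as Jupyter), you can `await` it directly.

```python
import asyncio
from datetime import datetime
from pathlib import Path

from pynavernews import (
    string_date_range,
    construct_index_page_urls,
    fetch_and_store_news_raw_data,
)


async def main() -> None:
    date_range = string_date_range(datetime(2024, 1, 1), datetime(2024, 1, 15), 1)
    # Categories 100 (politics) and 101 (economy), 5 index pages each
    index_page_urls = construct_index_page_urls([100, 101], date_range, 5)
    await fetch_and_store_news_raw_data(
        index_page_urls,
        concurrent_tasks=10,
        result_path=Path("result.jsonperline"),
        timeout=20,
        extractor=None,  # store list-page fields only, without the full text
        proceed=True,
    )


asyncio.run(main())
```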
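The library's internals are not described here, but the `original_url` values in the sample output suggest what the generated index pages look like. The following is a plausible reconstruction, not the library's code: `yyyymmdd_range` and `build_index_urls` are hypothetical names, the third argument of `string_date_range` is assumed to be a step in days, and this sketch treats the end date as inclusive, which the original does not state.

```python
from datetime import date, timedelta


def yyyymmdd_range(start: date, end: date, step_days: int = 1) -> list[str]:
    """Dates from start to end (inclusive in this sketch), formatted as YYYYMMDD."""
    days = (end - start).days
    return [
        (start + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(0, days + 1, step_days)
    ]


def build_index_urls(category_ids: list[int], dates: list[str], pages: int) -> list[str]:
    """One list-page URL per (category, date, page), following the original_url pattern."""
    base = "https://news.naver.com/main/list.nhn?mode=LSD&mid=shm"
    return [
        f"{base}&sid1={category_id}&date={day}&page={page}"
        for category_id in category_ids
        for day in dates
        for page in range(1, pages + 1)
    ]


urls = build_index_urls([100, 101], yyyymmdd_range(date(2024, 1, 1), date(2024, 1, 15)), 5)
print(len(urls))  # 2 categories x 15 days x 5 pages = 150
print(urls[0])  # https://news.naver.com/main/list.nhn?mode=LSD&mid=shm&sid1=100&date=20240101&page=1
```

The URL count grows multiplicatively with categories, dates, and pages, which is worth keeping in mind when choosing `concurrent_tasks` and `timeout`.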

If the articles are collected successfully, the results are saved as JSON, one object per line, like this:

```json
{"original_url": "https://news.naver.com/main/list.nhn?mode=LSD&mid=shm&sid1=101&date=20240114&page=1", "image_url": "https://imgnews.pstatic.net/image/origin/018/2024/01/14/5654670.jpg?type=nf106_72", "article_url": "https://n.news.naver.com/mnews/article/018/0005654670?sid=101", "title": "중동정세 불안에 유가 ‘꿈틀’…산업부, 국내 수급상황 점검", "summary": "정부가 주말인 14일 정유 4사 등 관계기업·기관과 국내 석유·가스 수급 현황과 국제유가 영향 점검에 나섰다. 최남호(오른쪽)  …", "publisher": "이데일리", "date_string": "2024-01-14T23:16:00"}
```

All values are strings, except that `image_url` may be `null`.

If you want the full article text and not just the `summary`, use `FullExtractor`.

```python
import asyncio
from datetime import datetime
from pathlib import Path

from pynavernews import (
    string_date_range,
    construct_index_page_urls,
    fetch_and_store_news_raw_data,
    FullExtractor,
)


async def main() -> None:
    date_range = string_date_range(datetime(2024, 1, 1), datetime(2024, 1, 15), 1)
    index_page_urls = construct_index_page_urls([100, 101], date_range, 5)
    await fetch_and_store_news_raw_data(
        index_page_urls,
        concurrent_tasks=10,
        result_path=Path("result-full.jsonperline"),
        timeout=20,
        extractor=FullExtractor(),  # also store the full article text in `content`
        proceed=True,
    )


asyncio.run(main())
```

Each record then also contains `reporter_name` and the full article text in `content`. The `summary` field is still included.

```json
{"original_url": "https://news.naver.com/main/list.nhn?mode=LSD&mid=shm&sid1=101&date=20240101&page=5", "image_url": "https://imgnews.pstatic.net/image/origin/005/2024/01/01/1663580.jpg?type=nf106_72", "article_url": "https://n.news.naver.com/mnews/article/005/0001663580?sid=101", "title": "上上, 현실이 되나", "summary": "증권가는 이미 올해 코스피에 대한 장밋빛 전망이 한창이다. 주요국의 통화정책 기조 전환과 국내 수출 회복 전망 등이 맞물리면서  …", "publisher": "국민일보", "date_string": "2024-01-01T20:54:00", "reporter_name": "신재희 기자(jshin@kmib.co.kr)", "content": "증권가, 증시 장밋빛 전망 잇달아\n금리 인하·수출 회복 낙관론 우세\n코스피 2655 마감, 1년새 18.7% ↑\n올해 최대 3000선 돌파 기대감\n이미지를 크게 보려면 국민일보 홈페이지에서 여기를 클릭하세요\n증권가는 이미 올해 코스피에 대한 장밋빛 전망이 한창이다. 주요국의 통화정책 기조 전환과 국내 수출 회복 전망 등이 맞물리면서 증시에 긍정적 흐름이 이어질 것이라는 예상이다.\n1일 금융투자업계에 따르면 코스피 지수는 지난해 마지막 거래일인 28일 2655.28에 장을 마감했다. 코스피 지수는 지난달 미국 연방준비제도(Fed·연준)의 금리 인하 기대감에 힘입어 계속 상승 기세를 이어갔다. 지난해 첫 거래일 시초가와 비교한 연간 상승률은 18.7%다.\n증권가는 올해 코스피 전망 범위를 상향 조정했다. ‘코스피 3000’을 기대하는 증권사도 나왔다. 증시 전망을 가장 낙관적으로 본 곳은 대신증권으로 코스피 변동 폭을 2350~2850으로 제시했다. 특히 미국이 오는 3월 금리 인하를 단행할 경우 코스피 3000선 돌파도 가능할 것으로 봤다.\nKB증권(상단만 2810으로 제시)과 신한투자증권(2200~2800)도 코스피가 2800대까지 오를 수 있을 것으로 내다봤다. 한국투자증권(2300~2750), NH투자증권(2300~2750). 삼성증권(2200~2750)은 2750을 코스피 고점으로 예상했다. 하나증권은 코스피 변동 폭을 2350~2700으로 제시해 상단이 가장 낮았다.\n상고하저? 상저하고? 엇갈린 전망\n증권사들은 연간 시장 흐름에 대해서는 다소 엇갈린 관측을 내놨다. 주요한 ‘변곡점’으로 꼽히는 미국의 금리 인하와 대통령 선거를 기준으로 증시의 상승·하락 시점이 다를 것이라는 분석이다.\n대신증권과 NH투자증권은 하반기 반등을 기대하는 ‘상저하고’ 흐름을 예상했다. 상반기 저점을 찍고 하반기로 갈수록 기업 이익과 경제가 점차 회복되면서 증시도 함께 상승세를 탈 것이라는 예측이다. 이경민 대신증권 연구원은 “상반기는 물가 수준, 연준의 통화정책 스탠스, 시장의 금리 인하 기대가 뒤섞이며 글로벌 금융시장이 혼란스러운 흐름을 보일 것”이라며 “다만 하반기 금리 인하 사이클 진입 시 시장의 방향성은 명확해질 것”이라고 말했다.\n김병연 NH투자증권 연구원은 “미 대선이 치러지는 해의 6월과 11월은 정책 불확실성이 확대되는 가운데 통상 9월이 고점을 찍는다”며 “국내 주식시장도 1분기 낮은 지수대에서 출발해 3분기 고점을 형성할 것”이라고 말했다.\n반면 기준금리 인하를 기점으로 증시가 고점을 찍은 뒤 조정을 받을 것이라는 ‘상고하저’ 의견도 적지 않다. 한국투자증권과 신한투자증권 등은 미국 대선 등 정치 이벤트가 증시 불확실성을 키울 것으로 전망했다.\n노동길 신한투자증권 연구원은 “상반기 재고순환 사이클 회복과 반도체 경기 개선에 따른 코스피 상승세가 기대되고, 하반기에는 미국 대선을 앞둔 경계감과 경기 사이클의 하강 국면, 2025년 증시 이슈들이 부담이 될 것”이라고 예측했다. 김대준 한국투자증권 연구원도 “상반기는 금리 인하와 정부의 증시 부양책 효과가 이어지다 2분기 고점을 찍고 하반기 들어 정책효과 소멸과 대외 정치 리스크로 지수가 흔들릴 수 있다”며 ‘상고하저’ 전망에 힘을 실었다.\n다만 국내 증시가 지난해 말부터 미 연준의 금리 인하 기대감을 선반영한 측면이 있어 향후 과도한 기대감은 경계해야 한다는 목소리도 적지 않다. 정용택 IBK투자증권 연구원은 “2024년은 이미 높아진 추세적 불확실성에 선거, 지정학적 위험 등 외적 위험이 증가하는 시기”라며 “시장의 기대가 급격하게 낙관적으로 변하고 있고 주요 투자은행의 전망치가 빠르게 상향조정되고 있지만 경제와 투자환경은 (낙관하기에) 여전히 조심스럽다”고 말했다.\n올해 증시 주도는 ‘반도체’\n증권업계는 올해 증시를 이어갈 주도주로 단연 반도체를 꼽고 있다. 침체기를 맞았던 반도체 시장이 올해부터 ‘슈퍼 사이클’로 접어들 것이라는 기대감에서다. 반도체업계는 올해 전 세계 메모리 반도체 시장(D램·낸드) 규모가 지난해보다 66% 증가한 1310억 달러(약 170조원)를 기록하고, 2025년에는 전년 대비 39% 증가한 1820억 달러(약 235조원)를 기록할 것으로 예상하고 있다.\n이미 지난해 말부터 반도체주는 영향력을 확대하고 있다. SK하이닉스는 2년 만에 시총 2위를 탈환했고, 삼성전자도 증시 마지막 거래일이던 지난달 28일 7만8500원에 거래를 마치며 2년 만에 ‘8만 전자’ 탈환을 눈앞에 뒀다. 지난해 첫 거래일과 비교하면 각각 41.4%, 89.4% 오른 수치다.\n올해에도 이들 기업의 가파른 회복세가 예상된다. 금융정보업체 에프앤가이드에 따르면 삼성전자의 연결 기준 영업이익은 지난해 7조3443억원에서 올해 33조8109억원, 2025년 49조2039억원으로 늘어날 것으로 전망된다. SK하이닉스의 올해 연간 영업이익 컨센서스도 8조3671억원으로 2021년(21조4103억원) 이후 3년 만에 최대 실적이 가능할 것이라는 시장의 기대가 나온다.\n대형 반도체주뿐 아니라 소부장(소재·부품·장비) 종목에 대한 기대도 커지고 있다. 특히 주가 반등 국면에서는 상대적으로 시가총액이 적은 중소형 소부장 종목의 반등 폭이 더 클 수 있다는 관측이다."}
```

All values are strings, except that `image_url` and `reporter_name` may be `null`.

To read the data back, read the file one line at a time (for example with `f.readline()`) and parse each line as JSON.
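A minimal reader for the result file, assuming it is UTF-8 encoded (the samples contain unescaped Korean text), can iterate over the file object, which reads line by line just like repeated `f.readline()` calls. `read_records` is a hypothetical name.

```python
import json
from collections.abc import Iterator
from pathlib import Path


def read_records(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON-lines result file."""
    with path.open(encoding="utf-8") as f:
        for line in f:  # iterating the file reads it one line at a time
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    result = Path("result.jsonperline")
    if result.exists():
        for record in read_records(result):
            print(record["publisher"], record["title"])
```

Because it is a generator, only one record is held in memory at a time, which helps with large collections.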

## License

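pynavernews is distributed under the Apache-2.0 license.

pynavernews includes part of the code from the [canrevan](https://github.com/affjljoo3581/canrevan) repository.

The examples and test data in this repository include works from Wikipedia (위키백과, 위키피디아), 부산일보 (Busan Ilbo), 연합뉴스TV (Yonhap News TV), KBS, YTN, 이코노미스트 (Economist), 뉴시스 (Newsis), 데일리안 (Dailian), SBS Biz, and 국민일보 (Kukmin Ilbo).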

## Changelog

* 0.1.0: Initial release

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "pynavernews",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11,<4.0",
    "maintainer_email": "",
    "keywords": "naver,news,dataset,nlp",
    "author": "ilotoki0804",
    "author_email": "ilotoki0804@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/82/58/ea8647c05985a1a9ee005d88f6f206466fcc179767a8bb318b79d0072bcb/pynavernews-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Naver News Scraper",
    "version": "0.1.0",
    "project_urls": null,
    "split_keywords": [
        "naver",
        "news",
        "dataset",
        "nlp"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2df32cb270364016452b16940fb634670083b3bd47f6e2efc1b10430fd5c91d9",
                "md5": "b299a97f13ed3547883bbb4e4da83449",
                "sha256": "ba952d6d5460ba09e8f4cf538451cc39478bf225358e981bce94b10f425e3ac6"
            },
            "downloads": -1,
            "filename": "pynavernews-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b299a97f13ed3547883bbb4e4da83449",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11,<4.0",
            "size": 14105,
            "upload_time": "2024-01-24T06:52:51",
            "upload_time_iso_8601": "2024-01-24T06:52:51.069530Z",
            "url": "https://files.pythonhosted.org/packages/2d/f3/2cb270364016452b16940fb634670083b3bd47f6e2efc1b10430fd5c91d9/pynavernews-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8258ea8647c05985a1a9ee005d88f6f206466fcc179767a8bb318b79d0072bcb",
                "md5": "9b16e8e16827f0f56ff9df28d7be9c7b",
                "sha256": "49ed2b726fb0d5a440a09b195ddc0d10848baa28a2b90dc069f25985e306606f"
            },
            "downloads": -1,
            "filename": "pynavernews-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "9b16e8e16827f0f56ff9df28d7be9c7b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11,<4.0",
            "size": 13456,
            "upload_time": "2024-01-24T06:52:52",
            "upload_time_iso_8601": "2024-01-24T06:52:52.585774Z",
            "url": "https://files.pythonhosted.org/packages/82/58/ea8647c05985a1a9ee005d88f6f206466fcc179767a8bb318b79d0072bcb/pynavernews-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-24 06:52:52",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "pynavernews"
}
        