Python Machine Learning Chapter 06. Text Analysis & Chatbot, Ryan Jeong

Python machine learning_chap06_1

Jan 21, 2018



PartPrime

Today …

6-1 KOREAN MORPHOLOGY

6-2 ABOUT Word2Vec

6-1 KOREAN MORPHOLOGY with KoNLPy

Installing KoNLPy

Install the JDK

Install KoNLPy

$pip3 install konlpy or
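KoNLPy runs its analyzers on the JVM through JPype, so both installation steps above have to succeed. As a quick standard-library sketch, the check below reports whether a Java runtime and the konlpy package can be found. Treating a java executable on PATH, or a set JAVA_HOME, as evidence of a JDK is a heuristic assumption, not KoNLPy's own test.

```python
import importlib.util
import os
import shutil


def check_konlpy_env():
    """Report whether a Java runtime and the konlpy package can be found.

    Heuristic (an assumption): a JDK counts as present if `java` is on
    PATH or JAVA_HOME is set.
    """
    has_java = shutil.which("java") is not None or bool(os.environ.get("JAVA_HOME"))
    # find_spec returns None when the top-level package is not installed.
    has_konlpy = importlib.util.find_spec("konlpy") is not None
    return {"java": has_java, "konlpy": has_konlpy}


if __name__ == "__main__":
    for name, ok in check_konlpy_env().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```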

Exercise Step 1: Basic morphological analysis

Source

Output

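This example is the most basic one, using the Twitter analyzer from among the many Korean morphological analysis libraries.

Among Korean morphological analyzers, Mecab is known to be the fastest, but I personally prefer Twitter because its normalization feature is good, which makes it useful in many ways later on when training models.

Reference: http://konlpy-ko.readthedocs.io/ko/v0.4.3/morph/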
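The source code for this step is not included in this transcript, so here is a minimal sketch of the same idea rather than the original. It assumes KoNLPy 0.5.0 or later, where the Twitter analyzer was renamed Okt (with 0.4.x, import Twitter instead). Okt.pos() returns (morpheme, tag) pairs, and norm=True / stem=True switch on the normalization and stemming mentioned above. The function names and the tagger parameter are hypothetical, added so the formatting can be tried without a JVM.

```python
def morph_analyze(text, tagger=None):
    """Return (morpheme, tag) pairs for `text`.

    By default uses KoNLPy's Okt (called Twitter before KoNLPy 0.5.0),
    which needs a working JDK. `tagger` is a hypothetical injection point:
    any object with a compatible pos() method.
    """
    if tagger is None:
        from konlpy.tag import Okt  # imported lazily: requires KoNLPy + JDK
        tagger = Okt()
    # norm=True normalizes colloquial spellings; stem=True reduces
    # verbs and adjectives to their dictionary form.
    return tagger.pos(text, norm=True, stem=True)


def format_tags(pairs):
    """Render [(morpheme, tag), ...] as space-separated 'morpheme/tag' tokens."""
    return " ".join(f"{morph}/{tag}" for morph, tag in pairs)
```

With KoNLPy and a JDK installed, print(format_tags(morph_analyze("아버지가방에들어가신다"))) shows each morpheme with its tag; the exact split depends on the KoNLPy version and its dictionary.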

Exercise Step 2: Morphological analysis + word frequency analysis

Source

Output

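When you run this example, it extracts only the nouns, counts how many times each noun appears, and stores each noun together with its count.

It then loops over the results with a for statement and prints the 50 most frequent words, in descending order of frequency, in the form 'noun(count)'.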
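A minimal sketch of the Step 2 logic, assuming the text has already been run through Okt/Twitter pos(), which tags nouns as "Noun". collections.Counter does the counting, and most_common(50) returns the top 50 in descending order of frequency. The function names and sample tokens are hypothetical.

```python
from collections import Counter


def top_nouns(tagged, n=50):
    """Count nouns in [(morpheme, tag), ...] and return the n most frequent.

    Assumes Okt/Twitter-style tags, where nouns are tagged "Noun".
    """
    counts = Counter(word for word, tag in tagged if tag == "Noun")
    return counts.most_common(n)  # [(noun, count), ...], highest count first


def format_counts(pairs):
    """Format [(noun, count), ...] as space-separated 'noun(count)' strings."""
    return " ".join(f"{word}({count})" for word, count in pairs)


if __name__ == "__main__":
    # Hypothetical pre-tagged tokens standing in for real Okt output.
    tagged = [("칼뱅", "Noun"), ("은", "Josa"), ("신학", "Noun"),
              ("칼뱅", "Noun"), ("의", "Josa"), ("신학", "Noun"), ("칼뱅", "Noun")]
    print(format_counts(top_nouns(tagged)))  # 칼뱅(3) 신학(2)
```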

6-2 ABOUT Word2Vec with KoNLPy

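What is Word2Vec?

A technique that converts words into numeric vectors in order to represent how the words within sentences relate to one another. Words that appear in similar contexts end up with similar vectors.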
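Once each word is a vector, the closeness of two words is usually measured by cosine similarity, cos(u, v) = u·v / (‖u‖‖v‖), which is also the score gensim's most_similar ranks by. The vectors below are made-up three-dimensional toys for illustration only; trained Word2Vec vectors usually have 100 or more dimensions (gensim's default is 100).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (range -1 to 1).

    Raises ZeroDivisionError if either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy vectors, not taken from a trained model.
    word_a = [0.9, 0.1, 0.3]
    word_b = [0.8, 0.2, 0.35]
    word_c = [0.1, 0.9, 0.0]
    print(round(cosine_similarity(word_a, word_b), 3))  # close to 1: similar
    print(round(cosine_similarity(word_a, word_c), 3))  # much lower: dissimilar
```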

Installing Gensim for Word2Vec

$pip3 install gensim or
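One thing worth checking after installation: gensim 4.0 renamed some Word2Vec parameters (size became vector_size, iter became epochs), so examples written for gensim 3.x, as was current in early 2018, can fail on a recent install. The hypothetical helper below picks the right keyword names from a version string, which importlib.metadata (Python 3.8+) can read from the installed package.

```python
from importlib import metadata


def word2vec_kwargs(dim, epochs, gensim_version):
    """Return Word2Vec dimension/epoch keyword arguments for a gensim version.

    gensim >= 4.0 uses vector_size/epochs; 3.x used size/iter.
    """
    major = int(gensim_version.split(".")[0])
    if major >= 4:
        return {"vector_size": dim, "epochs": epochs}
    return {"size": dim, "iter": epochs}


if __name__ == "__main__":
    try:
        version = metadata.version("gensim")
    except metadata.PackageNotFoundError:
        print("gensim is not installed")
    else:
        print(version, word2vec_kwargs(100, 5, version))
```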

Exercise Step 1: Building a Word2Vec model

Source

Output

Running this example produces the output shown above.

The calvin.wakati file is a new text file, saved after removing particles, verb endings, punctuation and similar elements from the original text.
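The slides do not show how calvin.wakati was produced. One plausible sketch, assuming each sentence of the original text was tagged with Okt/Twitter pos() and that particles, endings and punctuation correspond to its Josa, Eomi and Punctuation tags: drop those tokens and write the rest space-separated, one sentence per line, which is the layout gensim's LineSentence reads. The tag set and function names are assumptions, not the author's code.

```python
# Okt/Twitter tags treated as noise (an assumption about the original preprocessing).
DROP_TAGS = {"Josa", "Eomi", "Punctuation"}


def to_wakati_line(tagged):
    """Keep content morphemes from [(morpheme, tag), ...], space-separated."""
    return " ".join(word for word, tag in tagged if tag not in DROP_TAGS)


def write_wakati(tagged_sentences, path):
    """Write one preprocessed sentence per line (the layout LineSentence expects)."""
    with open(path, "w", encoding="utf-8") as f:
        for tagged in tagged_sentences:
            line = to_wakati_line(tagged)
            if line:  # skip sentences that contained only dropped tokens
                f.write(line + "\n")
```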

Beyond the printed output, however, the program also creates a file named calvin.model, and that file is the program's real result.
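A sketch of the training step using the gensim 4.x API: LineSentence streams calvin.wakati line by line, Word2Vec trains on it, and save() writes calvin.model. The hyperparameters (vector_size=100, window=5, min_count=1, sg=1 for skip-gram) are illustrative assumptions, not the values used in the original exercise; on gensim 3.x, pass size instead of vector_size. The function name is hypothetical.

```python
def build_word2vec(wakati_path="calvin.wakati", model_path="calvin.model"):
    """Train Word2Vec on a whitespace-tokenized, one-sentence-per-line file.

    Hyperparameters are illustrative assumptions (gensim 4.x names).
    """
    from gensim.models import Word2Vec  # lazy import: requires gensim
    from gensim.models.word2vec import LineSentence

    sentences = LineSentence(wakati_path)  # yields each line as a list of tokens
    model = Word2Vec(
        sentences,
        vector_size=100,  # dimensionality of each word vector
        window=5,         # context words considered on each side
        min_count=1,      # keep rare words; the corpus is small
        sg=1,             # 1 = skip-gram, 0 = CBOW
    )
    model.save(model_path)  # the saved model is the program's real product
    return model
```

Calling build_word2vec() from the directory that holds calvin.wakati writes calvin.model to the current directory.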

Exercise Step 2: Putting the Word2Vec model to use

Source

Output

Now we load the saved model and extract the words closest to '칼뱅' (Calvin).

The printed words are those that, after the model was trained on the input text, scored a similarity of roughly 98 points or higher.
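A sketch of this step, again assuming gensim 4.x, where similarity queries go through the model's wv attribute. most_similar() returns (word, cosine similarity) pairs sorted from most to least similar. Reading the slide's "98 points" as a cosine similarity of 0.98 is an assumption, topn=20 is illustrative, and the function names are hypothetical.

```python
def filter_by_similarity(pairs, threshold=0.98):
    """Keep (word, score) pairs whose score is at least `threshold`."""
    return [(word, score) for word, score in pairs if score >= threshold]


def similar_words(word="칼뱅", model_path="calvin.model", topn=20, threshold=0.98):
    """Load a saved Word2Vec model and list neighbours of `word` above threshold."""
    from gensim.models import Word2Vec  # lazy import: requires gensim

    model = Word2Vec.load(model_path)
    # Raises KeyError if `word` never appeared in the training text.
    pairs = model.wv.most_similar(word, topn=topn)
    return filter_by_similarity(pairs, threshold)
```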

Thank you
http://www.partprime.com