NLTK Alberts Pumpurs

Python NLTK

Jul 17, 2015

Transcript
Page 1: Python NLTK

NLTK

Alberts Pumpurs

Page 2: Python NLTK

90% of the world's data was generated over the last two years

Page 3: Python NLTK

Common Internet

user creates

Visual Textual

Instagram

Flickr

Vscocam

Facebook

Tumblr

Blogger

Twitter

Facebook

Emails

Customer Reviews

Page 4: Python NLTK

Detecting hidden signals

Page 5: Python NLTK

The world is full of unstructured, text-rich data: everything from emails to customer tweets.

The information buried in all that text holds the potential to deliver valuable business insights.

Page 6: Python NLTK

Text analytics is the practice of using technology to gather, store, and mine textual information for hidden signals that can be used to inform smarter business decisions.

Page 7: Python NLTK

An explosion of unstructured data

Page 8: Python NLTK

Many types of organizations are experiencing explosive growth in their unstructured enterprise data.

At the same time, they have access to external sources of data such as social media, blogs, and mobile data.

Page 9: Python NLTK

Until now, much of this information passed through the organization virtually unanalyzed. Today, new tools for handling large amounts of complex data make it easier to squeeze value from such unlikely sources.

Page 10: Python NLTK

Text Processing use cases

Page 11: Python NLTK

sentiment analysis
spam filtering
text categorization
topic detection
keyword frequency
plagiarism detection
document similarity
phrase extraction

Page 12: Python NLTK

Natural Language Toolkit (NLTK)

A leading platform for building Python programs to work with human language data

Page 13: Python NLTK

NLTK Features

Page 14: Python NLTK

sentence and word tokenization
text classification
corpora
parsing
clustering
part-of-speech tagging
text stemming

and much more...
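
As one illustration of the features listed above, stemming reduces inflected words to a common root. The following is a minimal sketch, assuming NLTK is installed. `PorterStemmer` is only one of the stemmers NLTK ships, and the sample words are made up for illustration.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical sample words, chosen only to show suffix stripping
words = ["running", "cats", "plots"]

# Stem each word independently; no corpus download is needed for this stemmer
stems = [stemmer.stem(word) for word in words]
print(stems)
```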

Page 15: Python NLTK

Sentence tokenization
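
The slide's original example is not preserved in this transcript, so the following is only a sketch of sentence tokenization. It assumes NLTK is installed and uses an untrained `PunktSentenceTokenizer`, which needs no model download but knows no abbreviations. The more common `nltk.sent_tokenize` relies on a pretrained model that must be downloaded first. The sample text is invented.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Invented sample text with two plain sentences
text = "NLTK is a Python library. It works with human language data."

# Untrained tokenizer: splits on sentence-final punctuation using default parameters
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
```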

Page 16: Python NLTK

Word tokenization
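
Word tokenization splits a sentence into words and punctuation marks. The sketch below assumes NLTK is installed and uses `TreebankWordTokenizer`, a rule-based tokenizer that works without downloading any data. The sentence is an invented example, not the one from the original slide.

```python
from nltk.tokenize import TreebankWordTokenizer

# Invented sample sentence
sentence = "The information is buried in emails and tweets."

# Rule-based tokenization; the final period becomes its own token
tokens = TreebankWordTokenizer().tokenize(sentence)
print(tokens)
```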

Page 17: Python NLTK

Part of speech tagging
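
Part-of-speech tagging assigns a grammatical tag to each token. In practice `nltk.pos_tag` is the usual entry point, but it needs a downloaded model. The sketch below instead trains a tiny `UnigramTagger` on a hand-tagged toy corpus (an assumption made purely for illustration) and falls back to `NN` for unknown words.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Toy hand-tagged training data (hypothetical, far too small for real use)
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

# Unknown words fall back to the default tag "NN"
backoff = DefaultTagger("NN")
tagger = UnigramTagger(train_sents, backoff=backoff)

tagged = tagger.tag(["the", "cat", "barks", "loudly"])
print(tagged)
```

The word "loudly" never appears in the training data, so it receives the fallback tag, which shows why a realistic tagger needs a much larger training corpus.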

Page 18: Python NLTK

Part of speech tagging explanation

CC   Coordinating conjunction
CD   Cardinal number
DT   Determiner
EX   Existential "there"
FW   Foreign word
IN   Preposition or subordinating conjunction
JJ   Adjective
JJR  Adjective, comparative
JJS  Adjective, superlative
LS   List item marker
MD   Modal
NN   Noun, singular or mass
NNS  Noun, plural
NNP  Proper noun, singular

nltk.help.upenn_tagset()  # prints the full tag set

Page 19: Python NLTK

Chunking and NER
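
Chunking groups tagged tokens into phrases. A minimal sketch, assuming NLTK is installed: `RegexpParser` applies a hand-written chunk grammar to an already tagged sentence. The grammar and sentence are illustrative assumptions. Named entity recognition with `nltk.ne_chunk` follows a similar tree-based pattern but requires downloaded models, so it is not shown here.

```python
from nltk.chunk import RegexpParser

# A sentence that has already been part-of-speech tagged (illustrative)
tagged = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

# Noun phrase: optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = RegexpParser(grammar)
tree = chunker.parse(tagged)

# Collect the text of every NP chunk found in the parse tree
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```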

Page 20: Python NLTK

Text classification algorithms in NLTK

Naive Bayes
Maximum Entropy
Decision Tree

Page 21: Python NLTK

Text classification
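
As a sketch of how text classification can look with NLTK's Naive Bayes classifier: each text is turned into a dictionary of features, and the classifier is trained on labelled examples. The training texts, the labels, and the `word_features` helper are all hypothetical and far smaller than anything usable in practice.

```python
from nltk.classify import NaiveBayesClassifier


def word_features(text):
    """Bag-of-words features: every lowercase word maps to True."""
    return {word: True for word in text.lower().split()}


# Tiny hypothetical training set
training_texts = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
train_set = [(word_features(text), label) for text, label in training_texts]

classifier = NaiveBayesClassifier.train(train_set)

spam_guess = classifier.classify(word_features("free prize money"))
ham_guess = classifier.classify(word_features("monday project meeting"))
print(spam_guess, ham_guess)
```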

Page 22: Python NLTK

Sentiment analysis

https://github.com/pumpurs/SentimentWordsLV/
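
One simple way to use a sentiment word list is lexicon-based scoring: count positive and negative words and compare. The sketch below is an assumption about how such a list could be applied, not the author's method. The two word sets are tiny hypothetical English examples and are not the contents of the repository linked above.

```python
# Hypothetical mini-lexicons; a real word list would be far larger
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def sentiment_label(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("I love this great phone!"))
print(sentiment_label("Terrible service, bad food."))
```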

Page 23: Python NLTK

Document similarity detection

Tf-idf stands for term frequency-inverse document frequency, and the tf-idf weight is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. 
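
The weighting described above can be sketched in plain Python. This version assumes one common variant, tf = term count / document length and idf = log(N / document frequency), and compares documents by cosine similarity. Other tf-idf variants exist, and the three sample documents are invented.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample documents, already tokenized by whitespace
docs = [
    "python text mining with nltk".split(),
    "text mining for business insights".split(),
    "web development with python".split(),
]
vectors = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share "text" and "mining"
sim_12 = cosine_similarity(vectors[1], vectors[2])  # share no terms
print(sim_01, sim_12)
```

A term that occurs in every document gets an idf of log(1) = 0, so it contributes nothing to the similarity, which is the intended effect of the inverse document frequency.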

Page 24: Python NLTK

Similarity and concordance
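
A concordance shows every occurrence of a word together with its surrounding context. The sketch below assumes NLTK is installed and uses `nltk.text.Text` and `ConcordanceIndex` on an invented token list. `Text.similar`, which lists words used in similar contexts, works on the same `Text` object but only gives useful results on a much larger text.

```python
from nltk.text import ConcordanceIndex, Text

# Invented sample text, tokenized by whitespace
tokens = "Text data is everywhere and unstructured data keeps growing".split()

# Print each occurrence of "data" with some context on both sides
text = Text(tokens)
text.concordance("data", width=50, lines=5)

# The same lookup as token positions, matched case-insensitively
index = ConcordanceIndex(tokens, key=lambda s: s.lower())
offsets = index.offsets("data")
print(offsets)
```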

Page 25: Python NLTK

Dispersion Plot

Page 26: Python NLTK

But where is the

Page 27: Python NLTK

“Market and product research”

“Social CMS”

“Customer profiling / analytics”

1.97 billion social network users

70% of marketers used Facebook to gain

6.7 million people blog on blogging sites

Page 28: Python NLTK

Big Data, Startups, Text Analysis, Internet of Things, Web Development