Learning Bit by Bit
Class 4 - Ngrams
Ngrams
• Counting words
• Using observation to make predictions
Ngrams
• Corpus (plural: corpora): a body of text we count over
Unigram
• “how’s the weather out there?”
• [how’s, the, weather, out, there]
Unigram
• how many words are there?
Unigram
• How many times does “weather” occur?
Unigram
• Prob “weather” = occurrences of “weather” / total # of words
Unigram
• P(“weather”) = c(“weather”) / c(total)
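The unigram formula above can be sketched in a few lines of Python. This is a minimal illustration using the slide's example sentence; whitespace tokenization is an assumption, not part of the slides.

```python
from collections import Counter

# Tokenize the slide's example sentence (naive whitespace split -- an assumption)
tokens = "how's the weather out there".split()
counts = Counter(tokens)

# P("weather") = c("weather") / c(total)
p_weather = counts["weather"] / len(tokens)
print(p_weather)  # 1 occurrence out of 5 tokens -> 0.2
```

With one occurrence of “weather” among five tokens, the unigram probability is 1/5.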
Bigram
• “the storm swept through the land”
• [(the, storm), (storm, swept), (swept, through), (through, the), (the, land)]
Bigram
• How many times does “storm” follow “the”?
Bigram
• How many times does the word “the” occur?
Bigram
• Prob of “storm” given “the” = occurrences of “the storm” / occurrences of “the”
Bigram
• P(“storm” | “the”) = c(“the storm”) / c(“the”)
• P(word n | word n−1)
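The bigram estimate above can be computed directly from counts. A minimal sketch, again assuming whitespace tokenization of the slide's example sentence:

```python
from collections import Counter

tokens = "the storm swept through the land".split()

# Build the bigram list shown on the slide: adjacent word pairs
bigrams = list(zip(tokens, tokens[1:]))

bigram_counts = Counter(bigrams)
unigram_counts = Counter(tokens)

# P("storm" | "the") = c("the storm") / c("the")
p = bigram_counts[("the", "storm")] / unigram_counts["the"]
print(p)  # "the storm" occurs once, "the" occurs twice -> 0.5
```

“the” appears twice but is followed by “storm” only once, so the conditional probability is 1/2.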
Markov Assumption
• The assumption that the probability of a word depends only on the previous word (or previous N words), not the entire history
• P(“land” | “the”)
• P(“land” | “the storm swept through the”)
N-gram
• Extends the bigram model to condition on the previous N−1 words
Maximum Likelihood Estimation
• N-gram probability estimated from corpus counts
• P(word n | word n−1) = count of word n−1 followed by word n / count of all times word n−1 occurs
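The MLE recipe generalizes beyond bigrams: count each N-gram, divide by the count of its (N−1)-word context. A hedged sketch (the function name and n ≥ 2 restriction are my own, not from the slides):

```python
from collections import Counter

def ngram_mle(tokens, n):
    """MLE estimate P(w_n | w_1..w_{n-1}) from raw counts; assumes n >= 2."""
    # All n-word windows, and all (n-1)-word context windows
    ngrams = Counter(zip(*(tokens[i:] for i in range(n))))
    contexts = Counter(zip(*(tokens[i:] for i in range(n - 1))))
    # Divide each n-gram count by its context count
    return {g: c / contexts[g[:-1]] for g, c in ngrams.items()}

tokens = "the storm swept through the land".split()
probs = ngram_mle(tokens, 2)
print(probs[("the", "storm")])  # matches the bigram slide: 0.5
```

For n = 2 this reproduces the bigram formula on the slide; larger n just widens the context window.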
Trigram
• “the quick red fox jumped the quick black bear. The quick red fox hopped away.”
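Applying the same counting to the trigram example: a trigram conditions on the previous two words. A minimal sketch using the slide's sentence; the lowercasing and period-stripping are naive normalization assumptions for illustration.

```python
from collections import Counter

text = ("the quick red fox jumped the quick black bear. "
        "The quick red fox hopped away.")
# Naive normalization: lowercase, drop periods (an assumption, not from the slides)
tokens = text.lower().replace(".", "").split()

trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigram_counts = Counter(zip(tokens, tokens[1:]))

# P("red" | "the quick") = c("the quick red") / c("the quick")
p = trigram_counts[("the", "quick", "red")] / bigram_counts[("the", "quick")]
print(p)  # "the quick" occurs 3 times, followed by "red" twice -> 2/3
```

“the quick” appears three times but is followed by “red” only twice (the third time by “black”), so the estimate is 2/3.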