
Natural-Language-Processing

Similarity-Based Estimation of Word Cooccurrence Probabilities

  • Statistical NLP methods estimate the likelihood of a word combination from its frequency in a training corpus.

  • A probabilistic word association model can instead be based on distributional word similarity.

  • Class-based and similarity-based models provide an alternative to the independence assumption on the co-occurrence of w2 and w1 in a bigram model P(w2|w1).

  • MLE estimates the parameters of a probability distribution by maximising the likelihood function, so that the observed data is most probable under the statistical model.
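For a bigram model, the MLE estimate is simply relative frequency: P(w2|w1) = count(w1, w2) / count(w1). A minimal sketch, using a made-up toy corpus:

```python
from collections import Counter

# Toy corpus (hypothetical), already tokenised into lowercase words.
corpus = "the cat sat on the mat the cat ran".split()

# Count unigrams and adjacent word pairs (bigrams).
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def mle_bigram(w1, w2):
    """MLE estimate P(w2|w1) = count(w1, w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(mle_bigram("the", "cat"))  # 2/3: "the" occurs 3 times, followed by "cat" twice
```

Any bigram absent from the corpus gets probability 0 under this estimator, which is the sparsity problem the similarity-based approach addresses.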

  • Why we study probability distributions: we want to quantify how likely an event is to occur. E.g. tossing two coins 100 times yields counts of discrete outcomes, which follow a binomial distribution.
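The coin-toss example above can be made concrete with the binomial probability mass function, sketched here with Python's standard library:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# E.g. the probability of seeing exactly 50 heads in 100 fair-coin tosses.
print(round(binomial_pmf(50, 100, 0.5), 4))
```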

  • If a bigram is not observed in the training corpus, the MLE estimate P(w2|w1) will be 0.
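The similarity-based scheme avoids these zero estimates by backing off to words that are distributionally similar to w1: P_SIM(w2|w1) is a similarity-weighted average of P(w2|w1') over the neighbours w1' of w1. A hedged sketch, where the neighbour sets, similarity scores, and MLE probabilities are all made-up inputs for illustration:

```python
# Hypothetical similarity-based backoff: estimate P(w2|w1) from words w1'
# that are distributionally similar to w1, weighting each neighbour's
# MLE estimate by its normalised similarity to w1.

def similarity_estimate(w1, w2, neighbours, sim, p_mle):
    """P_SIM(w2|w1) = sum over w1' in neighbours[w1] of sim(w1,w1')/norm * P_MLE(w2|w1')."""
    norm = sum(sim(w1, w) for w in neighbours[w1])
    return sum(sim(w1, w) / norm * p_mle(w2, w) for w in neighbours[w1])

# Toy example with invented similarities and MLE probabilities.
neighbours = {"dinner": ["lunch", "meal"]}
sim = lambda a, b: {("dinner", "lunch"): 0.6, ("dinner", "meal"): 0.4}[(a, b)]
p_mle = lambda w2, w1: {("ate", "lunch"): 0.2, ("ate", "meal"): 0.1}[(w2, w1)]

# Even if ("dinner", "ate") was never observed, the estimate is nonzero.
print(similarity_estimate("dinner", "ate", neighbours, sim, p_mle))
```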

Word2Vec

Relation extraction