Feature Engineering for NLP


Understanding N-Grams

An N-gram language model estimates probability distributions over word sequences. An N-gram is simply a contiguous sequence of N words. Consider a simple example sentence, “This is Big Data AI Book,” whose unigrams, bigrams, and trigrams are shown below.

Illustrating N-grams (Image Source)
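
To make this concrete, here is a minimal Python sketch (the helper name `ngrams` and the printed outputs are ours, not from the article) that extracts the unigrams, bigrams, and trigrams of the example sentence:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "This is Big Data AI Book".split()
print(ngrams(tokens, 1))  # unigrams: ('This',), ('is',), ..., ('Book',)
print(ngrams(tokens, 2))  # bigrams:  ('This', 'is'), ('is', 'Big'), ..., ('AI', 'Book')
print(ngrams(tokens, 3))  # trigrams: ('This', 'is', 'Big'), ..., ('Data', 'AI', 'Book')
```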

Understanding the Math

  • P(w|h): Probability of word w, given some history h
  • Example: P(the | today the sky is so clear that)
  • w: the
  • h: today the sky is so clear that

Approach 1: Relative Frequency Count

Step 1: Take a text corpus
Step 2: Count the number of times 'today the sky is so clear that' appears
Step 3: Count the number of times it is followed by 'the'
P(the | today the sky is so clear that) =
Count(today the sky is so clear that the) /
Count(today the sky is so clear that)
# In essence, we ask: out of the Count(h) times we saw the history h, how many times was it followed by the word w?
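
In code, the counting procedure might look like this minimal sketch (the function name `relative_frequency` and the toy corpus are illustrative, and the corpus is assumed to be one long list of tokens):

```python
def relative_frequency(corpus_tokens, history, word):
    """Estimate P(word | history) by counting continuations of the history."""
    h = tuple(history)
    n = len(h)
    history_count = 0       # Count(h)
    continuation_count = 0  # Count(h + word)
    for i in range(len(corpus_tokens) - n):
        if tuple(corpus_tokens[i:i + n]) == h:
            history_count += 1
            if corpus_tokens[i + n] == word:
                continuation_count += 1
    return continuation_count / history_count if history_count else 0.0

corpus = "today the sky is so clear that the sun hurts my eyes".split()
history = "today the sky is so clear that".split()
print(relative_frequency(corpus, history, "the"))  # 1.0 in this toy corpus
```

Note how the loop scans the whole corpus for a single query, which is exactly the weakness described below.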

Disadvantages of the approach:

  • When the text corpus is large, this approach must scan the entire corpus to obtain the counts.
  • It is not scalable and performs poorly; moreover, many perfectly plausible histories never appear even in a huge corpus, so their counts (and hence the estimated probabilities) are zero.

Approach 2: Bigram Model

The bigram model approximates the probability of a word given all the preceding words by the conditional probability given only the immediately preceding word: P(w_n | w_1, …, w_(n-1)) ≈ P(w_n | w_(n-1)). In the example above, w_(n-1) = that.

Assuming Markov Model (Image Source)

The assumption that the probability of a word depends only on the preceding word (the Markov assumption) is quite strong. More generally, an N-gram model assumes dependence on only the preceding (N-1) words. In practice, N is a hyperparameter we can tune to see which value optimizes model performance on the specific task, say sentiment analysis or text classification. 😊
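
As a rough illustration of the bigram model under the Markov assumption, here is a minimal maximum-likelihood sketch (the function names and toy corpus are ours, not from the article):

```python
from collections import Counter

def train_bigram(corpus_tokens):
    """Train a bigram model by maximum likelihood: P(w | prev) = Count(prev w) / Count(prev)."""
    unigram_counts = Counter(corpus_tokens)
    bigram_counts = Counter(zip(corpus_tokens, corpus_tokens[1:]))

    def prob(word, prev):
        """P(word | prev) under the first-order Markov (bigram) assumption."""
        if unigram_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / unigram_counts[prev]

    return prob

corpus = "the sky is so clear that the sun is out".split()
p = train_bigram(corpus)
print(p("sky", "the"))  # Count('the sky') / Count('the') = 1/2 = 0.5
```

Unlike the relative-frequency approach, the counts are gathered in one pass and every query afterward is a cheap dictionary lookup.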

Putting it all together, we have covered the differences between syntactic and semantic analysis, the importance of POS tagging, Named Entity Recognition (NER), and chunking in text analysis, and briefly looked at the concept of N-grams for language modeling.
