Probability-based LM

Shortcomings

If a word occurs 0 times in the training data, the model assigns it probability 0.

| Hi | Bye | I |
| --- | --- | --- |
| 0.5 | 0.5 | 0 |

Then the probability of “I Hi Bye” = 0 * 0.5 * 0.5 = 0, so a single unseen word zeroes out the whole sequence.

But logically there should be a small, nonzero chance of “I” occurring.
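
The zero-probability problem above can be reproduced with a small sketch. It assumes a unigram model estimated by relative frequency and a hypothetical two-word training corpus chosen to match the table; the function names are illustrative, not from any library.

```python
from collections import Counter

def unigram_mle(tokens, vocab):
    """Unsmoothed unigram estimate: count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: counts[w] / total for w in vocab}

def sequence_prob(seq, probs):
    """Probability of a sequence as the product of its word probabilities."""
    p = 1.0
    for w in seq:
        p *= probs[w]
    return p

# Hypothetical training data (assumption): "I" is in the vocabulary but never seen.
corpus = ["Hi", "Bye"]
vocab = ["Hi", "Bye", "I"]

probs = unigram_mle(corpus, vocab)
print(probs)                                     # Hi and Bye get 0.5, I gets 0.0
print(sequence_prob(["I", "Hi", "Bye"], probs))  # 0.0
```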

To fix this, use smoothing, e.g. add-1 (Laplace) smoothing, which adds 1 to every word count before normalizing.
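
As a minimal sketch of add-1 smoothing, assume a unigram model where P(w) = (count(w) + 1) / (N + V), with N the number of training tokens and V the vocabulary size. The corpus and function names below are hypothetical, picked to match the Hi / Bye / I example.

```python
from collections import Counter

def unigram_add_one(tokens, vocab):
    """Add-1 smoothed unigram estimate: (count(w) + 1) / (N + V)."""
    counts = Counter(tokens)
    n = len(tokens)   # N: number of training tokens
    v = len(vocab)    # V: vocabulary size
    return {w: (counts[w] + 1) / (n + v) for w in vocab}

def smoothed_sequence_prob(seq, probs):
    """Probability of a sequence as the product of its word probabilities."""
    p = 1.0
    for w in seq:
        p *= probs[w]
    return p

# Hypothetical training data (assumption): "I" never occurs in it.
corpus = ["Hi", "Bye"]
vocab = ["Hi", "Bye", "I"]

smoothed = unigram_add_one(corpus, vocab)
print(smoothed)  # Hi and Bye get 2/5, I gets 1/5
print(smoothed_sequence_prob(["I", "Hi", "Bye"], smoothed))  # small but nonzero
```

With these assumed counts, “I” moves from probability 0 to 1/5, so “I Hi Bye” gets roughly 0.2 * 0.4 * 0.4 = 0.032 instead of 0. The probabilities still sum to 1 because V is added to the denominator.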