What is "entropy and information gain"?
I am reading this book (NLTK) and find this part confusing. Entropy is defined as:
Entropy is the sum of the probability of each label times the log probability of that same label
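From that description I think the formula is H = -sum(p(label) * log2(p(label))) over all labels, and this is my own attempt (not from the book) at writing it out as a small Python sketch so you can see what I mean:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of labels: -sum(p * log2(p)) over each distinct label."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# All labels the same -> no uncertainty, entropy is 0
print(entropy(["male", "male", "male", "male"]))      # 0.0
# Evenly split labels -> maximum uncertainty, entropy is 1 bit
print(entropy(["male", "female", "male", "female"]))  # 1.0
```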
How can I apply entropy and maximum entropy in terms of text mining? Can someone give me an easy, simple (visual) example?