The Jensen-Shannon Divergence: A Measure of Distance Between Probability Distributions

udit · Jan 1, 2023

Source: https://medium.com/datalab-log/measuring-the-statistical-similarity-between-two-samples-using-jensen-shannon-and-kullback-leibler-8d05af514b15

The Jensen-Shannon divergence (JSD) measures the distance between two probability distributions. It is built from the Kullback-Leibler (KL) divergence, a measure of the difference between two distributions, but unlike the KL divergence it is symmetric: it does not matter which distribution is treated as the “reference” and which as the “comparison.” In this article, we will build an intuition for the Jensen-Shannon divergence, give some examples of its use, and derive its mathematical definition.

Intuition

The Jensen-Shannon divergence quantifies how similar or dissimilar two probability distributions are. It is always non-negative: a value of 0 means the two distributions are identical, and larger values indicate greater disagreement between them.

The JSD is built on the Kullback-Leibler divergence. The KL divergence, however, is asymmetric: its value depends on which distribution is treated as the “reference” and which as the “comparison.” The Jensen-Shannon divergence symmetrizes the KL divergence by comparing both distributions to their average, so either distribution can play either role and the result is the same.
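As a quick illustration of this symmetry, here is a minimal sketch (assuming NumPy and SciPy are installed; the two small distributions are made up for illustration) that evaluates both orderings of the KL divergence and of the JSD. Only the JSD values match. Note that SciPy's jensenshannon function returns the square root of the divergence (the Jensen-Shannon distance), so it is squared below.

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

# Two small discrete distributions over the same three outcomes
# (values chosen only for illustration).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.4, 0.3])

# The KL divergence is asymmetric: swapping the arguments changes the result.
print("KL(P || Q):", entropy(p, q))
print("KL(Q || P):", entropy(q, p))

# The JSD is symmetric: both orderings give the same value.
# SciPy's jensenshannon returns the square root of the divergence,
# so we square it to get the divergence itself.
print("JSD(P || Q):", jensenshannon(p, q) ** 2)
print("JSD(Q || P):", jensenshannon(q, p) ** 2)
```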

Examples

The Jensen-Shannon divergence has a wide range of applications, including:

  1. Comparing the similarity of two probability distributions, such as the distributions of word frequencies in two different documents (see the sketch after this list).
  2. Clustering data points based on the similarity of their probability distributions, such as clustering genes based on the expression levels of their corresponding proteins.
  3. Compressing data by representing it as a distribution and finding a reference distribution that is similar to it, such as using a Gaussian distribution as a reference distribution to compress a dataset of real-valued numbers.
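To make the first example concrete, here is a minimal sketch of comparing word-frequency distributions (the two toy documents and all variable names are illustrative assumptions, not taken from a real corpus). It builds a frequency vector for each document over a shared vocabulary and computes their JSD with SciPy, again squaring the library's result because jensenshannon returns the square root of the divergence.

```python
from collections import Counter

import numpy as np
from scipy.spatial.distance import jensenshannon

# Two toy "documents"; in practice these would be real texts.
doc_a = "the cat sat on the mat the cat slept".split()
doc_b = "the dog sat on the rug the dog barked".split()

# Build word-frequency distributions over a shared vocabulary.
vocab = sorted(set(doc_a) | set(doc_b))
counts_a = Counter(doc_a)
counts_b = Counter(doc_b)
p = np.array([counts_a[w] for w in vocab], dtype=float)
q = np.array([counts_b[w] for w in vocab], dtype=float)
p /= p.sum()
q /= q.sum()

# Square the result because jensenshannon returns the Jensen-Shannon
# distance, i.e. the square root of the divergence.
print("JSD between the documents:", jensenshannon(p, q) ** 2)
```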

Derivation

The Jensen-Shannon divergence between two probability distributions, P and Q, is defined as:

JSD(P || Q) = (KL(P || M) + KL(Q || M)) / 2

Where M is the “average” distribution, defined as:

M = (P + Q) / 2

And KL(P || Q) is the Kullback-Leibler divergence between P and Q, defined as:

KL(P || Q) = ∑ p(i) * log(p(i) / q(i))

Where the sum is taken over all possible values of i, and p(i) and q(i) are the probabilities of outcome i under the distributions P and Q, respectively. (Terms with p(i) = 0 contribute nothing to the sum.)
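The definition above translates almost line for line into code. The sketch below is a minimal implementation under the assumption of natural logarithms and discrete distributions given as arrays; the function names and example values are illustrative, not from the original article. It also confirms that swapping the arguments leaves the JSD unchanged.

```python
import numpy as np

def kl_divergence(p, q):
    # KL(P || Q) = sum_i p(i) * log(p(i) / q(i)); terms with p(i) == 0
    # contribute nothing to the sum.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jsd(p, q):
    # JSD(P || Q) = (KL(P || M) + KL(Q || M)) / 2, with M = (P + Q) / 2.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = (p + q) / 2
    return (kl_divergence(p, m) + kl_divergence(q, m)) / 2

p = [0.7, 0.2, 0.1]
q = [0.3, 0.4, 0.3]
print(jsd(p, q))  # same value ...
print(jsd(q, p))  # ... regardless of argument order
```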

Conclusion

The Jensen-Shannon divergence is a symmetric measure of the distance between two probability distributions, built from the Kullback-Leibler divergence, so it does not depend on which distribution is treated as the “reference” and which as the “comparison.” With its intuition, example uses, and derivation in hand, data scientists can use the Jensen-Shannon divergence to compare and analyze probability distributions in a wide range of fields.
