
Lognormal to Normal Distribution


The normal and lognormal distributions are fundamental concepts in statistics. I recently used the relationship between these two distributions in a project. In this blog post, I want to share what I learned.

Outline:

  1. Normal & Lognormal Distributions
  2. Lognormal to Normal
  3. Normal to Lognormal
  4. Conclusion

Normal & Lognormal Distributions

The normal distribution is also called the bell curve or Gaussian distribution. The position of the bell's peak is the mean, and the width of the bell reflects the spread of values (the standard deviation). Thus, the shape changes as we change mu (\(\mu\)) and sigma (\(\sigma\)): \(\mu\) is the mean of the distribution, and \(\sigma\) is its standard deviation. We denote a normal distribution as:

\[{\mathcal {N}}(\mu ,\sigma ^{2})\]

Find more details about the normal distribution on Wikipedia. Here are two ways of defining a normal distribution in Python.

from statistics import NormalDist
mu, sigma = 5, .5
norm_dist = NormalDist(mu, sigma)

import scipy.stats as stats
mu, sigma = 5, .5
norm_dist = stats.norm(mu, sigma)
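
As a quick sanity check, the two objects should describe the same distribution. The sketch below evaluates the density and the CDF at one point with both; the names `std_dist` and `scipy_dist` are mine and are used only for this illustration.

```python
from statistics import NormalDist

import scipy.stats as stats

mu, sigma = 5, 0.5

# The same distribution, built with the standard library and with SciPy.
std_dist = NormalDist(mu, sigma)
scipy_dist = stats.norm(loc=mu, scale=sigma)

x = 5.5  # one standard deviation above the mean

# Both lines should print (nearly) identical pairs of numbers.
print(std_dist.pdf(x), scipy_dist.pdf(x))
print(std_dist.cdf(x), scipy_dist.cdf(x))
```

One practical difference: `scipy_dist.pdf` accepts NumPy arrays, while `NormalDist.pdf` takes one value at a time.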

We get a lognormal distribution when we exponentiate a normally distributed variable. The result is a lopsided curve defined only for positive values, with a longer tail on the right side, where larger values occur. We denote the lognormal distribution as follows:

\[{\displaystyle X\sim \operatorname {Lognormal} \left(\mu ,\sigma ^{2}\right)}\]

Since the logarithm of a lognormally distributed variable is normally distributed with the same \(\mu\) and \(\sigma\), we can denote the relationship as follows:

\[{\displaystyle \ln(X)\sim {\mathcal {N}}(\mu ,\sigma ^{2})}\]

Find more details about the lognormal distribution on Wikipedia. We define a lognormal distribution in Python as follows. The standard library's statistics module has no lognormal counterpart to NormalDist (random.lognormvariate only draws samples), so we use SciPy.

import numpy as np
import scipy.stats as stats
mu, sigma = 5, .5
lognorm_dist = stats.lognorm(s=sigma, scale=np.exp(mu))

Note: scipy.stats.lognorm is parameterised by the mu and sigma of the underlying normal distribution, not by the mean and standard deviation of the lognormal itself. The shape parameter s is sigma, and the scale parameter is the exponential of mu. I found the documentation inadequate in explaining the parameters. This SO question has answers that discuss the meaning of the parameters.
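
To make the parameter mapping concrete, here is a minimal sketch that checks it against two standard facts about the lognormal distribution: its median is exp(mu) and its mean is exp(mu + sigma^2 / 2). The test point `x = 200.0` is an arbitrary choice of mine.

```python
import numpy as np
import scipy.stats as stats

mu, sigma = 5, 0.5  # parameters of the underlying normal distribution

# s is sigma and scale is exp(mu); loc keeps its default of 0.
lognorm_dist = stats.lognorm(s=sigma, scale=np.exp(mu))

# Median and mean of the lognormal, next to the closed-form values.
print(lognorm_dist.median(), np.exp(mu))
print(lognorm_dist.mean(), np.exp(mu + sigma**2 / 2))

# The lognormal CDF at x equals the underlying normal CDF at log(x).
x = 200.0
print(lognorm_dist.cdf(x), stats.norm(mu, sigma).cdf(np.log(x)))
```

If the two numbers on each printed line agree, the mapping s = sigma, scale = exp(mu) is being used as intended.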

Here is how both the distributions look for the same mu (\(\mu\)) and sigma (\(\sigma\)).

Code to generate the below plot.
from statistics import NormalDist

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt


# all distributions
mu, sigma = 5, .5
norm_d1 = NormalDist(mu, sigma)
lognorm_d1 = stats.lognorm(s=sigma, scale=np.exp(mu))
lognorm_d1.mu, lognorm_d1.sigma = mu, sigma

mu, sigma = 5, 1
norm_d2 = NormalDist(mu, sigma)
lognorm_d2 = stats.lognorm(s=sigma, scale=np.exp(mu))
lognorm_d2.mu, lognorm_d2.sigma = mu, sigma

mu, sigma = 4, 0.3
norm_d3 = NormalDist(mu, sigma)
lognorm_d3 = stats.lognorm(s=sigma, scale=np.exp(mu))
lognorm_d3.mu, lognorm_d3.sigma = mu, sigma

# norm y (NormalDist.pdf takes one value at a time)
x_norm = np.linspace(0, 10, 500)
norm_y1 = np.array([norm_d1.pdf(i) for i in x_norm])
norm_y2 = np.array([norm_d2.pdf(i) for i in x_norm])
norm_y3 = np.array([norm_d3.pdf(i) for i in x_norm])

# lognorm y (needs its own, much wider x range; scipy's pdf accepts arrays)
x_lognorm = np.linspace(0, 800, 500)
lognorm_y1 = lognorm_d1.pdf(x_lognorm)
lognorm_y2 = lognorm_d2.pdf(x_lognorm)
lognorm_y3 = lognorm_d3.pdf(x_lognorm)


# Set the figsize
fig1, ax1 = plt.subplots(figsize=(6, 4))
ax1.plot(x_norm, norm_y1, label=f"mu = {norm_d1.mean}; sigma = {norm_d1.stdev}")
ax1.plot(x_norm, norm_y2, label=f"mu = {norm_d2.mean}; sigma = {norm_d2.stdev}")
ax1.plot(x_norm, norm_y3, label=f"mu = {norm_d3.mean}; sigma = {norm_d3.stdev}")
ax1.legend()

fig2, ax2 = plt.subplots(figsize=(6, 4))
ax2.plot(x_lognorm, lognorm_y1, label=f"mu = {lognorm_d1.mu}; sigma = {lognorm_d1.sigma}")
ax2.plot(x_lognorm, lognorm_y2, label=f"mu = {lognorm_d2.mu}; sigma = {lognorm_d2.sigma}")
ax2.plot(x_lognorm, lognorm_y3, label=f"mu = {lognorm_d3.mu}; sigma = {lognorm_d3.sigma}")
ax2.legend()

plt.show()

fig1.savefig('norm_dist.svg', format='svg', dpi=1200, bbox_inches='tight')
fig2.savefig('lognorm_dist.svg', format='svg', dpi=1200, bbox_inches='tight')

For the normal distribution, instead of evaluating NormalDist.pdf() we can also draw a sample with numpy.random.Generator.normal and plot a histogram of it. Similarly, for the lognormal distribution, numpy.random.Generator.lognormal can take the place of stats.lognorm.pdf().
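
A minimal sketch of the sampling approach, without any plotting: it builds a normalised histogram from a normal sample and measures how far it is from the exact density. The seed, the sample size and the names `heights`, `centres` and `max_gap` are my own choices for the illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=0)  # seeded only for reproducibility
mu, sigma = 5, 0.5

norm_samples = rng.normal(mu, sigma, 100_000)
lognorm_samples = rng.lognormal(mu, sigma, 100_000)

# density=True normalises the histogram so it is comparable to a pdf.
heights, edges = np.histogram(norm_samples, bins=50, density=True)
centres = (edges[:-1] + edges[1:]) / 2

# Largest gap between the histogram and the exact density at the bin centres.
max_gap = np.max(np.abs(heights - stats.norm(mu, sigma).pdf(centres)))
print(max_gap)

# The sample median of the lognormal draw should sit near exp(mu).
print(np.median(lognorm_samples), np.exp(mu))
```

Passing `norm_samples` or `lognorm_samples` to `ax.hist(..., density=True)` gives the corresponding picture.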

Lognormal to Normal

As mentioned in the previous section, taking the logarithm of a lognormally distributed variable gives a normally distributed one. So, if \({\displaystyle X\sim \operatorname {Lognormal} \left(\mu ,\sigma ^{2}\right)}\), then \({\displaystyle \ln(X)\sim {\mathcal {N}}(\mu ,\sigma ^{2})}\).

Let us understand this by code.

import numpy as np

rng = np.random.default_rng()

mu, sigma = 5, .5
lognorm_samples = rng.lognormal(mu, sigma, 10000)
# take the log of lognorm samples to derive the normal dist.
norm_samples = np.log(lognorm_samples)
print(norm_samples.mean(), norm_samples.std())

5.005339216906491 0.4934326302969564

The mean and standard deviation of the derived normal sample (norm_samples) are close to the mu and sigma we passed to rng.lognormal. They differ only by sampling noise.
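
Matching the mean and standard deviation does not by itself show that the logged sample is normal. One way to check the whole shape, sketched here as an optional extra rather than as part of the original workflow, is a Kolmogorov-Smirnov test against \({\mathcal {N}}(\mu ,\sigma ^{2})\) using scipy.stats.kstest. The seed is an arbitrary choice.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
mu, sigma = 5, 0.5

lognorm_samples = rng.lognormal(mu, sigma, 10_000)
norm_samples = np.log(lognorm_samples)

# Compare the empirical CDF of the logged sample with the CDF of N(mu, sigma**2).
# A small statistic means the two CDFs stay close to each other everywhere.
result = stats.kstest(norm_samples, "norm", args=(mu, sigma))
print(result.statistic, result.pvalue)
```

A statistic near zero, together with a p-value that is not tiny, is consistent with the logged sample being normally distributed.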

Code to generate the below plots
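# continues from the snippet above; plotting needs these extra imports
import matplotlib.pyplot as plt
import scipy.stats as stats

# log normal dist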
fig1, ax1 = plt.subplots(figsize=(5, 3))
ax1.hist(lognorm_samples, bins=50, alpha=0.7, density=True, color="orange")

x1 = np.linspace(0, 800, 500)
lognorm_d = stats.lognorm(s=sigma, scale=np.exp(mu))
lognorm_y = np.array([lognorm_d.pdf(i) for i in x1])
ax1.plot(x1, lognorm_y, label=f"mu = {mu}; sigma = {sigma}")
ax1.legend()

# normal dist
fig2, ax2 = plt.subplots(figsize=(5, 3))
ax2.hist(norm_samples, bins=50, alpha=0.7, density=True, color="orange")

x2 = np.linspace(0, 7, 500)
norm_d = stats.norm(mu, sigma)
norm_y = np.array([norm_d.pdf(i) for i in x2])
ax2.plot(x2, norm_y, label=f"mu = {mu}; sigma = {sigma}")
ax2.legend()

plt.show()
fig1.savefig('lognorm_dist2.svg', format='svg', dpi=1200, bbox_inches='tight')
fig2.savefig('norm_dist2.svg', format='svg', dpi=1200, bbox_inches='tight')

Conclusion: to convert from a lognormal to normal, take the logarithm of the lognormal sample.

Normal to Lognormal

If the logarithm of a lognormally distributed variable is normally distributed, then the reverse is also true: exponentiating a normally distributed variable gives a lognormally distributed one. In notation, if \({\displaystyle Y\sim {\mathcal {N}}(\mu ,\sigma ^{2})}\), then \({\displaystyle \exp(Y)\sim \operatorname {Lognormal} \left(\mu ,\sigma ^{2}\right)}\).
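
A short change-of-variables argument shows where this comes from. Here \(X = \exp(Y)\), and \(\Phi\) denotes the standard normal CDF; both symbols are introduced only for this sketch. For \(x > 0\):

\[F_X(x) = P(\exp(Y)\leq x) = P(Y\leq \ln x) = \Phi \left({\frac {\ln x-\mu }{\sigma }}\right)\]

Differentiating with respect to \(x\) gives the density:

\[f_X(x) = {\frac {1}{x\,\sigma {\sqrt {2\pi }}}}\exp \left(-{\frac {(\ln x-\mu )^{2}}{2\sigma ^{2}}}\right)\]

The factor \(1/x\) is the derivative of \(\ln x\); everything else is the normal density evaluated at \(\ln x\).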

Let’s again understand this through code.

import numpy as np
import scipy.stats as stats

rng = np.random.default_rng()

mu, sigma = 5, .5
norm_samples = rng.normal(mu, sigma, 10000)

# take the exp of norm samples to derive the lognormal dist.
lognorm_samples = np.exp(norm_samples)

# fit a lognorm distribution to get the mean and std dev
shape, loc, scale = stats.lognorm.fit(lognorm_samples)
mean, stddev = np.log(scale), shape
print(mean, stddev)

4.984256782660331 0.5067622675605842

The parameters recovered from the derived lognormal sample (mean and stddev) are close to the mu and sigma we gave to the normal distribution. Note that we used the scipy.stats.lognorm.fit method to fit a lognormal distribution to the data. It returns three parameters, in this order: shape, loc and scale. The shape is the standard deviation of the underlying normal distribution, and the logarithm of the scale is its mean. Because loc is left free here, the fit also estimates a shift that a pure lognormal does not have; passing floc=0 pins it at zero. We did not need a fit when we converted the lognormal to a normal distribution (previous section) because there we could read the parameters directly from the sample mean and standard deviation. Read this SO answer for more details.
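
scipy.stats.lognorm.fit also accepts floc to hold the location fixed. The sketch below is a variation on the snippet above, not the code that produced the output shown: it fixes loc at 0 and compares the fitted values with the mean and standard deviation of the logged sample. The seed and the name `logs` are my own.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
mu, sigma = 5, 0.5
lognorm_samples = np.exp(rng.normal(mu, sigma, 10_000))

# floc=0 pins the location, so only shape and scale are estimated.
shape, loc, scale = stats.lognorm.fit(lognorm_samples, floc=0)
print(np.log(scale), shape)

# With loc fixed at 0, the fit should agree closely with these two numbers.
logs = np.log(lognorm_samples)
print(logs.mean(), logs.std())
```

With the location fixed, fitting and "take the log, then compute mean and std" become two routes to essentially the same estimates.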

Code to generate the below plots
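# continues from the snippet above; plotting needs this extra import
import matplotlib.pyplot as plt

# normal dist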
fig1, ax1 = plt.subplots(figsize=(5, 3))
ax1.hist(norm_samples, bins=50, alpha=0.7, density=True, color="orange")

x1 = np.linspace(0, 7, 500)
norm_d = stats.norm(mu, sigma)
norm_y = np.array([norm_d.pdf(i) for i in x1])
ax1.plot(x1, norm_y, label=f"mu = {mu}; sigma = {sigma}")
ax1.legend()

# lognormal dist
fig2, ax2 = plt.subplots(figsize=(5, 3))
ax2.hist(lognorm_samples, bins=50, alpha=0.7, density=True, color="orange")

x2 = np.linspace(0, 800, 500)
lognorm_d = stats.lognorm(s=sigma, scale=np.exp(mu))
lognorm_y = np.array([lognorm_d.pdf(i) for i in x2])
ax2.plot(x2, lognorm_y, label=f"mu = {mu}; sigma = {sigma}")
ax2.legend()

plt.show()
fig1.savefig('norm_dist3.svg', format='svg', dpi=1200, bbox_inches='tight')
fig2.savefig('lognorm_dist3.svg', format='svg', dpi=1200, bbox_inches='tight')

Conclusion: to convert from a normal to lognormal, take exp of the normal sample.

Conclusion

We started with the normal and lognormal distributions and how to define them in Python, then converted each distribution into the other. It took me some effort to figure out how to do the conversion, and I hope this post clears up the confusion.

I found an interesting link while researching the answers: visualisation of all the distributions available in scipy.stats on this SO answer.


