Probability Plots, Normal Distributions 'in nature'¶

Generate 5000 normally distributed values with given mean and standard deviation and plot the histogram¶

from pylab import *
from scipy.stats import norm
def gen_norm(N,mu,sigma):
    return sigma*norm.ppf(random(N))+mu

values=gen_norm(5000,2,3)

#divide values into 20 equal sized bins between -6 and 11
#and draw the histogram
bins=linspace(-6,11,20)
h=histogram(values,bins)
midpoints=[(h[1][i]+h[1][i+1])/2 for i in range(19)]
stem(midpoints,h[0])
show()

Superimpose the normal density¶

## Superimpose the plot of the normal density with the given mean and standard deviation, adjusted to same height
x=linspace(-6,11,200)
y=exp(-(x-2)**2/18)*max(h[0])
plot(x,y,color='k')
stem(midpoints,h[0])
show()

Make a qq-plot of the simulated data¶

from scipy.stats import probplot
u=probplot(values,plot=plt)
plt.axhline(color='k')
plt.axvline(color='k')
plt.show()

In contrast, look at the qq-plot when we try something that isn't normal¶

# generate exponentially distributed data
expvalues=-log(random(5000))
u=probplot(expvalues,plot=plt)
plt.show()

Now work with real data -- plot histogram of human height data (9999 18-year-olds)¶

f=open('heightextract','r')
hlist=[]
for s in f:
    hlist.append(float(s))
    
bins=linspace(min(hlist),max(hlist),40)
u=histogram(hlist,bins)
v=[(u[1][i]+u[1][i+1])/2 for i in range(39)]
#plot height histogram with max height normalized
    #to 1
nor=max(u[0])
stem(v,u[0]/nor)
xlim(min(hlist),max(hlist))
show()

Estimate mean and variance from data¶

hlist=array(hlist)
mubar=sum(hlist)/len(hlist)
varbar=sum(hlist**2)/len(hlist)-mubar**2

Superimpose normal density¶

x=linspace(min(hlist),max(hlist),500)
y=exp(-(x-mubar)**2/(2*varbar))
plot(x,y,color='k')
stem(v,u[0]/nor)
xlim(min(hlist),max(hlist))
show()

Make a qq plot¶

u=probplot(hlist,plot=plt)
plt.axhline(y=55,color='k')
plt.axvline(color='k')
plt.show()