Normal Distributions in C

One of the most useful bits of number-crunching you can do with data is to calculate its probability distribution, in the earnest hope that it will be a reasonable fit for one of the recognised distributions such as the normal distribution. In this project I will write a small C library to calculate the theoretical normal distribution corresponding to a given data set.

The Normal Distribution

Most people will have an intuitive understanding of the normal distribution even if they have no formal education in statistics. As an example let's think about people's heights: a few people are very short, rather more are fairly short, a lot are around average height, and then the numbers drop back down as we start to consider heights above average, eventually reaching zero.

If we draw a graph of heights and their respective frequencies we might expect to see something like this. (Note this is just fictitious data I have created for this project.)

[Figure: bar chart of heights and their frequencies, with the theoretical normal distribution overlaid as red circles]

The bars show actual data, whereas the red circles show the theoretical normal distribution for this particular data. As you can see there is a very good but not perfect fit: for example, the actual frequency of 66 inches is 21, but the theoretical frequency, assuming the data fits a normal distribution, is 20.1516.

So how do we calculate, for example, 20.1516 from our data? Short answer: using the formula below, borrowed from Wikipedia. The variables used in this formula are:

  • x - the individual data value, e.g. 66 (a height in inches, not its frequency)
  • μ - the arithmetic mean of the data
  • σ - the standard deviation of the data
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} $$