The majority of data can easily be plotted on a graph with equal intervals on the axes, for example 1, 2, 3 or 100, 200, 300 and so on. Some data, typically data which increases or decreases exponentially, cannot comfortably be graphed on such a scale without squashing it up so much at one end that it becomes incomprehensible. The solution to this problem is to use a logarithmic scale.
The Problem
Consider the data in the following table. Graphing this data with equal axis intervals of, say, 100,000 would make the differences in the lower values indistinguishable, and a scale to show them distinctly would make the graph impossibly large.
Label | Data |
---|---|
1910 | 2 |
1920 | 6 |
1930 | 29 |
1940 | 84 |
1950 | 364 |
1960 | 622 |
1970 | 4106 |
1980 | 6951 |
1990 | 15994 |
2000 | 81022 |
2010 | 198240 |
2020 | 765008 |
The Solution
To show the lower values distinctly but still fit all the data on a reasonably sized graph, we need to plot the logarithms of the data rather than the data itself, using a scale which increases exponentially. Assuming we are using a base 10 scale, the labels on the axis would be 1, 10, 100, 1000 and so on, each ten times the previous one.
Let's look at the data again, this time including the logarithm (to base 10) of the data.
Label | Data | log10(Data) |
---|---|---|
1910 | 2 | 0.301030 |
1920 | 6 | 0.778151 |
1930 | 29 | 1.462398 |
1940 | 84 | 1.924279 |
1950 | 364 | 2.561101 |
1960 | 622 | 2.793790 |
1970 | 4106 | 3.613419 |
1980 | 6951 | 3.842047 |
1990 | 15994 | 4.203957 |
2000 | 81022 | 4.908603 |
2010 | 198240 | 5.297191 |
2020 | 765008 | 5.883666 |
We have now reduced the data to a range of approximately 0.3 to 5.9, which can comfortably be shown on a graph with an axis running from 0 to 6. Note, though, that the axis will not be labeled 0 to 6, but with 10 (or whatever base we are using) raised to the powers 0 to 6, as shown in the following table.
Interval Values | Power Equation | Axis Label |
---|---|---|
6 | $10^{6}$ | 1000000 |
5 | $10^{5}$ | 100000 |
4 | $10^{4}$ | 10000 |
3 | $10^{3}$ | 1000 |
2 | $10^{2}$ | 100 |
1 | $10^{1}$ | 10 |
0 | $10^{0}$ | 1 |
For this project we will write a short program to create a logarithmic plot of the sample data shown above and save it as an SVG file.
The sample data is only very approximately exponential, but it still reduces to roughly a straight line when plotted on a logarithmic scale. If the data were exactly exponential, the points on the logarithmic plot would lie on an exact straight line, whereas on an equal-interval (linear) scale they would form a curve with an ever-increasing gradient.
This project uses the SVG library I wrote for an earlier post. I won't include that code here, but you might wish to take a look at the post to get an idea of how the SVG library works.
Coding
Create a new folder somewhere and in it create the following empty files. If you prefer, you can download the source code as a zip or clone/download the GitHub repository; the source code zip also contains the SVG library files.
- data.h
- data.c
- logarithmicplot.c
Open data.h and enter the following.
data.h
```c
#ifndef DATA_H
#define DATA_H

//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
void populate_data(double data[12], double labels[12]);

#endif
```
Then open data.c and enter the function body.
data.c
```c
#include <string.h>

#include "data.h"

//--------------------------------------------------------
// FUNCTION populate_data
//--------------------------------------------------------
void populate_data(double data[12], double labels[12])
{
    // copy the hard-coded years into labels
    memcpy(labels, (double[12]){1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020}, sizeof(double[12]));

    // copy the hard-coded sample values into data
    memcpy(data, (double[12]){2, 6, 29, 84, 364, 622, 4106, 6951, 15994, 81022, 198240, 765008}, sizeof(double[12]));
}
```
The data.h and data.c files simply implement a quick and dirty way of getting some data suitable for plotting on a logarithmic scale. We can now move on to writing the code to create the actual graph, so open logarithmicplot.c and enter the #includes, function prototypes and main function.
logarithmicplot.c (part 1)
```c
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

#include "data.h"
#include "svg.h"

//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
void print_data(double* data, double* labels, int size);
void draw_logarithmic_plot(int width, int height, char* title, double* data, double* labels, int size, int maxpower, char* filename);

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    puts("--------------------");
    puts("| codedrome.com    |");
    puts("| Logarithmic Plot |");
    puts("--------------------\n");

    // sample data, so the size is hard coded
    double data[12];
    double labels[12];

    populate_data(data, labels);

    print_data(data, labels, 12);

    // 720 x 540 pixel image, y axis from 10^0 to 10^6
    draw_logarithmic_plot(720, 540, "Logarithmic Plot", data, labels, 12, 6, "logarithmicplot1.svg");

    return EXIT_SUCCESS;
}
```
I'll discuss the print_data and draw_logarithmic_plot functions when we implement them, but for the moment let's just look at main. First it creates a couple of double arrays (the size is hard coded as 12, as this is only sample data) and passes them to populate_data. We then call print_data to show the data on screen, and draw_logarithmic_plot to create and save the graph. draw_logarithmic_plot receives the image width and height (720 by 540 pixels), the title, the data and labels, the number of data points, the highest power of 10 to show on the y axis (6) and the filename.
Now let's look at the print_data function which can be added to logarithmicplot.c.
logarithmicplot.c (part 2)
```c
//--------------------------------------------------------
// FUNCTION print_data
//--------------------------------------------------------
void print_data(double* data, double* labels, int size)
{
    puts("       label         data  log10(data)\n--------------------------------------");

    // print each label, its value and the base 10 logarithm of the value
    for(int i = 0; i < size; i++)
    {
        printf("%12.0lf %12.0lf %12.6lf\n", labels[i], data[i], log10(data[i]));
    }
}
```
This prints out the data in the same format as the second table above. It is not necessary for the main task of creating a graph, but it does provide a useful indication of how the data maps to its corresponding logarithmic values. Finally, we can move on to the draw_logarithmic_plot function.
logarithmicplot.c (part 3)
```c
//--------------------------------------------------------
// FUNCTION draw_logarithmic_plot
//--------------------------------------------------------
void draw_logarithmic_plot(int width, int height, char* title, double* data, double* labels, int size, int maxpower, char* filename)
{
    int topmargin = 64;
    int bottommargin = 32;
    int leftmargin = 86;
    int rightmargin = 32;

    int graph_height = height - topmargin - bottommargin;
    int graph_width = width - leftmargin - rightmargin;

    // horizontal spacing between data points, and vertical pixels per power of 10
    double pixels_per_unit_x = (double)graph_width / (double)(size - 1);
    double pixels_per_unit_y = (double)graph_height / (double)maxpower;

    double x;
    double y;

    // snprintf below truncates rather than overflowing if a label is too long
    char number_string[32];

    // Create svg struct
    svg* psvg;
    psvg = svg_create(width, height);

    if(psvg == NULL)
    {
        puts("psvg is NULL");
    }
    else
    {
        svg_fill(psvg, "#FFFFFF");

        // header text and border lines
        svg_text(psvg, width/2, 38, "sans-serif", 16, "#000000", "#000000", "middle", title);
        svg_line(psvg, "#808080", 2, leftmargin, topmargin, leftmargin, height - bottommargin);
        svg_line(psvg, "#808080", 2, leftmargin, height - bottommargin, width - rightmargin, height - bottommargin);

        // y axis indexes and values: one tick per power of 10, labelled 10^power
        y = height - bottommargin;
        for(int power = 0; power <= maxpower; power++)
        {
            svg_line(psvg, "#808080", 1, leftmargin - 8, y, leftmargin, y);
            snprintf(number_string, sizeof(number_string), "%.0lf", pow(10, power));
            svg_text(psvg, leftmargin - 12, y + 4, "sans-serif", 10, "#000000", "#000000", "end", number_string);
            y -= pixels_per_unit_y;
        }

        // x axis indexes and values
        x = leftmargin;
        for(int i = 0; i < size; i++)
        {
            svg_line(psvg, "#808080", 1, x, height - bottommargin, x, height - bottommargin + 8);
            snprintf(number_string, sizeof(number_string), "%.0lf", labels[i]);
            svg_text(psvg, x, height - bottommargin + 24, "sans-serif", 10, "#000000", "#000000", "middle", number_string);
            x += pixels_per_unit_x;
        }

        // plot data: height above the x axis is proportional to log10 of the value,
        // so every value must be positive
        x = leftmargin;
        for(int d = 0; d < size; d++)
        {
            y = height - bottommargin - (log10(data[d]) * pixels_per_unit_y);
            svg_circle(psvg, "#0000FF", 0, "#0000FF", 3, x, y);
            x += pixels_per_unit_x;
        }

        // finish off
        svg_finalize(psvg);
        svg_save(psvg, filename);
        puts("File saved");
        svg_free(psvg);
    }
}
```
In the draw_logarithmic_plot function we first create a few variables:
- topmargin, bottommargin, leftmargin and rightmargin - the sizes of the four margins in pixels
- graph_height and graph_width - the size of the actual graph inside the margins
- pixels_per_unit_x and pixels_per_unit_y - the horizontal distance in pixels between adjacent data points, and the vertical distance in pixels representing each power of 10 (one unit of log10(data))
- x and y - these will be used several times for the location of the various elements of the graph
- number_string - we will sprintf numbers to this to get them in a string form suitable for drawing on the graph
We can then create an SVG struct - refer to the SVG Library post if you want to know the full details of how this works. If the struct creation is successful we can then fill its background, which I have hardcoded as white, and then draw the title and axis lines.
We then use a pair of for loops to draw the tick marks and labels on the two axes, and a third for loop to calculate the position of each data point and draw a small circle there. Note the use of the log10 function in the calculation; this function lives, not surprisingly, in math.h.
That's the graph drawn so we then call svg_finalize, which basically just adds a closing tag, then svg_save to write the SVG to a file. Then we just write out a message and call svg_free to free up the dynamic memory used by the SVG library.
The code is now finished so we can compile and run it - enter this in your terminal.
Compile and Run
```bash
gcc logarithmicplot.c data.c svg.c -std=c11 -lm -o logarithmicplot
./logarithmicplot
```
The program output itself isn't hugely exciting; it is essentially the second table above.
Program Output
```text
--------------------
| codedrome.com    |
| Logarithmic Plot |
--------------------

       label         data  log10(data)
--------------------------------------
        1910            2     0.301030
        1920            6     0.778151
        1930           29     1.462398
        1940           84     1.924279
        1950          364     2.561101
        1960          622     2.793790
        1970         4106     3.613419
        1980         6951     3.842047
        1990        15994     4.203957
        2000        81022     4.908603
        2010       198240     5.297191
        2020       765008     5.883666
File saved
```
If you open the folder where you saved your source code, you'll find a newly created file called logarithmicplot1.svg, which you can double click to open with your default image viewer or web browser.
Bases Other Than 10
The code in this project uses base 10, which is likely to be the most appropriate for the majority of data. However, there is no reason why you shouldn't use another base if necessary. For example, if you were plotting the growth of computer memory over the years, base 2 would be more appropriate.
Dealing With Fractions
The sample data used for this project consisted only of values of 1 or more. Fractions and negatives can also be plotted using logarithmic scales, and in this section I'll show a couple of tables demonstrating values between 0 and 1.
Firstly, let's look at 10 raised to negative integer powers. This table is equivalent to Table 3 above (the positive powers of 10) and shows values getting an order of magnitude smaller at each step instead of larger.
Interval Values | Power Equation | Axis Label |
---|---|---|
0 | $10^{0}$ | 1 |
-1 | $10^{-1}$ | 0.1 |
-2 | $10^{-2}$ | 0.01 |
-3 | $10^{-3}$ | 0.001 |
-4 | $10^{-4}$ | 0.0001 |
-5 | $10^{-5}$ | 0.00001 |
-6 | $10^{-6}$ | 0.000001 |
Now let's look at some sample data with its base 10 logarithms.
Label | Data | log10(data) to 6 dp |
---|---|---|
1910 | 0.9 | -0.045757 |
1920 | 0.36 | -0.443697 |
1930 | 0.081 | -1.091515 |
1940 | 0.052 | -1.283997 |
1950 | 0.0064 | -2.193820 |
1960 | 0.0012 | -2.920819 |
1970 | 0.00092 | -3.036212 |
1980 | 0.00049 | -3.309804 |
1990 | 0.000051 | -4.292430 |
2000 | 0.000011 | -4.958607 |
2010 | 0.0000077 | -5.113509 |
2020 | 0.0000029 | -5.537602 |
The raw data ranges from 0.0000029 to 0.9, which, as with the data in Table 1, is too wide a range to plot sensibly as it stands. Taking logarithms reduces the values to approximately -5.5 to -0.05, which fits neatly within the range 0 to -6 shown in Table 4.
Dealing With Negative Data
Negative values can also be plotted on a logarithmic scale, but they are rather fiddly. The logarithm of a negative number is undefined, so the absolute (positive) values must be used to calculate the logarithms. As a result, the logarithm increases as the actual negative data value decreases, so the plot must be inverted. This should be clearer with another table.
Label | Data | abs(Data) | log10(abs(Data)) |
---|---|---|---|
1910 | -2 | 2 | 0.301030 |
1920 | -6 | 6 | 0.778151 |
1930 | -29 | 29 | 1.462398 |
1940 | -84 | 84 | 1.924279 |
1950 | -364 | 364 | 2.561101 |
1960 | -622 | 622 | 2.793790 |
1970 | -4106 | 4106 | 3.613419 |
1980 | -6951 | 6951 | 3.842047 |
1990 | -15994 | 15994 | 4.203957 |
2000 | -81022 | 81022 | 4.908603 |
2010 | -198240 | 198240 | 5.297191 |
2020 | -765008 | 765008 | 5.883666 |
The data in this table are the negatives of the sample data we plotted earlier, so taking the logarithms of the absolute values gives exactly the same numbers as before. Plotting them in the same way would therefore be wrong. However, if we plot downwards instead of upwards, effectively reflecting the graph in the x-axis, and label the axis -1, -10, -100 and so on, we get the correct result.
Combining Negative, Fractional and Positive Data Values
Combining positive and fractional data is no problem: we can simply extend the solution developed above so that the powers run from negative through to positive. Including negative values presents more of a problem, which can only be resolved by dealing with the negative values separately, both when drawing the axis ticks and labels and when plotting the data points. The following table, which combines Tables 3 and 4, shows the axis for combined positive and fractional values.
Interval Values | Power Equation | Axis Label |
---|---|---|
6 | $10^{6}$ | 1000000 |
5 | $10^{5}$ | 100000 |
4 | $10^{4}$ | 10000 |
3 | $10^{3}$ | 1000 |
2 | $10^{2}$ | 100 |
1 | $10^{1}$ | 10 |
0 | $10^{0}$ | 1 |
-1 | $10^{-1}$ | 0.1 |
-2 | $10^{-2}$ | 0.01 |
-3 | $10^{-3}$ | 0.001 |
-4 | $10^{-4}$ | 0.0001 |
-5 | $10^{-5}$ | 0.00001 |
-6 | $10^{-6}$ | 0.000001 |
Conclusion
This has been a very basic introduction to the rather esoteric topic of logarithmic plots, but I hope I have got the principles across sufficiently to give a foundation on which to build should you need to do so.