This post will demonstrate programs which each perform a single very specific task but which can be chained together in such a way that the output of one forms the input of another. Connecting programs like this, or piping to use the correct terminology, enables more complex workflows or processes to be run.
This post includes just three short programs to carry out the following tasks:
- Generating data
- Filtering data from the data generating program
- Calculating totals from the filtering program
The three programs described above can really only be used together in the order listed (perhaps missing out the filtering stage). However, the main purpose of this article is to show that a suite of programs, each carrying out a single task, can be developed and then used in a "mix and match" fashion to carry out a large number of different tasks, provided their inputs and outputs are in a common format.
A few examples of additional programs to add to the suite might be:
- Extracting data from various different sources (databases, flat files, web services etc.) and transforming it into a common format for use by other programs
- Filtering and/or sorting data in a variety of ways
- Processing the data from previous programs in the pipeline in various ways, anything from simple totals to complex machine learning
- Saving data created earlier in the pipeline to a variety of formats - databases, XML, spreadsheet etc.
Project Requirements
For this project we will create a file of transaction amounts for a hypothetical business. Some will be positive representing invoices sent out to customers, and some will be negative representing invoices received from suppliers. For this demonstration the amounts will be randomly generated but of course in a real-world situation the data would be retrieved from a source such as a database. The data will then be written to a file.
The next stage is to filter the data. It is common accounting practice to write off amounts below a certain value as it is not cost effective to handle them. Our filtering program will therefore take a minimum amount as a command line parameter, read in amounts from the previous program, and only output amounts whose size, ignoring the sign, is above the minimum. This means small invoices are dropped whether they are owed to us or by us.
The final program will read in data from the filtering program and calculate two totals. Negative amounts will be added to obtain the total creditors, while positive amounts will be added to obtain the total debtors. These amounts will be written to a further file.
Files in C, stdin and stdout
In the previous section I mentioned files several times, implying that data will be written to and read from files saved to disc. While each of the three programs can be used in this way they don't need to be - the piping process mentioned above can be used to write the output from one program directly to the input of another without any intermediate stage of writing and reading disc files.
One of the very first things most people learn about C is that printf, putchar and puts write to the terminal, and that scanf, gets and getchar read from the keyboard. Well actually, that's not quite true. In fact printf, putchar and puts write to a file called stdout (standard output) which by default points to the screen rather than a file on disc. Similarly, scanf, gets and getchar read from a file called stdin (standard input) which by default points to the keyboard. (gets was removed from the language in C11 and should not be used in new code; fgets is the safe replacement.) However, these defaults can easily be changed. For example assume you have a program called getdata which uses printf to output data. If you ran it like this:
./getdata
then not surprisingly the data would appear on the screen. However, you can easily change stdout like this:
./getdata > data.csv
which will make printf write to data.csv. You can also change where stdin points to using the < operator, as we'll see later. We will also see later how to pipe several programs together, each redirecting stdout and stdin to pass data along the pipeline without any unnecessary disc writes and reads.
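To make the idea of stdout and stderr being ordinary files a little more concrete, here is a minimal sketch. The function write_amounts and its parameters are my own invention for illustration and are not part of the three programs in this post. It takes the two streams as FILE pointers, so the same code writes to the screen, a redirected file or a pipe depending on what it is given.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the article's programs.
   Writes one amount per line to `out` and a status message to `log`.
   Passing stdout and stderr mimics printf plus fprintf(stderr, ...). */
void write_amounts(FILE *out, FILE *log, const double *amounts, int count)
{
    for (int i = 0; i < count; i++)
    {
        fprintf(out, "%.2f\n", amounts[i]);
    }

    /* The status message goes to a separate stream so that it is
       not mixed in with the data. */
    fprintf(log, "%d amounts written\n", count);
}
```

Calling write_amounts(stdout, stderr, amounts, count) from main would send the data wherever stdout currently points, while the status message would still reach the screen if only stdout had been redirected with >.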
Time to Start Coding
Create a new folder somewhere and within it create three empty files; you can also download the source code as a zip or clone/download from GitHub if you prefer.
- generatedata.c
- filterdata.c
- calculatetotals.c
Open generatedata.c and enter or paste this code.
generatedata.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    // seed the random number generator from the clock
    srand(time(NULL));

    double amount;

    for(int i = 0; i < 64; i++)
    {
        // rand() % 200 is 0 to 199, so amount is -100 to 99
        amount = ((rand() % 200) - 100);

        printf("%lf\n", amount);
    }

    // status message goes to stderr, not into the data
    fprintf(stderr, "%s\n", "data generated");

    return EXIT_SUCCESS;
}
The code for these three programs is deliberately simple so as to concentrate on redirecting stdin and stdout, and piping the outputs from one to the inputs of the next.
This first one simply generates 64 random whole numbers from -100 to 99, and uses printf to write them to stdout (wherever that might be - neither we nor the printf function know or care!)
Having done its stuff it then uses fprintf to write a message to stderr which, by default, also points to the screen but unlike stdout is not redirected with the > operator. (Please feel free to disapprove of my hijacking stderr for something which is not actually an error! However, it does demonstrate that you still have access to the screen should you need it for real errors. I have omitted any error handling for brevity but obviously that should not be done for production code.)
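A quick note on the range of the generated amounts: rand() % 200 gives a value from 0 to 199, so subtracting 100 gives -100 to 99. If you wanted both ends of a range included, a small helper along the following lines could be used. The name random_in_range is hypothetical, and the sketch assumes low is not greater than high and that the span fits within RAND_MAX.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of the article's programs.
   Returns a whole number from low to high with both ends included.
   Assumes low <= high and (high - low + 1) <= RAND_MAX.
   The modulo introduces a slight bias, which is fine for test data. */
int random_in_range(int low, int high)
{
    int span = high - low + 1;   /* number of distinct values */

    return low + rand() % span;
}
```

With this in place, random_in_range(-100, 100) could produce +100 as well as -100, because the modulus used is 201 rather than 200.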
Now let's move on to filterdata.c.
filterdata.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    // minimum amount from the command line (no validation, for brevity)
    double minimum = atof(argv[1]);

    char inputbuffer[16];
    double amount;

    // read one amount per line until end of input
    while(fgets(inputbuffer, sizeof(inputbuffer), stdin) != NULL)
    {
        amount = atof(inputbuffer);

        // fabs works on doubles; abs from stdlib.h takes an int
        // and would truncate the amount before comparing
        if(fabs(amount) > minimum)
        {
            printf("%-8.2lf\n", amount);
        }
    }

    fprintf(stderr, "%s\n", "data filtered");

    return EXIT_SUCCESS;
}
This is equally simple. It firstly picks up the minimum from the command line arguments and then uses fgets to read in data from stdin line by line. Any amounts whose absolute value is over the minimum are written to stdout, so the filter treats positive and negative amounts alike.
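Two details of the filtering step are worth a closer look. Firstly, atof has no way of reporting a failure and simply returns 0 for text which is not a number. Secondly, abs from stdlib.h takes an int, so a double passed to it is truncated before the comparison, whereas fabs from math.h works on doubles. The following is a minimal sketch of how the parsing and the test could be pulled out into functions; parse_amount and exceeds_minimum are hypothetical names of my own.

```c
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the article's programs.
   Converts text to a double using strtod and returns false if no
   number could be read, leaving *amount untouched in that case. */
bool parse_amount(const char *text, double *amount)
{
    char *end = NULL;
    double value = strtod(text, &end);

    if (end == text)
    {
        return false;   /* strtod consumed nothing: not a number */
    }

    *amount = value;
    return true;
}

/* Hypothetical helper: true when the size of the amount, ignoring
   its sign, is greater than the minimum. */
bool exceeds_minimum(double amount, double minimum)
{
    return fabs(amount) > minimum;
}
```

With a minimum of 10, an amount of -10.5 passes the filter here, whereas truncating it to an int first would turn it into 10 and drop it.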
As with printf in the previous program we don't know or care where stdin actually comes from or where stdout actually goes to.
Finally let's move on to calculatetotals.c.
calculatetotals.c
#include <stdio.h>
#include <stdlib.h>

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    char inputbuffer[16];
    double amount;
    double totaldebtors = 0;
    double totalcreditors = 0;

    // read one amount per line until end of input
    while(fgets(inputbuffer, sizeof(inputbuffer), stdin) != NULL)
    {
        amount = atof(inputbuffer);

        // positive amounts are owed to us, the rest are owed by us
        if(amount > 0)
        {
            totaldebtors += amount;
        }
        else
        {
            totalcreditors += amount;
        }
    }

    printf("Debtors: %8.2lf\n", totaldebtors);
    printf("Creditors: %8.2lf\n", totalcreditors);

    fprintf(stderr, "%s\n", "totals calculated");

    return EXIT_SUCCESS;
}
Another very simple little program. Firstly we declare a few variables: a char array to use as an input buffer and a double to hold amounts converted from strings. We also have variables for the two totals, initialized to 0.
In the loop we read in data from stdin line by line, convert the string to a number, and then add the amount on to the relevant variable. After the loop terminates we write the two totals to stdout. You know what I'm going to say next: "we don't know or care where stdin comes from or where stdout goes to".
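Because the loop only depends on a FILE pointer, it could also be moved into a function which takes the stream as a parameter. The sketch below is one way of doing that, assuming the same one-amount-per-line format; the Totals type and the sum_amounts function are hypothetical names which do not appear in calculatetotals.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical type holding the two running totals. */
typedef struct
{
    double debtors;     /* sum of positive amounts */
    double creditors;   /* sum of zero and negative amounts */
} Totals;

/* Hypothetical helper, not part of the article's programs.
   Reads one amount per line from `in` until end of file and
   returns the accumulated totals. */
Totals sum_amounts(FILE *in)
{
    Totals totals = {0.0, 0.0};
    char line[32];

    while (fgets(line, sizeof line, in) != NULL)
    {
        double amount = atof(line);

        if (amount > 0)
        {
            totals.debtors += amount;
        }
        else
        {
            totals.creditors += amount;
        }
    }

    return totals;
}
```

Calling sum_amounts(stdin) would behave like the loop in calculatetotals.c, while passing a stream opened with fopen or tmpfile would let the same code be exercised without any redirection at all.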
The coding is now finished so we can compile the three programs.
Compile
gcc generatedata.c -std=c11 -o generatedata
gcc filterdata.c -std=c11 -o filterdata -lm
gcc calculatetotals.c -std=c11 -o calculatetotals
We can actually run these in different ways. The first way is individually, causing them to write data to or read data from files on disc. Let's do that to start with.
Run individually
./generatedata > transactions.csv
./filterdata 10 < transactions.csv > filteredtransactions.csv
./calculatetotals < filteredtransactions.csv > totals.csv
The first line runs generatedata, telling it to redirect stdout to transactions.csv.
The second line runs filterdata with a command line parameter of 10, the minimum amount. The stdin file is redirected to transactions.csv created by the previous program, and stdout to filteredtransactions.csv.
The last line runs calculatetotals with stdin redirected to filteredtransactions.csv and stdout redirected to totals.csv.
When you run these all you will see is the stuff written to stderr...
Program Output
data generated
data filtered
totals calculated
... but if you look in the folder where you have your source code you will see three csv files have been created.
Now let's run the programs again, this time all in one go with the output from one piped to the input of the next using the '|' character.
Run with piping to create totals.csv file
./generatedata | ./filterdata 20 | ./calculatetotals > totals.csv
There is only one file name here, totals.csv, which is the end result of the process. We no longer waste resources generating unnecessary intermediate files. When you run this you'll still see the messages printed to stderr but only the final totals.csv file has been written.
If we just want to see the totals on screen we can do away with files on disc completely. Let's run the programs one last time, this time without redirecting the output of the last to a file.
Run with piping to print totals to screen
./generatedata | ./filterdata 20 | ./calculatetotals
The only difference here is we have missed off "> totals.csv" from the end. For the calculatetotals program stdout is left pointing at its default, the screen, so we actually get to see the totals without opening a file.
Program Output
data generated
data filtered
Debtors: 1125.00
Creditors: -1532.00
totals calculated