Tuesday, January 12, 2010

10.5 CONCEPT OF BINARY FILES

So far we have mostly discussed data files (commonly known as text files). Every computer also uses another kind of file, known as a binary file. Executable (machine language) files, for example, are binary files.

To open a file in binary mode, the mode has to be given as "rb" or "wb" in the fopen call. Otherwise the file is opened in the default mode, which is text mode. On DOS compilers such as Turbo C, giving the mode as "r" or "w" is therefore equivalent to "rt" or "wt" respectively. Here t stands for text and b stands for binary; the t suffix is a compiler extension and not part of standard C. It may be noted that a text file can also be stored and processed as a binary file, but not vice versa.

Binary files differ from text files in two main ways:
1. The storage of newline characters
2. The eof character

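The first difference is about the storage of \n, i.e. the newline character. In a C program, \n is a single character, but in a text file on DOS and Windows it takes 2 bytes of storage on disk. Are you surprised? On these systems a newline is actually stored as a pair of characters: first carriage return (\r, ASCII code 13) and then line feed (\n, ASCII code 10). When \n is written to a file opened in text mode, it is stored using these 2 bytes, and when the pair is read back it is converted into a single \n again. So when you count the characters of a text file by reading it in text mode, each newline contributes one character, even though it occupies two bytes of the file's size. In binary mode no such conversion takes place and every byte is read and written exactly as it is. (On Unix-like systems a newline is stored as a single line feed, so the two modes behave the same in this respect.)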

Text files and binary files also differ in the way end of file is handled. This may not make much difference to users, but it can cause certain programs to fail, so the difference is crucial to understand. On DOS and Windows, the character with ASCII code 26 (Ctrl-Z) is treated as the eof marker in text mode only. In binary files there is no such explicit eof character. Binary files do not store any special character at the end of the file, and the end of the file is determined from the file size itself. So if a binary file is opened in text mode for reading, eof may be detected wrongly as soon as a data byte with the value 26 is encountered, even though more data follows it.

While discussing binary files, one more point is worth mentioning, and it concerns the storage of data in binary format. Numbers can be written to files using the fprintf statement (putc and fputc write a single character at a time, so they cannot be used directly for this purpose). The fprintf statement stores a number as a sequence of characters. So storing 1001 in a file (text as well as binary) using fprintf is done as a sequence of 4 characters: '1', '0', '0', '1'. It means that storing a 4-digit number takes 4 bytes. But on a 16-bit computer an integer needs only 2 bytes of storage in memory, and the fwrite statement lets us store it in the file in the same form. The fwrite statement stores every integer value in two bytes (sizeof(int) bytes in general), regardless of the number of digits in that value. Although fwrite can be called on a file opened in either mode, the file should be opened in binary mode so that data bytes which happen to equal 10 or 26 are not altered or misread.

Let us write a C program which stores numbers in binary format using fwrite.

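/* store 1001 to 1100 in a binary file using fwrite */

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int i;

    /* "wb" opens the file for writing in binary mode */
    if ((fp = fopen("binval.dat", "wb")) == NULL) {
        printf("\n ERROR- Cannot create the designated file\n");
        return 1;   /* nothing to close when fopen fails */
    }

    /* each call writes the sizeof(int) bytes of i exactly as held in memory */
    for (i = 1001; i <= 1100; i++)
        fwrite(&i, sizeof(int), 1, fp);

    fclose(fp);
    return 0;
}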

The size of the output file created by the above program will be 200 bytes (2 bytes for each of the 100 integers) on a compiler with 2-byte integers; with 4-byte integers it will be 400 bytes. If the same data were stored in a text file using the fprintf statement, the size of the file would be at least 400 bytes (4 x 100), and more once separators between the numbers are included.
