Maddog's HTML Primer for Real People

Chapter 6, Lesson 2

Ch 6, L 2 - Image File Formats

If you already know the difference between .GIF and .JPG image file formats, and when to use which, you can skip this lesson.

Before we move on to building HTML documents with images, we need to talk a little about image file formats. As we mentioned earlier, all images will be stored in files separate from the HTML document. So the question naturally arises: "Can I use any image file for HTML?"

Alas, you cannot. As you may already know, there are scores of file formats for images used by computers. By "file format", we simply mean the method by which the image is stored in a file. Some methods store numbers representing the color and intensity of each pixel (screen dot), storing one unique set of numbers for each pixel. Such methods are called "lossless" because they store every teeny-weeny detail and nuance of the image, without "losing" anything. Such methods also usually produce relatively large files, even for relatively small pictures. Some methods store the same "lossless" information, but embellish the method by performing "data compression", which serves to make the file not quite so large. And some methods perform "image compression", which means they "approximate" the image by some very, very oh-so-clever means beyond the understanding (or at least the caring) of real people. Methods that use "image compression" are often "lossy", which means that some information about the image is "lost". Lossy methods produce a slightly different image from the original, but if done correctly, the average viewer will never know the difference. Methods that use image compression can produce good images with files 1/5 to 1/10 the size of a lossless, uncompressed image file.

Conventions have arisen which help to identify which file format was used to store an image, so that software (and operators) which need to reconstruct the image (from the image file) can tell which method should be used. The most popular convention is to use the filename extension (the last part of the filename, after the last '.' [dot]) to indicate the type of file format used. Popular extensions you may have encountered are .bmp, .pcx, .wmf, .ppm, .tif, .dib, .dxf, .gif, and .jpg. Some image files may also store more information inside the file, while still using a specific filename extension.

Software which reconstructs images for display has to reverse the process which originally produced the image file. If the image file was produced by a method that used "type x" image compression, then the software which reconstructs the image for display must un-compress the image using the "reverse type x" method. The software must match the file format, regardless of the filename extension, in order to successfully reproduce the image on the screen. For example, if the software tries to apply the reconstruction methods for '.bmp' files to a file which was constructed using the '.tif' method, it will fail miserably.
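For the curious, here is a minimal Python sketch of how software might identify a format from the file's contents rather than trusting the extension. The function name sniff_format and the SIGNATURES table are made up for this illustration; the byte patterns are the well-known opening bytes of each format, and a real browser checks a good deal more than this.

```python
# Hypothetical helper: guess an image format from the first bytes of the
# file, ignoring whatever the filename extension claims.
SIGNATURES = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_format(header: bytes) -> str:
    """Return a format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


# A file named 'picture.bmp' that actually begins with GIF bytes is a GIF.
print(sniff_format(b"GIF89a\x64\x00\x64\x00"))
```

In practice you would read the first few bytes of the file (opened in binary mode) and pass them in.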

The web-weenies who designed graphics browser software recognized that it would be a very, very complicated undertaking to include the ability to reconstruct every popular image format, so they settled on two that were known to have great flexibility, excellent reliability, and reasonable file size. (File size, as you have noticed if you ever tried to download a large image file, or a web page with one, translates to "time" on the Internet.) We shall briefly discuss both formats.

The '.gif' image file format was originally developed by CompuServe, before graphics browsers were popularized, for transferring images around the Internet. It has been through several incarnations since its inception, and has proved to be robust and flexible. We don't need to know how it works, but we do need to remember that it is lossless (i.e., no image information or detail is lost, provided the image uses no more than 256 colors, which is all a '.gif' can hold), and it produces a medium-sized image file. Actually, file size is difficult to address when talking about image files. With .gif files, it's safe to say, the larger the image, the larger the file. With most lossless methods, the file size is directly proportional (more or less) to the number of pixels in the image, or, in other words, to the area of the image. Let's say you have two images stored in .gif format. One is 100 x 100 pixels, and the other is 200 x 200 pixels. The larger one will produce a file up to four times (4x) the size of the first one, because it has four times as many dots. Yet it only looks twice as big when displayed, i.e., twice as wide, and twice as tall.
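If you'd like to see the "four times as many dots" arithmetic spelled out, here is a tiny Python sketch. The function name pixel_count is made up for this illustration, and it only counts pixels; it does not predict actual file sizes.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels (dots) in an image of the given dimensions."""
    return width * height


small = pixel_count(100, 100)   # 10,000 pixels
large = pixel_count(200, 200)   # 40,000 pixels

# Doubling both width and height quadruples the pixel count.
print(large / small)
```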

The '.jpg' or '.jpeg' file format is a "lossy" method using image compression. It stores information inside the file which tells the browser software what specific algorithm (computer formula) to use to de-compress the image for restoration. Again, we don't need to know how it works, but we do need to remember that it is lossy (i.e., some image information or detail is lost), and it produces a relatively small image file. Also, unlike other methods, it usually becomes more efficient with larger images, simply because the image-compression method has more material to work with. The result is that the files are typically 1/2 to 1/5 the size of an identical '.gif' file, and sometimes even smaller. Also, when you double the width and height of the image, the file size doesn't usually quadruple; it typically grows to a little more than double. The result is almost the same picture, but a much smaller file size.

Let's see some examples:

filename=saturn1.gif; file size=4.2KB; load time=2.9 sec

filename=saturn1.gif; file size=13.6KB; load time=9.4 sec

filename=saturn1.jpg; file size=1.3KB; load time=0.9 sec

filename=saturn1.jpg; file size=3.0KB; load time=2.1 sec

The first two pictures, as you will see from the file names, are '.gif' format. The smaller one is 100 x 100 pixels, and the larger one is 200 x 200 pixels. Look at the file size: the larger one is 3.2 times as big as the smaller, even though the image appears to be only twice as big. Here's the real cruncher: the smaller one takes about 3 seconds to download over a typical 14.4 kbps modem connection, while the larger one takes almost 10 seconds. (And keep in mind, these aren't even large images.)

Now look at the last two '.jpg' images. They have the same pixel dimensions as the first two '.gif' images. Can you see any difference in the "lossy" '.jpg' images? Not much, huh? Yet the '.jpg' image files are significantly smaller than the '.gif' image files. In fact, the larger '.jpg' image file takes less time to download than the smaller '.gif' image. Amazing stuff! In practice, the disparity between the sizes becomes even more apparent with larger images, with the '.jpg' far outclassing the '.gif' method for full-screen images, which can take several minutes to download using '.gif' format.
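To put numbers on the comparison, here is a short Python sketch that works out the ratios from the four file sizes listed in the examples above. The variable names are made up for this illustration; the sizes themselves come straight from the listing.

```python
# File sizes in KB, taken from the four example images above.
gif_small, gif_large = 4.2, 13.6   # 100 x 100 and 200 x 200 '.gif'
jpg_small, jpg_large = 1.3, 3.0    # 100 x 100 and 200 x 200 '.jpg'

# Growth when the image doubles in width and height.
gif_growth = round(gif_large / gif_small, 1)
jpg_growth = round(jpg_large / jpg_small, 1)

# How many times bigger the '.gif' is than the '.jpg' at each size.
small_ratio = round(gif_small / jpg_small, 1)
large_ratio = round(gif_large / jpg_large, 1)

print(gif_growth, jpg_growth, small_ratio, large_ratio)
```

With these particular files, the '.gif' grows about 3.2 times while the '.jpg' grows about 2.3 times, and the '.gif' advantage of the '.jpg' widens from roughly 3.2x to roughly 4.5x at the larger size.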

Well, we might be tempted to use '.jpg' file formats for every image, except for one thing: not all browsers will display '.jpg' images. (Maybe you already found that out, in the example above). Therefore, some good guidelines to follow are these:

Use only '.gif' files for images displayed on a web page (i.e., inline images);
Use the smallest image possible to get the job done (images almost always take longer to download than text);
If you have a lot of images to show, consider splitting them into different pages, or using thumbnail images which link to off-page images;
Use '.jpg' files for off-page images, whenever possible, at least for all but the smallest ones (or else pay the price in download time).

Remember, when using images with web pages, consider the file size of the image, and consider the time it will take to download the web page with all your images. As a rule of thumb, you can use the file size in bytes (which you can find out from your file manager software), and divide that by 1400 to get the typical download time in seconds for each image. (If you're using KB, divide by 1.4.) If it looks like your web page will take more than 20 or 30 seconds to download, the person wanting to view your web page may abandon you. Think about it. A common solution is to use a small "thumbnail" image, and provide a link to the larger image file, which can then be downloaded when and if the remote operator chooses. Much more civilized, don't you think? (Would you pass the Grey Poupon, please?)
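The rule of thumb above is easy to turn into a few lines of Python. This is only a sketch: the names download_seconds and page_seconds are made up for this illustration, the 1400 bytes-per-second figure is the lesson's rough estimate for a 14.4 modem, and 1 KB is treated as 1000 bytes to match the "divide by 1.4" shortcut.

```python
BYTES_PER_SECOND = 1400  # the lesson's rule-of-thumb rate for a 14.4 modem


def download_seconds(size_bytes: float) -> float:
    """Estimated download time in seconds for one file."""
    return size_bytes / BYTES_PER_SECOND


def page_seconds(image_sizes: list) -> float:
    """Estimated download time for all the images on one page."""
    return sum(download_seconds(size) for size in image_sizes)


# The small '.gif' from the examples: 4.2 KB is about 4200 bytes.
print(download_seconds(4200))

# A page carrying ten such images lands right at the 30-second limit.
print(page_seconds([4200] * 10))
```

If the page total creeps past 20 or 30 seconds, that is your cue to shrink the images or move them off-page behind thumbnails.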

Overseer: Monty Northrup ...
