
What is Compression?

Iain Laskey begins a series explaining the ins and outs of compression: what it is, how to use it and where to find it.

Whilst modern PCs tend to come equipped with pretty sizable hard drives, it is still all too easy to run out of room. Those of us with smaller drives often have an even harder time juggling files to maximise the free space. A further problem arises when you send or receive files via the Internet. It can take a long, expensive call to send a big file to someone, so what can be done to help? The answer is to compress the files so they take up less room. There are two basic types of compression: lossy and lossless.

Loser!

Lossy compression shrinks files by throwing away bits of data that hopefully won't be noticed. MP3 is such a system. It relies on psycho-acoustics, the way the brain interprets audio, and uses various tricks to produce something which sounds almost the same but is actually missing as much as 90% of the data. Another lossy system is JPEG (or JPG), which is designed to provide high compression on photographic-type images.

Not Such a Loser

The other type of compression is lossless, where the file is made smaller but can then be restored to its original form with no effect on the data. This seemingly impossible task relies on the fact that most files contain large amounts of empty space or repetitive data. As an example, a Word document unsurprisingly contains words. In this article, the word 'compression' appears over and over again, each occurrence taking 11 bytes of storage. A compression system could note this and, after the first occurrence, rather than store the actual word, store a one-byte indicator to say it is a repeated word plus a byte to indicate which word it is. The result is that each later occurrence of 'compression' now needs 2 bytes, not 11, a saving of 9 bytes or just over 80% for that word. If you now repeat that process for the 256 most common words (the most a single byte can tell apart), you can make quite a difference to the size of the file. When you decompress the file, the decompression program finds these codes for repeated words and restores the full words in their place, thus restoring the document to its original size and content.
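For the curious, the word-substitution idea can be sketched in a few lines of Python. This is only an illustration under some loud assumptions of my own: the text is plain ASCII, the zero byte never appears in it and so can serve as the 'repeat word' indicator, and the word list is kept alongside the compressed data rather than rebuilt from first occurrences. The names `MARKER`, `build_dictionary`, `compress` and `decompress` are made up for this example and do not come from any real compression program.

```python
import re
from collections import Counter

MARKER = 0x00  # assumption: this byte never appears in ordinary text


def build_dictionary(text: str, size: int = 256) -> list[str]:
    """Pick up to `size` words whose replacement saves the most bytes."""
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    # Only words longer than the 2-byte code are worth replacing.
    savings = {w: (len(w) - 2) * n for w, n in counts.items() if len(w) > 2}
    return sorted(savings, key=savings.get, reverse=True)[:size]


def compress(text: str, words: list[str]) -> bytes:
    """Replace each listed word with MARKER plus a one-byte word number."""
    index = {w: i for i, w in enumerate(words)}
    out = bytearray()
    # Split the text into alternating runs of letters and non-letters.
    for token in re.findall(r"[A-Za-z]+|[^A-Za-z]+", text):
        if token in index:
            out += bytes([MARKER, index[token]])
        else:
            out += token.encode("ascii")  # assumption: ASCII-only text
    return bytes(out)


def decompress(data: bytes, words: list[str]) -> str:
    """Swap every MARKER + word number back for the full word."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            out.append(words[data[i + 1]])
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


sample = ("compression is useful because compression saves space; "
          "compression also saves time.")
words = build_dictionary(sample)
packed = compress(sample, words)
print(len(sample), "->", len(packed), "bytes (plus the word list)")
print(decompress(packed, words) == sample)
```

Note that the word list itself has to be stored somewhere, so on a very short text the saving can vanish. The approach only pays off once the same words turn up many times.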

Another example is pictures of charts and graphs. Large portions of the chart will be the same colour, perhaps whole lines. Take an entire row of perhaps 800 white pixels, with each pixel needing two bytes to store the colour (allowing a maximum of 65,536 possible colours). Stored as it stands, that row takes 2 x 800 or 1600 bytes. Instead, you could store two bytes for the colour, a code byte that means 'repeat this many times' and another two bytes to store the 800. That ends up as 5 bytes to store what used to be 1600, a huge saving. I won't go into too much detail about why two bytes can hold numbers up to 65535, as this is only of interest to programmers and may confuse the issue slightly.
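The row-of-white-pixels trick is usually called run-length encoding, and a rough Python sketch of it follows. The layout is the one described above: colour (2 bytes), code byte (1 byte), repeat count (2 bytes), five bytes in total per run. The value chosen for the code byte (`REPEAT = 0xFF`) and the function names are assumptions of mine for illustration, and for simplicity this sketch stores every run this way, even a run of a single pixel.

```python
import struct
from itertools import groupby

REPEAT = 0xFF                 # hypothetical code byte: 'repeat this many times'
RUN = struct.Struct(">HBH")   # colour (2 bytes), code (1 byte), count (2 bytes)


def rle_encode(pixels: list[int]) -> bytes:
    """Store each run of identical 16-bit pixels as one 5-byte record."""
    out = bytearray()
    for colour, group in groupby(pixels):
        count = sum(1 for _ in group)
        # A 2-byte count tops out at 65535, so split longer runs.
        while count > 0:
            chunk = min(count, 0xFFFF)
            out += RUN.pack(colour, REPEAT, chunk)
            count -= chunk
    return bytes(out)


def rle_decode(data: bytes) -> list[int]:
    """Expand each 5-byte record back into its run of pixels."""
    pixels = []
    for colour, code, count in RUN.iter_unpack(data):
        if code != REPEAT:
            raise ValueError("unexpected code byte")
        pixels.extend([colour] * count)
    return pixels


row = [0xFFFF] * 800          # 800 white pixels, 2 bytes each
encoded = rle_encode(row)
print(len(row) * 2, "bytes uncompressed,", len(encoded), "bytes encoded")
print(rle_decode(encoded) == row)
```

The flip side is that a row where every pixel differs from its neighbour would come out larger in this sketch, at 5 bytes per pixel instead of 2. That is one reason this kind of scheme suits charts far better than photographs.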

How Can I Use Compression?

[Image: A graph - easily compressed]

One way is to use programs that are designed to compress and decompress files. Once compressed, files cannot generally be used until they are decompressed again, so compression is best suited to archiving or emailing. I tend to compress files before burning them to a CD-R to maximise the use of space there. In the next article I will look at ZIP files, a common standard for compressing files. In many cases, programs and files you download from the Internet will be in ZIP format, requiring that you 'UnZIP' them before being able to use them.
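If you happen to have Python installed, its standard `zipfile` module gives a quick taste of the compress-then-restore round trip. This is just a minimal sketch that works in memory rather than on real files. The helper names `zip_bytes` and `unzip_bytes` and the file name `article.txt` are invented for the example.

```python
import io
import zipfile


def zip_bytes(name: str, data: bytes) -> bytes:
    """Build a one-file ZIP archive in memory and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return buf.getvalue()


def unzip_bytes(archive_bytes: bytes, name: str) -> bytes:
    """Read one named file back out of an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


original = b"compression " * 500      # deliberately repetitive test data
zipped = zip_bytes("article.txt", original)
restored = unzip_bytes(zipped, "article.txt")

print(len(original), "bytes before,", len(zipped), "bytes zipped")
print(restored == original)
```

The test data here is deliberately repetitive, so it shrinks far more than a typical document would.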

Compression is also used in many cases without you knowing. Your modem uses a form of compression when it sends and receives data. You may have noticed that even if you are connected at 33K, which ought to limit download speeds to around 3.5K a second, you often see double that speed when downloading text and other highly compressible files.
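Why text in particular? A small experiment shows how differently two kinds of data respond to the same compressor. It uses Python's `zlib` module purely as a stand-in; it is not the scheme a modem actually uses, and the `ratio` helper is my own invention. The text sample is one sentence repeated many times, so it compresses far better than ordinary prose would.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


sentence = ("Whilst modern PCs tend to come equipped with pretty sizable "
            "hard drives, it is still all too easy to run out of room. ")
text = (sentence * 40).encode("ascii")   # repetitive, text-like data
random_bytes = os.urandom(len(text))     # no patterns for a compressor to find

print(f"text:   {ratio(text):.2f} of original size")
print(f"random: {ratio(random_bytes):.2f} of original size")
```

Data with no repetition, such as random bytes or a file that has already been compressed, barely shrinks at all and can even grow slightly. That is why the speed boost shows up on text and not on, say, ZIP or JPG downloads.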

Another place it happens transparently is with graphics files. Get a JPG file and open it using a graphics program such as Paint Shop Pro. Now save it as a TIF file. Taking an example file, I loaded a 60K JPG file and when saved as a TIF it grew to 802K in size, purely because the JPG is stored in a compressed format whereas the TIF file, as saved here, is essentially uncompressed. Different graphics formats have different effects on the size and appearance of a file. As a general rule, JPG is good for photos, GIF is good for graphs and charts, and TIF will produce the best results but at a huge premium in size.

 

Iain Laskey
See Iain's site at www.pcbookreview.com
