Johnny Rollerfeet's Guide to Audio Over the Internet

Brief Explanation of Audio and the Internet

Audio over the internet was/is a difficult problem. I'll start by explaining the basics of an audio file. Every audio format has four characteristics that define its quality.

Sampling Rate - determines how many times per second we 'sample' the audio information. So, a file that has an 8,000 Hz (or 8kHz) Sampling Rate would be able to change the sound 8,000 times during 1 second. CD quality has a Sampling Rate of 44,100 Hz - or 44.1kHz. Analog devices, such as cassette tapes and vinyl records, have a near infinite Sampling Rate¹. Your sound card is a digital device. (Your ears can't really tell any difference above ~48kHz, btw.)

Sampling Depth - determines how much data is stored per sample. Usually Sampling Depth is either 8 bits or 16 bits. So a file that has a Sampling Depth of 8 bits would store 8 bits of information each time it is sampled. (Your ears can hear the difference between 8 bits and 16 bits of sampling depth in music, but not in plain speech recordings.)

Channel - determines if the sound is in two channels (left and right) - the way we hear naturally, or one channel - the way we hear over a telephone. Stereo (two channels) stores twice as much information as mono (one channel). (Now, with DTS and Dolby Surround this has become more complicated, but since most computer sound files don't use 6 channels -- front left, front right, center, back left, back right, sub-woofer -- we won't go into that.)

Compression Algorithm - this is too complex to go into detail², but the simple explanation is that information can be condensed so that it doesn't take up as much room. When compression is used on an audio file, the computer takes longer (or uses more processing power) to record or play the file.
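
To make the "information can be condensed" idea concrete, here is a minimal Python sketch. It uses zlib, a general-purpose lossless compressor from the same family that ZIP tools use; that choice is my assumption for illustration, not what any particular audio format does (most audio formats throw information away, which zlib never does). The three test signals (`silence`, `tone`, `noise`) and the helper `compressed_size` are made up for this example.

```python
import math
import os
import zlib

SAMPLE_RATE = 8000  # one second of 8-bit mono audio = 8,000 bytes

# Digital silence: 8-bit unsigned audio rests at the midpoint value 128.
silence = bytes([128] * SAMPLE_RATE)

# A steady 500 Hz tone: the same short wave shape over and over.
tone = bytes(
    round(128 + 100 * math.sin(2 * math.pi * 500 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
)

# Random noise: no repetition for the compressor to exploit.
noise = os.urandom(SAMPLE_RATE)


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib squeezes the data at its highest setting."""
    return len(zlib.compress(data, 9))


for name, data in [("silence", silence), ("tone", tone), ("noise", noise)]:
    print(f"{name:8s} {len(data):5d} bytes -> {compressed_size(data):5d} bytes")

# Lossless means the original comes back bit for bit.
assert zlib.decompress(zlib.compress(tone, 9)) == tone
```

The silence and the steady tone should shrink to a tiny fraction of their size, while the noise barely shrinks at all. Real music sits somewhere in between, which is one reason audio formats lean on lossy tricks instead.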

A standard sound file (.wav - the one most used in Windows 3.1 and Windows 95) 2 minutes and 47 seconds long, in stereo, at 44.1kHz, 16 bits, with no compression would sound ideal, but would take up 29,458,800 bytes of data on your hard drive. That's nearly 30 megabytes, which would take about 2 hours and 16 minutes of uninterrupted transfer time on a 28.8 modem running at its full 28,800 bits per second over a perfect phone line. It's easy to see how ridiculous this would be.

		Sampling Rate                                  44,100 Hz
	*	Sampling Depth                                 2 bytes (16 bits)

	=	Number of bytes per second per channel         88,200 bytes
	*	Stereo Channels                                2

	=	Number of bytes per second                     176,400 bytes
	*	Length of Sample                               167 seconds (2 minutes 47 seconds)

	=	Total number of bytes for a 2:47 song
		at 44.1 kHz / 16 bits / Stereo                 29,458,800 bytes

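
The multiplication above can be sketched in a few lines of Python. The function name `uncompressed_bytes` and its parameter names are hypothetical, chosen just for this example; the numbers are the ones from the calculation.

```python
def uncompressed_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of raw (uncompressed) audio data, ignoring any file header."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


song_seconds = 2 * 60 + 47  # 2:47 is 167 seconds

per_channel = uncompressed_bytes(44_100, 16, 1, 1)
per_second = uncompressed_bytes(44_100, 16, 2, 1)
whole_song = uncompressed_bytes(44_100, 16, 2, song_seconds)

print(f"{per_channel:,} bytes per second per channel")  # 88,200
print(f"{per_second:,} bytes per second")                # 176,400
print(f"{whole_song:,} bytes for the whole song")        # 29,458,800
```

Plugging in other rates, depths, or channel counts shows quickly how much each setting contributes to the total.
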

To make files small enough to be sent over the internet, some sacrifices have to be made. The first things to go are the Sampling Depth and Stereo. Changing to 8 bits and mono sound reduces your file by half and then half again. (So now, the previously 30 meg file is only 7,364,700 bytes.)

As mentioned before, a sampling rate of 44.1 kHz is CD quality. Well, older phone systems don't do better than 8 kHz, and 8 kHz is more than enough for regular voice sounds (though it sucks for music.) Now, when the 2 minute, 47 second sound file is dropped from 44.1 kHz to 8 kHz it is only 1,336,000 bytes.

We've changed the Sampling Rate, the Sampling Depth, and set the file to Mono. This has greatly reduced the file (from 30 megabytes to 1.3 megabytes), but that file would still take over five minutes to download on a 28.8 modem. So our last option is compression.

How does compression work? There are entire college graduate-level classes on compression, so it's not easy to truly comprehend. Some compression techniques search the sound file for spots where the sound doesn't change for a long time and alter the Sampling Rate mid-file². Some compression techniques simply use advanced file compression techniques (similar to "ZIP") to reduce file size. The important thing to realize is that files can be made smaller, but the computer has to work harder to do that. As computers get faster and faster, we can compress files smaller and smaller without taxing the processor.
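
Here is a rough Python sketch of those sacrifices applied to actual sample data, one second at a time. Everything in it is an assumption for illustration: the test tones, the helper names (`make_test_tone`, `to_mono`, `to_8bit`, `downsample`), and especially the downsampling, which just picks evenly spaced samples. A real converter would filter the sound first. The download time assumes an idealized 28,800 bits per second with no overhead.

```python
import math

CD_RATE = 44_100      # samples per second
PHONE_RATE = 8_000    # samples per second
SONG_SECONDS = 167    # the 2:47 song from the text


def make_test_tone(rate, freq_hz=440):
    """One second of a sine wave as signed 16-bit sample values."""
    return [round(30_000 * math.sin(2 * math.pi * freq_hz * n / rate))
            for n in range(rate)]


def to_mono(left, right):
    """Average the two channels into one."""
    return [(a + b) // 2 for a, b in zip(left, right)]


def to_8bit(samples16):
    """Keep only the top 8 bits of each sample (unsigned, 0..255)."""
    return bytes((s >> 8) + 128 for s in samples16)


def downsample(samples, old_rate, new_rate):
    """Crudely pick evenly spaced samples; no filtering is done here."""
    count = len(samples) * new_rate // old_rate
    return bytes(samples[i * old_rate // new_rate] for i in range(count))


left = make_test_tone(CD_RATE)
right = make_test_tone(CD_RATE, freq_hz=660)

# Bytes needed for ONE second at each stage.
stereo16_bytes = (len(left) + len(right)) * 2   # 2 bytes per 16-bit sample
mono8 = to_8bit(to_mono(left, right))
phone8 = downsample(mono8, CD_RATE, PHONE_RATE)

for label, per_second in [
    ("44.1 kHz / 16 bits / stereo", stereo16_bytes),
    ("44.1 kHz /  8 bits / mono  ", len(mono8)),
    ("   8 kHz /  8 bits / mono  ", len(phone8)),
]:
    print(f"{label}: {per_second * SONG_SECONDS:>10,} bytes for the whole song")

# Idealized download time for the smallest version on a 28,800 bit/s modem.
download_seconds = len(phone8) * SONG_SECONDS * 8 / 28_800
print(f"about {download_seconds / 60:.1f} minutes to download")
```

The three totals come out to 29,458,800, then 7,364,700, then 1,336,000 bytes, matching the figures in the text.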

What file types are there?

There are several types of audio files available for use over the Internet. I have picked the ones that were most accessible and had the best compression. If you know of a good one that I haven't included, send e-mail to me with a URL and I'll see about integrating it with these. (I have no control over our server so if your format requires a special server, too bad for all of us.) Take a look at the sites listed here and try out their audio players and encoders to get the most benefit of this page while using the least bandwidth. I've listed them in (IMHO) preferential order.

	MPEG Layer 3 (mp3)	This is a fairly recent format, and while the compression is not as good as MP2, the quality is phenomenal for what compression it does get. Players: WinAmp for Windows 95 is really the only mp3 player. Encoders: CDEx encodes from most CD players to mp3, and Wav2MP encodes wave files to mp3.
	MPEG Layer 2 (mp2)	Players: Cool Edit for Windows 3.1 and Windows 95, xingsound for Windows 3.1, MAplay for Windows 95, MPEG/CD for Macintosh. Encoders: MPEG Audio Macintosh, Cool Edit for Windows 3.1 and Windows 95.
	Real Audio 28.8 and 14.4	Players: Real Audio's Player for Win 3.1 and Windows 95. Encoders: Real Audio's Encoder for Win 3.1 and Windows 95.
	DSP	Players: DSP Player or the DSP Netscape Plugin. Encoders: To save DSP you need Win 95 or the Win 3.1 encoder from DSP, and you can get instructions on using Win 95 to make DSP files.
	Vox	You can get both the player and encoder from the Voxware site.

Let me know if you have any questions or problems with any of this.

Footnotes

¹Longer Boring Technical Explanation: An analog device is actually storing pieces of information on the molecules of the recording medium (the plastic on the record, or the magnetic strip on a tape.) A digital device is storing information in larger stripes. The problem with analog is that when you copy from an analog device the atoms can be interfered with (dust, demagnetization, etc.) and the recording device doesn't know any better. With digital devices, an exact copy can be made without interference.

²Longer Boring Technical Explanation 2: I've been reading up on mp3 compression. Here's the brief rap: when you hear sounds, the sound waves tend to run together. If the waves are very similar, then you can't distinguish between them. Mp3 compression takes the similar sound waves and makes them into one sound wave. A more technical description from the guys who made mp3 compression is available; be sure to check out the "Perceptual Audio Coding" section.