Digitizing sound

Digitizing sound means transforming an analog audio input (such as a microphone, mixer, or CD/cassette/record player) into computer data. This process is, of course, reversible: your computer loudspeakers receive an input which has previously been transformed from computer data back into analog audio.

This "translation" process is always done in the same way by a computer, but there are different methods for a computer to handle audio as computer data. You have surely already heard about MP3 and WAV, and maybe also about Ogg or WMA. These are simply methods of storing audio as data; the programs that encode and decode them are called codecs.

Lossless and lossy codecs

While a normal CD uses a lossless encoding technique (the algorithm focuses on the maximum quality of the sound), the codecs we are going to use in the streaming process are so-called "lossy codecs", which means that they aim for reasonably high quality while trying to use as little space as possible, and to do it all at a good speed. Lossy codecs, basically, compress the data to be stored and decompress it when they are asked to deliver audio output for human ears. The biggest difference between lossless and lossy compression is that while the first method never causes any loss of quality, if you compress, decompress, and then compress again with a lossy codec, the audio quality will sooner or later be compromised.

When streaming, our computer has to deliver a lot of data over the Internet quickly, so we will always use a lossy codec. The most commonly used lossy codecs are MPEG Layer 3 (MP3), Vorbis (Ogg), Windows Media Audio (WMA), and RealAudio (RA). The most common lossless format is WAV, which stores audio essentially the same way as the data on a standard audio CD.

Average bit rate

Depending on the transmission capacity we have access to, we "suggest" to the codec the speed at which we want our audio data to be delivered. Of course, this is very different depending on whether we want to store our audio data or stream it. The lower the bit rate, the smaller the amount of data, but also the lower the quality. This is an important decision to take when streaming, and it is directly related to the upload (upstream) capacity of the Internet connection we are using. The higher the upstream rate, the higher the bit rate we can choose, resulting in better audio quality.

Average bit rate refers to the average amount of data transferred per second by a codec. An MP3 file, for example, that has an average bit rate of 128 kbps transfers, on average, 128,000 bits every second. Bit rate is not the only measure of audio quality, as some formats, such as WMA and Vorbis, produce higher sound quality than the standard MP3 format at the same bit rate.
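
To put that figure in perspective, the amount of data a stream moves is simple arithmetic: bit rate times duration. Below is a minimal Python sketch, using illustrative numbers rather than figures from any particular stream:

    # Data moved by a stream at a given average bit rate.
    # 128 kbit/s is the "acceptable music quality" rate from the table below.
    bit_rate_kbps = 128                      # kilobits per second
    duration_s = 60                          # one minute of audio

    bits = bit_rate_kbps * 1000 * duration_s
    megabytes = bits / 8 / 1_000_000         # 8 bits per byte
    print(f"{bit_rate_kbps} kbit/s for {duration_s} s = {megabytes:.2f} MB")
    # -> 0.96 MB, i.e. roughly one megabyte per minute of music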

The reference table for average bit rate is:

  • 4 kbit/s - minimum necessary for recognizable speech (using special-purpose speech codecs)
  • 8 kbit/s - telephone quality
  • 32 kbit/s - MW (AM) quality
  • 96 kbit/s - FM quality
  • 128 kbit/s - Typical "acceptable" music quality
  • 256 - 320 kbit/s - Near CD quality

Principles

Digital audio refers to audio signals stored in a digital format. Specifically, the term encompasses the following:

  1. Audio conversion:
    • Analogue to digital conversion (ADC) - the capture and digitisation of an analogue audio signal.
    • Digital to analogue conversion (DAC) - the conversion of digital audio to a line signal for playback or distribution.
  2. Audio signal processing - processing the digital signal in some way, such as to apply equalisation, reverberation, or to perform sample rate conversion.
  3. Storage, retrieval, and transmission of digital information in an audio format such as CD, MP3, Ogg Vorbis, etc.
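
To make the ADC step concrete, here is a minimal Python sketch of sampling and quantising a waveform with NumPy. The tone frequency, sample rate, and bit depth are illustrative assumptions chosen for the example, not values prescribed by any standard:

    import numpy as np

    sample_rate = 44100    # samples per second (the audio CD rate, for illustration)
    bit_depth = 16         # bits per sample
    freq = 440.0           # an assumed 440 Hz test tone standing in for the input
    duration = 0.01        # 10 ms of audio

    # Sampling: measure the waveform at discrete, evenly spaced points in time.
    t = np.arange(int(sample_rate * duration)) / sample_rate
    analogue = np.sin(2 * np.pi * freq * t)

    # Quantisation: map each sample onto one of 2**bit_depth integer levels.
    max_level = 2 ** (bit_depth - 1) - 1
    digital = np.round(analogue * max_level).astype(np.int16)

    print(digital[:8])     # the first few samples, now ordinary 16-bit integers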

The digital paradigm

Digital technology has emerged for the simple reason that analogue signals cannot be copied or transmitted perfectly, while digital signals can be. With analogue technology, information resolution (i.e. 'quality') is lost with each generation of reproduction.

The digitisation of signals is driven by the increasing demand to put media to different uses. In a professional broadcasting environment, where signals may require transmission and processing through cable, varied mixing desks, and processing equipment, the need to maintain quality is paramount.

Overview of digital audio

Sound inherently begins as an analogue signal, and for the benefits of digital audio to be realised, the conversion process must be of sufficiently high quality to be worthwhile.

In an audio context, "sufficiently high quality" means that the reproduced digital signal should sound identical to the original analogue signal. In other words, the limits of the human auditory system govern the technical requirements of the conversion process.

The generally accepted frequency response of human hearing is from 20 Hz to 20 kHz. The maximum bandwidth that can be represented by a digital signal is half the sample rate. This leads to a required sample rate of at least 40 kHz. In practice, a slightly higher sample rate is needed to allow for a practical anti-aliasing filter, which is why audio CDs use 44.1 kHz.
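
That requirement is simple to verify numerically. A quick sketch, using the familiar audio CD rate of 44.1 kHz as an example of a rate with headroom for the filter:

    hearing_limit_hz = 20_000                   # upper limit of human hearing
    nyquist_minimum = 2 * hearing_limit_hz      # at least 40,000 samples per second

    sample_rate = 44_100                        # e.g. the audio CD rate
    bandwidth = sample_rate / 2                 # 22,050 Hz can be represented
    filter_headroom = bandwidth - hearing_limit_hz   # 2,050 Hz left for the filter
    print(nyquist_minimum, bandwidth, filter_headroom)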

Encoder

An audio codec is a computer program that compresses/decompresses digital audio data according to a given audio file format or streaming audio format. Most codecs are implemented as libraries that interface with one or more multimedia players, such as XMMS, Winamp or Windows Media Player.
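
As a small concrete example of such a library, Python's standard wave module reads uncompressed WAV data through the same kind of open-and-decode interface a codec library provides. The file name below is only a placeholder:

    import wave

    # "example.wav" is a placeholder path; any PCM WAV file will do.
    with wave.open("example.wav", "rb") as w:
        print("channels:   ", w.getnchannels())
        print("sample rate:", w.getframerate())
        print("bit depth:  ", w.getsampwidth() * 8)
        pcm = w.readframes(w.getnframes())    # the raw, uncompressed samples
        print("PCM bytes:  ", len(pcm))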

Audio compression is a form of data compression designed to reduce the size of audio data files. Audio compression algorithms are typically referred to as audio codecs. As with other specific forms of data compression, there exist many "lossless" and "lossy" algorithms to achieve the compression effect.

Lossless compression

The primary users of lossless compression are audio engineers, audiophiles and those consumers who want to preserve the full quality of their audio files, in contrast to the quality loss from lossy compression techniques such as Vorbis and MP3.

Lossy compression

Lossy compression typically achieves far greater compression than lossless compression (reducing data to 5-20% of the original stream, rather than 50-60%) by simplifying the complexities of the data. Given that bandwidth and storage are always limited, the trade-off of reduced audio quality is clearly worthwhile for applications where users wish to transmit or store more information.
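
Those percentages are easy to check against real-world bit rates. A minimal sketch, comparing the uncompressed CD data rate with a typical MP3 stream (both values are illustrative inputs, not measurements):

    # Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
    cd_bit_rate = 44_100 * 16 * 2             # = 1,411,200 bit/s
    mp3_bit_rate = 128_000                    # a typical lossy stream

    factor = mp3_bit_rate / cd_bit_rate
    print(f"MP3 at 128 kbit/s carries {factor:.1%} of the CD data rate")
    # -> roughly 9%, inside the 5-20% range quoted above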

The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream is perceived by the human ear. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as other louder sounds. Those sounds are coded with decreased accuracy or not coded at all.
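
The toy Python sketch below illustrates the principle only; it is nothing like a real psychoacoustic model. It transforms a block of samples to the frequency domain and discards components that are quiet relative to the loudest one, using an arbitrary threshold:

    import numpy as np

    rng = np.random.default_rng(0)
    block = rng.normal(size=1024)         # stand-in for a block of audio samples

    spectrum = np.fft.rfft(block)         # move to the frequency domain
    magnitudes = np.abs(spectrum)

    # Crude "masking": drop any component quieter than 10% of the loudest one.
    # Real codecs use far more sophisticated, frequency-dependent thresholds.
    spectrum[magnitudes < 0.10 * magnitudes.max()] = 0

    kept = np.count_nonzero(spectrum)
    print(f"kept {kept} of {len(spectrum)} frequency components")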

Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (generational losses). This makes lossy-compressed files unsuitable for professional audio engineering applications, such as sound editing and multitrack recording. However, they are very popular with end users (particularly MP3), as a megabyte can store about a minute's worth of music at adequate quality.

Usability of lossy audio codecs is determined by:

  • Perceived audio quality
  • Compression factor
  • Speed of compression and decompression
  • Inherent latency of algorithm
  • Software and hardware support

Lossy formats are often used for the distribution of streaming audio, or interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen.
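
In code, decompressing "as the data flows" simply means decoding fixed-size pieces as they arrive rather than waiting for the whole file. A minimal sketch; decode_frame below is a hypothetical stand-in for a real codec library's decode call:

    CHUNK = 4096    # bytes per read; an arbitrary choice for the sketch

    def decode_frame(chunk):
        """Hypothetical stand-in for a real codec library's decode call."""
        return chunk    # a real decoder would return PCM samples here

    def play_stream(source):
        # source is any file-like object: a socket, an HTTP response, etc.
        while True:
            chunk = source.read(CHUNK)
            if not chunk:               # the stream has ended
                break
            pcm = decode_frame(chunk)   # decode each piece as it arrives
            # ...hand pcm to the sound card or output buffer here...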

Latency results from the methods used to encode and decode the data. Some codecs will analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time in order to decode. (Codecs often split the data into segments called "frames", which serve as discrete units for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.

In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, latency refers to the number of samples which must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).
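
That figure follows directly from block size and sample rate. A quick check, assuming a 1024-sample analysis block at the common 44.1 kHz rate (illustrative values that happen to reproduce the number quoted above):

    block_samples = 1024    # samples analysed per block (an assumed figure)
    sample_rate = 44_100    # samples per second

    latency_ms = block_samples / sample_rate * 1000
    print(f"one-way latency: {latency_ms:.0f} ms")      # -> about 23 ms
    print(f"two-way latency: {2 * latency_ms:.0f} ms")  # -> about 46 ms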

If lossy data compression is used on audio or visual data, differences from the original signal will be introduced; if the compression is substantial, or if lossy data is decompressed and recompressed, this may become noticeable in the form of compression artifacts. Whether these affect the perceived quality, and if so how much, depends on the compression scheme, encoder power, the characteristics of the input data, the listener's perception, and the listening or viewing environment.

Experts and audiophiles may detect artifacts in many cases in which the average listener would not. Some musicians enjoy the distinct artifacts of low bit rate (sub-FM quality) encoding, and there is a growing scene of net labels distributing stylized low bit rate music.

The bit rates in the reference table above are approximately the minimum that the average listener, in a typical listening or viewing environment and using the best available compression, would perceive as not significantly worse than the reference standard.
