Ogg Vorbis: Fidelity measurement and terminology discussion

Terminology discussed in this document is based on common terminology associated with contemporary codecs such as MPEG-1 Audio Layer 3 (MP3). However, some differences in terminology are useful in the context of Vorbis, as Vorbis functions somewhat differently from most current formats. For clarity, then, we describe a common terminology for discussing the audio quality of Vorbis and other formats.

Subjective and Objective

Objective fidelity is a measure, based on a computable, mechanical metric, of how closely an output matches an input. For example, a stereo amplifier may claim to introduce less than 0.01% total harmonic distortion when amplifying an input signal; this claim is easy to verify given proper equipment, and any number of testers are likely to arrive at the same results. One need not listen to the equipment to make this measurement.
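As a rough illustration of what a computable, mechanical metric looks like, the sketch below estimates total harmonic distortion for a sampled test tone. It is only a minimal example under stated assumptions: the function names, the 1 kHz test tone, the 48 kHz sample rate, and the choice of five harmonics are all hypothetical, and the analysis window is assumed to hold a whole number of cycles so that no windowing is needed. Real test equipment is considerably more careful.

```python
import math

def tone_amplitude(samples, freq_hz, sample_rate):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT).

    Assumes the window holds a whole number of cycles of freq_hz.
    """
    n = len(samples)
    step = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def thd_percent(samples, fundamental_hz, sample_rate, harmonics=5):
    """THD as a percentage: harmonic energy relative to the fundamental."""
    fundamental = tone_amplitude(samples, fundamental_hz, sample_rate)
    # Harmonics 2 .. harmonics + 1 (illustrative choice, not a standard).
    harmonic_amps = [
        tone_amplitude(samples, k * fundamental_hz, sample_rate)
        for k in range(2, harmonics + 2)
    ]
    return 100.0 * math.sqrt(sum(a * a for a in harmonic_amps)) / fundamental

# Hypothetical "amplifier output": a 1 kHz tone plus a faint second harmonic.
SAMPLE_RATE = 48000
output = [
    math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
    + 0.0001 * math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE)
    for i in range(4800)  # exactly 100 cycles of the fundamental
]
measured = thd_percent(output, 1000, SAMPLE_RATE)
```

Here `measured` comes out very close to 0.01 (percent), and anyone running the same computation on the same samples gets the same figure. That repeatability is what makes the measure objective; it says nothing about how the output sounds.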

However, given two amplifiers with identical, verifiable objective specifications, listeners may strongly prefer the sound quality of one over the other. This is actually the case in the decades-old debate [some would say jihad] among audiophiles involving vacuum tube versus solid state amplifiers. There are people who can tell the difference, and who strongly prefer one over the other despite seemingly identical measured quality. This preference is subjective and difficult to measure, but nonetheless real.

Individual elements of subjective differences often can be qualified, but overall subjective quality generally is not measurable. Different observers are likely to disagree on the exact results of a subjective test as each observer's perspective differs. When measuring subjective qualities, the best one can hope for is average, empirical results that show statistical significance across a group.
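To make "statistical significance across a group" a little more concrete, here is a minimal sketch of one common way to score a forced-choice listening test, in which each trial asks a listener to pick out the encoded sample and trials are pooled across listeners. The test design, the function name `guessing_p_value`, and the trial counts are assumptions for illustration only; nothing here describes a procedure prescribed by this document.

```python
import math

def guessing_p_value(correct, trials):
    """One-sided binomial p-value: the chance of scoring at least
    `correct` out of `trials` if every answer were a coin flip."""
    favorable = sum(math.comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

# Hypothetical pooled results: 12 correct identifications in 16 trials.
p = guessing_p_value(12, 16)
```

With 12 of 16 correct, `p` is about 0.038, so pure guessing would do that well less than 4% of the time. A score of 8 of 16 gives roughly 0.60, which is indistinguishable from guessing. Note that the result only says whether the group can hear a difference; it does not say whether the difference is displeasing.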

Perceptual codecs are most concerned with subjective, not objective, quality. This is why evaluating a perceptual codec via distortion measures and sonograms alone is useless; these objective measures may provide insight into the quality or functioning of a codec, but cannot answer the much squishier subjective question, "Does it sound good?" The tube amplifier example is perhaps not the best, as very few people can hear, or care to hear, the minute differences between tubes and transistors, whereas the subjective differences in perceptual codecs tend to be quite large even when objective differences are not.

Fidelity, Artifacts and Differences

Audio artifacts and loss of fidelity (or, more simply put, audio differences) are not the same thing.

A loss of fidelity implies differences between the perceived input and output signal; it does not necessarily imply that the differences in output are displeasing or that the output sounds poor (although this is often the case). Tube amplifiers are not higher fidelity than modern solid state and digital systems. They simply produce a form of distortion and coloring that is either unnoticeable or actually pleasing to many ears.

As compared to an original signal using hard metrics, all perceptual codecs [ASPEC, ATRAC, MP3, WMA, AAC, TwinVQ, AC3 and Vorbis included] lose objective fidelity in order to reduce bitrate. This is fact. The idea is to lose fidelity in ways that cannot be perceived. However, most current streaming applications demand bitrates lower than what can be achieved by sacrificing only objective fidelity; this is also fact, despite whatever various company press releases might claim. Subjective fidelity eventually must suffer in one way or another.

The goal is to choose the best possible tradeoff such that the fidelity loss is graceful and not obviously noticeable. Most listeners of FM radio do not realize how much lower the fidelity of that medium is compared to compact discs or DAT. However, when compared directly to source material, the difference is obvious. A cassette tape is lower fidelity still, and yet the degradation, relatively speaking, is graceful and generally easy to overlook. Compare this graceful loss of quality to an average 44.1 kHz stereo MP3 encoded at 80 or 96 kbps. The MP3 might actually be higher in objective fidelity, but subjectively it sounds much worse.

Thus, when a codec must sacrifice subjective quality in order to satisfy a user's requirements, the result should be a difference that is generally either difficult to notice without comparison, or easy to ignore. An artifact, on the other hand, is an element introduced into the output that is immediately noticeable, obviously foreign, and undesired. The famous 'underwater' or 'twinkling' effect synonymous with low bitrate (or poorly encoded) MP3 is an example of an artifact. This working definition differs slightly from common usage, but the coined distinction between differences and artifacts is useful for our discussion.

The goal, when it is absolutely necessary to sacrifice subjective fidelity, is obviously to strive for differences and not artifacts. The vast majority of codecs today fail at this task miserably, predictably, and regularly in one way or another. Avoiding such failures when it is necessary to sacrifice subjective quality is a fundamental design objective of Vorbis, and that objective is reflected in Vorbis's design and tuning.