This copy was created on the 30th of January, 2025.

Previously archived on The Internet Archive (WAVE and Canon) on the 9th of July, 2014 and the 11th of September, 2014 from Timothy John Weber's old personal website.

Originally written by Timothy John Weber.


The WAVE File Format

Introduction

The WAVE file format is a subset of Microsoft's RIFF spec, which can include lots of different kinds of data. It was originally intended for multimedia files, but the spec is open enough to allow pretty much anything to be placed in such a file, and ignored by programs that read the format correctly.

This description is not meant to be exhaustive, but to suggest simple ways of doing common tasks with waveform audio, and give some pointers to other sources of information.

Basics of digital audio and sound

First, some basics. Sound is air pressure fluctuation. Digitized sound is a graph of the change in air pressure over time. That's all there is to it.

For a good picture of this, open up Windows Sound Recorder and record a short sound, then look at the green bars it shows. When they're wide, the air pressure is fluctuating a lot, which your ear detects as a loud noise. When they're flat in the middle, there's no change in air pressure, which your ear detects as silence. The faster they go up and down, the higher-pitched the sound you hear.

When you record a sound, your microphone changes the air pressure fluctuations into electrical voltage fluctuations, which your sound card measures every so often and changes into numbers, called samples. When you play a sound back, the process is reversed, except that the voltage fluctuations go to your speakers instead of your microphone, and are converted back into air pressure by the speaker cone.

The speed with which your sound card samples the voltage is called the sample rate, and is expressed in kilohertz (kHz). One kHz is a thousand samples per second.
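Storage cost follows directly from the sample rate. As a quick arithmetic check (the variable names here are my own, chosen just for illustration), one second of CD-quality audio works out like this:

```python
sample_rate = 44100      # 44.1 kHz: CD-quality audio, 44,100 samples per second
channels = 2             # stereo
bytes_per_sample = 2     # 16-bit samples are two bytes each

bytes_per_second = sample_rate * channels * bytes_per_sample
print(bytes_per_second)  # bytes needed for one second of CD-quality stereo
```

So uncompressed CD-quality stereo takes a little over 10 MB per minute.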

It's important to note that digitized audio stores nothing directly about a sound's frequency, pitch, or perceived loudness. You can run certain algorithms on the samples to determine these values approximately, but you can't just read them from the file.

What is RIFF?

RIFF is a file format for storing many kinds of data, primarily multimedia data like audio and video. It is based on chunks and sub-chunks. Each chunk has a type, represented by a four-character tag. This chunk type comes first in the file, followed by the size of the chunk, then the contents of the chunk.

The entire RIFF file is a big chunk that contains all the other chunks. The first thing in the contents of the RIFF chunk is the "form type," which describes the overall type of the file's contents. So the structure of a RIFF file looks like this:

    Offset  Contents
    (hex)
    0000    'R', 'I', 'F', 'F'
    0004    Length of the entire file - 8 (32-bit unsigned integer)
    0008    form type (4 characters)
    
    000C    first chunk type (4 characters)
    0010    first chunk length (32-bit unsigned integer)
    0014    first chunk's data
    ...     ...

All integers are stored in the Intel low-high byte ordering (usually referred to as "little-endian").
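To make that layout concrete, here is a minimal sketch in Python (the function name and the use of Python are my own illustration, not part of any official API) that walks the top-level chunks of a RIFF file using the header structure above:

```python
import struct

def list_riff_chunks(path):
    """Return the form type and a list of (chunk_id, size, data_offset) tuples."""
    with open(path, "rb") as f:
        riff, file_size, form_type = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF":
            raise ValueError("not a RIFF file")
        chunks = []
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            # "<" means little-endian (Intel low-high byte ordering)
            chunk_id, size = struct.unpack("<4sI", header)
            chunks.append((chunk_id, size, f.tell()))
            # Chunk bodies are word-aligned: skip a pad byte when size is odd.
            f.seek(size + (size & 1), 1)
        return form_type, chunks
```

Note the pad byte: a chunk whose data length is odd is followed by one byte of padding that is not counted in the chunk's length field.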

A more detailed description of the RIFF format can be found in the Microsoft Win32 Multimedia API documentation, which is supplied as a Windows Help file with many Windows programming tools such as C++ compilers.

What is WAVE?

The WAVE format is a subset of RIFF used for storing digital audio. Its form type is "WAVE", and it requires two kinds of chunks: the fmt chunk, which describes the sample format, and the data chunk, which contains the actual sample data.

WAVE can also contain any other chunk type allowed by RIFF, including LIST chunks, which are used to contain optional kinds of data such as the copyright date, author's name, etc. Chunks can appear in any order.

The WAVE format is thus very flexible, but also not trivial to parse. For this reason, and possibly because a simpler (or inaccurate?) description of the WAVE format was promulgated before the Win32 API was released, a lot of older programs read and write only a subset of the WAVE format, which I refer to as the "canonical" WAVE format. This subset consists of just two chunks, the fmt and data chunks, in that order, with the sample data in PCM format. For a detailed description of this layout, and of the contents of the fmt chunk, see the section "The Canonical WAVE File Format" below.

What kind of compression is used in WAVE files?

The WAVE specification supports a number of different compression algorithms. The format tag entry in the fmt chunk indicates the type of compression used. A value of 1 indicates Pulse Code Modulation (PCM), which is a "straight," or uncompressed encoding of the samples. Values other than 1 indicate some form of compression. For more information on the values supported and how to decode the samples, see the Microsoft Win32 Multimedia API documentation.

How can I write data to a WAVE file?

The simplest way to write data into WAVE files produced by your own programs is to use the canonical format. This will be compatible with any other program, even the older ones.
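As a sketch of what that involves (the function name is my own; only the header layout described in this document is assumed), the following Python snippet writes 16-bit mono PCM samples as a canonical WAVE file:

```python
import struct

def write_canonical_wav(path, samples, sample_rate=8000):
    """Write 16-bit mono PCM samples (a list of ints) as a canonical WAVE file."""
    # 16-bit signed samples, little-endian (Intel low-high byte ordering)
    data = b"".join(struct.pack("<h", s) for s in samples)
    channels, bits = 1, 16
    block_align = channels * bits // 8
    with open(path, "wb") as f:
        f.write(b"RIFF")
        f.write(struct.pack("<I", 36 + len(data)))        # file length - 8
        f.write(b"WAVE")
        f.write(b"fmt ")
        f.write(struct.pack("<IHHIIHH",
                            16, 1, channels,              # 16-byte fmt data, tag 1 = PCM
                            sample_rate,
                            sample_rate * block_align,    # bytes per second
                            block_align, bits))
        f.write(b"data")
        f.write(struct.pack("<I", len(data)))
        f.write(data)
```

A file written this way opens cleanly in anything that reads WAVE files, including Python's own wave module.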

How can I read sample data from a WAVE file?

There are four methods you could use to read the sample data from a WAVE file:

  1. For the most flexibility, use the mmio* functions in the Win32 API. These allow you to navigate through the various chunks and subchunks in the RIFF file and find the ones you want. You'll also need to parse the fmt chunk to determine the sample rate, sample width, etc.
  2. Another option is to only support the canonical format. This is especially useful if your program will be used in a limited environment, and you know the sample format ahead of time. In this case, you can simply read the sample data starting at offset 44 from the beginning of the file. If you want to support newer files and it's not a problem to run them through another program first, you can use my shareware StripWav [archived] program.
  3. If you're using C++, you can use my classes for reading WAVE and RIFF files. Read the documentation for WAVE [archived] and RIFF [archived] to see if they'll do what you need; if so, download the source code [archived] (version 1.3, updated 15 August 2001, 40 KB).
  4. There are also a number of other audio packages available that work with Delphi, ActiveX, etc. One of them will probably keep you from reinventing the wheel.
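For option 2 above, a minimal reader needs nothing more than to skip the 44-byte header. This is a sketch (the function name is my own) that assumes a canonical 16-bit file and trusts the fixed layout:

```python
import struct

def read_canonical_samples(path):
    """Read 16-bit samples from a canonical WAVE file, trusting the fixed layout."""
    with open(path, "rb") as f:
        header = f.read(44)           # RIFF header + fmt chunk + data chunk header
        if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
            raise ValueError("not a WAVE file")
        data_len = struct.unpack("<I", header[40:44])[0]
        raw = f.read(data_len)        # sample data starts at offset 44
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))
```

Remember that this shortcut fails on files with extra chunks before data; for those, walk the chunks properly or strip them first.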

How can I record and play WAVE files under Windows?

If all you want to do is play a WAVE file, use the Win32 function PlaySound(). If you're writing for 16-bit Windows, use sndPlaySound() instead. You can use either of these functions without knowing anything about the internal format of the file.

To give your users a little more control over pausing, stopping, and rewinding longer WAVE files, use the MCI functions or an MCI control provided by your programming language. You can also record audio using MCI.

For more precise control than the MCI functions provide, you'll need to either use the Microsoft Win32 Multimedia API (which should be documented with whatever programming language you're using), or find a commercial or shareware library for your language that does. If you find one you like, please let me know so I can list it here, since I'm often asked for advice on this!

I really need frequency (or pitch) information from the WAVE file. How do I get it?

One way is to use a Fast Fourier Transform (FFT). Other approaches examine statistical correlations between samples at different offsets (autocorrelation) to estimate the wavelength at a given point in time. It's hairy stuff.
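To illustrate the Fourier approach, here is a toy sketch (my own, not production pitch detection) that uses a brute-force discrete Fourier transform to locate the strongest frequency in a block of samples:

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Estimate the strongest frequency via a brute-force DFT (O(n^2), for illustration)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):        # skip the DC bin, stop at the Nyquist limit
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n  # convert bin index to Hz
```

A real implementation would use an FFT (O(n log n)) and interpolate between bins, but the principle is the same: the transform turns samples-over-time into energy-per-frequency.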



The Canonical WAVE File Format

The canonical WAVE format starts with the RIFF header:

    Offset  Length   Contents
    0       4 bytes  'RIFF'
    4       4 bytes  file length - 8, i.e., the number of bytes that
                     follow this field (the '8' covers the 'RIFF' tag
                     and this length field itself)
    8       4 bytes  'WAVE'

Next, the fmt chunk describes the sample format:

    Offset  Length   Contents
    12      4 bytes  'fmt '
    16      4 bytes  0x00000010, length of the fmt data (16 bytes)
    20      2 bytes  0x0001, format tag: 1 = PCM
    22      2 bytes  channels: 1 = mono, 2 = stereo
    24      4 bytes  sample rate, in samples per second (e.g., 44100)
    28      4 bytes  bytes per second = sample rate * block align
    32      2 bytes  block align = channels * bits per sample / 8
    34      2 bytes  bits per sample: 8 or 16

Finally, the data chunk contains the sample data:

    Offset  Length   Contents
    36      4 bytes  'data'
    40      4 bytes  length of the sample data that follows
    44      X bytes  sample data

The sample data must end on an even byte boundary; if the data length is odd, a pad byte is appended (and not counted in the chunk's length field). All numeric data fields are in the Intel format of low-high byte ordering ("little-endian"). 8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.
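The two encodings can be seen directly by packing the extreme values; Python's struct module is used here purely for illustration:

```python
import struct

# 8-bit samples: unsigned bytes, silence sits at the midpoint 128.
assert struct.pack("<B", 0)   == b"\x00"  # most negative pressure
assert struct.pack("<B", 128) == b"\x80"  # silence (midpoint)
assert struct.pack("<B", 255) == b"\xff"  # most positive pressure

# 16-bit samples: two's-complement signed, little-endian; silence is 0.
assert struct.pack("<h", -32768) == b"\x00\x80"
assert struct.pack("<h", 0)      == b"\x00\x00"
assert struct.pack("<h", 32767)  == b"\xff\x7f"
```

Note the practical consequence: converting between the two widths means shifting the zero point as well as the scale, since 8-bit silence is 128 while 16-bit silence is 0.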

For multi-channel data, samples are interleaved between channels, like this:

sample 0 for channel 0
sample 0 for channel 1
sample 1 for channel 0
sample 1 for channel 1
...

For stereo audio, channel 0 is the left channel and channel 1 is the right.
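The interleaving above can be sketched as a pair of small helpers (the function names are my own) that split an interleaved stereo stream into per-channel lists and rebuild it:

```python
def deinterleave_stereo(samples):
    """Split an interleaved [L0, R0, L1, R1, ...] sample list into (left, right)."""
    return samples[0::2], samples[1::2]

def interleave_stereo(left, right):
    """Rebuild the interleaved order that the data chunk expects."""
    out = []
    for l, r in zip(left, right):
        out.extend((l, r))
    return out
```

One pair of samples (one per channel) is exactly the "block align" unit from the fmt chunk.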