Bert Schiettecatte is a computer science student in Belgium.
He is currently working on QOrchestra, a visual authoring
tool for SAOL, and is contacting audio companies to raise
awareness of Mpeg-4 technology. In addition, he's promoting
FlowML, his own audio synthesis format.
In this wide-ranging article, Bert takes a look at issues
surrounding MP4, including the chances of it making the
big time.
related links:
Bert's homepage
FlowML
QOrchestra
Mpeg-4 home at MIT
Mpeg-4 home at Berkeley
Intro to Structured Audio
Mpeg-4 Structured Audio, media piracy today and the future of
music distribution: a musician's impression.
by Bert Schiettecatte (bschiett@vub.ac.be)
----------------------------------------------------------
The following article is (C)opyright 1999-2000 by
Bert Schiettecatte, and may not be reproduced or published
in any way (except for personal use) without written
permission from the author, Bert Schiettecatte
(bschiett@vub.ac.be).
----------------------------------------------------------
1 Introduction
1.1 Introduction
This article attempts to evaluate how Mpeg-4 audio can be interesting to
both musicians and the pro-audio industry. It discusses how authoring in
Mpeg-4 audio will affect a musician's methodology, how it relates to today's
technology, why media piracy exists and whether Mpeg-4 audio will change the
situation, and finally, which technologies will decide whether Mpeg-4 audio
becomes the next big thing.
1.2 Structured audio
Structured audio [3] is a way to describe audio. In traditional audio formats,
audio is stored as a sequence of samples, sometimes compressed. Popular
compression like MP3 (actually Mpeg-1 Audio Layer 3) still encodes the sampled
signal itself, using a model of human hearing to discard detail that listeners
are unlikely to notice. The Mpeg-4 audio standard
introduces an audio synthesis language (SAOL, Structured Audio Orchestra
Language, similar to CSound) which makes it possible to describe (instead of store)
an audio signal using signal processing techniques. Together with SAOL, a
language for describing a musical score has been introduced: SASL (Structured
Audio Score Language). Instead of storing a composition by sampling and
compressing it, it should now be (theoretically) possible to describe the
instruments used in the composition using SAOL, and store information about
the score for the instruments using SASL. These two pieces of information
can be encoded in an Mpeg-4 bitstream using an Mpeg-4 encoder.
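To make the difference between storing and describing concrete, the following Python sketch compares the size of one sampled note with the size of one line of score text. The figures are illustrative assumptions (16-bit mono samples at 48 kHz, and a score line in the style of the SASL example later in this article), not measurements of any real encoder, and the one-off cost of the instrument definition is left out.

```python
# Rough size comparison: storing samples versus storing a description.
# All figures are illustrative assumptions, not measurements.

SAMPLE_RATE = 48000      # samples per second (assumed)
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM (assumed)
DURATION_S = 1.5         # one note lasting 1.5 seconds


def pcm_bytes(duration_s, sample_rate=SAMPLE_RATE,
              bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes needed to store the note as uncompressed mono samples."""
    return int(duration_s * sample_rate * bytes_per_sample)


def description_bytes(score_line):
    """Bytes needed to store the note as one line of score text."""
    return len(score_line.encode("ascii"))


sampled = pcm_bytes(DURATION_S)
described = description_bytes("1 vtone 1.5 52\n")

print("sampled note:  ", sampled, "bytes")
print("described note:", described, "bytes")
```

The instrument itself has to be stored once as well, but its cost is shared by every note that uses it, which is where the large "compression ratio" of a described composition comes from.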
2 Structured authoring
2.1 Today's music distribution
Today, music is distributed on media like CDs, vinyl, and MiniDiscs. When there
are multiple versions of one song, each of these versions requires additional time
on the medium. Most of the music distributed on these media is sold through normal
record shops, and more and more music is being sold through online mail-order shops
on the web (e.g. amazon.com).
One could question this type of music distribution: do people always
want to pay for songs on a CD they aren't interested in? It has been suggested that
with future online shops, it will become possible to create your own compilation,
order that CD and pay for the rights to the songs you selected. Another alternative
is to download these songs in the MP3 format and (again) pay for the rights to these
songs -- people could then burn their own compilation from these files.
Music might become more interactive and offer more possibilities than ever before,
if meta-information is added to the music (e.g. information about the elements the
music consists of). Today's music distribution is not well suited to this new
philosophy (even if some information is stored on the CD, e.g. instrument samples,
the user still needs a lot of expensive equipment to do something with this
information). Therefore, a new format in which all possible information about the
music is stored might be interesting. A variety of possibilities already exist,
but they all lack synchronization with video and graphics. This is where Mpeg-4
becomes interesting: it offers a lot of possibilities for building interactive audio
applications.
2.2 The future of music distribution
Mpeg-4 SA offers a way to add information about music (instruments, score,
samples, ...) to the music itself. This property makes the format well suited to
a new way of distributing music over the internet. Together with the advanced
copy-protection and encryption possibilities offered by the IPMP, Mpeg-4 might
become the new standard for music distribution, provided that enough
musicians are interested in creating music directly in this format and enough
tools to create Mpeg-4 bitstreams become available in the near future.
Since Mpeg-4 SA is a way to author music, not to compress it, one could ask
whether it will be possible to author any style of music in this format while still
achieving the incredible "compression ratio". The answer is probably no (as discussed in
the next section), but the format is still interesting because extra value can be
added to music (discussed in the previous section).
2.3 Music classification and structured audio
Various music styles exist, and recording these styles has never been a
problem throughout history because recording consisted of storing consecutive
audio samples, either in an analog or digital fashion. Most compression techniques
operate on this "snapshot" without caring what it contains. Because Mpeg-4 SA is a
new way of authoring rather than compressing, the style of the music becomes one of
the most crucial factors in "compressing" (authoring) music using Mpeg-4 SA.
For example, instrumental electronic music requiring at most a few audio samples
is straightforward to describe using Mpeg-4 SA. On the other hand, mainstream music
is very hard to describe in Mpeg-4 SA. It is possible, but there is no
significant reduction in file size compared to a compression format like MP3. This
is mainly because of the various "organic" elements present in this type of music:
although a lot of research and results exist on synthesis of a singing voice, there's
probably no way to accurately describe a singer's voice using a synthesis algorithm.
Common sense suggests that this is because every human being's vocal cords differ,
and the way a singer performs is related to emotions, which will probably never
be simulated by a computer system as we define one today.
One could argue that compressing this type of music using Mpeg-4 SA is still
interesting, because most likely this music contains a vocal track (if there are
multiple vocal tracks, we assume that they have been mixed to a single audio
channel) and several instrument tracks (which can be easily described using synthesis
algorithms). Thus, the music could be encoded using a compressed audio track (using
an algorithm similar to the one used in MP3) containing vocals, and a SAOL program
describing the instruments. But this gains nothing in file size, because a single
compressed audio track requires the same amount of storage as the whole song
compressed using MP3 (since the whole song is mixed to a single audio track anyway,
in the end).
This discussion of course assumes that no algorithm exists which, given a stereo
audio stream, generates a collection of synthesis algorithms that approximate each
component of the stream.
It's reasonable to assume that such an algorithm does not exist for most music styles,
because the components might be close in frequency to each other, preventing their
detection and analysis. The assumption is also made that, in this case, neither the
user nor the content provider would be interested in extra value (instruments which would
allow the user to create a remix, for example) added to the music.
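The storage argument above can be checked with a little arithmetic. The sketch below assumes a constant-bitrate encoder, a four-minute song, 128 kbit/s for both the full mix and the vocals-only track, and a hypothetical 20 KB SAOL program; none of these numbers come from a real encoder.

```python
# Size of a constant-bitrate track depends only on bitrate and duration,
# not on what the track contains. All numbers below are assumptions.

def cbr_size_bytes(bitrate_kbps, duration_s):
    """Bytes used by a constant-bitrate stream of the given length."""
    return bitrate_kbps * 1000 * duration_s // 8


SONG_SECONDS = 240          # assumed four-minute song
BITRATE_KBPS = 128          # assumed bitrate for both encodings
SAOL_PROGRAM_BYTES = 20000  # hypothetical size of the instrument descriptions

full_mix = cbr_size_bytes(BITRATE_KBPS, SONG_SECONDS)
vocals_only = cbr_size_bytes(BITRATE_KBPS, SONG_SECONDS)
hybrid = vocals_only + SAOL_PROGRAM_BYTES

print("whole song as one compressed track:", full_mix, "bytes")
print("compressed vocals + SAOL program:  ", hybrid, "bytes")
```

Under these assumptions the hybrid file is slightly larger than the plain compressed song, so any benefit has to come from the extra information it carries rather than from its size.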
2.4 Rethinking existing methodologies
Because creating music in Mpeg-4 SA is very different (today) from creating music
using dedicated hardware, it will be very hard to convince musicians to make the
transition, and it will probably take them several years. Therefore, it would be better
to hide all the details of creating an Mpeg-4 audio composition, such that musicians
can still use the same sequencer and perhaps software versions of the hardware
synthesizers they use today. This requires that pro-audio software companies come up
with Mpeg-4 compatibility in some way, which means that the success of this new
format depends on the products they will change or develop. It is still not very clear
whether these companies are willing to make the transition when there is no guarantee
of higher sales figures -- the discussion remains open.
2.5 A structured audio composition
The following examples were taken from John Lazzaro's excellent tutorial on
structured audio [4]. The following is a piece of SAOL code consisting of a
sine oscillator instrument.
global
{
  srate 48000;  // DAT-quality
  krate 2400;   // 417 us
}
//
// instr vtone
// shaped sinewave
//
instr vtone (num)
{
  // declarations
  // envelope settings
  ivar atime;   // attack
  ivar rtime;   // release
  // internal env state
  ivar attack;
  ivar release;
  ivar sustain;
  ivar a;       // sets osc f
  ksig env;     // env output
  asig x, y;    // osc state
  asig init;
  // **********************
  // computed during i-pass
  // **********************
  // turns MIDI number into
  // oscillator constant
  a = 2*sin(3.1415927*cpsmidi(num)/s_rate);
  // envelope computation
  atime = 0.3;  // attack time (s)
  rtime = 0.2;  // release time (s)
  // computes envelope state
  // dur is an internal variable holding
  // the duration of the sound
  if (dur > atime + rtime)
  {
    attack = atime;
    release = rtime;
    sustain = dur - atime - rtime;
  }
  else
  {
    attack = dur/2;
    release = dur/2;
    sustain = 0;
  }
  // **********************
  // computed during k-pass
  // **********************
  env = kline(0, attack, 1, sustain, 1, release, 0);
  // **********************
  // computed during a-pass
  // **********************
  if (init == 0)
  {
    x = 0.25;
    init = 1;
  }
  x = x - a*y;
  y = y + a*x;
  output(y*env);
}
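For readers who do not have a SAOL decoder at hand, the instrument above can be approximated in a few lines of Python. This is only a sketch of the same idea, not a translation endorsed by the standard: `cpsmidi` is reimplemented here on the assumption of equal temperament with MIDI note 69 at 440 Hz, and the envelope is evaluated for every sample rather than at the slower control rate.

```python
import math


def cpsmidi(note):
    """MIDI note number to frequency in Hz (assumes A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def envelope(t, dur, atime=0.3, rtime=0.2):
    """Linear attack, flat sustain, linear release, like the kline() call."""
    if dur > atime + rtime:
        attack, release = atime, rtime
    else:
        attack = release = dur / 2.0
    if t < attack:
        return t / attack
    if t < dur - release:
        return 1.0
    return max(0.0, (dur - t) / release)


def vtone(note, dur, srate=48000):
    """Render one note using the coupled two-variable recurrence."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / srate)  # oscillator constant
    x, y = 0.25, 0.0                                     # oscillator state
    out = []
    for n in range(int(dur * srate)):
        x = x - a * y
        y = y + a * x          # uses the freshly updated x, as in the SAOL code
        out.append(y * envelope(n / srate, dur))
    return out


samples = vtone(69, 1.0)   # one second of A4
print(len(samples), "samples, peak", round(max(samples), 3))
```

The two update lines need no sine evaluation per sample; the constant `a` alone sets the pitch, which is why the SAOL instrument computes it once during the i-pass.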
The following is a simple sequence (a SASL program) for the above SAOL program.
0 tempo 110
2 tempo 112.1
4 tempo 114
6 tempo 116
8 tempo 118
1 vtone 1.5 52
3 vtone 1.5 64
5 vtone 1 63
6 vtone 0.5 59
6.5 vtone 0.5 61
7 vtone 1 63
8 vtone 1 64
10 end
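The score above is plain text with one event per line, so it is easy to process with ordinary tools. The Python sketch below splits it into tempo changes, notes, and the end time. It handles only the three line shapes that appear in this example and makes no claim to cover the full SASL syntax; the function name and the dictionary layout are choices made for this illustration.

```python
SCORE = """\
0 tempo 110
2 tempo 112.1
4 tempo 114
6 tempo 116
8 tempo 118
1 vtone 1.5 52
3 vtone 1.5 64
5 vtone 1 63
6 vtone 0.5 59
6.5 vtone 0.5 61
7 vtone 1 63
8 vtone 1 64
10 end
"""


def parse_score(text):
    """Split a simple score into (tempo changes, note events, end time)."""
    tempos, notes, end = [], [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        time, command, args = float(fields[0]), fields[1], fields[2:]
        if command == "tempo":
            tempos.append((time, float(args[0])))
        elif command == "end":
            end = time
        else:                             # anything else names an instrument
            notes.append({
                "time": time,
                "instr": command,
                "dur": float(args[0]),
                "params": [float(a) for a in args[1:]],
            })
    return tempos, notes, end


tempos, notes, end = parse_score(SCORE)
print(len(tempos), "tempo changes,", len(notes), "notes, ends at", end)
```

Each note carries the instrument name, a duration, and the parameters passed to the instrument, here the single `num` argument of `vtone`.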
2.6 Key elements in promoting structured audio
To be attractive to the end user, the classic requirements apply: there should be
enough software to play the music, this software should be free, and it should exist
for every operating system possible.
One of the most crucial factors in the success of this new format is how quickly
pro-audio companies support this new technology, and whether they do so without
involving the composer too much in its low-level details. Composing music today is
technically easy, and it has taken a while to convince musicians to use a computer
system with sequencing software for creating music. Making musicians switch again
to a completely new way of making music probably won't work, certainly not
if they have to become SASL programmers first. Creating SAOL instruments should be
done using some visual synthesis tool, similar to the commercial software synthesizers
in use today by professional musicians.
The success of this new audio format also depends greatly on the way
it deals with copy protection, and on the price to pay for a song in this new format.
Most likely copy protections will be analyzed and removed, as has happened
in the past, unless something is done about the cost of a song and how easy it is
to get a new song. The only way to battle piracy, as discussed below, is to find
its source and to do something about the cause, not the symptoms. People don't
pirate when it's not worth the trouble.
3 Copyright issues and intellectual property
3.1 An overview of the music business today
The music business today finds new musicians, promotes and produces their
hard work, makes sure they get paid, licenses their work to other record
companies, and if relevant, arranges live performances for the musicians.
This is of course the official image the media shows of these companies;
the truth is quite different: most musicians who deserve some attention
don't get a deal because their music is not really interesting from a business
point of view. This is probably the main reason why record companies just
don't like MP3 and any future formats which have a similar effect on their
sales, together with the copyright nightmares involved. They have been in
this position for years, and sites like mp3.com have drastically changed
their position in the market: they allow musicians to "break free".
One of the main reasons why musicians still rely on big record
companies is the incredibly expensive recording studios
($100,000 is no exception) these companies can afford. Most musicians can
never afford all this technology. They make music in their bedrooms,
using at best a cheap 32-channel mixing console with some outboard effect
processors, synthesizers, samplers and other (cheap) dedicated hardware.
This situation was even worse a few years ago, when electronics in general
were extremely expensive and definitely not affordable for most people.
Mpeg-4 structured audio might change all this:
SAOL, the signal processing language specified in the standard, has no
built-in limits (your computer's processing power decides what you can do)
and is powerful enough to describe any existing piece of audio gear.
This means that the shift from hardware audio processing to software audio
processing (which is already clearly visible today: Nemesys Gigasampler
already makes hardware samplers obsolete, while Native Instruments
Reaktor questions the need for a dedicated hardware synthesizer) will
continue to a much broader extent.
In the future, it is very likely that musicians will be able to do whatever
they want, just using a powerful computer (which becomes cheaper every day)
and some software. Entire studios full of dedicated hardware will probably
be replaced by a computer system which does everything, from recording the
very first takes of an audio track to mastering.
3.2 Media piracy: source, consequences and solutions
The main reason why music piracy exists today is not that people don't
want to pay for good music, or that all music is freely available anyway
using tools like Napster, but that record companies missed some vital
points in their business plan: the price of a CD varies drastically around
the globe. For example, you can buy a CD in the US for $12, and pay over $20
for the same CD in a European country. On top of that, different versions of
the same CD exist around the globe, some of them censored or cut. And as
if that's not enough, most CDs are not available around the globe at the same
time (sometimes, a CD is available in the US over 6 months before it is in the
record stores in Europe). The consequences are predictable: people don't accept this,
don't want to wait until they can buy the CD in their country, don't want the
(bad) European version, and certainly don't want to pay more than people in
the US.
An attempt has been made to add some copy protection to MP3, and
new copy-protection enabled audio formats like WMA have been developed.
However, one could ask if this will solve the problem. These formats have
been around for some time, but nobody is selling music or buying music
in such a copy-protected format. If you buy music, you want to be able to
make a copy, even if it's just for backup purposes, in case the original
gets damaged. The solution to music piracy is probably not
raising CD prices or coming up with another copy-protection format,
but making people aware of what they do, finding out why they copy music
in the first place, and reacting to that. People might have no choice but to
copy and distribute music, because of several geographic or social
issues.
Mpeg-4 structured audio is an even more complicated matter of course. A SAOL
program is available in the bitstream, and if not encrypted, might be a huge
copyright problem for pro-audio software companies which want to offer authoring
support for Mpeg-4 structured audio. After all, their algorithms, which have been
established through years of costly research, are about to be exposed. On top of
this encryption problem, there is the classic tale of copy protection. It has
been shown several times in the past that there is no such thing as a bullet-proof
encryption mechanism. For example, the encryption technique in DVD was revealed
last year by a Linux enthusiast. The big question is whether this can happen with
Mpeg-4 structured audio as well.
3.3 IPMP
IPMP stands for Intellectual Property Management & Protection, a framework proposed
in the Mpeg-4 standard to battle illegal copying or reverse-engineering of content.
The standard itself does not prescribe any particular means of protection, only a
framework which can be used by implementors of Mpeg-4 players (or other software). The implementors
can come up with their own IPMP systems, which can deal with content in a variety of
ways through IPMP-descriptors and IPMP-elementary streams. A system is identified
using its descriptor, and an IPMP-elementary stream contains things like decryption
keys. Content can even contain watermarks, to detect illegal copies (every player
can have its own unique ID, and when the content is streamed, a watermark can be
added which indicates this ID). IPMP also allows management of rights, patents and
royalties, for any type of content (even SAOL code), and allows for auditing of
the content.
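The division of labour described above can be pictured with a toy model. Everything in the Python sketch below is hypothetical: the class names, the fields, and the XOR "cipher" are invented for illustration and do not correspond to any structure defined in the Mpeg-4 standard or to any real player. It only shows the shape of the idea: a descriptor names the protection system, a message from the IPMP stream carries the key, and the player looks up whichever system it has installed.

```python
from dataclasses import dataclass


@dataclass
class IPMPDescriptor:
    """Hypothetical: names the protection system that guards some content."""
    system_id: int


@dataclass
class IPMPMessage:
    """Hypothetical: one unit of an IPMP elementary stream, carrying a key."""
    system_id: int
    key: bytes


def xor_decrypt(payload, key):
    """Toy stand-in for a real cipher. XOR with a short key is NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))


class Player:
    """Hypothetical player holding vendor-specific protection systems."""

    def __init__(self, player_id):
        self.player_id = player_id
        self.systems = {}                 # system_id -> decrypt function

    def register(self, system_id, decrypt):
        self.systems[system_id] = decrypt

    def play(self, descriptor, message, payload):
        if descriptor.system_id not in self.systems:
            raise LookupError("no protection system installed for this content")
        if message.system_id != descriptor.system_id:
            raise ValueError("key message belongs to a different system")
        clear = self.systems[descriptor.system_id](payload, message.key)
        # The player ID stands in for a watermark attached to the output.
        return clear, self.player_id


key = b"k3y"
protected = xor_decrypt(b"instr vtone", key)   # XOR is its own inverse
player = Player(player_id=7)
player.register(1, xor_decrypt)
clear, mark = player.play(IPMPDescriptor(1), IPMPMessage(1, key), protected)
print(clear, "played on device", mark)
```

Because the lookup table is filled in by each vendor, two players can protect the same content in entirely different ways.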
It is clear that the success of piracy will be dictated by the design of these IPMP
systems, which is beyond the scope of the Mpeg-4 standard. It's the application
developer's responsibility to come up with a way of protecting and controlling content.
This is quite a different situation from a format like DVD, where the encryption/
decryption is described in the standard and thus common to all DVD players. Also, it
won't be easy to reverse-engineer the copy control technology in Mpeg-4 products, because
this technology will probably be very different for each product on the market.
Thus, the success of copy protection depends very much on the people who will design
the IPMP systems. A dangerous situation (for the intellectual property owners) can
arise when several companies sit together and come up with a common copy protection
system, which is likely to happen in the near future. In this situation, the same
will probably happen as with DVD.
4 A technological point of view
4.1 Mpeg-4 SA and FlowML
The SAOL language described in the Mpeg-4 SA standard is a low-level signal processing
language which is very close to C in syntax. A discussion or evaluation of the language's
possibilities and semantics is beyond the scope of this document, but it is interesting
to note that designing a very complex SAOL program involves the same architectural
challenges as in classic software engineering, and the paradigms which already exist
to structure a piece of software elegantly, probably apply to a signal processing
language too.
This might sound a bit strange at first, but in theory it should be possible to recreate
an existing hardware synthesizer in SAOL. For example, the Korg Wavestation is a
vector-based synthesizer based on the Prophet range of synthesizers from Sequential Circuits,
and has been out of production for some time. The synthesizer is based on wave sequencing
and wavetables, and can probably be recreated as a SAOL program. However, designing a SAOL
program which is understandable, uses minimal computing power, and simulates the real
thing is far from trivial, since electronic
equipment is structured differently from a procedural program (this has been known
in software engineering for some time, and a lot of research already exists on
component-oriented programming, a paradigm which brings software design closer to hardware
design).
This observation together with the design issues of the QOrchestra project [1], indicated
that it might be interesting to build a high-level component-oriented language (a format
actually) on top of SAOL, to speed up the design process of a SAOL program, just like UML is
used to design a large, complex piece of object-oriented software. This high-level language
is called FlowML [2], and is a set of XML DTDs together with a specification of which
standard high-level signal processing components are supported, and a mechanism to add
language-specific components. The translation from a FlowML diagram to a SAOL program is
trivial most of the time, and is a special case of translating a component-oriented program
to a procedural program.
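One way to picture such a translation is as an ordering problem: every component must be computed after the components that feed it. The Python sketch below orders a small, hypothetical diagram and prints one pseudo-statement per component. The diagram, the component names, and the emitted text are invented for this illustration; they are neither FlowML nor valid SAOL, and this is not how QOrchestra is known to work. It relies on `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter

# Hypothetical diagram: each component lists the components feeding into it.
diagram = {
    "osc": [],
    "env": [],
    "mul": ["osc", "env"],
    "out": ["mul"],
}


def to_statements(diagram):
    """Emit one pseudo-statement per component, inputs before consumers."""
    order = TopologicalSorter(diagram).static_order()
    lines = []
    for name in order:
        inputs = ", ".join(diagram[name])
        lines.append(f"{name} = {name}_block({inputs});")
    return lines


lines = to_statements(diagram)
for line in lines:
    print(line)
```

A diagram containing a feedback loop has no such ordering (`static_order` raises `CycleError`), which is one case where the translation stops being trivial.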
4.2 Operating systems and their influence
It is clear that *nix operating systems prove their importance once again in the development
and exploration of Mpeg-4 tools. It is a fact that most of the experiments involving Mpeg-4
technology are done on these operating systems, which guarantees that there will be a good
basis for consumer-end products (this has always been the case) on desktop-oriented
operating systems.
Unfortunately, software giants like Microsoft have the power to force their
own audio formats with intellectual property management upon the masses, once again hiding
the internals and know-how required for decent software development by third parties.
Mpeg-4 offers far more advanced features than a format such as WMA, but it might take some
time before the first Mpeg-4 solutions with IPMP are available, certainly if not everybody
cooperates and understands that standardization is a far more important issue than market
share.
4.3 Existing tools
This section gives an overview of open-source Mpeg-4 SA tools.
4.3.1 Encoders
The reference software for the Mpeg-4 SA standard is "saolc", a program which can be used to
create an Mpeg-4 bitstream from a SAOL program and a SASL or MIDI score. The software is not
meant to be used when performance is an issue; it's a reference implementation. It can also
decode an Mpeg-4 bitstream.
4.3.2 Decoders
Sfront is a great SAOL-to-C compiler, very fast and easy to use. It's a command-line program
just like saolc, and it supports some nice features like real-time audio output and
real-time MIDI input. It is meant to be a decoder in the first place, but it can encode too.
Sarun is an Mpeg-4 SA decoder which internally compiles SAOL to an instruction
set for a special virtual machine. It is being written by Ross Bencina and is not yet
complete.
4.3.3 Authoring tools
QOrchestra is a visual authoring tool for SAOL and FlowML. It allows you to create diagrams
from high-level synthesis building blocks, and reuse these diagrams as new building blocks.
It can save diagrams in SAOL and FlowML, and it loads FlowML. It is meant to be used by
people who don't know anything about SAOL but still have to design synthesis instruments in a
high-level way, and need a way to generate SAOL code for their instruments.
5 Conclusion
Mpeg-4 SA offers a new way to think about music. Its synthesis language SAOL is as powerful
as a studio full of expensive dedicated hardware equipment. In theory, SAOL allows you to
create any type of music, but it is very clear that the "compression" ratio of Mpeg-4 SA
depends on the type of music you want to encode.
6 References
1. QOrchestra: a visual authoring tool for SAOL, project specification.
Bert Schiettecatte, 1999. http://qorchestra.sourceforge.net/
2. FlowML: a format for audio synthesis diagrams.
Bert Schiettecatte, 1999. http://www.flowml.com/
3. Mpeg-4 homepage at the MIT media lab.
http://sound.media.mit.edu/mpeg4/
4. Mpeg-4 homepage at Berkeley.
http://www.cs.berkeley.edu/~lazzaro/