m-station -----> Bert Schiettecatte //Mpeg-4 8/2000

Bert Schiettecatte is a computer science student in Belgium. He is currently working on QOrchestra, a visual authoring tool for SAOL, and is contacting audio companies to raise awareness of Mpeg-4 technology. In addition, he is promoting FlowML, his own audio synthesis format. In this wide-ranging article Bert takes a look at the issues surrounding Mpeg-4, including its chances of making the big time.

Related links: Bert's homepage, FlowML, QOrchestra, Mpeg-4 home at MIT, Mpeg-4 home at Berkeley, Intro to Structured Audio

Mpeg-4 Structured Audio, media piracy today and the future of music distribution: a musician's impression.
by Bert Schiettecatte (bschiett@vub.ac.be)

----------------------------------------------------------
The following article is (C)opyright 1999-2000 by Bert Schiettecatte, and may not be reproduced or published in any way (except for personal use) without written permission from the author, Bert Schiettecatte (bschiett@vub.ac.be).
----------------------------------------------------------

1 Introduction

1.1 Introduction

This article attempts to evaluate how Mpeg-4 audio can be interesting to both musicians and the pro-audio industry. It discusses how authoring in Mpeg-4 audio will affect a musician's methodology, how it relates to today's technology, why media piracy exists and whether Mpeg-4 audio will change the situation, and finally, which technology will decide whether Mpeg-4 audio becomes the next big buzz.

1.2 Structured audio

Structured audio [3] is a way to describe audio. In traditional audio formats, audio is stored as a sequence of samples, sometimes compressed. Popular compression like MP3 (actually Mpeg-1 Audio Layer 3) still represents the sampled signal itself: it transforms the signal into frequency components and discards the detail a listener is least likely to hear. The Mpeg-4 audio standard introduces an audio synthesis language (SAOL, Structured Audio Orchestra Language, similar to CSound) which makes it possible to describe (instead of store) an audio signal using signal processing techniques. Together with SAOL, a language for describing a musical score has been introduced: SASL (Structured Audio Score Language). Instead of storing a composition by sampling and compressing it, it should now be (theoretically) possible to describe the instruments used in the composition using SAOL, and to store the score for those instruments using SASL. These two pieces of information can be encoded in an Mpeg-4 bitstream using an Mpeg-4 encoder.
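To make the difference between describing and storing a signal concrete, here is a minimal sketch in Python (not SAOL or SASL). Everything in it is an assumption made for illustration: the three-note SCORE, the plain sine "instrument", the low 8000 Hz sample rate, and the helper names midi_to_hz and render. It renders the score into samples and compares the size of the uncompressed samples with the size of the textual description.

```python
import math

SAMPLE_RATE = 8000  # Hz; deliberately low so the sketch runs quickly (assumption)

# Hypothetical "score": (start time in s, duration in s, MIDI note number).
SCORE = [(0.0, 0.5, 60), (0.5, 0.5, 64), (1.0, 1.0, 67)]


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def render(score, sample_rate=SAMPLE_RATE):
    """Render the score with a plain sine 'instrument' into a list of samples."""
    total = max(start + dur for start, dur, _ in score)
    samples = [0.0] * int(round(total * sample_rate))
    for start, dur, note in score:
        freq = midi_to_hz(note)
        first = int(round(start * sample_rate))
        count = int(round(dur * sample_rate))
        for n in range(count):
            phase = 2.0 * math.pi * freq * n / sample_rate
            samples[first + n] += 0.5 * math.sin(phase)
    return samples


samples = render(SCORE)
sampled_bytes = len(samples) * 2        # stored as 16-bit mono PCM, uncompressed
description_bytes = len(repr(SCORE))    # crude size of the textual description

print(sampled_bytes, "bytes of samples versus", description_bytes, "bytes of score")
```

The comparison is deliberately unfair to the sampled form: it ignores the size of the instrument definition and of the decoder that has to run it, and it only works because the music here is trivially easy to synthesize.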
2 Structured authoring

2.1 Today's music distribution

Today, music is distributed on media like CDs, vinyl, and MiniDiscs. When there are multiple versions of one song, each of these versions requires additional time on the medium. Most of the music distributed on these media is sold through normal record shops, and more and more music is being sold through online mail-order shops on the web (e.g. amazon.com). One could question this type of music distribution: do people always want to pay for songs on a CD they aren't interested in? It has been suggested that with future online shops, it will become possible to create your own compilation, order that CD and pay for the rights to the songs you selected. Another alternative is to download these songs in the MP3 format and (again) pay for the rights to these songs -- people could then burn their compilation using these files.

Music might become more interactive and offer more possibilities than ever before if meta-information is added to the music (e.g. information about the elements the music consists of). Today's music distribution is not well suited to this new philosophy (even if some information is stored on the CD, e.g. instrument samples, the user still needs a lot of expensive equipment to do something with this information). Therefore, a new format in which all possible information about the music is stored might be interesting. A variety of possibilities already exist, but they all lack synchronization with video and graphics. This is where Mpeg-4 becomes interesting: it offers many possibilities for interactive audio applications.

2.2 The future of music distribution

Mpeg-4 SA offers a way to add information about music (instruments, score, samples, ...) to the music itself. This property makes the format very well suited to a new way of distributing music over the internet.
Together with the advanced copy-protection and encryption possibilities offered by the IPMP, Mpeg-4 might become the new standard for music distribution, provided that enough musicians are interested in creating music directly in this format and that enough tools to create Mpeg-4 bitstreams become available in the near future. Since Mpeg-4 SA is a way to author music, not to compress it, one could ask whether it will be possible to author any style of music in this format while still achieving the incredible "compression ratio". The answer is probably no (as discussed in section 2.3), but the format is still interesting because extra value can be added to music (as discussed in section 2.1).

2.3 Music classification and structured audio

Various music styles exist, and recording these styles has never been a problem, because recording has always consisted of storing consecutive audio samples, either in an analog or a digital fashion. Most compression techniques operate on this "snapshot". Because Mpeg-4 SA is a new way of authoring rather than compressing, the style of the music becomes one of the most crucial factors in compressing (authoring) music using Mpeg-4 SA. For example, instrumental electronic music requiring at most a few audio samples is straightforward to describe using Mpeg-4 SA. Mainstream music, on the other hand, is very hard to describe in Mpeg-4 SA. It is possible, but there is no significant reduction in file size compared to a compression format like MP3. This is mainly because of the various "organic" elements present in this type of music: although a lot of research and results exist on the synthesis of a singing voice, there is probably no way to accurately describe a singer's voice using a synthesis algorithm.
Common sense dictates that this is because every human being's vocal cords differ, and because the way a singer performs is related to emotions, which will probably never be simulated by a computer system as we define one today. One could argue that compressing this type of music using Mpeg-4 SA is still interesting, because most likely this music contains a vocal track (if there are multiple vocal tracks, we assume that they have been mixed to a single audio channel) and several instrument tracks (which can be easily described using synthesis algorithms). Thus, the music could be encoded using a compressed audio track (using an algorithm similar to the one used in MP3) containing the vocals, and a SAOL program describing the instruments. But this gains little, because a single compressed audio track would require the same amount of storage as the whole song compressed using MP3 (since the whole song is mixed to a single audio track anyway, in the end).

This discussion of course assumes that no algorithm exists which, given a stereo audio stream, generates a collection of synthesis algorithms approximating each component in the stream. It is reasonable to assume that such an algorithm does not exist for most music styles, because the individual parts may overlap in frequency, which prevents their detection and analysis. The assumption is also made that, in this case, neither the user nor the content provider would be interested in extra value added to the music (instruments which would allow the user to create a remix, for example).

2.4 Rethinking existing methodologies

Because creating music in Mpeg-4 SA is very different (today) from creating music using dedicated hardware, it will be very hard to convince musicians to make the transition, and it will probably take them several years.
Therefore, it would be better to hide all the details of creating an Mpeg-4 audio composition, such that musicians can still use the same sequencer and perhaps software versions of the hardware synthesizers they use today. This requires that pro-audio software companies come up with Mpeg-4 compatibility in some way, which means that the success of this new format depends on the products they will change or develop. It is still not clear whether these companies are willing to make the transition if there is no guarantee of higher sales figures -- the discussion remains open.

2.5 A structured audio composition

The following examples were taken from John Lazzaro's excellent tutorial on structured audio [4]. The following is a piece of SAOL code consisting of a sine oscillator instrument.

    global {
      srate 48000;   // DAT-quality
      krate 2400;    // 417 us
    }

    //
    // instr vtone
    // shaped sinewave
    //

    instr vtone (num) {

      // declarations

      // envelope settings
      ivar atime;    // attack
      ivar rtime;    // release

      // internal env state
      ivar attack;
      ivar release;
      ivar sustain;

      ivar a;        // sets osc f
      ksig env;      // env output
      asig x, y;     // osc state
      asig init;

      // **********************
      // computed during i-pass
      // **********************

      // turns MIDI number into
      // oscillator constant
      a = 2*sin(3.141593*cpsmidi(num)/s_rate);

      // envelope computation
      atime = 0.3;   // attack time (s)
      rtime = 0.2;   // release time (s)

      // computes envelope state
      // dur is an internal variable holding
      // the duration of the sound
      if (dur > atime + rtime) {
        attack = atime;
        release = rtime;
        sustain = dur - atime - rtime;
      } else {
        attack = dur/2;
        release = dur/2;
        sustain = 0;
      }

      // **********************
      // computed during k-pass
      // **********************

      env = kline(0, attack, 1, sustain, 1, release, 0);

      // **********************
      // computed during a-pass
      // **********************

      if (init == 0) {
        x = 0.25;
        init = 1;
      }

      x = x - a*y;
      y = y + a*x;
      output(y*env);
    }

The following is a simple sequence (a SASL program) for the above SAOL program.

    0 tempo 110
    2 tempo 112.1
    4 tempo 114
    6 tempo 116
    8 tempo 118

    1 vtone 1.5 52
    3 vtone 1.5 64
    5 vtone 1 63
    6 vtone 0.5 59
    6.5 vtone 0.5 61
    7 vtone 1 63
    8 vtone 1 64

    10 end

2.6 Key elements in promoting structured audio

To be attractive to the end user, the classic requirements apply: there should be enough software to play the music, this software should be free, and it should exist for every operating system possible. One of the most crucial factors in the success of this new format is how quickly pro-audio companies support the technology, and whether they do so without involving the composer too much in its low-level details. Composing music today is technically easy, and it has taken a while to convince musicians to use a computer system with sequencing software for creating music. Making musicians switch again to a completely new way of making music probably won't work, certainly not if they have to become SASL programmers first. Creating SAOL instruments should be done using some visual synthesis tool, similar to the commercial software synthesizers in use today by professional musicians.

The success of this new audio format also depends greatly on the way it deals with copy protection, and on the price to pay for a song in this new format. Most likely, copy protections will be analyzed and removed as has happened in the past, unless something is done about the cost of a song and how easy it is to get a new song. The only way to battle piracy, as discussed below, is to find its source and to do something about the cause, not the symptoms. People don't pirate music when it's not worth the trouble.

3 Copyright issues and intellectual property

3.1 An overview of the music business today

The music business today finds new musicians, promotes and produces their hard work, makes sure they get paid, licenses their work to other record companies, and if relevant, arranges live performances for the musicians.
This is of course the official image the media shows of these companies. The truth is quite different: most musicians who deserve some attention don't get a deal because their music is not really interesting from a business point of view. This is probably the main reason why record companies just don't like MP3 and any future formats which have a similar effect on their sales, together with the copyright nightmares involved. They have been in this position for years, and sites like mp3.com have drastically changed their position in the market: they allow musicians to "break free".

One of the main reasons why musicians still rely on big record companies today is the incredibly expensive recording studios ($100,000 is no exception) these companies can afford. Most musicians can never get all this technology. They make music in their bedrooms, using at best a cheap 32-channel mixing console with some outboard effect processors, synthesizers, samplers and other (cheap) dedicated hardware. This situation was even worse a few years ago, when electronics in general were extremely expensive and definitely not affordable for most people.

Mpeg-4 structured audio might change all this: SAOL, the signal processing language specified in the standard, doesn't have any limits by itself (your computer's processing power decides what you can do) and is powerful enough to describe any existing piece of audio gear. This means that the shift from hardware audio processing to software audio processing will continue to a much broader extent. The shift is already clearly visible today: Nemesys Gigasampler already makes hardware samplers obsolete, while Native Instruments Reaktor questions the need for a dedicated hardware synthesizer. In the future, it is very likely that musicians will be able to do whatever they want, just using a powerful computer (which becomes cheaper every day) and some software.
Entire studios full of dedicated hardware will probably be replaced by a computer system which does everything, from recording the very first takes of an audio track to mastering.

3.2 Media piracy: source, consequences and solutions

The main reason why music piracy exists today is not that people don't want to pay for good music, or that all music is freely available anyway using tools like Napster, but that record companies missed some vital points in their business plan. The price of a CD varies drastically around the globe. For example, you can buy a CD in the US for $12, and pay over $20 for the same CD in a European country. On top of that, different versions of the same CD exist around the globe, some of them probably censored or cut. And as if that's not enough, most CDs are not available around the globe at the same time (sometimes a CD is available in the US over 6 months before it is in the record stores in Europe). The consequences are obvious: people don't accept this. They won't wait until they can buy the CD in their country, they don't want the (bad) European version, and they certainly don't want to pay more than people in the US.

An attempt has been made to add some copy protection to MP3, and new copy-protection enabled audio formats like WMA have been developed. However, one could ask whether this will solve the problem. These formats have been around for some time, but nobody is selling or buying music in such a copy-protected format. If you buy music, you want to be able to make a copy, even if it's just for backup purposes in case the original gets damaged. The solution to music piracy probably does not lie in raising CD prices or coming up with another copy-protection format, but in making people aware of what they do, finding out why they copy music in the first place, and reacting to that. People might have no choice but to copy and distribute music, because of several geographic or social issues.
Mpeg-4 structured audio is of course an even more complicated matter. A SAOL program is available in the bitstream and, if not encrypted, might be a huge copyright problem for pro-audio software companies which want to offer authoring support for Mpeg-4 structured audio. After all, their algorithms, which have been established through years of costly research, are about to be exposed. On top of this encryption problem, there is the classic tale of copy protection. It has been proven several times in the past that there is no such thing as a bullet-proof encryption mechanism. For example, the encryption technique in DVD was revealed last year by a Linux enthusiast. The big question is whether this can happen with Mpeg-4 structured audio as well.

3.3 IPMP

IPMP stands for Intellectual Property Management & Protection, a framework proposed in the Mpeg-4 standard to battle illegal copying or reverse-engineering of content. The standard itself does not propose any means of protection, but a framework which can be used by implementors of Mpeg-4 players (or other software). The implementors can come up with their own IPMP systems, which can deal with content in a variety of ways through IPMP-descriptors and IPMP-elementary streams. A system is identified using its descriptor, and an IPMP-elementary stream contains things like decryption keys. Content can even contain watermarks to detect illegal copies (every player can have its own unique ID, and when the content is streamed, a watermark can be added which indicates this ID). IPMP also allows management of rights, patents and royalties for any type of content (even SAOL code), and allows for auditing of the content.

It is clear that the success of piracy will be dictated by the design of these IPMP systems, which is beyond the scope of the Mpeg-4 standard. It's the application developer's responsibility to come up with a way of protecting and controlling content.
This is quite a different situation from a format like DVD, where the encryption/decryption is described in the standard and is thus common to all DVD players. Also, it won't be easy to reverse-engineer the copy control technology in Mpeg-4 products, because this technology will probably be very different for each product on the market. Thus, the success of copy protection depends very much on the people who will design the IPMP systems. A dangerous situation (for the intellectual property owners) can arise when several companies sit together and come up with a common copy protection system, which is likely to happen in the near future. In this situation, the same will probably happen as with DVD.

4 A technological point of view

4.1 Mpeg-4 SA and FlowML

The SAOL language described in the Mpeg-4 SA standard is a low-level signal processing language which is very close to C in syntax. A discussion or evaluation of the language's possibilities and semantics is beyond the scope of this document, but it is interesting to note that designing a very complex SAOL program involves the same architectural challenges as classic software engineering, and that the paradigms which already exist to structure a piece of software elegantly probably apply to a signal processing language too.

This might sound a bit strange at first, but in theory it should be possible to recreate an existing hardware synthesizer in SAOL. For example, the Korg Wavestation is a vector-based synthesizer based on the Prophet range of synthesizers from Sequential Circuits, and has been out of production for some time. The synthesizer is based on wavesequencing and wavetables, and can probably be recreated as a SAOL program (which is of course not a trivial project).
However, designing a SAOL program which is understandable, uses minimal computing power, and simulates the real thing is far from trivial, since electronic equipment is structured differently from a procedural program. (This has been known in software engineering for some time, and a lot of research already exists on component-oriented programming, a paradigm which brings software design closer to hardware design.) This observation, together with the design issues of the QOrchestra project [1], indicated that it might be interesting to build a high-level component-oriented language (a format, actually) on top of SAOL, to speed up the design process of a SAOL program, just like UML is used to design a large, complex piece of object-oriented software. This high-level language is called FlowML [2]. It is a set of XML DTDs together with a specification of which standard high-level signal processing components are supported, and a mechanism to add language-specific components. The translation from a FlowML diagram to a SAOL program is trivial most of the time, and is a special case of translating a component-oriented program to a procedural program.

4.2 Operating systems and their influence

It is clear that *nix operating systems prove their importance once again in the development and exploration of Mpeg-4 tools. Most of the experiments involving Mpeg-4 technology are done on these operating systems, which, as has always been the case, provides a good basis for consumer-end products on desktop-oriented operating systems. Unfortunately, software giants like Microsoft have the power to force their own audio formats with intellectual property management upon the masses, once again hiding the internals and know-how required for decent software development by 3rd parties.
Mpeg-4 offers far more advanced features than a format such as WMA, but it might take some time before the first Mpeg-4 solutions with IPMP are available, certainly if not everybody cooperates and understands that standardization is a far more important issue than market share.

4.3 Existing tools

This section gives an overview of open-source Mpeg-4 SA tools.

4.3.1 Encoders

The reference software for the Mpeg-4 SA standard is "saolc", a program which can be used to create an Mpeg-4 bitstream from a SAOL program and a SASL or MIDI score. The software is a reference implementation and is not meant to be used when performance is an issue. It can also decode an Mpeg-4 bitstream.

4.3.2 Decoders

Sfront is a great SAOL-to-C compiler, very fast and easy to use. It's a command-line program just like saolc, and it supports some nice features like real-time audio output and real-time MIDI input. It is meant to be a decoder in the first place, but it can encode too.

Sarun is an Mpeg-4 SA decoder which internally compiles SAOL to an instruction set for a special virtual machine. It is being written by Ross Bencina, and is still under development.

4.3.3 Authoring tools

QOrchestra is a visual authoring tool for SAOL and FlowML. It allows you to create diagrams from high-level synthesis building blocks, and to reuse these diagrams as new building blocks. It can save diagrams in SAOL and FlowML, and it loads FlowML. It is meant to be used by people who don't know anything about SAOL but still have to design synthesis instruments in a high-level way, and who need a way to generate SAOL code for their instruments.

5 Conclusion

Mpeg-4 SA offers a new way to think about music. Its synthesis language SAOL is as powerful as a studio full of expensive dedicated hardware equipment. In theory, SAOL allows you to create any type of music, but it is very clear that the "compression" ratio of Mpeg-4 SA depends on the type of music you want to encode.

6 References

1. QOrchestra: a visual authoring tool for SAOL, project specification. Bert Schiettecatte, 1999. http://qorchestra.sourceforge.net/
2. FlowML: a format for audio synthesis diagrams. Bert Schiettecatte, 1999. http://www.flowml.com/
3. Mpeg-4 homepage at the MIT media lab. http://sound.media.mit.edu/mpeg4/
4. Mpeg-4 homepage at Berkeley. http://www.cs.berkeley.edu/~lazzaro/
