People have dreamed of musical collaboration over the Internet since the network's inception. Various programmes have attempted to make it a reality, but shifting large quantities of sound data requires a lot of bandwidth. One solution is to cut back on the amount of data transmitted, and the Structured Audio part of the MPEG4 standard looks very promising for that purpose. Here John Lazzaro and John Wawrzynek look at what's involved.

For background information on this subject, read John Lazzaro's Introduction to Structured Audio and Bert Schiettecatte's article on MPEG4.

Network Musical Performance

By John Lazzaro and John Wawrzynek, CS Division, UC Berkeley.

In this article, we take a brief look at Network Musical Performance, and describe ongoing work in the field.


A Network Musical Performance (NMP) occurs when musicians in different locations interact over the Internet to perform together as they would if they were in the same room.

An NMP system unavoidably introduces time delays between the musicians, due to the network latency of the links connecting the players and the local latency at each host. The total latency must be kept reasonably short for the NMP system to be usable.

However, some latency is always present in conventional musical performance -- the acoustic latency due to the speed of sound.

One way to think about NMP is to consider the physical separation between network hosts that would yield the equivalent acoustic latency between players in a room. For example, Internet data packets travel 40 miles from the Stanford University campus to the UC Berkeley campus in the time it takes for sound to travel 2.4 feet.

What determines the quality of NMP, though, is total system latency: network delay plus the local latency at each host. Once host audio latency and network latency are both taken into account, the total delay between Berkeley and Stanford corresponds to a musician separation of about 7 feet, a typical distance between two players in rehearsal.
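The equivalence between delay and distance is simple arithmetic: divide a separation by the speed of sound. The Python sketch below converts in both directions. It assumes a round figure of 1125 feet per second for the speed of sound at room temperature; the real value shifts with temperature and humidity.

```python
# Assumption: speed of sound in air at room temperature, about 1125 ft/s
# (roughly 343 m/s); the real figure varies with temperature and humidity.
SPEED_OF_SOUND_FT_PER_S = 1125.0


def latency_to_feet(latency_ms: float) -> float:
    """Distance sound travels through air in `latency_ms` milliseconds."""
    return SPEED_OF_SOUND_FT_PER_S * latency_ms / 1000.0


def feet_to_latency_ms(distance_ft: float) -> float:
    """Acoustic delay, in milliseconds, between players `distance_ft` apart."""
    return 1000.0 * distance_ft / SPEED_OF_SOUND_FT_PER_S


if __name__ == "__main__":
    # The two separations quoted in the text.
    for feet in (2.4, 7.0):
        print(f"{feet:.1f} ft of air ~ {feet_to_latency_ms(feet):.1f} ms of delay")
```

Under that assumption, 2.4 feet works out to about 2.1 ms and 7 feet to about 6.2 ms. These figures come from the assumed speed of sound, not from measurements of the authors' system.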

Resiliency vs. Latency

In many networks, occasional packet delays and losses are inevitable, as other users transiently consume resources. Internet telephony copes with congestion delay by using large audio buffers at the receiver.

Buffer delay is tolerable for telephony, but is not acceptable for network musical performance. A key research issue in NMP concerns the design of systems to handle lost and late packets gracefully, using methods that do not increase total system latency.
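The trade-off can be made concrete with a simple fixed playout buffer. Each packet is played a fixed `buffer_ms` after it was sent, so every note is delayed by the full buffer, and any packet whose network delay exceeds the buffer arrives too late to use. The sketch below is a simplified model, not a description of any particular telephony product. The delay values are invented for illustration.

```python
from typing import List


def late_packets(network_delays_ms: List[float], buffer_ms: float) -> int:
    """Count packets that miss their playout deadline.

    Each packet is scheduled to play `buffer_ms` after it was sent, so the
    buffer adds `buffer_ms` of latency to every packet; a packet whose
    one-way network delay exceeds that budget arrives too late to play.
    """
    return sum(1 for d in network_delays_ms if d > buffer_ms)


# Hypothetical one-way delays (ms): mostly ~3 ms, with two congestion spikes.
DELAYS = [3.0, 3.2, 2.9, 3.1, 12.0, 3.0, 3.3, 45.0, 3.1, 2.8]

if __name__ == "__main__":
    for buffer_ms in (5.0, 20.0, 60.0):
        late = late_packets(DELAYS, buffer_ms)
        print(f"buffer {buffer_ms:4.0f} ms -> {late} of {len(DELAYS)} packets late")
```

Growing the buffer from 5 ms to 60 ms rescues both delayed packets in this made-up trace. The price is that every packet, including the fast ones, now pays 60 ms of delay, which is the cost NMP cannot afford.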

Audio Coding

One approach to NMP starts with the basic idea behind Internet telephony -- sending real-time audio streams between hosts -- and modifies it to work well for low latency. The first Internet performance of this type dates back to 1991.

More recently, work by the SoundWire project at CCRMA (part of Stanford University) focuses on low-latency, professional-quality audio streaming for NMP. The SoundWire project is also interested in understanding the maximum latency that musicians can tolerate during performance.
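Two numbers dominate low-latency audio streaming: the bit rate of the stream and the time spent filling each packet with samples. The sketch below computes both. It assumes CD-quality audio (44.1 kHz, 16-bit, stereo) and a few illustrative packet sizes. These parameters are not the settings used by SoundWire or by the 1991 performance.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of an uncompressed PCM stream, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def packetization_delay_ms(samples_per_packet: int, sample_rate_hz: int) -> float:
    """Time spent collecting enough samples to fill one packet."""
    return 1000.0 * samples_per_packet / sample_rate_hz


if __name__ == "__main__":
    # Assumed CD-quality format: 44.1 kHz, 16-bit, stereo.
    print(f"bit rate: {pcm_bitrate_kbps(44100, 16, 2):.1f} kbit/s per direction")
    for frames in (64, 128, 1024):
        delay = packetization_delay_ms(frames, 44100)
        print(f"{frames:5d} samples/packet -> {delay:.2f} ms")
```

Small packets keep packetization delay to a few milliseconds. However, they raise the packet rate and the share of bandwidth spent on headers, which is one reason low-latency audio streaming is hard to do well.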

Gestural Coding

Internet telephony users are well aware of the audio artifacts produced during periods of network congestion. Audio coding can be brittle in the face of late and lost packets, because graceful recovery is difficult to perform on interrupted audio waveforms. If the system were aware of the phonemes and words being spoken, perhaps it could do a better job of handling disruptions.

This observation also holds for NMP. Concealing packet loss is easier if the musical performance is sent across the Internet at a higher level of abstraction: one that describes the physical gestures musicians use to manipulate their instruments.

In this model, each host should execute identical audio signal processing algorithms to generate the sounds of the instruments played in the session, under the control of local and network gestural data. Gestural data sent across the network should be tagged with timestamps and sequence numbers, and should include contextual information about recently sent gestures, so that late and lost packets can be detected and concealed.
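Sequence numbers are what let a receiver notice that packets have gone missing or arrived out of order. The sketch below classifies each arriving packet. It assumes 16-bit sequence numbers that wrap around, as RTP's do. The `GapDetector` class is a hypothetical helper, not code from sfront, and it ignores timestamps, which a real receiver would also use to schedule gestures.

```python
from typing import Optional, Tuple

SEQ_MOD = 1 << 16  # assumption: 16-bit sequence numbers that wrap, as in RTP


class GapDetector:
    """Classify arriving packets as in order, after a gap, duplicate, or late."""

    def __init__(self) -> None:
        self.highest: Optional[int] = None  # highest sequence number seen so far

    def receive(self, seq: int) -> Tuple[str, int]:
        """Return (status, number of packets presumed lost before this one)."""
        if self.highest is None:
            self.highest = seq
            return ("first", 0)
        # Distance ahead of the highest number, modulo 2**16, so that
        # 65535 -> 0 counts as one step forward, not a huge jump back.
        delta = (seq - self.highest) % SEQ_MOD
        if delta == 0:
            return ("duplicate", 0)  # repeat of the newest packet
        if delta < SEQ_MOD // 2:
            self.highest = seq
            lost = delta - 1  # sequence numbers skipped over
            return ("in_order" if lost == 0 else "gap", lost)
        # More than half the space "ahead" really means behind: a packet
        # that was presumed lost has turned up late.
        return ("late", 0)


if __name__ == "__main__":
    detector = GapDetector()
    for seq in (65534, 65535, 1, 0):
        print(seq, detector.receive(seq))
```

In the usage example, packet 0 is first counted as lost when packet 1 arrives, then reported as late when it finally shows up. At that point, a gestural receiver could decide whether the late gesture is still worth playing.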

Using our software synthesizer sfront as a platform, we have implemented a system for network musical performance based on gestural coding.

In this system, the musicians play electronic instruments that produce MIDI control data. MIDI data is sent to the remote players using a resilient coding that protects against packet loss. Sfront clients running on each host turn both local and remote MIDI data into sound, and use knowledge about the gestural coding to handle late and lost packets gracefully.
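One way contextual information helps with MIDI is preventing "stuck" notes. If a note-off is lost, a note can otherwise sound forever. The sketch below is a deliberately simplified stand-in, not sfront's actual recovery format. It assumes each packet carries its MIDI events plus a hypothetical journal: the set of notes the sender is holding down after those events. When a gap is detected, the receiver reconciles its own note state against that journal.

```python
from typing import List, Set, Tuple

Event = Tuple[str, int]  # ("note_on" | "note_off", MIDI note number)


class NoteStateReceiver:
    """Tracks sounding notes and repairs them after packet loss.

    Hypothetical simplification: every packet carries its MIDI events plus
    a journal listing the notes the sender holds down after those events.
    """

    def __init__(self) -> None:
        self.sounding: Set[int] = set()

    def apply(self, events: List[Event]) -> None:
        for kind, note in events:
            if kind == "note_on":
                self.sounding.add(note)
            elif kind == "note_off":
                self.sounding.discard(note)

    def on_packet(self, events: List[Event], journal: Set[int], gap: bool) -> List[Event]:
        """Apply a packet; if packets were lost before it, repair from the journal.

        Returns the corrective events the synthesizer should play.
        """
        self.apply(events)
        if not gap:
            return []
        fixes: List[Event] = []
        # Notes whose note-off was lost would hang forever: silence them.
        for note in sorted(self.sounding - journal):
            fixes.append(("note_off", note))
        # Notes whose note-on was lost: start them late rather than never.
        for note in sorted(journal - self.sounding):
            fixes.append(("note_on", note))
        self.apply(fixes)
        return fixes
```

Whether to start a recovered note late or simply skip it is a musical judgment. A real system might drop note-ons that are too stale to sound natural.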

Our system is based on the RTP and SIP networking standards from the Internet Engineering Task Force (IETF), and the Structured Audio standard from MPEG4. This web page describes our system in detail, and includes pointers to software downloads and research papers.
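RTP is where the sequence numbers and timestamps described above live. Every RTP packet starts with a 12-byte fixed header, defined in RFC 3550. The sketch below packs and parses that header with Python's standard `struct` module. Payload type 96 is an assumed value from the dynamic range (96 to 127) and is not necessarily what sfront uses. The MIDI payload that would follow the header is not shown.

```python
import struct

RTP_VERSION = 2


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96, marker: bool = False) -> bytes:
    """Pack a 12-byte RTP fixed header (RFC 3550, no CSRCs or extension)."""
    byte0 = RTP_VERSION << 6  # V=2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # Network byte order: two bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data: bytes) -> dict:
    """Unpack the fields of a 12-byte RTP fixed header."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": ts, "ssrc": ssrc}
```

The 16-bit sequence field is why receivers must handle wraparound. The timestamp lets the receiving host place each gesture at the right moment, even when packets arrive unevenly.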

Other research groups and companies are also working on gestural network coding for musical applications. The Open Sound Control project at CNMAT (part of UC Berkeley), and the WebDrum project by SoftSynth, are two interesting examples.


The modern Internet has sufficiently low nominal latency to support Network Musical Performance over interesting distances. Many practical issues remain to be solved, but the early results are positive. We encourage you to download our Network Musical Performance software and see for yourself!

Copyright 2001 John Lazzaro and John Wawrzynek. Non-exclusive, royalty-free license to publish granted to John Littler.
