Linux Audio Programmer Paul Davis talks about his collection
of apps ...
|
Paul
Davis has put a lot into the Linux audio scene. He has written a
number of apps, libraries, and drivers such as the ALSA driver for
the RME Hammerfall. Here we chat to him about aspects of Linux audio
programming.
|
related links:
lad
las
quasimodo
ardour
softwerk
Philadelphia
|
M station:
>Having said that I'd ask about ardour and quasimodo perhaps we could
>lead in with advice to kids who think they'd like to do audio programming in a
>comprehensive sort of way. I'd say, first learn a language (I'd go for C or C++),
>then read some books and some code. Do you agree? Which books would you
>recommend? I guess you'd probably side with C++ as the language. Do you think
>Object hierarchies are easier to suss out than function libraries?
Paul:
Well, it depends a little bit on what kind of audio programming. If
you want to write plugins (LADSPA, VST, whatever), then it's probably
enough to learn a language (and yes, C and C++ are the obvious choices
because of speed and integration into existing systems), read any of
the textbooks on DSP (most of them have something good to offer), and
get coding. The way plugins work means that you can focus almost 100%
on your DSP algorithm(s) and experiment much more easily than if you
were writing an entire application.
On the other hand, if you want to design more elaborate software (by
analogy with plugins, lets call them "hosts"), then knowing a language
is only half the challenge. You need good system design skills. I've
never thought of myself as a particularly brilliant low level coder -
I still have to use bc(1) to figure out binary<->hex conversions - but
I think that the skills I have acquired designing and then
implementing medium to large scale systems are extremely useful. You
can't get that from a book, but if you could, then the bible of
contemporary OOP programmers, "Design Patterns", is a good place to
start. For host systems, I (obviously) do prefer C++. I find it very
useful to have the compiler take care of a lot of the stuff I would
have to do by hand in C, and I also like the enforced encapsulation
and public/private features that the language brings to my code.
I don't think that OOP necessarily makes it easy for a non-author to
understand the code, but I do feel that it makes it easier for me (the
author) to keep the structure in my head and manipulate it. I find
class hierarchies more plastic and malleable; function libraries work
for me when the semantics of each function are simple (such as the
basic POSIX API), but not as a way of wrapping up complex
internals.
Finally, you also need to really understand what real-time programming
means, grounded in the details of whatever OS you're working with. For
Linux, that means having a firm grasp of the relationship between the
kernel and an application, POSIX RT scheduling, the poll(2) system
call, how virtual memory and malloc work and many other "advanced"
topics.
>On the plugin side beginners might be confused at first about where a host
>ends and their plugin begins .. the definition of a plugin. I guess the best
>place to start there is the docs for LADSPA or whatever.
In the context we're discussing, a plugin is a piece of code that
either generates or processes data (i.e. audio samples). It doesn't
talk to an audio interface, or any of that complex stuff, and it's not
an entire application. A host is a complete application that *does*
talk to an audio interface. It can load plugins, organize them into
processing networks, and cause them to do their thing at the right
times, feeding them data from the audio interface, and passing their
results back to it. The host is a *vastly* more complex program than a
plugin, at least in practice. Theoretically, a plugin could do
anything, but in the real world, they implement things like delay
lines, waveshapers, companders, multiband EQ's and so on.
>What I was trying to get at was that where a person has limited knowledge
>and thinks writing a plugin is a good idea, they might have some difficulty
>working out what they can go ahead with ... but I guess it would be true to
>say that if you don't know yet, you need to read more
>ardour and quasimodo - could you say a little about their genesis?
Quasimodo: I would best refer the reader to
quasimodo.org/intro.html
which describes where Quasimodo came from.
Ardour: I started hearing about the RME Hammerfall on various mailing
lists I was on. It sounded pretty amazing. In November of 1999, I
decided to buy one, and proceeded to write the ALSA low-level driver
for it. Once that was working, it became clear that no existing Linux
audio software could really use the card, and moreover, no existing
Linux audio software was really up to the task of use in a
professional/commercial recording studio. In fact, the situation was
even bleaker than that: it was hard to find any Linux program that
could even handle the idea of a multichannel card, or a card with more
than 16 bits of sample precision. So, I decided that it was time to
fill this gap. At the same time, I was getting into an association
with a friend who owns a commercial studio, and we wanted to try to
use Linux and open source software there. We have 3 Alesis M20 ADAT
recorders (about $5K for each recorder), and the first goal was to
produce some software that could replace the tape machines. That goal
was satisfied relatively quickly, but it became clear that doing so
was actually fairly useless for our purposes, since you couldn't do
anything with the result except play it back as-is: no existing Linux
audio editor could handle 24 channels at 500MB+ per channel. After
Bill Schottstaedt did some preliminary and excellent work to see if
his editor "snd" could do this, I reluctantly came to the conclusion
that it was impossible to retrofit this kind of operation into a
program based around 16 bit stereo audio interfaces. Ardour's scope
then expanded to the point where its "only" goal is to do everything
that systems like Digidesign's ProTools, Sek'd's Samplitude and
Emagic's Logic systems can do, and preferably more.
>What do you think of MPEG4 and such things as SAOL? Have you had a play with it
>at all?
I have no opinion on most of MPEG4. MPEG4-SA (Structured Audio) is the
part that involves SAOL. I think that Eric (Scheirer) did two great
things with SAOL: first, he did some rationalization of the Csound
orchestra language and second he worked very hard at getting SAOL
included as part of the MPEG4 standard. However, I regret that the
language that has been included is so old fashioned - both SAOL and
Csound (examples of each are sometimes indistinguishable from one
another) are based on the kinds of programming languages that were in
use in the 1970's and early 1980's. In addition to the general
syntactic structure of the language, Eric retained a feature of Csound
that I think is a real problem - the distinction between audio signals
and control signals. In most hardware-based systems I can think of,
the really cool stuff tends to be the systems that allow you to mix
and match such signals to see what they do. This may seem like a
nitpick, but this is a language that by several accounts is going to
end up being poured in silicon and distributed as part of mass market
consumer items. Barry Vercoe's original introduction of the
audio/control distinction was a brilliant addition to the Music N
languages, but I firmly believe that it's wrong to retain it. To try to
be a little more positive, I would have been much happier to have seen
something like James McCartney's SuperCollider language be
adopted. However, SC is not "close to the machine" in any way, and
that would have violated one of Eric's main goals in designing SAOL.
I've tried using the sfront SAOL-to-C compiler that John Lazzaro and
friends have been working on. It's an impressive piece of work, though
my opinion is sullied by the fundamental problems I have with the
language.
>In relation to writing quasimodo to use a dual CPU setup,
>is there a huge difference in approach in writing for multi processor
>or is God in the details? Do you actually code at the processor level or
>does the compiler handle most of that?
God is mostly in the details. There are several key things to take
into account. The most significant is that more or less all X Window
based GUI toolkits are not naturally multithreaded. This means that
you must choose between an approach that uses explicit locks around
every call to a toolkit function and one that ensures that only a
single thread is responsible for making those calls. The second most
important issue is that the thread that handles data movement to and
from the audio interface must be "real time" (in the sense that it has
to meet deadlines based on an external clock ticking). In practical
terms, this means that this thread should avoid more or less 100% any
system calls, any calls to malloc or free, any thread synchronization
operations (mutexes, condition variables) and so on.
>Is Quasimodo being used mostly to produce 'academic' computer music?
>As an analogue synth it strikes me that it could be used in all kinds of
>music.
To be fair, Quasimodo isn't being used to produce any music right
now. But its design is not intended to be limited to any particular
musical form. I generally use it as an FX processor.
>Those of us on l-a-d were witness to just how hard it was to get those
>streaming rates up high enough to cope with all the data. I gather it's
>worked out very well in the end... did you end up using any of the
>low-latency hacks to help achieve this or are you using a standard kernel?
The low latency hacks aren't necessary for the data streaming, but
they are absolutely vital to get the thread that handles the i/o to
the audio interface to work reliably. I always run a low latency
kernel - right now, I use 2.4.0 patched with Andrew Morton's
(excellent) "lowish" patch, and I keep meaning to move on to 2.4.1
which appears to be even better. The only thing that matters for the
data streaming is the overall performance of the disk subsystem. The
thing that I am happiest about is that I managed to get the desired
performance without using a special filesystem and with standard audio
files (Ardour currently uses RIFF/WAVE format audio files with IEEE 32
bit floating point data). Despite the fact that this is Linux, many of
the guidelines for audio programs on Windows and the Mac will still
apply: shut down unnecessary applications, use a dedicated disk,
etc. That said, I tend to run Ardour with Netscape, several rxvt's,
xosview, and a huge emacs process all active, and it's fine (though I
have a dual CPU system).
>What is the status of the inbuilt editor now?
Well, it's there. But I think the design needs a drastic redoing. When
I started, I was working from models like snd and sweep and
soundforge, programs that work fundamentally at the waveform level. It
turns out that this is inappropriate for "arranging" pieces of
music. I was very resistant to the ProTools "Region" model (known as
"Objects" in Samplitude), but as time has continued, I have come to
understand why PT (and most other similar tools) use it. It won't take
a big reworking of the internals to switch to such a model, but the
GUI will need quite a lot of changes. Being able to drag rectangular
regions around is non-trivial in X, for example.
>Pro Tools is certainly a nice functionality set to aim at. There's a
>few man-years of work in there! How far along the road do you think
>you are with ardour?
It's hard to say precisely. We have no MIDI recording/playback at this
point at all, though the plan is to incorporate another GPL project,
probably MidiMountain, into Ardour itself. I would say that if the
target is ProTools 5.0 (TDM-based) without MIDI or automation, Ardour
is about 70% of the way there. The recording/playback aspects of the
program are more like 99% there, the mixer parts are about 85% there,
and the editing parts are more like 50% or less. We also have a few
features that PT doesn't have, such as per-track looping, and an
implementation of Bill Gribble's excellent idea for metering
clipping. Automation work has just started recently, though it turns
out that it started off as a meander down the wrong track.
>What's the future for ardour?
I am having to consider ways of making money from what I
do. The way that my divorce settlement has worked out doesn't leave me
in any immediate problems, but combined with the recent stock market
situation in the US, my original thoughts of "never having to work for
money again" have changed to something more like "should maybe consult
from time to time to avoid drawing down capital". This is a
challenge. I have no desire to get involved in "support", and I don't
know of any business models for GPL'ed software. It's an interesting
problem. Once Ardour gets closer to ProTools functionality, I think
that some heads will start turning (perhaps even rolling), and it
might be easier to convince h/w manufacturers to start paying
attention to what I and the rest of the Linux audio+midi software
community are doing. That might be an interesting avenue to take
towards a revenue model for this kind of stuff. Imagine that you buy
an audio interface and a control surface and get Ardour "for
free". But that, and other possibilities, really do hinge on getting
Ardour "finished".
>Thanks for your time Paul.
|