Linux MusicStation - low-latency mini howto ... Benno Senoner

Hot Rod your Linux Box for Audio! Video! whatever
MusicStation Intro

Jan 01 update
The 2.4.0 final kernel is out and Andrew Morton has a patch for it ... A patch against kernel 2.4.0 final which provides low-latency scheduling is at www.uow.edu.au/~andrewm/linux/schedlat.html#downloads
Some notes:
- Worst-case scheduling latency with *very* intense workloads is now 0.8 milliseconds on a 500MHz uniprocessor.
For normal workloads you can expect to achieve better than 0.5 milliseconds for ever. For example, worst-case latency between entry to an interrupt routine and activation of a usermode process during a `make clean && make bzImage' is 0.35 milliseconds. This is one to three orders of magnitude better than BeOS, MacOS and the Windowses.
- Low latency is enabled from the `Processor type and features' kernel configuration menu for all architectures. It would be nice to hear from non-x86 users.
- The SMP problem hasn't been addressed. Enabling low-latency for SMP works well under normal workloads but comes unstuck under very heavy workloads. I'll be taking a further look at this.
- The supporting tools `rtc_debug' and `amlat' have been updated. These are quite useful tools for providing accurate measurement of latencies. They may also be used to identify the causes of poor latency in the kernel.
- The list of remaining problem areas (the Don't Do That list) is pretty small:
  - Scrolling the fb console.
  - Running hdparm.
  - Using LILO.
  - Starting the X server.
- Low latency will probably only be achieved when using the ext2 and NFS filesystems.
- If you care about latency, be *very* cautious about upgrading to XFree86 4.x. I'll cover this issue in a separate email, copied to the XFree team.
(Andrew Morton email to l-a-d)
Ingo Molnar's series is also updated... a new version against recent 2.4 kernels of my multimedia-lowlatency patchset is now available. These patches are the 2.4-adapted versions of my 2.2 lowlatency patch, which project has now reached an age of 1.5+ years. The lowlatency patch against 2.4.0-ac6 can also be found at: http://www.kernel.org/pub/linux/kernel/people/mingo/lowlatency-patches/lowlatency-2.4.0-ac6-A2
Sep 00 update
Benno Senoner's testing indicates that 2.2.15, 2.2.16 and 2.2.17 are all fairly similar as far as latencies are concerned. The fastest is 2.2.10.
2.2.17 has been added to the list of lowlatency prepatched kernel RPMs. Info here
Aug 00 update
Now there's a 2.2.15 patched kernel RPM. 2.2.15 is less 'spikey' than 2.2.16. Also new is an Ingo Molnar patch for 2.4.0-test6-pre1.
You can now experiment with the LL kernel and ALSA fairly easily. Read the notes on Benno's page and get the patched 2.2.16 kernel RPM. Get the kernel headers RPM and then the ALSA 0.5.7 drivers, libs, and utils RPMs (there won't be 0.5.8 RPMs, as that release breaks some sequencers). These were compiled with the patched 2.2.16 LL kernel. Note that these are i586 RPMs. The ALSA driver RPMs for 2.2.13 were i386 (available from the same place).
July 00 update
Latest news is that a small note by Benno has been added to cover recent developments which consist of LL patches for more kernels along with Andrew Morton's work on the 2.4.0 kernel. This latter has become known as the low-ish latency patch and is really quite good if kswapd isn't needed (you have large amounts of RAM). This work might make it into the kernel source. The very latest news is that there's now a 2.2.16 patched low latency kernel rpm.
Mar 00 update
This doc was first put up in Sep. 99. Now it's Mar 00 and it looks as if the low-latency patches won't make it into the 2.4.0 kernel so you'll need to continue to apply the hot-rod patch to achieve the latencies mentioned below. A discussion of the patch led Roger Larsson to write a short explanatory piece about the merits of various alternatives and it is here.
The information below is an exciting development for Linux as a multi-media platform. What it will enable you to do is have the sort of control over media streams that people currently expect of purpose-built OSs such as BeOS - and the speed too. All that and it's still Linux with all its other packages and developers.
In order to achieve this you will need to be able to patch and compile a kernel, and then you may need to run a little thing that sets SCHED_FIFO policy or, if you're developing your own apps, insert a code fragment. Some apps will run SCHED_FIFO "out of the box" right now, when run as root or suid. You'll need to bring your kernel up to 2.2.10 before you start. The kernel patch has been thought about and tested extensively. You should still exercise caution ... but you weren't going to put it on someone else's mission-critical-for-something-else-box anyway :)
Also, certain usages of the hdparm command have been known to corrupt data on very old IDE drives. You should check that before you start.
A related document is Paul Winkler's Audio Quality Howto.






Low-latency mini-howto by Benno Senoner
 (sbenno@gardena.net) 09/09/1999



This document describes the "low-latency" capabilities of Linux plus
appropriate patches. It focuses mainly on realtime audio applications,
but you can apply the concepts to other realtime apps as well.

Some time ago I noticed that heavy system load caused audio players
to skip during playback, which was very annoying.

I tracked down some sources which might cause audio drop-outs:

Hardware related: 

(solution to avoid skips: buy new hardware)

- old mainboards which monopolize the bus during disk I/O,
  preventing the CPU from accessing the PCI/ISA bus for several msecs.

- some PCI graphic cards which hog the PCI bus for several msecs
  due to large burst transfers (to gain some graphic performance).
  For example old PCI Matrox cards had this problem.
  Newer G100/G200 cards seem to work flawlessly.

- devices which require busywaiting polling (especially high bandwidth ones)


Software related:

(in most cases you can avoid skips by tuning the kernel/user-apps)

- The main cause of skips is kernel routines which do a lot of work
  ( = several msecs) without returning control to the scheduler

  Ingo Molnar found the main latency sources:

  - disk buffer cache
  - memory page handling
  - proc file system
  - vga/console handling
  - forking / exiting of large processes
  - keyboard driver
  
   

Ingo made a patch which adds conditional rescheduling points to these
routines, so that a waiting high priority process gets the CPU when needed.

On a regular Linux kernel during high disk I/O the kernel freezes
all processes for up to 150ms on my box (PII400 + IBM 16GB EIDE UDMA disk).
On older hardware the freezes could last much longer, up to several seconds,
especially on old non-DMAed disks.

This means that on non-DMAed disks you are unable to play skip-free
audio during heavy disk I/O, even by using the full 64KB buffer
present on most soundcards.
64KB at 44.1kHz stereo 16 bit = about 370ms, and since
the drop-outs could last up to 1-2 secs, you will hear a drop-out.

Using DMAed EIDE disks, the freezes are <150ms and therefore
the 64KB buffer is enough to ensure skip-free audio.

But a buffer of about 370ms is far from low-latency,
because low-latency means "near realtime".
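
The buffer arithmetic above can be checked with a small helper. This is
only a sketch: the function names buffer_ms() and survives_stall() are
made up for illustration and are not part of any tool mentioned in this
document.

```c
/* Playback time, in milliseconds, of `bytes` bytes of PCM audio.
 * rate_hz:      sample rate (e.g. 44100)
 * channels:     1 = mono, 2 = stereo
 * sample_bytes: bytes per sample (2 for 16 bit) */
double buffer_ms(unsigned long bytes, unsigned rate_hz,
                 unsigned channels, unsigned sample_bytes)
{
    double bytes_per_sec = (double)rate_hz * channels * sample_bytes;

    return 1000.0 * (double)bytes / bytes_per_sec;
}

/* Returns 1 if a buffer lasting buf_ms outlasts a kernel stall of
 * stall_ms (no audible drop-out), 0 otherwise. */
int survives_stall(double buf_ms, double stall_ms)
{
    return buf_ms > stall_ms;
}
```

buffer_ms(65536, 44100, 2, 2) comes out at roughly 371.5, which rides out
a 150ms freeze but not one lasting 1-2 seconds.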

To analyze the realtime behaviour of userspace processes under Linux,
I wrote a tool called "latencytest".

You can get the tool on my audio page at:
www.gardena.net/benno/linux/audio


It plays a test sound, and while playing it measures the time the write()
call to /dev/dsp takes on each iteration.
The data is presented as a graph showing the ideal latency
(the time it takes to play 1 audio fragment) and the real measured latency.

During playback the app tries to stress the system as much as possible,
by generating heavy disk I/O, graphics output and /proc filesystem stress.
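
The measurement idea can be sketched as follows. This is not the
latencytest source, just a minimal illustration of timing a single
write() call; the function name timed_write_us() is hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Write len bytes to fd and return how long the write() call took,
 * in microseconds, or -1 if write() failed.
 * gettimeofday() reads the wall clock, so a clock adjustment during
 * the call would distort that one sample. */
long timed_write_us(int fd, const void *buf, size_t len)
{
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    if (write(fd, buf, len) < 0)
        return -1;
    gettimeofday(&t1, NULL);

    return (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
}
```

Called once per fragment on a descriptor opened on /dev/dsp, the returned
values can be compared against the time one fragment takes to play.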


Ingo's patches are amazing:
With an audio buffer of only 3 fragments of 256 bytes = 4.3ms,
the worst-case latencies went down from 70-150ms to *ONLY* 2.9ms,
and most of the time the latency stays within +/-500usecs of
the nominal latency (in the test case 1.45ms = time to play 1 fragment)
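
A buffer layout like the one above (3 fragments of 256 bytes) is what an
OSS application would request with the SNDCTL_DSP_SETFRAGMENT ioctl. The
sketch below only builds the ioctl argument and computes the fragment
time; it assumes the app talks to /dev/dsp through the OSS API, and the
helper names oss_fragment_arg() and fragment_us() are invented for
illustration.

```c
/* Argument for the OSS SNDCTL_DSP_SETFRAGMENT ioctl:
 *   upper 16 bits = number of fragments,
 *   lower 16 bits = log2 of the fragment size in bytes.
 * frag_bytes must be a power of two; returns -1 otherwise. */
int oss_fragment_arg(unsigned nfrags, unsigned frag_bytes)
{
    unsigned shift = 0;

    if (frag_bytes == 0 || (frag_bytes & (frag_bytes - 1)) != 0)
        return -1;
    while ((1u << shift) < frag_bytes)
        shift++;

    return (int)((nfrags << 16) | shift);
}

/* Playing time of one fragment in microseconds, for 16 bit samples. */
double fragment_us(unsigned frag_bytes, unsigned rate_hz, unsigned channels)
{
    return 1e6 * frag_bytes / ((double)rate_hz * channels * 2);
}
```

In a real program the value would be passed as
ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &arg) right after opening the device.
For 256 byte fragments at 44.1kHz stereo, fragment_us() gives about
1451 usecs, matching the 1.45ms nominal latency quoted above.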


With latencies this low, realtime audio becomes possible
on Linux, using regular SCHED_FIFO (POSIX soft-RT API) scheduled
processes.

How well does Linux plus low-latency patches compare to other OSs?
Very well! Windows has many problems in the low-latency arena,
unless you resort to dirty hacks like IRQ re-programming etc.

I tested 2 realtime synths on Win98:
( PII400+ 256MB RAM 16GB EIDE UDMA disk + SB AWE64 Gold)
N.I Reaktor and Seer Reality -

Reaktor works somewhat reliably on latencies >= 20ms ,
Seer has a much lower latency but blocks other DirectX audio apps
(you can't play your MP3 on the 2nd soundcard while using Reality)

I was able to produce small sound skips in both apps by stressing
the Win98 box heavily (heavy disk I/O etc).

The low-latency performance of Linux with Ingo's patch now even comes
very close to BeOS, which was designed with realtime multimedia in mind.
BeOS is said to be able to play reliable 3ms-latency audio.
On my box using Linux, I was able to play 4ms-latency audio very reliably.

On faster processors and/or SCSI systems you could even get lower latencies
in the range of 2ms or so.

If the patches go into the mainstream kernel then Linux will be a really
good MULTIMEDIA OS.

One example:
Take the case of Steinberg's Cubase:
It's a great MIDI sequencer/harddisk recorder/effects box,
but it has one problem on Windows: latency

That means if you change parameters in the EQs or volume sliders,
you will hear the changes 50 msecs or more later.
Cubase needs large audio buffers to prevent drop-outs and skips
which are not acceptable in the pro-audio sector.

Linux + low-latency patches is now able to run the following in parallel
without glitches and with VERY low latency (about 5ms on my box):

- a MIDI sequencer /harddisk recorder
- effects plugins 
- low-latency software synthesizers / samplers


That means: if you change parameters on your FX-plugins you will 
hear the changes immediately,
and the soft-synth would feel like a hardware counterpart,
with very tight Note-on response.
You could even use your PC as a cheap Effect processor,
WHILE using your box to surf the WEB !

This is what pro-audio folks are demanding: a low-latency , reliable
audio environment in the PC.

( David Olofson is even working on a RT-Linux audio engine which could 
allow latencies down to the hardware limits ( <1ms) )


HOW TO GET RELIABLE LOW-LATENCY IN YOUR AUDIO APPLICATIONS:

Download the lowlatency-2.2.10-N6B.patch from 
www.gardena.net/benno/linux/audio
and now (Nov. 99) there is a patch for 2.2.13 available from the same
place.

If you have IDE disks tune all your disks with
hdparm -m 8 -d 1 -u 1 -c 1 /dev/your_hd

This is very important, since when I run IDE disks
in non-DMA mode, even with Ingo's patches I can't go
below 15ms on my box. (Which is still good compared to Windoze :-) )


Run your sound application with realtime privileges by using 
the SCHED_FIFO scheduling policy.

Note:
1) Some apps already have SCHED_FIFO built in (eg XMMS). Run as root or suid.
2) If you're building your own app or hacking you'll need something like
   the code immediately below.
3) In other cases you will need to set SCHED_FIFO with the little utility
   included towards the end of this document.

Summary:
Apply patch + tune HD and if the app doesn't set SCHED_FIFO as default,
set it with the little utility.


For example you could set the max RT priority for your audio app
by calling the set_realtime_priority() function below.

----
#include <stdio.h>   /* perror() */
#include <string.h>  /* memset() */
#include <sched.h>

int set_realtime_priority(void)
{
        struct sched_param schp;

        /*
         * set the process to realtime privs
         */
        memset(&schp, 0, sizeof(schp));
        schp.sched_priority = sched_get_priority_max(SCHED_FIFO);

        /* pid 0 means "the calling process" */
        if (sched_setscheduler(0, SCHED_FIFO, &schp) != 0) {
                perror("sched_setscheduler");
                return -1;
        }

        return 0;
}
-----
This code can be cut and pasted into an existing app.
Just call the function set_realtime_priority() before
you do your low-latency stuff.
                                                                                
The sequence could be the following:

init_audio_device();
/* do any non-realtime related stuff here... */
set_realtime_priority();
while(1)
{
  do_your_realtime_computations();
  do_audio_io();
}

Alternatively, you could set SCHED_FIFO policy to the pid of an
already running process by running something like the following (as root)...

----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>

int main(int argc, char **argv)
{
  struct sched_param schp;
  pid_t pid;

  if (argc != 2)
  {
    fprintf(stderr, "error, give the PID of the process to be scheduled "
                    "SCHED_FIFO as argument\n");
    exit(1);
  }
  pid = atoi(argv[1]);
  if (pid <= 0) exit(1);

  /*
   * set the process to realtime privs
   */
  memset(&schp, 0, sizeof(schp));
  schp.sched_priority = sched_get_priority_max(SCHED_FIFO);

  if (sched_setscheduler(pid, SCHED_FIFO, &schp) != 0) {
    perror("sched_setscheduler");
    return 1;
  }

  return 0;
}
--------
But be careful: if a SCHED_FIFO app does long computations (maybe for minutes or
so), your entire system will freeze until the app finishes the computations.
Try this by setting the GIMP to SCHED_FIFO policy and rescaling an image by a
factor of 4x or so. You will notice that your box freezes until the operation
is completed.


AVOID ANY DISK I/O in your low-latency thread, since otherwise you will
never be able to meet the strict deadlines of <5ms.

If you plan to stream audio from/to disk, use a 2 threaded model:
the player thread running at max priority and the disk thread running
at lower priority, exchanging data through shared memory or pipes.

Only with this approach will you be able to implement your 5ms latency
harddisk recorder reliably.
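
One possible shape for the shared-memory variant is a single-writer,
single-reader ring buffer, sketched below. This is an illustration under
assumptions, not code from any existing player: the names struct ring,
ring_write() and ring_read() are invented, C11 atomics are assumed to be
available, and thread creation is left out. The disk thread only calls
ring_write() and the player thread only calls ring_read(); neither call
ever blocks.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 65536u   /* bytes; must be a power of two */

struct ring {
    unsigned char data[RING_SIZE];
    atomic_uint head;      /* total bytes written (disk thread updates) */
    atomic_uint tail;      /* total bytes read (player thread updates) */
};

/* Disk thread: store up to len bytes; returns how many were stored. */
size_t ring_write(struct ring *r, const unsigned char *src, size_t len)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_SIZE - (head - tail);
    size_t i;

    if (len > space)
        len = space;
    for (i = 0; i < len; i++)
        r->data[(head + i) & (RING_SIZE - 1)] = src[i];
    /* publish the data only after it has been copied in */
    atomic_store_explicit(&r->head, head + (unsigned)len,
                          memory_order_release);
    return len;
}

/* Player thread: fetch up to len bytes; returns how many were fetched. */
size_t ring_read(struct ring *r, unsigned char *dst, size_t len)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    size_t i;

    if (len > avail)
        len = avail;
    for (i = 0; i < len; i++)
        dst[i] = r->data[(tail + i) & (RING_SIZE - 1)];
    /* free the space only after it has been copied out */
    atomic_store_explicit(&r->tail, tail + (unsigned)len,
                          memory_order_release);
    return len;
}
```

If ring_read() returns fewer bytes than one fragment, the disk thread has
fallen behind; the player thread can fill the rest with silence instead
of waiting, so it never misses its own deadline.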


If you use an "earliest deadline first" approach, you will even be
able to run several low-latency apps concurrently, as long
as the sum of the CPU usage of all apps doesn't exceed 100%
(better to stay below 80% to leave some room for the GUI and
the other OS functions).

The right "earliest deadline first" approach is to run the
application with the shorter processing cycle at a higher priority
than apps with longer processing cycles. (With fixed SCHED_FIFO
priorities this is what the scheduling literature calls
rate-monotonic priority assignment.)

Let's take an example:

You have 2 soundcards:
on the first you want to run your soft-synth with 5ms latency and 
on the second you want to run your mp3 player.

Assume that the mp3 player uses fragments of 20ms.

Therefore you should run your soft-synth at maximum priority since
it uses smaller fragments.

Running the synth at higher priority than the mp3 player, will
ensure that the app with the shorter processing cycle time can
interrupt the other when the CPU is needed to meet the earliest
deadline.
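
The priority rule from this example can be sketched in a few lines. The
struct and function names below (rt_app, assign_priorities,
fits_cpu_budget) are hypothetical, and the CPU load figures would have to
be measured by you; nothing here is taken from an existing tool.

```c
struct rt_app {
    const char *name;
    unsigned period_us;   /* length of one processing cycle (fragment) */
    int priority;         /* filled in by assign_priorities() */
};

/* Shorter cycle = higher priority: each app gets max_prio minus the
 * number of apps whose cycle is shorter. Apps with equal cycles end up
 * with equal priority. */
void assign_priorities(struct rt_app *apps, int n, int max_prio)
{
    int i, j;

    for (i = 0; i < n; i++) {
        int shorter = 0;

        for (j = 0; j < n; j++)
            if (apps[j].period_us < apps[i].period_us)
                shorter++;
        apps[i].priority = max_prio - shorter;
    }
}

/* Returns 1 if the summed CPU load (in percent) of all apps stays at
 * or below budget_percent, 0 otherwise. */
int fits_cpu_budget(const double *load_percent, int n, double budget_percent)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += load_percent[i];
    return sum <= budget_percent;
}
```

With the 5ms soft-synth and the 20ms mp3 player, and
sched_get_priority_max(SCHED_FIFO) passed as max_prio, the synth gets the
maximum priority and the mp3 player the next one down. Each computed
priority would then go into schp.sched_priority before calling
sched_setscheduler() for that app.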



Look at my audio page
www.gardena.net/benno/linux/audio
for the latest news, benchmark results, patches and latencytest tools.


Comments, ideas and suggestions welcome.

Benno Senoner

July 00 update

The problem is that true low-latency apps are still a work
in progress. Until we have sketched out a spec
(and implemented a working application) for the
rtsoundserver model, which runs the apps as plugins in
order to minimize latencies, it does not make much
sense to describe the perfect low latency programming approach.

As for kernel issues, the 2.2.15 patches are more stable than
the previous ones and may even work on 2.2.16.
(But I have to ask Ingo about this).

As for 2.4, Andrew Morton is working on his own
non-kludge patches, and perhaps a part of them will get integrated
into later 2.4 versions. They will not deliver the latencies
of Ingo's patches (2msec) but will stay around 5-6msec.
That means they will be ok for most audio apps except
those that want 3msec MIDI note-on to audio latency,
e.g. those that implement a software synth/sampler with excellent
response (on a par with hardware instruments).
For those apps, even if Andrew's patches are accepted,
an additional patch (which adds more rescheduling points) will
still be needed in order to provide the required performance.
At that stage we could provide kernel RPMs and DEBs so that
audio users could easily install the lowlatency kernel with one
simple command.