Mike Raznick
NYU | Digital Audio Processing II
Final Research Paper
Digital Audio And The Linux Operating System
This paper explores the increasingly popular Linux operating system as a free, viable
alternative to other operating systems such as Microsoft Windows or Mac OS X for
professional audio applications. Specifically, discussion will focus on the
history behind Linux as well as the varying layers of applications that support digital
audio within the Linux operating system.
In order to fully understand the success of Linux, it is important to define the word “free”
in the context of Linux and open-source development. Although “free” does imply that
software can be downloaded and used without any required monetary payment, the term
“free” in this case more importantly refers to “restriction-free”. In this sense, open-source
(open sharing of the source code) can be thought of as a type of shareware system where
“payment” consists of active development community participation, often leading to the
release of bug fixes and improvements back into the public/user community. “Any
motivated individual can contribute to the product's development and can inspect all
aspects of the underlying code, which is not possible with most commercial products.”
Open source therefore “relies on a volunteer development community's willingness to
share all improvements to the code with the rest of the world.”1
Before Linux
Linux is an operating system that is similar in design and functionality to the UNIX
operating system. Linux was primarily created to provide its developers and users with a
free system that rivaled traditional UNIX systems. UNIX, the predecessor to Linux, was
first developed in the late 1960s at the Bell Laboratories research facilities with the
purpose of creating a “single, scalable operating system” that would exist “on all
computers”2 for the needs of the company. UNIX was written in the C programming
language. By the 1970s, C had become increasingly popular for
solving a variety of development tasks and was therefore
supported by a variety of computing environments. Because of
this, the latter part of the 1970s saw many types of computers
running ported versions of UNIX. While UNIX was distributed
widely and proved easily portable due to its C implementation,
ownership of the source code was retained by AT&T. This
meant that licenses for the use of UNIX had to be purchased at enterprise-level prices that
were clearly out of the range of most users and enthusiasts. Furthermore, these licenses
prohibited redistribution, the making of derivative works, as well as any sort of
evolutionary improvement of the code-base by a community of programmers.
In 1984, Richard M. Stallman, an employee at MIT’s Artificial Intelligence Laboratory,
launched the GNU project with the intent of developing a free operating system called
GNU (a recursive acronym for “GNU’s Not UNIX”). The GNU project promised to allow
computer users the “freedom to run, copy, distribute, study, change and improve the
software.”3 Stallman set out by programming a set of tools that were designed to work
without requiring further modification to an existing UNIX system. Additionally,
volunteer programmers were recruited to architect and construct additional tools
necessary for the project. Throughout the 1980s, the set of GNU tools became widely
accepted by UNIX users throughout the world. The most important concept behind the
GNU project was that it gave all interested parties the freedom to improve
a software program and then release these improvements to the public so that the entire
community could benefit, leading to added “stability, reliability, and maintainability of
the tools.”4
During this period, the University of California at Berkeley also commenced the design
and implementation of a new version of UNIX that maintained free distribution in the
academic world. This version of UNIX, which came to be known as BSD UNIX, also
treated the original version of UNIX created by AT&T as a design standard. However,
the range of application for BSD UNIX was limited by its license terms. Furthermore, all
hardware-specific BSD code was considered proprietary and was therefore not released in
the distribution of the source code. This meant that a working operating system could not
be built for a variety of computers using BSD, which therefore was not useful as a
universal operating system for large-scale community development purposes.
History of Linux
Between 1989 and 1991, Linus Torvalds, a computer science student at the University of
Helsinki in Finland, developed a project where he adapted an existing tool known as the
MINIX kernel. MINIX has commonly been used in academia as a teaching example in
operating system courses. Without at first realizing it,
Torvalds found himself transforming the MINIX kernel into a usable UNIX-like kernel
for the Intel x86 processor, found in today’s home PCs.
To increase the usability and eventual success of his
newly named Linux kernel, Torvalds also adjusted his operating system design and architecture
decisions so that existing GNU components would be compatible with his kernel. This
was important because for the first time, the free GNU components were bridged with a
new operating system that would additionally be free. Torvalds therefore created the
framework for a completely free software package that was also completely self-
contained.
In 1991, Linus Torvalds released the first version of Linux to the Internet community. At
the time of its initial release, Linux, although “sketchy” in its implementation, existed as a
free software kernel for a UNIX-like operating system that was developed especially for
personal computers. Linux was “fully compatible with and designed convergent with the
large and high-quality suite of system components created by Stallman's Project GNU
and distributed by the Free Software Foundation.”5 Because Torvalds made the decision
to release the Linux kernel under the Free Software Foundation's General Public License,
any software engineer who was interested in contributing to the further development of
the Linux kernel could feel confident that their contributions would result in permanently
free software. This meant that software written for Linux would not suffer the
same fate as past proprietary products that were also widely available. Of equal
importance was that all interested parties would have the opportunity to test and
scrutinize existing as well as new source code so that it could be shared, consistently
improved upon, and redistributed. Furthermore, the development of the Linux kernel
proved an early example where the Internet could successfully bring together many
thousands of part-time developers, resulting in a software development project involving
well over one million lines of code. This scale of unpaid collaboration was
unprecedented among such a geographically dispersed group.
The release of version 1.0 of Linux for the first time represented a usable, production-
level kernel. Version 2.0 was subsequently released in 1996, and by 1998 version 2.2 was
expanded to include support for a variety of machine architectures, in addition to the Intel
x86 processor family. During this time, GNU/Linux and Microsoft’s Windows NT
operating systems remained the only two operating systems that saw consistent gains in
market share. A Microsoft-based assessment of the credibility of Linux in October 1998
stated that "Linux represents a best-of-breed UNIX, that is trusted in mission critical
applications, and - due to it's open source code - has a long term credibility which
exceeds many other competitive OS's."6
Today, a variety of distributions of the GNU/Linux system, based on the kernel Linux
developed by Linus Torvalds, are in widespread use all over the world. “The number of
GNU/Linux system users is currently estimated to be around 18 million.”7 Some of the
more popular Linux distributions include the Fedora Project, SuSE Linux (pronounced
“SUZ-eh”), Mandrake Linux, Red Hat, Yellow Dog, Debian, Slackware, and Lycoris. A
good source of information regarding the various distributions is the LinuxISO.org
website.
Linux Support for Digital Audio
As early as 1992, the first audio application programming interface (API) was developed
for Linux by Hannu Savolainen. The Open Sound System (OSS/Free)
initially included support for the basic SoundBlaster-compatible devices. Supported
features included PCM audio recording and playback, MIDI input
and output, and an audio device mixer. The OSS/Free kernel API was included as
part of the Linux distribution. Over the following years, OSS/Free provided
additional, functional support for a number of advanced features, including on-board
synthesizers and full-duplex recording. This was due, in part, to the donation of required
specifications by a few companies that specialized in digital audio. The majority of
professional digital audio hardware manufacturers, however, were unwilling to provide
access to relevant specifications and documentation that they felt were proprietary to their
product offerings. As a result, Linux application developers lacked the
means to build high-performance sound software that would be compatible with
professional audio hardware.
Hannu Savolainen once again assisted in bringing together the Linux community and
digital-audio hardware manufacturers by forming a company that would sign non-
disclosure agreements (NDAs) in order to provide Linux audio developers with the
information necessary to create a greater collection of professional-audio sound drivers.
Drivers could then be released as commercial software in which certain
components were binary-only and closed-source. This package, known as the
OSS/Linux commercial driver package, can be thought of as a direct descendant of
the OSS/Free API. OSS/Linux currently supports a number of professional audio boards
and chipsets. For example, the M-Audio Delta series of multi-channel audio
boards, the Creative Labs Sound Blaster Live and Audigy sound cards, and
the RME Hammerfall series of professional digital audio boards have
since been officially supported.
The next important event that significantly enhanced the Linux operating system in its
support for professional-grade digital audio applications occurred in 2002 with the
release of the Advanced Linux Sound Architecture (ALSA) API into the 2.5.x
development cycle of the Linux kernel. The ALSA sound driver was originally written in
order to replace the Linux kernel sound driver for Gravis UltraSound (GUS) cards. When
this replacement proved to be a success, the author started the ALSA project with the
intent of creating a generic driver that could be effectively used for any number of sound
chips, with a fully modularized design.
While ALSA is now compatible with the OSS/Free and OSS/Linux sound drivers, ALSA
additionally has its own interface that is improved over that of the OSS drivers. A
complete list of supported soundcards can be found by visiting the ALSA Soundcard
matrix web page at: http://www.alsa-project.org/alsa-doc/. Furthermore, a complete list
of ALSA-supported applications can be found on the ALSA Applications webpage at:
http://www.alsa-project.org/applications.php3. The ALSA sound drivers proved to be
such a success that they have since replaced the OSS drivers in the Linux kernel.
The inclusion of the ALSA API and drivers into the stable Linux kernel releases meant
that sound and MIDI-based applications in Linux could be supported with professional-
level capabilities. While the OSS/Free kernel modules, and the OSS/Linux commercial
driver package have their relative strengths, the ALSA drivers and library are now
considered to be the standard set of audio and MIDI drivers for Linux. Notable features
included with ALSA are as follows: “support for audio interfaces from consumer-grade
sound cards to professional digital audio boards; multi-processor, thread-safe capabilities;
fully modularized drivers; a user-space library to simplify audio applications
programming and provide high-level functionality; and compatibility with the older
OSS/Free API. ALSA also supports serial port and USB interfaces. It is completely free
and open-source, with its code base licensed under the GPL (General Public License).”8
Furthermore, because ALSA supports professional audio hardware such as Echo Digital
Audio products, RME Hammerfall and the M-Audio Delta series multi-channel boards, it
is now possible to accomplish audio-related work with professional-grade hardware and
software under Linux.
The Modern Linux Sound System
The Linux sound system can currently be thought of as consisting of three layers, which
together address the increased system performance demands of audio and video
applications. These layers are the kernel (the innermost core component of the Linux OS,
which operates directly on the computer’s hardware and includes the ALSA drivers),
the middle layer (consisting of additional components such as ALSA, JACK,
and LADSPA), and finally the user space (including programs such as Ardour, Sweep, and
MusE). Development at each of these levels has been extensive and will be further
discussed.
In addition to its presence at the kernel level, ALSA has a significant presence in the
middle layer with utilities and tools such as aconnect (a mechanism for
routing MIDI I/O between ALSA-aware programs) and alsamixer (a text-mode
system audio mixer). As a general rule, an application that supports
ALSA drivers for both audio and MIDI is likely to perform well
as a standalone program. For MIDI, the ALSA Sequencer interface allows applications
that support it to “publish” their inputs and outputs so that third-party
applications can connect to them. For example, if a MIDI-based drum machine and a
MIDI sequencer both provide ALSA Sequencer support, the events of the drum machine
can be recorded by simply connecting it to the sequencer. JACK, which is
discussed below, also supports this functionality.
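The publish-and-connect idea described above can be illustrated with a small, self-contained sketch. This is a toy model written for this paper, not the ALSA Sequencer API: the PortRegistry class and the port names ("drum_machine:out", "sequencer:record") are hypothetical, and real applications would use the ALSA library or a tool such as aconnect instead.

```python
from collections import defaultdict


class PortRegistry:
    """Toy stand-in for a sequencer: clients publish ports, anyone may connect them."""

    def __init__(self):
        self.inputs = {}                      # input port name -> callback receiving events
        self.connections = defaultdict(list)  # output port name -> list of input port names

    def publish_input(self, name, callback):
        # A client announces an input port that others may route events into.
        self.inputs[name] = callback

    def connect(self, source, destination):
        # A third party can make this connection without either client's involvement.
        if destination not in self.inputs:
            raise KeyError(f"unknown input port: {destination}")
        self.connections[source].append(destination)

    def send(self, source, event):
        # Deliver the event to every input port subscribed to this output port.
        for destination in self.connections[source]:
            self.inputs[destination](event)


registry = PortRegistry()
recorded = []  # the hypothetical "sequencer" simply stores what it receives

registry.publish_input("sequencer:record", recorded.append)
registry.connect("drum_machine:out", "sequencer:record")

# The hypothetical drum machine emits (note, velocity) pairs.
for event in [(36, 100), (38, 90), (36, 110)]:
    registry.send("drum_machine:out", event)

print(recorded)
```

Neither program needs to know about the other in advance; the connection is made from outside, which is the property that makes published ports useful.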
The JACK Audio Connection Kit (JACK) is also an important component that exists
on the middle layer. The JACK audio server was designed specifically for purposes
related to professional audio work. The intent behind development of JACK was to focus
on two key areas: “synchronous execution of all connected clients as well as low-latency
operation. JACK can connect a number of different applications to an audio device, as
well as allowing them to share audio between applications. A JACK client program can
run in its own process (for example as a normal, stand-alone application), or it can
run within the JACK server (for instance, as a plugin).”9 For example, any JACK-aware
application can connect its output to the input of another JACK-aware application. A
software synthesizer could therefore be used as a plugin within an audio-recording
program by connecting each component together using JACK.
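As a rough illustration of what synchronous execution of connected clients means, the following sketch calls every client once per block of frames and feeds each client's output to the next. It is a simplified model under stated assumptions, not the JACK API: the client functions synth and half_gain and the helper run_period are hypothetical names, and a real JACK graph supports arbitrary routing rather than a single chain.

```python
def synth(_inputs, nframes):
    # Hypothetical source client: produces a constant-level test signal.
    return [0.5] * nframes


def half_gain(inputs, nframes):
    # Hypothetical effect client: halves whatever arrives on its input.
    return [0.5 * sample for sample in inputs]


def run_period(chain, nframes):
    """Call every client once for the same block of frames, in connection order."""
    buffer = [0.0] * nframes
    for process in chain:
        buffer = process(buffer, nframes)
    return buffer


chain = [synth, half_gain]   # the synthesizer's output is connected to the effect's input
captured = []
for _ in range(3):           # three consecutive periods of four frames each
    captured.extend(run_period(chain, nframes=4))

print(len(captured), captured[0])
```

Because every client handles the same block before the next block begins, no client runs ahead of or behind the others, which is the point of the synchronous design.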
[Figure: JACK interface]
Examples of current JACK-based applications include the JACK Rack
(http://arb.bash.sh/~rah/software/jack-rack/), defined as an “effects rack” for the JACK
audio API. The JACK Rack can be populated with LADSPA effects plugins (LADSPA is
the Linux equivalent to VST or Audio Units technologies) and can then be controlled
using the ALSA sequencer. Additionally, for mastering purposes, JAMin, the JACK
Audio Mastering interface, is designed to perform professional audio mastering of stereo
input streams. JAMin also uses LADSPA for digital signal processing (DSP). Finally, for
viewing meters in JACK, the JACK Meterbridge supports a number of different meter-
types that can be rendered using the SDL library and user-editable pixel maps.
Additional components within the Linux middle layer include sound servers such as KDE’s Analog
Real-Time Synthesizer daemon (aRts) and GNOME’s Enlightened Sound Daemon
(esd), libraries such as libsndfile and libaudiofile, and plug-ins based on the Linux
Audio Developer's Simple Plugin API (LADSPA). LADSPA is supported by many
Linux audio applications. Roughly equivalent to the Audio Unit or VST plug-ins used in
most commercial digital-audio workstations, the LADSPA plug-in architecture allows a
single set of audio tools to be shared by all of the audio applications on a system. Well
over 100 LADSPA plug-ins are available, implementing signal
processing effects such as flangers, delays, reverbs, filters, and phasers, as well as a full
assortment of additional processing tools. More information can be found at
http://www.ladspa.org/.
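The benefit of a shared plug-in architecture can be sketched in a few lines. The example below is only an analogy and does not use the LADSPA C interface: the Gain and Delay classes, the PLUGINS table, and the apply_chain helper are hypothetical, chosen to show two different "hosts" reusing the same set of effects through one common interface.

```python
class Gain:
    label = "gain"

    def __init__(self, amount=1.0):
        self.amount = amount

    def run(self, samples):
        # Scale every sample in the block by a fixed amount.
        return [sample * self.amount for sample in samples]


class Delay:
    label = "delay"

    def __init__(self, frames=1):
        self.frames = frames

    def run(self, samples):
        # Shift the block later by `frames` samples, padding with silence.
        # For simplicity no state is carried between blocks.
        return ([0.0] * self.frames + list(samples))[:len(samples)]


# One system-wide table of effects, looked up by label by any host.
PLUGINS = {cls.label: cls for cls in (Gain, Delay)}


def apply_chain(samples, specs):
    """Host-side helper: instantiate each plug-in by label and run the block through it."""
    for label, settings in specs:
        samples = PLUGINS[label](**settings).run(samples)
    return samples


block = [1.0, 0.5, 0.25, 0.0]
editor_result = apply_chain(block, [("gain", {"amount": 2.0})])
mixer_result = apply_chain(block, [("delay", {"frames": 1}), ("gain", {"amount": 0.5})])
print(editor_result)
print(mixer_result)
```

Both hypothetical hosts (an editor and a mixer) draw on the same table, so an effect added once becomes available to every application that follows the interface.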
Finally, the user space layer of the Linux audio system presents full-featured digital audio
workstations such as Ardour. This program rivals similar commercial applications in
functionality, with support for 24 or more channels of 32-bit audio. “Ardour capabilities
include: multichannel recording, non-linear, non-destructive region based editing with
unlimited undo/redo, full automation support, a mixer whose capabilities rival high end
hardware consoles, lots of plugins to warp, shift and shape your music, and controllable
from hardware control surfaces at the same time as it syncs to timecode.”10 Ardour was
developed by Paul Davis and was designed for users familiar with the Pro Tools digital
audio workstation (DAW) model. Additional information and downloads can be found at
http://ardour.org.
[Figure: Graphical editing window in Ardour]
Also present within the user space layer of the Linux audio system are MIDI sequencer
programs such as MusE and Rosegarden. Both have many of the features found in
MIDI sequencers on commercial platforms, including audio recording and import
functionality.
It should be noted, however, that these integrated audio capabilities are reported as “not
yet refined” to the level of what is currently available in commercial applications on
other platforms. As with most Linux-based components, though, one can be relatively
certain that development will continue and that product quality will improve over
time.
Rosegarden, in addition to the typical track, event-list, and piano-roll views, includes a
standout feature that allows the user to view their music in standard musical notation.
Rosegarden can also export music for use
by other Linux-supported applications such as Csound or LilyPond; for example, its
output can be saved as a Csound score. MusE has a few notable
features that are not found in Rosegarden, including a mixer window as well as integrated
access to some of the software synthesizers available on the platform. Rosegarden can
be downloaded from its dedicated website: http://www.rosegardenmusic.com.
MusE can be downloaded from http://muse.dyne.org.
[Figure: The Rosegarden MIDI and audio sequencer is designed similarly to programs such as Cubase]
Sweep is a versatile audio program that can be used both in a serious production
environment and as a performance tool for fully digital DJs. Sweep makes for a unique
case study because its development was supported by the film animation studio
Pixar, which likely required a high-quality audio editor that could be used on Linux and
Sun UNIX workstations. This application is possibly the first to come into the Linux
audio community out of a large movie/multimedia studio, and may present a
model that could be repeated in the future. As an application, Sweep can be thought of as
a conventional multi-channel audio file editor. However, the inclusion of a virtual stylus
rather than the traditional cursor sets the program apart from most
other applications. The stylus allows the user to “scrub” through a file to hear the exact location
where an edit should be made. Moreover, “the virtual stylus, known as Scrubby, has been
programmed with the physics of a real turntable. Throwing the mouse to the left results in
a spin-back effect, decelerating Scrubby to a full stop.”11 Sweep supports a variety of
music and voice formats including WAV, AIFF, Ogg Vorbis, Speex and MP3.
Additionally, support for LADSPA effects plugins is included for multi-channel audio
file editing.
[Figure: Sweep is an audio editor and live playback tool for GNU/Linux]
Ecasound is another software package designed especially for multi-track audio
processing. It can be used for tasks ranging from audio playback, recording, and format
conversions, to multi-track effect processing, mixing, recording and signal recycling.
Ecasound supports a wide range of audio inputs, outputs, and effect algorithms. Effects
and audio objects can be combined in various ways and “their parameters can be
controlled by operator objects such as oscillators and MIDI-related controllers. A
versatile console-mode user interface is included in the package.”12
Notation software is also available on the Linux platform and continues to be enhanced
by an active development community. LilyPond, which takes a specially formatted text
file and converts it into printable music scores, creates professional looking scores that
are said to rival hand engraving. There are other programs that can also export files in the
LilyPond format.
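To give a sense of what such a text file looks like, the sketch below writes a one-measure fragment in LilyPond's note syntax from a list of MIDI note numbers. It is a minimal illustration under stated assumptions: the helper names lily_pitch and lily_source are hypothetical, only sharps and simple durations are handled, and the output uses LilyPond's default note names with absolute octave marks.

```python
# LilyPond's default note names, with sharps written using the "is" suffix.
NAMES = ["c", "cis", "d", "dis", "e", "f", "fis", "g", "gis", "a", "ais", "b"]


def lily_pitch(midi_note):
    """Convert a MIDI note number to an absolute LilyPond pitch (60 -> c')."""
    marks = midi_note // 12 - 4          # octave marks relative to the unmarked octave
    suffix = "'" * marks if marks >= 0 else "," * -marks
    return NAMES[midi_note % 12] + suffix


def lily_source(notes):
    """Build a music expression from (midi_note, duration) pairs, e.g. 4 = quarter note."""
    body = " ".join(f"{lily_pitch(note)}{duration}" for note, duration in notes)
    return "{ " + body + " }\n"


source = lily_source([(60, 4), (62, 4), (64, 4), (65, 4)])
print(source)
```

The resulting text could be saved to a file and handed to LilyPond for engraving; this is the kind of export that other programs perform when they write the LilyPond format.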
Software synthesizers of interest include FluidSynth and AlsaModularSynth. For
building custom synthesizers, Csound is also available for Linux (in
addition to its active Windows and Mac-based user communities). FluidSynth
can load multiple SoundFont files and play them on 16 MIDI channels at once.
FluidSynth does not include a graphical interface. However, once SoundFonts have been
loaded, it is possible to choose among the loaded instruments using MIDI Bank Select
and Program Change messages (or through command-line instructions). If SoundFont
files need to be created or edited, an editor called Swami is also available.
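The Bank Select and Program Change messages mentioned above are short byte sequences defined by the MIDI standard. The sketch below assembles them by hand; the function names and the example channel, bank, and program values are hypothetical, and how a particular synthesizer maps bank numbers onto the two Bank Select controllers can vary.

```python
def bank_select(channel, bank):
    """Two Control Change messages: controller 0 (bank MSB) and controller 32 (bank LSB)."""
    status = 0xB0 | channel
    return bytes([status, 0, (bank >> 7) & 0x7F, status, 32, bank & 0x7F])


def program_change(channel, program):
    """One Program Change message selecting an instrument within the current bank."""
    return bytes([0xC0 | channel, program & 0x7F])


def select_instrument(channel, bank, program):
    # Validate ranges first: 16 channels, a 14-bit bank number, 128 programs.
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= bank <= 16383:
        raise ValueError("bank must be 0-16383")
    if not 0 <= program <= 127:
        raise ValueError("program must be 0-127")
    return bank_select(channel, bank) + program_change(channel, program)


# Hypothetical example: channel index 2, bank 130, program 5.
message = select_instrument(channel=2, bank=130, program=5)
print(message.hex())
```

Sending the bank first and the program second matters, because a receiver applies the pending bank only when the Program Change arrives.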
AlsaModularSynth combines powerful capabilities and many included example patches
with a clean, intuitive graphical interface that allows the user to patch together modules in
any desired configuration.
A variety of other tools for musical applications are also available. Notable categories
include sample rate converters, applications working with CD technologies (rippers,
burners, players), telephony
systems (see the Bayonne project), digital DJ tools, virtual drum machines (such as the Hydrogen
drum pattern editor), MP3 and Ogg audio compression software, MOD trackers (for
video game music), software synthesizers (such as ZynAddSubFX), as well as various network
audio solutions, including network sound servers and streaming audio delivery systems.
Tuning a Linux Installation for Digital Audio
When Linux is first installed on a system, it must be patched and tuned before it can support
professional-level digital audio tasks. This means that a low-latency patch must be
downloaded and applied to the Linux kernel, an extremely important step in
setting up Linux for digital audio applications. An un-patched,
un-tuned Linux 2.4 kernel can create latencies of up to 300 ms. Latency is defined in
digital audio terms as the “lag time” between when an event should occur and when it
actually occurs; low latency is required for multimedia applications in order to achieve smooth
audio/video performance, even on the most powerful machines. Generally, most issues
related to “jerky video/choppy audio” are software related and are mainly caused by the
OS scheduler. With the correct low-latency patch (or combination of patches), Linux can
rival most desktop operating systems, allowing latencies as low as 2.1 milliseconds. This
proves especially useful for real-time audio use. Specifically, patches from Andrew
Morton and Ingo Molnar have shown that the Linux kernel can be tuned to eliminate
bottlenecks associated with latency and poor performance, particularly when
scheduled processes remain active. Information regarding latency and performance of
Linux for digital audio use can be found at http://www.linuxdj.com/audio/.
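The practical meaning of these figures can be worked out with simple arithmetic. In the simplified model below, an audio buffer must hold at least as much sound as the longest delay before the audio process is scheduled again, otherwise playback runs dry. The 44.1 kHz sample rate and the rounding to a power-of-two buffer size are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def buffer_latency_ms(frames, sample_rate):
    """Time, in milliseconds, represented by a buffer of the given number of frames."""
    return 1000.0 * frames / sample_rate


def smallest_safe_buffer(worst_case_delay_ms, sample_rate):
    """Smallest power-of-two buffer (in frames) covering the worst-case scheduling delay."""
    needed = max(1, math.ceil(worst_case_delay_ms * sample_rate / 1000.0))
    return 1 << (needed - 1).bit_length()


# Worst-case delays quoted in the text: untuned 2.4 kernel versus a patched kernel.
for delay_ms in (300.0, 2.1):
    frames = smallest_safe_buffer(delay_ms, 44100)
    print(delay_ms, frames, round(buffer_latency_ms(frames, 44100), 1))
```

Under these assumptions, the untuned kernel would need a buffer of 16384 frames (about 371.5 ms of audio), while the patched kernel could work with 128 frames (about 2.9 ms), which is the difference between an unusable and a responsive system for real-time audio work.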
Packaged Distributions
The Linux 2.5.x development kernel has officially introduced ALSA as the new
kernel sound system (dispensing with the aging OSS/Free API), and has also introduced
Robert Love’s preemptive patch as a kernel configuration option (which gives certain
processes preemptive status at the kernel level). Even so, the most convenient way to begin
learning about Linux audio projects is to install one of the turnkey systems optimized for
sound and music production. Planet CCRMA, Turn-Key Linux Audio
(http://lulu.esm.rochester.edu/kevine/turnkey/home.html), APODIO, and AGNULA
(www.agnula.org) are a few of the most popular. They all provide an out-of-the-box
low-latency Linux kernel, ALSA drivers, JACK and LADSPA, as well as a host of sound
and music applications. Note that these are either full-fledged distributions or require
only a simple installation to upgrade an existing system.
APODIO 0.9 (http://cd.apodio.org) has very recently been released in its English
version and is especially attractive because it can act as a live
bootable CD, containing major audio tools and a complete GNU/Linux operating
system based on Mandrake 9.2. APODIO can therefore be used from boot, without
the need to install or make changes on the hard disk. This makes it particularly easy for a
potential user to try out. If desired, APODIO can then be installed directly to the hard
disk and run locally. Turn-Key Linux installs within an existing Mandrake 9.x
distribution by executing only a single script. This makes for an easy installation for
users who may be unfamiliar or uncomfortable with command-line interfaces.
Furthermore, it ensures a virtually transparent initiation into Linux as a multimedia
platform without the potentially frustrating experience of spending hours configuring the
system with correct versions of necessary software. The Turn-Key package was initially
put together as a way to provide students at the Eastman Computer Music Center with the
same tools used in the studio for use on their home systems.
AGNULA is an acronym for “A GNU/Linux Audio distribution” and is also devoted
completely to professional and consumer audio applications and multimedia
development. Its goal is to offer two distributions: one will be Debian-based (DeMuDi)
and the other will be Red Hat-based (ReHMuDi). AGNULA has the Alsa Modular Synth,
Cecilia (a graphical user interface for Csound), JACK, jMax, LADCCA, Nyquist, and
TkECA among its offerings. LADCCA is an acronym for “Linux Audio Developer's
Configuration and Connection API”. More specifically, “it is a session management
system for audio applications on the GNU/Linux platform. It understands both the JACK
audio API and the ALSA MIDI sequencer interface.”13
Planet CCRMA (pronounced “karma”) is another Linux audio-based distribution and
has been packaged for release by Fernando Lopez-Lezcano, composer and system
administrator at Stanford University's Center for Computer Research in Music and
Acoustics. Lopez-Lezcano, who is responsible for the configuration and maintenance of
CCRMA's network of mostly Linux workstations, originally created the software package
to “mirror the Center's software so that students could run exactly the same system at
home as they did at the Center. This package eventually became what is now called
Planet CCRMA At Home (Planet CCRMA for short, or just the Planet).”14
Planet CCRMA installs on top of a default Red Hat 7.3, 8.0, 9 or Fedora Core 1
installation, and provides three important features. First, a “tuned” Linux kernel that is
optimized for low-latency operation is included. Next, Planet CCRMA provides a
collection of interrelated projects which make up the professional-level Linux sound
architecture discussed above, including ALSA, JACK, and LADSPA. Finally,
Planet CCRMA includes a substantial number of applications designed to take
advantage of this optimized audio foundation. Planet CCRMA, like the aforementioned
distributions, represents a convenient installation option in which separately developed
applications can be acquired in a single download. Planet CCRMA can be downloaded from
the following Stanford-based website: http://ccrma.stanford.edu/planetccrma/software/.
More than 100 LADSPA plug-ins are installed with Planet CCRMA, as well as programs
already discussed, including Ardour, MusE, Rosegarden, Csound, and LilyPond.
Software synthesizers such as AlsaModularSynth and FluidSynth, among many others,
are also included, as are other programs such as Pd (a program similar to Cycling ’74’s Max),
Snd (a powerful sound-file editor), and Ecasound. A