Spectral "Image Processing"
The Conventional
Photo or image editing software might not be something you would ever think that
you had a need for in a recording or editing situation, but it can be a hugely useful
tool for solving simple problems, and can also be used for unconventional editing and
sound design tasks. But before we even consider the editing possibilities themselves,
we need to figure out how to get from audio recording to image editor. The process is
a very simple one, but you will probably need additional software in order to do it as
it is quite a specialised and unusual thing to want to do.
MetaSynth is perhaps the most well-known package for sound-to-image and image-to-
sound translations but it was originally designed as a tool to allow you to “draw” new
sounds and analyse existing audio files rather than as a “bridge” between audio and
image editing software. In fact, the best tool that I know of for doing this is the
excellent PhotoSounder, which not only allows you to load up audio files and then
export the spectrogram as an image for editing in your software of choice before
loading the image and converting back to audio, but also features some quite
comprehensive editing tools of its own. These tools are very much the kind of things
you would expect to see in any image editing package and, coupled with the “layers”
methodology adopted by many image editing packages, PhotoSounder is actually
more than capable of a great deal before you even think about exporting the image.
Many of the ideas that follow will be possible within PhotoSounder itself, but I will
look at the techniques for image editing in Photoshop as it features some options that
are unavailable in PhotoSounder or other simpler editors.
However, a fundamental advantage to working within PhotoSounder is that you can
preview the changes in real time whereas any editing done after exporting to your
image editor will be done on visuals alone – you would have to re-save the image and
import it back in to PhotoSounder to hear the effects. As a result it can be easier to do
the simplest jobs in a spectral editing application, to consider PhotoSounder for more
complex jobs and then move to an external image editor only when absolutely
necessary.
The first thing that we need to get used to is the idea of “layers”. The benefit of using
layers is that each layer can serve as an adjustment to the original audio. With most
spectral editing software you work directly on the audio file so any changes you make
are applied to it. They can, of course, be undone but you work sequentially applying
one edit after another. You can step back through many levels of undo but you
couldn’t, for example, keep the actions that you carried out on steps 3, 4 and 6 but
lose what you did on steps 1, 2, 5 and 7. With layers in an image editor it is certainly
possible to do that. Each adjustment can exist on its own layer and then can be freely
shown, hidden or combined in numerous ways.
The next things to consider are the tools used to make the selections. In the previous
chapter I stated that many of the selection tools used in spectral editing applications
were in some ways a subset of those used in image editing, so it is worth taking a look
at the extended options available in your image editor. The “lasso” tool in a spectral editor
is very useful for highlighting irregularly shaped selections, but an image editor may
well have a “magnetic lasso” tool that will try to detect obviously visible “edges” and
automatically jump to and follow them. In certain situations this could be helpful for
making sure your selection is as “tight” as possible. In addition to the rectangular
selection tools, most image editors also offer circular/oval selection tools as well,
which might prove very useful in some situations. Finally, many image editors allow
you to save selection areas as “masks”, which means that you can load them at any
point and you will be guaranteed to have the same selection area – even if you choose
to go back and apply additional processing and adjustments to a particular area after
you have deselected it.
One other option which can be very useful is the ability to control the size, shape and
softness of the brush tool. While a round brush will probably be the most useful, there
might be times when a differently shaped brush will serve your purposes better. And
the softness parameter will change the brush from a hard edge to a very soft, blurred
edge. If we use blurred-edge brushes in our adjustment layers then it will graduate the
effect of whatever the layer is doing from the edges of the blur to the solid centre of
the brush. Once again it is a subtle difference but it can be helpful if a particular
unwieldy edit is needed.
Once we have used the expanded range of tools to create our selection it is time to
actually do something useful with it. For each new adjustment (or collection of similar
adjustments, perhaps) we would create a new layer and then set it up to carry out the
task we want. Many of the tasks we looked at in the last chapter have direct
equivalents in image editing in this way so we will quickly look at the ways we can
achieve the same results here before moving on to look at some things that aren’t
achievable in a spectral editor (yet).
Attenuation is a very simple task to achieve in an image editor. All we need to do is
remember that brightness is the image processing (and spectral editing) equivalent of
amplitude so attenuation is achieved by decreasing the brightness. If we make our
selection (or load a previously saved one) and then create a brightness adjustment
layer in our image editor we can then increase or decrease the brightness (amplitude)
of the selection as desired. The great advantage here is that not only can we make the
amplitude adjustment in both directions (quieter or louder) but we can also go back
and change this setting later in light of additional changes that we may make, all
without having to commit to any change before we do a final export of the image
from the image editor.
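
As a rough illustration of the brightness-as-amplitude idea, here is a minimal Python sketch using NumPy. It assumes the spectrogram has been loaded as a 2-D array of brightness values between 0 and 1, with rows as frequency bins and columns as time frames. The function name `adjust_region` and that array layout are assumptions made for illustration and are not taken from PhotoSounder or Photoshop. It also treats brightness as a simple multiplier, which may not match how a given image editor's brightness control behaves.

```python
import numpy as np

def adjust_region(spec, f_lo, f_hi, t_lo, t_hi, gain):
    """Return a copy of spec with one rectangular region scaled by gain.

    spec: 2-D array of brightness values in [0, 1]; rows are frequency
    bins and columns are time frames (an assumed layout).
    """
    out = spec.copy()                    # the original stays untouched, like a layer
    out[f_lo:f_hi, t_lo:t_hi] *= gain    # gain < 1 darkens (quieter), gain > 1 brightens
    return np.clip(out, 0.0, 1.0)        # pixel values cannot go beyond full white

spec = np.full((4, 6), 0.8)
quieter = adjust_region(spec, 1, 3, 2, 5, 0.5)
```

Because the function returns a new array, the gain can be changed and re-applied to the original at any time, which loosely mirrors the way an adjustment layer avoids committing to a change.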
Copying and pasting is achieved in much the same way as it would be within a
spectral editor but working in an image editor does allow some additional flexibility.
By pasting the copied area on to a new layer it allows for it to be moved or “unpasted”
after the fact and, perhaps more importantly, it allows us to soften the edges of the
pasted area so that there are no abrupt changes in tone. If you copy and paste a
selection in a spectral editor you have very little control over the transition between
the original and pasted areas and this could potentially lead to audible artefacts at the
transition point. Doing the same thing in an image editor gives you much more
control over this important aspect of the pasting process.
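
The soft-edged paste described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the spectrogram is a NumPy array of brightness values (rows are frequency, columns are time), `paste_feathered` is a hypothetical helper, and only the left and right edges are softened.

```python
import numpy as np

def paste_feathered(spec, patch, f0, t0, feather):
    """Blend patch into spec at row f0, column t0 with soft left/right edges.

    feather is the number of columns over which each edge fades;
    feather=0 gives an ordinary hard-edged paste.
    """
    out = spec.astype(float).copy()
    h, w = patch.shape
    # opacity rises from the outer columns towards the middle of the patch
    ramp = np.minimum(np.arange(1, w + 1), np.arange(w, 0, -1)) / (feather + 1)
    alpha = np.clip(ramp, 0.0, 1.0)[np.newaxis, :]
    region = out[f0:f0 + h, t0:t0 + w]
    out[f0:f0 + h, t0:t0 + w] = alpha * patch + (1.0 - alpha) * region
    return out

background = np.zeros((2, 7))
patch = np.ones((2, 5))
pasted = paste_feathered(background, patch, 0, 1, feather=1)
```

The outer columns of the patch end up as a half-and-half mix of old and new content, so there is no abrupt step in brightness at the join.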
While there is no dedicated de-clicking process in an image editor, by and large,
clicks are quite easy to deal with because they are a relatively high burst of sonic
energy over a wide frequency range and a short period of time. As such they will
normally show up quite clearly as fairly obvious vertical lines in the image editor.
Once we have identified them there are a couple of ways we could deal with them.
The first is to simply highlight the offending area, delete it and move the
surrounding areas so that they cover the gap. This will obviously make the whole file
a little shorter but if there are only one or two digital clicks of a few samples long then
this would amount to a few milliseconds at most and is unlikely to have any impact on
the file as a whole. Of course in doing this there is a risk that you will still have an
audible “glitch” because of subtle differences between the two areas that you have
moved to be adjacent. It is also not a very viable option if there are many clicks and
each is of a longer duration because the reduction in length would quickly add up.
Another option would be to copy and paste an equivalent length region from either
side of the click area but, again, you still run the risk of having audible glitches
because of minor differences between the original and pasted areas. You could, of
course, soften the edges as we described above to smooth out any such glitches. There
is a third option, however, which could prove interesting. Some image editors, either
natively or through plugins, provide the option to “fill” a selection with content that is
created by looking at the rest of the image and analysing patterns so that it blends with
the surrounding areas. This “content aware fill” (as it is called in Photoshop) could be
a great way to try to fill the gap left by the removed clicks, as it is a very similar tool
to some of the more advanced removal/attenuation algorithms in spectral editors.
What we really need to do when removing clicks this way is to interpolate between
the values immediately to the left of the removed area and the values immediately to
the right. The shorter the click was, the easier this will be to do because, if we are talking
about a couple of milliseconds, even if the interpolated values aren’t exactly what
should have been there without the click, the period is so short that we probably
wouldn’t notice anything untoward. With longer clicks, however, a simple averaging
interpolation may be audible.
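
The simple interpolation just described might look something like this in Python. It is a sketch, assuming the spectrogram is a NumPy array with time running along the columns; `interpolate_columns` is a hypothetical name, and the click must not touch the very first or last column.

```python
import numpy as np

def interpolate_columns(spec, t_start, t_end):
    """Replace columns t_start..t_end-1 with a linear blend of their neighbours."""
    out = spec.astype(float).copy()
    left = out[:, t_start - 1].copy()    # last good frame before the click
    right = out[:, t_end].copy()         # first good frame after the click
    n = t_end - t_start
    for i in range(n):
        w = (i + 1) / (n + 1)            # blend weight moves from left to right
        out[:, t_start + i] = (1.0 - w) * left + w * right
    return out

spec = np.array([[0.2, 0.2, 1.0, 0.6, 0.6],
                 [0.2, 0.2, 1.0, 0.6, 0.6]])   # column 2 is the "click"
fixed = interpolate_columns(spec, 2, 3)
```

For a click that is one column wide the result is simply the average of the two neighbouring columns, which is the case where this kind of repair is least likely to be heard.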
De-crackling with an image editor would be, I believe, a very impractical thing. The
problem is not so much having the tools to remove the crackle as having a means to
accurately (and automatically) detect and locate it. The de-crackle tools in spectral
editors are expertly designed to look for specific audio events
and, when located, to treat them in a certain way. There are no tools that currently
exist in image editing to achieve a similar result and, therefore, de-crackling is
something that should probably be left to a spectral editor or dedicated plugin to deal
with. At least for the time being.
In theory we may be able to de-noise audio using an image editor but that comes with
one huge caveat. Image editors and image editing plugins have some really great tools
for getting rid of noise but, and this is the important part, the way noise occurs in
images is generally quite different to the way it occurs in audio. Noise in images tends
to be either compression artefacts (which are dealt with in a very specific way) or
what is known as Gaussian noise. This is roughly the visual equivalent of white noise
in audio terminology: statistically random noise spread evenly across the whole image.
In audio terms this would mean that it would be totally equal across all frequency bands.
The tools used to remove this in image editing would, therefore, be of limited use to
us if our “noise profile” happened to have a non-uniform distribution. The reality is
that de-noising using graphical image editing tools often results in a spectrogram that
looks blurred and the audible result of that is one of making the whole sound very
unfocused and, at worst, having a “wateriness” to it which is very far from the desired
result. As a consequence of this, de-noising is probably best left to conventional audio
editing processes – but it’s always worth a try if you fancy experimenting.
As a long shot, it might be possible to work around this by taking the spectrum of a
pure noise part of the recording (as you would do with de-noising in a spectral editor)
and then combining this spectral profile of the noise with a pure Gaussian noise image
to create a “weighted” Gaussian noise profile, and then using that as a subtractive
layer on top of the spectrogram image in your image editor. However, it is unlikely
the results would be as good as a dedicated de-noising process. If you enjoy
experimentation, though, it could be something you might like to try just as an
alternative technique. It would be very much a static noise profile though, and
wouldn’t offer the advantages of adaptive noise profiles used in many de-noising
algorithms and plugins.
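
A much simplified version of this idea can be sketched in Python. The sketch below only does the static profile subtraction and leaves out the random Gaussian texture mentioned above, so it is a swapped-in simplification rather than the exact layering recipe. The array layout (rows are frequency, columns are time) and the name `subtract_noise_profile` are assumptions.

```python
import numpy as np

def subtract_noise_profile(spec, noise_start, noise_end, strength=1.0):
    """Subtract the average brightness of a noise-only stretch from every frame."""
    # one value per frequency row, taken from the frames that contain only noise
    profile = spec[:, noise_start:noise_end].mean(axis=1, keepdims=True)
    # brightness cannot go below black, so negative results are clipped to zero
    return np.clip(spec - strength * profile, 0.0, None)

spec = np.array([[0.2, 0.2, 0.7],
                 [0.1, 0.1, 0.1]])       # first two frames are noise only
cleaned = subtract_noise_profile(spec, 0, 2)
```

As the text says, the profile here is fixed for the whole file, so it cannot follow noise that changes over time.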
Hum, on the other hand, should be much easier to deal with because it is much easier
to detect in an image editor. We have already seen that mains hum consists of a
constant frequency of either 50Hz or 60Hz and the associated lower-order harmonics.
In a spectrogram image this will show as horizontal lines located at 50Hz or
60Hz, then fainter lines at 100Hz or 120Hz, fainter still at 150Hz or 180Hz, and so on.
You could almost look at these in the same way that we did with clicks. They
represent very narrow focuses of audio energy. Clicks represented a wide frequency
range for a short period of time while each of the harmonics present in mains hum
will represent a narrow frequency range for a long period of time. Therefore each of
the options we looked at for dealing with clicks would be equally applicable here
except that we would be working with horizontal selections rather than vertical ones.
Equally, if we wanted to deal with mains hum we would simply select the relevant
frequencies and then reduce the brightness (attenuate them) which would, in effect, be
like using a very narrow bandwidth EQ.
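
As a hedged sketch of that narrow attenuation, the following Python assumes a spectrogram array whose row 0 is 0Hz and whose rows are evenly spaced by `bin_hz` (a linear frequency axis, which your exported image may not have). The function name and parameters are illustrative only.

```python
import numpy as np

def attenuate_hum(spec, bin_hz, mains_hz=50.0, n_harmonics=3, gain=0.1):
    """Darken the rows nearest to mains_hz and its first few harmonics."""
    out = spec.astype(float).copy()
    for k in range(1, n_harmonics + 1):
        row = int(round(k * mains_hz / bin_hz))   # row closest to the k-th harmonic
        if row < out.shape[0]:
            out[row, :] *= gain                   # whole row: hum lasts the full file
    return out

spec = np.ones((40, 4))
dehummed = attenuate_hum(spec, bin_hz=10.0)       # rows 5, 10 and 15 are darkened
```

In practice each hum line is likely to cover more than one row of pixels, so a real selection would probably need to be a little taller than this.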
There are a number of ways to achieve compression-like effects in an image editor
and these range from limiting the dynamic range of occasional sounds through to a
more general re-balancing of the dynamic range of the recording as a whole. There
are also ways in which you can achieve dynamic range effects that would be
completely impossible with any other spectral editing process or plugin. To begin
with let’s have a look at the most simple case: taming occasional peaks.
As with spectral editors, you can create selections in image editors and then attenuate
those areas to pull the peak levels down. If you want a result that is most similar to a
traditional single band compressor, then you would create a selection that covers the
width of the event we want to compress and also covers the entire spectrogram from
top to bottom (covering all frequencies). You can then create a brightness adjustment
layer and decrease the brightness by the required amount. You can also soften the
edges of the selection and this softening would be analogous to the “attack” and
“release” settings on a compressor. The softer the left-hand edge of the selection, the
slower the equivalent attack would be and the softer the right-hand edge, the slower
the equivalent release would be. What we are doing, in effect, is attenuating a single
part of the sound, which is what compression does at its most basic level.
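
To make the attack and release analogy concrete, here is a small Python sketch of a full-height selection with softened left and right edges. The names are hypothetical, the ramps are linear for simplicity, and `attack + release` is assumed to be no longer than the selection itself.

```python
import numpy as np

def soft_gain_envelope(n_frames, start, end, gain, attack, release):
    """Per-frame gain: 1.0 outside the selection, `gain` inside, ramped at the edges."""
    env = np.ones(n_frames)
    env[start:end] = gain
    # soft left edge: slide from 1.0 down towards gain over `attack` frames
    env[start:start + attack] = np.linspace(1.0, gain, attack, endpoint=False)
    # soft right edge: slide from gain back up towards 1.0 over `release` frames
    env[end - release:end] = np.linspace(gain, 1.0, release, endpoint=False)
    return env

spec = np.ones((3, 10))
env = soft_gain_envelope(10, start=2, end=8, gain=0.5, attack=2, release=2)
compressed = spec * env[np.newaxis, :]   # same gain applied to every frequency row
```

A larger `attack` or `release` value corresponds to a softer edge on that side of the selection, and so to a slower equivalent attack or release.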
You can then apply the same theory and techniques to a multi-band compression
equivalent by simply selecting only the frequencies that you want to compress, in a
vertical sense, and then creating the selection and brightness adjustment layer as
before. In addition to softening the left and right edges to create “attack” and
“release”, you can also soften the top and bottom edges to simulate the effects of the
filter slopes on the multi-band compressor bands. All pretty straightforward stuff and
fairly similar to what we have already spoken about for spectral editors. But then
things get a little more interesting.
Compression is the process of reducing dynamic range of sound, and dynamic range
is defined as the range between the loudest and quietest parts of an audio signal.
Moving that across to our image editor we know that loudness is equivalent to
brightness (or more correctly luminance), so in order to compress the audio in our
spectrogram we would need to change the amount of difference between the lightest
and darkest parts of our image. However if we simply adjust the contrast of the image
(in effect reducing the difference between the loudest and quietest parts of the signal)
we are not doing what we would traditionally consider as compression. In
compression as we know it we are adjusting the levels of the sound as a whole at
different points in time whereas with this spectral compression we are adjusting the
difference between the levels of different frequencies in the sound. That’s not to say
that it isn’t a very interesting effect in its own right; it most certainly is, but it won’t
achieve the same results as a normal audio compressor.
Adjusting the contrast amount will reduce the effective dynamic range of our audio
but it does it in a way that no hardware or plugin compressor does. It will take a
hypothetical midpoint in the dynamic range and make any sounds that are louder than
this point progressively quieter and sounds that are quieter will get progressively
louder. We have already seen that downward compression involves reducing the
levels of loud signals while upward compression involves increasing the levels of
quiet signals. In a way this actually does both simultaneously so would be a unique
effect in its own right. But I promised earlier to talk about upward compression, so
let’s see how we can adapt this technique to give us something closer to genuine
upward compression.
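
The midpoint behaviour of a contrast reduction can be sketched in a couple of lines of Python. This assumes brightness values between 0 and 1 and a simple linear contrast control. Real image editors may use a different curve, and `adjust_contrast` is just an illustrative name.

```python
import numpy as np

def adjust_contrast(spec, amount, midpoint=0.5):
    """Scale each pixel's distance from midpoint; amount < 1 lowers the contrast."""
    return np.clip(midpoint + amount * (spec - midpoint), 0.0, 1.0)

levels = np.array([0.1, 0.5, 0.9])       # quiet, medium and loud components
flattened = adjust_contrast(levels, 0.5)
```

With `amount` set to 0.5 the quiet component is raised, the loud one is lowered, and the value sitting on the midpoint is left where it was, which is the simultaneous upward and downward effect described above.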
More traditional audio-type compression can actually be achieved inside of
Photosounder itself and there are a number of tutorials on the website which go
through this process. Very interesting is the fact that you can actually draw the
compression curve directly on to the audio, which means that you can apply a linear
compression, a logarithmic (approximately) compression, or a compression curve of
any shape that you desire which allows for compression effects simply not possible
with any traditional audio compressor.
While neither of these methods (spectral compression in an image editor or the
pseudo-traditional approach in Photosounder) is a precision tool in terms of absolute
control, image processing of spectrograms can be an extremely creative tool and can
offer us ways of doing things which are just not possible in any other editing platform.
Even in the relatively commonplace editing tasks that we have looked at here there
are ways that image-processing editing can help us. If you are happy to venture off
the beaten track just a little then even more becomes possible.
The Unconventional
The processes and tools we have described so far in this chapter have all been related
to attempting to recreate tasks and effects that we can create other ways, either using a
spectral editor or, in some cases, simply using plugins or hardware processors. But if
we left it at that we would be overlooking some of the truly amazing ways that we can
bend and manipulate sound using an image editor. With this short section we are
doing a little more than just dipping a toe in to the waters of sound design; this is very
much experimenting with audio and wouldn’t really be construed as “editing” as such,
but it is related to the editing tasks we have been discussing and it’s also a lot of fun
so, while we are in this area, let’s just have a brief look at some intriguing
possibilities. One thing that I will say at the outset, though, is that the best results are
achieved with sounds that aren’t overly complex. What I mean by this is sounds
which are not rhythmically complex or full pieces of music. Isolated, single sounds
that aren’t too “busy” are best for these types of effects.
Using blur effects can give rise to some really nice results that are actually quite hard
to describe. If we took a sound which had very definite harmonics of 100Hz, 200Hz,
300Hz, 400Hz, etc., we would see each of these harmonics as a very clearly defined
horizontal line. If we were to then apply a blur effect this would have the effect of
creating additional frequency content either side of the main frequency. In the case of
the 100Hz harmonic this would mean very subtle additional frequencies ranging from,
for example, 98Hz to 102Hz, increasing from inaudible at the edges of the range, up
to a maximum level at 100Hz and then down to inaudible again at 102Hz. The
resulting sound is very hard to characterise but it almost sounds like multiple versions
of the same sound detuned against each other. Imagine a choir singing where each
person was slightly out of tune. Depending on the sound there can be a subtle
inharmonic “shimmer” to the sound that can become quite pronounced if the level of
blurring is high enough.
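
The spreading of a single harmonic into neighbouring frequencies can be illustrated with a blur along the frequency axis only. This is a sketch, not what any particular editor's blur filter does: it uses a triangular kernel rather than a true Gaussian, and it assumes a NumPy array with frequency along the rows and more rows than kernel taps.

```python
import numpy as np

def blur_frequency(spec, radius):
    """Smear each time frame along the frequency axis with a triangular kernel."""
    kernel = np.concatenate([np.arange(1, radius + 2),
                             np.arange(radius, 0, -1)]).astype(float)
    kernel /= kernel.sum()               # keep the overall brightness the same
    out = np.empty_like(spec, dtype=float)
    for t in range(spec.shape[1]):
        out[:, t] = np.convolve(spec[:, t], kernel, mode="same")
    return out

spec = np.zeros((7, 2))
spec[3, :] = 1.0                         # a single sharply defined harmonic
blurred = blur_frequency(spec, radius=1)
```

The one bright row becomes a peak with fainter rows directly above and below it, which is the extra frequency content either side of the harmonic.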
Easier to describe, perhaps, is the effect of blurring only in a horizontal sense with a
“motion blur” effect. This visual effect is the kind of thing you would expect to see in
a photo of a fast moving vehicle: a blur in a horizontal direction. If we apply this
effect to our spectrogram we get a kind of time-domain blurring which will sound
reminiscent of a reversed reverb if the blur is carried out from right to left and a
strange, “ghostly” reverb if the blur is carried out from left to right. Although we
could achieve effects which are superficially similar with reverb, this particular
processing does sound different because a reverb will tend to diffuse the sound as it
dies away and give the sound a sense of distance while this method sounds more like
a natural fade-in or fade-out of a sound. As a result, if you apply it to a naturally
sustaining sound (such as a pad), it probably won’t sound that special because the
sound itself could well have a naturally long attack or decay. The real magic comes
when you apply it to a sound that your ear will recognise as a naturally percussive
sound. In situations like that the fact that there is a fade-in or fade-out is just enough
to catch your attention as sounding “wrong” but in a completely natural way,
assuming of course that you don’t go to extremes with the blur (which can be fun in
itself!)
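
A one-sided blur along the time axis can be sketched as below. The function is a hypothetical stand-in for an editor's motion blur, using a plain average over `length` frames, and it assumes time runs along the columns of a NumPy array.

```python
import numpy as np

def motion_blur(spec, length, direction="right"):
    """Average each pixel with neighbours on one side along the time axis.

    direction="right" smears energy into later frames (a tail after the sound);
    direction="left" smears it into earlier frames (a fade-in before the sound).
    """
    n_time = spec.shape[1]
    out = np.zeros_like(spec, dtype=float)
    for shift in range(length):
        if direction == "right":
            out[:, shift:] += spec[:, :n_time - shift]
        else:
            out[:, :n_time - shift] += spec[:, shift:]
    return out / length

spec = np.zeros((1, 6))
spec[0, 2] = 1.0                          # a single percussive event at frame 2
tail = motion_blur(spec, 3, "right")
swell = motion_blur(spec, 3, "left")
```

Smearing to the left puts energy before the original event, which is why that direction gives the reversed-reverb impression.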
Another interesting thing that you can do is to skew the spectrogram image. This is
where you move the position of the top of the spectrogram relative to the bottom. If
you imagine a click that would show up as a vertical line through the spectrogram
then, after skewing, this would become slightly diagonal. Now if we replace that click
with a more useful sound we can perform a very unique kind of fade-in where the
harmonics of a sound that, ordinarily, start at the same time can begin at different
times. Once again, the particular effect that this has will depend a lot on the sound. If
we use a sustaining sound the result could be quite similar to quickly opening up a
low-pass filter (if the top of the image is skewed to the right) or closing a high-pass
filter (if the top of the image is skewed to the left). If, on the other hand, you use more
percussive sounds and experiment with greater skew amounts you get a very
interesting “smearing” of the frequencies. Because of the nature of what we are doing,
each new sound, be that a new percussive hit or a new word in a vocal passage, will
sound like it has the filter opening on it. This would mean that the sound no longer
has its obvious percussive attack, but the newly created sounds can be unique in many
ways so it is worthwhile experimenting. Of course, if we shift the top of the image to
the left relative to the bottom then the high frequencies will come in first, so the effect
will be closer to that of a high-pass filter being closed down on each individual note.
The problem here is that the start of each note sounds less distinct than with the low
frequencies coming in first simply because the low frequencies are often more
obvious in defining the start of a sound even though the higher frequencies contribute
more to the character.
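
Skewing can be sketched as delaying each frequency row by a different amount. In this illustration row 0 of the array is taken to be the lowest frequency (so the top of the image is the last row), `skew` is a hypothetical helper, and the output is widened so that nothing is cut off.

```python
import numpy as np

def skew(spec, max_shift):
    """Shift rows in time by an amount that changes linearly with frequency.

    Positive max_shift delays the high rows (top of the image moves right);
    negative max_shift delays the low rows instead (top moves left).
    """
    n_freq, n_time = spec.shape
    shifts = np.rint(np.linspace(0, max_shift, n_freq)).astype(int)
    shifts -= shifts.min()                # keep every shift non-negative
    out = np.zeros((n_freq, n_time + abs(max_shift)))
    for row, s in enumerate(shifts):
        out[row, s:s + n_time] = spec[row]
    return out

click = np.zeros((3, 4))
click[:, 0] = 1.0                         # a vertical line: all rows start together
lows_first = skew(click, 2)
highs_first = skew(click, -2)
```

After skewing, the vertical line of the click has become a diagonal, so each row now starts at a different time.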
If you were to change the horizontal size of the image, it would be equivalent to time-
stretching; making it wider would be stretching and making it narrower would be
compressing. However, when you look at stretching or compressing the image in a
vertical sense then we have what I can only call “spectral squashing”. What you
would be doing here is compressing the spectral distribution of the sound and, as a
result, the frequencies would no longer be nicely and harmonically distributed. Each
frequency that was previously a whole number multiple of the fundamental may now
be, for example, 1.42 times the previous frequency, which will lead to anything from
slightly odd-sounding at very low squash values to unrecognisable at higher values.
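
Why the harmonics stop lining up can be shown with a little arithmetic, under one important assumption that the text does not state: that the vertical axis of the image is logarithmic in frequency. With a linear axis anchored at 0Hz, a uniform vertical stretch would multiply every frequency by the same factor and the harmonic ratios would survive. The function name and the 20Hz bottom edge below are illustrative choices.

```python
import numpy as np

def squash_frequencies(freqs_hz, scale, f_min=20.0):
    """Map frequencies through a vertical rescale of a log-frequency image.

    The image is scaled about its bottom edge (f_min), so a pixel height h
    becomes scale * h and a frequency f becomes f_min * (f / f_min) ** scale.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return f_min * (freqs_hz / f_min) ** scale

harmonics = np.array([100.0, 200.0, 300.0, 400.0])
squashed = squash_frequencies(harmonics, 0.5)
ratios = squashed / squashed[0]           # no longer 1, 2, 3, 4
```

With the image squashed to half its height the second partial sits at roughly 1.41 times the first instead of exactly twice, so the partials are no longer whole-number multiples of the fundamental.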
Flipping the image horizontally will perform a basic reverse function much like you
find in almost every audio editor but, on the other hand, flipping the image in a
vertical sense will mean that the “fundamental” frequency – the most dominant
frequency – now becomes the highest frequency and the reducing levels of all of the
other higher harmonics in the original sound now become increasingly quiet sub-
harmonics. Once again, it is quite hard to describe this effect and it is very much
material-dependent, so trial and error (if there is such a thing as “error” in processing
of this kind) is the order of the day.
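
Both flips are one-line operations on an array. The tiny example below assumes, as an illustration only, a NumPy array in which row 0 is the lowest frequency and columns are time frames.

```python
import numpy as np

spec = np.array([[0.9, 0.6, 0.3],     # row 0: lowest frequency (assumed), fading out
                 [0.4, 0.2, 0.1]])    # row 1: a quieter, higher partial

reversed_in_time = spec[:, ::-1]      # horizontal flip: the sound plays backwards
flipped_in_freq = spec[::-1, :]       # vertical flip: low and high rows swap places
```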
Image editors often allow you to create “gradient fills”, so if you create a new layer,
then create a “white to black” horizontal gradient (white on the left) and use this layer
to multiply with the spectrogram, the gradient will act as a volume fade where the
white part of the gradient represents full volume and the black represents silence. In
this case it would equate to a fade-out. If we reverse the gradient so that the white is
on the right then we will have a fade-in. If we create a vertical fade with white at the
bottom then all of the lower harmonics will be at high volumes while the higher ones
will be silent. In audio terms this is a low-pass filter effect. Similarly we could reverse
the gradient so that the white was at the top, which would be equivalent to a high-pass
filter. By combining the two ideas and creating a diagonal fade we will create an
effect like a gradually opening or closing filter. One example would be a diagonal
gradient with white at the bottom right corner and black at the top left corner. At the
beginning of the file all high frequencies would be silent and low frequencies would
be very quiet. As we progress through the file, the lower frequencies will become
louder first and only towards the end of the file will the higher frequencies start to
become audible. In audio terms this would be a slowly opening low-pass filter. By
varying this we can create opening and closing low-pass and high-pass filters
depending on the positions of the white and black portions of the diagonal gradient.
Finally, you can often vary the smoothness of the gradient from a very gentle
transition to a simple two-tone effect that changes from white to black at the mid-
point. This smoothness variation is equivalent to the filter slope of an analogue filter.
The smoothest gradient could be the equivalent to a 6dB/octave filter while the two-
tone transition would be beyond the sharpness of even a 48dB/octave filter.
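
The gradient layers described above can be mimicked with simple ramps multiplied into the spectrogram. This is a sketch with hypothetical function names. It assumes brightness values between 0 and 1 and an array whose row 0 is the lowest frequency, so "white at the bottom right of the image" means a value of 1.0 at the first row and last column.

```python
import numpy as np

def horizontal_gradient(n_freq, n_time, white_on_left=True):
    """White-to-black ramp along time: a fade-out if white is on the left."""
    ramp = np.linspace(1.0, 0.0, n_time) if white_on_left else np.linspace(0.0, 1.0, n_time)
    return np.tile(ramp, (n_freq, 1))

def diagonal_gradient(n_freq, n_time):
    """White at the late, low-frequency corner; black at the early, high one."""
    t = np.linspace(0.0, 1.0, n_time)[np.newaxis, :]   # 0 at the start, 1 at the end
    f = np.linspace(0.0, 1.0, n_freq)[:, np.newaxis]   # 0 at lowest row, 1 at highest
    return (t + (1.0 - f)) / 2.0

spec = np.ones((3, 5))
fade_out = spec * horizontal_gradient(3, 5)            # "multiply" blend with the layer
opening_filter = spec * diagonal_gradient(3, 5)
```

In the diagonal case the low rows reach full brightness by the end of the file while the highest row only gets halfway there, which is the slowly opening low-pass behaviour. A harder gradient would sharpen the transition.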
One really interesting thing you can do is to create a new layer, load another
spectrogram into this layer and then set the layer blending to multiply mode. The
resulting image will be a hybrid of the two. It is almost as if you have the “picture” of
one, with the tonal characteristics of the other. Or perhaps you could say you have the
meaning of one and the colour of another. And this is exactly what a vocoder does.
You take one sound and impose its spectral content onto another sound. The most
obvious use of this is to create Cylon-esque or “Mr. Blue Sky” robot voices, but
vocoding actually has a lot of potential uses outside of that. You can vocode a
percussive loop with a melodic sound to create a hybrid that has the rhythm, dynamics
and “snap” of the percussion loop but has tonal qualities from the melodic sound. You
could be more subtle still and vocode a string section and a choir “ahhh” sample to
create a sound that is neither and both simultaneously. This sounds particularly un-
vocoder-like if both sounds are playing the same chord or melody so you aren’t
actually imposing any musical change on either of the sounds, but are simply messing
with the spectral balance.
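
The multiply blend itself is just a pixel-by-pixel product, as this minimal Python sketch shows. It assumes two spectrogram arrays of the same size with brightness values between 0 and 1. `multiply_blend` is an illustrative name, and a real vocoder works on band envelopes rather than individual pixels, so this is only the image-editing analogy.

```python
import numpy as np

def multiply_blend(spec_a, spec_b):
    """Pixel-by-pixel product of two equally sized spectrograms."""
    if spec_a.shape != spec_b.shape:
        raise ValueError("both spectrograms must have the same shape")
    return spec_a * spec_b

rhythm = np.array([[1.0, 0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0, 0.0]])   # percussive: on, off, on, off
tone = np.array([[0.8, 0.8, 0.8, 0.8],
                 [0.2, 0.2, 0.2, 0.2]])     # sustained: strong low row, weak high row
hybrid = multiply_blend(rhythm, tone)
```

The result keeps the on-off timing of the first sound and the spectral balance of the second, which is the hybrid character described above.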
And finally, just as a little curve-ball, and something for the adventurous only, how
about creating a negative of the original sound as you would with a photograph? If
you try this, be warned; turn your speakers or headphones down because the resulting
noise is usually chaotic as there will be a large “noise” component to the sound.
Whatever was “silent” harmonic space in your original sound will become full
amplitude in this negative. I can’t actually think of a practical use for this to be
honest, but you never know.
Even this only really scratches the surface of what you might be able to do with an
image editor and plugins – a lot of the things you try might prove to be unusable in
any musical context but could be just perfect for creating that foreboding, heavily