Image-Based Spectrographic Processing


Noah Benjamin Maze
Electrical Engineering, University of North Texas
3940 N. Elm Street, Denton, TX 76207-7102

Abstract— This paper presents a novel way to graphically filter a sound’s frequency spectrum with an image file. The filtering is accomplished by creating a series of multiple-passband filters based on each column of the image, and applying them sequentially to the original audio file.

I. INTRODUCTION
Historically, embedding an image into the voiceprint of a sound has been attempted by only a handful of avant-garde musicians [1]. These attempts have produced noisy results that are generally devoid of any appreciable audio content, a byproduct of how they were made. Previously, spectral images were created by running the spectrogram process in reverse. The sole input to this process was the target image, and Wired magazine has described the resulting sound as “discordant, metallic scratching” [2]. By using abstract, minimalistic images, artists have been able to coax more interesting sounds from this algorithm, but this sacrifices the overall effect of the image.

Image Based Spectral Processing (IBSP) allows any audio file to be manipulated into a spectrographic image. The resulting sound has a frequency spectrum characterized by ever-changing passbands, but its original content is otherwise preserved. This is accomplished by a series of FIR filters whose combined frequency response resembles an image supplied by the user. The supplied image is broken into columns, and each column specifies the amplitude response over the normalized frequencies of the sound file. The mathematics behind this system are simple enough to be used in many different environments, but this proof of concept was assembled, tested, and demonstrated in MATLAB.

II. THE FILTERING PROCESS
Each pixel of the original image corresponds to a momentary bandpass filter. The passband of the filter is centered at a frequency corresponding to the pixel’s vertical position in the image. The filter is applied to a window of time corresponding to the pixel’s horizontal position.

A. Input Acquisition
The arguments to the IBSP function are the locations of an image file and of a sound file to be processed with that image.

1) Image Input and Preprocessing
Image Based Spectral Processing can work with any raster image. Both indexed and RGB images are acceptable, but indexed images are converted to RGB before processing can take place [3]. Spectrograms treat intensity as a one-dimensional value, so the RGB color values of each pixel are averaged to produce a one-dimensional (grayscale) intensity map. The resulting matrix contains one value for each pixel of the original image.

2) Sound Input
Sound files can be of any sample rate. When an audio file is loaded, its sampling frequency is recorded and applied when the output file is written, but it is not used in any portion of the processing. Filtering is done based on the normalized frequency of the audio data [4]. The spectral density of the output file resembles a masked version of the original audio spectrum, so the frequency spectrum of the input file serves as a canvas for the image. A noisy sound with a broad range of frequencies (such as Gaussian white noise or pink noise) will provide the most uniform frequency spectrum. Music files usually exhibit a periodic burst of frequencies on the downbeats, followed by quieter and more focused spectral density. This periodicity results in a spectrogram characterized by bright bars that fade to a sparser spectrum on the upbeats. As a result, spectrographically filtered music resembles the original image overlaid with vertical scan lines. This problem can be avoided by selecting musical passages that are particularly cacophonous.
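The preprocessing described in the Image Input and Sound Input subsections can be sketched in Python with NumPy. The original implementation was written in MATLAB and is not reproduced in this paper. The function names `intensity_map` and `column_response` are hypothetical. The sketch makes three assumptions:

- The input is an 8-bit RGB array, with any indexed image already converted.
- Intensities are scaled to [0, 1] so they can serve directly as filter magnitudes.
- The top row of the image maps to the Nyquist frequency, as in a spectrogram.

Frequencies are expressed as normalized frequency (1 = Nyquist), so the sample rate never enters the calculation.

```python
import numpy as np


def intensity_map(rgb):
    """Average the R, G and B values of each pixel into one intensity.

    Assumes an H x W x 3 array of 8-bit values; the result is scaled to
    [0, 1] so each entry can be used directly as a filter magnitude.
    """
    rgb = np.asarray(rgb, dtype=float)
    return rgb.mean(axis=2) / 255.0


def column_response(gray, col):
    """Return (freqs, mags) for one image column.

    freqs are normalized frequencies in [0, 1], where 1 is the Nyquist
    frequency, so the file's sample rate is not needed. Row 0 (the top of
    the image) is assumed to map to the Nyquist frequency, as in a
    spectrogram, so the column is flipped before pairing.
    """
    mags = gray[::-1, col]                    # bottom row first (0 Hz)
    freqs = np.linspace(0.0, 1.0, len(mags))  # one frequency per row
    return freqs, mags
```

For a column four pixels tall, `column_response` returns the frequencies 0, 1/3, 2/3 and 1, one per row.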

B. Audio Processing
The filter behind IBSP is actually a series of filters that are applied sequentially to frames of the original sound. These frames are then recombined into the spectrally modified output sound. Many filtering options were tested, and a finite impulse response (FIR) filter proved to be the best way to minimize execution time while maximizing spectrographic image quality. FIR filtering was adopted in the initial design phase because it is quick and simple to implement in MATLAB. Surprisingly, it remained the most viable method considered. Once the initial design was completed, more complex filter designs were tested. FIR filters designed to provide higher precision, including least-squares approximations [5] and the Parks-McClellan algorithm [6], produced only slight visual differences at the cost of drastic increases in execution time. A recursive IIR filter implementation [7] was attempted as well, but it did not meet expectations. Convolving the impulse response with each input frame leaves trailing samples at the end of each FIR output frame. These trailing samples create an aurally pleasing transition between adjacent columns of the image. Previously tested lower-order filters left jarring frequency bursts at column transitions.
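The recombination step, in which the trailing samples of each frame's convolution carry into the next frame, can be sketched as overlap-add. This is an assumption about how the frames were recombined, not the author's code. The name `filter_frames` is hypothetical, and the taps for each frame could come from any FIR design.

```python
import numpy as np


def filter_frames(x, filters, frame_len):
    """Apply a different FIR filter to each consecutive frame of x.

    filters[k] holds the impulse response (taps) for frame k, i.e. for
    image column k. Each frame is fully convolved, so its output is
    len(h) - 1 samples longer than the frame; that tail is added onto
    the following frame (overlap-add) instead of being discarded.
    Samples beyond len(filters) * frame_len are dropped in this sketch.
    """
    x = np.asarray(x, dtype=float)
    n = len(filters) * frame_len
    if len(x) < n:
        raise ValueError("signal shorter than len(filters) * frame_len")
    longest = max(len(h) for h in filters)
    y = np.zeros(n + longest - 1)          # room for the last frame's tail
    for k, h in enumerate(filters):
        start = k * frame_len
        frame = x[start:start + frame_len]
        out = np.convolve(frame, h)        # full convolution, with tail
        y[start:start + len(out)] += out   # tail overlaps the next frame
    return y[:n]
```

Because each frame is convolved in full, a filter with N taps spreads each frame's output N - 1 samples into the next column's time window. This spillover is the transition the text describes.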

The frequency response of the FIR filter is described by a vector of frequencies and a corresponding vector of magnitudes. Each column of the input image enumerates these magnitudes. A shorter image (one with fewer rows) results in a shorter list of frequency-magnitude pairs. Fewer pairs lead to fewer compromises in the filter design, which in turn lead to a clearer and more accurate representation of the original image. The IBSP function automatically selects an appropriate order for the filter design. The maximum order is limited by two factors: the width of the image and the number of samples in the sound. The frame size is equal to the number of audio samples in the original sound divided by the width, in pixels, of the image, and the order of the filter must be less than the frame length. Beyond that constraint, the filter order is theoretically unlimited. However, a hard limit of 1024 is built into the program to avoid wasting processing power on a needlessly detailed filter. Experimental results indicate that doubling the order to 2048 yields little improvement while greatly increasing execution time.

C. Reprocessing with Multiple Passes
Further detail can be brought out in the output sound by reprocessing: executing the algorithm again with the output sound as the input, filtering with the same image. This functionality is built into the program because multiple passes have a strongly beneficial effect on the result, particularly with detailed images. A single pass of the IBSP function produces adequate results for small images and for images composed of broad areas of uniform color. Two to four passes will give a clearer spectrographic representation of these types of images, but too many passes attenuate frequencies that should not be attenuated. For photographs and other more detailed images, four passes will provide an acceptable level of detail.
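The order selection and per-column filter design of Section II-B can be sketched with NumPy as a windowed frequency-sampling FIR design, similar in spirit to MATLAB's fir2. The paper does not name the exact design routine, so this is a substitute technique. The names `choose_order` and `design_column_fir`, the Hamming window, and the rounding down to an even order are all assumptions of the sketch.

```python
import numpy as np

MAX_ORDER = 1024  # hard limit on filter order stated in the text


def choose_order(num_samples, image_width):
    """Return (frame_len, order) for the given sound and image sizes.

    Frame length = samples / image width, and the order must be less
    than the frame length, capped at MAX_ORDER. Rounding down to an even
    order (odd tap count) is an assumption of this sketch: it lets the
    linear-phase design below have nonzero gain at the Nyquist frequency.
    """
    frame_len = num_samples // image_width
    order = min(frame_len - 1, MAX_ORDER)
    order -= order % 2
    if order < 2:
        raise ValueError("sound too short for this image width")
    return frame_len, order


def design_column_fir(freqs, mags, order):
    """Windowed frequency-sampling FIR design (in the spirit of fir2).

    freqs: increasing normalized frequencies from 0 to 1 (1 = Nyquist).
    mags:  desired magnitude at each frequency (the column's intensities).
    Returns order + 1 symmetric (linear-phase) taps.
    """
    numtaps = order + 1
    nfreqs = 1 + 2 ** int(np.ceil(np.log2(numtaps)))
    grid = np.linspace(0.0, 1.0, nfreqs)
    desired = np.interp(grid, freqs, mags)     # dense magnitude response
    # Linear-phase term: delays the response by (numtaps - 1) / 2 samples.
    phase = np.exp(-1j * np.pi * grid * (numtaps - 1) / 2)
    impulse = np.fft.irfft(desired * phase)    # length 2 * (nfreqs - 1)
    return impulse[:numtaps] * np.hamming(numtaps)
```

Reprocessing (Section II-C) would rerun the per-column filtering on the previous output with the same image. The filters therefore need to be designed only once, no matter how many passes are applied.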
The finer details of an image will not stand out in the results of the first few passes. Each pass of the IBSP reapplies the filter and increases the contrast of the image as it exists in the spectral intensity of the sound file. The contrast increases because no part of the signal is ever amplified. Each frequency component is, at best, left alone, while most samples (and therefore most frequencies) are attenuated by the filtering process. The darkest portions of the image produce the most attenuated signals, and the attenuation decreases as the image map becomes more intense. Because the attenuation decreases with brightness, the relative difference in brightness between two points of unequal intensity grows with each pass. Unfortunately, this effect has diminishing returns, because the attenuation results in a race to the bottom. The frequencies representing the darkest pixels fall at the fastest rate. These unequal fall rates push the frequency intensities farther apart in relative terms as they fall, increasing contrast. However, the intensities eventually approach zero, and the detail they contained is lost. An over-passed sound is quiet aside from a few loud bursts, and its spectrogram is very dark with a few exceptionally bright patches.

III. DEMONSTRATIVE RESULTS
To demonstrate the functionality of the IBSP function, the picture in Figure 1 was used to filter 30 seconds of Gaussian white noise with a sample rate of 8000 Hz.
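Before turning to the figures, the contrast mechanism of Section II-C can be made concrete with an idealized model. The model assumes that each pass scales the spectral magnitude M(f) in a pixel's band by exactly that pixel's intensity a(f), ignoring filter-design error and leakage between bands. This is a simplification, not a derivation from the paper. After k passes:

```latex
\begin{aligned}
M_k(f) &= a(f)^{k}\,M_0(f), \qquad 0 \le a(f) \le 1,\\
\frac{M_k(f_1)}{M_k(f_2)} &= \left(\frac{a(f_1)}{a(f_2)}\right)^{k}\frac{M_0(f_1)}{M_0(f_2)}.
\end{aligned}
```

For example, take a(f_1) = 0.9 and a(f_2) = 0.5 with a flat input spectrum (M_0(f_1) = M_0(f_2)):

- After four passes, the bands retain 0.9^4 ≈ 0.656 and 0.5^4 = 0.0625, a ratio of about 10.5.
- After sixteen passes, they retain 0.9^16 ≈ 0.185 and 0.5^16 ≈ 1.5 × 10^-5, a ratio of about 1.2 × 10^4.

Contrast therefore grows geometrically with k, yet every band with a(f) < 1 still decays toward zero. Only bands under pure-white pixels (a(f) = 1) pass unchanged.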

Gaussian white noise is an extremely harsh sound, but it possesses a uniform frequency spectrum across the entire frequency band. It was chosen for this demonstration for two reasons: It provides a very uniform background with which to demonstrate IBSP functionality, and no one will ever have to listen to these sounds.
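A minimal sketch of the demonstration input follows. It assumes NumPy's default random generator with a fixed seed and normalization to a peak of 1.0, since the paper does not say how the noise was scaled. Thirty seconds at 8000 Hz is 30 × 8000 = 240,000 samples. The helper name `white_noise` is hypothetical.

```python
import numpy as np

FS = 8000       # sample rate used in the demonstration (Hz)
DURATION = 30   # length of the noise in seconds


def white_noise(duration_s, fs, seed=0):
    """Gaussian white noise, scaled so its peak magnitude is 1.0.

    The fixed seed and the peak normalization are assumptions of this
    sketch, chosen only to make the output reproducible and safe to write.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(int(duration_s * fs))
    return x / np.max(np.abs(x))
```

As a hypothetical example, a 640-pixel-wide image would divide these samples into frames of 240,000 / 640 = 375 samples. That frame length would limit the filter order to less than 375, well below the 1024 cap.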

Figure 1: The image used to demonstrate the functionality of the filter.

This image is broken into two regions: a photograph of the campus at UNT, and the UNT logo. Together, these two regions demonstrate how well the filter represents both fine detail and broad areas of uniform color. The spectrogram of the result of the first pass can be seen in Figure 2. The UNT logo is immediately and clearly visible, but the photograph portion of the image is unclear and noisy. An additional pass (see Figure 3) brings out more detail in the photo while significantly attenuating the fill color of the logo.
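The figures that follow are spectrograms of the output sound. A comparable plot can be produced with a short-time FFT. The sketch below assumes a 256-sample Hann window and a 128-sample hop, because the paper does not state the spectrogram settings used for its figures. `spectrogram` is a hypothetical helper, not MATLAB's function of the same name.

```python
import numpy as np


def spectrogram(x, frame_len=256, hop=128):
    """Magnitude spectrogram in dB from a Hann-windowed short-time FFT.

    Rows are frequency bins from 0 up to the Nyquist frequency and
    columns are time frames. Flip the rows (np.flipud) before display
    so that high frequencies appear at the top, as in the figures.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    frames = np.array([x[s:s + frame_len] * window for s in starts])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T   # (bins, frames)
    return 20 * np.log10(mag + 1e-12)             # avoid log of zero
```

Applied to a filtered noise file, each column of the returned matrix should roughly trace the corresponding image column. The vertical resolution is set by frame_len / 2 + 1 frequency bins.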

Figure 2: Spectrogram of the output sound resulting from processing pink noise with the image in Figure 1 (1st Pass)

Figure 3: Details emerge with a 2nd pass.

Figure 4 continues this trend with an additional boost in the detail of the UNT campus after the 4th pass. The fill color of the UNT logo has been completely muted by this pass. This tradeoff illustrates the need for a customizable number of passes in the IBSP function.

Figure 4: After a 4th pass, details of the photograph emerge.

By the 8th pass (see Figure 5), the darker portions of the photo have begun to lose detail. The outlines of the logo are also thinning out. This loss of detail due to the attenuation of darker colors marks the beginning stages of over-passing.

Figure 5: The result of 8 passes with the IBSP function.

By the 16th pass (Figure 6), all but the brightest sections of the picture have been attenuated. The remaining areas of the photo are extraordinarily detailed, but the UNT logo is totally unrecognizable. These images have a totally different aesthetic than the original input, but the sparse frequency spectrum produces a much more pleasing sound than the original.

Figure 6: The 16th pass.

As mentioned earlier in the paper, this process can be performed on any image-sound pair, but sounds with a less-uniform spectral density do not produce easy-to-see results. Figure 7 illustrates a new input sound: a piece of music with a very strong downbeat. This downbeat results in a periodic wall of frequencies followed by quieter gaps. Visually, this results in a spectrogram that looks like it has been fed through a paper shredder.

Figure 7: The demonstration image mixed with an actual piece of music.

Sound files featuring lots of sustain and “walls of sound” typically have a broad and stable frequency distribution, and therefore make better canvases for IBSP. More nuanced noises such as pink noise and grey noise can be used in place of white noise to achieve a more pleasing sound.

IV. CONCLUSIONS
The Image Based Spectrographic Processing function discussed in this paper uses multiple passes of a time-varying FIR filter to manipulate the spectral density of a sound file with the grayscale color map of an image file. The results of this demonstration illustrate that IBSP is appropriate for detailed spectral reproduction of images at any level of detail, provided the number of passes is chosen to suit the image.

ACKNOWLEDGMENTS
Noah Maze wishes to acknowledge Oluwayomi Adamo and other contributors for developing and maintaining the course content consulted in this assignment.

REFERENCES
[1] Bastwood (2010, September 13). The Aphex Face | bastwood. Message posted to
[2] Kahney, Leander (2002, May 10). Hey, Who's That Face in My Song? Retrieved December 17, 2010, from
[3] MathWorks. (2010). Convert indexed image to RGB image. Retrieved December 17, 2010, from
[4] Thad B. Welch, Cameron H. G. Wright and Michael G. Morrow, Real-Time Digital Signal Processing from MATLAB to C with the TMS320C6x DSK, Florida: CRC Press, 2006.
[5] MathWorks. (2010). Least square linear-phase FIR filter design. Retrieved December 17, 2010, from
[6] MathWorks. (2010). Parks-McClellan optimal FIR filter design. Retrieved December 17, 2010, from
[7] MathWorks. (2010). Recursive digital filter design. Retrieved December 17, 2010, from
