
AUDIO COMPRESSION USING DCT & CS

1 MR. SUSHILKUMAR BAPUSAHEB SHINDE, 2 PROF. MR. RAKESH MANDLIYA

1 M.Tech. (VLSI), BMCT Indore, Madhya Pradesh, India, [email protected]

2 Head of EC Department, BMCT Indore, Madhya Pradesh, India, [email protected]

ABSTRACT

A large number of techniques have been proposed to identify whether multimedia content has been illegally tampered with or not. Nevertheless, very few efforts have been devoted to identifying which kind of attack has been carried out, especially due to the large amount of data required for this task. We propose a novel hashing scheme which exploits the paradigms of compressive sensing and distributed source coding to generate a compact hash signature, and we apply it to the case of audio content protection. At the content user side, the hash is decoded using distributed source coding tools. If the tampering is sparsifiable or compressible in some orthonormal basis or redundant dictionary, it is possible to identify the time-frequency position of the attack, with a hash size as small as 200 bits/second; the bit saving obtained by introducing distributed source coding ranges between 20% and 70%. The audio content provider produces a small hash signature by computing a limited number of random projections of a perceptual, time-frequency representation of the original audio stream; the audio hash is given by the syndrome bits of an LDPC code applied to the projections. By using the DCT as a signal preprocessor in order to obtain a sparse representation in the frequency domain, we show that the subsequent application of CS represents our signals with less information than the well-known sampling theorem requires. This means that our results could be the basis for a new compression method for audio and speech signals.

Index Terms: Audio Signal, DCT, Compressive Sampling, Sparsity.

1. INTRODUCTION

Compressive Sampling (CS) is a new framework for sampling and compressing audio and speech signals. In compressive sampling, the Nyquist sampling model is replaced by a sparse model, which assumes that the signal can be represented efficiently using just a few significant coefficients.

The seminal work by Candes et al. [3] and Donoho [4] proved that a signal can be reconstructed from samples taken below the Nyquist rate, which implies the potential for a dramatic reduction of sampling rates, power consumption and computational complexity in digital data acquisition.

Compressive sampling is traditionally used for low-power and low-resolution imaging devices, or when measurement is very costly (e.g. terahertz applications).

But there still exists a huge gap between CS theory and its application to audio signals [13][14]. How to construct a sparse audio signal is still an open question [6]-[8], especially since CS depends on two principles: sparsity (which pertains to the signal of interest) and incoherence (which pertains to the sensing modality).

We have used the DCT for sparse representation of an audio signal. It concentrates the information content in relatively few coefficients, and it achieves good data compression, which accounts for its popularity [9]. Thus we can obtain a compressed version of an audio signal by first obtaining a sparse representation in the frequency domain and then processing the result with a CS algorithm.

1.1 COMPRESSIVE SAMPLING

With the increasing amount of data in modern technology, most of the acquired data can be thrown away without any perceptual loss, e.g. in lossy compression formats for sound, images, etc. Hence the question arises: why acquire all the data when most of it will be thrown away? Can we directly measure only the data that is necessary? A theory of signal recovery from highly incomplete information has been developed in a recent series of papers [3]-[8]. An overview of the results states that a sparse vector x0 ∈ R^N (e.g. a digital signal) can be recovered from a small number of linear measurements b = A x0 ∈ R^n or b = A x0 + e, where A is an n x N matrix with far fewer rows than columns (n << N) and e is measurement noise, by solving a convex program.

Consider a real-valued signal x of length N and suppose that the basis ψ provides a K-sparse representation of x. In matrix notation, we have x = ψ f, in which f is a sparse vector with only K << N non-zero elements (or is well approximated using only K non-zero entries) and ψ is the sparse orthogonal basis matrix, i.e. {ψ1, ψ2, ..., ψN} [4].

The CS theory states that the signal x can be reconstructed by taking only M = O(K log N) linear, non-adaptive measurements of the form [1], [2]:

$$Y = \Phi x = \Phi \psi f \qquad (1)$$

SUSHILKUMAR BAPUSAHEB SHINDE et al. DATE OF PUBLICATION: FEB 20, 2015

ISSN: 2348-4098 VOL 3 ISSUE 1 JAN-FEB 2015

INTERNATIONAL JOURNAL OF SCIENCE, ENGINEERING AND TECHNOLOGY- www.ijset.in 308

where Y represents the M×1 sampled vector and Φ is an M×N measurement matrix that is incoherent with ψ, i.e., the maximum magnitude of the elements in Φψ is small [7].

With this information, we recover the signal by L1-minimisation, for which the recovery is exact with high probability [1].
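The recovery step described above can be illustrated with a small numerical sketch. It is not the authors' implementation: the sizes N, M and K, the Gaussian measurement matrix (called Phi here), and the use of SciPy's `linprog` to solve the L1 problem as a linear program are all assumptions made for illustration. The sparsifying basis ψ is taken to be the inverse DCT, in line with the approach of this paper.

```python
import numpy as np
from scipy.optimize import linprog


def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    x = np.arange(n)
    u = np.arange(n).reshape(-1, 1)
    A = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    A[0, :] = np.sqrt(1.0 / n)
    return A


rng = np.random.default_rng(0)
N, M, K = 64, 32, 5  # illustrative sizes (assumed, not taken from the paper)

# K-sparse coefficient vector f and the signal x = psi f (psi = inverse DCT).
f = np.zeros(N)
f[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
psi = dct_matrix(N).T
x = psi @ f

# M random, non-adaptive linear measurements Y = Phi x.
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
Y = Phi @ x

# L1 minimisation as a linear program: write f = p - q with p, q >= 0 and
# minimise sum(p) + sum(q) subject to (Phi psi)(p - q) = Y.
Theta = Phi @ psi
res = linprog(c=np.ones(2 * N), A_eq=np.hstack([Theta, -Theta]), b_eq=Y,
              bounds=(0, None))
f_rec = res.x[:N] - res.x[N:]
x_rec = psi @ f_rec

print("max coefficient error:", np.max(np.abs(f_rec - f)))
```

With only M = 32 measurements of a length-64 signal, the sparse coefficient vector is recovered up to solver tolerance in this setting; real audio is only approximately sparse, so the error would not vanish there.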

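1.2 ONE DIMENSIONAL DCT

The most common DCT definition of a 1-D sequence f(x) of length N is

$$C(u) = \alpha(u) \sum_{x=0}^{N-1} f(x) \cos\left[\frac{\pi (2x+1) u}{2N}\right] \qquad (2)$$

for u = 0, 1, 2, …, N−1. Similarly, the inverse transformation is defined as

$$f(x) = \sum_{u=0}^{N-1} \alpha(u)\, C(u) \cos\left[\frac{\pi (2x+1) u}{2N}\right] \qquad (3)$$

for x = 0, 1, 2, …, N−1. Note that in this subsection x denotes the sample index and f(x) the sample sequence. In both equations (2) and (3), α(u) is defined as

$$\alpha(u) = \begin{cases} \sqrt{1/N} & \text{for } u = 0 \\ \sqrt{2/N} & \text{for } u \neq 0 \end{cases}$$

It is clear from (2) that for u = 0,

$$C(0) = \sqrt{\frac{1}{N}} \sum_{x=0}^{N-1} f(x)$$

Thus, the first transform coefficient is proportional to the average value of the sample sequence. This value is referred to as the DC coefficient, and all other transform coefficients are called the AC coefficients in the literature [9].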
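As a minimal sketch, the forward and inverse transforms can be written directly from the standard orthonormal DCT definition, with α(u) = sqrt(1/N) for u = 0 and sqrt(2/N) otherwise. The function names `dct_1d` and `idct_1d` and the four-sample test sequence are illustrative choices, not part of the original work; a practical implementation would normally use a fast library routine instead of these explicit sums.

```python
import numpy as np


def dct_1d(f):
    """Forward 1-D DCT: C(u) = alpha(u) * sum_x f(x) cos(pi (2x+1) u / (2N))."""
    N = len(f)
    x = np.arange(N)
    C = np.empty(N)
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        C[u] = alpha * np.sum(f * np.cos(np.pi * (2 * x + 1) * u / (2 * N)))
    return C


def idct_1d(C):
    """Inverse 1-D DCT: f(x) = sum_u alpha(u) C(u) cos(pi (2x+1) u / (2N))."""
    N = len(C)
    u = np.arange(N)
    alpha = np.where(u == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    f = np.empty(N)
    for x in range(N):
        f[x] = np.sum(alpha * C * np.cos(np.pi * (2 * x + 1) * u / (2 * N)))
    return f


f = np.array([1.0, 2.0, 3.0, 4.0])
C = dct_1d(f)
f_back = idct_1d(C)

# The DC coefficient equals sqrt(N) times the mean of the sequence.
print(C[0], np.sqrt(len(f)) * f.mean())
```

For this sequence the DC coefficient is sqrt(1/4) · 10 = 5, which is sqrt(N) times the mean 2.5, and applying the inverse transform returns the original samples.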

1.3 THE TIME-FREQUENCY FILTERBANK

The MP3 standard [4] recommends the use of a high pass filter. A high pass filter allows frequencies above a given cutoff frequency to pass and attenuates the frequencies below it. The cutoff frequency should be in the range of 2 Hz to 10 Hz.

1.4 THE POLYPHASE FILTER

The polyphase filter used in MP3 [8] is adapted from an earlier audio coder named Masking Pattern Adapted Universal Subband Integrated Coding and Multiplexing (MUSICAM). It is a bank of M parallel bandpass filters of uniform bandwidth, obtained by cosine modulation of a lowpass prototype filter. This achieves nearly perfect reconstruction and has been called a pseudo QMF (Quadrature Mirror Filter).

2. PROPERTIES OF DCT

2.1 ENERGY COMPACTION

For highly correlated signals the DCT exhibits excellent energy compaction. An uncorrelated signal has its energy spread out, whereas the energy of a correlated signal is packed into the low frequency region. The efficiency of a transformation scheme can be directly gauged by its ability to pack the input data into as few coefficients as possible. This allows the quantizer to discard coefficients with relatively small amplitudes without introducing perceptible distortion in the reconstructed signal.
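The compaction behaviour can be checked numerically. The sketch below is an illustration under assumed settings: a synthetic first-order autoregressive sequence (correlation 0.95) stands in for a highly correlated signal, white Gaussian noise stands in for an uncorrelated one, and "low frequency region" is taken to mean the first N/8 DCT coefficients. None of these choices come from the experiments in this paper.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are the DCT basis functions)."""
    x = np.arange(n)
    u = np.arange(n).reshape(-1, 1)
    A = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    A[0, :] = np.sqrt(1.0 / n)
    return A


def low_band_energy_fraction(signal, keep):
    """Fraction of total energy held by the first `keep` DCT coefficients."""
    C = dct_matrix(len(signal)) @ signal
    return np.sum(C[:keep] ** 2) / np.sum(C ** 2)


rng = np.random.default_rng(1)
N = 256
noise = rng.normal(size=N)  # uncorrelated signal

# Highly correlated signal: AR(1) process driven by the same noise.
correlated = np.empty(N)
correlated[0] = noise[0]
for n in range(1, N):
    correlated[n] = 0.95 * correlated[n - 1] + noise[n]

frac_corr = low_band_energy_fraction(correlated, keep=N // 8)
frac_noise = low_band_energy_fraction(noise, keep=N // 8)
print("correlated:", frac_corr, "uncorrelated:", frac_noise)
```

The correlated sequence keeps most of its energy in the lowest eighth of the coefficients, while the noise keeps roughly one eighth of its energy there, which is what a uniform spread would give.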

2.2 ORTHOGONALITY

The DCT basis functions are orthogonal. Thus, the inverse of the transformation matrix A is equal to its transpose, i.e. inv(A) = A', where A is the n x n DCT transformation matrix. Therefore, in addition to its decorrelation characteristics, this property results in a reduction of the pre-computation complexity.
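The orthogonality property is easy to verify numerically. The following sketch builds the orthonormal DCT matrix for n = 8 (an arbitrary illustrative size) and compares its inverse with its transpose; the helper name `dct_matrix` is a choice made here.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix A with A[u, x] = alpha(u) cos(pi (2x+1) u / (2n))."""
    x = np.arange(n)
    u = np.arange(n).reshape(-1, 1)
    A = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    A[0, :] = np.sqrt(1.0 / n)
    return A


A = dct_matrix(8)

# Largest deviation of A A' from the identity, and of inv(A) from A'.
identity_error = np.max(np.abs(A @ A.T - np.eye(8)))
inverse_error = np.max(np.abs(np.linalg.inv(A) - A.T))
print(identity_error, inverse_error)
```

Both deviations are at the level of floating-point round-off, so the inverse transform can be applied with a transpose instead of a matrix inversion.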

2.3 SYMMETRY

This is an extremely useful property, since it implies that the transformation matrix can be precomputed offline and then applied to the signal, thereby providing orders of magnitude improvement in computational efficiency.

2.4 DECORRELATION

The principal advantage of signal transformation is the removal of redundancy between neighboring samples. This leads to uncorrelated transform coefficients, which can be encoded independently. The amplitude of the autocorrelation after the DCT operation is very small; hence it can be inferred that the DCT exhibits excellent decorrelation properties.

2.5 SEPARABILITY

The DCT can be performed along one direction first and then along the other direction; the resulting coefficients do not depend on the order in which the two directions are processed.
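For two-dimensional data the separability property can be sketched as follows. The 4 x 6 random array is purely illustrative (for audio it could be read as a matrix of signal blocks, which is an assumption, not something stated in this paper), and `dct_matrix` is a helper defined here.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    x = np.arange(n)
    u = np.arange(n).reshape(-1, 1)
    A = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    A[0, :] = np.sqrt(1.0 / n)
    return A


rng = np.random.default_rng(3)
X = rng.normal(size=(4, 6))
A4, A6 = dct_matrix(4), dct_matrix(6)

# DCT down the columns first, then along the rows.
columns_then_rows = (A4 @ X) @ A6.T
# DCT along the rows first, then down the columns.
rows_then_columns = A4 @ (X @ A6.T)

print(np.max(np.abs(columns_then_rows - rows_then_columns)))
```

The two results agree up to round-off because matrix multiplication is associative, so a multidimensional DCT can be computed as a sequence of one-dimensional transforms.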

3. METHODOLOGY

This section presents the proposed technique as applied to an audio signal and describes how the signal is represented in sparse form.

Figure 1: Proposed Scheme

Although a sparse signal can be recovered from just a few measurements, compressive sampling needs to deal with speech signals which are only approximately sparse. Obtaining an accurate reconstruction of such a signal from highly undersampled measurements is the main issue. Ideally we would have to measure all N coefficients of f, but the CS framework allows us to observe and collect only a subset of them.

Figure 2: Audio Signal

Figure 3: FFT Amplitude of Audio Signal

As seen in Figure 2, the audio signal considered here (funky.wav) is not sparse in the time domain. Hence we have applied the Fast Fourier Transform (FFT), which represents the signal in the frequency domain in sparse form, as shown in Fig. 3.

Due to the matrix transformation in the compressive sampling program described in [10], the phase angle changes because the signal is represented by its real and imaginary parts. Hence the original signal will not be recovered by just applying the inverse Fourier transform.

4. RESULTS

Figure 4: Original Audio Signal

Figure 5: Spectrogram of Original Audio Signal

Figure 6: DCT of Original Audio Signal

Figure 7: Thresholding of signal after DCT

Figure 8: Random Measurement Matrix

Figure 9: Observation Vector

Figure 10: Reconstruction of Signal using l1-minimisation

Figure 11: Reconstructed Audio Signal after IDCT

Figure 12: Spectrogram of Reconstructed Audio Signal

As per the above discussion, we have taken a sample audio signal, shown in Figure 4. A spectrogram is a visual representation of the spectrum of frequencies in a sound [15]. For better visibility and understanding of this signal, we have constructed its spectrogram, shown in Figure 5. The spectrogram is a plot of normalized frequency against time.

As our first requirement is to generate a sparse signal, we have taken the DCT of the audio signal, shown in Figure 6. For better performance and good compression of the given audio signal, we can omit the unnecessary noisy samples by thresholding. The thresholding range can be chosen as needed. Here we have taken the thresholding range as -0.06 to 0.04. This range was decided by trial and error, selecting the range that gives the better output, as shown in Figure 7.
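One plausible reading of this thresholding step is that every DCT coefficient lying strictly inside the chosen range is set to zero while the larger coefficients are kept. The sketch below follows that reading; the interpretation, the function name `threshold_band` and the small coefficient vector are assumptions for illustration only.

```python
import numpy as np


def threshold_band(coeffs, low=-0.06, high=0.04):
    """Zero every coefficient strictly inside (low, high); keep the rest."""
    out = coeffs.copy()
    out[(out > low) & (out < high)] = 0.0
    return out


# Hypothetical DCT coefficients of a short signal block.
c = np.array([0.5, -0.01, 0.03, -0.2, 0.039, -0.07, 0.0, 0.041])
c_thr = threshold_band(c)

print(c_thr)
print("non-zero coefficients kept:", np.count_nonzero(c_thr))
```

Here four of the eight coefficients survive, so the thresholded vector is sparser than the raw DCT output, which is the property the subsequent CS step relies on.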

To generate the observation vector (Figure 9), we have taken random samples from the original audio signal and then constructed a matrix called the 'random measurement matrix', shown in Figure 8. Multiplying the thresholded signal by the random measurement matrix gives the observation vector, which is used in the subsequent reconstruction.

L1 minimization is a method of signal reconstruction from highly incomplete information [16], so we have used it for reconstruction of the audio signal. The signal reconstructed using L1 minimization is shown in Fig. 10. To recover the original audio signal from it, we have taken its IDCT, shown in Figure 11; its spectrogram is shown in Fig. 12.

In this experiment, we have to consider the number of samples, and the measure of compression ratio is given by

$$\text{Similarity} = \frac{e_2}{E_2} \qquad (4)$$

where e_2 is the error, given by the norm ||x − x_rec|| divided by the signal length, and E_2 is the signal power, given by the norm ||x|| divided by the signal length.
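The similarity measure can be computed in a few lines. The sketch below follows the definition given above; the SNR and PSNR formulas are the conventional ones and are included as assumptions, since the paper does not state which definitions were used for its tables. The two-sample vectors are chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def similarity(x, x_rec):
    """e2 / E2, with e2 = ||x - x_rec|| / N and E2 = ||x|| / N."""
    n = len(x)
    e2 = np.linalg.norm(x - x_rec) / n
    E2 = np.linalg.norm(x) / n
    return e2 / E2


def snr_db(x, x_rec):
    """Conventional SNR in dB (assumed definition): signal energy over error energy."""
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_rec) ** 2))


def psnr_db(x, x_rec):
    """Conventional PSNR in dB (assumed definition): squared peak over mean squared error."""
    mse = np.mean((x - x_rec) ** 2)
    return 10 * np.log10(np.max(np.abs(x)) ** 2 / mse)


x = np.array([3.0, 4.0])
x_rec = np.array([3.0, 0.0])

sim = similarity(x, x_rec)
snr = snr_db(x, x_rec)
psnr = psnr_db(x, x_rec)
print(sim, snr, psnr)
```

For these vectors ||x − x_rec|| = 4 and ||x|| = 5, so the similarity is 0.8; the division by the signal length cancels in the ratio.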

In Experiment-I, the sparsity is kept constant at 1000 and the compression factor is kept constant at 0.05. The block size is varied, and the compression ratio, SNR and PSNR are measured.

From Table 1 we can see that for small block sizes the measure of compression ratio is higher, but the SNR and PSNR are lower. As the block size increases, the measure of compression ratio broadly decreases, whereas the SNR and PSNR broadly increase, although the trend is not strictly monotonic. For block size 8, the compression ratio is 0.6877, the SNR is 3.3517 dB and the PSNR is 15.5773 dB, whereas for block size 512 the compression ratio is 0.59688, which is lower than at block size 8, and the SNR is 4.0212 dB and the PSNR is 16.4728 dB, which are higher than at block size 8.

Table-1: Measure of Compression Ratio, SNR and PSNR for various values of Block size

Sr. No.   Block Size   Measure of Compression Ratio   SNR (dB)   PSNR (dB)
1         8            0.6877                         3.3517     15.5773
2         16           0.64371                        3.1055     15.4329
3         32           0.60878                        3.6844     16.2396
4         64           0.58672                        3.4062     15.7336
5         128          0.55898                        3.5541     15.7331
6         256          0.58086                        3.4986     16.0433
7         512          0.59688                        4.0212     16.4728

In Experiment-II, the block size is kept constant at 128 and the compression factor is kept constant at 0.05. The sparsity is varied, and the compression ratio, SNR and PSNR are measured.

From Table 2 we can see that for small values of sparsity the SNR and PSNR are low, and that both increase as the sparsity increases.

For sparsity 128, the SNR is -0.56777 dB and the PSNR is 11.6113 dB, whereas for sparsity 2500 the SNR is 20.1459 dB and the PSNR is 32.3249 dB, which are much higher than the SNR and PSNR at sparsity 128.

Table-2: Measure of Compression Ratio, SNR and PSNR for various values of Sparsity

Sr. No.   Sparsity   SNR (dB)    PSNR (dB)
1         128        -0.56777    11.6113
2         300        -0.07818    12.1008
3         600        1.6737      13.8527
4         800        2.3328      14.5118
5         1000       3.2318      15.4109
6         1200       5.1285      17.3075
7         1500       6.8487      19.0277
8         1800       10.2104     22.3895
9         2000       14.0706     26.2496
10        2500       20.1459     32.3249

In Experiment-III, the block size is kept constant at 128 and the sparsity is kept constant at 1000. The compression factor (CF) is varied, and the measure of compression ratio, SNR and PSNR are measured.

From Table 3 we can see that for small compression factors the measure of compression ratio is higher but the SNR and PSNR are lower. For compression factor 0.01, the measure of compression ratio is 0.86133, the SNR is 3.395 dB and the PSNR is 15.7545 dB, whereas for compression factor 2.5 the measure of compression ratio is 0.0019531, which is much lower than at compression factor 0.01, and the SNR is 17.9247 dB and the PSNR is 32.8129 dB, which are much higher.

Table-3: Measure of Compression Ratio, SNR and PSNR for various values of Compression factor (CF).

Sr. No.   CF     Measure of Compression Ratio   SNR (dB)   PSNR (dB)
1         0.01   0.86133                        3.395      15.7545
2         0.03   0.68516                        3.9678     16.3307
3         0.05   0.55898                        4.0602     16.2393
4         0.08   0.4375                         3.7993     16.052
5         0.1    0.38125                        3.7526     15.9674
6         0.5    0.080469                       3.6443     16.7103
7         1      0.028516                       3.9149     17.0527
8         1.5    0.0125                         6.6961     19.4112
9         2      0.0054687                      10.7931    24.5405
10        2.5    0.0019531                      17.9247    32.8129

5. CONCLUSION

In the DCT representation of a speech signal, the input data is packed into a few coefficients. This helps the quantizer to remove coefficients with smaller amplitudes without generating audible distortion in the reconstructed signal.

Compressive sampling is mainly used for compression of images, but good results can also be achieved for audio by preprocessing the audio signal.

This technique can achieve a significant reduction in the number of samples required to represent a given audio signal, and it reduces the number of bytes required for encoding.

Further improvements are possible with advanced coding techniques such as wavelets or the DWT [23].

ACKNOWLEDGEMENTS

I want to express my sincere and grateful appreciation to my supervisor, Prof. Mr. Rakesh Mandliya, who tried his best to help me. Without his guidance I could not have brought the theories into practice. I also want to thank all my family members and friends for their constant support and spiritual motivation.

Thanks a lot!

REFERENCES

[1]. R. G. Moreno-Alvarado, Maurico Martinez-Garcia, “DCT Compressive Sampling of Frequency Sparse Audio Signals,” IEEE Trans. Inform. Theory, vol. II WCE 2011, July 6-8, 2011, London, U.K.

[2]. Adnan I. Hussein, “Multirate Audio Coding Based On Combining Wavelet with DCT Transform,” IEEE Trans Inform. Theory, Vol. 16, 9 Dec. 2008.

[3]. E. Candes, J. Romberg, and T. Tao, “Robust Uncertainty Principles: Exact Signal Reconstruction From Highly Incomplete Frequency Information,” IEEE Trans. Inform. Theory, vol. 52, pp. 489-509, Feb. 2006.

[4]. D. L. Donoho, “Compressed Sensing,” IEEE Trans. Inform. Theory, vol. 52, pp. 1289-1306, July 2006.

[5]. M. Lustig, D. L. Donoho, J. Santos, and J. Pauly, “Compressed Sensing MRI,” IEEE Signal Processing Magazine, March 2008.

[6]. E. Candes and J. Romberg, “Robust Signal Recovery From Incomplete Observations,” in Proc. ICIP, 2006, pp. 1281-1284.

[7]. E. Candes and J. Romberg, “Sparsity And Incoherence In Compressive Sampling,” Inverse Problems, vol. 23, pp. 969-985, June 2007.

[8]. E. Candes and J. Romberg, “Practical signal recovery from random projections,” 2005, preprint. [Online] Available: www.acm.caltech.edu/emmanuel/papers/PracticalRecovery.pdf

[9]. Sayed Ali Khayam, “The Discrete Cosine Transform (DCT): Theory and Application,” Department of Electrical & Computer Engineering, Michigan State University, 2003.

[10]. l1-magic. [Online]. Available: http://www.l1-magic.org, Oct 2010.

[11]. John G Proakis and Dimitris G Manolakis, “Digital Signal Processing,” ed., Pearson, 2007.

[12]. R. C. Gonzalez and P. Wintz, “Digital Image Processing,” ed., Prentice Hall, 2008.

[13]. T. V. Sreenivas and W. B. Kleijn, “Compressive Sensing for Sparsely Excited Speech Signals,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2009, pp. 4125-4128.

[14]. D. Giacobello, M. G. Christensen, J. Dahl, S. H. Jensen, and M. Moonen, “Sparse Linear Predictors for Speech Processing,” in Proc. Interspeech, 2008.

[15]. Haykin, Simon (1991). Advances in Spectrum Analysis and Array Processing. Prentice-Hall. ISBN 0-13-007444-6.

[16]. E. Candes and J. Romberg, “l1-Magic: Recovery of Sparse Signals via Convex Programming,” Oct. 2005.

[17]. http://users.ece.gatech.edu/~sasif/homotopy

[18]. R. Gribonval and M. Nielsen, “Sparse Representations In Unions Of Bases,” IEEE Transactions on Information Theory, 49 (2003), 3320-3325.

[19]. K. Jogdeo and S. M. Samuels, “Monotone Convergence of Binomial Probabilities And A Generalization of Ramanujan's Equation,” Annals of Math. Stat., 39 (1968).

[20]. H. J. Landau, “The Eigenvalue Behavior of Certain Convolution Equations,” Trans. Am. Math. Soc., 115 (1964).

[21]. H. J. Landau and H. O. Pollak, “Prolate Spheroidal Wave Functions, Fourier Analysis And Uncertainty II,” Bell Systems Tech. Journal, 40 (1961).

[22]. H. J. Landau and H. Widom, “Eigenvalue distribution of time and frequency limiting,” J. Math. Anal. App., 77 (1980).

[23]. S. D. Gunjal and Dr. R. D. Raut, “Advance Source coding compression Techniques: A Survey,” IJCTA Vol. 3 (4), 2012, 1335-1342.

BIOGRAPHIES

Mr. Sushilkumar Bapusaheb Shinde completed his B.E. in Electronics & Telecommunication in 2012 from SRESCOE, Kopargaon, Pune University, Pune and is currently pursuing an M.Tech. in VLSI from BMCT, Indore, RGPV, MP.

Prof. Mr. Rakesh Mandliya is Head of the EC Department, BMCT Indore, Madhya Pradesh, India.



