ESA-EUSC-JRC 2011, ISPRA, Varese, Italy, 2011.03.31
Keynote Speech

Realizing an Autonomous Recognizer Using Data Compression

Toshinori Watanabe, Professor, Dr. Eng.
Graduate School of Information Systems, UEC, Tokyo
[email protected], [email protected]

(Photo: T. Watanabe with a long komuso-shakuhachi)
Thank you, all of you, for the heartfelt help given to Japan since the catastrophic earthquake and tsunami in the middle of this month. Thank you also, Professor Datcu, committee members, and chairman, for permitting me to give my presentation in this irregular style. I had been looking forward to joining this meeting, meeting friends again, and enjoying the beautiful site around the lake, but I decided to forgo them and concentrate on my roles in Tokyo. Today, I'd like to talk about my favorite topic: the problem of realizing an autonomous recognizer using data compression.

Story
- Revisit the recognition problem
  - What is recognition?
  - Low-level recognition as FSQ (Feature Space Quantization)
- Clarify open problems in FSQ design
- Propose an autonomous FSQ
  - Compressibility-based general feature space using PRDC
  - Case-based nonlinear feature space quantizer TAMPOPO
  - CSOR: Compression-based Self-Organizing Recognizer
This slide shows my story. First, I will revisit the recognition problem. After asking "what is recognition?", I will give a view that the low-level recognition problem is the problem of feature space quantization, FSQ in short. Second, I will clarify open problems in FSQ design. Third, I will propose an autonomous FSQ with the compressibility-based general feature space using PRDC and the case-based nonlinear feature space quantizer TAMPOPO. Both are my own inventions. Combining them, I will introduce CSOR, a compression-based Self-Organizing Recognizer, as a possible autonomous FSQ.

What is recognition?
Approximately, it is a mapping cascade:
- Low level: from input signals to low-level labels
- High level: from low-level labels to high-level ones

(Figure: signals -> low-level labels -> high-level labels, via low-level and high-level mappings.)
This slide shows what recognition is. Approximately, it is a mapping cascade composed of at least two levels. The low-level mapping is from input signals to low-level labels, and the high-level mapping is from low-level labels to high-level ones. Today, I will concentrate on the former.

Low-level recognition example
Input: set of signals. Output: set of labels.
(Figure: an aerial image labeled with road, naked land, forest, and houses.)
This slide shows a low-level recognition example. The input is a set of color signals, and the output is a set of land cover labels.

Low-level recognition as the problem of FSQ
Representatives:
- ADALINE (Adaptive Linear Threshold)
- SVM (Support Vector Machine)
- SOM (Self-Organizing Map), etc.
They are all feature space quantizers (FSQ).
(Figure: a feature space is partitioned and labeled, e.g., building, sea, square, grass.)
This slide shows the representative low-level recognition methods. Representatives include, as many of you know, the adaptive linear threshold, the support vector machine, the self-organizing map, and so on. They are all feature space quantizers, in that all of them exploit some multi-dimensional feature space, as well as space partitioning and labeling functions, to get a quantized and labeled feature space.

Open problems in FSQ design
Two basic elements of FSQ:
- Preparation of a set of bases to span the feature space
- Preparation of a method to partition the space using observed vectors, i.e., cases
(Figure: a feature space spanned by features 1 and 2, populated with cases.)
Now let me point out open problems in FSQ design. Two basic elements of FSQ are shown here: the preparation of a set of bases to span the feature space, and the preparation of a method to partition the space using observed vectors, in other words, cases.

Open problems in FSQ design
- Feature space design: color histogram, Fourier coefficients, shape moments; problem-specific, not general
- Quantizer design: linear/nonlinear, offline/online; model-respecting and memory-saving, not individual (case)-respecting
This slide shows open problems in the design of these two elements. As for feature space design, we know the color histogram, Fourier coefficients, shape moments, and so on. I think these are problem-specific, and so not general. As for quantizer design, we have many types: linear or nonlinear, offline or online. I think these are strongly model-respecting and memory-saving, but not individual- or case-respecting.

My proposals: How to realize a highly autonomous FSQ
- Compression-based general feature space by PRDC
  - Compressibility feature space
  - Autonomous feature space generation process
  - Source signal textization
- Case-based feature space quantization by TAMPOPO
- CSOR: a possible autonomous FSQ
This slide summarizes my proposals on how to realize a highly autonomous FSQ. One is the compression-based general feature space based on PRDC, a pattern representation scheme using data compression. The other is the case-based feature space quantizer based on the TAMPOPO learning machine. Both are highly data-respecting approaches, in contrast to the traditional statistical model-respecting and memory-saving approaches. They require more memory, but they respect individual cases, and they are simple, robust to nonlinearity, and easy to modify. Combining them, I will propose CSOR, a compression-based Self-Organizing Recognizer, as a possible highly autonomous FSQ.

Text (sequence) featuring paradigm 1: Statistical information theory of Shannon
Tries to characterize a statistical set X:
- Self entropy H(X) = -Σ_x p(x) log p(x)
- Joint entropy H(XY) of X and Y
- Conditional entropy H(X|Y)
- Mutual information I(X;Y)
- I(X;Y) = H(X) + H(Y) - H(XY), i.e., H(XY) = H(X) + H(Y) - I(X;Y)
Required to know the occurrence probabilities of the target texts.
(Figure: Venn diagram relating H(X), H(Y), H(XY), H(X|Y), and I(X;Y).)
Before entering into details, let us visit two fundamental text-featuring paradigms. The first one is the statistical information theory of Shannon. It tries to characterize a statistical set X by using the self entropy H(X), the joint entropy H(XY), and so on. Of course, X can be a text set. However, we are required to know the occurrence probabilities of the target texts.
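As a quick illustration of these identities, the following sketch computes H(X), H(Y), H(XY), and the mutual information from a small joint distribution; the distribution itself is an arbitrary example, not from the talk.

```python
from math import log2

# Arbitrary joint distribution p(x, y) over X = {0, 1}, Y = {0, 1}.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in pxy.items() if b == y) for y in (0, 1)}

H = lambda dist: -sum(p * log2(p) for p in dist.values() if p > 0)
Hx, Hy, Hxy = H(px), H(py), H(pxy)
print(Hx + Hy - Hxy)   # I(X;Y), matching I(X;Y) = H(X) + H(Y) - H(XY)
```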
Text (sequence) featuring paradigm 2: Algorithmic information theory (AIT) of Kolmogorov
Tries to give the complexity of an individual text x:
- K(x) = min{ |P| : A(P) = x }, the size of the shortest program P that outputs x under some algorithm A
- K(x) is not statistical but defined on an individual x
- K(x) has similar properties to H(X)
- K(x) can't be calculated
AIT is the heavenly fire of Zeus.
The second text-featuring paradigm is the algorithmic information theory, AIT in short, of Kolmogorov. Different from Shannon, Kolmogorov tried to give the complexity of an individual text x. He defined the complexity K(x) as the minimum length of a program that can output x under some algorithm A. K(x) is not a statistical function but is defined on an individual x. K(x) has similar properties to the statistical entropy H(X). But regrettably, K(x) cannot be calculated at all. It is only a mathematical idea, not reachable by us. So AIT is, so to speak, the heavenly fire of Zeus.

LZ coding by Ziv and Lempel
- An approximation device to calculate K(x)
- The rate R of new phrase appearance when x is compressed by a self-delimiting encoder
- Proved that R converges to H(X) for long texts
They were the Prometheus who brought K(x) down to our earth.
The LZ coding scheme proposed by Ziv and Lempel is an approximation device to calculate K(x). They tried to approximate K(x) by the rate R of new phrase appearance when x is compressed by a self-delimiting encoder, and they proved that R converges to H(X) for long texts. So they are the Prometheus who brought K(x) down to our earth.

PRDC
- An early trial to exploit an individual object's complexity in real-world problem solving
- A general media data featuring scheme
- Compressibility vector (CV) space, spanned by compression dictionaries D1, D2, ..., Dn
- CV(X) = (CR(X|D1), CR(X|D2), ..., CR(X|Dn)), the compression ratios of X under each dictionary
- For generality enhancement: pre-textization and LZ-type text compression
My PRDC scheme is an early trial to exploit an individual object's complexity in real-world problem solving. It was proposed as a general media featuring scheme composed of several devices. These include the compressibility vector space spanned by compression dictionaries. For generality enhancement, pre-textization and an LZ-type text compressor were used.

Where is PRDC?
(Figure: related ideas placed on two axes, conceptual vs. real and parametric/statistical vs. non-parametric/algorithmic: Shannon's statistical information theory H(X); the algorithmic IT of Kolmogorov et al., K(x); AIT-based similarity of Li et al. and Datcu, Cerra, Gueguen, Mallet; the LZ compressor of Ziv and Lempel; and PRDC (Watanabe).)
This slide shows the position of PRDC. To get a rough vista, I have placed related ideas in a two-dimensional feature space. The horizontal axis represents whether an idea is conceptual or real, and the vertical axis represents whether it is parametric or not; in other words, statistical or algorithmic. Shannon's statistical information theory might be placed at the bottom. The classical algorithmic information theory of Kolmogorov et al. might be placed at the upper left: it was only conceptual, but it triggered the work of Ziv and Lempel, who invented the LZ compressor as a practical device to measure the complexity of a text. My PRDC would be placed nearby. The goals of the recent AIT-based similarity measure research by Li et al. and by Dr. Datcu's group coincide with my own goal; enhanced measures and wide applications are being proposed by them.

(Agenda slide repeated.)
Now let me return to my proposals. The first topic is the compression-based general feature space by PRDC. I will discuss the compressibility feature space first. Then I will propose an autonomous feature space generation process. The source signal textization will be visited only briefly.

Overview of PRDC
(Figure: signals from image, sound, and other sources are textized by media-specific methods; dictionary-based text compression maps the texts to compressibility vectors, the pivotal representation, which feed the feature space and applications.)
This slide shows the overview of PRDC. Signals from various information sources, such as images, sound, and others, are textized by media-specific methods, and the texts are mapped into a compressibility feature space by dictionary-based text compression. Signals of various types can thus be mapped into the compressibility feature space as vectors, which can be used for similarity-based retrieval, data mining, and so on.

Dictionary-based compression: LZW
(Figure: initial state. The input text aabababaaa is given; the dictionary contains the initial tree with the alphabet characters a and b on edges from the root, holding the codes 0 and 1; the current place cursor is at the start of the text.)
Let me explain dictionary-based compression in detail using the LZW compressor. This is the initial state. The original text aabababaaa is given, and the dictionary contains the initial tree as shown here. All the alphabet characters, in this case a and b, are memorized on two edges emanating from the root node to descendant nodes, each holding the short code 0 or 1. The current place cursor is placed at the start of the input text.

First cycle
(Figure: the prefix a is encoded as 0; the new phrase aa receives code 2; the cursor moves to the next start point, leaving bababaaa.)
The first cycle of the compression process is shown here. First, the longest prefix of the text, in this case the single character a, is found in the dictionary. Here, the longest prefix is the longest run of characters after the current cursor that can be found in the dictionary. As the found a has the code 0, 0 is output. Then, to extend the dictionary, a one-character peep, the following a in this case, is made, and a new phrase aa with the new code 2 is added. (Note that the added code is not used at this time; it is used from the next opportunity.) Finally, the cursor is moved to the next starting point.

Second cycle
(Figure: a is found again and 0 is output; the one-character extension ab is added with code 3; the cursor moves on, leaving ababaaa.)
This is the second cycle. a is found again, 0 is output, and the one-character extension ab is added to the dictionary with the new code 3. Then the cursor is moved.

Final state
(Figure: the whole text has been consumed; the output is 001352 and the dictionary holds codes 0-6.)

This is the final state. As there are no remaining characters, the cycle stops.

Behavior summary
- Input: aabababaaa
- Output: 001352
- Dictionary: codes 0-6 over the phrases a, b, aa, ab, ba, aba, abaa
- CR (Compression Ratio) = |output| / |input| = 6/10 = 0.6
This is the behavior summary. The input text is encoded into the output 001352. The dictionary is generated as shown here. The compression ratio, defined as the ratio of the output length to the input length, is 0.6.

Another example
- Input: bbbbbbbbbb
- Output: 1234
- Dictionary: almost linear (b, bb, bbb, bbbb)
- CR (Compression Ratio) = 4/10 = 0.4 (aabababaaa: 0.6)
This is another example. The input is a succession of 10 b's. The output is 1234, and the dictionary is almost linear. The compression ratio is 0.4, smaller than the previous 0.6, because this text contains many repeating phrases.
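The walkthrough above condenses into a few lines of code. The following is a minimal sketch of the LZW encoder as described on these slides (longest-prefix lookup, code output, one-character peep); it reproduces the traces for both example texts.

```python
def lzw_encode(text, alphabet=("a", "b")):
    """LZW encoding as in the slides: output codes and grow the dictionary."""
    dic = {ch: i for i, ch in enumerate(alphabet)}  # initial tree: a->0, b->1
    out, i = [], 0
    while i < len(text):
        # Find the longest prefix of the remaining text in the dictionary.
        j = i + 1
        while j <= len(text) and text[i:j] in dic:
            j += 1
        phrase = text[i:j - 1]
        out.append(dic[phrase])
        if j <= len(text):                  # one-character "peep"
            dic[text[i:j]] = len(dic)       # add the new phrase, next code
        i += len(phrase)
    return out, dic

codes, dic = lzw_encode("aabababaaa")
print(codes, len(codes) / 10)   # [0, 0, 1, 3, 5, 2], CR = 0.6
codes, _ = lzw_encode("bbbbbbbbbb")
print(codes, len(codes) / 10)   # [1, 2, 3, 4], CR = 0.4
```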
Compressibility vector space
What will happen if we compress a text Ty by the dictionary D(Tx) of another text Tx, using the LZW* method? LZW* uses D(Tx) in freeze mode.
Experiment: T1 = aabababaaa, T3 = aaaabaaaab, T2 = bbbbbbbbbb, T4 = bbbbabbbba. Dictionaries = (D(T1), D(T2)).

Now let me introduce the compressibility vector space. What will happen if we compress a text Ty by a dictionary D(Tx) of another text Tx, using a slightly different LZW* method? Here LZW* uses the dictionary D(Tx) in freeze mode; that is, the dictionary is fixed and does not grow during the compression. Let us experiment using the following four texts: T1 and T3, which are similar, and T2 and T4, which are also similar. We use the dictionaries (D(T1), D(T2)) that appeared before to span the feature space.

(Figure: the CV space spanned by CR under D(T1) and CR under D(T2); the four texts are plotted, with T1, T3 and T2, T4 forming two clusters.)
This slide shows the compressibility vector space. It is spanned by the two dictionaries D(T1) and D(T2). The compression ratio vectors of the four texts are shown. The similar texts T1 and T3, and T2 and T4, formed two clusters.

The projection of these vectors onto the horizontal axis can be interpreted as: D(T1) knows T1 and T3 better than it knows T2 and T4. Similarly, D(T2) knows T2 and T4 better than it knows T1 and T3.

Please notice that, in the CV space, each text is represented by not only positive but also negative opinions from all the dictionaries spanning the space. This is very important for realizing a high-resolution space that can store and retrieve many texts using a rather small dictionary set.
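To make the freeze-mode idea concrete, here is a small sketch building on the lzw_encode function above: it compresses each text with a fixed dictionary and prints the resulting compressibility vectors. The exact ratios depend on implementation details, so treat the numbers as illustrative rather than the talk's plotted values.

```python
def lzw_star_ratio(text, dic):
    """Compress `text` with a frozen dictionary `dic` (LZW*); return CR."""
    out, i = [], 0
    while i < len(text):
        j = i + 1
        while j <= len(text) and text[i:j] in dic:
            j += 1
        phrase = text[i:j - 1]
        out.append(dic[phrase])
        i += len(phrase)          # the dictionary is NOT extended here
    return len(out) / len(text)

_, d1 = lzw_encode("aabababaaa")  # D(T1)
_, d2 = lzw_encode("bbbbbbbbbb")  # D(T2)
for t in ("aabababaaa", "aaaabaaaab", "bbbbbbbbbb", "bbbbabbbba"):
    cv = (lzw_star_ratio(t, d1), lzw_star_ratio(t, d2))
    print(t, cv)                  # similar texts get similar CV coordinates
```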
(Agenda slide repeated.)
Now let me introduce the autonomous feature space generation process. I will identify some facts about the compressibility vector space and design the autonomous process based on them.

Fact 1: Local properties of CV space
(Figure: the CV space divides into four areas: known to both T1 and T2, known to T1 only, known to T2 only, and unknown to both.)
This slide shows the first fact, concerning the local properties of the CV space. The bottom-left area vectors are known to both T1 and T2. The bottom-right area vectors are known to T2 only. The top-left area vectors are known to T1 only. The top-right area vectors are unknown to both T1 and T2; in other words, they are independent of T1 and T2. This fact tells us that by gathering the top-right area vectors, we are able to extend the CV space autonomously.

Fact 2: Similar bases cause low resolution
(Figure: both axes spanned by D(bbbbbbbbbb); all four texts fall on the diagonal.)
This slide shows the second fact of the CV space. When both axes are spanned by identical texts, all the vectors fall on the diagonal line, so the resolution of the space becomes low. This fact tells us not to use similar texts as space bases.

Fact 3: Concatenated text causes low resolution
(Figure: the horizontal axis is CR by D(T1 = aabababaaa); the vertical axis is CR by D(T12 = T1T2 = aabababaaabbbbbbbbbb); all four texts have similar vertical values.)
This slide shows the third fact of the CV space. Suppose we use a concatenated basis, as shown on the vertical axis, where the two texts T1 and T2 are concatenated into T12. All the texts come to have similar y-axis values, and the resolution becomes low. This is because the concatenated vertical basis knows both T1 and T2 well. This fact tells us not to use texts that include too much as bases.

Fact 4: Splitting can enhance resolution
(Figure: the space with basis D(T12 = aabababaaabbbbbbbbbb) is replaced by two spaces with the split bases D(T1 = aabababaaa) and D(T2 = bbbbbbbbbb).)
This slide shows the fourth fact of the CV space. It is the reverse of fact 3. The long text T12, used for the y-axis in the left space, is split into the shorter T1 and T2 in the two spaces on the right. In these new spaces, the resolution is enhanced. This fact tells us that trimming the basis texts enhances the resolution. Of course, when the original text and its parts are all similar, this enhancement cannot be expected.

Autonomous CV space generation process
Define the CV space at step k as CVS(k) = [D(k), F(k)], where
- D(k) is the list of current basis dictionaries at step k
- F(k) is the list of current foreign segments at step k
Rewrite CVS(k) as follows, forever. Get an input text segment x (of reasonable length) and branch by cases:
- Case 1) some d* in D(k) nicely compresses x:
  - if d* is full, then D(k+1) = D(k), F(k+1) = F(k)
  - if d* is not full, then D(k+1) = D(k) - d* + ed*, F(k+1) = F(k)  (ed*: d* enlarged by x)
- Case 2) x is foreign to D(k) and F(k) is not full: add x to F(k), i.e., D(k+1) = D(k), F(k+1) = F(k) + x
- Case 3) x is foreign to D(k) and F(k) is full: extend D(k) using F(k). Let dd* be the dictionaries generated from ff* in F(k) by LZW (ff*: set of large similar groups); D(k+1) = D(k) + dd*, F(k+1) = F(k) - ff*.
Using these facts, we can introduce the following autonomous CV space generation process. First, define the CV space at step k as CVS(k) = [D(k), F(k)], where D(k) is the list of current basis dictionaries at step k and F(k) is the list of foreign segments at step k. Second, rewrite CVS(k) as follows, forever. Get an input text segment x of reasonable length; from fact 4, x should not be too long. Then branch by cases. Case 1 is when some current dictionary member d* compresses x well: if d* is full, do nothing; else, enlarge it by LZW compression of x. Case 2 is when x is foreign to the current dictionaries: add x to F(k) if it is not yet full. Case 3 is when x is foreign to the current dictionaries and F(k) is full: gather similar texts ff* in F(k) to generate a set of new dictionaries and add them to the current dictionaries D(k); ff* is then deleted from F(k).
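Here is a minimal runnable sketch of one rewrite step, reusing lzw_star_ratio from the earlier block. The thresholds (GOOD_CR, DICT_CAP, FOREIGN_CAP, GROUP_CR), the binary-alphabet seed, and the single-seed grouping rule for Case 3 are illustrative assumptions; the talk leaves these choices open.

```python
# Sketch of one CVS(k) -> CVS(k+1) rewrite step. All four constants and
# the grouping rule are assumptions made for illustration only.
GOOD_CR, DICT_CAP, FOREIGN_CAP, GROUP_CR = 0.7, 64, 4, 0.7

def enlarge(dic, text):
    """Grow an LZW dictionary in place by compressing `text` with it."""
    i = 0
    while i < len(text):
        j = i + 1
        while j <= len(text) and text[i:j] in dic:
            j += 1
        if j <= len(text):
            dic[text[i:j]] = len(dic)        # add longest-prefix + peep
        i += max(1, j - 1 - i)               # advance past the prefix
    return dic

def rewrite_step(D, F, x):
    crs = [lzw_star_ratio(x, d) for d in D]
    if crs and min(crs) < GOOD_CR:           # Case 1: some d* knows x
        d_star = D[crs.index(min(crs))]
        if len(d_star) < DICT_CAP:
            enlarge(d_star, x)               # ed*: d* enlarged by x
        return D, F
    if len(F) < FOREIGN_CAP:                 # Case 2: remember x as foreign
        return D, F + [x]
    # Case 3: F is full. Group foreign segments similar to the oldest one,
    # compile the group into a new basis dictionary, and drop the group.
    seed_dic = enlarge({"a": 0, "b": 1}, F[0])
    group = [F[0]] + [f for f in F[1:] if lzw_star_ratio(f, seed_dic) < GROUP_CR]
    for f in group[1:]:
        enlarge(seed_dic, f)
    return D + [seed_dic], [f for f in F if f not in group]
```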
Autonomous CV space generator: diagram
(Figure: texts are split into segments of a given length; the rewriter consumes the segments and maintains the current foreign segments and current basis dictionaries, which feed feature-space-based applications.)

This slide shows the overall diagram. Text is split into segments and input to the rewriter. The rewriter gets the segments one by one and renews the current foreign segments and the current basis dictionaries. At any time, the current basis dictionaries can be read out for feature-space-based applications.
(Agenda slide repeated.)
Now let me move to the source signal textization.

Source signal textization: Image
Image-MST (Minimum Spanning Tree)
(Figure: the pixel array is turned into a graph with color-difference edge weights, from which the image-MST is extracted.)

This slide shows a method of source signal textization for images. The original pixel array is transformed into a graph, with each edge carrying a pixel color difference. Then the MST is extracted so as to connect similarly colored pixels. Notice that MST edges tend to run along similarly colored contours, so they reflect part of the objects' shapes.

Source signal textization: Image
Textization by MST traversal
(Figure: the image-MST is traversed; an encoding table maps (color, direction) pairs to characters, producing the output text T = abbbabbbcdccffee.)
This slide shows textization by MST traversal. Each node is visited as shown by the dotted arrow along the MST. Each time we reach a new node, the encoding table is looked up and a character is output. For example, a is output when we arrive at a blue node along a horizontal edge, as shown in the table, and b is output if we arrive at a blue node along a vertical edge. We get the output text T as shown. At a branching node, our experience showed that the smallest-weight-edge-first or the simple depth-first rule works well.
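The following is a self-contained sketch of this pipeline on a tiny grayscale grid: Kruskal's algorithm builds the image-MST from color-difference weights, and a depth-first traversal emits one character per newly reached node. The two-color-class encoding table and the example image are assumptions for the demo, not the talk's actual table.

```python
# Image-MST textization sketch on a tiny grayscale "image" (assumed data).
img = [[10, 10, 200, 200],
       [12, 11, 198, 201]]
H, W = len(img), len(img[0])

# All 4-neighbor edges, weighted by absolute color difference.
edges = []
for r in range(H):
    for c in range(W):
        if c + 1 < W:
            edges.append((abs(img[r][c] - img[r][c + 1]), (r, c), (r, c + 1), "h"))
        if r + 1 < H:
            edges.append((abs(img[r][c] - img[r + 1][c]), (r, c), (r + 1, c), "v"))

# Kruskal's algorithm with union-find extracts the image-MST.
parent = {(r, c): (r, c) for r in range(H) for c in range(W)}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]       # path halving
        x = parent[x]
    return x

tree = {p: [] for p in parent}              # undirected MST adjacency
for w, u, v, d in sorted(edges):
    if find(u) != find(v):
        parent[find(u)] = find(v)
        tree[u].append((v, d))
        tree[v].append((u, d))

# Assumed encoding table: (color class, edge direction) -> character.
code = {("dark", "h"): "a", ("dark", "v"): "b",
        ("light", "h"): "c", ("light", "v"): "d"}

def emit(u, visited, out):                  # depth-first MST traversal
    for v, d in tree[u]:
        if v not in visited:
            visited.add(v)
            cls = "dark" if img[v[0]][v[1]] < 128 else "light"
            out.append(code[(cls, d)])      # one character per new node
            emit(v, visited, out)

out = []
emit((0, 0), {(0, 0)}, out)
print("".join(out))                         # text of length H*W - 1
```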
(Agenda slide repeated.)
Now let us move to the second topic, the case-based feature space quantization by TAMPOPO.

Case-based feature space quantization by TAMPOPO
Goal: incremental nonlinear quantization under successive case data arrival.
(Figure: a feature space with case data is quantized into a feature space with local labels L1-L4.)
The goal here is incremental nonlinear quantization under successive case data arrival. By this scheme we hope to get the quantized feature space with local labels shown in the right figure.

Possible scheme: the TAMPOPO learning machine
- TAMPOPO is the Japanese word for DANDELION, here a Duplication AND DELEtiON scheme for learning
- Basic ideas:
  - Individual case data representation mimicking the snow-cap shape
  - Evolutional rewriting of the case database
  - Nonlinear mapping formation by the territories of cases
This slide shows the possible scheme. My proposal is the TAMPOPO learning machine. Here TAMPOPO is the Japanese name of the DANDELION, which is also an abbreviation of the duplication-and-deletion scheme for learning. The basic ideas include individual case data representation mimicking the snow-cap shape, evolutional rewriting of the case database, and nonlinear mapping formation by the territories of cases.

The life of TAMPOPO
(Figure: seeds landing on water or sand die; seeds landing on high land or meadow live.)
This figure shows the life of TAMPOPO. Year by year, new-generation snow-caps are born by mutated duplication of their parents. They are carried by winds to many places. Those arriving at bad places, such as water or sand, will die. Those arriving at good places, such as highland or meadow, will succeed in living. Repeating this, the land is gradually covered by individuals, each with a pit-specific DNA. From the viewpoint of computation, this process is nothing less than online FSQ, wherein the whole land is the feature space and each pit-specific TAMPOPO's DNA is the label of the pit. The mapping from the feature space to the label set is generated fully autonomously.

The shape of a TAMPOPO
- Root: my key (feature)
- Trunk: my fitness score (smaller is better)
- Seed: my data (label)
- Upper fur: my possible worst score function
- Lower fur: my possible best score function
(Figure: a TAMPOPO drawn over the feature space F1 x F2, with the fitness score on the vertical axis.)
This slide shows the shape of an individual TAMPOPO. I have designed it mimicking the figure of the dandelion's snow-cap. It has a root, a trunk, a seed, and upper and lower furs. The root denotes the feature vector of the environment where this TAMPOPO was implanted. The trunk height denotes the fitness score for the environment; we assume a smaller score is better. The seed denotes the output or response data, that is, the DNA. Depending on the application, it may be a real vector or an integer label. The upper fur gives the possible worst score if the TAMPOPO were implanted around its root. In contrast, the lower fur denotes the possible best score.

Superior / inferior / incomparable relation
T1 is superior to T2 when the possible score range of T1 is always better than that of T2.
(Figure: two TAMPOPOs T1 (seed C1) and T2 (seed C2) over the feature space; T1's fur cone lies entirely below T2's.)
This slide shows the superior/inferior/incomparable relation between two TAMPOPOs, T1 and T2. In this case, T1 is superior to T2, or T2 is inferior to T1, because the possible score range of T1 is better than that of T2 at every point of the feature space. T1 can know this by comparing its own score with the value of T2's lower fur at T1's root. If this relation holds in neither direction, we say T1 and T2 are incomparable.

Acquisition of the mapping: F -> C
Nonlinear mapping acquired: F1 -> C1, F2 -> C2, F3 -> C2, F4 -> C1.
(Figure: four TAMPOPOs T1-T4 over a one-dimensional feature space; their upper furs carve the space into territories F1-F4.)

This slide shows the acquisition of the mapping between the feature space and the output data set. Suppose the current TAMPOPO database is as shown here; for ease of explanation, a one-dimensional feature space example is used. We can use the upper furs to introduce the territories F1, F2, F3, and F4 in the feature space: the break-even point of the upper furs of two adjacent TAMPOPOs induces a territory border. The background semantics of F2, for example, is that, for all points in F2, the TAMPOPO T2's data C2 gives the best possible score. Each of the TAMPOPOs T1, T2, T3, and T4 introduces a piece of the mapping: from F1 to C1, F2 to C2, F3 to C2, and F4 to C1, respectively. Notice that C1 is reached from both F1 and F4. This means that a nonlinear mapping is realized very easily.
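A one-dimensional sketch of these definitions follows. The linear fur slope A is an assumption; the talk only requires that the furs widen away from the root. recall implements arg.best(worst(f*)), and the printed labels reproduce the F1->C1, F2->C2, F3->C2, F4->C1 pattern.

```python
# Minimal 1-D TAMPOPO sketch; the fur slope A is an assumed parameter.
from dataclasses import dataclass

A = 1.0                                  # fur slope (assumption)

@dataclass
class Tampopo:
    root: float                          # key: feature where it was implanted
    score: float                         # trunk: fitness (smaller is better)
    seed: object                         # data / label (its "DNA")

    def worst(self, f):                  # upper fur
        return self.score + A * abs(f - self.root)

    def best(self, f):                   # lower fur
        return self.score - A * abs(f - self.root)

def superior(t1, t2):
    """t1's whole score range beats t2's everywhere (smaller is better)."""
    return t1.score <= t2.best(t1.root)  # compare score vs. t2's lower fur

def recall(db, f):
    """arg.best(worst(f)): the record with the best possible worst score."""
    return min(db, key=lambda t: t.worst(f))

db = [Tampopo(0.0, 1.0, "C1"), Tampopo(2.0, 0.5, "C2"),
      Tampopo(4.0, 0.8, "C2"), Tampopo(6.0, 0.6, "C1")]
for f in (0.5, 2.5, 4.5, 6.5):
    print(f, recall(db, f).seed)         # territories induce the F -> C map
```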
Rewrite: Recall the best with duplication
(Figure: (1) the input query vector f* arrives; (2) T3 = arg.best(worst(f*)) is recalled and duplicated as T3*, whose root is moved to f*.)

Let me explain the rewriting process next. The first step is to recall the best with duplication. (1) Suppose the input query vector f* arrives as shown. (2) Then the TAMPOPO that gives the best possible worst score at f* is recalled, in this case T3. We duplicate T3 as T3*, but notice that the root is changed to f*.
Rewrite: Modify the seed, output, and get the score
(Figure: the seed c1 of T3* is modified to c1*; c1* is applied to the environment, which returns the fitness score j*.)

Next, the seed c1 is modified to c1* by some means, for example random mutation. This c1* is applied to the outer environment, and the fitness score j* is acquired. Combining them, a new-generation TAMPOPO T3* is defined. This is a case datum asserting that, under the environment feature f*, the output c1* brings the score j*.
Rewrite: Implant it with inferior deletion
(Figure: in the old DB, T2 is inferior to T3*; T2 is deleted and T3* is implanted, giving the new DB.)

Finally, the implantation of the new generation T3* is tried. As shown here, T2, which is inferior to T3*, is deleted, and T3* is implanted, giving the new DB. If there is no inferior member, T3* is simply implanted as long as memory is available; otherwise, T3* is cast away.
Evolutional rewriting of individuals
- Recall the best element, with duplication
- Modify its seed vector, output it, and get its score
- Implant it, with inferior deletion
(Figure: the rewriter consumes a feature vector and an environmental score, outputs a vector, and turns the old DB into the new DB.)

This slide summarizes the above processes. Upon receiving a feature vector, the rewrite process outputs a response and turns the old TAMPOPO database into a new one. This process is repeated forever, and the nonlinear mapping between the feature space and the output vector space is formed autonomously.
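Continuing the sketch above (with numeric seeds rather than labels), one full rewrite cycle might look as follows; the Gaussian mutation rule, the memory cap, and the environment callable are assumptions.

```python
import random

def mutate(seed, sigma=0.1):
    """Assumed mutation rule: jitter the seed value."""
    return seed + random.gauss(0.0, sigma)

def rewrite(db, f_star, environment, memory_cap=32):
    """One evolutional rewriting cycle: recall, modify, score, implant."""
    parent = recall(db, f_star)                 # recall the best at f*
    new_seed = mutate(parent.seed)              # modify the duplicated seed
    score = environment(f_star, new_seed)       # apply it, get fitness j*
    child = Tampopo(f_star, score, new_seed)
    inferior = [t for t in db if superior(child, t)]
    if inferior:                                # implant, deleting inferiors
        db = [t for t in db if t not in inferior] + [child]
    elif len(db) < memory_cap:                  # room left: just implant
        db = db + [child]
    return db                                   # else: child is cast away
```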
How to get the mapping Feature -> Label for FSQ
- Introduce a new threshold score Jc
- If arg.best(worst(f*)) < Jc, then recall it; else implant a new child with a new label and a default score Jd at f*
- (Possibly) add an aging score to all
(Figure: no TAMPOPO's upper fur at f* is below Jc, so a new child with a new label and score Jd is implanted at f*.)
This slide shows how to get the mapping from a feature space to an integer label set when neither the label nor the environmental score is given. This is often the case when FSQ is applied to low-level image recognition. One idea is shown here. Instead of the environmental score, let us introduce a permissible fixed score Jc to judge the necessity of a new TAMPOPO. In the recall step, if we can find a TAMPOPO whose upper-fur value at f* is less than Jc, we recall it; but if this is not the case, as shown here, no similar TAMPOPO exists yet, so we implant a new child with a new integer label and a default score, say Jd. Finally, we add an aging score to all, so as to make it possible to forget old cases. By using appropriate values of Jc and Jd and appropriate aging parameters, the mapping from a feature space to integer labels can be formed automatically.
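A sketch of this labeling rule, again on top of the earlier Tampopo/recall definitions; the values of JC, JD, and the aging increment are assumptions.

```python
# Label formation without an external score or label (JC, JD, AGING assumed).
JC, JD, AGING = 1.2, 0.5, 0.01

def quantize(db, f_star, next_label):
    """Return (db, label, next_label); create a new label if f* is foreign."""
    for t in db:
        t.score += AGING                    # age all cases: old ones fade
    if db:
        t = recall(db, f_star)
        if t.worst(f_star) < JC:            # a close-enough case exists
            return db, t.seed, next_label
    child = Tampopo(f_star, JD, f"L{next_label}")
    return db + [child], child.seed, next_label + 1
```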
(Agenda slide repeated.)
Finally, combining these two schemes, let me propose CSOR, the compression-based Self-Organizing Recognizer, as a possible highly autonomous FSQ.

CSOR: Possible autonomous FSQ
(Figure: signal source -> textizer -> texts -> autonomous feature (CV) space generator, holding the current basis dictionaries and current foreign segments -> compression -> feature vectors (CVs) -> autonomous feature (CV) space quantizer, holding the TAMPOPO DB -> recognized labels.)
This slide shows CSOR. The input signals are textized, and the feature space generator generates basis dictionaries incrementally. Foreign text segments are gathered and used to extend the basis dictionaries automatically. The dictionaries are applied to input texts to get their compressibility feature vectors. Using these vectors, the TAMPOPO-based feature space quantizer outputs their labels, side by side with the generation of the mapping from the feature space to labels, in an online, autonomous mode.

Application: Land cover analysis 1
I have not yet finished checking the overall performance of CSOR, so I will show only the utility of its core part. This is the result of a compression-based land cover analysis. The image-MST was cut into similarly colored segments, and each of them was processed by PRDC to get its feature vector. Instead of the full TAMPOPO learning process, each of the representative land cover texts was memorized as a pair of a feature vector and its label, and the output label for a query was determined by simple nearest-neighbor search. The canal, sea, road, rail, and large buildings could be recognized.

Application: Land cover analysis 2
This is another example. The sea, woods, lands, and roads could be recognized.
Summary
In this presentation:
- I have picked up the problem of low-level recognizer design in a highly autonomous mode
- It is a feature space quantizer (FSQ) construction problem
- A possible solution, CSOR, is proposed
  - CSOR: Compression-based Self-Organizing Recognizer
  - Main components: a general compression-based feature space using PRDC, and an online feature space quantizer based on TAMPOPO
- CSOR is highly data-respecting and fits the modern computer with rich memory

This slide is the summary. In this presentation, I have picked up the problem of low-level recognizer design in a highly autonomous mode, and noted that it is a feature space quantizer construction problem. A possible solution, CSOR, is proposed; the nickname comes from Compression-based Self-Organizing Recognizer. It is composed of a general compression-based feature space using PRDC and an online feature space quantizer based on TAMPOPO. CSOR is highly data-respecting and fits the modern computer with rich memory.
References

PRDC
(1) T. Watanabe, K. Sugawara, and H. Sugihara, "A new pattern representation scheme using data compression," IEEE Trans. PAMI, Vol. 24, No. 5, pp. 579-590, 2002.

TAMPOPO
(1) T. Watanabe, K. Sasaki, and K. Ihara, "DANDELION: Duplication and deletion strategy which realizes an autonomous learner," JIPSJ, Vol. 24, No. 6, pp. 847-856, 1983. (in Japanese)
(2) T. Watanabe, "TAMPOPO: An evolutionary learning machine based on the principle of realtime minimum skyline detection," Advances in Software Science and Technology, Academic Press, 1994.

Lecture
T. Watanabe, "On the possibility of highly automated image information mining: problems and possible solutions," Workshop on Innovative Data Mining Techniques in Support of GEOSS, August 31 - September 2, 2009, Sinaia, Romania. http://events.rosa-rc.ro/index.php/GEOSS_Sinaia/GEOSS_09/schedConf/program
These are the references for PRDC, TAMPOPO, and related past lectures.

Thank you for your attention.

Toshinori Watanabe, [email protected]
(Photo: Komuso and children at dusk)
Part of this research was supported by JSPS 19500076.