
Classification and Clustering of Spatial Patterns with Geometric Algebra

Minh Tuan Pham, Kanta Tachibana, Eckhard M. S. Hitzer, Tomohiro Yoshikawa, and Takeshi Furuhashi

Abstract In fields of classification and clustering of patterns, most conventional methods of feature extraction do not pay much attention to the geometric properties of data, even in cases where the data have spatial features. This paper proposes to use geometric algebra to systematically extract geometric features from data given as a series or a set of vectors in a vector space. We show the results of classification of hand-written digits and those of clustering of consumers' impressions with the proposed method.

1 Introduction

Nowadays classification and clustering of patterns are of central importance for the discovery of information in the enormous amounts of data available in various practical fields. An appropriate method to extract features from patterns is needed for good classification and clustering. But so far most conventional methods of feature extraction ignore the geometric properties of data, even in cases where the data have spatial features. For example, when m vectors are measured from an object in three-dimensional space, conventional methods represent the object by x ∈ R^{3m}, the vector made by arranging the m groups of 3 coordinates of each vector in a row. However, using only these coordinate values fails to capture the geometric relationships among the m vectors: because the coordinate values depend on the choice of coordinate system, inference or classification becomes remarkably bad when objects are measured in a coordinate system different from the one used for learning. Some conventional methods may extract coordinate-free features, but

Minh Tuan Pham, Kanta Tachibana, Tomohiro Yoshikawa, and Takeshi Furuhashi
Nagoya University, Furou 1-1, Chikusa, Nagoya, Japan
e-mail: {[email protected], [email protected], yoshikawa@cse, furuhashi@cse}.nagoya-u.ac.jp

Eckhard M. S. Hitzer
University of Fukui, Bunkyo 3-9-1, Fukui, Japan
e-mail: [email protected]


whether such features arise and are adopted depends on the experience of the model builder.

In this study, we use geometric algebra (GA) [1, 2, 3] to systematically derive various kinds of feature extractions, and to improve precision and robustness in classification and clustering problems. There are already many successful examples of its use, e.g. in colored image processing or multi-dimensional time-series signal processing with low dimensional GAs [4, 5, 6, 7, 8, 9, 10]. In addition, GA-valued neural network learning methods for learning input-output relationships [11] are well studied. In our proposed method, geometric features extracted with GA can also be used for learning a distribution and for semi-supervised clustering.

We use geometric features to learn a Gaussian mixture model (GMM) with the expectation maximization (EM) algorithm [12]. Because each feature extraction derived by the proposed method has its own advantages and disadvantages, we apply a mixture of several GMMs to the classification problem. As an example of multi-class classification of geometric data, we use a hand-written digit dataset. When classifying new hand-written digits in practice with the learned model, it is natural to expect that the coordinate system in a real new environment differs from the one used for obtaining the learning dataset. Therefore, in this paper, we evaluate the classification performance on randomly rotated test data.

As a second application, we analyze a dataset of a questionnaire for a newly developed product. Characteristics of this dataset are:

1. The same m questions are asked for n different objects (usage scenes of the product);

2. Each subject answers the willingness-to-buy question for one of three different prices, and does not answer for the other two prices.

Considering the first characteristic, we regard a subject's pattern of answers to the questions as a tuple of m points in an n-dimensional space. This aims at extracting features of the n-dimensional shape formed by the m vectors with GA. For the second characteristic, we utilize harmonic functions [13] for semi-supervised learning. In our proposed method, geometric features extracted with GA can be used to define a weighted graph over unlabeled and labeled data, where the weights are given in terms of a similarity function between subjects. This paper reports a result of semi-supervised clustering of subjects taking the geometric properties of the questionnaire into consideration.

2 Method

This section describes our proposal to extract geometric features from spatial patterns for classification and clustering. Our general scheme is based on a description by GA of the shape formed by an m-tuple of n-dimensional vectors.


2.1 Feature extraction with GA

An orthonormal basis {e_1, e_2, ..., e_n} can be chosen for a real vector space R^n. The GA of R^n, denoted by G_n, is constructed by an associative and bilinear product of vectors, the geometric product, which is defined by

\[
e_i e_j =
\begin{cases}
1 & (i = j), \\
-e_j e_i & (i \neq j).
\end{cases}
\tag{1}
\]

GAs are also defined for negative squares e_i^2 = −1 of some or all basis vectors. Such GAs have many applications in computer graphics, robotics, virtual reality, etc. [3]. But for our purposes definition (1) will be sufficient.
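To make definition (1) concrete, here is a minimal Python sketch (our illustration, not part of the original paper) of the geometric product in G_2, the algebra used later for the hand-written digits. A multivector of G_2 is stored as its four coefficients on the basis (1, e_1, e_2, e_12).

```python
import numpy as np

def gp(a, b):
    """Geometric product in G_2.

    a and b are length-4 coefficient arrays on the basis (1, e1, e2, e12),
    with e1 e1 = e2 e2 = 1 and e1 e2 = -e2 e1 = e12 as in eq. (1).
    """
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return np.array([
        a0*b0 + a1*b1 + a2*b2 - a12*b12,  # scalar part <ab>
        a0*b1 + a1*b0 - a2*b12 + a12*b2,  # e1 coefficient
        a0*b2 + a2*b0 + a1*b12 - a12*b1,  # e2 coefficient
        a0*b12 + a12*b0 + a1*b2 - a2*b1,  # e12 coefficient
    ])

# For the vectors p = e1 + 2 e2 and q = 3 e1 + 4 e2, the product
# pq = p.q + p^q yields the scalar 11 and the bivector -2 e12:
print(gp([0, 1, 2, 0], [0, 3, 4, 0]))  # -> [11, 0, 0, -2]
```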

The geometric product of linearly independent vectors a_1, ..., a_k (k ≤ n) has as its maximum grade term the k-blade a_1 ∧ ... ∧ a_k. Linear combinations of k-blades are called k-vectors, represented by ∑_{I ∈ I_k} w_I e_I, where I_k = {i_1...i_k | 1 ≤ i_1 < ... < i_k ≤ n} is the set of combinations of k elements from {1, ..., n}. For G_n, ∧^k R^n denotes the set of all k-blades and G_n^k denotes the set of k-vectors. The geometric product of k vectors yields

\[
a_1 \cdots a_k \in
\begin{cases}
\mathcal{G}_n^1 \oplus \cdots \oplus \mathcal{G}_n^{k-2} \oplus \bigwedge^k \mathbb{R}^n & (\text{odd } k), \\
\mathcal{G}_n^0 \oplus \cdots \oplus \mathcal{G}_n^{k-2} \oplus \bigwedge^k \mathbb{R}^n & (\text{even } k).
\end{cases}
\tag{2}
\]

Now we propose a systematic derivation of feature extractions from a series or a set of spatial vectors ξ = {p_l ∈ R^n, l = 1, ..., m}. Our method is to extract k-blades of different grades k, which encode the variations of the features. The scalars which appear in eq. (2) in the case k = 2 can also be extracted.

First, assuming ξ is a series of n-dimensional vectors, n′+1 feature extractions are derived, where n′ = min{n, m}. For k = 1, ..., n′,

\[
f_k(\xi) = \{\langle p_l \cdots p_{l+k-1}\, e_I^{-1} \rangle \mid I \in \mathcal{I}_k,\ l = 1, \ldots, m-k+1\} \in \mathbb{R}^{(m-k+1)|\mathcal{I}_k|},
\tag{3}
\]

where ⟨·⟩ denotes the operator that selects the scalar part, |I_k| is the number of combinations of k elements from n elements, and e_I^{-1} is the inverse of e_I; for I = i_1...i_k, e_I^{-1} = e_{i_k} ... e_{i_2} e_{i_1}. We further define

\[
f_0(\xi) = \{\langle p_l\, p_{l+1} \rangle \mid l = 1, \ldots, m-1\} \in \mathbb{R}^{m-1}.
\tag{4}
\]

Next, assuming that ξ is a set of vectors, n′+1 feature extractions can also be derived in the same way:

\[
f_k(\xi) = \{\langle p_{l_1} \cdots p_{l_k}\, e_I^{-1} \rangle \mid I \in \mathcal{I}_k\} \in \mathbb{R}^{({}_mC_k)|\mathcal{I}_k|},
\tag{5}
\]

\[
f_0(\xi) = \{\langle p_{l_1}\, p_{l_2} \rangle\} \in \mathbb{R}^{{}_mC_2 + m},
\tag{6}
\]

where _mC_k is the number of combinations when choosing k elements from m elements. The dimension of the feature space differs from the case where ξ is a series.

Fig. 1 Flow of multi-class classification. The top diagram shows the training of the GMM for class C ∈ {'0', ..., '9'}. D^1_C denotes the subset of training samples whose label is C. f : ξ ↦ x denotes the feature extraction; one of {f_1, f_2, f_0} is chosen as f. The bottom diagram shows estimation by the learned GMMs; the same f chosen for training is used here. GMM_C outputs p(ξ | C). The final estimation is C* = argmax_C p(ξ | C) P(C), where P(C) is the prior distribution. The set D^3 consists of independent test data.
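As a worked specialization (our illustration, assuming n = 2 as in the digit experiment of section 3.1): for a series of 2-D points, eq. (3) with k = 1 yields the raw coordinates, k = 2 yields the outer-product coefficients ⟨p_l p_{l+1} e_12^{-1}⟩ = x_l y_{l+1} − y_l x_{l+1}, and eq. (4) yields the inner products of consecutive points.

```python
import numpy as np

def ga_features_series(P):
    """Eqs. (3)-(4) for a series of m points in R^2 (an (m, 2) array).

    Returns the three feature vectors used in the paper:
      f1 - coordinates <p_l e_i^{-1}>             (length 2m)
      f2 - outer products <p_l p_{l+1} e12^{-1}>  (length m-1)
      f0 - inner products <p_l p_{l+1}>           (length m-1)
    """
    f1 = P.reshape(-1)
    f2 = P[:-1, 0] * P[1:, 1] - P[:-1, 1] * P[1:, 0]
    f0 = np.sum(P[:-1] * P[1:], axis=1)
    return f0, f1, f2
```

Note that f_0 and f_2 consist of inner and outer products of point pairs and are therefore invariant under rotations of the coordinate system, which is consistent with the observation in section 3.1 that rotations have no influence on them.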

2.2 Distribution Learning and its Mixture for Classification

A GMM is useful to approximate a data distribution in a data space. A GMM is characterized by parameters Θ = {β_j, μ_j, Σ_j}, where β_j, μ_j and Σ_j are the mixture ratio, the mean vector, and the variance-covariance matrix of the j-th Gaussian, respectively. The output is

\[
p(\xi \mid \Theta) = \sum_{j=1}^{M} \beta_j\, \mathcal{N}_d(f(\xi) - \mu_j;\ \Sigma_j),
\tag{7}
\]

where N_d(·; ·) is the d-dimensional Gaussian distribution function whose center is fixed at the origin.

To train the M Gaussians from given incomplete data X = {x_i = f(ξ_i) | 1 ≤ i ≤ N}, the EM algorithm [12] is often utilized. The algorithm identifies both the parameters Θ and the latent variables Z = {z_ij ∈ {0,1} | 1 ≤ j ≤ M}. The z_ij are random variables which, for z_ij = 1, indicate that the individual datum x_i belongs to the j-th of the M Gaussian distributions; thus ∑_{j=1}^{M} P(z_ij) = 1. The EM algorithm repeats the E-step and the M-step until P(Z) and Θ converge. The E-step updates the probabilities of Z according to Bayes' theorem, P(Z | X, Θ) ∝ p(X | Z, Θ). The M-step updates the parameters Θ of the Gaussians to maximize the likelihood l(Θ, X, Z) = p(X | Z, Θ).
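A minimal sketch of this training and estimation flow, assuming scikit-learn's GaussianMixture as the EM implementation (the paper does not name a library; the choice of M by validation on D^2 is omitted here):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(X_by_class, M):
    """Fit one GMM per class by EM.

    X_by_class maps a digit label C to the (N_C, d) array of features
    f(xi) extracted from the training subset D1_C.
    """
    return {C: GaussianMixture(n_components=M).fit(X)
            for C, X in X_by_class.items()}

def classify(gmms, priors, x):
    """C* = argmax_C p(x|C) P(C), computed in log space."""
    score = {C: g.score_samples(x[None, :])[0] + np.log(priors[C])
             for C, g in gmms.items()}
    return max(score, key=score.get)
```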


The flow of training and estimation for hand-written digit classification, as an example of multi-class classification, is shown in Fig. 1. The number M for each GMM is decided by validation with dataset D^2.

Each feature extraction derived with GA has advantages and disadvantages. A big merit of learning distributions rather than input-output relations is that the learned distributions allow us to obtain reliable inference by mixing plural weak learners. In this study, we therefore use a mixture of GMMs. Inferences are mixed as

\[
p(\xi \mid C) = \prod_{k=0}^{2} p(f_k(\xi) \mid C).
\tag{8}
\]
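In code, mixing the three experts of eq. (8) amounts to summing the per-feature log-likelihoods before the argmax; a sketch continuing the hypothetical helpers above:

```python
def mixed_log_likelihood(gmms_k, feats_k, C):
    """Eq. (8) in log space: log p(xi|C) = sum_k log p(f_k(xi)|C).

    gmms_k[k][C] is the GMM trained on feature space f_k, and
    feats_k[k] is the feature vector f_k(xi) of the sample.
    """
    return sum(gmms_k[k][C].score_samples(feats_k[k][None, :])[0]
               for k in (0, 1, 2))
```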

2.3 Semi-Supervised Learning for Clustering

Suppose we have labeled points L = {(x_1, y_1), ..., (x_l, y_l)} and unlabeled points U = {x_{l+1}, ..., x_{l+u}}, where x_i ∈ R^M for i = 1, ..., p (= l + u) and y_i ∈ {0,1}. The goal is to find a binary function γ : U → {0,1} so that similar points have the same label. First, we assign temporary labels to all data points by a minimum spanning tree built with Kruskal's algorithm based on Euclidean distance. Then we set σ = d_0/3, where d_0 is the median length of the edges which connect two points with different temporary labels. A similarity function between two instances can be defined as

\[
w_{ij} = \exp\!\left(-\sum_{d=1}^{M} \frac{(x_{id} - x_{jd})^2}{\sigma^2}\right),
\tag{9}
\]

where x_{id} is the d-th component of instance x_i. From it we can construct the p × p symmetric weight matrix W = [w_{ij}], which can be partitioned as

\[
W = \begin{bmatrix} W_{LL} & W_{LU} \\ W_{UL} & W_{UU} \end{bmatrix}
\tag{10}
\]

at the l-th row and the l-th column. Zhu et al. [13] proposed to compute a real-valued function g : U → [0,1] which minimizes the energy

\[
E(g) = \sum_{i,j} w_{ij}\,(g(i) - g(j))^2.
\tag{11}
\]

Restricting g(i) = g_L(i) ≡ y_i for the labeled data, g for the unlabeled data can be calculated as

\[
g_U = (D_{UU} - W_{UU})^{-1} W_{UL}\, g_L,
\tag{12}
\]

where D_{UU} = diag(d_i) is the diagonal matrix with entries d_i = ∑_j w_{ij} for the unlabeled data. Then γ(i) is decided using the class mass normalization proposed by Zhu et al. [13].
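A compact NumPy sketch of eqs. (9)-(12) (ours; it takes σ as given instead of re-implementing the MST heuristic, and omits the class mass normalization step):

```python
import numpy as np

def harmonic_g_u(X, y_L, l, sigma):
    """Harmonic-function label propagation, eqs. (9)-(12).

    X is the (p, M) data matrix with the l labeled points first,
    y_L their {0,1} labels, sigma the bandwidth (d0/3 in the paper).
    Returns g_U in [0,1] for the p - l unlabeled points.
    """
    # Eq. (9): RBF similarities w_ij.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / sigma ** 2)
    # Eq. (10): blocks of W at the l-th row and column.
    W_UL, W_UU = W[l:, :l], W[l:, l:]
    # Eq. (12): g_U = (D_UU - W_UU)^{-1} W_UL g_L.
    D_UU = np.diag(W.sum(axis=1)[l:])
    return np.linalg.solve(D_UU - W_UU, W_UL @ y_L)
```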

Because one of three different prices {φ_1, φ_2, φ_3 | φ_1 < φ_2 < φ_3} is indicated to a subject when he/she answers the willingness-to-buy question, we divide the subjects into three sets according to the indicated price. Then we calculate γ_{φ_k} for each k ∈ {1,2,3}, regarding one set of subjects as labeled and the other sets as unlabeled. After that, we check for each subject i the consistency condition that willingness decreases weakly monotonically with the price, i.e.

\[
\gamma_{\varphi_1}(i) \ge \gamma_{\varphi_2}(i) \ge \gamma_{\varphi_3}(i).
\tag{13}
\]

If subject i contradicts this condition, we clear the label y_i and repeat the semi-supervised learning, regarding such subjects as unlabeled from then on. As shown in Fig. 2, we repeat this procedure until no more contradictions occur.

Fig. 2 Algorithm to find latent willingness to buy. From questionnaire data ξ_i, f ∈ {f_0, f_1, f_2} extracts geometric features x_i. The label y*_i is initially set to y_i. After calculating γ_φ(i), the labels of contradicting subjects are excluded from {y*_i}. The algorithm ends when no more contradictions occur.
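The loop of Fig. 2 could look as follows. This is a sketch under our assumptions: harmonic_gamma is a hypothetical helper wrapping eq. (12) plus class mass normalization that returns a {0,1} prediction for every subject, and price holds the index of the price indicated to each subject.

```python
import numpy as np

def latent_willingness(X, y, price, harmonic_gamma):
    """Iterate until condition (13) holds for all still-labeled subjects.

    X: (p, M) features; y: binary answers; price: indicated price
    index in {0, 1, 2} per subject.  harmonic_gamma(X, y, mask) is a
    hypothetical helper returning gamma in {0,1} for all subjects,
    treating only the subjects in mask as labeled.
    """
    labeled = np.ones(len(y), dtype=bool)
    while True:
        # gamma[k, i]: inferred willingness of subject i at price phi_k.
        gamma = np.stack([harmonic_gamma(X, y, labeled & (price == k))
                          for k in (0, 1, 2)])
        # Condition (13): willingness weakly decreases with price.
        bad = labeled & ~((gamma[0] >= gamma[1]) & (gamma[1] >= gamma[2]))
        if not bad.any():
            return gamma, labeled
        labeled &= ~bad  # clear contradicting labels and repeat
```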

3 Experimental Results and Discussion

This section shows experimental results of the proposed methods. Sections 3.1 and 3.2 show applications of the multi-class classification method proposed in section 2.2 and the semi-supervised learning method proposed in section 2.3, respectively.

3.1 Classification of Hand-written Digits

We used the Pen-Based Recognition of Hand-written Digits dataset from the UCI Repository [14] as an example application for multi-class classification, because digits have two-dimensional spatial features. The dataset consists of 10992 samples written by 44 people. Among these samples, 7494 samples were written by 30 people and were divided into learning data D^1 and validation data D^2. The 3498 remaining samples were written by 14 other people and are used as test data D^3. Eight points {r_l} dividing the orbit of the pen point into 7 equally long segments were chosen. In this study, we carry out the feature extraction with GA after computing p_l = r_l − r̄, i.e. setting the origin at the center r̄ of the digit.

We assumed cases where the measurement environment differs from the one in which both D^1 and D^2 were measured. We generated datasets D′^3 = {R(ξ, φ) | ξ ∈ D^3, φ ∼ U(ε)} by randomly rotating each digit of the test data D^3, where U(ε) is the uniform distribution on [−ε, ε], φ is a random variable, and R(ξ, φ) rotates all the points of ξ by the given angle φ. We generated 20 different sets D′^3 from the test dataset D^3 for each ε ∈ {π/40, π/20, π/10} and classified them. Fig. 3 shows the average and the standard deviation of the correct classification rate when using the feature extraction f_1 and the mixture of experts.

Fig. 3 Correct classification rate with f_1 and with the mixture of experts.
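The centering p_l = r_l − r̄ and the rotated test sets D′^3 are straightforward to reproduce; a sketch (ours, with hypothetical variable names):

```python
import numpy as np

rng = np.random.default_rng(0)

def center(r):
    """p_l = r_l - rbar: set the origin at the center of the digit.

    r is the (8, 2) array of points dividing the pen orbit into
    7 equally long segments.
    """
    return r - r.mean(axis=0)

def rotate(xi, phi):
    """R(xi, phi): rotate all points of a digit xi by the angle phi."""
    c, s = np.cos(phi), np.sin(phi)
    return xi @ np.array([[c, -s], [s, c]]).T

def rotated_test_set(D3, eps):
    """D3' = {R(xi, phi) | xi in D3, phi ~ U(-eps, eps)}."""
    return [rotate(xi, rng.uniform(-eps, eps)) for xi in D3]
```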

The classification precision using only the feature extraction f_1 decreased remarkably with increasing ε. On the other hand, the classification precision using the mixture of experts did not decrease that much. The rotations had no influence in the cases of f_2 and f_0; their classification success rates were 94.25% and 85.88%, respectively.

3.2 Web Questionnaire Data

We used web questionnaire data for a new product to extract features with GA and to find the latent willingness to buy. Subjects answered 10 questions about each object, i.e. a scene in which the product was used. Subjects were asked to give an evaluation value for each question on 5 levels {1, 2, 3, 4, 5}, where "5" means "I agree very much" and "1" means "I disagree very much". Subjects expressed their willingness to buy the product at a price randomly selected from three prices.

We carried out the feature extractions with GA after subtracting 3 from all evaluation values, so that x_{l,i} ∈ {−2, −1, 0, 1, 2}. For simplicity we binarized the 5 willingness levels: "5", "4" ↦ 1 and "3", "2", "1" ↦ 0.

Table 1 Result without GA (analysis based on f_1).

    C      0.1%
    F-F   39.9%
    F-T   22.0%
    T-F    6.7%
    T-T   31.3%

Table 2 Result with GA. Analyses based on f_0 and f_2 further divide the subjects of "F-T" or "T-T" in Table 1.

    f_1 (F-T): 22.0%    f_2 = F-F    f_2 = F-T
    f_0 = F-F              6.0%         0.3%
    f_0 = F-T             14.4%         1.3%

    f_1 (T-T): 31.3%    f_2 = T-F    f_2 = T-T
    f_0 = T-F              3.0%         0.9%
    f_0 = T-T             13.7%        13.7%

Table 1 shows the result without introducing GA to find the latent willingness to buy the product. In the table, "C" shows the fraction of subjects whose γ contradicted condition (13) after the algorithm ended; we ignore these subjects. "F-F" shows the fraction of subjects with y_i = 0 (where, for simplicity, we do not mind which price was indicated to the subject) and γ_{φ_1}(i) = 0, i.e. subjects who had neither apparent nor latent willingness even at the lowest price. "F-T" shows the fraction of subjects with y_i = 0 but γ_{φ_1}(i) = 1, i.e. subjects who answered not to be willing but had latent willingness at least at the lowest price. "T-F" shows the fraction of subjects with y_i = 1 but γ_{φ_1}(i) = 0, i.e. subjects who showed apparent willingness but, judging from the similarity of answering patterns, were not willing to buy. "T-T" shows the fraction of subjects with y_i = 1 and γ_{φ_1}(i) = 1, i.e. subjects who had both apparent and latent willingness. The analysis made the following clear:

• Out of the 61.9% of subjects ("F-F" or "F-T") who answered negatively to the direct willingness question, 22.0% of all subjects were detected as willing to buy ("F-T").

• Out of the 38.0% of subjects ("T-F" or "T-T") who answered positively to the direct willingness question, 31.3% of all subjects were detected as willing to buy ("T-T").

In conclusion, 53.3% of the subjects had a latent willingness to buy the product ("F-T" or "T-T"), and 46.7% of the subjects did not.

Next, we conducted a more detailed analysis, introducing GA to define two more feature spaces based on f_0 and f_2, respectively. We focused on the subjects who had latent willingness in the analysis based on f_1. These subjects were divided according to whether they had latent willingness inferred from the similarity defined on each feature space.

• The left side of Table 2 shows that out of the 22.0% of subjects who had no apparent but had latent willingness according to the analysis based on f_1 alone, only 1.3% of all subjects were judged so by both analyses based on f_0 and f_2.

• The right side of Table 2 shows that out of the 31.3% of subjects who both answered and were judged as willing to buy in the analysis based on f_1 alone, 17.6% of all subjects were judged differently in at least one aspect of their answering pattern. On the other hand, the remaining 13.7% of all subjects can be judged as willing to buy with more confidence, supported by the judgments based on f_0 and f_2.

In conclusion, 15.0% of the subjects were regarded as having latent willingness to buy (consistently "F-T" or consistently "T-T" in all analyses) with more confidence than in the analysis without GA.

Finally, we utilized principal component analysis (PCA) to visualize the data given by f_1. Figure 4 shows apparent and latent willingness. On the right side of the figure, we find clusters to which the willing ('◦') and strongly willing (filled diamond) subjects belong.

Fig. 4 Visualization of the subjects by PCA. The left side shows apparent willingness and the right side shows latent willingness to buy. A willing subject is shown by '◦'; a subject who was not willing is shown by '×'. The filled diamonds in the right-side figure show subjects who were judged as willing to buy by all feature extractions f_0, f_1, f_2.

4 Conclusions

In this study, we proposed systematic feature extraction methods using GA. Based on the extracted features, we solved two machine learning problems: the classification of hand-written digits using GMMs, and the discovery of latent willingness to buy using semi-supervised learning.


We applied the proposed method to the two-dimensional objects of hand-written digits, deriving 3 ways of feature extraction: via coordinates, outer products, and inner products. When we assumed measurement environments different from the training one, the classification success rate of the pure coordinate-value feature extraction dropped substantially on the rotated test data. In contrast, with the mixture of experts the classification success rate was not only higher in the case of a constant measurement environment, it was also much more stable in the cases of large rotations of the test data.

We also applied the proposed method of feature extraction to the clustering of answering patterns for a web questionnaire. We found latent willingness to buy a product from the questionnaire data. The results showed that semi-supervised learning based on coordinates may detect subjects who had latent willingness to buy, and that introducing GA to the analysis may further find subjects of strong latent willingness.

Acknowledgements The authors thank Dr. Sven Buchholz, University of Kiel. This work was partly supported by the Grant-in-Aid for the 21st Century COE Program "Frontiers of Computational Science", Nagoya University, and the Grant-in-Aid for Young Scientists (B) #19700218.

References

1. C. Doran and A. Lasenby, Geometric Algebra for Physicists, Cambridge University Press, 2003.
2. D. Hestenes, New Foundations for Classical Mechanics, Dordrecht, 1986.
3. L. Dorst, D. Fontijne, and S. Mann, Geometric Algebra for Computer Science: An Object-Oriented Approach to Geometry (Morgan Kaufmann Series in Computer Graphics), 2007.
4. I. Sekita, T. Kurita, and N. Otsu, Complex Autoregressive Model for Shape Recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 14, No. 4, 1992.
5. A. Hirose, Complex-Valued Neural Networks: Theories and Applications, Series on Innovative Intelligence, Vol. 5, 2006.
6. N. Matsui, T. Isokawa, H. Kusamichi, F. Peper, and H. Nishimura, Quaternion neural network with geometrical operators, Journal of Intelligent and Fuzzy Systems, Vol. 15, Nos. 3-4, pp. 149-164, 2004.
7. S. Buchholz and N. Le Bihan, Optimal separation of polarized signals by quaternionic neural networks, 14th European Signal Processing Conference, EUSIPCO 2006, September 4-8, Florence, Italy, 2006.
8. T. Nitta, An Extension of the Back-Propagation Algorithm to Complex Numbers, Neural Networks, Vol. 10, No. 8, pp. 1391-1415, November 1997.
9. D. Hildenbrand and E. Hitzer, Analysis of point clouds using conformal geometric algebra, 3rd International Conference on Computer Graphics Theory and Applications, Funchal, Madeira, Portugal, 2008.
10. E. Hitzer, Quaternion Fourier Transform on Quaternion Fields and Generalizations, Advances in Applied Clifford Algebras, 17(3), pp. 497-517, 2007.
11. G. Sommer, Geometric Computing with Clifford Algebras, Springer, 2001.
12. A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, Vol. 39, No. 1, pp. 1-38, 1977.
13. X. Zhu, J. Lafferty, and Z. Ghahramani, Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
14. A. Asuncion and D. J. Newman, UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2007.