
Featuring the topology with the unsupervised machine learning

Kenji Fukushima,1, 2, ∗ Shotaro Shiba Funai,3, † and Hideaki Iida1, ‡

1Department of Physics, The University of Tokyo,

7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

2Institute for Physics of Intelligence (IPI), The University of Tokyo,

7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

3Physics and Biology Unit, Okinawa Institute of Science and Technology (OIST),

1919-1 Tancha Onna-son, Kunigami-gun, Okinawa 904-0495, Japan

Images of line drawings are generally composed of primitive elements. One of the most fundamental elements characterizing images is the topology: line segments belong to a category different from closed circles, and closed circles with different winding degrees are nonequivalent. We investigate images with nontrivial winding using unsupervised machine learning. We build an autoencoder model that combines convolutional and fully connected neural networks. We confirm that the compressed data filtered by the trained model retain more than 90% of the correct information on the topology, which is evidence that image clustering by unsupervised learning features the topology.

∗ [email protected]† [email protected]‡ [email protected]

arXiv:1908.00281v1 [cs.LG] 1 Aug 2019

I. INTRODUCTION

Brains function ingeniously through networks of neurons. To understand intrinsic brain dynamics, physicists prefer to decompose such an integrated system into irreducible elements, so that one can analyze the relatively simple function of each building element, which performs rather primitive actions. Numerical simulations on a computer are then handy tools to test whether a postulated mechanism of the brain works as expected. Such modeling embodies nonequilibrium processes of the brain, an approach commonly known as computational neuroscience. In addition, a hypothesis called quantum brain dynamics brings quantum fluctuations and Nambu-Goldstone bosons into brain science [1, 2] (for discussions for and against quantum phenomena in brain dynamics, see Ref. [3]), which bridges the divide between computational neuroscience and modern physics.

In contrast to such “off-equilibrium” problems, perception and recognition are, in the language of physics, “static” problems. For the latter, model-independent research tools are available for computer simulations: machine learning enables us to emulate the neural structure of the brain on a computer. One intriguing attribute of machine learning, particularly with deep neural networks (NNs), is that any nonlinear mapping can be represented by data transmission through multiple hidden layers [4].

In recent years we have witnessed tremendous progress in image recognition and classification by means of machine learning. In particular, the progress has been driven by the Convolutional Neural Network (CNN) [5], which was originally proposed as a multilayer neural network imitating the visual cortex of animals [6]. The CNN has become the most common approach to high-level image recognition since the overwhelming victory of AlexNet [7] at the “ImageNet Large Scale Visual Recognition Challenge 2012.” Minimizing interclass variability, reducing the error rate, achieving large-scale image recognition, etc. are subjects of ongoing improvement, and they are all crucial steps toward practical use.

On the physics side, image handling with deep learning has proved its strength in identifying phase transitions. Successful attempts are found in Ref. [8] for two-dimensional systems analyzed by supervised learning, Ref. [9] for the extension to three-dimensional systems, and Ref. [10] for statistical systems studied by unsupervised learning. Here we point out an essential difference between supervised and unsupervised learning: the former is useful for regression and grouping problems, while the latter efficiently performs feature extraction and clustering of data. Interestingly, the similarity between unsupervised learning and the renormalization group in physics has also been investigated; see Ref. [11].

In the context of image recognition, a distinct direction toward more fundamental research is equally important for demystifying black-boxed artificial intelligence, and it may benefit so-called explainable artificial intelligence [12, 13]. The fundamental question of interest in the present work is what the simplest element of images is that categorizes them into representative clusters. For image clustering there is a useful mathematical notion underlying modern physics, namely the “topology,” formalized in terms of homotopy. The best-known example is that a mug with one handle and a torus-shaped donut belong to the same class: one shape can be smoothly deformed into the other, and they are of the same homotopy type. In this work we report results from our simulations with the CNN that support the idea that the topology is critical information for image feature extraction and clustering.

II. TOPOLOGY AND THE WINDING NUMBER

The topology is classified by the homotopy group in mathematics. The simplest example is the fundamental group, $\pi_1(S^1) = \mathbb{Z}$, associated with a mapping from $S^1$ (i.e., the one-dimensional unit sphere) to another $S^1$; the integer $n_W \in \mathbb{Z}$ is the winding number. To demonstrate the idea concretely, let us consider the following $U(1)$-valued function on $S^1$,

$$\phi(x) = e^{i\theta(x)} = \cos\theta(x) + i\sin\theta(x) . \tag{1}$$

If $x$ is a coordinate on a circle with period $L$, the above function represents a mapping from $S^1$ in coordinate space to $S^1$ on the Gauss plane with the Euler angle $\theta$ (which is also called the “lift” in homotopy theory). While $x$ travels from $0$ to $L$ under the condition $\phi(0) = \phi(L)$, the Euler angle $\theta$ must return to its original value modulo $2\pi$. The winding number associated with the above function (or the “degree” of this function) reads

$$n_W = \frac{\theta(L) - \theta(0)}{2\pi} = \frac{\ln[\phi(L)/\phi(0)]}{2\pi i} = \frac{1}{2\pi i}\int_0^L dx\, \phi^{-1}(x)\,\frac{d\phi(x)}{dx} . \tag{2}$$
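As a quick check of the last equality in Eq. (2), one can insert Eq. (1) into the integrand. The linear profile quoted in the final comment is an illustrative assumption, not a configuration taken from the text.

```latex
\phi^{-1}(x)\,\frac{d\phi(x)}{dx}
  = e^{-i\theta(x)} \, i\,\frac{d\theta(x)}{dx}\, e^{i\theta(x)}
  = i\,\frac{d\theta(x)}{dx}
\quad\Longrightarrow\quad
\frac{1}{2\pi i}\int_0^L dx\, \phi^{-1}(x)\,\frac{d\phi(x)}{dx}
  = \frac{\theta(L) - \theta(0)}{2\pi} .
% Example (assumed profile): \theta(x) = 2\pi n x / L gives n_W = n.
```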

Figure 1 schematically illustrates a configuration of $\phi(x)$ with $n_W = 1$.

[Figure 1: two panels with axes $x$, Re, and Im; the Re and Im axes run from −1.0 to 1.0.]

FIG. 1. Schematic illustration of $\pi_1(S^1)$ realized by $\phi(x) = e^{i\theta(x)}$. In the left representation the behavior of $\theta(x)$ is manifest and $n_W = 1$ is easily concluded, but in our simulation only $(\mathrm{Re}\,\phi, \mathrm{Im}\,\phi)$, as shown on the right, is given to the NN model.
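The counting rule of Eq. (2) can be made concrete on discretized data. The following is a minimal sketch, not the procedure learned by the NN model: it sums the phase increments between neighbouring sites, assuming that each increment is smaller than $\pi$ in magnitude and that the sequence is closed. The function names `winding_number` and `make_loop` and the noiseless linear profile are illustrative assumptions.

```python
import cmath
import math


def winding_number(phi):
    """Estimate n_W for a closed sequence of complex samples phi_1, ..., phi_L.

    Assumes phi[-1] equals phi[0] (periodic boundary condition) and that the
    phase changes by less than pi between neighbouring sites.
    """
    total = 0.0
    for a, b in zip(phi[:-1], phi[1:]):
        # Phase increment taken on the principal branch (-pi, pi].
        total += cmath.phase(b / a)
    return round(total / (2.0 * math.pi))


def make_loop(n_w, L=128):
    """Hypothetical helper: noiseless configuration with theta linear in x."""
    return [cmath.exp(2j * math.pi * n_w * i / (L - 1)) for i in range(L)]


print(winding_number(make_loop(3)))
```

Each sample can be built from the input pair as `complex(re, im)`. With strongly noisy data the assumption of small phase increments can occasionally fail, in which case this naive count is off by an integer.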

For us human beings, it would be an elementary exercise to discover a counting rule for $n_W$. If we were given many different images together with the correct values of $n_W$, it would be just a matter of time before we found the right counting rule. This is nothing but the machinery of supervised learning. Interestingly, it has been reported that a deep neural network trained by supervised learning correctly reproduced $n_W$ of $\pi_1(S^1)$ [14]. There, the machine discovered a good fit to formula (2) only from the given $(\mathrm{Re}\,\phi, \mathrm{Im}\,\phi)$ and $n_W$, without referring to $\theta(x)$ directly. In the present work we take one step forward: we would like to see the NN model not only optimize a fit by supervised learning but also feature the topology by unsupervised learning. More specifically, we consider the classification of many images without giving the values of $n_W$. It is intriguing to ask how the CNN clusters data with different windings, which would be a prototype model of how our brains categorize images based on topological characterization. Thus, the unsupervised learning adopted in this work should tell us more than supervised learning does for the purpose of dissecting the topological contents.

III. RESULTS AND DISCUSSIONS

In the numerical procedure we represent $\phi(x)$ by a sequence of numbers on discretized $x$ with $L = 128$ grid points, i.e., we generate data of size $2 \times 128 = 256$, $(\mathrm{Re}\,\phi_i, \mathrm{Im}\,\phi_i)$ with $i = 1, \dots, L$, under the periodic boundary condition $(\mathrm{Re}\,\phi_L, \mathrm{Im}\,\phi_L) = (\mathrm{Re}\,\phi_1, \mathrm{Im}\,\phi_1)$. These 256 numbers constitute the input data to the CNN.

[Figure 2: $\theta$ (ticks at $2\pi$, $4\pi$, $6\pi$) versus $x$, divided into five segments of lengths $\ell_1, \dots, \ell_5$; $n_W = +3$.]

FIG. 2. One example of generated data for $(+,+,+,-,+)$ with $n_W = +3$.

We prepare the training and test data randomly; more detailed explanations of the numerical procedure are given in Method. Each datum consists of $N_s$ distinct segments along $x$ with either positive or negative winding, where the segment lengths $\ell_m$ are chosen randomly subject to $\sum_{m=1}^{N_s} \ell_m = L$. For the data used in this work we consider $N_s = 0, 1, 2, 3, 4,$ and $5$. We then assign positive or negative winding randomly to each segment, symbolically labeled as $(p_1, p_2, \dots, p_{N_s})$ with $p_m = \pm 1$, where $+$ ($-$) stands for positive (negative) winding. For example, $(+,+,+,-,+)$ for $N_s = 5$ is a configuration with net winding $n_W = \sum_{m=1}^{N_s} p_m = +3$. In the $m$-th segment we first postulate that $\theta(x)$ changes linearly by $2\pi p_m$. The resulting zigzag lines are then distorted by random noise to enhance the learning quality and thus the adaptivity. Figure 2 depicts an example of generated data for $(+,+,+,-,+)$. With $N_s$ up to 5, $2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 = 63$ winding patterns are possible, and $n_W$ can take values from $-5$ to $+5$. This setup is sufficiently general for our present goal of checking the performance of the topology detection.
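The counting of winding patterns can be checked by brute-force enumeration. The sketch below is illustrative only; the function name `winding_patterns` is an assumption, and the empty tuple stands for the $N_s = 0$ configuration.

```python
from collections import Counter
from itertools import product


def winding_patterns(max_segments=5):
    """All sign sequences (p_1, ..., p_Ns) with p_m = +1 or -1, for Ns = 0..max_segments."""
    patterns = []
    for n_s in range(max_segments + 1):
        patterns.extend(product((+1, -1), repeat=n_s))
    return patterns


patterns = winding_patterns()
# Net winding n_W is the sum of the signs; count how many patterns share each n_W.
by_winding = Counter(sum(p) for p in patterns)

print(len(patterns))
print(sorted(by_winding.items()))
```

The enumeration also shows that the patterns are not evenly distributed over $n_W$: for instance, 14 patterns have $n_W = +1$, while a single pattern has $n_W = +5$.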

The unsupervised learning utilizes an autoencoder [15]: it first encodes the data, compressing them into a smaller number of neurons in the CNN (in our case, 16 sites × 4 filters from the original 256 numbers), and then decodes the compressed data with a fully connected NN back to the original size. We repeat such encoding and decoding processes to minimize the loss function, measured by the squared difference between the original input data and the coarse-grained output data. The learning process optimizes the filters in the CNN encoder and simultaneously the weights in the NN decoder.

[Figure 3: two panels, each showing the value of the 1st filter versus neuron (0 to 14). Left: number of segments $N_s = 5$. Right: winding number $n_W = +3$ ($N_s = 5$), with curves for $++++-$, $+++-+$, $++-++$, $+-+++$, and $-++++$.]

FIG. 3. Examples of feature maps on the deepest CNN layer. (Left) $N_s = 5$ and all possible (i.e., $2^5 = 32$) winding patterns averaged over 1,000 test data. (Right) $N_s = 5$ with $n_W = +3$ fixed and five winding patterns averaged over 1,000 test data.

Let us look at the results from our NN model optimized by unsupervised learning with a training dataset of 63 winding patterns times 1,000 randomly generated data (i.e., 63,000 data in total). We input the test data into the optimized NN model and observe the feature maps of 16 neurons convolved with 4 filters on the deepest CNN layer. Figure 3 summarizes sampled feature maps from the 1st filter. The left plot is for $N_s = 5$ with all possible winding patterns, averaged over 1,000 test data. The right plot picks out the averaged feature maps for the five different windings with $N_s = 5$ and $n_W = +3$. At a glance one may think that the behavior looks like coarse-grained $\mathrm{Re}\,\phi(x)$ or $\mathrm{Im}\,\phi(x)$ with the segment lengths normalized.

Now, the most fascinating question is whether the compressed data on 16 sites convolved with 4 filters retain information on the topology, and if so, how we can retrieve it. From the left panel of Fig. 3 it is obvious that the peak heights reflect different winding patterns. In fact, the averaged feature maps exhibit a clear hierarchy of four separated heights in one-to-one correspondence with the winding sequence; that is, the peak heights increase with sequential windings as

$$(+,+) < (+,-) < (-,+) < (-,-) , \tag{3}$$

and the height at the far right end is determined solely by the sign $\pm$ of the last segment, which is due to the zero padding used in the convolution to keep the data size.

[Figure 4: four panels (1st, 2nd, 3rd, and 4th filter), each showing value versus neuron (0 to 14).]

FIG. 4. Feature maps from the 1st filter (top left), the 2nd filter (top right), the 3rd filter (bottom left), and the 4th filter (bottom right) for $(+,+,+,-,+)$ data without taking the average. Four randomly selected data are shown in four different colors.

For example, if we look at the leftmost peak around the 2nd neuron in the right panel of Fig. 3, $(-,+,+,+,+)$ has the highest peak, $(+,-,+,+,+)$ the second highest, and the others are degenerate, in accord with Eq. (3). We also notice in the right panel of Fig. 3 that, for $(+,+,+,+,-)$, three consecutive short peaks appear from the left, one middle peak follows, and then one tall peak sits at the right end. The short peaks correspond to the $(+,+)$ pairs in $((+,+),+,+,-)$, $(+,(+,+),+,-)$, and $(+,+,(+,+),-)$. The middle peak corresponds to $(+,+,+,(+,-))$, and the tall peak is sensitive only to the $-$ of $(+,+,+,+,-)$.
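The reading of the peaks described here amounts to listing the adjacent pairs of a winding sequence and ranking them by Eq. (3). A minimal sketch follows, with the hypothetical names `PAIR_RANK` and `peak_ranks`; the integer ranks 0 to 3 merely stand in for the four peak heights from lowest to highest, since the actual heights are not specified by Eq. (3).

```python
# Ordering of Eq. (3): (+,+) < (+,-) < (-,+) < (-,-).
PAIR_RANK = {(+1, +1): 0, (+1, -1): 1, (-1, +1): 2, (-1, -1): 3}


def peak_ranks(pattern):
    """Return the Eq. (3) rank of each adjacent pair and the sign of the last segment."""
    ranks = [PAIR_RANK[pair] for pair in zip(pattern[:-1], pattern[1:])]
    return ranks, pattern[-1]


# Example: (+,+,+,+,-) gives three lowest-rank pairs, then one (+,-) pair,
# and a right end that depends only on the final minus sign.
print(peak_ranks((+1, +1, +1, +1, -1)))
```

For the five patterns with $N_s = 5$ and $n_W = +3$, the first pair has rank 2 for $(-,+,+,+,+)$, rank 1 for $(+,-,+,+,+)$, and rank 0 for the remaining three, consistent with the ordering of the leftmost peak described in the text.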

One might think that such a clear hierarchy as in Fig. 3 is visible only after taking the average. This is indeed the case, as exemplified in Fig. 4: the feature maps for a single test datum do not always show prominent peaks with well-separated heights. Nevertheless, and surprisingly, such fluctuating coarse-grained data as in Fig. 4 still retain information on the topology.

It is impossible for our eyes to recognize any topological contents in Fig. 4, so we ask for the help of another machine learning device. For each of the 63,000 training data we have feature maps like those in Fig. 4 and also the corresponding $n_W$. We can then perform supervised learning to train a fully connected NN (with one hidden layer) such that the output gives the probability distribution of the guessed $n_W$ (out of $-5, \dots, 5$) in response to the input feature maps. Figure 5 shows the correct answer rate. If we input the feature maps from only 1 filter, the most likely $n_W$ hits the correct value at a rate of 50%, and the second most likely one at a rate of 31%. If we use the feature maps from 2 filters, the available information is doubled, and the correct answer rate of the most likely $n_W$ becomes 78%. For the feature maps from 3 filters it increases to 90%. There is almost no difference between the 3-filter and 4-filter results, and the correct answer rate seems to saturate at 90%.

[Figure 5: correct answer rate (0% to 100%) for 1, 2, 3, and 4 filters; legend: 1st, 2nd, 3rd, 4th, 5th.]

FIG. 5. Correct answer rate for $n_W$ guessed from the feature maps. For each of 63,000 test data we have checked where the correct winding number ranks (1st, 2nd, ... in the legend) in the probability output from the supervised NN model.
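The rank statistics shown in Fig. 5 can be computed from the probability output as sketched below. This is an assumed post-processing step written for illustration; the function names and the convention that class index $k$ corresponds to $n_W = k - 5$ are not taken from the original code. Ties are resolved in favor of the correct class.

```python
def rank_of_truth(probs, true_index):
    """1-based rank of the true class when classes are sorted by probability."""
    p_true = probs[true_index]
    # Count the classes that received a strictly larger probability.
    return 1 + sum(1 for p in probs if p > p_true)


def rank_rates(all_probs, all_truths, n_classes=11):
    """Fraction of samples whose true class ranks 1st, 2nd, ..., n_classes-th."""
    counts = [0] * n_classes
    for probs, truth in zip(all_probs, all_truths):
        counts[rank_of_truth(probs, truth) - 1] += 1
    return [c / len(all_truths) for c in counts]


# Toy example with 3 classes: the truth ranks 1st once and 2nd once.
toy_probs = [[0.1, 0.6, 0.3], [0.1, 0.6, 0.3]]
toy_truths = [1, 2]
print(rank_rates(toy_probs, toy_truths, n_classes=3))
```

The first entry of the returned list is the correct answer rate of the most likely $n_W$, and the second entry is the rate at which the correct value is the second most likely one.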

To summarize, from these results and analyses we conclude that the coarse-grained data of the feature maps obtained through unsupervised learning do retain information on the topology. In other words, unsupervised learning with an autoencoder provides a nontrivial machinery for compressing data without losing the topological contents.

IV. DISCUSSION

We demonstrated that unsupervised machine learning of images performs feature extraction without losing information on the topology. This is evidence that the winding number corresponding to the fundamental group, $\pi_1(S^1)$, is one of the essential indices that characterize image clustering. We trained an autoencoder consisting of the CNN and the fully connected NN by unsupervised learning, using randomly generated data of functions from $x$ on $S^1$ to a $U(1)$ element. We found that the averaged feature maps on the deepest CNN layer show a clear hierarchical pattern in one-to-one correspondence with the winding sequences. With the help of a supervised learning technique we also revealed that the feature maps for each image look like coarse-grained images and that such compressed data retain the correct information on the topology.

The extension of the present work to the unsupervised learning of higher-dimensional images with nontrivial winding would be quite interesting. We note that supervised machine learning has been utilized for $\pi_2(S^2)$ [14], but, as we illustrated in this work, unsupervised machine learning would be more interesting. Implications of this extension include intriguing applications in quantum field theories. In fact, some classical solutions of the equations of motion in quantum field theory are topologically stabilized. In such cases the field configurations are classified according to the winding number. Representative examples are $\pi_2(SU(2)/U(1)) = \mathbb{Z}$ for monopoles, $\pi_3(SU(2)) = \mathbb{Z}$ for Skyrmions (with no time dependence), and $\pi_3(SU(2)) = \mathbb{Z}$ for instantons (with the fields at infinity identified) in pure Yang-Mills theory [16, 17]. How to visualize the topological contents in quantum field theories is actually a long-standing problem. Numerical lattice simulation is a powerful tool for solving quantum field theories, and field configurations should in principle contain information on the topology. Some algorithms to extract topologically nontrivial configurations such as monopoles, Skyrmions, and instantons have been proposed [18–20]. The best-known approach, the cooling method, has a serious flaw, however. If the cooling is applied too many times, the topology is lost and the field configuration becomes trivially flat. Therefore, the cooling procedure must be stopped at some point, and this artificial termination causes uncertainties. As an alternative, we emphasize that the compression of field-configuration images by means of unsupervised machine learning is a promising candidate for a superior smearing algorithm that does not lose the topological contents. We are testing this idea in a simple two-dimensional lattice model, namely the $CP^{N-1}$ model, which has $\pi_2(S^2)$ instantons.

Mathematically, it would also be very interesting to consider not only the homotopy groups but also the homology $H_n(X)$ of a topological space $X$, defined as the quotient of cycles in $n$ dimensions by boundaries of $(n+1)$-dimensional elements. For example, $H_0(X)$ counts the number of connected components of a drawing, $H_1(X)$ counts the number of loops, and so on. Research in this direction is ongoing.

ACKNOWLEDGMENTS

We thank Yuya Abe and Yuki Fujimoto for discussions. K. F. was supported by Japan

Society for the Promotion of Science (JSPS) KAKENHI Grant No. 18H01211.

METHOD

Input data of winding patterns

Input data $\phi_i = e^{i\theta_i}$ ($i = 1, \dots, L$), including training data and test data, are generated in the following way. Since the data $\phi_i$ are complex numbers, we actually input the combinations of their real and imaginary parts, $(\mathrm{Re}\,\phi_i, \mathrm{Im}\,\phi_i)$.

First we impose the periodic boundary condition:

$$\mathrm{Re}\,\phi_1 = \mathrm{Re}\,\phi_L, \qquad \mathrm{Im}\,\phi_1 = \mathrm{Im}\,\phi_L . \tag{4}$$

Then we divide the $L$ sites into $N_s$ segments, and the length of each segment, $\ell_m$ ($m = 1, \dots, N_s$), is randomly chosen as

$$\ell_m = \frac{L-1}{N_s}\left[1 + 0.4\,(\xi_m - \xi_{m-1})\right] , \tag{5}$$

where $\xi_m$ is a random number drawn from a uniform distribution on the open interval $(-1, 1)$. This means that the lengths of all the segments satisfy $0.2\,\frac{L-1}{N_s} < \ell_m < 1.8\,\frac{L-1}{N_s}$. We set $\xi_0 = \xi_{N_s} = 0$ so that the total length $L$ is kept. To be exact, the length $\ell_m$ must be an integer, so the right-hand side is rounded to an integer.

[Figure 6: input data (128×1, 2 channels) → Conv1 (8×1×2ch + bias, 2 filters, ReLU) → pooling 2×1 (64×1, 2 channels) → Conv2 (8×1×2ch + bias, 4 filters, ReLU) → pooling 4×1 (16×1×4) → fully connected (ReLU & dropout) → 128 → output data (128×2).]

FIG. 6. Schematic figure of the autoencoder used for our unsupervised learning.

In each segment $m$, the angle $\theta_i$ is composed of a linear part from $0$ to $\pm 2\pi$ and a random noise:

$$\frac{\theta_i}{2\pi} = p_m\,\frac{i - i_0}{\ell_m} + 0.1\,\zeta_i , \tag{6}$$

where $p_m = \pm 1$ is the winding direction and $i_0 = 1 + \sum_{m'=1}^{m-1} \ell_{m'}$. The random noise $\zeta_i$ is drawn from a Gaussian distribution with mean 0 and variance 1.
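A minimal sketch of the data generation of Eqs. (4)–(6) is given below. Several details are assumptions made for illustration: the function names, the way the rounding mismatch is absorbed into the last segment, the treatment of $N_s = 0$ as $\theta = 0$ plus noise, and enforcing Eq. (4) by copying the first site to the last one. Note that the unrounded lengths of Eq. (5) sum to $L - 1$, the number of sites before the repeated end point.

```python
import cmath
import math
import random


def segment_lengths(n_s, L=128, rng=random):
    """Integer segment lengths following Eq. (5) with xi_0 = xi_{N_s} = 0."""
    xi = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(n_s - 1)] + [0.0]
    lengths = [round((L - 1) / n_s * (1.0 + 0.4 * (xi[m] - xi[m - 1])))
               for m in range(1, n_s + 1)]
    # Assumption: the last segment absorbs any mismatch caused by rounding.
    lengths[-1] += (L - 1) - sum(lengths)
    return lengths


def generate_phi(pattern, L=128, noise=0.1, rng=random):
    """Return L pairs (Re phi_i, Im phi_i) for a tuple of windings p_m = +1 or -1."""
    ramp = []  # noiseless theta_i / (2 pi) on the first L - 1 sites
    if pattern:
        for p, ell in zip(pattern, segment_lengths(len(pattern), L, rng)):
            ramp.extend(p * k / ell for k in range(ell))  # k plays the role of i - i_0
    else:
        ramp = [0.0] * (L - 1)  # assumption for N_s = 0: no winding at all
    # Add Gaussian noise 0.1 * zeta_i to theta_i / (2 pi), as in Eq. (6).
    phi = [cmath.exp(2j * math.pi * (t + noise * rng.gauss(0.0, 1.0)))
           for t in ramp]
    phi.append(phi[0])  # periodic boundary condition, Eq. (4)
    return [(z.real, z.imag) for z in phi]


sample = generate_phi((+1, +1, +1, -1, +1), rng=random.Random(0))
print(len(sample), sample[0] == sample[-1])
```

Passing a seeded `random.Random` instance as `rng` makes a sample reproducible; setting `noise=0.0` returns the underlying zigzag profile.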

In our experiment we set $L = 128$ and $N_s = 0, 1, \dots, 5$. All the combinations of winding directions $p_m$ then amount to $\sum_{N_s=0}^{5} 2^{N_s} = 63$ patterns. For each winding pattern we generate 1,000 + 1,000 input data with the parameters $(\xi_m, \zeta_i)$ chosen randomly. The first 1,000 data (63,000 data in total) are the training data, which are used for training our autoencoder and the supervised NN. The other 1,000 data are the test data for analyzing the feature maps in the autoencoder (see Figs. 3 and 4) and the output from the supervised NN (see Fig. 5).

Autoencoder

The autoencoder for our unsupervised learning consists of the CNN (encoder) and the fully connected NN (decoder). A schematic figure of our autoencoder is shown in Fig. 6. We wrote the training code using TensorFlow [21].

[Figure 7: “Training of autoencoder”: loss function (0.20 to 0.55) versus learning epoch (0 to 7000) for the training data and the test data.]

FIG. 7. Loss function during training of our autoencoder.

The CNN encoder has two layers. In both layers we use convolution with filters of size 8 × 1 (× 2 channels) and stride 1. We also use zero padding to keep the data size, and the ReLU as the activation function. In the convolution part, the only difference between the first and second layers is the number of filters. After the convolution, each layer of our encoder has a pooling part. In the first layer we use max pooling with a window of size 2 × 1 and stride 2 for each channel. This pooling compresses the data to 1/stride of its original size. The second layer has a window of size 4 × 1 and stride 4; as a result, the CNN encoder outputs a feature map with $L/2/4 = 16$ sites (for each of the 4 filters), as we saw in Fig. 3.
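The size bookkeeping of the encoder ($128 \to 64 \to 16$ sites) can be reproduced with a toy single-channel version of the two operations. This sketch is not the TensorFlow model: it uses one channel and one hypothetical filter of ones per layer, and it assumes that for the even filter size 8 the extra padding zero is placed at the right end.

```python
def conv_same_relu_1d(x, kernel, bias=0.0):
    """Stride-1 convolution (cross-correlation) with zero padding and ReLU.

    The output has the same length as x. Assumption: for an even kernel size
    the extra zero is padded on the right.
    """
    k = len(kernel)
    left = (k - 1) // 2
    right = k - 1 - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [max(0.0, bias + sum(w * padded[i + j] for j, w in enumerate(kernel)))
            for i in range(len(x))]


def max_pool_1d(x, window):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [max(x[i:i + window]) for i in range(0, len(x) - window + 1, window)]


x = [1.0] * 128                      # constant toy input, one channel
ones = [1.0] * 8                     # hypothetical filter of size 8
h1 = conv_same_relu_1d(x, ones)      # 128 sites, size kept by zero padding
p1 = max_pool_1d(h1, 2)              # 64 sites
h2 = conv_same_relu_1d(p1, ones)     # 64 sites
p2 = max_pool_1d(h2, 4)              # 16 sites
print(len(h1), len(p1), len(h2), len(p2))
print(h1[0], h1[64], h1[-1])
```

With a filter of ones acting on constant data, the first-layer output is 8 in the bulk but 5 and 4 at the left and right ends, which illustrates how zero padding affects the boundary neurons.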

The fully connected NN decoder also has two layers. In the first layer we use dropout with probability 0.5 to avoid overfitting and then apply the ReLU again. The final layer has neither dropout nor an activation function, and it outputs coarse-grained data $(\varphi_{i,1}, \varphi_{i,2})$ of the same size as the input data $(\mathrm{Re}\,\phi_i, \mathrm{Im}\,\phi_i)$.

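As the loss function for the training, we choose the squared difference between the input data $\phi_i$ and the output data $\varphi_{i,j}$, that is,

$$\frac{1}{2L}\sum_{i=1}^{L}\left[(\mathrm{Re}\,\phi_i - \varphi_{i,1})^2 + (\mathrm{Im}\,\phi_i - \varphi_{i,2})^2\right] . \tag{7}$$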
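Equation (7) translates directly into code. A minimal sketch in plain Python follows, with the hypothetical function name `reconstruction_loss` and with data stored as lists of (real, imaginary) pairs, which is an assumed layout rather than the one used in the original training code.

```python
def reconstruction_loss(inputs, outputs):
    """Eq. (7): mean squared difference over L sites and the two components.

    inputs:  list of (Re phi_i, Im phi_i) pairs
    outputs: list of (varphi_{i,1}, varphi_{i,2}) pairs of the same length
    """
    L = len(inputs)
    total = 0.0
    for (re, im), (out1, out2) in zip(inputs, outputs):
        total += (re - out1) ** 2 + (im - out2) ** 2
    return total / (2 * L)


# Toy check with two unit-modulus inputs and an all-zero output.
toy_in = [(1.0, 0.0), (0.0, 1.0)]
toy_out = [(0.0, 0.0), (0.0, 0.0)]
print(reconstruction_loss(toy_in, toy_out))
```

Since $|\phi_i| = 1$, an output that is identically zero gives a loss of exactly 1/2 for any input, which provides a reference scale for the values of Eq. (7).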

We prepared the training and test data, each of which contains 63 winding patterns times 1,000 randomly generated data. We found that unsupervised learning with a learning rate of $10^{-7}$ and a mini-batch size of 10 decreases the loss function of the test data to its minimum at around 6,000 epochs, as shown in Fig. 7.

[1] L. M. Ricciardi and H. Umezawa, Kybernetik 4, 44 (1967), http://dx.doi.org/10.1007/BF00292170

[2] M. Jibu and K. Yasue, Quantum Brain Dynamics and Consciousness: An Introduction, Advances in consciousness research (J. Benjamins Publishing Company, 1995).

[3] P. Jedlicka, Frontiers in Molecular Neuroscience 10, 366 (2017), http://dx.doi.org/10.3389/fnmol.2017.00366

[4] Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015), http://dx.doi.org/10.1038/nature14539

[5] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, Neural Computation 1, 541 (1989), http://dx.doi.org/10.1162/neco.1989.1.4.541

[6] K. Fukushima, Biological Cybernetics 36, 193 (1980), http://dx.doi.org/10.1007/BF00344251

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in neural information processing systems (2012) pp. 1097–1105.

[8] T. Ohtsuki and T. Ohtsuki, Journal of the Physical Society of Japan 85, 123706 (2016), http://dx.doi.org/10.7566/JPSJ.85.123706

[9] T. Ohtsuki and T. Ohtsuki, Journal of the Physical Society of Japan 86, 044708 (2017), http://dx.doi.org/10.7566/JPSJ.86.044708

[10] S. S. Funai and D. Giataganas, (2018), arXiv:1810.08179 [cond-mat.stat-mech], http://arxiv.org/abs/1810.08179

[11] S. Iso, S. Shiba, and S. Yokoo, Phys. Rev. E97, 053304 (2018), arXiv:1801.07172 [hep-th], http://dx.doi.org/10.1103/PhysRevE.97.053304

[12] W. Samek, T. Wiegand, and K. Muller, CoRR abs/1708.08296 (2017), arXiv:1708.08296.

[13] F. K. Dosilovic, M. Brcic, and N. Hlupic, in 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (2018) pp. 0210–0215, http://dx.doi.org/10.23919/MIPRO.2018.8400040

[14] P. Zhang, H. Shen, and H. Zhai, Phys. Rev. Lett. 120, 066401 (2018), http://dx.doi.org/10.1103/PhysRevLett.120.066401

[15] G. E. Hinton and R. R. Salakhutdinov, Science 313, 504 (2006), http://dx.doi.org/10.1126/science.1127647

[16] A. A. Belavin, A. M. Polyakov, A. S. Schwartz, and Yu. S. Tyupkin, Phys. Lett. B59, 85 (1975), http://dx.doi.org/10.1016/0370-2693(75)90163-X

[17] G. ’t Hooft, Phys. Rev. D14, 3432 (1976), http://dx.doi.org/10.1103/PhysRevD.18.2199.3, 10.1103/PhysRevD.14.3432

[18] B. Berg, Phys. Lett. 104B, 475 (1981), http://dx.doi.org/10.1016/0370-2693(81)90518-9

[19] M. Teper, Phys. Lett. B171, 86 (1986), http://dx.doi.org/10.1016/0370-2693(86)91004-X

[20] E.-M. Ilgenfritz, M. Müller-Preußker, G. Schierholz, H. Schiller, et al., Proceedings, 23rd International Conference on High Energy Physics, July 16-23, 1986, Berkeley, CA, Nucl. Phys. B268, 693 (1986), http://dx.doi.org/10.1016/0550-3213(86)90265-8

[21] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (USENIX Association, Savannah, GA, 2016) pp. 265–283.
