July 11, 2015 (Heisei 27)

NEM Blockchain Meetup H27/07/11

Aug 16, 2015


Technology

makoto1337
Transcript
Page 1: NEM Blockchain Meetup H27/07/11

July 11, 2015 (Heisei 27)

Page 2: NEM Blockchain Meetup H27/07/11

WHAT IS A BLOCK CHAIN?

Send 100 yen to Makoto-san

“Let’s meet at Hachiko”

Send 500 XEM to ab5f3cc…

Vote for so-and-so in the election

Car registration

Page 3: NEM Blockchain Meetup H27/07/11

WHAT IS A BLOCK CHAIN?

Send 100 yen to Makoto-san

“Let’s meet at Hachiko”

Send 500 XEM to ab5f3cc…

Vote for so-and-so in the election

Car registration

Every participant maintains one common, shared database

Page 4: NEM Blockchain Meetup H27/07/11

WHAT IS A BLOCK CHAIN?

Send 100 yen to Makoto-san

“Let’s meet at Hachiko”

Send 500 XEM to ab5f3cc…

Vote for so-and-so in the election

Car registration

Based on a distributed consensus system, the data is put into blocks and shared over a P2P network

Page 5: NEM Blockchain Meetup H27/07/11

WHAT IS A BLOCK CHAIN?

Send 100 yen to Makoto-san

“Let’s meet at Hachiko”

Send 500 XEM to ab5f3cc…

Vote for so-and-so in the election

Car registration

In the Internet of Things, when machines from many different manufacturers need to share information, rather than trusting a system developed by a single manufacturer, the information is shared using a distributed consensus system — in other words, a block chain

Page 6: NEM Blockchain Meetup H27/07/11

• New Economy Movement (NEM) — www.nem.io

• Launched on March 31, 2015

• Goal: empower people

• Currency: 8,999,999,999 XEM (a fixed supply)

• Smart contracts and other features will be added in the future

• Block explorer: http://nembex.nem.ninja/

Page 7: NEM Blockchain Meetup H27/07/11

• The entire supply of 8,999,999,999 XEM was divided into 4,000 stakes of 2,250,000 XEM each and sent to roughly 1,800 accounts

Page 8: NEM Blockchain Meetup H27/07/11

How NEM is developed

• Professional developers using state-of-the-art development practices

• Test-driven development (TDD)

• Practicality comes first!

Page 9: NEM Blockchain Meetup H27/07/11
Page 10: NEM Blockchain Meetup H27/07/11

About public-key cryptography

• There is a public key and a private key; when a message is signed with the private key, anyone can verify the signature using the publicly known public key

• When encrypting a message, you use your own private key to encrypt it toward the recipient's public key, and the recipient can decrypt it using their own private key

Page 11: NEM Blockchain Meetup H27/07/11

About public-key cryptography

A-san

B-san

Sends the message “Nande ya nen” (“You must be joking”)

Encrypts it with his own private key toward B-san's public key

Decrypts the message from A-san with her own private key

Page 12: NEM Blockchain Meetup H27/07/11

Example public key:

Example private key:

71fc5b675058d55fd81eb6fe91f6e3bc321bab752720416fb38aa7a0d1d0515a

00b3b8cf802ea687ee1e0f249e442ae82ee02b8b82e4cb3900092601603c658351

• There are many elliptic curves; NEM uses Ed25519 (http://ed25519.cr.yp.to/)

Page 13: NEM Blockchain Meetup H27/07/11

Public-key cryptography

• Given the SHA-3 hash function H() used by NEM (512-bit, see the excerpt below):

3 Cryptography

“I understood the importance in principle of public key cryptography but it’s all moved much faster than I expected. I did not expect it to be a mainstay of advanced communications technology.” — Whitfield Diffie

Block chain technology demands the use of some cryptographic concepts. NEM, like many other crypto currencies, is using cryptography based on Elliptic Curve Cryptography. The choice of the underlying curve is important in order to guarantee security and speed.

NEM has chosen to use the Twisted Edwards curve

-x^2 + y^2 = 1 - (121665/121666) x^2 y^2

over the finite field defined by the prime number 2^255 - 19, together with the digital signature algorithm called Ed25519. It was developed by D. J. Bernstein et al. and is one of the safest and fastest digital signature algorithms [2].

The base point of the corresponding group G is called B. The group has q = 2^252 + 27742317777372353535851937790883648493 elements. Every group element A can be encoded into a 256-bit integer, which can also be interpreted as a 256-bit string, and decoded to recover A again. For details see [2].

For the hash function H mentioned in the paper, NEM uses the 512-bit SHA3 hash function.

3.1 Private and public key

The private key is a random 256-bit integer k. To derive the public key A from it, the following steps are taken:

H(k) = (h_0, h_1, ..., h_511)    (1)

a = 2^254 + sum_{3 <= i <= 253} 2^i h_i    (2)

A = aB    (3)

Since A is a group element, it can be encoded into a 256-bit integer which serves as the public key.

k: the private key (a random number)
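As a small illustration of equation (2), the sketch below derives the scalar a from a private key k by hashing and bit-clamping. It is a sketch only: hashlib.sha3_512 stands in for the 512-bit hash H (NEM's exact hash and byte conventions should be checked against the reference implementation), and the final step A = aB is omitted because it needs a full Ed25519 point implementation.

```python
import hashlib
import os

# Sketch of equation (2): a = 2^254 + sum_{3<=i<=253} 2^i * h_i,
# i.e. take the first 256 bits of H(k) and "clamp" them.
k = os.urandom(32)                     # k: the private key (random 256-bit integer)
h = hashlib.sha3_512(k).digest()       # H(k) = (h_0, ..., h_511), stand-in for NEM's H
a = int.from_bytes(h[:32], "little")   # bits h_0 .. h_255
a &= (1 << 254) - 8                    # clear bits 0-2 and bits 254-255
a |= 1 << 254                          # set bit 254, giving the 2^254 term
# The public key would be A = a*B, the scalar multiple of the Ed25519 base point B.
print(hex(a))
```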

Page 14: NEM Blockchain Meetup H27/07/11

Elliptic curve cryptography

• NEM uses the 512-bit SHA-3 hash function H

• Ed25519, http://ed25519.cr.yp.to/


Twisted Edwards curve: -x^2 + y^2 = 1 - (121665/121666) x^2 y^2 (see the excerpt on the previous slide)

Page 15: NEM Blockchain Meetup H27/07/11

Creating and verifying a cryptographic signature

3.2 Signing and verification of a signature

Given a message M, private key k and its associated public key A, the following steps are taken to create a signature:

H(k) = (h_0, h_1, ..., h_511)    (4)

r = H(h_256, ..., h_511, M), where the comma means concatenation    (5)

R = rB    (6)

S = (r + H(R, A, M) a) mod q    (7)

Then (R, S) is the signature for the message M under the private key k. Note that only signatures where S < q and S > 0 are considered valid, to prevent the problem of signature malleability.

To verify the signature (R, S) for the given message M and public key A, one checks S < q and S > 0 and then calculates

R' = SB - H(R, A, M) A

and verifies that

R' = R    (8)

If S was computed as shown in (7), then

SB = rB + (H(R, A, M) a) B = R + H(R, A, M) A

so (8) will hold.

3.3 Encoding and decoding messages

NEM uses Bouncy Castle's AES block cipher implementation in CBC mode [4] to encrypt and decrypt messages.

If Alice has the private key k_A and wants to encrypt a message for Bob, who has the public key A_B (an encoding of the group element A_B), then the shared secret used when setting up the cipher is calculated as follows:

a_A is computed from k_A according to (2)
salt = 32 random bytes
G = a_A A_B
shared secret = H(G ⊕ salt)

[4] http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CBC

Creating a signature: equations (4)–(7) above.



Verifying a signature: check that S < q and S > 0, and then that SB - H(R, A, M) A equals R (equation (8) above).
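A minimal sketch of the shared-secret derivation in section 3.3. Assumptions: the garbled operator between G and the salt is treated as XOR, the first 32 bytes of the hash are used as the AES-CBC key, and g_bytes is a placeholder for the 32-byte encoding of the shared group element G = a_A · A_B.

```python
import hashlib
import os

def derive_shared_secret(g_bytes: bytes) -> tuple:
    """Return (salt, shared_secret) for setting up the AES-CBC cipher (section 3.3 sketch)."""
    salt = os.urandom(32)                                  # salt = 32 random bytes
    mixed = bytes(x ^ y for x, y in zip(g_bytes, salt))    # G combined with the salt (assumed XOR)
    shared_secret = hashlib.sha3_512(mixed).digest()[:32]  # H(G, salt), truncated to an AES key
    return salt, shared_secret
```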

Page 16: NEM Blockchain Meetup H27/07/11

ADDRESSES

• Base-32 encoded data consisting of:

• a network byte

• a 160-bit hash of the public key

• a 4-byte checksum

Example: NDMAKO-TODH4O-XMQCAG-ZWWPVL-RUVOCL-XXKOSK-FUJA
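A sketch of how such an address could be assembled from the three parts listed above. The concrete choices here — network byte 0x68 for mainnet, the 160-bit hash computed as RIPEMD-160 of SHA3-256(public key), and the checksum as the first 4 bytes of a SHA3-256 — are assumptions based on my reading of NEM, not stated on the slide; ripemd160 also has to be available in the local OpenSSL build.

```python
import base64
import hashlib

def nem_address(public_key: bytes, network_byte: bytes = b"\x68") -> str:
    # 160-bit hash of the public key (assumed: RIPEMD-160 of SHA3-256(public key))
    h160 = hashlib.new("ripemd160", hashlib.sha3_256(public_key).digest()).digest()
    body = network_byte + h160                      # network byte + 160-bit hash
    checksum = hashlib.sha3_256(body).digest()[:4]  # 4-byte checksum (assumed derivation)
    return base64.b32encode(body + checksum).decode("ascii")
```

The result is 40 base-32 characters; the example on the slide shows the same kind of string, grouped with a dash every six characters.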

Page 17: NEM Blockchain Meetup H27/07/11

Types of transactions

• importance transfer

• transfers an account's importance to a remote account

• multisig aggregate modification transaction

• sets the minimum number of cosignatories and changes several cosignatories in one batch

• multisig cosignatory modification

• adds or removes cosignatories

• multisig signature transaction

• a cosignatory signs an already existing transaction

• multisig transaction

• a transaction that requires multiple signatures

• transfer transaction

• an ordinary transfer

Page 18: NEM Blockchain Meetup H27/07/11

Transactions

• Recipient address

• Amount

• Both Bitcoin and NEM require a fee to prevent spam; the fee is very small

• There is also a deadline (up to 24 hours)

• Text message

Page 19: NEM Blockchain Meetup H27/07/11

Calculating fees

• Progressive fees

4 Transactions

“To transact business with the girl who ran the gas-pump Dean merely threw on his T-shirt like a scarf and was curt and abrupt as usual and got back in the car and off we roared again.” — Jack Kerouac

Transactions introduce dynamism into a cryptocurrency system. They are the only way of altering the state of an account. A newly created transaction that has not yet been included in a block is called an unconfirmed transaction. Unconfirmed transactions are not guaranteed to be included in any block.

As a result, unconfirmed transactions have no effect on the account state. The account state is only updated when a transaction is included in a harvested block and thereby confirmed.

Different types of transactions exist. Each type has a specific purpose, e.g. transfer XEM from one account to another or convert an account to a multisig account. Since transactions consume resources of the p2p network there is a fee for each transaction. The fee depends on the transaction type and other parameters of the transaction.

Transactions have a deadline. If a transaction is not included in a block before its deadline, the transaction is considered expired and gets dropped by the network nodes.

The following sections describe the different transaction types.

4.1 Transfer transactions

A transfer transaction is used to transfer XEM from one account to another. A small message of at most 96 bytes can be attached to each transfer transaction. In the case of an encrypted message, only 48 bytes can contain custom data, because the salt and the IV data are part of the encrypted message and require 48 bytes.

Fees for transfer transactions are divided into two parts:

transfer fee = 10 - amount, if amount < 8; otherwise max(2, 99 · arctan(amount / 150000))

message fee = 0, if the message is empty; otherwise 2 · max(1, message length / 16)

Both fee parts are added to give the final fee.
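A direct sketch of the two fee formulas quoted above, assuming amount is in whole XEM and message is the raw message bytes; the real client's units and rounding may differ.

```python
import math

def transfer_fee(amount: float, message: bytes = b"") -> float:
    """Transfer fee plus message fee, per the formulas in section 4.1 above."""
    if amount < 8:
        fee = 10 - amount
    else:
        fee = max(2, 99 * math.atan(amount / 150_000))
    if message:
        fee += 2 * max(1, len(message) / 16)
    return fee

# e.g. transfer_fee(5) == 5.0  (10 - 5, with no message attached)
```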

Page 20: NEM Blockchain Meetup H27/07/11

Multisig

• When a transaction is created from a multisig account, signatures from several accounts are required before the block chain accepts the transaction

• In a cryptocurrency, a payment is as final as handing over cash, so losing a private key can mean losing the money; using multisig makes accounts orders of magnitude safer and more reassuring

Page 21: NEM Blockchain Meetup H27/07/11

M-OF-N MULTISIG

• Currently on the test net

• Out of n cosignatories, m signatures are required

• Example: 3 out of 5 accounts must sign

• We would especially like exchanges to use it

Page 22: NEM Blockchain Meetup H27/07/11

Block data

• Version (e.g. 1744830465)

• Timestamp (e.g. 9232942)

• Public key of the creator (harvester)

• Signature over the block data

• (e.g. 0a1351ef3e9b19c601e804a6d329c9ade662051d1da2c12c3aec9934353e421c79de7d8e59b127a8ca9b9d764e3ca67daefcf1952f71bc36f747c8a738036b05)

• Hash of the previous block

• (e.g. 58efa578aea719b644e8d7c731852bb26d8505257e03a897c8102e8c894a99d6)

• Generation hash

• Block height (e.g. 42804)

• List of transactions

Page 23: NEM Blockchain Meetup H27/07/11

Probabilistic Byzantine consensus (Nakamoto consensus)

• A distributed consensus algorithm proposed by Satoshi Nakamoto; it lets all participants prove that the data they share is correct

• Each proof-of-“something” algorithm realizes Nakamoto consensus: it decides which participant creates the next block

Page 24: NEM Blockchain Meetup H27/07/11

Proof-of-work (PoW)

• Bitcoin's well-known “mining”

• Algorithm (see the sketch below):

• The miner puts a set of transactions into a block and hashes all of the block data

• → If the hash is below the difficulty target: OK! Done; a new block has been created

• → If the hash is above the difficulty target: no good, so change the number called the “nonce”

• Repeat
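A minimal sketch of the mining loop described above, using Bitcoin-style double SHA-256 and a simplified header/target encoding for illustration.

```python
import hashlib
import struct

def mine(block_header: bytes, target: int) -> int:
    """Try nonces until the (double SHA-256) block hash is below the target."""
    nonce = 0
    while True:
        data = block_header + struct.pack("<I", nonce)
        digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce          # hash below the difficulty target: block found
        nonce += 1                # otherwise change the nonce and repeat

# e.g. mine(b"example header", 1 << 240) succeeds after about 65,000 tries on average
```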

Page 25: NEM Blockchain Meetup H27/07/11

Calculating Bitcoin's block difficulty

• The difficulty is recalculated every 2016 blocks

• The aim is for those 2016 blocks to take two weeks, and the difficulty is adjusted so that they do

• The initial difficulty was 1

• Every participant computes it independently and arrives at exactly the same value
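A sketch of the retargeting rule, assuming the commonly cited constants (a two-week target period and a 4x clamp on the adjustment); the real client's details differ slightly.

```python
TWO_WEEKS = 14 * 24 * 60 * 60   # target duration of one retarget period, in seconds

def retarget(old_difficulty: float, actual_seconds: float) -> float:
    """Raise difficulty if the last period was faster than two weeks, lower it if slower."""
    ratio = TWO_WEEKS / actual_seconds
    ratio = max(0.25, min(4.0, ratio))   # clamp the adjustment, as the reference client does
    return old_difficulty * ratio

# e.g. retarget(1.0, 7 * 24 * 60 * 60) == 2.0  (blocks arrived twice as fast as intended)
```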

Page 26: NEM Blockchain Meetup H27/07/11

Proof-of-work (PoW)

• Bitcoin's well-known “mining”

• Algorithm:

• The miner puts a set of transactions into a block and hashes all of the block data

• → If the hash is below the difficulty target: OK! Done; a new block has been created

• → If the hash is above the difficulty target: no good, so change the number called the “nonce”

• Repeat

Page 27: NEM Blockchain Meetup H27/07/11

Proof-of-work (PoW)

• Bitcoin's well-known “mining”

• Algorithm:

• The miner puts a set of transactions into a block and hashes all of the block data

• → If the hash is below the difficulty target: OK! Done; a new block has been created

• → If the hash is above the difficulty target: no good, so change the number called the “nonce”

• Repeat

The participants' combined computing power becomes the proof that the block data is correct

(a proof of work)

Page 28: NEM Blockchain Meetup H27/07/11

Proof-of-work (PoW)

• Bitcoin's well-known “mining”

• Algorithm:

• The miner puts a set of transactions into a block and hashes all of the block data

• → If the hash is below the difficulty target: OK! Done; a new block has been created

• → If the hash is above the difficulty target: no good, so change the number called the “nonce”

• Repeat

More than one valid block can exist at the same time!

In Bitcoin, only the chain with the largest number of blocks survives

Page 29: NEM Blockchain Meetup H27/07/11

Problems with PoW

• If a mining group becomes too powerful, it can put false data into blocks, and there is a risk of double-spend attacks

• There is no incentive to run a P2P node

• Only the block creator receives the fees and rewards

• A huge amount of electricity is wasted, the environment (our very life!) is damaged, and it costs money (one facility in China pays about US$80,000 a month for electricity!!)

Page 30: NEM Blockchain Meetup H27/07/11

How PoW works in general

Taken to the extreme, PoW samples from the probability distribution of the miners' computing power to decide who creates the next block

Page 31: NEM Blockchain Meetup H27/07/11

Toward solving the problems of PoW

Instead of sampling from the distribution of computing power, sample from a distribution that does not waste electricity

Page 32: NEM Blockchain Meetup H27/07/11

PROOF-OF-STAKE (proof of the balance of acquired stake)

• Instead of the distribution of computing power, the distribution of the stake (balance) held by accounts is used to decide who creates the next block

• Because it is similar to Proof-of-Importance, described next, PoS is not explained in detail here

• Problem: if the rich end up being the block creators, participation is not on equal terms

Page 33: NEM Blockchain Meetup H27/07/11

PROOF-OF-IMPORTANCE (PoI)

• An algorithm developed by NEM

• The importance of each user to the economy probabilistically determines who harvests (creates) the next block

• Similar to PoS, but the calculation uses importance rather than the raw balance

Page 34: NEM Blockchain Meetup H27/07/11

The transaction graph

• Extremely important data

Page 35: NEM Blockchain Meetup H27/07/11

Vested balance

• The balance is weighted, and the weight grows over time

• Every day, one tenth of the still-unvested part becomes vested (see the sketch at the end of this slide)

[Figure 1: Vesting of 100,000 XEM — vested part (XEM) versus days]

All accounts in the nemesis block [2] are fully vested.

2.2 NEM addresses

A NEM address is a base-32 [3] encoded triplet consisting of:

• network byte

• 160-bit hash of the account's public key

• 4 byte checksum

The checksum allows for quick recognition of mistyped addresses. It is possible to send XEM to any valid address even if the address has not previously participated in any transaction.

[2] the first block in the NEM block chain
[3] http://en.wikipedia.org/wiki/Base32
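A worked sketch of the vesting rule above (every day, one tenth of the still-unvested part vests):

```python
def vested_after(days: int, balance: float) -> float:
    """Vested part of `balance` after `days` days, per the 10%-per-day rule."""
    unvested = balance
    for _ in range(days):
        unvested *= 0.9                  # 10% of the unvested part vests each day
    return balance - unvested

# e.g. vested_after(7, 100_000) is roughly 52,170 XEM, consistent with Figure 1
```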

Page 36: NEM Blockchain Meetup H27/07/11

The outlink matrix

• Transactions sent from an account are its outlinks; the (number of accounts) × (number of accounts) matrix built from them is the outlink matrix

Page 37: NEM Blockchain Meetup H27/07/11

Clustering the transaction graph

An algorithm called SCAN++, developed at NTT, is used

Page 38: NEM Blockchain Meetup H27/07/11
Page 39: NEM Blockchain Meetup H27/07/11

SCAN++

The similarity between two accounts u and v in the transaction graph, σ(u, v), is calculated as follows:

σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| |Γ(v)|)    (20)

where |·| denotes set cardinality and Γ is the set of structurally connected accounts (inclusive of self), defined as:

Γ(u) = {v ∈ V | {u, v} ∈ E} ∪ {u}    (21)

N_ε is the set of structurally connected accounts that have structural similarity with an account over a pre-determined threshold ε:

N_ε(u) = {v ∈ Γ(u) | σ(u, v) ≥ ε}    (22)

Core nodes are used for pivoting and expanding clusters and are defined as follows:

K_{ε,µ}(u) ⇔ |N_ε(u)| ≥ µ    (23)

where µ is the minimum number of epsilon-neighbor accounts that an account must have to be considered core. During clustering, clusters are centered (pivoted) around core accounts. The initial members of the cluster are the members of N_ε. This means that µ controls the smallest possible size of a cluster. In NEM, µ is 4 and ε is 0.3. An account v has direct structure reachability, u ↦_{ε,µ} v, with account u for a given ε and µ, if u is core and v is a member of N_ε(u):

u ↦_{ε,µ} v ⇔ K_{ε,µ}(u) ∧ v ∈ N_ε(u)    (24)

In the SCAN algorithm, accounts that are core are set as pivots and then clusters are expanded by including accounts with direct structure reachability (Equation 24: Clustering the transaction graph). This requires computing the structural similarity with each neighbor and the neighbors' neighbors.

The improved version of SCAN only looks at the pivot accounts and accounts that are two hops away from the pivot accounts. Accounts two hops away from account u, H(u), are defined as follows:

H(u) = {v ∈ V | (u, v) ∉ E ∧ (v, w) ∈ E}    (25)

Definition of similarity: equation (20) above.


However, because of its computational cost, SCAN has difficulty clustering large-scale graphs. SCAN computes the structural similarity for every edge, so with |E| edges it has O(|E|) time complexity. With |V| nodes and |E| ≈ |V|^2, SCAN's worst-case time complexity becomes O(|V|^2). Clustering the increasingly large graphs seen in recent years therefore takes an enormous amount of processing time.

1.1 Contributions of this work — this paper addresses the following problem.

[Problem definition 1] (fast clustering based on structural similarity)
Given: a graph G = (V, E), a structural-similarity threshold ε, and the minimum number of nodes µ that form a cluster.
Find: from the graph G, the cluster set C, the hub set H, and the outlier set O.

To make Problem 1 applicable to larger graphs than before, this paper proposes a clustering method that extracts the same cluster set C, hub set H and outlier set O as SCAN, but faster. To reduce SCAN's computational cost, we focus on the strong clustering property of real-world graphs: the neighbors of a node tend to be connected to each other, so real graphs contain densely connected subgraphs that readily form clusters. Whereas SCAN must compute the structural similarity for every edge, the proposed method avoids as much of that computation as possible: it extracts sets of nodes whose shortest-path distance is two — called 2-hop-away nodes — and computes the structural similarity only for edges attached to those nodes. Densely connected subgraphs are thereby clustered with far fewer similarity computations, reducing the total number of edges evaluated. The proposed method runs in O(|V|) time in the number of nodes. As a result it has the following properties:

• Speed: thanks to the two approaches above, it clusters faster than the conventional method SCAN.

• Exactness: the approach satisfies SCAN's cluster definition, so for Problem 1 it produces exactly the same clusters as SCAN.

• Usability: no precomputation is needed; clustering runs given only the graph G and the two parameters ε and µ.

Table 1: Notation
|V|: number of nodes
|E|: number of edges
ε: structural-similarity threshold for forming a cluster
µ: minimum number of nodes in a cluster
Γ(u): set of structural neighbors of node u
|Γ(u)|: number of nodes in Γ(u)
σ(u, v): structural similarity of edge (u, v)
N_ε(u): ε-neighborhood of node u
|N_ε(u)|: number of nodes in N_ε(u)
K_{ε,µ}(u): node u is core
u ↦_{ε,µ} v: direct structure reachability from node u to node v
u →_{ε,µ} v: structure reachability from node u to node v

To the best of our knowledge, the proposed method is the first that satisfies both speed and exactness at the same time. Although SCAN is computationally expensive, it is used in a wide range of applications because it extracts not only clusters but also hubs and outliers; the proposed method improves the performance of applications that already use it and of fields where its use is expected in the future.

The paper is organized as follows: Section 2 reviews the background, Section 3 describes the proposed method in detail, Section 4 evaluates and analyzes it, Section 5 discusses related work, and Section 6 concludes and discusses future work.

2. Preliminaries

Following the conventional method SCAN [8], we consider extracting the cluster set C, hub set H and outlier set O from an undirected, unweighted graph G = (V, E). Table 1 lists the notation used in this paper.

SCAN computes structural similarity by evaluating the fraction of neighbors shared by two nodes. The neighbor set is defined as follows.

[Definition 1] (structural neighborhood) For u ∈ V, the neighborhood of u is the set Γ(u) of nodes connected to u by an edge, together with u itself:

Γ(u) = {v ∈ V | {u, v} ∈ E} ∪ {u}

[Definition 2] (structural similarity) For u, v ∈ V, with |Γ(u)| the number of nodes in the neighborhood, the structural similarity between u and v is

σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| |Γ(v)|)

As Definition 2 shows, σ(u, v) = 0 when u and v share no nodes and σ(u, v) = 1 when they share all of them. SCAN, the clustering method based on structural similarity, introduces a similarity threshold ε for forming clusters and defines the ε-neighborhood as the neighbors connected with similarity at least ε.

[Definition 3] (ε-neighborhood) For u ∈ V and ε ∈ R, the ε-neighborhood N_ε(u) is defined as follows.
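A sketch of the structural-similarity definitions above (Γ, σ, N_ε and the core test), for an undirected graph given as an adjacency dict; this mirrors equations (20)–(23) / Definitions 1–3 rather than any particular implementation.

```python
import math

def gamma(adj: dict, u) -> set:
    """Γ(u): the neighbors of u plus u itself (Definition 1 / equation 21)."""
    return set(adj.get(u, ())) | {u}

def sigma(adj: dict, u, v) -> float:
    """σ(u, v): structural similarity (Definition 2 / equation 20)."""
    gu, gv = gamma(adj, u), gamma(adj, v)
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

def epsilon_neighborhood(adj: dict, u, eps: float) -> set:
    """N_ε(u): neighbors whose similarity to u is at least ε (equation 22)."""
    return {v for v in gamma(adj, u) if sigma(adj, u, v) >= eps}

def is_core(adj: dict, u, eps: float, mu: int) -> bool:
    """K_{ε,µ}(u): u is core when it has at least µ ε-neighbors (equation 23)."""
    return len(epsilon_neighborhood(adj, u, eps)) >= mu

# NEM reportedly uses µ = 4 and ε = 0.3 for the transaction graph (see the excerpt above).
```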

Page 40: NEM Blockchain Meetup H27/07/11

SCAN++


Page 41: NEM Blockchain Meetup H27/07/11

SCAN++


where account w is an account such that w ∈ N_ε(u) \ {u}. For each core account that is two hops away from the pivot, a new cluster is generated and pivoted around it. All of the core account's epsilon neighbors (N_ε) are added to the new cluster. When computing the accounts that are two hops away, accounts with direct structure reachability from the pivoted node are removed from the calculation. When expanding the two-hops-away accounts, the accounts are processed such that:

H(u_n) = {v ∈ V | (u, v) ∉ E ∧ (v, w) ∈ E ∧ v ∉ ⋃_{i=0}^{n-1} N_ε(u_i) ∪ H(u_i)}    (26)

After all accounts in the graph have been processed, all nodes are analyzed. If an account belongs to multiple clusters, then those clusters are merged. Afterwards, any account that is not in a cluster is marked as a hub if it connects two or more clusters, or as an outlier if it does not.

The use of the two-hop-away nodes to expand the clusters reduces the computation cost of clustering, because the calculation of the structural similarity σ is the slowest part of the algorithm.

The computed clusters are also used to determine the levels in the NCDawareRank inter-level proximity matrix, as these clusters are representative of the nearly completely decomposable nature of the transaction graph.

7.5 Calculating Importance Scores

The importance score, ψ, is calculated as follows:

ψ = (normalize_1(max(0, ν + σ·w_o)) + π·w_i) · χ    (27)

where:

normalize_1(v) is v / ‖v‖
ν is the vested amount of XEM
σ is the weighted, net outlinking XEM
π is the NCDawareRank [10] score
χ is a weighting vector that considers the structural topology of the graph
w_o, w_i are suitable constants
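A sketch of the importance formula (27), assuming all inputs are NumPy vectors indexed by account and that normalize_1 divides by the L1 norm; the constants w_o and w_i are placeholders, not NEM's actual values.

```python
import numpy as np

def importance(vested, net_outlink, ncd_rank, topology_weight, w_o=1.0, w_i=1.0):
    """Sketch of equation (27): psi = (normalize_1(max(0, nu + sigma*w_o)) + pi*w_i) * chi."""
    v = np.maximum(0.0, vested + w_o * net_outlink)
    norm = np.linalg.norm(v, 1) or 1.0        # normalize_1: divide by the L1 norm (assumed)
    return (v / norm + w_i * ncd_rank) * topology_weight
```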

Page 42: NEM Blockchain Meetup H27/07/11

SCAN++

… At this point, the nodes already marked as directly structure-reachable from the pivots selected so far are excluded from the expanded 2-hop-away node set. We call this set the extended 2-hop-away node set and define it as follows.

[Definition 11] (extended 2-hop-away node set) Let node u_n be the newly selected pivot and u_1, u_2, ..., u_{n-1} ∈ V the pivots selected before u_n (where, for nodes u_{i-1} and u_i, u_{i-1} was selected first), and let w ∈ Γ(u_n). The extended 2-hop-away node set H(u_n) obtained when u_n is selected as pivot is the node set

H(u_n) = {v ∈ V | (u, v) ∉ E ∧ (v, w) ∈ E ∧ v ∉ ⋃_{i=0}^{n-1} N_ε(u_i) ∪ H(u_i)}

Letting P be the set of selected pivots, the proposed method keeps extracting extended 2-hop-away node sets and computing structural similarities for the edges attached to them until {⋃_{i=0}^{n} H(u_i)} \ P = ∅. This set converges either (1) when the method has scanned every node, or (2) when ⋃_{i=0}^{n} N_ε(u_i) has converged. In our experience, except when the given parameters ε and µ are extremely small, the proposed method converges for reason (2).

While unevaluated nodes remain in the graph, the method repeats the extraction of the (extended) 2-hop-away node sets defined in Definitions 10 and 11 and the structural-similarity computation. See Algorithm 2 for the details of the proposed method.

3.2.1 Post-processing of unevaluated nodes

To extract exactly the structure-connected clusters given by Definition 7, the proposed method post-processes unevaluated nodes that belong to two or more clusters. If such a node is core, then by Definition 6 the adjacent clusters form the same structure-connected cluster, so whenever a node belongs to two or more clusters it must be decided whether it is core. Nodes are selected in decreasing order of the number of clusters they belong to; if that number is at least µ, the node is judged core without any similarity computation and the cluster labels of the related clusters are updated. After this processing, structural similarities are computed for the remaining unevaluated edges only for nodes that still belong to several clusters, to decide whether they are core.

At first sight this post-processing looks as if it brings the cost of the proposed method back up to that of SCAN, but the increase is avoided for two reasons. First, not every node that belongs to several clusters needs post-processing: because of the clustering property of graphs, several nodes are typically adjacent to the same core, and once any one of them is found to be core the others need not be processed. Second, following the finding in the prior work [8] — “an ε value between 0.5 and 0.8 is normally sufficient to achieve a good clustering result. We recommend a value for µ of 2.” — when µ = 2 is given as a parameter, a node belonging to several clusters is trivially core. For real graphs the cost therefore does not increase; in our experiments, the structural-similarity computation for unevaluated edges never actually occurred, and the time needed to decide core status was negligibly small compared with the total clustering time.

Algorithm 2: Proposed method
Require: G = (V, E), ε ∈ R, µ ∈ N
Ensure: clusters C, hubs H, and outliers O
 1: for each unclassified node u ∈ V do
 2:   P = {u}
 3:   if K_{ε,µ}(u) then
 4:     generate new clusterID id
 5:     assign id to all v ∈ N_ε(u)
 6:   else
 7:     label u as non-member
 8:   end if
 9:   while {⋃_{u∈P} H(u)} \ P ≠ ∅ do
10:     for v ∈ {⋃_{u∈P} H(u)} \ P do
11:       if K_{ε,µ}(v) then
12:         generate new clusterID id
13:         assign id to all v ∈ N_ε(v)
14:       else
15:         label v as non-member
16:       end if
17:     end for
18:     for u ∈ {⋃_{u∈P} H(u)} \ P do
19:       P = P ∪ H(u)
20:     end for
21:   end while
22: end for
23: while there is a node u labeled with several ids do
24:   if the number of ids < µ then
25:     compute structural similarities for the non-evaluated edges
26:   end if
27:   if K_{ε,µ}(u) then
28:     u is core
29:     refine cluster ids
30:   end if
31: end while
32: for each non-member node u do
33:   if ∃ x, y ∈ Γ(u) with x.clusterID ≠ y.clusterID then
34:     label u as hub
35:   else
36:     label u as outlier
37:   end if
38: end for

3.3 Exactness of the proposed method

We prove the exactness of the clusters extracted by the proposed method. Exactness here means that, given the same graph and parameters, it outputs the same clustering result as the conventional method SCAN. To show the exactness of the clusters, the node sets computed from the 2-hop-away node sets must contain every structure-connected cluster built by the core nodes contained in those sets; this can be proved from the cluster-inclusion property of the 2-hop-away node sets. To show that property, Lemma 1 first establishes the non-direct structural reachability of 2-hop-away node sets. In Lemma 1, P is the pivot set obtained when the extraction of extended 2-hop-away node sets has converged, ⋃_{u∈P} N_ε(u) is the union of the ε-neighborhoods of the pivots, and V_H = {⋃_{u∈P} N_ε} ∪ P is the node set obtained during 2-hop-away extraction.

[Lemma 1] (non-direct structural reachability) Let V_H be the subset of nodes scanned while extracting the extended 2-hop-away node sets, and let V̄_H = V \ V_H be the set of all nodes not contained in it; then …
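A sketch of the basic 2-hop-away set of equation (25): nodes not adjacent to u but adjacent to some w ∈ N_ε(u) \ {u}. The extended set of Definition 11 would additionally exclude nodes already covered by earlier pivots; n_eps is assumed to be precomputed (for example with the epsilon_neighborhood sketch shown earlier).

```python
def two_hop_away(adj: dict, u, n_eps: set) -> set:
    """H(u): nodes not adjacent to u that are adjacent to some w in N_eps(u) \\ {u}."""
    neighbors = set(adj.get(u, ()))
    out = set()
    for w in n_eps - {u}:
        for v in adj.get(w, ()):
            if v != u and v not in neighbors:
                out.add(v)
    return out
```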

Page 43: NEM Blockchain Meetup H27/07/11

NCDAWARERANK

• Similar to Google's famous PageRank, but the “NCD”-aware part is new

• NCD: Nearly Completely Decomposable

Page 44: NEM Blockchain Meetup H27/07/11

NCDAWARERANK

[Figure 1: Tiny web graph — (a) “flat” web, (b) NCD web]

… added pages which usually have too-few incoming links, and thus cannot receive reasonable ranking [9]. We believe that one of the main causes of the previously mentioned problems is that PageRank, as in fact most link analysis algorithms, approaches the Web in a “flat” way. Research about the topological structure of the Web has shown that the hyperlink graph has a nested block structure [19], with domains, hosts and websites introducing intermediate levels of affiliation and revealing an interesting macro-structure [3, 12]. Thus, the Web, just like many naturally emerging complex systems, has a hierarchical nature. According to Simon [30] this is no accident; basically all viable complex systems, be they physical, social, biological, or artificial, share the property of having a nearly completely decomposable architecture: they are organized into hierarchical layers of blocks, sub-blocks, sub-sub-blocks and so on, in such a way that interactions among elements belonging to the same block are much stronger than interactions between elements belonging to different blocks.

The analysis of decomposable systems was pioneered by Simon and Ando [29], who reported on state aggregation in linear models of economic systems, but the versatility of Simon's idea has permitted the theory to be used with noticeable success in many complex problems originating from evolutionary biology, social sciences, cognitive science, management, etc. The introduction of near complete decomposability (NCD) in the fields of computer science and engineering is due to Courtois [11], who achieved the full mathematical development of the theory and applied it with great originality to a number of queueing and computer system performance problems.

In recent years, the decomposability of the Web has been exploited mainly from a computational point of view. The “mathematical fingerprint” of NCD in the PageRank problem is the special spectral structure [11, 23] of the stochastic matrix corresponding to the random walk on the Web graph. This property opens the way for the use of numerical methods like Iterative Aggregation/Disaggregation (IAD) [27] that can be used to accelerate the computation of the PageRank vector considerably. Many researchers have followed such approaches with promising results. Kamvar et al. [19] use the first two steps of the general IAD method to obtain an initial vector for subsequent Power Method iterations. Zhu et al. [33] propose a distributed PageRank computation algorithm based on IAD methods. Langville and Meyer [21] use a two-block IAD to accelerate the updating of the PageRank vector, and Ipsen et al. [17] analyse its asymptotic convergence. Recently, Cevahir et al. [8] used site-based partitioning techniques in order to accelerate PageRank.

However, little has been done to exploit decomposability from a qualitative point of view. Knowing that a complex system possesses the property of NCD points the way towards a more appropriate modelling approach and a mathematical analysis, which highlights the system's endemic characteristics, gives us invaluable insight into its behaviour and, consequently, provides us a theoretical framework to develop algorithms and methods that materialize this insight from a qualitative, computational as well as conceptual angle.

The main question we try to address in this work is “How could someone incorporate the concept of NCD into the basic PageRank model in a way that refines and generalizes it while preserving its efficiency?”. For example, if we have the information that the tiny Web graph of Figure 1(a) can be decomposed like in Figure 1(b) [1], in what way can we utilize this to achieve better ranking, without obscuring the simplicity and the clarity of PageRank's approach?

Summary of Contributions. The main contribution of this paper is the proposal of NCDawareRank, a novel ranking measure which:

• provides a theoretical framework that enables the exploitation of the Web's innately decomposable structure in a computationally efficient way.

• can serve as a generalization of PageRank that enhances its expressiveness while inheriting its attractive mathematical characteristics and approach.

• displays low sensitivity to the problems caused by the sparsity of the Web graph and treats newly added pages more fairly.

• exhibits resistance to direct manipulation through link spamming.

The rest of the paper is organized as follows. In Section 2, we outline the basic idea behind NCDawareRank and briefly discuss the nature of the basic NCD blocks. In Section 3, we develop the mathematical framework of NCDawareRank. Our experimental results are presented in Section 4. Finally, Section 5 concludes this paper and outlines directions for future work.

2. EXPLOITING WEB'S DECOMPOSABILITY: THE INTUITION

2.1 From PageRank to NCDawareRank

Underlying the definition of PageRank is the assumption that the existence of a link from page u to page v testifies to the importance of page v. Furthermore, the amount of importance conferred to page v is proportional to the importance of page u and inversely proportional to the number of pages u links to.

[1] same-coloured nodes represent pages belonging to the same block

In one sentence: PageRank that also uses the information about which cluster each account belongs to

Page 45: NEM Blockchain Meetup H27/07/11

NCDAWARERANK

7.3 NCDawareRank

There are many ways to determine the salience of nodes in a network, and PageRank is one method. NCDawareRank is similar to PageRank, where the stationary probability distribution of an ergodic Markov chain is calculated [9, 11]. NCDawareRank additionally exploits the nearly completely decomposable structure of large-scale graphs of information flows by adding an inter-level proximity matrix as an extra term, M. The inter-level proximity matrix models the fact that groups of nodes are closely linked together to form clusters that interact with each other. This allows NCDawareRank to converge faster than PageRank while also being more resistant to manipulation of scores, because the rank for nodes within the same level will be limited.

Shown in matrix notation, NCDawareRank is calculated as:

π = η O π + µ M π + (1 - η - µ) E π    (10)

where:

O is the outlink matrix
M is the inter-level proximity matrix
E is the teleportation matrix
π is the NCDawareRank
η is the fraction of importance that is given via outlinks
µ is the fraction of importance given to proximal accounts

This definition is the same as for PageRank, only with the addition of M and µ. For NEM, η is 0.7 and µ is 0.1. The details of how each of these variables is calculated are as follows.

Let W be the set of all harvesting-eligible accounts. For u ∈ W, G_u is the set of accounts that have received more in value transfers from account u than they have sent to u. Nearly completely decomposable (NCD) partitions of W are defined as {A_1, A_2, ..., A_N}, such that for every u ∈ W there is a unique K such that u ∈ A_K. The proximal accounts of each u, χ_u, are thus defined as:

χ_u ≜ ⋃_{w ∈ ({u} ∪ G_u)} A(w)    (11)

and N_u denotes the number of NCD blocks in χ_u.


O: the outlink matrix


M: inter-level proximity matrix (which cluster an account belongs to); E: teleportation matrix; π: NCDawareRank
η: weight of the outlinks; µ: weight of the proximal accounts
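Below is a minimal Python sketch of the power iteration behind Equation (10). It assumes small dense numpy matrices O, M and E that are already column-stochastic; the function name, parameter defaults and toy data are illustrative only, not NEM's implementation.

import numpy as np

def ncdaware_rank(O, M, E, eta=0.7, mu=0.1, iters=100, tol=1e-10):
    """Iterate pi = eta*O*pi + mu*M*pi + (1-eta-mu)*E*pi until it stabilizes."""
    n = O.shape[0]
    pi = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(iters):
        new_pi = eta * O @ pi + mu * M @ pi + (1.0 - eta - mu) * E @ pi
        new_pi /= new_pi.sum()        # keep pi a probability distribution
        if np.abs(new_pi - pi).sum() < tol:
            break
        pi = new_pi
    return pi

# Toy example with three accounts: 0 sends to 1, 1 sends to 2, 2 sends to 0.
O = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
M = np.full((3, 3), 1.0 / 3)          # a single NCD block containing all accounts
E = np.full((3, 3), 1.0 / 3)          # uniform teleportation
print(ncdaware_rank(O, M, E))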

Page 46: NEM Blockchain Meetup H27/07/11

Calculating importance

where account w is an account, such that w ∈ N_ε(u) \ {u}. For each core account that is two-hops away from the pivot, a new cluster is generated and pivoted around it. All of the core account's epsilon neighbors (N_ε) are added to the new cluster. When computing the accounts that are two-hops away, accounts with direct structure reachability from the pivoted node are removed from the calculation. When expanding the two-hops away accounts, the accounts are processed, such that:

H(u_n) = { v ∈ V | (u, v) ∉ E ∧ (v, w) ∈ E ∧ v ∉ ⋃_{i=0}^{n−1} N_ε(u_i) ∪ H(u_i) }.   (26)

After all accounts in the graph have been processed, all nodes are analyzed. If an account belongs to multiple clusters, then those clusters are merged. Afterwards, any account that is not in a cluster is marked as a hub if it connects two or more clusters or as an outlier if it does not.

The use of the two-hop away nodes to expand the clusters reduces the computation cost of clustering because the calculation of structural similarity σ is the slowest part of the algorithm.

The computed clusters are also used to determine the levels in the NCDawareRank inter-level proximity matrix, as these clusters are representative of the nearly completely decomposable nature of the transaction graph.

7.5 Calculating Importance Scores

The importance score, ψ, is calculated as follows:

ψ = (normalize₁(max(0, ν + σ·w_o)) + π·w_i)·χ,   (27)

where:

normalize₁(v) is v / ‖v‖
ν is the vested amount of XEM
σ is the weighted, net outlinking XEM
π is the NCDawareRank [10] score
χ is a weighting vector that considers the structural topology of the graph
w_o, w_i are suitable constants


Annotations on Equation (27): ν is the vested balance, σ the weighted sent XEM, π the NCDawareRank value, χ the graph-structure weight (1 if in a cluster, else 0.9)


w_o = 1.25, w_i = 0.1337
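As a rough illustration of Equation (27), the following Python sketch combines the four per-account vectors. The function name and toy numbers are invented, and the constants w_o = 1.25 and w_i = 0.1337 are taken from the slide annotation above; this is not the NIS implementation.

import numpy as np

def importance(vested, net_outlink, ncd_rank, topology, w_o=1.25, w_i=0.1337):
    """Sketch of psi = (normalize1(max(0, nu + sigma*w_o)) + pi*w_i) * chi.

    vested      - vested XEM per account (nu)
    net_outlink - weighted net outlinking XEM per account (sigma)
    ncd_rank    - NCDawareRank score per account (pi)
    topology    - structural weight per account (chi), e.g. 1 if in a cluster else 0.9
    """
    raw = np.maximum(0.0, vested + net_outlink * w_o)
    normalized = raw / raw.sum()          # normalize1: divide by the vector norm
    return (normalized + ncd_rank * w_i) * topology

psi = importance(np.array([20000.0, 50000.0, 12000.0]),
                 np.array([1000.0, -500.0, 0.0]),
                 np.array([0.4, 0.35, 0.25]),
                 np.array([1.0, 1.0, 0.9]))
print(psi)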

Page 47: NEM Blockchain Meetup H27/07/11

Comparison of PoS and PoI. Table 2: Differences between Bitcoin account ranks for importance scores vs. vested balances.

average rank increase for importance vs stakes (poor half): 2814.6
average rank increase for importance vs stakes (rich half): -2949.6
average rank increase for importance vs stakes: -67.5


Each account was ranked from 1 to 54,683 under both PoS and PoI (i.e., by vested balance and by importance), and the two rankings were compared.

The analysis was performed on data for 54,683 Bitcoin accounts downloaded around October 2014.

Page 48: NEM Blockchain Meetup H27/07/11

Comparison of PoS and PoI based on the NEM transaction graph

Node size = normalized balance / node size = normalized importance

Page 49: NEM Blockchain Meetup H27/07/11

NEM's block difficulty calculation

[Figure 4: Main net average block times over 360 blocks — x-axis: block height (20,000–60,000), y-axis: seconds (56–66)]

If only one block is available, then the block has a predefined initial difficulty of 10^14. Otherwise, the difficulty is calculated from the last n blocks the following way:

d = (1/n) Σ_{i=1}^{n} (difficulty of block i)   (average difficulty)

t = (1/n) Σ_{i=1}^{n} (time to create block i)   (average creation time)

difficulty = d · 60/t   (new difficulty)

If the new difficulty is more than 5% greater or smaller than the difficulty of the last block, then the change is capped to 5%.

Additionally, difficulties are kept within certain bounds. The new difficulty is clamped to the boundaries if it is greater than 10^15 or smaller than 10^13.

Simulations and the NEM beta phase have shown that the algorithm produces blocks with an average time of 60 ± 0.5 seconds.

The slow change rate of 5% makes it hard for an attacker with considerably less than 50% importance to create a better chain in secret since block times will be considerably higher than 60 seconds for the beginning of his secret chain.


Annotations: average difficulty, average time between blocks, new difficulty

If the new difficulty differs from the previous block's difficulty by more than 5%, the change is capped at 5%.
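A small Python sketch of the adjustment rule described above, under the simplifying assumption that the difficulty and creation-time history of the recent blocks is passed in as plain lists; this is only an illustration, not the NIS implementation.

def new_difficulty(difficulties, block_times, initial=10**14):
    """difficulties: difficulties of the last n blocks; block_times: seconds per block."""
    if len(difficulties) < 2:
        return initial                            # predefined initial difficulty of 10^14
    d = sum(difficulties) / len(difficulties)     # average difficulty
    t = sum(block_times) / len(block_times)       # average creation time
    difficulty = d * 60 / t                       # scale towards the 60 second target
    last = difficulties[-1]
    difficulty = max(min(difficulty, last * 1.05), last * 0.95)  # cap change at 5%
    return max(min(difficulty, 10**15), 10**13)   # clamp to the global bounds

print(new_difficulty([10**14] * 10, [62.0] * 10))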

Page 50: NEM Blockchain Meetup H27/07/11

Time between blocks • 60 seconds is the target, but in practice there is variation

• Controlled by the block difficulty

• NEM: 60 ± 0.5 seconds

[Figure 4 (shown again): Main net average block times over 360 blocks]


Page 51: NEM Blockchain Meetup H27/07/11

Block score: score = difficulty − time since the previous block

NEM uses a P2P clock system of its own design.

5.4 Block chain synchronization

Since blocks are assigned a score, a score can be assigned to a chain of blocks too:

score = Σ_{block ∈ blocks} (block score)   (block chain score)

Block chain synchronization is a central task for every block chain based crypto currency. From time to time a local node will ask a remote node about its chain. The remote node is selected using the calculated trust values (see section 6: A reputation system for nodes).

If the remote node promises a chain with a higher score, the two nodes try to agree on the last common block. If successful, the remote node will supply up to 400 blocks of its chain to the local node.

If the supplied chain is valid, the local node will replace its own chain with the remote chain. If the supplied chain is invalid, the local node will reject the chain and consider the synchronization attempt with the remote node to have failed.

This algorithm will also resolve forks. The last common block may have a height difference of at most 360 compared to the local node's last block. Thus, the maximal depth of forks that can be resolved via the synchronization algorithm is 360.

The flow chart on the next page illustrates the process in more detail.


Block chain score: the sum of the scores of all blocks in the chain
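The two score formulas translate directly into code. The following Python sketch, with invented toy values, shows how a node would compare its own chain score against the score promised by a peer.

def block_score(difficulty, seconds_since_last_block):
    """Block score = difficulty - time elapsed since the last block (seconds)."""
    return difficulty - seconds_since_last_block

def chain_score(blocks):
    """Chain score = sum of the scores of all blocks; blocks are (difficulty, seconds) pairs."""
    return sum(block_score(d, t) for d, t in blocks)

# A peer's chain is only adopted if it promises a higher score than the local chain.
local = chain_score([(10**14, 60), (10**14, 58)])
remote = chain_score([(10**14, 60), (10**14, 55), (10**14, 61)])
print(remote > local)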

Page 52: NEM Blockchain Meetup H27/07/11

How blocks are created under PoI

5.2 Block score

The score for a block is derived from its difficulty and the time (in seconds) that has elapsed since the last block:

score = difficulty − time elapsed since last block   (block score)

5.3 Block creation

The process of creating new blocks is called harvesting. The harvesting account gets the fees for the transactions in the block. This gives the harvester an incentive to add as many transactions to the block as possible. Any account that has a vested balance of at least 10,000 XEM is eligible to harvest.

To check if an account is allowed to create a new block at a specific network time, the following variables are calculated:

h = H(generation hash of previous block, public key of account), interpreted as 256-bit integer
t = time in seconds since last block
b = 8999999999 · (importance of the account)
d = difficulty for new block

and from that the hit and target integer values:

hit = 2^54 · |ln(h / 2^256)|

target = 2^64 · b · t / d

The account is allowed to create the new block whenever hit < target. In the case of delegated harvesting, the importance of the original account is used instead of the importance of the delegated account.

Since target is proportional to the elapsed time, a new block will be created after a certain amount of time even if all accounts are unlucky and generate a very high hit.

Also note that hit has an exponential distribution. Therefore, the probability to create a new block does not change if the importance is split among many accounts.


When hit < target, the participant may create the block.

(PoS works in a similar way, but uses the balance instead of the importance.)
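A hedged Python sketch of the hit < target check from section 5.3. hashlib.sha3_256 stands in for the hash function H and the account data is invented, so this only illustrates the shape of the calculation, not the exact NIS code.

import hashlib
import math

def can_harvest(prev_generation_hash, public_key, importance,
                seconds_since_last_block, difficulty):
    """Return True if the account is allowed to create the next block."""
    digest = hashlib.sha3_256(prev_generation_hash + public_key).digest()
    h = int.from_bytes(digest, "big")
    hit = int(2**54 * abs(math.log(max(h, 1) / 2**256)))
    b = 8_999_999_999 * importance
    target = int(2**64 * b * seconds_since_last_block / difficulty)
    return hit < target

print(can_harvest(b"\x01" * 32, b"\x02" * 32, importance=0.001,
                  seconds_since_last_block=30, difficulty=10**14))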

Page 53: NEM Blockchain Meetup H27/07/11

BLOCKCHAIN UNWINDING • Several valid blocks, and therefore several valid block chains, can exist at the same time, so only the chain with the highest score is kept

• When synchronizing with a peer whose chain has a higher score, up to 360 blocks (about 6 hours) of the local chain are unwound (discarded) and rewritten

• This is called a soft fork and happens frequently as a temporary condition

Page 54: NEM Blockchain Meetup H27/07/11

P2P data network • BTC, Ethereum, NEM and similar systems exchange data over P2P, and every node holds every block, i.e., all of the data

Page 55: NEM Blockchain Meetup H27/07/11

P2P data communication • Communication handshake

1. Send a random 64-byte message

2. The remote node signs the received message and sends the reply (see the sketch below)
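A sketch of this challenge–response handshake using Ed25519 signatures. The third-party Python package cryptography is used here purely for illustration; the key handling and function names are assumptions, not the actual NIS code.

import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The remote node's identity key pair (in NEM the node is booted with an account key).
remote_private = Ed25519PrivateKey.generate()
remote_public = remote_private.public_key()

# 1. The local node sends a random 64-byte challenge.
challenge = os.urandom(64)

# 2. The remote node signs the challenge and returns the signature.
signature = remote_private.sign(challenge)

# The local node verifies the signature against the remote node's public key;
# verify() raises an exception if the check fails.
remote_public.verify(signature, challenge)
print("handshake ok")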

[Figure 15: Communication between local and partner node — the request (sent at t1, received at t2) carries random data; the response (sent at t3, received at t4) carries random data and a signature; both sides use network time.]

9.2 Node Startup

When a NIS node is launched, the node processes the block chain and caches some data in memory to improve online performance. Once the processing is finished, the node is still not connected to the network because it is not yet booted.

A non-booted node is not associated with an account. This prevents it from being able to sign responses and prevents other nodes from being able to verify its identity.

In order to boot a node, the private key of a NEM account must be supplied to the node. This account is the primary account associated with the node. A delegated account can be used to boot a node in order to better protect the private key of the real account (see subsection 4.2: Importance transfer transactions).

9.3 Node Discovery

Once a node is booted, it connects to the NEM network and starts sharing information with other nodes. Initially, the node is only aware of the well-known nodes. These nodes are the same as the pre-trusted nodes described in subsection 6.2: Local trust value.

Over time, the node becomes aware of more nodes in the network. There are typically two ways this happens: via an announcement or a refresh.

9.3.1 Announcement

Periodically, a node announces itself to its current partner nodes and includes its local experience information in the request. If the node is unknown to the partner, the partner marks it as active and updates the node's experiences. If the node is known to the partner, the partner only updates the node's experiences but does not change its status.


Page 56: NEM Blockchain Meetup H27/07/11

Peer and block chain synchronization • Once a peer passes the handshake, synchronization begins

• If the peer's block chain score is higher than the local one, up to 400 blocks are downloaded from the peer

• This is repeated until all blocks are held locally

Page 57: NEM Blockchain Meetup H27/07/11

EIGENTRUST++: improving the security of the P2P network

• Among block chain platforms, only NEM uses it

• It is an algorithm that rates the trustworthiness of peers: when false data is received from a peer, that experience is shared with other peers

• When a node boots it is bound to a public key via its private key, so every node has a verifiable identity

Page 58: NEM Blockchain Meetup H27/07/11

EIGENTRUST++ • Nodes share block data and transaction data with their peers

• All data that is received can be verified

• When data is received, there are three possible outcomes:

1. valid: correct, previously unknown data was received

2. neutral: correct data that was already known was received

3. failure: the received data was not correct


Page 60: NEM Blockchain Meetup H27/07/11

EIGENTRUST++

the pre-trusted nodes for other existing nodes in the network. Knowing about n nodes in the network each node can build an initial trust vector p by setting:

p_j = 1/|P|   if node_j ∈ P
p_j = 0       otherwise

p_j indicates how much trust a node initially has in node j.

After some time node i had some interactions with other nodes and can update its local trust values by first calculating:

s_ij = success(i, j) / (success(i, j) + failed(i, j))   if success(i, j) + failed(i, j) > 0
s_ij = p_j                                              otherwise

and then normalizing the local trust values:

c_ij = s_ij / Σ_m s_im

to define the local trust vector c_i with components c_ij.

6.3 Aggregating local trust values

From time to time nodes broadcast their local trust values to other nodes. Having received the local trust values from other nodes, node i can calculate an aggregate trust value for node k by weighing the local trust node j has in node k with its own trust in node j:

t_ik = Σ_j c_ij c_jk

This can be written in matrix notation by defining C = (c_kj) and t_i having components t_ik:

t_i = Cᵀ c_i

If we define the iteration:

t_{i+1} = Cᵀ t_i

then this will converge to the left principal eigenvector t of the matrix C under the assumptions that C is irreducible and aperiodic. To guarantee the assumptions being valid, we slightly change the iteration by mixing a portion of the initial trust vector p into the iteration:

t_i = p                          if i = 0
t_i = (1 − a) Cᵀ t_{i−1} + a p   otherwise   (9)

where a suitable 0 < a < 1 is chosen. This iteration will always converge to a vector t, which represents the trust a node has in other nodes.


P is the set of pre-trusted (known) nodes; p_j is the initial trust in node j


c_i is the local trust vector; i is the local node, j a peer, and m runs over all peers
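The local-trust and aggregation steps can be sketched in a few lines of Python. The matrices of successful and failed interactions below are toy data, and a = 0.1 is an arbitrary choice of the mixing constant; this is an illustration of Equation (9), not the NIS code.

import numpy as np

def local_trust(success, failed, p):
    """s_ij = success/(success+failed) where there were interactions, else p_j;
    then normalize each row to obtain c_ij."""
    total = success + failed
    s = np.where(total > 0, success / np.maximum(total, 1), p)
    return s / s.sum(axis=1, keepdims=True)

def global_trust(C, p, a=0.1, iters=100):
    """Power iteration of Equation (9): t = (1-a) * C^T t + a * p."""
    t = p.copy()
    for _ in range(iters):
        t = (1.0 - a) * C.T @ t + a * p
    return t

# Toy network with three nodes; node 0 is the only pre-trusted node.
p = np.array([1.0, 0.0, 0.0])
success = np.array([[0, 9, 1], [8, 0, 2], [1, 1, 0]], dtype=float)
failed  = np.array([[0, 1, 9], [2, 0, 8], [0, 0, 0]], dtype=float)
C = local_trust(success, failed, p)
print(global_trust(C, p))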

Page 61: NEM Blockchain Meetup H27/07/11

EIGENTRUST++

6.4 Enhancing the algorithm

The above algorithm to compute trust suffers from the fact that the veracity of the feedback reported from other nodes is unknown. Malicious nodes could collude and report low trust values for honest nodes and high trust values for dishonest nodes.

An improvement is to estimate the credibility of the feedback of other nodes and weight the reported trust values by the credibility score. To do that, common(u, v) is defined for two nodes u and v as the set of nodes with which both nodes have interacted. Given that, a measure for the similarity of the feedback of the two nodes (sim(u, v)) can be calculated:

sim(u, v) = (1 − sqrt( Σ_{w ∈ common(u,v)} (s_uw − s_vw)² / |common(u, v)| ))^b   if common(u, v) ≠ ∅
sim(u, v) = 0   otherwise

where b is a positive integer (the original paper suggests b = 1).

Then, feedback credibility can be defined as:

f_ij = sim(i, j) / Σ_m sim(i, m)   if Σ_m sim(i, m) > 0
f_ij = 0   otherwise

and, finally, the matrix L = (l_ij) can be defined as:

l_ij = f_ij c_ij / Σ_m f_im c_im   if Σ_m f_im c_im > 0
l_ij = 0   otherwise

which incorporates both the reported trust values and the feedback credibility.

It is now straightforward to define an iteration analog to Equation 9:

t_i = p                          if i = 0
t_i = (1 − a) Lᵀ t_{i−1} + a p   otherwise

which converges to the left principal eigenvector of the underlying matrix of the power iteration.

The original Eigentrust++ paper suggests using additional measures to limit trust propagation between honest and dishonest nodes. The paper was written with file sharing networks in mind. In such networks, incomplete data is shared (parts of files) and cannot be checked for validity. Therefore, even honest nodes could distribute malicious data. Nodes in the NEM network always download entities in their entirety and verify their integrity before distributing them to other nodes. NEM network simulations have shown that the results without the additional trust propagation measures are good enough to mitigate the presence of malicious nodes.


For two nodes u and v, common(u, v) is the set of nodes with which both u and v have interacted; b can be any positive integer.


f denotes the feedback credibility.

Page 62: NEM Blockchain Meetup H27/07/11

EIGENTRUST++


The matrix L combines the trust values with the feedback credibility.


For the iteration, see equation (9) above.
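The trust iteration is easy to express directly. Below is a minimal sketch, assuming the normalized local trust values are collected in a NumPy matrix C (with C[j, k] = c_jk) and the pre-trust vector p sums to 1; the mixing constant a = 0.1 and the toy three-node data are illustrative choices, not values taken from NEM.

import numpy as np

def eigentrust(C, p, a=0.1, tol=1e-10, max_iter=1000):
    # Iterate t(i) = (1 - a) * C^T t(i-1) + a * p, starting from t(0) = p,
    # until the trust vector stops changing.
    t = p.copy()
    for _ in range(max_iter):
        t_next = (1.0 - a) * C.T @ t + a * p
        if np.linalg.norm(t_next - t, 1) < tol:
            return t_next
        t = t_next
    return t

# Toy example: 3 nodes, node 0 is the only pre-trusted node.
C = np.array([[0.0, 0.7, 0.3],
              [0.6, 0.0, 0.4],
              [0.5, 0.5, 0.0]])
p = np.array([1.0, 0.0, 0.0])
print(eigentrust(C, p))  # converged trust vector t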

Page 63: NEM Blockchain Meetup H27/07/11

• Simulation of the case where malicious peers always send nothing but completely dishonest data

[Figure 6: Simulation with attacking nodes that are always dishonest (0% honest data, 0% honest feedback). X-axis: percentage of malicious nodes; Y-axis: percentage of failed calls; series: Eigentrust++ vs. uniform trust.]

6.5 Benefits of the reputation system

Having a reputation system for nodes allows nodes to select their communication partner according to the trust values for other nodes. This should also help balance the load of the network because the selection of the communication partner only depends on a node's honesty but not its importance.

Simulations show that the algorithm reduces the number of failed interactions considerably. If malicious nodes only provide dishonest data and dishonest feedback they are easily identified (Figure 6).

But even if the malicious nodes collude to give other malicious nodes a high trust value and provide false data and feedback only to a certain percentage, the trust algorithm still cuts down the percentage of failed interactions (Figure 7).


EIGENTRUST++

Page 64: NEM Blockchain Meetup H27/07/11

EIGENTRUST++

[Figure 7: Simulation with attacking nodes that are sometimes dishonest (40% honest data, 40% honest feedback). X-axis: percentage of malicious nodes; Y-axis: percentage of failed calls; series: Eigentrust++ vs. uniform trust.]


• Simulation of the case where malicious peers send dishonest data 60% of the time (40% honest data, 40% honest feedback)

Page 65: NEM Blockchain Meetup H27/07/11

P2P TIME SERVICE

• NEM is a decentralized system with no trusted clock, so a P2P time service was created to give the network a usable "clock"

Page 66: NEM Blockchain Meetup H27/07/11

P2P TIME SERVICE • When a node comes online, it randomly selects 20 partner nodes with non-zero importance scores and compares its local time with each partner's time

[Figure 13: Communication between local and partner node. The local node records t1 (request sent) and t4 (response received) using its network time; the partner node records t2 (request received) and t3 (response sent) using its network time.]

For all selected partners the local node sends out a request asking the partner for its current network time. The local node remembers the network time stamps when the request was sent and when the response was received. Each partner node responds with a sample that contains the time stamp of the arrival of the request and the time stamp of the response. The partner node uses its own network time to create the time stamps. Figure 13 illustrates the communication between the nodes.

Using the time stamps, the local node can calculate the round trip time

rtt = (t4 − t1) − (t3 − t2)

and then estimate the offset o between the network time used by the two nodes as

o = t2 − t1 − rtt / 2

This is repeated for every time synchronization partner until the local node has a list of offset estimations.

8.2 Applying filters to remove bad data

There could be bad samples due to various reasons:

• A malicious node can supply incorrect time stamps.

• An honest node can have a clock far from real time without knowing it and without having synchronized yet.

• The round trip time can be highly asymmetric due to internet problems or one of the nodes being very busy. This is known as channel asymmetry and cannot be avoided.



(o is the offset; rtt is the round trip time)
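As a small illustration of the formulas above, here is a sketch of the per-partner offset estimate; the millisecond values in the example are made up.

def estimate_offset(t1, t2, t3, t4):
    # t1: request sent (local clock), t2: request received (partner clock),
    # t3: response sent (partner clock), t4: response received (local clock).
    rtt = (t4 - t1) - (t3 - t2)      # round trip time minus partner processing time
    offset = t2 - t1 - rtt / 2       # assumes a symmetric channel
    return rtt, offset

# Partner clock 50 ms ahead, 10 ms one-way delay, 5 ms processing time:
print(estimate_offset(t1=0, t2=60, t3=65, t4=25))  # (20, 50.0)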

Page 67: NEM Blockchain Meetup H27/07/11

P2P TIME SERVICE: DATA FILTERING

• If t4 − t1 > 1 second, the sample is discarded

• If the offset exceeds the node's allowable bound, the sample is discarded

• The remaining samples are sorted by offset and alpha-trimmed on both ends

Page 68: NEM Blockchain Meetup H27/07/11

P2P TIME SERVICE: PREVENTING SYBIL ATTACKS

• Offsets are weighted using importance values

Filters are applied that try to remove the bad samples. The filtering is done in 3 steps:

1. If the response from a partner is not received within an expected time frame (i.e. if t4 − t1 > 1000 ms) the sample is discarded.

2. If the calculated offset is not within certain bounds, the sample is discarded. The allowable bounds decrease as a node's uptime increases. When a node first joins the network, it tolerates a high offset in order to adjust to the already existing consensus of network time within the network. As time passes, the node gets less tolerant with respect to reported offsets. This ensures that malicious nodes reporting huge offsets are ignored after some time.

3. The remaining samples are ordered by their offset and then alpha trimmed on both ends. In other words, on both sides a certain portion of the samples is discarded.

8.3 Calculation of the effective offset

The reported offset is weighted with the importance of the boot account of the node reporting the offset. This is done to prevent Sybil attacks.

An attacker that tries to influence the calculated offset by running many nodes with low importances reporting offsets close to the tolerated bound will therefore not have a bigger influence than a single node having the same cumulative importance reporting the same offset. The influence of the attacker will be equal to the influence of the single node on a macro level.

Also, the number of samples that are available and the cumulative importance of all partner nodes should be incorporated. Each offset is therefore multiplied with a scaling factor.

Let I_j be the importance of the node reporting the j-th offset o_j and viewSize be the number of samples divided by the number of nodes that were eligible for the last PoI calculation.

Then the scaling factor used is

scale = min( 1 / Σ_j I_j , 1 / viewSize )

This gives the formula for the effective offset o:

o = scale · Σ_j I_j o_j


(o is the effective offset)

viewSize = (number of samples) / (number of accounts in the PoI calculation)
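A minimal sketch of the three filtering steps and the importance-weighted effective offset described in the excerpt above; the sample layout (a list of (t1, t2, t3, t4, importance) tuples), the trim fraction, and the bound handling are assumptions for illustration, not NEM's actual implementation.

def effective_offset(samples, view_size, bound_ms=None, trim=0.1, max_rtt_ms=1000):
    kept = []
    for t1, t2, t3, t4, importance in samples:
        if t4 - t1 > max_rtt_ms:                      # step 1: response too slow
            continue
        rtt = (t4 - t1) - (t3 - t2)
        offset = t2 - t1 - rtt / 2
        if bound_ms is not None and abs(offset) > bound_ms:
            continue                                  # step 2: offset out of bounds
        kept.append((offset, importance))
    kept.sort(key=lambda s: s[0])                     # step 3: alpha trim both ends
    cut = int(len(kept) * trim)
    if cut:
        kept = kept[cut:-cut]
    total_importance = sum(imp for _, imp in kept)
    if not kept or total_importance == 0 or view_size == 0:
        return 0.0
    scale = min(1.0 / total_importance, 1.0 / view_size)
    return scale * sum(imp * off for off, imp in kept)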

Page 69: NEM Blockchain Meetup H27/07/11

P2P TIME SERVICE

[Figure 14: Coupling factor. X-axis: synchronization round (0 to 20); Y-axis: coupling (0.2 to 1).]

Note that the influence of an account with large importance is artificially limited because the viewSize caps the scale. Such an account can raise its influence on a macro level by splitting its NEM into accounts that are not capped. But doing so will likely decrease its influence on individual partners because the probability that all of its split accounts are chosen as time-sync partners for any single node is low.

8.4 Coupling and threshold

New nodes that just joined the network need to quickly adjust their offset to the already established network time. In contrast, old nodes should behave much more rigidly in order not to be influenced too much by malicious nodes or newcomers.

To enable this, nodes only adjust a portion of the reported effective offset. Nodes multiply the effective offset with a coupling factor to build the final offset.

Each node keeps track of the number of time synchronization rounds it has performed. This is called the node age.

The formula for this coupling factor c is:

c = max(e^(−0.3a), 0.1)   where a = max(nodeAge − 5, 0)

This ensures that the coupling factor will be 1 for 5 rounds and then decay exponentially to 0.1.

Finally, a node only adds the calculated final offset to its internal offset if the absolute value exceeds a certain threshold.


• The coupling factor limits the influence of new nodes on older nodes
• The effective offset is multiplied by the coupling factor to obtain the final value
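A one-line sketch of the coupling factor from the excerpt above; the node ages in the example are arbitrary.

import math

def coupling(node_age):
    # c = max(e^(-0.3 a), 0.1) with a = max(nodeAge - 5, 0)
    a = max(node_age - 5, 0)
    return max(math.exp(-0.3 * a), 0.1)

# Stays at 1.0 for the first 5 rounds, then decays exponentially towards 0.1:
print([round(coupling(age), 2) for age in (0, 5, 6, 10, 20)])  # [1.0, 1.0, 0.74, 0.22, 0.1]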

Page 70: NEM Blockchain Meetup H27/07/11

NEM PROGRAMMING API
• Documentation: http://bob.nem.ninja/docs/

import requests
req = requests.get('http://104.156.232.219:7890/account/get/forwarded?address=NC2ZQKEFQIL3JZEOB2OZPWXWPOR6LKYHIROCR7PK')
req.json()

{u'account': {u'address': u'NALICE2A73DLYTP4365GNFCURAUP3XVBFO7YNYOW',
              u'balance': 15794218396666,
              u'harvestedBlocks': 1181,
              u'importance': 0.0015975939387324564,
              u'label': None,
              u'publicKey': u'bdd8dd702acb3d88daf188be8d6d9c54b3a29a32561a068b25d2261b2b2b7f02',
              u'vestedBalance': 15792456424453},
 u'meta': {u'cosignatories': [],
           u'cosignatoryOf': [],
           u'remoteStatus': u'ACTIVE',
           u'status': u'LOCKED'}}

Page 71: NEM Blockchain Meetup H27/07/11

NEM MOSAIC TILES

[Diagram: mosaic building blocks that can be combined — namespace, reputation, quantity, mutable quantity, composition, computation, storage, bandwidth, information, serialization, socialweb, transferability, divisibility, name, description, expirability, mosaic]

NEM will let you combine these building blocks hierarchically, including the smart contracts planned for the future, to create anything from the very simple to the very complex.
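To make the list above concrete, here is a purely hypothetical sketch of what a mosaic definition might look like as plain data; the field names follow the properties on the slide, not an actual NEM API schema, and the namespace and values are invented.

# Hypothetical sketch only: field names mirror the slide, values are invented.
mosaic_definition = {
    'namespace': 'makoto.coffee',          # level-1 and level-2 domain (invented)
    'name': 'loyalty_point',
    'description': 'Coffee-shop loyalty point',
    'quantity': 1000000,
    'mutable_quantity': False,
    'divisibility': 0,
    'transferability': True,
    'expirability': False,
}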

Page 72: NEM Blockchain Meetup H27/07/11

DOMAIN NAMES

[Diagram: hierarchical namespaces with a level-1 domain, level-2 sub-domains, and level-3 sub-domains. Example: microsoft.windows, where "microsoft" is the level-1 domain and "windows" is the mosaic name.]

Page 73: NEM Blockchain Meetup H27/07/11

REPUTATION AND RATING

• Without reputation, it cannot be truly useful

• Attached to domain names

• Transaction-weighted ratings are possible

Page 74: NEM Blockchain Meetup H27/07/11

REPUTATION AND RATING • Without reputation, you cannot trust the other party

• Essential for ordinary users

Page 75: NEM Blockchain Meetup H27/07/11

NODE EXPLORER

http://www.nodeexplorer.com/

Page 76: NEM Blockchain Meetup H27/07/11

NEMBEX

http://nembex.nem.ninja

Page 77: NEM Blockchain Meetup H27/07/11

THEORETICAL SCALABILITY

• NEM uses SHA-3 and can verify a transaction signature in 0.5 ms, so ignoring network communication it can theoretically process 2,000 transactions per second (the arithmetic is spelled out after this list)

• With a faster implementation, 4,000 transaction signatures can be verified per second

• VISA handles 2,000 transactions per second
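For reference, the arithmetic behind the figures above: at 0.5 ms per signature verification, 1 s / 0.5 ms = 2,000 verifications per second; the 4,000-per-second figure corresponds to roughly 0.25 ms per verification.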

Page 78: NEM Blockchain Meetup H27/07/11

NEM HIGH-SPEED NETWORK • To prevent Sybil attacks, an account must hold at least 3,000,000 XEM to be eligible to participate

• A mobile app is available

• Future: a high-speed distributed ledger or parallel block chains

Page 79: NEM Blockchain Meetup H27/07/11

WHAT IS VALUE? • In a word, it is human life

• In our short lives, we exchange time for money (value)

• money := quantized life

• money is also information

• life == information??

Page 80: NEM Blockchain Meetup H27/07/11

KNOWLEDGE PYRAMID

[Diagram: knowledge pyramid — data (account balances, etc.; crypto 1.0), information (transactions; crypto 2.0), knowledge (computation / Dapps; crypto 3.0), wisdom (intelligence).]

Page 81: NEM Blockchain Meetup H27/07/11

AT THE 2020 TOKYO OLYMPICS • NEMphone

• passport on blockchain — go overseas with only your NEMphone

• connect to NEM nodes - global, free internet

• use local currencies; currencies are automatically converted by decentralized markets

• explicit taxes are no longer needed; governments make money by running NEM nodes and collecting fees automatically every time one of their currencies (mosaics) is transacted

• JPY is the reserve currency of the world using Lon’s method

• using a prescription stored in a mosaic to have medicine dispensed in Japan

• proof of participant identity

• proof of participant passing drug testing

Page 82: NEM Blockchain Meetup H27/07/11

AT THE 2020 TOKYO OLYMPICS • Internet of Things

• m-of-n multisig lock for the room storing medals

• toilet tells you about your health by sending encrypted messages to your NEM account

Page 83: NEM Blockchain Meetup H27/07/11

NEM COMMUNITY FUND

• Decentralized Autonomous Organization

• Lots of XEM in it ^^

Page 84: NEM Blockchain Meetup H27/07/11

COMPANIES BUILDING ON NEM

• Exchanges

• Remittance services

• PhotoMead

• Like Instagram, but you can sell your photos

• Yamfy

Page 85: NEM Blockchain Meetup H27/07/11

COMPANIES WE WANT TO SEE BUILT

• Organic certification service

• Medicine tracking system

• Sake brewing

Page 86: NEM Blockchain Meetup H27/07/11

活人剣 (the sword that gives life)

殺人刀 (the sword that kills)

Thanks to nuclear power, 1.8 million people are alive; should we fear it?

Page 87: NEM Blockchain Meetup H27/07/11

IN SUMMARY • Block chain technology is useful for decentralized information exchange, not just for transferring money

• Used well, it is a technology that can empower people

• NEM is a next-generation platform that aims to revolutionize economic activity and information exchange