Huming Zhu, Maoguo Gong, Baolin Huang [email protected] 2013.11 Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization Xidian university

HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu

Jan 13, 2015



Presentation HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu at the AMD Developer Summit (APU13) November 11-13, 2013.
Transcript
Page 1

Huming Zhu, Maoguo Gong, Baolin Huang [email protected] 2013.11

Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization

Xidian University

Page 2

OpenCL course (course IDs 0222277, 0242277): OpenCL programming and practice, 2011, 2012, 2013

Page 3

Page 4

Contents

1 Complex Network Clustering with NMF

2 Parallel Bayesian NMF on GPU

3 Sparse BNMF on GPU

4 Experiment

5 Conclusion

Page 5

5 Xidian University 12/7/13 5

* All pictures are from the Internet

Complex Network Clustering

Page 6

Complex Network Clustering

Network clustering aims to divide a network into several communities such that the number of edges linking nodes within the same community is higher than the number of edges joining nodes in different communities.

•  Network clustering is essential for understanding how a network is organized and how it functions.
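The requirement above can be checked directly for a given partition. A minimal sketch in C, assuming an undirected edge list and one integer community label per node (the `Edge` type and `count_edges` are hypothetical names, not from the talk), counts the edges inside communities and the edges between them:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edge-list representation of an undirected graph. */
typedef struct { int u, v; } Edge;

/* Counts edges whose endpoints share a community (intra) and edges that
 * cross a community boundary (inter). label[v] is the community of node v. */
void count_edges(const Edge *edges, size_t m, const int *label,
                 size_t *intra, size_t *inter)
{
    assert((edges != NULL || m == 0) && label && intra && inter);
    *intra = 0;
    *inter = 0;
    for (size_t e = 0; e < m; e++) {
        if (label[edges[e].u] == label[edges[e].v])
            (*intra)++;   /* both endpoints in the same community */
        else
            (*inter)++;   /* edge joins two different communities */
    }
}
```

For two triangles {0,1,2} and {3,4,5} joined by the edge (2,3) and labeled as two communities, this gives 6 intra-community edges and 1 inter-community edge, so the partition satisfies the requirement.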

Page 7

Non-negative Matrix Factorization (NMF)

• Powerful interpretability and a close relationship with clustering methods.

• Requires a lot of computation power.

• The NMF problem is defined as searching for an approximation of a nonnegative matrix A with respect to some metric (e.g., a matrix norm) by factoring A into the product W × H of two reduced nonnegative matrices W and H.

• NMF has been applied in many areas, such as image processing.

[1] D. D. Lee, H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999) 788–791.
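For orientation, the classic Lee–Seung multiplicative updates for the Euclidean objective ‖A − WH‖² keep W and H nonnegative while lowering the error. This is a sketch of standard NMF, not of the Bayesian NMF the talk parallelizes; `nmf_update`, `nmf_error` and `matmul` are hypothetical names, and matrices are assumed row-major with A of size n × m, W of size n × k and H of size k × m.

```c
#include <assert.h>
#include <stdlib.h>

#define EPS 1e-9   /* guards the multiplicative updates against division by zero */

/* C = A * B for row-major A (n x p) and B (p x q). */
static void matmul(const double *A, const double *B, double *C,
                   size_t n, size_t p, size_t q)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < q; j++) {
            double s = 0.0;
            for (size_t t = 0; t < p; t++)
                s += A[i * p + t] * B[t * q + j];
            C[i * q + j] = s;
        }
}

/* Squared Frobenius error ||A - WH||^2. */
double nmf_error(const double *A, const double *W, const double *H,
                 size_t n, size_t m, size_t k)
{
    double err = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++) {
            double wh = 0.0;
            for (size_t r = 0; r < k; r++)
                wh += W[i * k + r] * H[r * m + j];
            double d = A[i * m + j] - wh;
            err += d * d;
        }
    return err;
}

/* One Lee-Seung step:
 *   H <- H .* (W^T A) ./ (W^T W H),  then  W <- W .* (A H^T) ./ (W H H^T).
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int nmf_update(const double *A, double *W, double *H,
               size_t n, size_t m, size_t k)
{
    assert(A && W && H);
    double *WH = malloc(n * m * sizeof *WH);
    if (!WH) return -1;

    matmul(W, H, WH, n, k, m);                    /* WH from the old H */
    for (size_t r = 0; r < k; r++)
        for (size_t j = 0; j < m; j++) {
            double num = 0.0, den = 0.0;
            for (size_t i = 0; i < n; i++) {
                num += W[i * k + r] * A[i * m + j];   /* (W^T A)[r][j]   */
                den += W[i * k + r] * WH[i * m + j];  /* (W^T W H)[r][j] */
            }
            H[r * m + j] *= num / (den + EPS);
        }

    matmul(W, H, WH, n, k, m);                    /* WH with the new H */
    for (size_t i = 0; i < n; i++)
        for (size_t r = 0; r < k; r++) {
            double num = 0.0, den = 0.0;
            for (size_t j = 0; j < m; j++) {
                num += A[i * m + j] * H[r * m + j];   /* (A H^T)[i][r]   */
                den += WH[i * m + j] * H[r * m + j];  /* (W H H^T)[i][r] */
            }
            W[i * k + r] *= num / (den + EPS);
        }

    free(WH);
    return 0;
}
```

Each step is dominated by dense matrix products, which is why NMF "needs a lot of computation power" and maps well onto GPU matrix-multiplication kernels.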

Page 8

Bayesian NMF

Input: nonnegative data (observation) matrix A; fixed hyperparameters a, b.

Output: nonnegative matrices W and H.

Step 1: Initialize W and H to nonnegative values.

Step 5: If converged, stop; otherwise, go to Step 2.

Page 9

Page 10

Parallel Bayesian NMF

• P-BNMF

• Sparse-BNMF

Page 11

P-BNMF kernel

matrix multiplication

Matrix square sum

Page 12

• Update matrix: W*H

• Kernel: mat_mult_AB

Matrix multiplication
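The source of mat_mult_AB is not shown. As an illustration only, this sequential C sketch shows the standard tiling scheme that a matrix-multiplication kernel with 16 × 16 work-groups typically follows (`mat_mult_tiled` is a hypothetical name): each output tile corresponds to one work-group, and each 16-wide slice of the shared dimension is the block a work-group would stage in local memory.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 16  /* mirrors the 16 x 16 work-group size used for mat_mult_AB */

/* C = A * B with A (n x p), B (p x q), all row-major. */
void mat_mult_tiled(const float *A, const float *B, float *C,
                    size_t n, size_t p, size_t q)
{
    assert(A && B && C);
    for (size_t i = 0; i < n * q; i++)
        C[i] = 0.0f;
    for (size_t i0 = 0; i0 < n; i0 += TILE)          /* tile row    (work-group y) */
        for (size_t j0 = 0; j0 < q; j0 += TILE)      /* tile column (work-group x) */
            for (size_t t0 = 0; t0 < p; t0 += TILE)  /* slice of the shared dimension */
                /* multiply the (i0,t0) tile of A by the (t0,j0) tile of B */
                for (size_t i = i0; i < i0 + TILE && i < n; i++)
                    for (size_t j = j0; j < j0 + TILE && j < q; j++) {
                        float s = 0.0f;
                        for (size_t t = t0; t < t0 + TILE && t < p; t++)
                            s += A[i * p + t] * B[t * q + j];
                        C[i * q + j] += s;
                    }
}
```

On a GPU, the two outer tile loops become the work-group grid and the i/j loops become the 16 × 16 work-items; the t0 loop remains a loop, with a barrier after each pair of tiles is loaded into local memory.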

Page 13

Sum of squares of a matrix
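The kernel profile later lists mat_squ_sum_row and mat_squ_sum_col, which presumably supply per-component sums of squares of W and H to the update_invbeta step. Their source is not shown, so the following is a sequential C sketch of row-wise and column-wise sums of squares (the function names are hypothetical). On the GPU each sum would be computed as a parallel reduction.

```c
#include <assert.h>
#include <stddef.h>

/* row_sq[r] = sum over c of M[r][c]^2, for a row-major rows x cols matrix.
 * Applied to a K x n matrix H this gives sum_j H[k][j]^2 per component k. */
void squ_sum_rows(const float *M, size_t rows, size_t cols, float *row_sq)
{
    assert(M && row_sq);
    for (size_t r = 0; r < rows; r++) {
        float s = 0.0f;
        for (size_t c = 0; c < cols; c++)
            s += M[r * cols + c] * M[r * cols + c];
        row_sq[r] = s;
    }
}

/* col_sq[c] = sum over r of M[r][c]^2.
 * Applied to an n x K matrix W this gives sum_i W[i][k]^2 per component k. */
void squ_sum_cols(const float *M, size_t rows, size_t cols, float *col_sq)
{
    assert(M && col_sq);
    for (size_t c = 0; c < cols; c++)
        col_sq[c] = 0.0f;
    for (size_t r = 0; r < rows; r++)        /* walk rows so reads stay contiguous */
        for (size_t c = 0; c < cols; c++)
            col_sq[c] += M[r * cols + c] * M[r * cols + c];
}
```

For M = [[1, 2, 3], [4, 5, 6]], the row sums are 14 and 77 and the column sums are 17, 29 and 45.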

Page 14

Page 15

Sparse-BNMF

Problem

• The GPU has only 1 GB of memory, which limits the network size P-BNMF can handle.

Solution

• We present Sparse-BNMF, which stores the sparse matrix in the compressed sparse row (CSR) format.
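A quick estimate shows why the dense representation runs out of memory. The sketch assumes 4-byte floats and 4-byte indices, counts only the adjacency matrix A, and stores each undirected edge in both directions (nnz = 2 × edges); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for an n x n adjacency matrix stored densely as 32-bit floats. */
uint64_t dense_bytes(uint64_t n)
{
    return n * n * 4u;
}

/* Bytes for CSR storage: n + 1 row pointers, nnz column indices and
 * nnz float values, all 4 bytes each. */
uint64_t csr_bytes(uint64_t n, uint64_t nnz)
{
    assert(n > 0);
    return (n + 1) * 4u + nnz * (4u + 4u);
}
```

For the 50000-vertex LFR network (748337 edges), `dense_bytes(50000)` is 10,000,000,000 bytes (about 10 GB), far above the 1 GB card, while `csr_bytes(50000, 1496674)` is 12,173,396 bytes (about 12 MB).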

Page 16

Sparse-BNMF

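CSR: Aj (column indices), Av (nonzero values), Ap (row pointers; Ap[row] is the start position of the row in Aj and Av)

CSR column: Aj_column, Av_column, Ap_column (the same three arrays, built column by column)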
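A sketch of building these arrays on the host in C, assuming a dense row-major input and reading "CSR column" as a compressed sparse column copy. `dense_to_csr` and `csr_to_csc` are hypothetical helper names; Ap/Aj/Av and the *_column names follow the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Builds CSR arrays from a dense row-major rows x cols matrix.
 * Ap needs rows + 1 entries; Aj and Av need room for every nonzero.
 * Returns the number of nonzeros. */
size_t dense_to_csr(const float *D, size_t rows, size_t cols,
                    size_t *Ap, size_t *Aj, float *Av)
{
    size_t nnz = 0;
    for (size_t r = 0; r < rows; r++) {
        Ap[r] = nnz;                          /* start of row r in Aj/Av */
        for (size_t c = 0; c < cols; c++)
            if (D[r * cols + c] != 0.0f) {
                Aj[nnz] = c;
                Av[nnz] = D[r * cols + c];
                nnz++;
            }
    }
    Ap[rows] = nnz;                           /* one past the last row */
    return nnz;
}

/* Column-oriented copy: Ap_column[c]..Ap_column[c+1] index the row indices
 * (Aj_column) and values (Av_column) of column c. Ap_column needs cols + 1 entries. */
void csr_to_csc(size_t rows, size_t cols,
                const size_t *Ap, const size_t *Aj, const float *Av,
                size_t *Ap_column, size_t *Aj_column, float *Av_column)
{
    assert(Ap && Aj && Av && Ap_column && Aj_column && Av_column);
    for (size_t c = 0; c <= cols; c++)
        Ap_column[c] = 0;
    for (size_t k = 0; k < Ap[rows]; k++)
        Ap_column[Aj[k] + 1]++;               /* count nonzeros per column */
    for (size_t c = 0; c < cols; c++)
        Ap_column[c + 1] += Ap_column[c];     /* prefix sum -> column starts */
    for (size_t r = 0; r < rows; r++)
        for (size_t k = Ap[r]; k < Ap[r + 1]; k++) {
            size_t dst = Ap_column[Aj[k]]++;  /* next free slot in that column */
            Aj_column[dst] = r;
            Av_column[dst] = Av[k];
        }
    /* the scatter moved each start to the next column's start; shift back */
    for (size_t c = cols; c > 0; c--)
        Ap_column[c] = Ap_column[c - 1];
    Ap_column[0] = 0;
}
```

For the matrix [[0, 2, 0], [3, 0, 4], [0, 5, 6]], this yields Ap = {0, 1, 3, 5} and Aj = {1, 0, 2, 1, 2}, and the column copy has Ap_column = {0, 1, 3, 5} and Av_column = {3, 2, 5, 4, 6}.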

Page 17

Page 18

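A_WH_csr kernel

/* Kernel signature reconstructed: the slide shows only the body. */
#define TILE 16                                /* work-group size is TILE x 1 */

__kernel void A_WH_csr(__global const uint  *Ap,        /* CSR row pointers */
                       __global const uint  *Aj,        /* CSR column indices */
                       __global const float *W,         /* row_num x widthA */
                       __global const float *H,         /* widthA x widthB */
                       __global float       *Av_result, /* one value per nonzero */
                       const uint row_num, const uint widthA, const uint widthB)
{
    __local float As[TILE];                    /* TILE-long slice of row `row` of W */
    uint row     = get_global_id(1);           /* group height is 1, so one row per group */
    uint groupx  = get_group_id(0);            /* which block of TILE nonzeros in this row */
    uint localid = get_local_id(0);
    if (row >= row_num) return;                /* uniform within the work-group */

    uint rowStart = Ap[row];                   /* start position of this row in Aj */
    uint rowEnd   = Ap[row + 1];               /* end position of this row */
    uint first    = rowStart + groupx * TILE;
    if (first >= rowEnd) return;               /* no nonzero for this group (uniform) */

    uint index  = first + localid;             /* nonzero handled by this processing element */
    bool active = index < rowEnd;
    uint col    = active ? Aj[index] : 0;      /* never read past the end of the row */

    float Csub = 0.000001f;                    /* small offset avoids division by zero */
    for (uint t = 0; t < widthA; t += TILE) {  /* widthA assumed a multiple of TILE */
        As[localid] = W[row * widthA + t + localid];
        barrier(CLK_LOCAL_MEM_FENCE);          /* As fully loaded */
        if (active)                            /* H read directly: each PE used only its own column of Bs */
            for (uint k = 0; k < TILE; k++)
                Csub += As[k] * H[(t + k) * widthB + col];
        barrier(CLK_LOCAL_MEM_FENCE);          /* all reads done before As is overwritten */
    }
    if (active)
        Av_result[index] = 1.0f / Csub;        /* 1 / (WH)[row][col] */
}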
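A sequential C reference is handy for validating the kernel's output on small inputs. This sketch (`a_wh_csr_ref` is a hypothetical name) computes the same quantity: for each stored nonzero (row, col) of A it forms only that entry of W·H, adds the kernel's 1e-6 offset, and stores the reciprocal. Storing 1/(WH) instead of A/(WH) gives the same result when A is a 0/1 adjacency matrix, which is our assumption about why the kernel writes 1.0/Csub.

```c
#include <assert.h>
#include <stddef.h>

/* For every nonzero idx of A (CSR arrays Ap, Aj), with row r and column c:
 *   out[idx] = 1 / ((W H)[r][c] + 1e-6)
 * W is rows x K and H is K x cols, both row-major. Only the entries of W H
 * at A's nonzeros are formed, so no dense rows x cols product is stored. */
void a_wh_csr_ref(const unsigned *Ap, const unsigned *Aj, size_t rows,
                  const float *W, const float *H, size_t K, size_t cols,
                  float *out)
{
    assert(Ap && Aj && W && H && out);
    for (size_t row = 0; row < rows; row++)
        for (unsigned idx = Ap[row]; idx < Ap[row + 1]; idx++) {
            unsigned col = Aj[idx];
            float dot = 0.000001f;               /* same offset as the kernel */
            for (size_t k = 0; k < K; k++)
                dot += W[row * K + k] * H[k * cols + col];
            out[idx] = 1.0f / dot;
        }
}
```

With H set to the identity, the result is the reciprocal of the corresponding entry of W, which makes a convenient first check against the GPU output.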

Page 19

Page 20

Page 21

Machine

• Software: AMD Accelerated Parallel Processing (APP) SDK v2.7, OpenCL 1.2; Microsoft Visual Studio 2010.

Host
Product name: HP xw9400 workstation
OS: Windows 7 x64 Edition
CPU: 4× Dual-Core AMD Opteron 2220, 2.80 GHz
Memory: 32 GB

Device
Product name: AMD Radeon HD 7770
Engine speed: 1000 MHz
Processing elements: 640
Memory: 1 GB GDDR5
Memory bandwidth: 72 GB/s
PCI: PCI Express® 3.0 x16

Page 22

Synthetic networks (LFR benchmark)

Vertices   Edges     Q
128        1024      0.450
500        5135      0.813
1000       9582      0.904
5000       38007     0.908
10000      148470    0.860
50000      748337    0.900

Real-world networks

Data         Vertices   Edges    Q
Facebook     324        4436     0.620
Email        1133       5451     0.531
Netscience   1461       2742     0.905
Power        4941       6594     0.599
Scientists   6650       59870    0.647
Hep          7610       15751    0.772

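Evaluation: modularity Q [1]

$$Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(C_i, C_j)$$

where A is the adjacency matrix, k_i is the degree of node i, m is the number of edges, C_i is the community of node i, and δ(C_i, C_j) = 1 if C_i = C_j and 0 otherwise.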

[1] M. E. J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2) (2004) 026113.

A higher Q indicates a better community structure.
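The formula can be evaluated directly for small graphs. Here is a minimal sketch in C. It assumes a dense symmetric 0/1 adjacency matrix with no self-loops and one integer community label per node. `modularity` is a hypothetical name, and the caller supplies a degree buffer of n doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Q = 1/(2m) * sum_ij (A_ij - k_i k_j / (2m)) * delta(C_i, C_j)
 * A is n x n row-major; label[i] is C_i; deg receives k_i. */
double modularity(const int *A, size_t n, const int *label, double *deg)
{
    assert(A && label && deg);
    double two_m = 0.0;                        /* 2m = sum of all degrees */
    for (size_t i = 0; i < n; i++) {
        deg[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            deg[i] += A[i * n + j];
        two_m += deg[i];
    }
    if (two_m == 0.0)
        return 0.0;                            /* no edges: Q is undefined, report 0 */
    double q = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (label[i] == label[j])          /* delta(C_i, C_j) = 1 */
                q += A[i * n + j] - deg[i] * deg[j] / two_m;
    return q / two_m;
}
```

For two disconnected triangles labeled as two communities, Q = 0.5. Putting all six nodes into one community gives Q = 0. The double loop costs O(n²), so this is a checking tool for small networks, not a replacement for the GPU pipeline.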

Page 23

Network demo

Netscience (part) | Facebook

• The netscience network is a co-authorship network of scientists working on network theory and experiment.

Page 24

Speedup

Data         Vertices  K     BNMF (s)    P-BNMF (s)   Sparse-BNMF (s)   P-Ratio   Sparse-Ratio
LFR          128       64    4.165       0.166        0.226             4.37      3.1
LFR          500       128   109.9       0.823        1.096             67.63     51.35
LFR          1000      128   712.5       2.98         2.798             187.58    181.6
LFR          5000      128   31031.5     109.96       71.167            279.39    417.21
LFR          10000     128   186321.7    615.09       334.23            302.92    556.2
LFR          50000     128   *           *            8250.28           *         *
Facebook     324       128   46.25       1.328        1.656             34.82     27.93
Email        1133      128   774.4       3.901        3.042             162.24    189.33
Netscience   1461      128   1253.2      6.725        4.628             166.11    215.81
Power        4941      128   26202.4     108.30       61.787            239.29    404.38
Hep          7610      128   76827.2     271.28       152.66            281.75    491.85
Scientists   6650      128   63254.5     208.2        125.55            303.81    503.84

LFR: LFR benchmark (synthetic). K is the number of clusters; BNMF (s) is the serial run time in seconds; P-Ratio is the speedup of P-BNMF over BNMF; Sparse-Ratio is the speedup of Sparse-BNMF over BNMF.

Page 25

Speedup

• Netscience network, with the number of clusters K ranging from 64 to 256.

• Sparse-BNMF achieves the higher speedup.

Page 26

• Using CodeXL to analyze OpenCL kernels on AMD GPUs

Page 27

Table 1. P-BNMF kernels

Method            GlobalWorkSize    WorkGroupSize   Time
Update_H          {1472 128 1}      {16 16 1}       6.12726
mat_mult_AB       {1472 1472 1}     {16 16 1}       10.73615
mat_dot_div       {1472 1472 1}     {16 16 1}       3.70267
mat_mult_AtB      {1472 128 1}      {16 16 1}       9.72355
mat_dot_mult      {1472 128 1}      {16 16 1}       0.30133
mat_squ_sum_row   {1472 128 1}      {64 1 1}        0.5483
mat_squ_sum_col   {128 1472 1}      {1 64 1}        7.27985
update_invbeta    {128 1 1}         {4 1 1}         0.03763
Update_W          {128 1472 1}      {16 16 1}       6.25437
mat_mult_AB       {1472 1472 1}     {16 16 1}       10.75037
mat_dot_div       {1472 1472 1}     {16 16 1}       3.64148
mat_mult_ABt      {128 1472 1}      {16 16 1}       9.04222
mat_dot_mult      {128 1472 1}      {16 16 1}       0.2843

Table 2. Sparse-BNMF kernels

Method            GlobalWorkSize    WorkGroupSize   Time
Update_H          {1472 128 1}      {16 16 1}       6.11407
A_WH_csr_col      {1472 1472 1}     {1 16 1}        7.76119
mat_mult_A_s_col  {1461 2048 1}     {1 16 1}        5.36341
mat_dot_mult      {1472 128 1}      {16 16 1}       0.2917
mat_squ_sum_row   {1472 128 1}      {64 1 1}        0.55304
mat_squ_sum_col   {128 1472 1}      {1 64 1}        6.99467
update_invbeta    {128 1 1}         {4 1 1}         0.03748
Update_W          {128 1472 1}      {16 16 1}       6.17718
A_WH_csr          {1472 1472 1}     {16 1 1}        6.29185
mat_mult_s_Bt     {2048 1461 1}     {16 1 1}        5.37615
mat_dot_mult      {128 1472 1}      {16 16 1}       0.27763

• Table 1: the bottleneck kernels are the W*H product (mat_mult_AB), the element-wise (dot) matrix operations and AᵀB (mat_mult_AtB).

• Table 2: the corresponding sparse kernels are A_WH_csr_col and mat_mult_A_s_col.

• The CSR-based kernels take less time.

Kernel information provided by CodeXL

Page 28

PNMF vs. Sparse-BNMF

                 PNMF              Sparse-BNMF
Network size     small (<10000)    large
Speedup          low               high

• The Sparse-BNMF algorithm effectively solves the GPU memory limit problem, which enables it to handle larger networks.

Page 29

Page 30

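Our work

• Presented P-BNMF and Sparse-BNMF;

• P-BNMF: Bayesian NMF parallelized on the GPU;

• Sparse-BNMF: uses CSR storage to overcome the GPU memory limit;

• Both achieve large speedups over serial BNMF (up to 556.2× for Sparse-BNMF).

Future

• Portability.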

Page 31

Thank You!