svd diplomatiki

Apr 21, 2015

Ivaylo Kehayov (: 268) : :

2011

. . , , . . , . , . , / . . , . , , . , . , . , .

Ivaylo Kehayov 2011

1
1.1
1.1.1
1.1.2
1.1.3
1.2
2
2.1 FLESCH
2.2
2.3 SVD AGGREGATION
3 SINGULAR VALUE DECOMPOSITION
3.1
3.2 SVD
3.2.1 SVD
3.2.2
3.3 TOY EXAMPLE
4 REUTERS
4.1
4.2 REUTERS
4.2.1
4.2.2
4.2.3
5
5.1
5.2 0%
5.3 25%
5.4 50%
5.5 REUTERS
6
1:
1.1 GENERATOR
1.2 REUTERS
1.3

1 , 1994 ( 1998) , . . Flesch Reading Ease . , , . Flesch Reading Ease . , (data set) .

1.1 , . , , , , . . , , .

1.1.1 (text mining). , . : : . , . : PubGene . : , IBM Microsoft, . : , Tribune Company, , . : . : , , . : . , (.. ) . , Nature, .

, , . , , , . , spam e-mail .

1.1.2 . . , . (text classification categorization) (document classification categorization) , , , . : (supervised classification), ( ) . (unsupervised classification), . - (semi-supervised classification), . ( Flesch Reading Ease) . (data set) , (collection corpus). : (training set)

(test set). 70% , 30%. . . , ( ), , . , ( , ). , . , training set test set . training set ground truth . : naive Bayes, tf-idf, Latent Semantic Indexing (LSI), Support Vector Machines (SVM), , k- (kNN), ( ID3 C4.5), concept mining .

1.1.3 , . , , , . (term frequency vector). , , , (dictionary). ( ) training set. . , . , . , automobile . . , automobile , 7,

, 0 . . , . , , :

1: The Sun is a star.
2: The Earth is a planet.
3: The Earth is smaller than the Sun.

( , ):

= [a, Earth, is, planet, smaller, star, Sun, than, the]

, 9. :

1 = [1, 0, 1, 0, 0, 1, 1, 0, 1]
2 = [1, 1, 1, 1, 0, 0, 0, 0, 1]
3 = [0, 1, 1, 0, 1, 0, 1, 1, 2]

, 0 1, 2 the 3. . , a, is the. . , . stop words, . , . , .

( 2) .
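The three-sentence example above can be reproduced with a short Python sketch. It assumes case-insensitive matching, punctuation stripped, and an alphabetically sorted vocabulary; these choices and the function names `tokenize`, `build_vocabulary` and `term_frequency_vector` are illustrative assumptions, not taken from the original work.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Return the alphabetically sorted list of distinct terms in the documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def term_frequency_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text.

    Terms that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


documents = [
    "The Sun is a star.",
    "The Earth is a planet.",
    "The Earth is smaller than the Sun.",
]

vocabulary = build_vocabulary(documents)
vectors = [term_frequency_vector(doc, vocabulary) for doc in documents]

for number, vector in enumerate(vectors, start=1):
    print(number, vector)
```

With these assumptions the vocabulary has 9 terms and the three vectors agree with the ones listed in the example. The term "the" gets the value 2 in the third vector because it appears twice in that sentence.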

1.2 : (related work), SVD. SVD, . , Reuters-21578. . . , .

2 . , SVD-Aggregation.

2.1 Flesch Flesch . Flesch, Flesch Reading Ease Flesch-Kincaid Grade Level. , , , . , Flesch Reading Ease Flesch-Kincaid Grade Level. Rudolf Flesch ( J. Peter Kincaid), . Flesch-Kincaid Grade Level 1975 J. Peter Kincaid Rudolf Flesch. Flesch Reading Ease 1978 . Flesch Reading Ease. . Flesch Reading Ease : , .

. :

. ( ) 120, . , , . 0 100 .
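For reference, the standard published form of the Flesch Reading Ease formula is given below. The symbol names are chosen here for illustration only; the constants are those of the standard formula.

```latex
\mathrm{FRE} = 206.835
  - 1.015 \cdot \frac{N_{\mathrm{words}}}{N_{\mathrm{sentences}}}
  - 84.6 \cdot \frac{N_{\mathrm{syllables}}}{N_{\mathrm{words}}}
```

The score is highest when every sentence consists of a single one-syllable word, giving 206.835 - 1.015 - 84.6 = 121.22. This is consistent with the upper bound of roughly 120 mentioned above.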

1: Flesch Reading Ease

Flesch Reading Ease
90-100
80-89
70-79
60-69
50-59
30-49
0-29

.

2: Flesch Reading Ease

90-100

60-70

0-30

, http://en.wikipedia.org/wiki/Alan_Turing Alan Turing:

Alan Mathison Turing, OBE, FRS (pronounced TEWR-ing; 23 June 1912 – 7 June 1954), was an English mathematician, logician, cryptanalyst and computer scientist. He was highly influential in the development of computer science, providing a formalization of the concepts of "algorithm" and "computation" with the Turing machine, which played a significant role in the creation of the modern computer. During the Second World War, Turing worked for the Government Code and Cypher School at Bletchley Park, Britain's code breaking center. For a time he was head of Hut 8, the section responsible for German naval cryptanalysis. He devised a number of techniques for breaking German ciphers, including the method of the bombe, an electromechanical machine that could find settings for the Enigma machine. After the war he worked at the National Physical Laboratory, where he created one of the first designs for a stored-program computer, the ACE.

Syllables: 256
Words: 144
Sentences: 7

Flesch Reading Ease :

Flesch Reading Ease 35.56 . Microsoft Office Word, KWord, WordPro, IBM Lotus Symphony, Abiword WordPerfect. Flesch Reading Ease. , , Flesch Reading Ease Flesch.
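The score quoted for the Alan Turing passage can be checked with a small Python sketch. It reads the three counts given for the passage as 256 syllables, 144 words and 7 sentences; this reading is an assumption, although it reproduces the reported value. The function name `flesch_reading_ease` is hypothetical and the constants are those of the standard published formula.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Standard Flesch Reading Ease formula applied to precomputed counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Counts quoted for the Alan Turing passage (assumed order:
# syllables, words, sentences).
score = flesch_reading_ease(
    total_words=144,
    total_sentences=7,
    total_syllables=256,
)
print(score)
```

Worked by hand, the formula gives 206.835 - 20.88 - 150.4 = 35.555, which agrees with the reported 35.56 after rounding. Counting syllables automatically is the hard part in practice and is deliberately left out of this sketch.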

2.2 . , . -1 1. 0, 1, . 90 0 , . 180 -1. , . ( ) ( ). , 0 1, . 90. . :

. , A B, :

1 .

1:
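For reference, the usual definition of the cosine similarity of two n-dimensional vectors A and B with angle θ between them is given below. The component notation A_i, B_i is introduced here for illustration.

```latex
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

In general the value lies between -1 and 1. For term frequency vectors, whose components are never negative, it lies between 0 and 1, so the angle never exceeds 90 degrees.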

, :

1: The Sun is a star.
2: The Earth is a planet.
3: The Earth is smaller than the Sun.

:

= [a, Earth, is, planet, smaller, star, Sun, than, the]

:

1 = [1, 0, 1, 0, 0, 1, 1, 0, 1]
2 = [1, 1, 1, 1, 0, 0, 0, 0, 1]
3 = [0, 1, 1, 0, 1, 0, 1, 1, 2]

, :

4: The Sun is not a planet.

:

4 = [1, 0, 1, 1, 0, 0, 1, 0, 1]

training set test set. . . , , . , , . :

1 4:

2 4:

3 4:

4 1 2 3. 3 , 4 . : test set training set ( ). test train . training . , training 1, test . , 1, 2 3,

. test set.
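The comparison of the fourth example sentence with the first three can be sketched in Python as follows. The vectors are copied from the example; the function name `cosine_similarity` and the dictionary layout are illustrative assumptions, not the original implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


# Term frequency vectors of the three training sentences.
training_vectors = {
    1: [1, 0, 1, 0, 0, 1, 1, 0, 1],
    2: [1, 1, 1, 1, 0, 0, 0, 0, 1],
    3: [0, 1, 1, 0, 1, 0, 1, 1, 2],
}

# Vector of the test sentence "The Sun is not a planet."
test_vector = [1, 0, 1, 1, 0, 0, 1, 0, 1]

similarities = {
    doc_id: cosine_similarity(vector, test_vector)
    for doc_id, vector in training_vectors.items()
}

for doc_id, value in similarities.items():
    print(doc_id, round(value, 3))
```

Each dot product with the test vector equals 4. Vectors 1, 2 and 4 have length √5 and vector 3 has length 3, so the similarities are 4/5 = 0.8 for sentences 1 and 2 and 4/(3√5) ≈ 0.596 for sentence 3.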

2.3 SVD Aggregation Singular Value Decomposition (Aggregation) / oy . SVD , . , training set test set. training set ( 3).
