
Jan 03, 2016

Page 1

Plagiarism Detection as a Problem of Machine Learning

Academician Yuri I. Zhuravlev
Corresponding Member of RAS Konstantin V. Rudakov
Gleb V. Nikitov

Computing Center of the Russian Academy of Sciences
Forecsys Corporation

Page 2

About the problem

Detect borrowing (citing) in students' papers
Do it quickly and conveniently
Do it with high quality and with substantiation

Page 3

Existing solutions

Turnitin
Mydropbox
www.antiplagiat.ru

Page 4

Working scheme

Paper
Instructor
Antiplagiat
Collection of documents

Page 5

Searching domain

Internet

Page 6

Already using

Higher School of Economics
Moscow Institute of Economics, Management and Law
Moscow Pedagogical State University
Moscow Municipal Psychological and Pedagogical Institute
Nizhni Novgorod State University
Academy of Budget and Treasury of the Russian Ministry of Finance

Page 7

Non-educational use

Higher Certifying Commission
Russian State Library (formerly named after Lenin)

Page 8

Negotiations

Moscow State University
Moscow Physical and Technical Institute
Russian Academy of Justice
International Academy of Enterprise

Page 9

Quality and Performance

Leading operating speed without compromising the quality of the results
70 thousand registered users
About 20 thousand originality reports generated every day
Continual improvement of the search algorithms and expanding functionality

Page 10

Plagiarism, what is it?

Page 11

Formulation of problem

Permissible objects:

$$\mathfrak{S} = \{\, S_i \mid i \in \{1,\dots,N\} \,\} = \{\, (Fr_i^1, Fr_i^2) \mid i = \overline{1,N} \,\}$$

Descriptive functions:

$$\mathrm{Descr} = \{\, D \mid D : \mathfrak{S} \to \mathbb{R} \,\}$$

Fixed set of functions:

$$D^0(\mathfrak{S}) = \{\, D^0(S) \mid S \in \mathfrak{S} \,\} \subseteq \mathbb{R}^n$$

Page 12

Formulation of problem

The problem:

$$A : \mathfrak{I}_i \to \mathfrak{I}_f$$

Initial information:

$$\mathfrak{I}_i = D^0(\mathfrak{S})$$

Final information:

$$\mathfrak{I}_f = \{0, 1, \dots, k-1\}$$

Page 13

Formulation of problem

Precedent information:

$$(S_1, An_1), \dots, (S_q, An_q), \quad \text{where } S_j \in \mathfrak{S},\; An_j \in \mathfrak{I}_f,\; j = \overline{1,q}$$

Precedent conditions:

$$\forall j \in \{1,\dots,q\}: \quad A\bigl(D^0(S_j)\bigr) = An_j$$
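The precedent conditions say that a correct algorithm A must reproduce the known answer An_j from the description D^0(S_j) of every precedent object S_j. A minimal Python sketch of that check follows. The toy `describe` function, the `threshold_algorithm`, and the sample precedents are hypothetical illustrations, not part of the slides, and k = 2 is assumed.

```python
from typing import Callable, Hashable, Sequence, Tuple

Description = Tuple[float, ...]      # assumed form of D^0(S): a numeric vector
Precedent = Tuple[Hashable, int]     # (object S_j, answer An_j)


def satisfies_precedents(
    algorithm: Callable[[Description], int],
    describe: Callable[[Hashable], Description],
    precedents: Sequence[Precedent],
) -> bool:
    """Return True if A(D^0(S_j)) == An_j for every precedent (S_j, An_j)."""
    return all(algorithm(describe(s)) == an for s, an in precedents)


# Hypothetical toy instance: an object is a pair of fragments and the single
# descriptive function is the share of the smaller word set that also occurs
# in the other fragment.
def describe(pair: Tuple[str, str]) -> Description:
    words1, words2 = set(pair[0].split()), set(pair[1].split())
    return (len(words1 & words2) / min(len(words1), len(words2)),)


def threshold_algorithm(description: Description) -> int:
    # Answers lie in {0, 1}, i.e. k = 2 (assumed for the example).
    return 1 if description[0] >= 0.5 else 0


precedents = [
    (("a b c", "a b d"), 1),   # two of three words shared
    (("a b c", "x y z"), 0),   # nothing shared
]
print(satisfies_precedents(threshold_algorithm, describe, precedents))
```

An algorithm that fails even one precedent is not correct in the sense of the slide, which is why the check uses `all`.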

Page 14

Formulation of problem

Transitive and reflexive relation on pairs of fragments:

[The defining formula is garbled in the transcript; the surviving symbols are the fragments $Fr^1$, $Fr^2$ and an index $i \in \{1,\dots,N\}$.]

Example: two pairs $(Fr_1^1, Fr_1^2)$ and $(Fr_2^1, Fr_2^2)$ are compared through the ratio

$$\frac{WL(Fr^1, Fr^2)}{\min\bigl(L(Fr^1),\, L(Fr^2)\bigr)}$$
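The example on this slide is built from a ratio of WL(Fr1, Fr2) to min(L(Fr1), L(Fr2)). The slides do not define WL and L, so the sketch below assumes that L is the fragment length in words and WL is the number of words the two fragments share (counted with multiplicity). The function names and the direction of the comparison in `precedes` are likewise assumptions. Comparing pairs by a single real-valued score with `<=` automatically gives a reflexive and transitive relation.

```python
from collections import Counter
from typing import Tuple

Pair = Tuple[str, str]


def length(fragment: str) -> int:
    """L(Fr): assumed here to be the number of words in the fragment."""
    return len(fragment.split())


def common_words(fr1: str, fr2: str) -> int:
    """WL(Fr1, Fr2): assumed here to be the size of the shared word multiset."""
    shared = Counter(fr1.split()) & Counter(fr2.split())
    return sum(shared.values())


def overlap_ratio(fr1: str, fr2: str) -> float:
    """WL(Fr1, Fr2) / min(L(Fr1), L(Fr2)); 0.0 if either fragment is empty."""
    shortest = min(length(fr1), length(fr2))
    if shortest == 0:
        return 0.0
    return common_words(fr1, fr2) / shortest


def precedes(pair_a: Pair, pair_b: Pair) -> bool:
    """Assumed relation: pair_a is 'no more similar' than pair_b."""
    return overlap_ratio(*pair_a) <= overlap_ratio(*pair_b)


print(overlap_ratio("the cat sat", "the cat ran away"))
print(precedes(("a b", "c d"), ("a b", "a b")))
```

Normalising by the shorter fragment means a short passage copied whole into a long paper still scores 1.0 under these assumptions.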

Page 15

Formulation of problem

Additional conditions:

$$\forall i_1, i_2 \in \{1,\dots,N\}: \quad (Fr_{i_1}^1, Fr_{i_1}^2) \preceq (Fr_{i_2}^1, Fr_{i_2}^2) \;\Rightarrow\; A\bigl(D^0(Fr_{i_1}^1, Fr_{i_1}^2)\bigr) \le A\bigl(D^0(Fr_{i_2}^1, Fr_{i_2}^2)\bigr)$$

Here $\preceq$ denotes the transitive and reflexive relation on objects.
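Read as a monotonicity requirement, the additional conditions ask that whenever one object precedes another in the transitive and reflexive relation, the algorithm's answer for the first does not exceed its answer for the second. A brute-force check over a finite set of objects can be sketched as follows. The function name `is_monotone` and the toy objects are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Hashable, Sequence, Tuple

Description = Tuple[float, ...]


def is_monotone(
    algorithm: Callable[[Description], int],
    describe: Callable[[Hashable], Description],
    precedes: Callable[[Hashable, Hashable], bool],
    objects: Sequence[Hashable],
) -> bool:
    """Check that a precedes b implies A(D^0(a)) <= A(D^0(b)) for all pairs."""
    return all(
        algorithm(describe(a)) <= algorithm(describe(b))
        for a, b in product(objects, repeat=2)
        if precedes(a, b)
    )


# Hypothetical toy instance: objects are numbers ordered by <=.
objects = [0.1, 0.4, 0.7, 0.9]
describe = lambda s: (s,)
precedes = lambda a, b: a <= b

print(is_monotone(lambda d: 1 if d[0] >= 0.5 else 0, describe, precedes, objects))
print(is_monotone(lambda d: 0 if d[0] >= 0.5 else 1, describe, precedes, objects))
```

The check is quadratic in the number of objects, which is acceptable for a small precedent set but not for a full document collection.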

Page 16

Criteria

Solvability criteria
For a correct algorithm $A$ to exist, it is necessary and sufficient that the following conditions are met:

[First condition garbled in the transcript: a statement over $j_1, j_2 \in \{1,\dots,q\}$ and $i_1, i_2 \in \{1,\dots,N\}$ relating the answers $An_{j_1}, An_{j_2}$, the objects $S_{i_1}, S_{j_1}, S_{i_2}, S_{j_2}$ and the descriptions $D^0(S_{i_1}), D^0(S_{i_2})$.]

[Second condition garbled in the transcript: a statement over $j_1, j_2 \in \{1,\dots,q\}$ relating the objects $S_{j_1}, S_{j_2}$ to the answers $An_{j_1}, An_{j_2}$.]
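If an algorithm is allowed to be any function of the description D^0(S), the precedent conditions A(D^0(S_j)) = An_j can be met exactly when no two precedents share a description but carry different answers. The sketch below checks that elementary consistency requirement and, when it holds, returns a lookup table that acts as a correct algorithm on the precedents. It is an illustration under that assumption, not a transcription of the slide's formulas, and `fit_lookup` is a hypothetical name.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple


def fit_lookup(
    describe: Callable[[Hashable], Hashable],
    precedents: Sequence[Tuple[Hashable, int]],
) -> Optional[Dict[Hashable, int]]:
    """Map each description to its answer, or return None on a conflict.

    A conflict is a pair of precedents with equal descriptions and
    different answers; in that case no function of the description
    can satisfy both precedent conditions.
    """
    table: Dict[Hashable, int] = {}
    for obj, answer in precedents:
        key = describe(obj)
        # setdefault stores the first answer seen for this description
        # and returns the stored one on later visits.
        if table.setdefault(key, answer) != answer:
            return None
    return table


# Hypothetical toy description: the length of a string.
print(fit_lookup(len, [("ab", 0), ("cde", 1), ("xy", 0)]))
print(fit_lookup(len, [("ab", 0), ("cd", 1)]))
```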

Page 17

Criteria

Regularity:
Definition (according to Zhuravlev). The problem $Z$ is regular if all the problems with arbitrary final information are simultaneously solvable.

Regularity criteria:
For a problem to be regular, it is necessary and sufficient that the following conditions are met:

[First condition garbled in the transcript: a statement over $j_1, j_2 \in \{1,\dots,q\}$ relating the objects $S_{j_1}$ and $S_{j_2}$.]

[Second condition garbled in the transcript: a statement over $j_1, j_2 \in \{1,\dots,q\}$ and $i_1, i_2 \in \{1,\dots,N\}$ relating the objects $S_{i_1}, S_{j_1}, S_{i_2}, S_{j_2}$ and the descriptions $D^0(S_{i_1}), D^0(S_{i_2})$.]
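The definition of regularity can be tested directly on a small example by enumerating every possible assignment of final information to the precedent objects and checking that each resulting problem is solvable. The sketch below does this by brute force and compares the result with a shortcut: if an algorithm may be any function of the description and k >= 2, all assignments are solvable exactly when the precedent descriptions are pairwise distinct. Both the shortcut and the function names are assumptions for illustration, not a transcription of the slide's criterion.

```python
from itertools import product
from typing import Hashable, Sequence


def is_solvable(descriptions: Sequence[Hashable], answers: Sequence[int]) -> bool:
    """No two precedents may share a description but differ in answer."""
    seen = {}
    for d, an in zip(descriptions, answers):
        if seen.setdefault(d, an) != an:
            return False
    return True


def is_regular_bruteforce(descriptions: Sequence[Hashable], k: int) -> bool:
    """Try every answer vector in {0, ..., k-1}^q, as the definition asks."""
    q = len(descriptions)
    return all(
        is_solvable(descriptions, answers)
        for answers in product(range(k), repeat=q)
    )


def is_regular(descriptions: Sequence[Hashable]) -> bool:
    """Shortcut (valid for k >= 2): descriptions are pairwise distinct."""
    return len(set(descriptions)) == len(descriptions)


distinct = [(0.1,), (0.5,), (0.9,)]
repeated = [(0.1,), (0.5,), (0.1,)]
print(is_regular_bruteforce(distinct, k=2), is_regular(distinct))
print(is_regular_bruteforce(repeated, k=2), is_regular(repeated))
```

The brute-force version visits k^q answer vectors, so it is only useful as a sanity check on tiny precedent sets.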

Page 18

Criteria

Monotone solvability criteria:
For the problem to be monotonically solvable, it is necessary and sufficient that the conditions of the solvability criteria are met together with the following condition:

[Condition garbled in the transcript: a statement over $j_1, j_2 \in \{1,\dots,q\}$ and $i_1, i_2 \in \{1,\dots,N\}$ relating the answers $An_{j_1}, An_{j_2}$, the objects $S_{i_1}, S_{j_1}, S_{i_2}, S_{j_2}$ and the descriptions $D^0(S_{i_1}), D^0(S_{i_2})$.]

Page 19

Criteria

Monotone regularity criteria:
For the problem to be monotonically regular, it is necessary and sufficient that the following conditions are met:

[First condition garbled in the transcript: a statement over $j_1, j_2 \in \{1,\dots,q\}$ relating the objects $S_{j_1}$ and $S_{j_2}$.]

[Second condition garbled in the transcript: a statement over $j_1, j_2 \in \{1,\dots,q\}$ and $i_1, i_2 \in \{1,\dots,N\}$ relating the objects $S_{i_1}, S_{j_1}, S_{i_2}, S_{j_2}$ and the descriptions $D^0(S_{i_1}), D^0(S_{i_2})$; the surviving symbols include "||".]

Page 20

Criteria

Supercompleteness:
The family of algorithms $M$ is called supercomplete in the described class of problems if, for each problem $Z$ from the set of solvable problems, $Z \in [Z]_S$, $M$ contains at least one correct algorithm.

Page 21

Criteria

Completeness:
The family of algorithms $M$ is called complete in the described class of problems if, for each problem $Z$ from the set of regular problems, $Z \in [Z]_R$, $M$ contains at least one correct algorithm.

Page 22

Criteria

Supercompleteness criteria:
For the family of algorithmic operators $M^0$ to be supercomplete, it is necessary and sufficient that the following condition is met:

[Condition garbled in the transcript: a statement over $i_1, i_2 \in \{1,\dots,N\}$ relating the objects $S_{i_1}, S_{i_2}$ to an operator $B$ from $M^0$ and the values $B(D^0(S_{i_1}))$, $B(D^0(S_{i_2}))$.]

Page 23

Criteria

Completeness criteria:
For the family of algorithmic operators $M^0$ to be complete, it is necessary and sufficient that the following condition is met:

[Condition garbled in the transcript: a statement over $i_1, i_2 \in \{1,\dots,N\}$ combining two relations between the objects, one on $S_{i_1}, S_{i_2}$ and one on $S_{i_2}, S_{i_1}$, with an operator $B$ from $M^0$ and the values $B(D^0(S_{i_1}))$, $B(D^0(S_{i_2}))$.]