Automated Patent Classification By Yu Hu

Automated Patent Classification By Yu Hu. Class 706 Subclass 12.

Dec 28, 2015

Joanna Lee
Transcript
Page 1: Automated Patent Classification By Yu Hu. Class 706 Subclass 12.

Automated Patent Classification

By Yu Hu

Page 2:

Class 706

Subclass 12

Page 3:

Patent Classifier

• Input: descriptions of the invention (abstracts)
• Output: US Classification
• Data from the USPTO full-text database
• Extract abstracts and classifications

Page 4:

Abstract of US8452718

• Automated determination of a number of profiles for a training data set to be used in training a machine learning system for generating target function information from modeled profile parameters. In one embodiment, a first principal component analysis (PCA) is performed on a training data set, and a second PCA is performed on a combined data set which includes the training data set and a test data set. A test data set estimate is generated based on the first PCA transform and the second PCA matrix. The size of error between the test data set and the test data set estimate is used to determine whether a number of profiles associated with the training data set is sufficiently large for training a machine learning system to generate a library of spectral information.

Page 5:

Bag of Words

• Automated determination of a number of profiles for a training data set to be used in training a machine learning system for generating target function information from modeled profile parameters. In one embodiment, a first principal component analysis (PCA) is performed on a training data set, and a second PCA is performed on a combined data set which includes the training data set and a test data set. A test data set estimate is generated based on the first PCA transform and the second PCA matrix. The size of error between the test data set and the test data set estimate is used to determine whether a number of profiles associated with the training data set is sufficiently large for training a machine learning system to generate a library of spectral information.
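As a rough illustration of the bag-of-words idea, the sketch below turns one clause of the abstract above into word counts, discarding word order. The function name `bag_of_words` and the simple letters-only tokenizer are assumptions made for this example; the slides do not say how the text was actually tokenized.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each lowercase word to how often it occurs.

    Hypothetical helper: tokens are runs of ASCII letters, nothing more.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# One clause taken from the US8452718 abstract quoted on the slide.
snippet = (
    "a first principal component analysis (PCA) is performed on a training "
    "data set, and a second PCA is performed on a combined data set"
)

bag = bag_of_words(snippet)
print(bag.most_common(5))
```

Word order is lost on purpose: two abstracts count as similar when they use the same words about equally often.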

Page 6:

K Nearest Neighbor

Page 7:

Data

• 631 most recently filed patent applications of Apple Inc.
• Preprocessing: remove HTML tags, punctuation, stopwords, (numbers)
• Extract abstracts and classifications
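The preprocessing steps listed above could look roughly like the following sketch. The stopword list is a tiny made-up sample and `preprocess` is a hypothetical name; the slides do not state which stopword list or tools were used. Number removal is optional here because the slide puts "(numbers)" in parentheses.

```python
import re

# Tiny illustrative stopword list (an assumption, not the list used in the talk).
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "is", "and", "be"}


def preprocess(raw, drop_numbers=True):
    """Strip HTML tags, punctuation, stopwords and (optionally) numbers."""
    text = re.sub(r"<[^>]+>", " ", raw)        # remove HTML tags
    text = text.lower()
    if drop_numbers:
        text = re.sub(r"\d+", " ", text)       # remove digit runs
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # remove punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = preprocess("<p>A method for training 2 models.</p>")
print(tokens)
```

Passing `drop_numbers=False` keeps numeric tokens, which may matter if numbers carry meaning in a given class of patents.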

Page 8:

• Document-term matrix
• Training/test split: 70/30
• # of training documents: 110
• K = 9
• Confusion matrix
• Computer graphics v. document processing v. telecommunication
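The pipeline on this slide (document-term matrix, train/test split, k-nearest-neighbor vote, confusion matrix) can be sketched in plain Python as below. This is a minimal sketch, not the code from the talk: all function names are hypothetical, Euclidean distance is an assumption (the slides do not name the distance measure), and the six toy documents are made up purely to exercise the code.

```python
import math
import random
from collections import Counter


def build_vocabulary(docs):
    """Map every term seen in the training docs to a column index."""
    return {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}


def vectorize(doc, vocab):
    """One row of the document-term matrix: term counts in vocab order."""
    row = [0] * len(vocab)
    for term in doc:
        if term in vocab:          # terms unseen in training are ignored
            row[vocab[term]] += 1
    return row


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it, e.g. 70/30."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train_rows, train_labels, row, k=9):
    """Majority label among the k training rows closest to row."""
    ranked = sorted(zip(train_rows, train_labels),
                    key=lambda pair: euclidean(pair[0], row))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


# Made-up toy data, for illustration only.
toy = [
    (["pixel", "shader", "render"], "graphics"),
    (["render", "pixel", "texture"], "graphics"),
    (["font", "page", "text"], "documents"),
    (["text", "page", "layout"], "documents"),
    (["antenna", "signal", "radio"], "telecom"),
    (["signal", "radio", "network"], "telecom"),
]
vocab = build_vocabulary([doc for doc, _ in toy])
rows = [vectorize(doc, vocab) for doc, _ in toy]
labels = [label for _, label in toy]

prediction = knn_predict(rows, labels, vectorize(["pixel", "render"], vocab), k=3)
print(prediction)
```

The toy call uses `k=3` only because there are six documents; the default of 9 mirrors the value on the slide.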

Page 9:

382 Image Analysis v. 435 Chemistry: molecular biology and microbiology
Training documents: ~400; 80% split

382: Precision = 86.4%, Recall = 97.4%

435: Precision = 81.8%, Recall = 96.4%
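For reference, per-class precision and recall such as those reported above are usually derived from the actual and predicted labels as in the sketch below. The helper name `precision_recall` and the five-item label lists are made up for illustration; they are not the talk's data and do not reproduce its figures.

```python
def precision_recall(actual, predicted, positive):
    """Precision and recall for one class, treating it as the positive label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels, for illustration only.
actual = ["382", "382", "382", "435", "435"]
predicted = ["382", "382", "435", "382", "435"]

for cls in ("382", "435"):
    p, r = precision_recall(actual, predicted, cls)
    print(f"{cls}: precision={p:.3f} recall={r:.3f}")
```

Precision answers "of the documents assigned to this class, how many belong there?", while recall answers "of the documents that belong to this class, how many were found?".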

Page 10:

Subclasses of Image Analysis Overlap

Page 11:

Subclass classification of 382 Image Analysis

382/181: Pattern Recognition
382/232: Image Compression

Tf-idf: term frequency-inverse document frequency. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus.
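One common way to compute such a weight is sketched below. Tf-idf has several variants and the slides do not say which one was used, so the choices here (term frequency as count divided by document length, inverse document frequency as the natural log of N over the document frequency) are assumptions, as are the function name `tf_idf` and the toy documents.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed variant: tf = count / len(doc), idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up toy documents, for illustration only.
docs = [
    ["image", "pattern", "pattern"],
    ["image", "compression"],
    ["image", "codec", "compression"],
]
weights = tf_idf(docs)
print(weights[0])
```

A word such as "image" that occurs in every document gets weight zero, which is the desired effect when separating subclasses that all share the vocabulary of their parent class.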

181: Precision = 83.7%, Recall = 75%

232: Precision = 67.5%, Recall = 78.1%

Page 12:

• Thank you!
• Questions?