Approaches to ML Techniques on Real World Data A Demo on Behavioral Analysis in Social Networking Sites. --Venkata Ramana C
Page 1: Approaches to ml techniques on real world data

Approaches to ML Techniques on Real World Data

A Demo on Behavioral Analysis in Social Networking Sites.

--Venkata Ramana C

Page 2: Approaches to ml techniques on real world data

Real World Data

• Social Networking Sites

• Blogs

• Forums

• Tweets

• …

Page 3: Approaches to ml techniques on real world data
Page 4: Approaches to ml techniques on real world data

Aim

• Create an environment in which you define your own rules for how your data is shared on the Web.

Page 5: Approaches to ml techniques on real world data

Technique

• Active Learning

Page 6: Approaches to ml techniques on real world data

Active Learning

• The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labelled training instances if it is allowed to choose the data from which it learns.

• An active learner may ask queries in the form of unlabeled instances to be labelled by an oracle (e.g., a human annotator).

Page 7: Approaches to ml techniques on real world data

Scenario

Page 8: Approaches to ml techniques on real world data

Uncertainty Sampling

• The technique used in the project is as follows:

• The user is asked questions about each friend, as shown below, followed by what the user wants to share with that friend, i.e. which profile features.

• How are you associated with your friend? (or)

• What do you have in common with your friend?

• 1. Personal

• 2. Don't know how (shall take some time to decide?)

• 3. We have ...x.y.z.... (specify) in common. [This will form a group.]
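One common form of uncertainty sampling is least-confidence selection: the user is asked about the friends whose predicted label the classifier is least sure of. The following is a minimal sketch, assuming the classifier can output a probability of 'Share' for each friend. The function name, friend ids and probabilities are hypothetical and not taken from the project.

```python
def least_confident(probabilities, m):
    """Return the m item ids whose P('Share') is closest to 0.5.

    probabilities: dict mapping item id -> predicted probability of 'Share'
    (hypothetical classifier output, used only for illustration).
    """
    # For a binary label, uncertainty is highest when the probability is near 0.5.
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:m]


# Illustrative values only.
predicted = {"friend_a": 0.95, "friend_b": 0.52, "friend_c": 0.10, "friend_d": 0.40}
queries = least_confident(predicted, 2)
print(queries)  # the two friends the user would be asked about next
```

Here the classifier is nearly undecided about friend_b and friend_d, so these would be the ones put to the user.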

Page 9: Approaches to ml techniques on real world data

Pseudo Algorithm

• Input: an initial small training set L and a pool of unlabeled data U.

• Use L to train the initial classifier C.

• Repeat:

– Use the current classifier C to label all unlabeled examples in U.

– Use the uncertainty sampling technique to select the m most informative unlabeled examples, and ask the oracle H to label them.

– Augment L with these m new examples, and remove them from U.

– Use L to retrain the current classifier C.

• Until the predefined stopping criterion SC is met.
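The loop above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the project's implementation: the classifier C is a single threshold on 1-D points, uncertainty is the distance to that threshold, the stopping criterion SC is a fixed number of rounds, and the oracle H is a hypothetical function standing in for the human annotator.

```python
def train(labelled):
    """Toy 1-D classifier: a threshold halfway between the two classes."""
    negatives = [x for x, y in labelled if y == 0]
    positives = [x for x, y in labelled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def active_learning(labelled, unlabelled, oracle, m, max_rounds):
    labelled, unlabelled = list(labelled), list(unlabelled)
    threshold = train(labelled)                        # initial classifier C
    for _ in range(max_rounds):                        # stopping criterion SC
        if not unlabelled:
            break
        # Uncertainty sampling: the points nearest the decision threshold.
        unlabelled.sort(key=lambda x: abs(x - threshold))
        queries, unlabelled = unlabelled[:m], unlabelled[m:]
        labelled += [(x, oracle(x)) for x in queries]  # ask oracle H
        threshold = train(labelled)                    # retrain C on L
    return threshold, labelled


def oracle(x):
    """Hypothetical annotator: the true boundary is at 6.2."""
    return 1 if x >= 6.2 else 0


L = [(0.0, 0), (10.0, 1)]
U = [1.0, 3.0, 4.5, 5.5, 6.0, 6.5, 7.0, 9.0]
threshold, L_final = active_learning(L, U, oracle, m=2, max_rounds=2)
print(threshold, len(L_final))
```

With only four queries the threshold moves from 5.0 to a value close to the true boundary, because every query is spent near the current decision boundary rather than on points the classifier is already sure about.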

Page 10: Approaches to ml techniques on real world data

So When is This Useful?

Page 11: Approaches to ml techniques on real world data

Friend Groups

Page 12: Approaches to ml techniques on real world data

Active Learning for Privacy

Courtesy: Privacy Wizards for Social Networking Sites. WWW2010

Page 13: Approaches to ml techniques on real world data

Concepts

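• Gini Impurity - the expected error rate if an item from the set is labelled at random according to the distribution of outcomes in the set.

• Entropy - how mixed up is a set?

p(i) = frequency(outcome) = count(outcome) / count(total rows)

Entropy = -(sum of p(i) x log(p(i)) for all outcomes)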
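Both measures can be computed directly from the list of outcome labels. The sketch below assumes base-2 logarithms (so entropy is in bits) and uses the conventional leading minus sign so that entropy is non-negative. The function names are illustrative.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """p(i) = count(outcome) / count(total rows) for each distinct outcome."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_impurity(labels):
    """Chance of mislabelling an item when labels are drawn from the set's own distribution."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Entropy = -sum(p(i) * log2(p(i))) over all outcomes."""
    return -sum(p * log2(p) for p in class_probabilities(labels))


labels = ["Share", "NotShare", "Share"]
print(gini_impurity(labels))  # about 0.444
print(entropy(labels))        # about 0.918
```

Both values are 0 for a set with a single outcome and grow as the outcomes become more evenly mixed.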

Page 14: Approaches to ml techniques on real world data

Courtesy: Programming Collective Intelligence by Toby Segaran

Page 15: Approaches to ml techniques on real world data

CART (Classification and Regression Trees)

• Decision tree classifiers are simple to view and interpret.

• If-Then rules.

Page 16: Approaches to ml techniques on real world data

Application Data.

• Id     group1  group2  .....  DOB
• 123    Yes     No      .....  Share
• 124    No      Yes     .....  NotShare
• ...... ...     .....   .....
• ...... ...     .....   .....
• ...... ...     .....   .....
• ...... ...     .....   .....
• 129    Yes     Yes     .....  Share

Page 17: Approaches to ml techniques on real world data

Demo

• We will train Decision Trees on the answers given for some of our friends in Social Networks, and use them to set Privacy Preferences.

Page 18: Approaches to ml techniques on real world data

Demo Results:

• [u'project', u'srm', u'sssg'] => Feature Vector [0, 1, 2] => mapped by CART algorithm

• Goal is to decide ‘Share’ or ‘NotShare’

• Profile features: Dob, zip, religion, phone, email.

• For Dob:

[‘Yes’, ‘No’, ‘No’, ‘Share’] (from Vasu)

[‘No’, ‘Yes’, ‘No’, ‘NotShare’] (from Cigith)

[‘No’, ‘No’, ‘Yes’, ‘Share’] (from Harish)

Page 19: Approaches to ml techniques on real world data

Decision Tree for DOB:

          0: Yes?
        F/       \T
{‘Share’: 2}   {‘NotShare’: 1}

Zip Code:

• [['Yes', 'No', 'No', 'Share'], ['No', 'Yes', 'No', 'NotShare'], ['No', 'No', 'Yes', 'Share']]

1:Yes?
T-> {'NotShare': 1}
F-> {'Share': 2}
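The root split for the zip code rows can be reproduced by picking the column with the highest information gain. This is a minimal sketch, assuming entropy as the impurity measure and testing only the value 'Yes' in each group column. It covers the first split only, not a full recursive CART implementation, and the function names are illustrative.

```python
from collections import Counter
from math import log2

# Rows from the slide: three group-membership columns, then the label.
rows = [["Yes", "No", "No", "Share"],
        ["No", "Yes", "No", "NotShare"],
        ["No", "No", "Yes", "Share"]]


def entropy(rows):
    counts = Counter(row[-1] for row in rows)
    return -sum((c / len(rows)) * log2(c / len(rows)) for c in counts.values())


def split(rows, column, value):
    """Partition rows by whether the given column equals value."""
    true_rows = [r for r in rows if r[column] == value]
    false_rows = [r for r in rows if r[column] != value]
    return true_rows, false_rows


def best_split(rows):
    """Return (gain, column, true_rows, false_rows) for the most informative column."""
    base = entropy(rows)
    best = (0.0, None, None, None)
    for column in range(len(rows[0]) - 1):  # last column is the label
        true_rows, false_rows = split(rows, column, "Yes")
        if not true_rows or not false_rows:
            continue  # this column does not separate anything
        p = len(true_rows) / len(rows)
        gain = base - p * entropy(true_rows) - (1 - p) * entropy(false_rows)
        if gain > best[0]:
            best = (gain, column, true_rows, false_rows)
    return best


gain, column, true_rows, false_rows = best_split(rows)
true_counts = dict(Counter(r[-1] for r in true_rows))
false_counts = dict(Counter(r[-1] for r in false_rows))
print("%d:Yes?" % column)
print("T->", true_counts)
print("F->", false_counts)
```

On these three rows the sketch selects column 1, with 'NotShare' on the true branch and both 'Share' rows on the false branch, which agrees with the zip code tree shown above.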

Page 20: Approaches to ml techniques on real world data
Page 21: Approaches to ml techniques on real world data
Page 22: Approaches to ml techniques on real world data

References

• Toby Segaran. Programming Collective Intelligence. O'Reilly, 2007.

• Lujun Fang and Kristen LeFevre. Privacy Wizards for Social Networking Sites. WWW 2010.