Transcript

Approaches to ML Techniques on Real World Data

A Demo on Behavioral Analysis in Social Networking Sites.

--Venkata Ramana C

Real World Data

• Social Networking Sites

• Blogs

• Forums

• Tweets

• …

Aim

• Create an environment where you define your own rules for how your data is shared on the Web.

Technique

• Active Learning

Active Learning

• The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns.

• An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator).

Scenario

Uncertainty Sampling

• The technique used in the project is as follows:

• The user is asked the questions below about each friend, followed by which profile features the user wants to share with that friend.

• How are you associated with your friend? ( or )

• What do you have in common with your friend?

• 1. Personal

• 2. Don't know yet (shall take some time to decide)

• 3. We have ...x.y.z.... (specify) in common. [This will form a group; see the sketch after this list.]
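A rough sketch of how such answers could be captured is below; the FriendAnswer and record_answer names are hypothetical illustrations, not the project's actual code. Each response either marks the friend as personal, defers the decision, or names something in common, which becomes a group and, later, a feature column in the training data.

from dataclasses import dataclass, field

@dataclass
class FriendAnswer:
    friend: str
    relation: str                                  # 'personal', 'undecided', or 'group'
    groups: list = field(default_factory=list)     # e.g. ['project'] -- forms a friend group

def record_answer(friend, relation, groups=None):
    """Turn one questionnaire response into a structured record."""
    return FriendAnswer(friend=friend, relation=relation, groups=groups or [])

# Example responses mirroring the friends and groups used in the demo later.
answers = [
    record_answer('Vasu', 'group', ['project']),
    record_answer('Cigith', 'group', ['srm']),
    record_answer('Harish', 'group', ['sssg']),
]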

Pseudo Algorithm

• Input: initial small training set L, and pool of unlabeled examples U

• Use L to train the initial classifier C

– Repeat

• Use the current classifier C to label all unlabeled examples in U.

• Use the uncertainty sampling technique to select the m most informative unlabeled examples, and ask the oracle H to label them.

• Augment L with these m new examples, and remove them from U.

• Use L to retrain the current classifier C.

Until the predefined stopping criterion SC is met.
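A minimal Python sketch of the loop above, assuming a binary Share/NotShare task: scikit-learn's LogisticRegression stands in for classifier C, ask_oracle is a hypothetical placeholder for querying the human annotator H, and the batch size m and round limit (standing in for the stopping criterion SC) are illustrative values.

import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_labeled, y_labeled, X_pool, ask_oracle, m=5, max_rounds=10):
    # Use L to train the initial classifier C.
    clf = LogisticRegression().fit(X_labeled, y_labeled)

    for _ in range(max_rounds):            # simple stand-in for stopping criterion SC
        if len(X_pool) == 0:
            break

        # Label all of U with the current classifier; the margin |p - 0.5| is
        # smallest for the examples the classifier is least certain about.
        proba = clf.predict_proba(X_pool)[:, 1]
        margin = np.abs(proba - 0.5)

        # Uncertainty sampling: pick the m most informative examples and query H.
        query_idx = np.argsort(margin)[:m]
        new_labels = ask_oracle(X_pool[query_idx])

        # Augment L with the new examples, remove them from U, retrain C.
        X_labeled = np.vstack([X_labeled, X_pool[query_idx]])
        y_labeled = np.concatenate([y_labeled, new_labels])
        X_pool = np.delete(X_pool, query_idx, axis=0)
        clf = LogisticRegression().fit(X_labeled, y_labeled)

    return clf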

So When is This Useful?

Friend Groups

Active Learning for Privacy

Courtesy: Privacy Wizards for Social Networking Sites. WWW2010

Concepts

• Gini Impurity - the expected error rate: the chance of mislabelling an item if it is given a label drawn at random from the set's label distribution.

• Entropy - a measure of how mixed up a set is.
  p(i) = frequency(outcome) = count(outcome) / count(total rows)
  Entropy = -sum of p(i) x log2(p(i)) over all outcomes

Courtesy: Collective Intelligence by Toby Segaran
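Both measures can be computed directly from those counts. The sketch below follows the definitions above, with the class label kept in the last column of each row (the same layout used in the demo data later); the function names are illustrative.

from collections import Counter
from math import log2

def label_counts(rows):
    """count(outcome) for every outcome in the last column."""
    return Counter(row[-1] for row in rows)

def gini_impurity(rows):
    """Expected error rate when labels are assigned at random from the set's distribution."""
    total = len(rows)
    return 1.0 - sum((n / total) ** 2 for n in label_counts(rows).values())

def entropy(rows):
    """-sum of p(i) * log2(p(i)) over all outcomes."""
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in label_counts(rows).values())

rows = [['Yes', 'No', 'No', 'Share'],
        ['No', 'Yes', 'No', 'NotShare'],
        ['No', 'No', 'Yes', 'Share']]
print(gini_impurity(rows), entropy(rows))   # ~0.444, ~0.918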

CART (Classification and Regression Trees)

• Decision tree classifiers are simple to view and interpret.

• If-Then rules.

Application Data.

Id    group1  group2  .....  DOB
123   Yes     No      .....  Share
124   No      Yes     .....  NotShare
...   ...     ...     .....  .....
129   Yes     Yes     .....  Share
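As a rough illustration of fitting a CART-style tree to a table like this, here is a sketch that uses scikit-learn's DecisionTreeClassifier as a stand-in for the project's own tree code; the example rows and group columns simply mirror the table above and are not the real application data.

from sklearn.tree import DecisionTreeClassifier, export_text

columns = ['group1', 'group2']
rows = [
    {'id': 123, 'group1': 'Yes', 'group2': 'No',  'DOB': 'Share'},
    {'id': 124, 'group1': 'No',  'group2': 'Yes', 'DOB': 'NotShare'},
    {'id': 129, 'group1': 'Yes', 'group2': 'Yes', 'DOB': 'Share'},
]

# Encode Yes/No group membership as 1/0 feature vectors; the DOB column is the
# Share/NotShare label the tree learns to predict.
X = [[1 if row[c] == 'Yes' else 0 for c in columns] for row in rows]
y = [row['DOB'] for row in rows]

tree = DecisionTreeClassifier(criterion='gini').fit(X, y)
print(export_text(tree, feature_names=columns))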

Demo

• We will use Decision Trees, trained on data about some of our Social Network friends, to set Privacy Preferences.

Demo Results:

• [u'project', u'srm', u'sssg'] => feature vector [0, 1, 2] => mapped by the CART algorithm

• Goal is to decide 'Share' or 'NotShare'

Profile features: DOB, zip, religion, phone, email.

For DOB:

• ['Yes', 'No', 'No', 'Share'] (from Vasu)

• ['No', 'Yes', 'No', 'NotShare'] (from Cigith)

• ['No', 'No', 'Yes', 'Share'] (from Harish)

Decision Tree for DOB:

• Split on column 0 == 'Yes'?
  False -> {'Share': 2}
  True  -> {'NotShare': 1}

Zip Code:

• [['Yes', 'No', 'No', 'Share'], ['No', 'Yes', 'No', 'NotShare'], ['No', 'No', 'Yes', 'Share']]

• Split on column 1 == 'Yes'?
  True  -> {'NotShare': 1}
  False -> {'Share': 2}
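Read back as an If-Then rule (see the CART slide), the Zip Code tree above is a single test on the 'srm' column. The helper below is just a hypothetical restatement of that split, using the [u'project', u'srm', u'sssg'] => [0, 1, 2] mapping shown earlier.

def zip_rule(friend_features):
    """friend_features: ['Yes'/'No' membership for project, srm, sssg]."""
    # Column 1 ('srm') == 'Yes' -> NotShare; otherwise -> Share.
    return 'NotShare' if friend_features[1] == 'Yes' else 'Share'

print(zip_rule(['No', 'Yes', 'No']))   # 'NotShare' (Cigith)
print(zip_rule(['Yes', 'No', 'No']))   # 'Share' (Vasu)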

References

• Collective Intelligence by Toby Segaran.

• Privacy Wizards for Social Networking Sites. WWW2010.
