Transcript

Approaches to ML Techniques on Real World Data

A Demo on Behavioral Analysis in Social Networking Sites.

--Venkata Ramana C

Real World Data

• Social Networking Sites

• Blogs

• Forums

• Tweets

• …

Aim

• Create an environment where you define your own rules for how your data is shared on the Web.

Technique

• Active Learning

Active Learning

• The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns.

• An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator).

Scenario

Uncertainty Sampling

• The technique used in the project is as follows:

• The user is asked the questions below about each friend, followed by which profile features the user wants to share with that friend.

• How are you associated with your friend? ( or )

• What do you have in common with your friend?

• 1. Personal

• 2. Don't know yet (shall take some time to decide)

• 3. We have ...x.y.z.... (specify) in common. [This will form a group; see the sketch after this list.]
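A rough sketch of how such answers could be captured is below; the FriendAnswer and record_answer names are hypothetical illustrations, not the project's actual code. Each response either marks the friend as personal, defers the decision, or names something in common, which becomes a group and, later, a feature column in the training data.

from dataclasses import dataclass, field

@dataclass
class FriendAnswer:
    friend: str
    relation: str                                  # 'personal', 'undecided', or 'group'
    groups: list = field(default_factory=list)     # e.g. ['project'] -- forms a friend group

def record_answer(friend, relation, groups=None):
    """Turn one questionnaire response into a structured record."""
    return FriendAnswer(friend=friend, relation=relation, groups=groups or [])

# Example responses mirroring the friends and groups used in the demo later.
answers = [
    record_answer('Vasu', 'group', ['project']),
    record_answer('Cigith', 'group', ['srm']),
    record_answer('Harish', 'group', ['sssg']),
]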

Pseudo Algorithm

• Input: initial small training set L, and pool of unlabeled examples U

• Use L to train the initial classifier C

– Repeat

• Use the current classifier C to label all unlabeled examples in U.

• Use the uncertainty sampling technique to select the m most informative unlabeled examples, and ask the oracle H to label them.

• Augment L with these m new examples, and remove them from U.

• Use L to retrain the current classifier C.

Until the predefined stopping criterion SC is met.
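A minimal Python sketch of the loop above, assuming a binary Share/NotShare task: scikit-learn's LogisticRegression stands in for classifier C, ask_oracle is a hypothetical placeholder for querying the human annotator H, and the batch size m and round limit (standing in for the stopping criterion SC) are illustrative values.

import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_labeled, y_labeled, X_pool, ask_oracle, m=5, max_rounds=10):
    # Use L to train the initial classifier C.
    clf = LogisticRegression().fit(X_labeled, y_labeled)

    for _ in range(max_rounds):            # simple stand-in for stopping criterion SC
        if len(X_pool) == 0:
            break

        # Label all of U with the current classifier; the margin |p - 0.5| is
        # smallest for the examples the classifier is least certain about.
        proba = clf.predict_proba(X_pool)[:, 1]
        margin = np.abs(proba - 0.5)

        # Uncertainty sampling: pick the m most informative examples and query H.
        query_idx = np.argsort(margin)[:m]
        new_labels = ask_oracle(X_pool[query_idx])

        # Augment L with the new examples, remove them from U, retrain C.
        X_labeled = np.vstack([X_labeled, X_pool[query_idx]])
        y_labeled = np.concatenate([y_labeled, new_labels])
        X_pool = np.delete(X_pool, query_idx, axis=0)
        clf = LogisticRegression().fit(X_labeled, y_labeled)

    return clf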

So When is This Useful?

Friend Groups

Active Learning for Privacy

Courtesy: Privacy Wizards for Social Networking Sites. WWW2010

Concepts

• Gini Impurity - the expected error rate: the chance of mislabelling an item if it is given a label drawn at random from the set's label distribution.

• Entropy - a measure of how mixed up a set is.
  p(i) = frequency(outcome) = count(outcome) / count(total rows)
  Entropy = -sum of p(i) x log2(p(i)) over all outcomes

Courtesy: Collective Intelligence by Toby Segaran
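Both measures can be computed directly from those counts. The sketch below follows the definitions above, with the class label kept in the last column of each row (the same layout used in the demo data later); the function names are illustrative.

from collections import Counter
from math import log2

def label_counts(rows):
    """count(outcome) for every outcome in the last column."""
    return Counter(row[-1] for row in rows)

def gini_impurity(rows):
    """Expected error rate when labels are assigned at random from the set's distribution."""
    total = len(rows)
    return 1.0 - sum((n / total) ** 2 for n in label_counts(rows).values())

def entropy(rows):
    """-sum of p(i) * log2(p(i)) over all outcomes."""
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in label_counts(rows).values())

rows = [['Yes', 'No', 'No', 'Share'],
        ['No', 'Yes', 'No', 'NotShare'],
        ['No', 'No', 'Yes', 'Share']]
print(gini_impurity(rows), entropy(rows))   # ~0.444, ~0.918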

CART (Classification and Regression Trees)

• Decision tree classifiers are simple to view and interpret.

• If-Then rules.

Application Data.

Id    group1  group2  .....  DOB
123   Yes     No      .....  Share
124   No      Yes     .....  NotShare
...   ...     ...     .....  .....
129   Yes     Yes     .....  Share
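As a rough illustration of fitting a CART-style tree to a table like this, here is a sketch that uses scikit-learn's DecisionTreeClassifier as a stand-in for the project's own tree code; the example rows and group columns simply mirror the table above and are not the real application data.

from sklearn.tree import DecisionTreeClassifier, export_text

columns = ['group1', 'group2']
rows = [
    {'id': 123, 'group1': 'Yes', 'group2': 'No',  'DOB': 'Share'},
    {'id': 124, 'group1': 'No',  'group2': 'Yes', 'DOB': 'NotShare'},
    {'id': 129, 'group1': 'Yes', 'group2': 'Yes', 'DOB': 'Share'},
]

# Encode Yes/No group membership as 1/0 feature vectors; the DOB column is the
# Share/NotShare label the tree learns to predict.
X = [[1 if row[c] == 'Yes' else 0 for c in columns] for row in rows]
y = [row['DOB'] for row in rows]

tree = DecisionTreeClassifier(criterion='gini').fit(X, y)
print(export_text(tree, feature_names=columns))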

Demo

• We will use Decision Trees, trained on data about some of our Social Network friends, to set Privacy Preferences.

Demo Results:

• [u'project', u'srm', u'sssg'] => feature vector [0, 1, 2] => mapped by the CART algorithm

• Goal is to decide 'Share' or 'NotShare'

Profile features: DOB, zip, religion, phone, email.

For DOB:

• ['Yes', 'No', 'No', 'Share'] (from Vasu)

• ['No', 'Yes', 'No', 'NotShare'] (from Cigith)

• ['No', 'No', 'Yes', 'Share'] (from Harish)

Decision Tree for DOB:

• Split on column 0 == 'Yes'?
  False -> {'Share': 2}
  True  -> {'NotShare': 1}

Zip Code:

• [['Yes', 'No', 'No', 'Share'], ['No', 'Yes', 'No', 'NotShare'], ['No', 'No', 'Yes', 'Share']]

• Split on column 1 == 'Yes'?
  True  -> {'NotShare': 1}
  False -> {'Share': 2}
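Read back as an If-Then rule (see the CART slide), the Zip Code tree above is a single test on the 'srm' column. The helper below is just a hypothetical restatement of that split, using the [u'project', u'srm', u'sssg'] => [0, 1, 2] mapping shown earlier.

def zip_rule(friend_features):
    """friend_features: ['Yes'/'No' membership for project, srm, sssg]."""
    # Column 1 ('srm') == 'Yes' -> NotShare; otherwise -> Share.
    return 'NotShare' if friend_features[1] == 'Yes' else 'Share'

print(zip_rule(['No', 'Yes', 'No']))   # 'NotShare' (Cigith)
print(zip_rule(['Yes', 'No', 'No']))   # 'Share' (Vasu)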

References

• Collective Intelligence by Toby Segaran.

• Privacy Wizards for Social Networking Sites. WWW2010.
