A System to Filter Unwanted Messages from OSN User Walls Presented By: Gajanand Sharma M. E. Scholar UVCE Bangalore Guided By: Ms. Vandana Jha Ph. D. Scholar UVCE Bangalore
Jul 15, 2015
Introduction
Related Work
Model
Algorithm
Implementation
Performance
Conclusion
Bibliography
The underlying issue in today's Online Social Networks is to give users the ability to control the messages posted on their own timelines.
Online Social Networks provide little support for this necessity.
The proposed system gives users direct control over their timeline posts.
This is achieved using a flexible rule-based system that allows users to customize the filtering criteria.
Online Social Networks are among the most popular media for communicating, sharing and broadcasting information about human life.
Because of the huge and dynamic nature of this data, web content mining strategies are used to
automatically discover useful information from it.
In OSNs this strategy is used to filter and remove unwanted posts from user walls.
It must be implemented using ad-hoc classification strategies, because wall messages
contain short text for which traditional classification methods do not work well.
So the aim of the proposed system is to build and evaluate an automated system able to filter
unwanted messages from user walls.
Machine Learning text categorization techniques are used to automatically assign
each short text message a set of categories based on its content.
Using this technique, short messages are first categorized as neutral or non-neutral.
Non-neutral messages are then further classified into different categories.
Using Filtering Rules, users can state which contents should not be displayed on
their walls.
Filtering Rules exploit user profiles and user relationships, as well as the output of the
Machine Learning categorization process, to state the filtering criteria to be enforced.
The system also supports user-defined Black Lists, i.e., lists of users
who are temporarily prevented from posting any kind of message on a user wall.
N.J. Belkin and W.B. Croft introduced the information filtering system in “Information
Filtering and Information Retrieval: Two Sides of the Same Coin?”
P.J. Denning introduced a content-based filtering system in the paper entitled “Electronic Junk.”
P.W. Foltz and S.T. Dumais also discussed information filtering systems in the paper
“Personalized Information Delivery: An Analysis of Information Filtering Methods.”
M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari presented the concept of content-based
filtering in the paper “Content-Based Filtering in On-Line Social Networks.”
The architecture in support of OSN services is a three-tier structure.
1. Social Network Manager (SNM)
- Aims to provide the basic OSN functionalities…
2. Social Network Applications (SNAs)
- Provides support for external Social Network Applications…
3. Graphical User Interfaces (GUIs)
- Allow users to set up and manage FRs/BLs…
OSN
Information Filtering
Policy-based Personalization
Short Text Classification
Information filtering can be used for a different, more sensitive purpose. This is
because in OSNs there is the possibility of posting messages or commenting on other
posts in particular public/private areas, generally called walls.
Information filtering can therefore be used to give users the ability to automatically
control the messages written on their own walls, by filtering out unwanted
messages.
Information filtering
The process first identifies Neutral sentences, then classifies Non-neutral
sentences…
The first-level task is the harder one, i.e., labeling message sentences as Neutral or Non-
neutral…
At the second level, non-neutral sentences are further classified into different classes…
The second-level soft classifier produces a gradual membership value for each non-neutral
sentence…
Short Text Classifier
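The two-level procedure above might be sketched as follows. This is a minimal illustrative stand-in, not the paper's method: the real system uses a trained RBFN soft classifier, whereas here hypothetical keyword lists and class names (Hate, Offensive, Sex, Violence) are used purely to show the two-level structure.

```python
# Hypothetical two-level short text classification sketch.
# Keyword lists stand in for the trained RBFN classifier of the paper.

NON_NEUTRAL_WORDS = {"hate": "Hate", "idiot": "Offensive",
                     "sex": "Sex", "violence": "Violence"}

def first_level(sentence):
    """Level 1: label a sentence Neutral or Non-neutral."""
    words = sentence.lower().split()
    return "Non-neutral" if any(w in NON_NEUTRAL_WORDS for w in words) else "Neutral"

def second_level(sentence):
    """Level 2: gradual (soft) membership of a non-neutral sentence in each class."""
    words = sentence.lower().split()
    hits = [NON_NEUTRAL_WORDS[w] for w in words if w in NON_NEUTRAL_WORDS]
    total = len(hits) or 1
    classes = ("Hate", "Offensive", "Sex", "Violence")
    return {c: hits.count(c) / total for c in classes}
```

Each non-neutral sentence thus receives a gradual membership in every class, rather than a single hard label.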
A classification method has been proposed to categorize short text messages, in
order to avoid overwhelming users of microblogging services with raw data.
The filtering policy language allows FRs to be set according to a variety of
criteria that consider not only the results of the classification process but also
the relationships of the wall owner with other OSN users, as well as information in
the user profile.
Policy-based Personalization
Vector Space Model
underlying model for text representation
This is the underlying model for text representation, according to which a text
document dj is represented as a vector of binary or real weights.
T is the set of terms that occur at least once in at least one document of the
collection Tr.
wkj ∈ [0,1] represents how much term tk contributes to the semantics of
document dj.
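As a sketch of this representation, assuming a plain tf-idf weighting (the paper's actual weighting and normalization may differ; raw tf-idf weights are typically length-normalized to keep them in [0,1]):

```python
import math

def vsm_weights(docs):
    """Represent each document d_j as a vector of tf-idf weights w_kj over
    the term set T (terms occurring in at least one document of Tr)."""
    terms = sorted({t for d in docs for t in d.split()})
    n = len(docs)
    # document frequency of each term
    df = {t: sum(t in d.split() for d in docs) for t in terms}
    vectors = []
    for d in docs:
        words = d.split()
        vec = []
        for t in terms:
            tf = words.count(t) / len(words)       # term frequency in d_j
            idf = math.log(n / df[t])              # inverse document frequency
            vec.append(tf * idf)
        vectors.append(vec)
    return terms, vectors
```

A term appearing in every document gets idf 0 and so contributes nothing to the semantics of any document.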
RBFN Model
RBFNs have a single hidden layer of processing units with a local, restricted
activation domain; a Gaussian function is commonly used.
The main advantages of RBFNs are that the classification function is nonlinear, the model
may produce confidence values, and it may be robust to outliers.
Drawbacks are the potential sensitivity to input parameters and potential
overtraining sensitivity.
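A minimal sketch of the Gaussian hidden unit and the resulting network output follows; centers, widths and weights are illustrative placeholders that a real RBFN would learn from training data:

```python
import math

def gaussian_rbf(x, center, width):
    """Local, restricted activation of one RBFN hidden unit:
    phi(x) = exp(-||x - c||^2 / (2 * sigma^2))."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist2 / (2 * width ** 2))

def rbfn_output(x, centers, widths, weights, bias=0.0):
    """Linear combination of hidden-unit activations.
    The output can be read as a confidence (soft membership) value."""
    return bias + sum(w * gaussian_rbf(x, c, s)
                      for w, c, s in zip(weights, centers, widths))
```

The activation is 1 at the center and decays quickly with distance, which is what makes the unit's domain local and restricted.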
A creator specification creatorSpec implicitly denotes a set of OSN users. It can
have the following forms:
- A set of attribute constraints of the form an OP av;
- A set of relationship constraints of the form (m, rt, minDepth, maxTrust).
A filtering rule FR is a tuple (author, creatorSpec, contentSpec, action)
Creator specification
Filtering Rule
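One possible encoding of the FR tuple and of rule matching is sketched below; the field representation, attribute names and matching semantics are assumptions made for illustration (only the attribute-constraint form of creatorSpec is shown):

```python
from dataclasses import dataclass

# Hypothetical encoding of FR = (author, creatorSpec, contentSpec, action).
@dataclass
class FilteringRule:
    author: str          # wall owner who states the rule
    creator_spec: dict   # attribute constraints, e.g. {"sex": "male"}
    content_spec: tuple  # (class_name, min_membership) from the ML classifier
    action: str          # e.g. "block"

def rule_applies(rule, creator_attrs, memberships):
    """Check a message against one FR: the creator must satisfy every attribute
    constraint, and the message's class membership must reach the threshold."""
    if any(creator_attrs.get(k) != v for k, v in rule.creator_spec.items()):
        return False
    cls, threshold = rule.content_spec
    return memberships.get(cls, 0.0) >= threshold
```

This shows how FRs combine creator information with the classifier's output rather than relying on either alone.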
A BL rule is a tuple (author, creatorSpec, creatorBehavior, T), where:
author is the OSN user who specifies the rule, i.e., the wall owner;
creatorSpec is a creator specification, specified according to Definition 1;
creatorBehavior consists of two components, RFBlocked and minBanned:
RFBlocked = (RF, mode, window)
minBanned = (min, mode, window)
T denotes the time period for which the users identified by creatorSpec and
creatorBehavior have to be banned from the author's wall.
Black Lists
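A BL rule might be encoded as follows. This is a sketch under stated assumptions: the field layout is hypothetical, and it is assumed here that exceeding either behaviour component triggers the ban, which may differ from the exact semantics in the paper:

```python
from dataclasses import dataclass

# Hypothetical encoding of a BL rule (author, creatorSpec, creatorBehavior, T).
@dataclass
class BLRule:
    author: str
    creator_spec: dict
    rf_blocked: tuple   # (RF, mode, window): min relative freq. of blocked messages
    min_banned: tuple   # (min, mode, window): min number of past bans
    T: float            # banning period, e.g. in days

def should_ban(rule, relative_freq_blocked, times_banned):
    """Ban a creator matching creatorSpec for T days when either
    behaviour component is exceeded (assumption for this sketch)."""
    rf, _, _ = rule.rf_blocked
    m, _, _ = rule.min_banned
    return relative_freq_blocked >= rf or times_banned >= m
```

The mode and window components (ignored here) would determine over which history the two statistics are computed.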
The short message goes to the user's filtering wall and is checked using the filtering rules
defined by the user.
According to the user-defined filtering rules, it is labeled with the class in which it resides.
Then the gradual membership value of the message is compared with the system-defined
threshold value.
If the message crosses the threshold value, it goes to the block list; otherwise it is posted
to the user's wall.
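The flow above can be sketched as a small function; the threshold value and the wall representation are illustrative, not taken from the paper:

```python
# Sketch of the message flow: take the classifier's gradual memberships,
# compare the top value with a system-defined threshold, then either
# block the message or post it. THRESHOLD is an assumed value.

THRESHOLD = 0.5

def filter_message(memberships, wall):
    """memberships: class -> gradual membership from the soft classifier.
    wall: dict with "blocked" and "posts" lists."""
    top_class = max(memberships, key=memberships.get)
    if memberships[top_class] >= THRESHOLD:
        wall["blocked"].append(top_class)   # crosses threshold -> block list
        return "blocked"
    wall["posts"].append(top_class)         # otherwise posted to the wall
    return "posted"
```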
Two different types of measures are used to evaluate the effectiveness of the first-level
and second-level classifications.
At the first level, the short text classification procedure is evaluated on the basis of the
contingency table approach.
At the second level, the measures are Precision (P), which accounts for the number of false
positives, Recall (R), which accounts for the number of false negatives, and the
overall F-measure (Fβ), defined as the weighted harmonic mean of the two
indexes.
Evaluation Metrics
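These standard metrics can be computed as follows (tp, fp and fn denote true positives, false positives and false negatives from the contingency table):

```python
def precision(tp, fp):
    """P = tp / (tp + fp): penalizes false positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """R = tp / (tp + fn): penalizes false negatives."""
    return tp / (tp + fn)

def f_measure(p, r, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R):
    weighted harmonic mean of precision and recall."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

With beta = 1 this reduces to the plain harmonic mean of P and R.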
The blacklist guarantees 100% filtering of messages
coming from suspicious sources.
The process of detecting and filtering spam is transparent,
regulated by standards, and fairly reliable.
The system offers flexibility and the possibility to fine-tune the settings.
It rarely makes mistakes in distinguishing spam from
legitimate messages.
Overall Performance
DicomFW is the GUI of this study, built as a prototype Facebook application.
The main focus throughout the implementation is on Filtering Rules.
This application permits users to:
1. View the list of their Filtering Walls;
2. View messages and post new ones on a Filtering Wall;
3. Define Filtering Rules using the OSA tool.
DicomFW
In this study, a system to filter undesired messages from Online Social
Network walls is presented.
The system exploits a Machine Learning soft classifier to enforce customizable
content-dependent Filtering Rules.
The flexibility of the system in terms of filtering options is enhanced through the
management of Black Lists.
A future aim of this work is to investigate a tool able to automatically recommend
trust values for those contacts a user does not personally know.
[1] M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, “Content-Based Filtering in On-
Line Social Networks,” Proc. ECML/PKDD Workshop Privacy and Security Issues in Data Mining and
Machine Learning (PSDML ’10), 2010.
[2] Y. Zhang and J. Callan, “Maximum Likelihood Estimation for Filtering Thresholds,” Proc. 24th Ann.
Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 294-302, 2001.
[3] M. Carullo, E. Binaghi, and I. Gallo, “An Online Document Clustering Technique for Short Web
Contents,” Pattern Recognition Letters, vol. 30, pp. 870-876, July 2009.
[4] M. Carullo, E. Binaghi, I. Gallo, and N. Lamberti, “Clustering of Short Commercial Documents for
the Web,” Proc. 19th Int’l Conf. Pattern Recognition (ICPR ’08), 2008.
[5] C.D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge
Univ. Press, 2008.
[6] J. Moody and C. Darken, “Fast Learning in Networks of Locally-Tuned Processing Units,” Neural
Computation, vol. 1, no. 2, pp. 281-294, 1989.
[7] M.J.D. Powell, “Radial Basis Functions for Multivariable Interpolation: A Review,” Algorithms for
Approximation, pp. 143-167, Clarendon Press, 1987.
[8] J. Park and I.W. Sandberg, “Approximation and Radial-Basis-Function Networks,” Neural
Computation, vol. 5, pp. 305-316, 1993.
[9] C. Cleverdon, “Optimizing Convenient Online Access to Bibliographic Databases,” Information
Services and Use, vol. 4, no. 1, pp. 37-47, 1984.
[10] J.A. Golbeck, “Computing and Applying Trust in Web-Based Social Networks,” PhD dissertation,
Graduate School of the Univ. of Maryland, College Park, 2005.