Searching for patterns in crowdsourced Information Silvia Puglisi

Searching for patterns in crowdsourced information

Oct 30, 2014


Silvia Puglisi

 
Page 1: Searching for patterns in crowdsourced information

Searching for patterns in crowdsourced Information

Silvia Puglisi

Page 2: Searching for patterns in crowdsourced information

- Let me introduce myself..
- What is crowdsourcing?
- Discovering network dynamics and patterns in unstructured data.
- Where to go from here..

Table of contents

Page 3: Searching for patterns in crowdsourced information

Let me introduce myself..

2007: Graduated in Computer Engineering from Polimi [Politecnico di Milano].

Thesis on applications in robotics of a model of the hippocampal spatial function.

The project involved applying a path-planning algorithm based on neural networks to an e-puck robot.

http://www.e-puck.org for more info on e-puck

Page 4: Searching for patterns in crowdsourced information

Let me introduce myself..

2007: Joined Google as Corporate Operations Engineer.

My responsibilities included maintaining, designing, diagnosing, troubleshooting and/or updating Google corporate IT infrastructure and user-facing services.

Page 5: Searching for patterns in crowdsourced information

Let me introduce myself..

2010: Joined Google Enterprise team as Technical Account Manager for Gmail and Postini.

My responsibilities included:
- Develop creative solutions to maximize the adoption of Google Apps in organisations.
- Work with product and engineering teams to translate customer needs into a better product experience.
- Develop and implement processes and infrastructure to scale customer-facing operations.

Page 6: Searching for patterns in crowdsourced information

Let me introduce myself..

2012: Left Google to finish M.Sc. Thesis and prepare for Ph.D.

2012: Graduated from Trinity College Dublin in M.Sc. program in Management of Information Systems.

Final Thesis: Proposing a method for evaluating the quality of crowdsourced geographical information.

Page 7: Searching for patterns in crowdsourced information

What is crowdsourcing?

Crowdsourcing can be defined as the application of Open Source principles to fields outside of software.

Howe, 2006.

Page 8: Searching for patterns in crowdsourced information

What is crowdsourcing?

Crowdsourcing takes a decentralized approach to problem solving, outsourcing tasks that have traditionally been performed by individuals to a group of people:

the crowd.

Page 9: Searching for patterns in crowdsourced information

From crowdsourcing to spontaneous collaboration.

Crowdsourcing initiatives usually start with a call for solutions from an organization or an entity.

However, network dynamics are sometimes also an indirect source of data and answers to specific problems.

Wikipedia is perhaps the most striking example of this phenomenon, in which people decide to collaborate spontaneously on a task.

Page 10: Searching for patterns in crowdsourced information

Discovering network dynamics and patterns in unstructured data.

“Some twenty years ago I saw, or thought I saw, a synchronal or simultaneous flashing of fireflies. I could hardly believe my eyes, for such a thing to occur among insects is certainly contrary to all natural laws.”

Philip Laurent, Science Journal 1917

Page 11: Searching for patterns in crowdsourced information

Discovering network dynamics and patterns in unstructured data.

Complex network structures describe a wide variety of systems, of technological and biological importance.

The web itself is an example of a complex network of pages linked by their hyperlinks.

A social network is instead a network whose nodes are human beings and whose edges are the various relationships that occur between them.

Page 12: Searching for patterns in crowdsourced information

The web is a giant bubble of unstructured data.

The web has hence been developing as an open environment with infinite possibilities for collaboration and information sharing.

User activity on the web now generates content that provides diverse information about the interactions between different entities and the world around them.

This effect is amplified in social networks, where people voluntarily share information about anything.

Page 13: Searching for patterns in crowdsourced information

Volunteered Information VS web pages.

Volunteered information consists of snippets of text, most of the time just a few words, with other media attached: photos, videos, sounds.

Volunteered information is to web pages what post-its or snippets are to books.

Page 14: Searching for patterns in crowdsourced information

Volunteered Information VS web pages.

Volunteered information does not exhibit an explicit network structure formed by explicit links between items.

In the case of a web page, this structure is evident, since one page can link to other pages explicitly.

Links between pieces of volunteered information are instead created by the relationships between the contexts of the documents.

Page 15: Searching for patterns in crowdsourced information

The context of a document is made of the surrounding circumstances and facts that influence the meaning of a sentence, a passage, or even just a picture, a video or an audio file.

Understanding the context is the key to understanding the semantics of a document, and hence how much valuable information it actually contains.

Defining context..

Page 16: Searching for patterns in crowdsourced information

Defining context..

Defining context hence means trying to figure out what can be automatically inferred regarding:

- Where was the document created?
- Who created the document and shared it?
- What does the document describe?
- When was it shared?

Page 17: Searching for patterns in crowdsourced information

Context is the key ingredient.

Context is then the ingredient that adds value to information.

If a document can be contextually linked to other documents it becomes more relevant.

It means more information can be inferred regarding that document.

Page 18: Searching for patterns in crowdsourced information

Which context?

Regarding volunteered information, five types of context can be identified for a given object:

1) personal, 2) social, 3) geographical, 4) temporal, 5) linguistic.
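As a rough illustration of how these five context types could link two items, here is a minimal sketch. The `Context` record, its field types, and the `shared_context` helper are all hypothetical names introduced for this example; the slides do not prescribe a representation.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical record holding the five context types of one item."""
    personal: Optional[str] = None      # e.g. who created the item
    social: Optional[str] = None        # e.g. group it was shared with
    geographical: Optional[str] = None  # e.g. where it was created
    temporal: Optional[str] = None      # e.g. when it was shared
    linguistic: Optional[str] = None    # e.g. language of the text


def shared_context(a: Context, b: Context) -> list:
    """Return the names of the context types on which two items agree."""
    return [
        f.name
        for f in fields(Context)
        if getattr(a, f.name) is not None
        and getattr(a, f.name) == getattr(b, f.name)
    ]


# Two made-up items sharing place and date, but not author or language.
photo = Context(personal="alice", geographical="Barcelona",
                temporal="2014-10-30", linguistic="en")
note = Context(personal="bob", geographical="Barcelona",
               temporal="2014-10-30", linguistic="es")
links = shared_context(photo, note)
```

Each shared context type can then be read as one contextual link between the two items.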

Page 19: Searching for patterns in crowdsourced information

A network model.

If context is interpreted as a property of a given object, we find that at every level each attribute defines a derived hierarchy in which an element “belongs” to, or is a “child” of, another element higher or lower in the hierarchy.

Page 20: Searching for patterns in crowdsourced information

A network model.

Let's imagine the following follower/followed relationships in a social network:

- John Stewart follows Dave Matthews and Stephen Colbert
- Tim Reynolds follows Dave Matthews and Stephen Colbert
- Stephen Colbert follows John Stewart
- Dave Matthews follows John Stewart and Tim Reynolds
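The relationships above can be written down as a small directed graph. This is only a sketch: the adjacency-mapping representation and the helper names `followers_of` and `mutual_follows` are choices made for this example.

```python
# Directed "follows" graph: each key follows every name in its set.
follows = {
    "John Stewart": {"Dave Matthews", "Stephen Colbert"},
    "Tim Reynolds": {"Dave Matthews", "Stephen Colbert"},
    "Stephen Colbert": {"John Stewart"},
    "Dave Matthews": {"John Stewart", "Tim Reynolds"},
}


def followers_of(graph, person):
    """Return the set of people who follow `person`."""
    return {src for src, targets in graph.items() if person in targets}


def mutual_follows(graph):
    """Return the pairs of people who follow each other."""
    return {
        frozenset((a, b))
        for a, targets in graph.items()
        for b in targets
        if a in graph.get(b, set())
    }


dave_followers = followers_of(follows, "Dave Matthews")
mutual = mutual_follows(follows)
```

Nodes are people and each directed edge is one "follows" relationship, so reversing the lookup gives the followers of a person.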

Page 21: Searching for patterns in crowdsourced information

A network model.

Page 22: Searching for patterns in crowdsourced information

A network model.

Let's now concentrate on attributes for volunteered information.

Every attribute could describe a node in our system.

Every edge describes the frequency (or probability) with which two attributes appear together.

This behaviour can be particularly true for tag networks.
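A minimal sketch of such an attribute network for tags follows. The sample tags are invented, and weighting an edge by the fraction of items in which both tags occur is one possible reading of "frequency (or probability)", not necessarily the one used in the talk.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tagged_items):
    """Map each tag pair to the fraction of items containing both tags."""
    pair_counts = Counter()
    for tags in tagged_items:
        # Sort so that every pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    total = len(tagged_items)
    return {pair: count / total for pair, count in pair_counts.items()}


# Invented example data: each inner list is the tag set of one item.
items = [
    ["beach", "sunset", "barcelona"],
    ["beach", "barcelona"],
    ["sunset", "mountain"],
    ["beach", "sunset"],
]
edges = cooccurrence_edges(items)
```

Tags are the nodes, and the returned mapping holds one weighted edge per pair of tags that appear together at least once.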

Page 23: Searching for patterns in crowdsourced information

A network model.

Such a model hence consists of N nodes, where each pair of nodes is connected with probability p, creating a graph with approximately p N (N-1) / 2 randomly distributed edges.

This is what is called a random graph model, and it is among the most used models in complex networks theory.
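The edge count can be checked with a short sketch of this random graph model. The function names and the `seed` parameter are choices made for this example.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Return the edge set of a graph on n nodes, where each of the
    n * (n - 1) / 2 possible edges is included with probability p."""
    rng = random.Random(seed)
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        if rng.random() < p
    }


def expected_edges(n, p):
    """Expected number of edges, p * N * (N - 1) / 2."""
    return p * n * (n - 1) / 2


# Compare one sampled graph against the expected edge count.
sample = random_graph(200, 0.1, seed=42)
expected = expected_edges(200, 0.1)
```

For a single sample, `len(sample)` should land close to `expected`, and the two coincide exactly in the limiting cases p = 0 and p = 1.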

Page 24: Searching for patterns in crowdsourced information

Small world networks.

It is generally agreed that the relationships between nodes in such networks are not entirely random, but display some hints of the underlying organizing principles.

One such principle is the small-world concept, which describes how, despite their often large size, complex networks have a relatively short path between any two nodes (Watts, D. J., & Strogatz, S. H., 1998).
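The effect of short paths can be illustrated with a breadth-first search over a small undirected graph. This is a sketch, not the Watts and Strogatz construction itself: the ring lattice and the single hand-picked shortcut are assumptions chosen to keep the example deterministic.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total = pairs = 0
    for source in adj:
        for target, d in shortest_path_lengths(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def ring(n):
    """Ring lattice: every node is linked to its two nearest neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


plain = ring(20)
shortcut = ring(20)
shortcut[0].add(10)   # one long-range link across the ring
shortcut[10].add(0)

plain_avg = average_path_length(plain)
shortcut_avg = average_path_length(shortcut)
```

Adding a single long-range link lowers the average path length of the ring, which is the intuition behind the small-world concept.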

Page 25: Searching for patterns in crowdsourced information

Properties of small world networks.

A common property of such networks is that the relationships between the nodes tend to form cliques.

Cliques may represent circles of acquaintances at a social level, they can describe the users of an online community who tend to communicate with each other, or they can describe relationships between words in different documents.

Page 26: Searching for patterns in crowdsourced information

Properties of small world networks.

Another important aspect of complex networks, useful for understanding their properties and dynamics, is the degree distribution. The degree of a node is the number of edges at that node.

We would not expect all nodes in the network to have the same degree. Instead, node degrees are characterized by a probability distribution function P(k), which gives the probability that a randomly selected node has exactly k edges.
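P(k) can be estimated directly from an adjacency mapping, as in this minimal sketch. The star-shaped example graph is invented for illustration.

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of nodes that have exactly k edges."""
    degrees = [len(neighbours) for neighbours in adj.values()]
    counts = Counter(degrees)
    n = len(degrees)
    return {k: count / n for k, count in sorted(counts.items())}


# Undirected star: node 0 is linked to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
p_k = degree_distribution(star)
```

In the star, four of the five nodes have degree 1 and one has degree 4, so P(1) = 0.8 and P(4) = 0.2.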

Page 27: Searching for patterns in crowdsourced information

Where to go from here?

Page 28: Searching for patterns in crowdsourced information

Search and Quality Ranking.

In Page and Brin's PageRank algorithm, the rank of a node in the network (i.e. a web page) can be calculated as follows:
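A reconstruction consistent with the symbol definitions given for this formula, assuming the simplified form of PageRank with no damping factor or normalization constant:

```latex
R(i) = \sum_{j \in B_i} \frac{R(j)}{N(j)}
```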

Page 29: Searching for patterns in crowdsourced information

Search and Quality Ranking.

Where Bi is the set of documents connected to i, R(i) is the rank of the given document i, R(j) is the rank of a document j connected to i, and N(j) is the number of connections from j.
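Read this way, the rank of i is the sum of R(j) / N(j) over all documents j in Bi, and it can be approximated by repeated substitution. The sketch below makes several assumptions: connections are treated as symmetric, every document has at least one connection, and a damping factor (0.85, a commonly used value) is added so that the iteration settles on every graph. The `pagerank` function name and its parameters are illustrative.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Iteratively estimate R(i) for an undirected connection graph.

    adj[i] plays the role of B_i, and len(adj[j]) the role of N(j).
    The damping term is an assumption added for convergence.
    """
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new_rank = {}
        for i in adj:
            # Each connected document j passes on an equal share of its rank.
            incoming = sum(rank[j] / len(adj[j]) for j in adj[i])
            new_rank[i] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Invented examples: a triangle and a star with hub "hub".
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

triangle_rank = pagerank(triangle)
star_rank = pagerank(star)
```

In the triangle every document ends up with the same rank, while in the star the hub outranks the leaves because it collects the full rank share of each of them.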

Page 30: Searching for patterns in crowdsourced information

Search and Quality Ranking.

Both the local clustering coefficient and the degree of a given node in the network give an estimate of how well that node is connected to other nodes nearby.

Because the model used is built on the document context, more connections are therefore an indication of richer content and better quality of the information contained in the document itself.
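For reference, the local clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves connected. A minimal sketch for an undirected graph follows; the example graph is invented.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to compare
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# "a" has three neighbours, of which only "b" and "c" are linked.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficient_a = local_clustering(graph, "a")
degree_a = len(graph["a"])
```

Here one of the three possible neighbour pairs of "a" is connected, giving a coefficient of 1/3 alongside a degree of 3.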

Page 31: Searching for patterns in crowdsourced information

Privacy and Security.. just some food for thought.

We said that a common property of small world networks is that the relationships between the nodes tend to form cliques.

What if this could be applied to the rules in a stateful firewall?

What if we want to find out which data we are most likely to share with which people on a social network?

Page 32: Searching for patterns in crowdsourced information

Questions and Answers.

?