VUPoints: Collaborative Sensing and Video Recording through Mobile Phones
Xuan Bao and Romit Roy Choudhury
ACM MobiHeld 2009
Context
Next-generation mobile phones will have a large number of sensors
Cameras, microphones, accelerometers, GPS, compasses, health monitors, …
Each phone may be viewed as a micro lens
Exposing a micro view of the physical world to the Internet
Context
With 3 billion active phones in the world today
(the fastest growing computing platform …)
Our Vision is …
Context
Internet
A Virtual Information Telescope
Our Prior Work
Micro-Blog [MobiSys08]: instantiates this vision through mobile blogs, sensor querying, and participatory responses
[Figure: the virtual telescope: phones and people connect over cellular/WiFi to a web service]
Motivating Current Work
Several research questions arise … of which, one comes up frequently
Which information is of interest? Humans already cope with a high noise level in life
Can such information be distilled out? Based on a notion of “human interest”
Can this be done automatically? Exploiting rich sensing, computing, communication
capabilities
This Work: VUPoints
Early effort towards information distillation in a restricted application space
Asks the question:
Can phones identify “interesting” events in a social occasion and “record” them automatically,
through mobile phones …
… creating highlights of the occasion
VUPoints
Envisioning the end product Imagine a social party of the future Assume phones are wearable The goal is to get a 10-minute video highlights of the party, without human intervention
The idea Mobile phones sense the ambience Collaboratively infer an “interesting event” Trigger a video recording on the phone with a good view Finally, stitch all the clips to form the highlights
Event Coverage
Several sensing opportunities to detect events
Birthday cake comes … everyone turns to the table Compass orientations from collocated phones suggest an event
People dance to music Accelerometers and microphones observe simultaneous activity
People laugh and clap at jokes Acoustic signatures match, triggering video recording
… etc.
Another Perspective
Video highlights = social event coverage Analogous to spatial coverage in sensor networks
Applications
Personal travel blogging Phone identifies and automatically clicks photos/videos
Smart, distributed surveillance
Don’t miss your baby’s first crawl, laugh, talk … Get a highlights of important events in the office
Multi-view vision Watching a basketball game from multiple viewpoints
Architecture
Identify multi-modal event triggers
Video record from the phone with the best view
Design Challenges
Social grouping Which phones form the same social group? Not necessarily spatial
Trigger detection Mapping social interest to measurable triggers Dictionary of sensor triggers (collaborative)
View selection Which phone has the best view? What is “best”? … what is “view”?
Social Grouping
Acoustic High-frequency ring tones Phones grouped based on audible ringtones
Light Intensity sensed through camera 3 intensity bands
Bright Regular
Dark
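The audible-ringtone test above can be sketched as a simple spectral check: a phone plays a high-frequency tone, and every phone whose microphone picks up significant energy at that frequency joins the same social group. This is an illustrative reconstruction, not the authors' code; `hears_tone` and `group_phones` are hypothetical names, and the 0.5 relative threshold is an assumption.

```python
import numpy as np

def hears_tone(audio, fs, tone_hz, rel_threshold=0.5):
    """Return True if `audio` (mono samples at rate `fs`) contains
    significant energy at `tone_hz` relative to the spectral peak."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / fs)
    bin_idx = int(np.argmin(np.abs(freqs - tone_hz)))
    return bool(spectrum[bin_idx] >= rel_threshold * spectrum.max())

def group_phones(recordings, fs, tone_hz):
    """Group the phone ids that can hear the ringtone of one phone.

    recordings: dict phone_id -> 1-D array of microphone samples.
    """
    return {pid for pid, audio in recordings.items()
            if hears_tone(audio, fs, tone_hz)}
```

A phone outside the room (or behind heavy acoustic occlusion) fails the test and falls into a different group, which is exactly the "not necessarily spatial" behavior the slide asks for.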
Social Grouping
Similar views Multiple phones looking at the same object
Exploiting spatiograms to compare views
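The slide's elided equation likely refers to a spatiogram similarity measure. A spatiogram augments each histogram bin $b$ with the spatial mean $\mu_b$ and covariance $\Sigma_b$ of the pixels in that bin; a standard similarity from the spatiogram literature (shown as a reference point, not necessarily the exact form used in VUPoints) is:

```latex
\rho(h, h') \;=\; \sum_{b=1}^{B} \psi_b \,\sqrt{n_b\, n'_b},
\qquad
\psi_b \;=\; \eta \,\exp\!\Big( -\tfrac{1}{2}\,(\mu_b - \mu'_b)^{\mathsf T}\, \hat{\Sigma}_b^{-1}\, (\mu_b - \mu'_b) \Big),
```

where $n_b$ is the pixel count in bin $b$, $\eta$ is a normalizing constant, and $\hat{\Sigma}_b$ combines the two bins' covariances. Two phones whose camera views yield a high $\rho$ are likely looking at the same object, and hence belong to the same social group.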
Event Trigger Detection
Simultaneous orientation changes Large number of people may rotate towards the birthday cake, or towards a wedding speech …
Ambience fluctuations Noise floor might increase Light intensity might change New signatures detected …
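The simultaneous-orientation-change trigger above can be sketched as a vote across the group's compass traces: fire when a large fraction of phones turn by more than some angle inside a short window. All names and thresholds here are illustrative assumptions, not the authors' implementation.

```python
def orientation_trigger(headings, window, min_turn_deg=45.0, min_fraction=0.5):
    """Detect a simultaneous-orientation-change trigger.

    headings: dict phone_id -> list of (timestamp, compass_degrees)
    window:   (t0, t1) time interval to examine
    Fires when at least `min_fraction` of the phones turn by more than
    `min_turn_deg` inside the window (e.g. everyone facing the cake).
    """
    t0, t1 = window
    turned = 0
    for series in headings.values():
        in_win = [h for (t, h) in series if t0 <= t <= t1]
        if len(in_win) < 2:
            continue
        # circular difference between first and last heading in the window
        delta = abs((in_win[-1] - in_win[0] + 180) % 360 - 180)
        if delta > min_turn_deg:
            turned += 1
    return turned >= min_fraction * len(headings)
```

Requiring a fraction of the group, rather than any single phone, is what makes the trigger collaborative: one person turning around is noise, most of the room turning together is an event.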
Event Trigger Detection
Acoustic signatures Laughter, clapping, whistles, screaming, singing …
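Acoustic triggers of this kind are typically detected from short-time features of the microphone signal. The sketch below extracts two classic features (energy and zero-crossing rate) and applies a toy threshold rule; the actual prototype classifies acoustic signatures with an SVM (per the implementation slide), so the rule and thresholds here are stand-in assumptions.

```python
import numpy as np

def acoustic_features(frame):
    """Per-frame features often used for clap/laughter detection:
    short-time energy and zero-crossing rate (ZCR)."""
    frame = np.asarray(frame, dtype=float)
    energy = float(np.mean(frame ** 2))
    # each sign flip contributes |diff(sign)| == 2, hence the /2
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0)
    return energy, zcr

def is_clap_like(frame, energy_min=0.1, zcr_min=0.3):
    """Toy threshold rule standing in for the trained SVM classifier:
    claps are loud (high energy) and percussive/broadband (high ZCR)."""
    energy, zcr = acoustic_features(frame)
    return energy > energy_min and zcr > zcr_min
```

In the full system, matching signatures across several collocated phones at the same time is what raises confidence enough to trigger recording.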
View Selection
Current prototype activates all phones In the same social group “Best view” manually selected later
Building and Experiment Setting
Nokia N95 phones + Nokia 6210; Python + Symbian C++; MATLAB; basic SVM libraries (for acoustic signature classification)
Artificial social gathering of students 5 students taped phones to their shirt pockets Gathered in a group
Chatting Watching movies Playing video games
Methodology and Metric
One dedicated phone for complete recording Other phones run VUPoints Form groups, search for triggers (time-stamps them) Triggers used to select “interesting” clips (offline) Clips stitched to form highlights
Metric Manually identified logical events from original video
Identified events labeled
VUPoints identifies 10-second events Observe the overlap in the events Observe the detection delay Compute the time-window overlap
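The time-window-overlap metric above can be computed as follows. The exact definition is not given on the slide, so this sketch assumes one plausible reading: the average fraction of each manually labeled event that is covered by VUPoints' detected windows.

```python
def window_overlap(detected, truth):
    """Mean per-event overlap fraction.

    detected, truth: lists of (start, end) intervals in seconds.
    For each ground-truth event, sum its intersection with all detected
    windows (capped at the event length), then average over events.
    """
    def covered(event):
        s, e = event
        total = 0.0
        for (ds, de) in detected:
            total += max(0.0, min(e, de) - max(s, ds))
        return min(total, e - s) / (e - s)
    if not truth:
        return 0.0
    return sum(covered(ev) for ev in truth) / len(truth)
```

A score of 1.0 means every labeled event was fully covered; detection delay shows up as partial overlap when a window starts late.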
Results
Trigger detection examined Time of detection Accuracy
(A third person identified interesting events … VUPoints was matched against these events)
Limitations
VUPoints in very early stage Event space detected is limited
Difficult to identify “what humans perceive as interesting events” Aggressive event detection --> false positives
Cameras in poor view of the event (even if wearable) Occlusions in front
Ongoing work Exploring a larger set of triggers Offloading the central server: some tasks at the phone Energy efficiency of phones Possibility to combine with other devices
Wall mounted cameras, webcams, laptop microphones, …
Conclusion
Mobile phones becoming capability-rich Exploiting them as a sensor network
Different from existing mote-based networks Human centric Complex objectives, often subjective New kinds of problems
Developing VUPoints: A collaborative framework for ambience sensing
and video recording
Many challenges … only scratching the surface
Thanks
Visit the Systems Networking Research Group (SyNRG) @ Duke University
Google “synrg duke”
Context
Sensing, Computing, Communication Converging on the mobile phone platform
Combined with density, humans Forms a capability-rich platform
Question is … What can we do with it? Where are the opportunities?
Mobile phones equipped with multiple sensors Camera, microphone, accelerometer, compass …
Almost everyone carrying a phone Almost 3 billion phones today
Drawing Parallels
One way to view this is like a sensor network With more powerful capabilities
But more important distinctions are … Human interfacing / participatory Human scale density Human mobility Personal
In view of this, we are viewing the phone platform as A sensor network for human applications
Rough
A stripped down version Simplify the context as a starting step
We motivate with a use-case: Imagine you go to a party. The goal is to get a 10-minute video highlights …
New Research Pastures
Many new problems In the context of humans
The translation is like: Localization Coverage Energy efficiency Security Privacy
Content Sharing
[Figure: content sharing: phones sense the physical space; people and phones connect over cellular/WiFi to a web service and visualization service (the virtual telescope)]
Content Querying
[Figure: content querying through the same virtual-telescope architecture]
Some queries participatory: Is beach parking available?
Others are not: Is there WiFi at the beach café?
SyNRG
Demo Setting
The experiments involved 4 users, pretending to be in different types of gatherings. Each user taped a Nokia N95 phone near his shirt pocket. The N95 model has a 5-megapixel camera and a 3-axis accelerometer. Two of the users also carried a Nokia N6210 in their pockets; the N6210 has a compass that the N95s do not have.
The user-carried phones formed social groups and detected triggers throughout the entire occasion. The occasions were also continuously video-recorded by a separate phone. At the end, all sensed and video-recorded data (from all the phones) were downloaded and processed in MATLAB.
The triggers were identified, and using their time-stamps, a 20-second video clip for each trigger was extracted from the continuous video file. All the clips were then “stitched” in chronological order.
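The clip-extraction step described above can be sketched as pure interval arithmetic over the trigger time-stamps. The merging of overlapping clips is an assumption about how stitching avoids duplicated footage; function and parameter names are illustrative.

```python
def clip_windows(trigger_times, clip_len=20.0, video_len=None):
    """Turn trigger time-stamps into chronological (start, end) clip
    windows: one clip of `clip_len` seconds per trigger, clamped to the
    end of the recording, with overlapping windows merged."""
    windows = sorted((t, t + clip_len) for t in trigger_times)
    merged = []
    for s, e in windows:
        if video_len is not None:
            e = min(e, video_len)
        if merged and s <= merged[-1][1]:
            # overlaps the previous clip: extend it instead of duplicating
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

The resulting window list can then be fed to any video tool to cut and concatenate the clips from the continuous recording.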
VUPoints: An attempt to collaboratively sense and video record social events through mobile phones