May 15, 2015
Kinect Hacks for Dummies
Tomoto Shimizu Washio
Twitter: @tomoto335e (en) / @tomoto335 (ja)
Rev 1: 3/11/2011 (JTPA Geek Saloon)
Rev 2: 6/12/2011
Table of Contents
Introduction
◦ Who is the author?
◦ Overview
◦ Kinect basics
Chapter 1: Tech Side
◦ Hardware/software preparation
◦ Hacks and tips
Chapter 2: Biz Side
◦ Original intention and actual feedback
◦ Video view analyses
◦ What else happened
Who is the Author?
(Figure: personal timeline from 1973 to 2009 in six-year steps)
Where/what I was
◦ Born in Japan
◦ Elementary school
◦ High school
◦ Part-time programmer 30%, guitar player and composer 65%, university student 5%
◦ Hitachi
◦ California at Stanford!
◦ California again at Hitachi Data Systems!
◦ First programming in assembly language
◦ Golden age! Posting original games to computer magazines and getting paid
◦ Software product R&D: large-scale OO design, middleware, HCI, system management, user-centric design, …
CPUs I can speak with
◦ MN1610, Z80, SC62015, x86, ARM, other RISCs
Programming languages I can speak
◦ BASIC, LISP, Perl, C, VB, C++, Java, Tcl/Tk, Ruby, C#, Haskell, ActionScript, JavaScript, Python
Overview (1)
What is this presentation all about?
◦ My Kinect Hacks as a holiday project:
http://code.google.com/p/kinect-ultra/
http://code.google.com/p/kinect-kamehameha/
◦ How much fun Kinect hacking can be
What are Kinect Hacks?
◦ Creating your own cool stuff using Kinect, the motion-sensing gaming system for Xbox 360
When and how did I start Kinect Hacks?
◦ In Dec 2010, a month after Kinect’s release
◦ From my friend’s tweet about Kinect and Kinect Hacks
◦ Me: “Wow, I’ve gotta do this! I don’t mind spending all my winter holidays!”
Overview (2)
What is interesting with my Kinect Hacks?
◦ Intensive crash project
Major part(*) done in a week (before my wife ran out of patience)
No special knowledge of motion detection or 3D CG at the beginning
◦ Challenge for “the silliest thing ever”
Me: “I take my hat off to the smart hacks created by brilliant people all around the world. So I’ll create something silly that nobody has ever thought of, dedicating the best of my intelligence, energy, and CPU & GPU power. It must be fun!”
◦ Got an unexpectedly huge response from the public
Huge view counts on YouTube & Nicovideo (300k in the 1st week)
Appeared on news blogs, newspapers, TV, and other media
Won contest awards
Contacted by an investor about commercialization
…
(*) kinect-ultra V1, which earned the largest public response
Overview (3)
What you may learn today
◦ How to start cool Kinect Hacks by yourself
Chapter 1: Tech Side
◦ Some hints for a geek to make a “hit” (Well, I hope so)
Chapter 2: Biz Side
Disclaimers
◦ I am a total amateur in image recognition, motion detection, and 3D CG
◦ I know only the things that are interesting and/or necessary for me
◦ I do not care much about academic accuracy (be careful, I may be lying)
◦ I am a geek, not a business person
Kinect Basics (1)
What is Kinect actually?
◦ Gaming system for Xbox360 that enables intuitive and natural game play without controllers
◦ Released in Nov 2010
What is Kinect Sensor?
◦ Input device with RGB camera, IR depth sensor, and some other auxiliary sensors
640x480@30fps, 1280x1024@10fps(*)
Internals developed by PrimeSense
◦ Connectable to PC via USB
Drivers and libraries available for free
◦ In this presentation, “Kinect” refers to Kinect Sensor
(*) With Avin’s Windows driver
Kinect Basics (2)
What can you do with Kinect? Generally speaking…
◦ RGB camera + depth sensor: Kinect provides the color of and the distance to the object for each pixel
◦ 3D object recognition by PC
◦ Skeleton recognition by PC (so you get 3D positions for each joint)
(Figure: depth image shaded by distance: near, far, very far)
Don’t you see you can build any cool stuff on this? Let’s hack!
Chapter 1: Tech Side
This chapter explains the nuts and bolts behind this crash project
◦ Like the trick behind a magic act, it is nothing surprising once you know how it works
◦ General mathematics (especially geometry) required
How much time did I spend?
◦ Study: 3 days
◦ kinect-ultra: 7 days (for V1, which got the huge public response) + 2 days (for V2)
◦ kinect-kamehameha: 1 day (for V1) + 1 day (for V2)
Actually, I think I should count “nights” rather than “days”
Hardware Preparation
Kinect, of course!
◦ Caution: buy the standalone package, not the Xbox bundle
The Xbox bundle does not include the adapter for the USB connector
Windows PC
◦ With a fairly fast CPU and GPU
The more powerful your hardware is, the more energy you can spend on the cool essential stuff rather than on performance optimization
Mine: Core i7 2600 + GeForce GTX 285
◦ How about Mac and Linux?
I am not so familiar with them, but Windows is probably safer because of good driver support(*) and Microsoft’s SDK coming in the future
You don’t need an Xbox
(*) Avin’s Windows driver can automatically calibrate the RGB camera and the IR depth sensor, but I could not find the same feature in the Linux drivers when I tried. It may be better now.
Software Preparation (1)
OpenNI + NITE + Avin’s SensorKinect
◦ Basic software component set for sensor information access and recognition algorithms
OpenNI: Framework
NITE: OpenNI-compatible middleware implementing the recognition algorithms
Avin’s SensorKinect: OpenNI-compatible Kinect driver
◦ Advantages over other options (such as OpenKinect)
Released by PrimeSense
Player recognition and skeleton tracking available out of the box!
Actually, this was the key success factor that let me finish this project so quickly without any special knowledge of motion recognition
Auto calibration between the RGB camera and the IR sensor
Thanks to Avin for the nice driver implementation
◦ In this presentation, “OpenNI” refers to all of these software components as a set
Software Preparation (2)
OpenGL support libraries
◦ Chose OpenGL as my first 3D API to learn
◦ Just followed “OpenGL SuperBible, 5th Edition”
Standard support libraries (e.g. freeglut)
The book’s own library (GLTools)
Others
◦ OpenCV
Only used for reading image files and generating Gaussian random numbers
Hack 0: Study with Sample Programs
Studied for 3 days before starting kinect-ultra
◦ Surveyed both OpenKinect and OpenNI, and chose the latter
◦ Learned basic pixel information access and OpenGL usage from OpenNI’s sample programs
First practice piece: depth-aware delayed overlay
See “Algorithm March by Kinect” http://www.youtube.com/watch?v=j4ABDmFhkgA
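One way to read “depth-aware delayed overlay” is sketched below: keep a short history of frames, and for each pixel show whichever of the live sample and the delayed sample is nearer to the camera. This is only an illustration, not the code behind the video. The frame layout (a list of `(color, depth_mm)` tuples, depth 0 meaning “no reading”) and the names `composite` and `DelayedOverlay` are assumptions.

```python
from collections import deque


def composite(live, delayed):
    """Per pixel, keep whichever (color, depth) sample is nearer to the camera."""
    out = []
    for (c_live, d_live), (c_old, d_old) in zip(live, delayed):
        # A delayed pixel wins only if it has a valid depth and is nearer.
        if d_old and (not d_live or d_old < d_live):
            out.append((c_old, d_old))
        else:
            out.append((c_live, d_live))
    return out


class DelayedOverlay:
    """Overlays each new frame with the frame seen `delay_frames` earlier."""

    def __init__(self, delay_frames):
        self.history = deque(maxlen=delay_frames)

    def push(self, frame):
        # Until the history is full there is no delayed frame to overlay.
        if len(self.history) == self.history.maxlen:
            result = composite(frame, self.history[0])
        else:
            result = list(frame)
        self.history.append(frame)
        return result
```

Because the comparison is per pixel, a delayed copy of the player can pass in front of or behind the live player, which is what makes the overlay “depth-aware”.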
Hack 1: Transformation
Use “calibration complete” event to trigger transformation
◦ Calibration by “psi pose” is common for Kinect apps to start skeleton tracking
◦ “Something happens on calibration complete” is Kinect-ish entertainment
Modulate color of player area to represent the superhero suit
◦ OpenNI reports “hey, this pixel seems to be part of player #1”, so the app easily knows which pixels should be modulated
◦ Switch color (red or gray) for each pixel based on its distance from the head
The app can calculate the Euclidean distance between any pixels/joints in real world coordinates
It is slow, however; some optimization is required
◦ You: “Isn’t it too rough?” Me: “Well, that’s OK, this is meant to be funny after all!”
Skinning would be ideal, but is too serious and challenging
(Figure: the Ψ “psi” pose)
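The per-pixel color switch can be sketched roughly as follows. This is a minimal illustration, not the kinect-ultra code: the rule “gray near the head, red elsewhere”, the 250 mm radius, and the names `dist_sq` and `suit_colors` are all assumptions. Comparing squared distances avoids a square root per pixel, which is one possible optimization of the kind mentioned above.

```python
def dist_sq(a, b):
    """Squared Euclidean distance between two 3D points (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def suit_colors(points, labels, head, player_id=1, radius_mm=250.0):
    """Pick a suit color per pixel.

    points: real-world (x, y, z) per pixel, in millimeters
    labels: player id per pixel as reported by the tracker (0 = no player)
    head:   real-world position of the head joint
    """
    limit = radius_mm * radius_mm  # compare squared values, hypothetical radius
    colors = []
    for point, label in zip(points, labels):
        if label != player_id:
            colors.append(None)      # not part of the player: leave untouched
        elif dist_sq(point, head) < limit:
            colors.append("gray")    # assumed: gray close to the head
        else:
            colors.append("red")     # assumed: red for the rest of the body
    return colors
```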
TIP: A Bit about Coordinate Systems
(Figure: three coordinate systems and the transformations between them)
Kinect coordinates
• Raw pixel & depth data from Kinect
• Each XY plane: (0, 0)~(640, 480)
• Z: depth, 0 to 10000~ (seems linear)
Real world coordinates
• Skeleton positions from OpenNI
• Virtual 3D polygon objects
• Transformed from Kinect coordinates by the OpenNI API (a little slow)
OpenGL coordinates
• Raw vertex & pixel data for OpenGL
• Each XY plane: (-1.0, -1.0)~(1.0, 1.0)
• Z: Z-buffer, 0.0 to 1.0 (non-linear)
• Projected from real world coordinates by the OpenGL API
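To give a feel for what the Kinect-to-real-world step does, here is a sketch using a simple pinhole camera model. It is not the OpenNI implementation. The field-of-view values (57° horizontal, 43° vertical) are commonly quoted approximate figures for Kinect and are assumptions here, as is the function name `projective_to_real_world`.

```python
import math

WIDTH, HEIGHT = 640, 480
H_FOV = math.radians(57.0)  # assumed horizontal field of view
V_FOV = math.radians(43.0)  # assumed vertical field of view


def projective_to_real_world(u, v, depth_mm):
    """Map a pixel (u, v) with depth in mm to real-world (x, y, z) in mm.

    Pinhole model: the visible width at distance z is 2 * z * tan(fov / 2).
    Image v grows downward, real-world y grows upward.
    """
    x_scale = 2.0 * math.tan(H_FOV / 2.0)
    y_scale = 2.0 * math.tan(V_FOV / 2.0)
    x = (u / WIDTH - 0.5) * depth_mm * x_scale
    y = (0.5 - v / HEIGHT) * depth_mm * y_scale
    return (x, y, depth_mm)
```

Doing this for all 640x480 pixels on every frame is where the “a little slow” comes from.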
Hack 2: Detect Pose, Shoot Laser
No motion detection, only pose detection!
◦ Calculation is tremendously easy without time derivatives
◦ Once the positions of the skeleton parts are given, elementary vector operations (distance, dot product, cross product) work very well
◦ Trial and error to decide good parameters (e.g. thresholds)
Spawn a laser while the pose is detected
◦ The laser is a flat rectangle object in 3D space with an alpha texture, laid over the image from the RGB camera
◦ Position/direction/initial velocity are calculated from the pose
Same approach for shooting the Eye Slugger
◦ With an additional stability check
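Pose detection from joint positions can be as small as the sketch below. The pose here (“one arm held straight out”) is a made-up example, not the pose used in kinect-ultra, and the thresholds and the name `detect_arm_thrust` are hypothetical values of the kind you would tune by trial and error.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)


def detect_arm_thrust(shoulder, elbow, hand, straight_cos=0.9, min_reach_mm=400.0):
    """Return (origin, direction) for a laser if the arm is held straight, else None.

    straight_cos: minimum cosine between upper arm and forearm (hypothetical)
    min_reach_mm: minimum shoulder-to-hand distance (hypothetical)
    """
    upper = sub(elbow, shoulder)
    fore = sub(hand, elbow)
    if norm(upper) == 0 or norm(fore) == 0:
        return None  # degenerate skeleton data
    is_straight = dot(unit(upper), unit(fore)) >= straight_cos
    is_extended = norm(sub(hand, shoulder)) >= min_reach_mm
    if is_straight and is_extended:
        # The laser starts at the hand and flies along the arm direction.
        return hand, unit(sub(hand, shoulder))
    return None
```

Since only the current frame is inspected, there is no velocity or history to track, which is what keeps the calculation easy.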
Hack 3: Hidden Surface Processing
Place each pixel from Kinect as a point object in 3D space
◦ Not texture mapping
◦ So pixels and other 3D objects hide each other
Handle pixels in projective coords for good performance
◦ 3D objects basically reside in real world coords, but mapping all pixels into the real world is too slow
◦ Instead, map pixels directly from Kinect coords to OpenGL raw coords by transforming the depth value to an OpenGL Z-buffer value
◦ See the next page; it was a hack
TIP: Fast Depth Transformation
(Figure: the three coordinate systems, Kinect, real world, and OpenGL, with two routes from Kinect coordinates to OpenGL coordinates)
Kinect coordinates: raw pixel & depth data from Kinect; each XY plane (0, 0)~(640, 480); depth 0 to 10000~ (seems linear)
Real world coordinates: skeleton positions from OpenNI and virtual 3D polygon objects
OpenGL coordinates: raw vertex & pixel data for OpenGL; each XY plane (-1.0, -1.0)~(1.0, 1.0); Z-buffer 0.0 to 1.0 (non-linear)
Route 1: Kinect → real world (transformed by the OpenNI API, a little slow) → OpenGL (projected by the OpenGL API)
Unifying everything into real world coordinates makes the logic easier, but it is slow.
Route 2: Kinect → OpenGL directly
Direct transformation from Kinect’s depth value to the OpenGL Z-buffer value is much faster! Some hacking was needed to figure out the formula.
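The slides do not give the formula that was found. As an illustration only, the sketch below uses the standard OpenGL perspective depth mapping with the default depth range, which sends a linear eye-space distance d to a non-linear Z-buffer value f(d - n) / (d(f - n)) for near plane n and far plane f. The function names and the choice of near/far planes are assumptions.

```python
def depth_to_zbuffer(depth_mm, near_mm, far_mm):
    """Map a linear depth to a [0, 1] Z-buffer value (standard perspective mapping).

    Returns 0.0 at the near plane and 1.0 at the far plane; values in between
    are non-linear, with most of the range spent close to the camera.
    """
    if depth_mm <= 0:
        raise ValueError("depth must be positive")
    return far_mm * (depth_mm - near_mm) / (depth_mm * (far_mm - near_mm))


def pixel_to_ndc(u, v, width=640, height=480):
    """Map a Kinect pixel position to OpenGL XY in (-1.0, -1.0)~(1.0, 1.0)."""
    return (2.0 * u / width - 1.0, 1.0 - 2.0 * v / height)
```

With these two functions a pixel goes straight from Kinect coordinates to OpenGL coordinates using a few multiplications and one division, without visiting real world coordinates.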
Hack 4: Hit testing
Hit-test between lasers (= rectangles in 3D space) and image pixels (= points in 3D space), and convert lasers that hit into sparks
◦ Impractical to check the distance between all the objects
◦ Instead, divide the real world space into coarse 1-bit voxels, and mark voxels that contain points
No distance calculation; a voxel lookup is enough for hit testing
Mark voxels with down-sampled pixels
Marking voxels needs to be done in real world coordinates and is thus slow
◦ Maybe inaccurate, but fun!
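The voxel idea can be sketched as follows. This is a minimal illustration, not the kinect-ultra code: a Python set of voxel indices stands in for the 1-bit voxel array, and the 100 mm voxel size, the down-sampling step, and the function names are assumptions.

```python
VOXEL_MM = 100.0  # hypothetical voxel edge length


def voxel_of(point):
    """Index of the coarse voxel that contains a real-world point."""
    return tuple(int(c // VOXEL_MM) for c in point)


def mark_voxels(points, step=4):
    """Mark voxels occupied by image pixels, using only every `step`-th point."""
    return {voxel_of(p) for p in points[::step]}


def laser_hits(laser_samples, occupied):
    """A laser hits if any sample point on it falls into an occupied voxel."""
    return any(voxel_of(p) in occupied for p in laser_samples)
```

Each hit test is a lookup per laser sample point instead of a distance calculation against every image pixel.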
TIP: How Kinect Works in Darkness?
IR laser depth sensing works even in a dark room
◦ http://www.youtube.com/watch?v=nvvQJxgykcU
◦ Casts a random dot pattern and analyzes parallax
(Figure: capture from the above URL)
Hack 5: Light Ball
Drawing a white circle does not look like a light ball at all…
Instead, brighten the surroundings according to their distance from the light ball center
You feel dazzling light and heat! (Thanks to human illusion)
Use an approximation because real Euclidean distance calculation for all pixels is slow
Calculate a “pseudo” distance in projective coordinates (with the Z value tweaked a bit)
Trial and error to decide how to modulate brightness by pseudo distance
Not 100% scientific and realistic, but good enough and, most importantly, fun!
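A rough sketch of the light ball brightening is shown below. The actual pseudo distance and brightness curve were found by trial and error and are not given here, so the Z scaling factor, the falloff curve, and all names and parameter values are assumptions.

```python
def pseudo_distance_sq(pixel, center, z_scale=0.25):
    """Squared "pseudo" distance in projective coordinates.

    pixel, center: (u, v, depth) in Kinect coordinates.
    z_scale: hypothetical tweak that makes depth units comparable to pixels.
    """
    du = pixel[0] - center[0]
    dv = pixel[1] - center[1]
    dz = (pixel[2] - center[2]) * z_scale
    return du * du + dv * dv + dz * dz


def brighten(rgb, d2, radius=80.0, strength=2.0):
    """Scale a color up near the light ball; the gain fades to 1 far away."""
    gain = 1.0 + strength / (1.0 + d2 / (radius * radius))
    return tuple(min(255, int(c * gain)) for c in rgb)
```

Working in projective coordinates means no per-pixel conversion to real world coordinates and no square root.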
Hack 6: Energy Wave (1)
Represented by a long-stretched polygon sphere
Decide transparency by the dot product between the normal of the polygon surface and the sight vector (for a nebular effect)
◦ Solid around the center, transparent around the edge
◦ Implemented in GLSL (shading language)
Although it was my first time working with this language, it was done in about 30 minutes by tweaking sample code from a book
Add random fluctuation to the normal (for a misty/swirly effect)
◦ Accidentally discovered from a bug
Hack 6: Energy Wave (2)
(Figure: polygon surface with normal n and sight vector v)
Simple Reflection: $rgb = rgb \cdot (n \cdot v / |v|)^k$ (acts as brightness)
After a quick tweak…
Nebular Effect: $a = (n \cdot v / |v|)^k$ (acts as transparency)
Add random fluctuation to the normal so that the transparency is roughly modulated by position and time. This makes the energy wave look misty or swirly.
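The nebular formula can be tried out on the CPU with the sketch below. The original is a GLSL shader; this Python version only mirrors the formula $a = (n \cdot v / |v|)^k$ for a unit normal n. Clamping negative values to zero, the Gaussian noise used for the fluctuation, and the names `nebular_alpha` and `jitter_normal` are assumptions.

```python
import math
import random


def nebular_alpha(normal, view, k=2.0):
    """a = (n . v / |v|)^k for a unit normal: 1 = solid, 0 = fully transparent."""
    n_dot_v = sum(a * b for a, b in zip(normal, view))
    v_len = math.sqrt(sum(c * c for c in view))
    # Surfaces facing away from the viewer are clamped to 0 (an assumption).
    return max(0.0, n_dot_v / v_len) ** k


def jitter_normal(normal, amount, rng):
    """Add random fluctuation to a normal and renormalize it to unit length."""
    noisy = [c + rng.gauss(0.0, amount) for c in normal]
    length = math.sqrt(sum(c * c for c in noisy))
    return tuple(c / length for c in noisy)
```

For example, `nebular_alpha(jitter_normal(n, 0.2, random.Random()), v)` gives a slightly different alpha on each call, which is the source of the misty look.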
Hack 7: Hair!
Secret formula to model the hair
◦ O = center of head, P = each pixel on the player’s border near and above O
◦ Render a narrow triangle from P in the direction of OP with length n|OP|, where n is a simple linear saw-wave function of r, and r is the angle of OP against the horizon
Add some repulsion against the energy ball
Randomly blend graded yellow (for a “goldish shine” effect)
Everything is calculated/rendered in 2D on the projective plane
◦ Easy and unrealistic, but cartoonish and funny
(Figure: hair triangles of length n|OP| drawn outward from the player’s border, with a plot of the saw wave n against r from 0 to π/2)
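The hair formula can be sketched in 2D as below. The shape of the saw wave (number of teeth, minimum and maximum of n) is not given in the slides, so those parameters and the function names are hypothetical.

```python
import math


def saw_wave(r, teeth=5, n_min=0.2, n_max=0.8):
    """Linear saw wave of the angle r in [0, pi/2]; all parameters hypothetical."""
    phase = (r / (math.pi / 2.0) * teeth) % 1.0
    return n_min + (n_max - n_min) * phase


def hair_tip(o, p, **wave_params):
    """Tip of the hair triangle that starts at border pixel P.

    o, p: 2D points on the projective plane (image y grows downward).
    The triangle points along OP and has length n|OP|, so tip = P + n * (P - O).
    """
    dx, dy = p[0] - o[0], p[1] - o[1]
    r = math.atan2(abs(dy), abs(dx))  # angle of OP against the horizon
    n = saw_wave(r, **wave_params)
    return (p[0] + n * dx, p[1] + n * dy)
```

Because n jumps back to its minimum at the end of each tooth, neighboring border pixels produce spikes of steadily growing length followed by a sudden drop, which gives the spiky cartoon silhouette.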
Chapter 2: Biz Side
Got an unexpectedly huge response to the uploaded video
There may be some hints here for a geek who wants to make a “hit”…
What Did I Intend Actually?
Absolutely no intention to be “successful”, but I had other clear intentions which might have been the eventual success factors
◦ Desire to stand in the same line as other Kinect hackers
◦ Must be differentiated: useless, nonsensical, and never seen before
◦ Must be done quickly
Before real game studios publish their serious work
Before someone else (as crazy as myself) shoots lasers
◦ Completeness of entertainment
First created laser shooting only (in 2 days), then added other features one by one until satisfied with the “completeness”
Motivated by “hey, this idea is too good! I can’t finish without it!”
Transformation, hidden surface, hit testing, Eye Slugger, timeout, flying out, …
◦ Targeted at a worldwide audience
Created videos in both Japanese and English, and uploaded them to both YouTube and Nicovideo (Japanese video site)
Creating for only one community would mean not welcoming the other
Examples of unexpected feedback
It’s for kids!
◦ “My kid holds on to the PC and never leaves.”
◦ “When my kids and I play heroes and bad guys, they identify themselves with the heroes in their mind. If they can actually become the heroes out of their imagination, it will be wonderful.”
It makes my dream come true!
◦ “I wanted to do this since I was a kid.”
◦ “The kid’s part of me says ‘Look! He transforms! I wanna do it!’ and drowns out my adult’s words.”
Me: “I did not mean it at all. I just tried to be silly and funny. But, it is definitely a pleasure to see people get excited about the future of the technology demonstrated by this.”
Video View Analysis of kinect-ultra
(Figures: “Total Views”, axis 0 to 600,000, and “Views/day in first two weeks”, axis 0 to 140,000, for Nico (ja), YT (ja), and YT (en); chart annotations: “Explosion”, “Nicovideo-award nominee”)
Exploded within 24 hours and reached 300k in a week
◦ More discussion on the next page
Japan heats up and cools down very quickly, while worldwide seems a little slower
Forgotten while nothing happens, and remembered at occasional events
Hypothesis of explosion mechanism
Interesting to think about how the access could grow so large so rapidly
Hypothesis: a multistage explosive chain reaction among video, tweets, and news sites(*)
Is it possible to make it happen intentionally? Not sure; probably very difficult
Stage 1 (~10h)
• Enthusiast communities notice the video first and start tweeting
• Views and tweets increase slowly
Stage 2 (~20h)
• The number of tweets exceeds some threshold
• News sites notice it and post articles (independent blog sites first, then major news sites such as Yahoo! News)
• Views and tweets increase rapidly through a positive feedback effect
Stage 3
• The number of views exceeds some threshold, and the video ranks among the most popular videos
• The feedback effect accelerates even more
Cool down (48h~)
• Tweets cool down and the feedback effect gradually stops
(*) My colleague tracked the public activity and came up with this hypothesis. Great job by him.
Video View Analysis of kinect-kamehameha
No explosion
◦ Got many views at first on Nicovideo (more than ultra, in fact), but did not ignite an explosion
◦ Probably insufficient impact to make viewers tweet and exceed the threshold
More sustained popularity worldwide than in Japan
◦ From DBZ fans around the world? Most views come from Brazil
◦ Sporadic jumps: I don’t know what is happening
(Figures: “Total Views”, axis 0 to 200,000, and “Views/day in first two weeks”, axis 0 to 20,000, for Nico (ja), YT (ja), and YT (en))
What else happened (1)
Appeared in media
◦ Blogs, news, and tech review sites
◦ Newspapers and magazines (e.g. Japan Times)
◦ TV shows (e.g. NHK BS1/2 in Japan)
◦ Webcasts (in Japan and France)
◦ For more information:
http://code.google.com/p/kinect-ultra/wiki/Articles
http://code.google.com/p/kinect-kamehameha/wiki/Articles
Public demos and presentations
◦ 3D Vision & Kinect Hacking Meetup
◦ JTPA Geek Saloon
◦ Maker Faire (Thanks to Matt Bell for involving me)
◦ Campus Party (Did not make it, though)
What else happened (2)
Won or was nominated for awards
◦ Matt Cutts’s Kinect Contest Winner
◦ Maker Faire 2011 Bay Area Editor’s Choice Winner
◦ Nicovideo Award 2011 Spring Nominee
Other interesting contacts from
◦ Other hackers, of course!
◦ Investors
◦ An artist (who wanted to use the video in his artwork)
◦ 3D modelers (who kindly contributed the Eye Slugger model)
Thanks! Any questions?