Piano Playing Lego Mindstorm NXT Robot with iPhone

Alvin Liao, Ronald Orozco
Group 2
April 30, 2012
Abstract
In the 21st century, mobile devices have made an enormous presence around the world, where an estimated two thirds of the world's population uses one. Not only has the number of mobile users increased exponentially, so have the features and computing power of mobile devices. As mobile devices are ubiquitous and have the capability to run complex applications, it is possible to incorporate robots and mobile devices together. The opportunities to develop robotics-based mobile applications for the public are both welcoming and encouraging. The vision of this project is to develop a mobile robotic system consisting of the Lego Mindstorm NXT and incorporating computer vision through the Apple iTouch camera.
Objective
The concept of this project is to develop a piano playing robot using the concepts of computer vision. Using computer vision is not the most practical way to develop a piano playing robot compared to a robot that plays based on the predetermined location of the piano. However, by using computer vision the robot will imitate the methods used by humans to play the piano. Therefore, the objective of this project is to develop a robot that will simulate a human mind playing a piano.
The functions the robot will perform include distinguishing notes, recognizing piano keys, and playing the piano. In terms of distinguishing notes, the robot will be able to analyze and compare the raw data of notes in order to interpret the data into robot movements. The robot will be able to identify which piano key it is currently in front of and actively search for the next requested piano key. To accomplish these objectives the project will implement Bluetooth and OpenCV.

Figure 1: The system consists of an iTouch and a Lego NXT robot. The robot is constructed using three motors, two for movement and one for pressing the piano key. The iTouch is mounted on top of the robot with the camera facing directly above the piano keys.
Procedure
The project can be broken down into four main stages: creating the iTouch work environment in Xcode with BTstack and OpenCV, establishing a network connection between the Lego Mindstorm NXT and Apple iTouch, retrieving and recognizing the physical features of a piano, and determining the location of the next piano key to be played.
Xcode with BTstack and OpenCV
Developing on the iTouch with iOS 5 requires the use of a Macintosh with at least Mac OS X Lion 10.7.1 and Xcode 4.1 installed. The purpose of using the BTstack framework library is to develop a Bluetooth connection between the iTouch and Lego NXT. It is important to note that the Lego NXT is not an authorized Apple device and therefore cannot use Apple's Bluetooth library. BTstack is not an approved library by Apple, and therefore applications using BTstack will not be accepted in the App Store. To bypass this problem, a jailbroken iTouch running iOS 5 is necessary. Through jailbreaking, the BTstack library is downloaded onto the iTouch, which enables the iTouch for Bluetooth development with the Lego NXT.
The application implements the iTouch camera, the BTstack framework library, and the OpenCV framework library. The iTouch camera can be accessed using the UIKit, AVFoundation, CoreGraphics, CoreVideo, and CoreMedia framework libraries. These framework libraries are already included with Xcode and do not require external files. The BTstack framework library contains the functions used to establish Bluetooth connections. The OpenCV framework library contains the functions used to process the images. Both the BTstack and OpenCV framework libraries need to be added to the Xcode project as external frameworks. The OpenCV framework library used for development was built by Aziz Baibabaev from Aptogo. This version of OpenCV supports iOS 5 and the armv7 architecture. In addition, the OpenCV framework library requires the application to be developed in Objective-C++ rather than Objective-C. The minor differences are the file extension (.mm for Objective-C++ and .m for Objective-C) and a stricter coding structure regarding the placement of include/import statements.
Since the application requires the use of the iTouch camera and Bluetooth, it is not practical to build and run it in the simulator on the Macintosh. To build and test the application on the iTouch, an Apple Developer account and license are required. If the build is successful, the iTouch should display the application on the screen.
Bluetooth Connection
For the Bluetooth portion, the project uses the RFCOMM and HCI protocols from the BTstack library. The Bluetooth connection is initiated by the iTouch while the Lego NXT is in standby mode awaiting a request from the iTouch. The first set of packets sent is part of the HCI protocol, which determines the requirements for the connection. The HCI protocol is used to search for the Lego NXT device and establish the initial connection between the iTouch and Lego NXT. During the connection process, the iTouch may be required to enter a PIN, which is set to "1234" by default on the Lego NXT. Afterwards, the iTouch will attempt to open a channel which will be used to transfer packets to the Lego NXT. When the connection is set, the iTouch is able to send packets consisting of bytes to the Lego NXT. The purpose of the RFCOMM protocol is to send the data packets to the Lego NXT; the direct transfer of packets must use RFCOMM, as required by the Lego NXT Bluetooth capabilities. The packets sent consist of hexadecimal values. For the sake of simplicity, the system is designed such that only the iTouch sends packets and the Lego NXT only receives them. In addition, there are three basic functions that can be called: goLeft(), goRight(), and play(). These commands control the motors A, B, and C connected to the Lego NXT brick.
goLeft() - sends a packet command that activates the A and B motors to move 30 degrees per second to the left

goRight() - sends a packet command that activates the A and B motors to move 30 degrees per second to the right

play() - sends a packet command that stops any movement of the A and B motors, then activates the C motor to drop 70 degrees, pause for 1 second, and rise 70 degrees
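The commands above map onto a deliberately simple wire format: each packet carries a single command byte that the LeJOS program in the appendix decodes (0 = exit, 1 = go left, 2 = go right, 3 = play the note). A minimal Python sketch of that encoding, with the framing around the byte omitted:

```python
# Command codes matching the values decoded in RobotMovement.java.
CMD_EXIT, CMD_LEFT, CMD_RIGHT, CMD_PLAY = 0, 1, 2, 3

def encode_command(code):
    """Build the payload sent over the RFCOMM channel: one command byte."""
    if code not in (CMD_EXIT, CMD_LEFT, CMD_RIGHT, CMD_PLAY):
        raise ValueError("unknown command code: %d" % code)
    return bytes([code])

def decode_command(packet):
    """Mirror of the NXT side: ((int) buf[0] & 0xff) in the Java listing."""
    return packet[0] & 0xFF
```

On the iTouch side, goLeft(), goRight(), and play() would each send one such packet through the open RFCOMM channel.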
The iTouch's Bluetooth must be disabled at the start in order for the connection to be successful; otherwise the application will stall. This situation occurs when the application is not terminated correctly. The Bluetooth connection is safely terminated when the iTouch sends an exit packet to the Lego NXT. In this case the channel will close and the Lego NXT will terminate the LeJOS program. The iTouch will indicate the Bluetooth connection has terminated when the Bluetooth logo is inactive. Another way to terminate the connection is to forcefully disrupt it by shutting off either the iTouch or the Lego NXT. In this case the other device will acknowledge that the connection has been severed and close the Bluetooth connection.
Piano Key Recognition
To identify and distinguish the piano keys, the robot takes notice of the placement of the black keys on the piano. This method is similar to the method used by humans to identify piano keys. The relevant feature of the piano is that there are two sets of black keys, one set containing two keys and the other containing three. Based on this feature, the robot divides the image into seven regions of interest. To do so, the cvSetImageROI function is used to change the view of the source image to the region specified by the defined rectangular area. The image is reset to its original state with cvResetImageROI, and the process is repeated with cvSetImageROI for the next region. Since the piano is conveniently colored in black and white, the robot tracks black pixels in the image. The image is converted to grayscale so that pixel values reflect the intensity of black. To determine whether a region contains a black key, the image is converted into a binary image using an inverse threshold with cvThreshold and thresholdType=CV_THRESH_BINARY_INV. With CV_THRESH_BINARY_INV, pixels at or below the threshold are set to a defined maximum value, while all other pixels are set to zero, so dark pixels become nonzero. Within the region, all nonzero values are counted using countNonZero to determine the percentage of coverage over the dimensions of the region of interest. If the region is identified as black, it is assigned the value 1. If the region is identified as white, it is assigned the value 0. In the case where a black key lies between two regions, both regions containing the black key are considered: if both are mostly black, the first region is assigned the value 1, otherwise 0. The seven regions produce a signal that represents a key, and each key has a unique signal assigned to it.
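The per-region test can be sketched in pure Python (the application itself uses cvThreshold and countNonZero); the threshold of 128 and the 50 percent coverage cutoff are assumed values, as the report does not state the exact numbers used:

```python
def region_bit(gray_region, thresh=128, coverage=0.5):
    """Return 1 if the region is mostly black, else 0.

    gray_region is a 2D list of grayscale pixel values. As with the
    inverse threshold (CV_THRESH_BINARY_INV), pixels at or below
    `thresh` become nonzero; the nonzero count is then compared
    against the region's area, mirroring countNonZero over the ROI.
    """
    area = len(gray_region) * len(gray_region[0])
    nonzero = sum(1 for row in gray_region for px in row if px <= thresh)
    return 1 if nonzero / area >= coverage else 0
```

Applying region_bit to each of the seven regions of interest yields the seven-value signal for the frame.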
Figure 2: The table defines the piano keys C, D, E, F, G, A, and B and their expected signals represented in binary form. The binary 1, or high signal, represents a black key. The binary 0, or low signal, represents an empty spot. The X, or don't-care signal, indicates the value will not affect the detection of the piano key.
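Matching a measured signal against the table, including the don't-care positions, can be sketched as follows. Only the two signals shown in Figures 3 and 4 (G and E) are filled in here, and the placement of E's don't-care bit is an assumption; the full table of Figure 2 is not reproduced in the text:

```python
def matches(signal, pattern):
    """A signal matches when every bit equals the pattern bit, with 'X'
    (the don't-care of Figure 2) accepting either value."""
    return len(signal) == len(pattern) and all(
        p == 'X' or s == p for s, p in zip(signal, pattern))

# Illustrative entries only; not the complete table of Figure 2.
KEY_PATTERNS = {'G': '0110111', 'E': '110X001'}

def identify_key(signal):
    for key, pattern in KEY_PATTERNS.items():
        if matches(signal, pattern):
            return key
    return None  # no key recognized in this frame
```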
Piano Key Location Tracking
The robot is able to make intelligent decisions regarding the movement it must make to get to the next note. Aside from recognizing which key is being read, the robot must also track which octave on the piano it is currently looking at. Normally on a 61-key keyboard, one would first start at middle C, or C3. To simulate the same process a human uses when playing a piano, the robot will assume that it is starting in the third octave. The robot will keep track of the octaves through two keys, B and C. If the robot is moving right and locates a C, the robot will recognize that it has jumped to the next octave, therefore increasing the octave. If the robot is moving left and locates a B, the robot will recognize that it has moved down to the octave below, therefore decreasing the octave.

Figure 3: The picture shown is the iTouch camera view in debug mode. In debug mode the regions are defined by the red lines, along with each region's signal value. The piano key detected is G, as shown at the bottom of the screen, which matches the binary output 0110111.

Figure 4: The picture shown is the iTouch camera view in debug mode. In debug mode the regions are defined by the red lines, along with each region's signal value. The piano key detected is E, as shown at the bottom of the screen, which matches the binary output 1101001. Notice in this scenario that the X value is where one of the black keys lies between two regions.

The process of the decision making is listed below:
1) The robot will identify what octave it is currently in and compare its current octave with the next note's octave. If the octaves are not the same, the robot will move left if the next note's octave is less than the current octave and move right if the next note's octave is greater than the current octave.
2) If the next note's octave and current octave are the same, the robot will compare the next note with the current note. The notes are ranked as follows, [C D E F G A B], where the rightmost note has the greatest value. If the notes are not the same, the robot will move left if the next note's value is less than the current note's value and move right if the next note's value is greater than the current note's value.
3) If the next note's value and current note's value are equal, the robot will stop all movement and proceed to play the note.
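The three steps above, together with the octave-tracking rule from the preceding paragraph, can be sketched in Python (function and variable names here are my own):

```python
# Rank within an octave: the rightmost note has the greatest value.
NOTE_RANK = {'C': 0, 'D': 1, 'E': 2, 'F': 3, 'G': 4, 'A': 5, 'B': 6}

def next_move(cur_note, cur_octave, next_note, next_octave):
    """Steps 1-3: compare octaves first, then note rank, else play."""
    if next_octave != cur_octave:
        return 'left' if next_octave < cur_octave else 'right'
    if next_note != cur_note:
        return 'left' if NOTE_RANK[next_note] < NOTE_RANK[cur_note] else 'right'
    return 'play'

def update_octave(octave, direction, seen_key):
    """Crossing a C while moving right enters the next octave;
    crossing a B while moving left drops to the octave below."""
    if direction == 'right' and seen_key == 'C':
        return octave + 1
    if direction == 'left' and seen_key == 'B':
        return octave - 1
    return octave
```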
Discussion
Figure 5: The table is a comparison between the two methods used for piano key recognition in the project. The experiment measured the time per frame, which is used to gauge performance. Over a run of 10 trials, the Histogram Equalization method averaged 12.118 seconds per frame and the Color Threshold method averaged 0.08 seconds per frame.
Figure 6: The graph shows a comparison between the two methods used for piano key recognition in the project. The experiment tested the number of correct detections of the piano key C3 over the course of one minute. A value of 1 signifies a correct detection. A value of 0 signifies an incorrect or unidentifiable detection. The red line represents the Color Threshold method and the blue line represents the Histogram Equalization method.
Histogram Equalization and Canny Edge Detection
At first the robot implemented a Histogram Equalization and Canny Edge Detection approach to distinguish the piano keys. To determine the location, the iTouch looks at the piano through a live stream of image frames, capturing frames and keeping them in a queue; if too much time passes, the image is dropped and a new image is used. First, cvEqualizeHist applies histogram equalization to increase the contrast of the image; then cvCanny applies the Canny edge detection algorithm to produce edges. The binary output of this function is then used in the probabilistic Hough line algorithm, cvHoughLines2, to create a list of potential lines. The line data is then read with cvGet2D and filtered by keeping only lines with the following constraints:
- Must be close to a vertical slope
- Must be on the upper half of the image
The purpose of these constraints is to determine the cluster of lines that represent the region of the black keys. These black key clusters are the key to determining the location of the robot from the view. By identifying the pattern of black keys against the white keys, the key note on the piano can be determined. We then compare the calculated note with the required note to be played from the sheet music analysis. If the note is correct it will play; otherwise it will move left or right depending on what key it found and what key it has to play.

Figure 7: The picture taken is an iTouch camera view of the piano keys using the Histogram Equalization and Canny Edge Detection technique. The lines shown in the picture are created using the probabilistic Hough line algorithm. As shown, the lines are concentrated around the black key area. There are, however, other lines scattered in areas that are not black keys, which therefore give false data.
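The two filtering constraints can be sketched as a predicate over the (x1, y1, x2, y2) line segments returned by the Hough transform. This assumes image coordinates with y increasing downward, and the slope tolerance max_dx is an assumed value, as the report does not give the exact cutoff:

```python
def keep_line(x1, y1, x2, y2, img_height, max_dx=5):
    """Keep lines that are close to vertical and lie entirely in the
    upper half of the image (small y values are nearer the top)."""
    nearly_vertical = abs(x2 - x1) <= max_dx
    upper_half = max(y1, y2) <= img_height / 2
    return nearly_vertical and upper_half

def filter_lines(lines, img_height):
    """Reduce the candidate list to the black-key cluster."""
    return [line for line in lines if keep_line(*line, img_height)]
```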
This approach was not included in the final demonstration and was replaced with a simpler method. The reason is that this method takes so much time to collect and analyze the information per frame that the robot has to move slowly to gather frames and figure out where it is going. As shown in the analysis, the camera moves at one frame per 12.118 seconds. As the robot relies on movement, histogram equalization and Canny edge detection were not fit for the project. In fact, this is incomparable with the threshold method, which can detect the piano key almost instantly, at less than 0.1 seconds per frame. The large time can be explained by the amount of computation that needed to be performed.
In addition, the piano key recognition was suspect at times and not consistent. The graph shown in the experimental results clearly demonstrates the inconsistency, as detection is correct less than 10 percent of the time and the detections are scattered. Compared to the Color Threshold method, the Histogram Equalization method seems very inefficient, unstable, and unusable. The Color Threshold method is able to maintain piano key detection for long stretches of time without losing the detection. This shows that the Color Threshold method is very reliable and has the potential to detect keys while in motion.
Limitations with Color Threshold
Figure 8: The chart shows the correlation between the robot's movement speed and accuracy using the Color Threshold method. The experiment consisted of the robot playing Canon in D on the piano for 10 trials at each set speed. A run is only considered successful if all notes were correctly hit.
Color Threshold is clearly a strong method for detecting piano keys in a fixed environment. While the method is efficient in a fixed environment, its performance is limited once movement is incorporated. The first experiment tested the maximum speed at which the robot can move with acceptable accuracy. The first speed tested was 90 degrees per second. This speed proved to be too fast, as the robot was skipping frames in between keys and therefore mixing up octaves. Similar results came with speeds of 75 and 60 degrees per second. At 45 degrees per second, the number of frames the camera was reading matched the robot's movement reasonably well. The only error came when the robot pressed the wrong note, which was right beside the correct note. At slower speeds such as 30 degrees per second, there were very few errors, if any. Although the robot is able to play somewhat successfully at 45 degrees per second, in reality this is very slow. Even a very simple music piece would require the robot to move at least 90 degrees per second if it were to reach a G3 quarter note from C3. However, since the robot is unable to move at faster speeds without losing accuracy, the robot is not at the stage where it can be used for practical purposes.
Future Work and Extensions
While the main goal of the project was accomplished, there are many ways to improve the robot in terms of replicating an actual human. In the future, the main goal would be to improve the process by which it recognizes the keys, in order to improve the speed of analysis. This task is the most important because it directly affects the timing of traversal, and this timing determines the tempo. Once the process is sped up, the robot should be able to play more notes by having an improved detection algorithm. This could then be taken a few steps further by using multiple NXTs and splitting the task between the two robots, improving traversal time since each robot covers only certain regions of the piano and thus travels less to play its notes, with either one NXT or the other finding the appropriate key.
Current Trends in Robotics
Robotics and Mobile Devices
In terms of robotics involving mobile devices, there has been a growing trend in the development of applications utilizing object recognition. The development can be found in all fields, ranging from commercial products to research to the military. Regardless of the field, each development contributes to the mobility of robotics.
In terms of commercial products, Visipedia hopes to provide an image-based search for the Wikipedia website using computer vision. Users will be able to take an image on their mobile device and send the image to an online database, which processes the image and returns the desired webpage based on the image sent. The project is based on web services and also implements OpenCV [5].
In the field of independent research, a group of students used everyday objects with their robot, including an iPhone. Other devices such as Wiimotes were also used. The devices were connected through Bluetooth and acted as remote controllers for the robot. The iPhone additionally gave the user the vision of the robot, so that the user could see what the robot sees [6].
The military has developed applications on the iPhone which act as remote devices for unmanned aerial vehicles (UAVs) and also implement computer vision. The iPhone again acts as a remote control and allows the user to see what the UAV sees. In addition, the processed images give information about the scene, such as identifying objects [1].
General Robotic and Computer Vision
Researchers at Cochin University of Science and Technology are working on a method of character recognition for handwritten characters. The challenge is that characters of Indian scripts are difficult to recognize, especially those of south India, which are their target. They go about it by first doing noise reduction through Gaussian filtering, then using thresholds to separate ink from paper. They then use a method called "skeletonization", which thins the binary regions down to thin lines. They then segment the area by words and characters. They normalize all of the images to a standard size using bilinear and bicubic interpolation. Finally, features are extracted and classified using classifiers. [2]
Oleg Kupervasser presents methods for recovering epipolar geometry from images of smooth surfaces. He developed four different methods of acquiring the geometric points. The first is the "illumination characteristic points method". The second is the "outline tangent points method". These two methods are said to be very accurate because the illumination and outlines give small errors. The third and fourth methods "are termed CCPM (curve characteristic points method, green curves are used for this method on Figures) and CTPM (curve tangent points method, red curves are used for this method on Figures), for searching epipolar geometry for images of smooth bodies based on a set of level curves (isophoto curves) with a constant illumination intensity." [4]
There are also people working on new ways of doing fingerprint recognition. One approach was to first apply a DWT decomposition to the fingerprint; then, for each level of decomposition, centre-area features and Canny edge parameters are created. The concatenation of the feature vectors of each level of decomposition is what represents the fingerprint. The matching of the fingerprint is based on threshold values and Euclidean distance. [3]
References
[1] The Navigation and Control Technology Inside the AR.Drone Micro UAV, Milano, Italy, 2011.

[2] John Jomy, K. V. Pramod, and Balakrishnan Kannan. Handwritten character recognition of South Indian scripts: a review. CoRR, abs/1106.0107, 2011.

[3] D. R. Shashi Kumar, K. B. Raja, R. K. Chhotaray, and Sabyasachi Pattanaik. DWT based fingerprint recognition using non minutiae features. CoRR, abs/1106.3517, 2011.

[4] Oleg Kupervasser. Recovering epipolar geometry from images of smooth surfaces. CoRR, abs/1106.0823, 2011.

[5] P. Perona. Vision of a Visipedia. Proceedings of the IEEE, 98(8):1526-1534, Aug. 2010.

[6] Pierre Rouanet, Fabien Danieau, and Pierre Y. Oudeyer. A robotic game to evaluate interfaces used to show and teach visual objects to a robot in real world conditions. In Proceedings of the 6th International Conference on Human-Robot Interaction, HRI '11, pages 313-320, New York, NY, USA, 2011. ACM.
Appendix
RobotMovement.java
import lejos.nxt.Button;
import lejos.nxt.Motor;
import lejos.nxt.LCD;
import lejos.nxt.comm.*;

public class RobotMovement {
    public static void main(String[] args) throws Exception {
        LCD.drawString("Press any button to start...", 0, 0);
        Button.waitForAnyPress();
        LCD.clear();

        String connected = "Connected";
        String waiting = "Waiting...";
        String closing = "Closing...";

        LCD.drawString(waiting, 0, 0);
        LCD.refresh();
        BTConnection btc = Bluetooth.waitForConnection();
        LCD.clear();
        LCD.drawString(connected, 0, 0);
        LCD.refresh();
        btc.setIOMode(NXTConnection.RAW);

        while (true) {
            byte[] buf = new byte[255];
            int n = btc.readPacket(buf, buf.length);
            int code = ((int) buf[0] & 0xff);
            if (n != 0) {
                // exit
                if (code == 0) {
                    LCD.clear();
                    LCD.drawString("Exiting...", 0, 0);
                    LCD.refresh();
                    Motor.A.stop();
                    Motor.B.stop();
                    break;
                }
                // move left
                else if (code == 1) {
                    LCD.clear();
                    LCD.drawString("Moving Left...", 0, 0);
                    LCD.refresh();
                    Motor.A.setSpeed(45);
                    Motor.A.forward();
                    Motor.B.setSpeed(45);
                    Motor.B.backward();
                }
                // move right
                else if (code == 2) {
                    LCD.clear();
                    LCD.drawString("Moving Right...", 0, 0);
                    LCD.refresh();
                    Motor.A.setSpeed(45);
                    Motor.A.backward();
                    Motor.B.setSpeed(45);
                    Motor.B.forward();
                }
                // stop and play note
                else if (code == 3) {
                    LCD.clear();
                    LCD.drawString("Stopping...", 0, 0);
                    LCD.refresh();
                    Motor.A.stop();
                    Motor.B.stop();
                    Motor.C.rotate(-70); // drop the key presser
                    Thread.sleep(100);   // hold the note
                    Motor.C.rotate(70);  // raise it back up
                }
            }
        }
        Thread.sleep(100); // wait for data to drain
        LCD.clear();
        LCD.drawString(closing, 0, 0);
        LCD.refresh();
        btc.close();
        LCD.clear();
    }
}
MyAVController.mm
#include <OpenCV/opencv2/opencv.hpp>
#import "MyAVController.h"
#define DEBUG_MODE 0
@implementation MyAVController
@synthesize captureSession = _captureSession;
@synthesize imageView = _imageView;
@synthesize customLayer = _customLayer;
@synthesize prevLayer = _prevLayer;
//------GLOBALS ---------
int currentOctave=3;
int direction=0;
int total[5]={8,3,130,52,26};
int amountOfMovements[5]={8,3,130,52,26};
bool onC=false;
bool onB=false;
//-----------------------
#pragma mark -
#pragma mark Initialization
- (id)init {
self = [super init];
if (self) {
/*We initialize some variables (they might be not initialized depending on what is commented or not)*/