Human Interface Studies for the Communication between Human and Systems
Hideyuki Sawada
Faculty of Engineering Kagawa University
Abstract - The human interface technologies and their applications for computer intelligence and human-robot communication will be presented. Speech, vision and tactile sensations will be especially addressed, together with recent results from speech signal processing, human motion tracking and humanoid robotics studies. Assistive technologies for supporting handicapped and aged people will also be introduced.
1. Introduction
The importance of the human-machine interface is widely acknowledged in accordance with the development of computers and associated technologies. The faster the processing speed becomes, the wider the gap grows between computing ability and the accessibility of humans to the computer. In human-to-human communication, we are able to communicate with each other by using not only verbal media but also the five senses: vision, audition, olfaction, taste and tactile sensation. Information transmitted through the five senses is especially able to affect our emotions and feelings directly, making for smooth communication. Conventional computers and systems, in contrast, are not suited to sensing the communication media employed in the human world. Although a variety of sensing devices and equipment has been developed for the measurement of physical data, the data processing ability of a computer is quite different from that of humans. If a computer were able to understand human communication media and react to humans as they do, it would open a new world where humans and computers live together in mutual prosperity.
In this paper, the human interface technologies and their applications for computer intelligence and human-robot communication are introduced. Speech, vision and tactile sensations are especially addressed, together with recent results from speech signal processing, human motion tracking and humanoid robotics studies.
1.1 Human Interface Studies
Nowadays computer science has developed dramatically, and has contributed widely to control engineering, theoretical calculation and entertainment, not only in expert engineering fields but also in daily human life.
The theoretical aspect of computer science has been constructed on the basis of C. E. Shannon's information theory within the framework of information processing, which contributed to the development of various software and hardware applications having procedural or grammatical structures. Based on the results of such applications to logical matters, computer science has extended its research targets to illogical matters such as human behavior, the human mind and human artistic activities.
Human interface studies deal with the illogical aspects of human activities and present new systems that help bridge the gap between a machine and the human who uses or interacts with it, as shown in Fig. 1.
1.2 Speech, Vision and Tactile Sensations
The author has been paying attention to the information transmitted through speech, vision, gestures and tactile sensations, and has constructed multimodal systems that handle these media as a human does. In the presentation, a talking robot which speaks and sings like a human, a robotic arm system which tracks a particular sound among multiple sounds, a gesture recognition system, a face tracking and recognition system, and a tactile display which presents various tactile sensations are introduced with demonstrations.
In this paper, the human interface technologies and their applications for computer intelligence and human-robot communication are introduced. Several application systems, which accept and react to human communication media, are also presented together with demonstrations.
Fig. 1 Concept of Human Interface (a human gives intuitive actions through vision, audition and the body to an object, machine or computer via the human interface, and receives convincing reactions)
2. Active Tracking of Particular Person Using Visual and Auditory Information
A human has various sensory perceptions, and effectively uses them in communication. Auditory and visual functions in particular play an important role in recognizing someone to talk to and understanding the conversation. In vocal communication, we are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. In addition, we are able to detect a particular person by recognizing body features and individual gestures. By realizing this mechanism with a computer, new applications can be presented for use in communication with humans. We are working on the identification of a particular person using microphones and optical motion trackers. This paper describes the development of an information fusion system and how to deal with multiple data obtained by different sensors.
2.1 System Configuration
A typical configuration of our video conference room is shown in Figure 2. A sound system (A) and a video system (B) provide the locations of users and the identities of the speakers. The computer (C) fuses the information and provides the speaker's location. Finally, a camera focuses on a particular speaker and sends the video to the computer (D).
2.2 Sensing Systems
A. Optical Motion Tracking System
In the project, two reflective markers are fixed to each user's head and body. The system imports a set of frames from the motion capture system to extract the marker locations, and then groups the markers in pairs in order to compute the barycenter of each head.
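As a rough illustration of this grouping step, the pairing and barycenter computation might look like the following sketch; the nearest-neighbour pairing rule and the marker layout are our assumptions, since the system's actual grouping method is not detailed here:

```python
import numpy as np

def head_barycenters(markers, n_users):
    """Group 3D marker positions into pairs (two markers per head) by
    nearest-neighbour distance and return each pair's barycenter."""
    markers = np.asarray(markers, dtype=float)
    assert len(markers) == 2 * n_users
    unused = list(range(len(markers)))
    centers = []
    while unused:
        i = unused.pop(0)
        # pair marker i with the closest remaining marker
        dists = [np.linalg.norm(markers[i] - markers[j]) for j in unused]
        j = unused.pop(int(np.argmin(dists)))
        centers.append((markers[i] + markers[j]) / 2.0)
    return np.array(centers)

# two users, two head markers each (units: cm, purely illustrative)
frame = [[0, 0, 170], [0, 4, 170], [100, 0, 165], [100, 4, 165]]
print(head_barycenters(frame, n_users=2))
```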
B. Robotic Auditory Sensing System
We have also been working to develop a robotic auditory system, which tracks the 3D location of a sound source by using sound characteristics for the identification of a particular person. The microphone array shown in Figure 3 inputs 5 sound signals simultaneously to identify a speaker and the speaker's location.
The sound source location is estimated by using the phase differences and sound pressure differences among the 5 microphones [2]. Mel-cepstrum coefficients are used for speaker identification since they represent the characteristics of the voice.
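The paper does not detail the estimation itself, but a common way to obtain the time (and hence phase) difference between two microphones is to locate the peak of their cross-correlation; the following minimal sketch, which may differ from the method of [2], illustrates the idea on a synthetic signal:

```python
import numpy as np

def estimate_delay(sig_a, sig_b, fs):
    """Estimate the arrival-time difference (seconds) of the same source
    between two microphones from the peak of their cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)   # samples by which a lags b
    return lag / fs

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
# a short windowed tone as a stand-in for a voice segment
source = np.sin(2 * np.pi * 440 * t) * np.hanning(len(t))
delayed = np.roll(source, 8)                # mic B hears it 8 samples later
print(estimate_delay(delayed, source, fs))  # ~0.0005 s
```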
2.3 Identification of Particular Person
The information fusion system bundles and controls the sensing units distributed in the environment to track a person using visual and auditory information.
The video-based 3D acquisition system gives the precise locations of users; however, it is difficult to identify who is speaking. The sound system is able to provide the direction of the sound source and to identify the speaker's voice signature; however, it is difficult to distinguish one speaker among multiple users in the presence of environmental noise. In this study, the identification of users and the estimation of their positions are performed by combining the two approaches. Figure 4 shows the identification algorithm for a particular person. It is difficult to run all of these functions on one computer, so we are constructing a distributed system consisting of different computers with various functions. In this study, we analyze these different steps, and present our solution based on OSGi frameworks running on each computer.
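One simple way such a fusion step could work, sketched here with hypothetical names and a 2D layout (the actual system is distributed and OSGi-based), is to pick the tracked user whose direction, seen from the microphone array, best matches the azimuth estimated by the sound system:

```python
import math

def fuse(user_positions, sound_azimuth_deg, array_pos=(0.0, 0.0)):
    """Return the tracked user whose bearing from the microphone array
    is closest to the azimuth estimated from the sound source.

    user_positions: dict name -> (x, y), from the motion tracker.
    sound_azimuth_deg: sound-source direction from the auditory system.
    """
    best, best_err = None, float("inf")
    for name, (x, y) in user_positions.items():
        az = math.degrees(math.atan2(y - array_pos[1], x - array_pos[0]))
        # wrap the angular error into [-180, 180] before comparing
        err = abs((az - sound_azimuth_deg + 180) % 360 - 180)
        if err < best_err:
            best, best_err = name, err
    return best

users = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (-1.0, 0.5)}
print(fuse(users, sound_azimuth_deg=85.0))   # closest bearing to 85 deg
```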
2.4 Conclusion of Tracking Techniques
In this chapter, an information fusion system was introduced, together with its application to the identification of a particular person in a group of people.
Fig. 2 Video Conference Room
Fig. 3 Microphone Array
Fig. 4 Flow of Active Tracking (optical motion tracking and auditory sensing: input gesture and sound signal, human body model, estimation of sound position)
3. Image-based Gesture Recognition for Intuitive User Interface
In studies of virtual reality and mixed reality, the development of input devices and user interfaces is important for interaction with virtual objects and environments. However, a mouse and a joystick are commonly used in present systems. If we could manipulate a virtual object intuitively by natural gestures, without any sensors or devices, the system would contribute to a greater sense of reality. In this paper, we suggest a technique in which a user moves his hand in front of a camera to manipulate a virtual object, and the movement is recognized in real time to establish interactive communication.
3.1 System Configuration
The system consists of a web camera, a computer and a display, as shown in Figure 5. The user's gestures are captured by the camera in front of him at a size of 320 x 240 pixels, and the gestural actions are recognized in real time. The user is able to manipulate a virtual object by his gestures and to intuitively interact with the three-dimensional virtual environment.
Fig. 5 System configuration (a user gesturing in front of a web camera connected to a PC, manipulating a virtual object on the display)
3.2 Image processing for gesture recognition
A. Gesture recognition algorithm
The gesture recognition algorithm is constructed to identify characteristic motions given by the user's arm movements. In this study, we try to keep the system as simple as possible, so that a user can operate it interactively in real time.
The image processing algorithm is described by referring to Figure 6. On inputting images from the web camera, the system applies spatial averaging to an image I(w,h,t) at time t. The brightness values of two adjacent images B(x,y,t) and B(x,y,t-1) are compared, and the motion areas are extracted to obtain a binary image F(w,h) based on a threshold value, as shown in Figure 6 (a).
Then, a morphological filter, as presented in Equation 1, is applied to the image F(w,h) to remove small noise. Several white blocks still remain, and the system finds the largest block, which is regarded as the area of the moving hand. Equation 2 shows the formula to compute the size of the white blocks.
F(n,m) = 1,  if Σ_{i=-1}^{1} Σ_{j=-1}^{1} F(n+i, m+j) >= 6
F(n,m) = 0,  if Σ_{i=-1}^{1} Σ_{j=-1}^{1} F(n+i, m+j) <= 3      (1)

L(x,y) = max { Σ_{i}^{x} Σ_{j}^{y} F(i,j) }      (2)
Fig. 6 Flow of image processing: (a) motion area detection, (b) barycenter calculation
After finding the largest block, the system computes its barycenter G(gx, gy) in the image F(w,h), as presented in Figure 6 (b), and calculates the speed and direction of the moving hand by referring to the image sequence.
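A minimal NumPy sketch of this processing chain (frame differencing, the Equation 1 filter, and the largest-block barycenter) might look as follows; the brightness threshold and image sizes are placeholders, not values from the paper:

```python
import numpy as np

def motion_mask(prev, curr, thresh=30):
    """Binary image F: 1 where brightness changed between adjacent frames."""
    return (np.abs(curr.astype(int) - prev.astype(int)) > thresh).astype(np.uint8)

def morph_filter(F):
    """Eq. (1): set a pixel to 1 if >= 6 pixels of its 3x3 neighbourhood
    are white, to 0 if <= 3 are white; otherwise leave it unchanged."""
    out = F.copy()
    h, w = F.shape
    for n in range(1, h - 1):
        for m in range(1, w - 1):
            s = F[n-1:n+2, m-1:m+2].sum()
            if s >= 6:
                out[n, m] = 1
            elif s <= 3:
                out[n, m] = 0
    return out

def largest_block_barycenter(F):
    """Flood-fill the connected white blocks, keep the largest one
    (cf. Eq. (2)), and return its barycenter G(gx, gy)."""
    h, w = F.shape
    seen = np.zeros_like(F, dtype=bool)
    best = []
    for sy in range(h):
        for sx in range(w):
            if F[sy, sx] and not seen[sy, sx]:
                stack, block = [(sy, sx)], []
                seen[sy, sx] = True
                while stack:
                    y, x = stack.pop()
                    block.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and F[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if len(block) > len(best):
                    best = block
    ys, xs = zip(*best)
    return sum(xs) / len(xs), sum(ys) / len(ys)

prev = np.zeros((12, 12), np.uint8)
curr = prev.copy()
curr[4:8, 4:8] = 255                        # a "hand" moved into this area
F = morph_filter(motion_mask(prev, curr))
print(largest_block_barycenter(F))          # barycenter of the moved area
```

Tracking the barycenter over successive frames then gives the hand's speed and direction, as described above.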
B. Experiments
An experiment was conducted to validate the algorithm for extracting human hand motions. A subject made a circular motion as shown in Figure 7 (a), and the motion was measured as presented in Figure 7 (b). Small noise along the trajectory is still observed; however, the characteristics of the circular motion are generally recognized. The execution time for one frame is less than 20 milliseconds, including transferring images through USB, on a notebook PC (Intel Pentium M 1.6 GHz, 760 MB RAM), and we found that the algorithm works in real time.
Fig. 7 Experiment of gesture tracking: (a) gestural motion, (b) measurement results
3.3 System applications
To evaluate the gestural interface for the manipulation of a computational system, a 3D table-tennis game played with hand gestures was constructed, as shown in Figure 8. A ball with a striped pattern bounces around in a virtual 3D space, in which a wall of stacked blocks is situated in the center. A user stands in front of the computer display to see the virtual space, and at the moment the ball comes back to the entrance face, he strikes the ball with a hitting motion of his hand. The ball bounces back when it hits the wall, and if the ball hits a block, the block disappears.
The system recognizes the gestural trajectory, the direction and the speed of the user's hand actions, and these are mapped to the ball's rebound, so that the user plays virtual tennis by interacting with the 3D space in real time. If he gives a stroking gesture to the left, the ball returns to the left. By giving an arched motion, he is able to put a spin on the ball according to the amount of the stroke. If he strokes quickly, the ball bounces back at a higher speed. The user has to control the direction and speed of the ball to hit a targeted block, because if the ball's speed is not high enough, the block does not disappear.
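The mapping from a recognized stroke to the ball's reaction could be sketched as below; all scaling constants and names are hypothetical, chosen only to illustrate how direction, speed and arch might be translated:

```python
def ball_response(stroke_dx, stroke_dy, stroke_speed, curvature):
    """Map a recognized hand stroke to the returned ball's motion:
    horizontal stroke direction steers the ball, stroke speed scales
    its return speed, and an arched (curved) stroke adds spin."""
    direction = -1.0 if stroke_dx < 0 else 1.0   # stroke left -> ball left
    speed = min(stroke_speed * 0.8, 10.0)        # clamp the return speed
    spin = curvature * 0.5                       # arch amount -> spin
    return direction, speed, spin

# a quick leftward stroke with a slight arch
print(ball_response(-3.0, 0.5, stroke_speed=6.0, curvature=1.2))
```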
We conducted an experiment to evaluate the application system. Nine subjects, all of whom had more than 5 years of computer experience and were accustomed to using a conventional mouse and keyboard, played the 3D table-tennis game and evaluated it from the viewpoints of the user interface and the interaction with a virtual object by answering a questionnaire. The questions to the subjects were:
A) Could you easily understand the manipulation?
B) Were you easily accustomed to the manipulation employing gestures?
C) Could you control the ball well in 3D space?
The evaluation results are shown in Table 1. The gestural manipulation was evaluated mostly favorably. In particular, most of the users pointed out the intuitive and easy understanding of the manipulation, and evaluated it positively. On the other hand, several opinions pointed out difficulties in controlling the direction and speed of the ball based on the recognition of gestures. The improvement of the recognition ability and a flexible association of the user's actions with the CG output should be further examined in a future system.
Fig. 8 Gesture-manipulated table-tennis
Table 1. Result of the questionnaire

      Max   Min   Ave
  A    5     3    4.5
  B    4     2    3.4
  C    3     2    2.6
3.4 Conclusions
We developed a vision-based gestural interface for the interactive manipulation of a virtual object, and introduced an application to a virtual tennis game, with which a user is able to play tennis by gestural actions without using any sensors or special devices. The interface was evaluated favorably in a questionnaire, and several problems were also identified. In the next stage, a more flexible interface system employing gestures will be developed by integration with other modalities such as audition and tactile sensation.
4. Micro-vibration Actuators and the Presentation of Tactile Sensation to Human Skin
Humans are able to communicate with each other by using not only verbal media but also the five senses. However, few devices exist for presenting tactile information together with vision and sound. This paper introduces the development of a tactile display using shape-memory alloy (SMA) threads, and describes the presentation of tactile sensations to human skin.
4.1 Design of Tactile Display
We developed a micro-vibration actuator electrically driven by periodic signals generated by current control circuits for tactile information transmission [3]. Figure 9 shows the actuator, which is composed of a 5 mm-long SMA thread with a diameter of 0.05 mm, so that it can easily be attached to body surfaces. Figure 10 shows the tactile display, with 6 actuators set in a 3 x 3 matrix.
4.2 Presentation of Tactile Sensations to Narrow Areas on Different Body Surfaces
An experiment was conducted on the presentation of tactile sensations to different body surfaces of an able-bodied subject, to examine the possibility of presenting information not only to palms but also to other body locations. A palm and the back of a hand have different skin structures of tactile receptors. Hence, the stimulus area was set as shown in Figure 11, where 100 spots were arranged in 4.5 x 18 mm. The vibratory stimuli were generated by setting the frequency to 50 Hz and the duty ratio to 1:20, and a subject
reported the sensations when the stimuli were applied with the vibration actuator.
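As an illustration, the driving pulse train could be generated as follows, assuming the 1:20 duty ratio means the current is on for 1 ms of each 20 ms period at 50 Hz; the sampling rate and duration are placeholders, not values from the current control circuit:

```python
import numpy as np

def drive_signal(freq_hz=50, on_ms=1.0, fs=10000, duration_s=0.1):
    """Generate a periodic on/off pulse train for the SMA micro-actuator:
    `freq_hz` vibration frequency, `on_ms` on-time per period (1 ms of a
    20 ms period gives the 1:20 duty ratio), sampled at `fs` Hz."""
    period = fs // freq_hz                   # samples per vibration period
    on = int(round(on_ms * fs / 1000.0))     # samples the current is on
    n = int(round(duration_s * fs))
    k = np.arange(n)
    return ((k % period) < on).astype(np.uint8)

sig = drive_signal()
print(sig[:25])   # a short burst at the start of each 20 ms period
```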
The results are shown in Figure 12. On the palm, all the stimuli were uniformly perceived as mechanical vibratory sensations. On the back of the hand, on the other hand, a localization of vibratory sensations was found, which indicates a different distribution of tactile receptors. It is also expected that by giving vibratory stimuli at different frequencies, another localization of different sensations would be found, which we will study in a further experiment.
4.3 Presentation of Texture Sensation
Two SMA actuators are able to generate apparent movement (AM) and phantom sensation (PS), and "rubbing" or "stroking" sensations are perceived through multiple AM and PS [3]. By driving 8 actuators, various texture sensations can be presented [4]. An experiment was conducted with 9 subjects to examine the texture sensations generated by the display. Eight real objects (felt, a towel, a sponge, a handkerchief, cardboard, rubber, paper and a smooth wooden board) were prepared, and each subject answered which object had the tactile texture closest to the sensation given by the display.
Figure 13 shows the results. At 50 Hz, most subjects perceived rough textures like felt, a towel, or a sponge. At 100 Hz, on the other hand, smooth objects such as a handkerchief, cardboard, or a smooth wooden board were perceived.
4.4 Conclusions
In this chapter, the different sensations generated by the tactile display with SMA threads were introduced. By driving the actuators randomly, various texture sensations were also perceived. In the next stage, we will investigate more suitable conditions for presenting texture sensations and "real" touch feelings, together with the physiological mechanism of human tactile sensation.
[...] receive speech training by speech therapists (STs), but this training has several problems: patients cannot see the inside of their own mouths; after a long vacation such as the summer holidays they forget the vocal-tract shape; and the number of STs is far too small for the number of patients. Therefore, something that lets patients visually confirm their own vocal-tract shape, or that can substitute for an ST, is needed. For these reasons, we are considering the realization of a robot with which patients can train while confirming their own vocal-tract shape. In this research, the talking robot that we have developed so far [...]
References
[1] Khairunizam Wan and Hideyuki Sawada: "3D Measurement of human upper body for gesture recognition", International Symposium on Optomechatronic Technologies, Vol. 6718, 67180I-1-8, 2007.
[2] Hideyuki Sawada and Toshiya Takechi: "A Robotic Auditory System that Interacts with Musical Sounds and Human Voices", Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 11, No. 10, pp. 1177-1183, 2007.
[3] Y. Mizukami and H. Sawada: "Tactile information transmission by apparent movement phenomenon using shape-memory alloy device", International Journal on Disability and Human Development, Vol. 5, No. 3, pp. 277-284, 2006.
[4] Yosuke Mizukami and Hideyuki Sawada: "A tactile display using micro-vibration actuators and the presentation of tactile sensations by controlling the probability density of driving signals" (in Japanese), Proceedings of IPSJ Interaction 2008, pp. 195-202, 2008.
[5] Hideyuki Sawada: "Talking Robot and the Autonomous Acquisition of Vocalization and Singing Skill", Chapter 22 in Robust Speech Recognition and Understanding, pp. 385-404, June 2007, ISBN: 978-3-902613-08-0.