Monday, April 6, 2009



Gestonic is a video-based interface for the sonification of hand gestures, aimed at real-time timbre control. The central role of hand gestures in social and musical interaction (such as conducting) was the original motivation for this project. Gestonic is intended to make computer-based instruments more interactive, and it lets musicians create sonic and visual compositions in real time. Gestonic explores models for the sonification of musical expression. It does not use the direct gesture-to-sound mapping commonly found in acoustic instruments; instead, it employs an indirect mapping strategy, making use of the color and timbre of the sound. The system consists of a laptop's camera, filtering of the camera input in the open-source environment Processing, OSC control messages sent from Processing to the audio-programming language ChucK, and finally parameter mapping and sound synthesis in ChucK.
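To make the Processing-to-ChucK link concrete, here is a minimal sketch of the OSC wire format those control messages use: an address pattern, a type-tag string, and arguments, each padded to a 4-byte boundary. This is plain Java rather than a Processing sketch, and the address name is illustrative, not Gestonic's actual message name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Minimal OSC message encoder: address pattern + type tags + one float,
// each field zero-padded to a multiple of 4 bytes as the OSC spec requires.
public class OscSketch {
    // Pad with zeros to the next multiple of 4 (OSC strings are
    // null-terminated, so at least one zero byte is always appended).
    static byte[] pad4(byte[] b) {
        byte[] out = new byte[(b.length + 4) & ~3];
        System.arraycopy(b, 0, out, 0, b.length);
        return out;
    }

    // Encode an OSC message carrying a single float argument.
    static byte[] encode(String address, float value) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(pad4(address.getBytes("US-ASCII"))); // address pattern
        out.write(pad4(",f".getBytes("US-ASCII")));    // type tags: one float
        out.write(ByteBuffer.allocate(4).putFloat(value).array()); // big-endian
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] packet = encode("/gestonic/brightness", 0.5f);
        System.out.println(packet.length); // 24 (address) + 4 (tags) + 4 = 32
    }
}
```

In practice the OSC libraries on both sides produce and parse exactly this layout over UDP, so Processing only has to name a parameter and send a float.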

Gestonic consists of two main components:

• Gesture and Image Processing: This component consists of a laptop's video camera and the open-source environment Processing, which filters and calibrates the data received from the camera. In the current prototype of Gestonic, the input frame is divided into four sections, each representing a different instrument. In each section, the relative and absolute brightness and the frame-to-frame change in the red, green, and blue channels are measured. In addition, four blobs, each detecting a different color (white, red, green, and blue), appear on the screen. By moving an object of the same color as a blob, that object can be color-tracked with blob tracking, which yields four more parameters to map to sound. Processing and ChucK communicate via OSC: messages sent from Processing to ChucK manipulate the sound, and messages sent in the opposite direction control the video output, making the instrument more expressive.
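The per-section measurements can be sketched as follows. The packed-RGB int[] layout mirrors Processing's pixels[] array, but the tiny frame size and the method names are illustrative, not the prototype's actual code.

```java
// Sketch: split a frame into four quadrants, compute mean brightness per
// quadrant, and measure per-channel change between consecutive frames.
public class SectionStats {
    static final int W = 4, H = 4; // tiny stand-in frame for demonstration

    // Mean brightness (0..255) of quadrant q (0..3, left-to-right, top-to-bottom).
    static double quadrantBrightness(int[] pixels, int q) {
        int x0 = (q % 2) * (W / 2), y0 = (q / 2) * (H / 2);
        double sum = 0;
        for (int y = y0; y < y0 + H / 2; y++)
            for (int x = x0; x < x0 + W / 2; x++) {
                int c = pixels[y * W + x];
                sum += (((c >> 16) & 0xFF) + ((c >> 8) & 0xFF) + (c & 0xFF)) / 3.0;
            }
        return sum / (W * H / 4.0);
    }

    // Mean absolute per-channel change between two frames: one motion measure.
    static double frameChange(int[] prev, int[] cur) {
        double sum = 0;
        for (int i = 0; i < prev.length; i++)
            for (int shift = 0; shift <= 16; shift += 8)
                sum += Math.abs(((cur[i] >> shift) & 0xFF) - ((prev[i] >> shift) & 0xFF));
        return sum / (prev.length * 3.0);
    }

    public static void main(String[] args) {
        int[] dark = new int[W * H];                    // all black
        int[] lit = new int[W * H];
        lit[0] = 0xFFFFFF;                              // one white pixel, top-left
        System.out.println(quadrantBrightness(lit, 0)); // 255 / 4 = 63.75
        System.out.println(frameChange(dark, lit));     // 765 / 48 = 15.9375
    }
}
```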

• Data Processing and Sound Synthesis: ChucK programs manipulate the data received from Processing and synthesize sound from it.

Work in Progress

Gestonic is a work in progress, and there is much more to be done to produce expressive sounds from expressive gestures. Each section of the video frame is mapped to a different instrument, and so far four instrument modules are implemented. The first is a drone-like sound. The second is a randomly generated, particle-like sound whose timbre and reverb are manipulated with gestures; in the future, the density of these random sounds will be indirectly mapped to the density of motion in the image. The third is a beat-detecting instrument that tracks beats in the motion. The fourth is a set of human voices manipulated with a granular synthesizer, whose grain parameters are mapped to the blob motions received from the video.
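One plausible shape for such an indirect mapping is sketched below: a normalized blob velocity is bent through an exponential curve before driving a grain parameter, so small gestures give fine control while large gestures sweep the range. The curve steepness and the grain-duration range are assumptions for illustration, not Gestonic's actual values.

```java
// Sketch of a non-linear gesture-to-grain mapping (assumed ranges).
public class GrainMap {
    // Exponential response: maps 0..1 onto 0..1 with adjustable curvature k.
    static double shape(double v, double k) {
        return (Math.exp(k * v) - 1.0) / (Math.exp(k) - 1.0);
    }

    // Map a normalized blob velocity to a grain duration in milliseconds.
    static double grainDurationMs(double velocity) {
        double v = Math.max(0.0, Math.min(1.0, velocity)); // clamp to 0..1
        return 10.0 + shape(v, 3.0) * 190.0;               // 10 ms .. 200 ms
    }

    public static void main(String[] args) {
        System.out.println(grainDurationMs(0.0)); // 10.0
        System.out.println(grainDurationMs(1.0)); // 200.0
    }
}
```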

Progress week 2

I started reading about neural networks in order to train the instrument to recognize some basic gestures. I looked into neural networks in Processing and the Neural Network Toolbox in MATLAB. Some neural-network references are added below.

Progress week 3

After playing around with MATLAB's Neural Network Toolbox and learning basic concepts of image recognition such as morphology, I decided to use something more practical: MATLAB is good for analyzing images, but not for real-time performance.

I am finally using Wekinator, a free package that facilitates rapid development with machine learning in live music performance. Its big advantage is that it is very ChucK-friendly: it lets me combine real-time motion extraction from the camera input, the learning methods implemented in Wekinator, and sound synthesis in ChucK.

Progress week 4

This week I started building a simple one-layer neural network in Processing. It currently takes input from the mouse; I haven't mapped it to the video camera yet. So far I can read six different drawings from the screen and train the network on those drawings. The longer the training, the smaller the error in recognizing the right drawing. The next step is to get input from the camera. Then the question is how to proceed: how can I make the training work in real time?
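A one-layer network of this kind can be sketched with the classic perceptron rule: one weight vector per class, rewarding the true class and penalizing the wrong guess. The tiny 4-pixel patterns below stand in for the six screen drawings and are purely hypothetical data.

```java
// Minimal single-layer classifier trained with the perceptron update rule.
public class OneLayerNet {
    double[][] w; // one weight vector per class: [classes][inputs]

    OneLayerNet(int classes, int inputs) { w = new double[classes][inputs]; }

    // Predict the class whose weight vector scores highest on input x.
    int predict(double[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < w.length; c++) {
            double s = 0;
            for (int i = 0; i < x.length; i++) s += w[c][i] * x[i];
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return best;
    }

    // Perceptron update: only change weights when the guess is wrong.
    void train(double[] x, int label) {
        int guess = predict(x);
        if (guess == label) return;
        for (int i = 0; i < x.length; i++) {
            w[label][i] += x[i]; // pull the true class toward x
            w[guess][i] -= x[i]; // push the wrong class away from x
        }
    }

    public static void main(String[] args) {
        double[][] patterns = { {1,1,0,0}, {0,0,1,1}, {1,0,1,0} }; // toy drawings
        OneLayerNet net = new OneLayerNet(3, 4);
        for (int epoch = 0; epoch < 20; epoch++)      // more passes, less error
            for (int p = 0; p < patterns.length; p++)
                net.train(patterns[p], p);
        System.out.println(net.predict(patterns[0])); // 0
    }
}
```

Because the update only fires on mistakes, the error on the training drawings shrinks as training continues, matching the behavior described above.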

Progress week 5

As we approach the middle of the quarter, we have to deliver the first draft of our paper for this project, so I started reading more to get a deeper understanding of gesture-based systems that use neural networks. I ran into at least twenty different systems, each similar to the others in some ways but unique in others:

- Glove-Talk
- Japanese sign-language recognition system
- Japanese manual alphabet recognition system
- Musical conducting gesture recognition system
- handshape recognition system
- GIVEN: a handshape (posture) and dynamic gesture recognition system
- Coverbal gesture recognition system
- Sign motion understanding system

I am going to explain some details of these systems and some of their similarities that are useful for my implementation. The main structural components of gestures used in most of these systems are:

- motion path length
- gesture duration
- maximum hand velocity
- flex of the thumb, index, middle, and ring fingers
- hand orientations
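The first three features in that list can be computed directly from a sequence of tracked hand positions sampled at a fixed rate. The sketch below assumes positions arrive as {x, y} pairs every dt seconds; the method names are illustrative, not taken from any of the surveyed systems.

```java
// Sketch: motion path length, gesture duration, and maximum hand velocity
// from a fixed-rate sequence of tracked {x, y} positions.
public class GestureFeatures {
    // Total distance traveled along the tracked path.
    static double pathLength(double[][] pts) {
        double len = 0;
        for (int i = 1; i < pts.length; i++)
            len += Math.hypot(pts[i][0] - pts[i-1][0], pts[i][1] - pts[i-1][1]);
        return len;
    }

    // Elapsed time covered by the sample sequence.
    static double duration(double[][] pts, double dt) {
        return (pts.length - 1) * dt;
    }

    // Largest per-step speed, i.e. the peak hand velocity.
    static double maxVelocity(double[][] pts, double dt) {
        double vmax = 0;
        for (int i = 1; i < pts.length; i++) {
            double v = Math.hypot(pts[i][0] - pts[i-1][0],
                                  pts[i][1] - pts[i-1][1]) / dt;
            if (v > vmax) vmax = v;
        }
        return vmax;
    }

    public static void main(String[] args) {
        double[][] path = { {0,0}, {3,4}, {3,4} };   // one move, then hold
        System.out.println(pathLength(path));        // 5.0
        System.out.println(duration(path, 0.1));     // 0.2
        System.out.println(maxVelocity(path, 0.1));  // ~50
    }
}
```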

Progress week 6

This week we are submitting the first draft of our paper. I will upload my paper here soon.
In addition, I worked on some image processing. I have approached the problem in two different ways:
- analyzing by brightness
- analyzing by pixelation

I am still working on feeding these values into the neural net.
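The pixelation approach can be sketched as reducing each frame to a coarse grid of cell averages, giving a small fixed-length vector suitable as neural-net input. The grid size and method names below are assumptions, not the actual implementation.

```java
// Sketch: downsample a grayscale frame to a coarse g x g grid of cell means,
// producing a compact feature vector for the neural net.
public class Pixelate {
    // frame is a row-major w x h grayscale image with values 0..255.
    static double[] grid(int[] frame, int w, int h, int g) {
        double[] out = new double[g * g];
        int cw = w / g, ch = h / g; // cell size (assumes w, h divisible by g)
        for (int gy = 0; gy < g; gy++)
            for (int gx = 0; gx < g; gx++) {
                double sum = 0;
                for (int y = gy * ch; y < (gy + 1) * ch; y++)
                    for (int x = gx * cw; x < (gx + 1) * cw; x++)
                        sum += frame[y * w + x];
                out[gy * g + gx] = sum / (cw * ch); // mean brightness of cell
            }
        return out;
    }

    public static void main(String[] args) {
        int[] frame = new int[16];                   // 4x4 frame, all dark
        for (int i = 8; i < 16; i++) frame[i] = 200; // bright lower half
        double[] v = grid(frame, 4, 4, 2);           // 2x2 feature vector
        System.out.println(v[0] + " " + v[3]);       // 0.0 200.0
    }
}
```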

Progress week 7

This week I worked on making new sounds to map to gestures. It is hard to make sounds that are interesting enough, and to map them in a non-linear way that makes the result more EXPRESSIVE!

A good inspiration was meeting the Troika Ranch dance company. They demonstrated their software, Isadora, which is exactly what I want to achieve with my own software; unlike their approach, though, I only use open-source software.


The final paper, in which I summarized all the findings of this project, has been submitted and published at the IHCI 2009 conference in San Diego. The paper is available upon request.

Links and References

1. Machover, T.: Instruments, Interactivity, and Inevitability. Proceedings of the NIME International Conference (2002)

2. Kurze, M.: TDraw: a Computer-based Tactile Drawing Tool for Blind People. Proceedings of the 2nd Annual ACM Conference on Assistive Technologies. ACM Press, Canada (1996) 131-138

3. Fels, S.S., Hinton, G.E.: Glove-Talk: A Neural Network Interface between a Data-glove and a Speech Synthesizer. IEEE Trans. On Neural Networks, Vol. 4, No. 1 (1993)

4. “Processing” website

5. Wright, M., Freed, A.: Open SoundControl: A New Protocol for Communicating with Sound Synthesizers. ICMC. Thessaloniki (1997)

6. Wang, G., Cook, P.R.: ChucK: A Concurrent, On-the-fly Audio Programming Language. Proceedings of the ICMC (2003)

7. Carterette, E.C., Kendall, R.A.: Comparative Music Perception and Cognition. Academic Press (1999)

Neural Network References

1. Hunt, A., Hermann, T.: The Importance of Interaction in Sonification. ICAD (2004)

2. Kolman, E., Margaliot, M.: A New Approach to Knowledge-Based Design of Recurrent Neural Networks. (2006)

3. Franklin, K., Roberts, J.: A Path Based Model for Sonification.

4. Boehm, K., Broll, W., Sokolewicz, M.: Dynamic Gesture Recognition Using Neural Networks; A Fundament for Advanced Interaction Construction. SPIE Conference Electronic Imaging Science & Technology, San Jose, California (1994)
