The other reason is the desire to introduce the concept of space to machines and programs. As Underkoffler puts it, programs and computers are "hideously insensate when it comes to space".
The g-speak SOE is made up of three parts. The first is gestural input/output: high-definition output driven by high-fidelity input. Input comes from hand gestures, movement and pointing; finger and hand motions are tracked to 0.1 mm at 100 Hz. The system also supports two-handed and multi-user input. This effectively does away with the mouse and keyboard, although the software still accepts both devices alongside gestural input.
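To make that concrete, here is a minimal sketch of what one frame of such an input stream might look like. The names (HandFrame, pointing_ray) and fields are illustrative assumptions, not g-speak's actual API:

```python
# Hypothetical sketch of a gestural-input stream; illustrative only,
# not g-speak's real interface.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class HandFrame:
    """One tracker sample: positions in millimetres, room coordinates."""
    user_id: int                           # multiple simultaneous users
    hand: str                              # "left" or "right" -- two-handed input
    palm_mm: Tuple[float, float, float]
    index_tip_mm: Tuple[float, float, float]
    timestamp_s: float                     # samples arrive at ~100 Hz

def pointing_ray(frame: HandFrame) -> Tuple[Tuple[float, float, float],
                                            Tuple[float, float, float]]:
    """Turn a hand pose into a pointing ray (origin, unit direction),
    so the system can work out what the user is indicating."""
    ox, oy, oz = frame.palm_mm
    tx, ty, tz = frame.index_tip_mm
    dx, dy, dz = tx - ox, ty - oy, tz - oz
    norm = (dx * dx + dy * dy + dz * dz) ** 0.5 or 1.0
    return (frame.palm_mm, (dx / norm, dy / norm, dz / norm))
```

With millimetre-level positions arriving a hundred times a second, a stream of frames like these is enough to drive pointing and gesture recognition without any mouse.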
The second part is 'recombinant networking': the g-speak platform allows multiple computers to collaborate, with data displayed and shared across many devices. Recombinant networking also covers the integration of legacy applications into g-speak; an existing application can usually be adapted with very little new code.
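As a rough illustration of the idea, the sketch below models a shared pool that any number of devices can join and deposit data into. The Pool class and its methods are hypothetical stand-ins, not g-speak's real networking layer:

```python
# Hypothetical pub/sub sketch of multi-device data sharing;
# g-speak's actual networking layer differs in detail.
import queue
from typing import Any, Dict, List

class Pool:
    """A shared message pool: any participant can deposit data, and
    every participant sees everything deposited after it joined."""
    def __init__(self) -> None:
        self._subscribers: List[queue.Queue] = []

    def join(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers.append(q)
        return q

    def deposit(self, message: Dict[str, Any]) -> None:
        for q in self._subscribers:
            q.put(message)

# A legacy app could be adapted by wrapping its output in deposits:
pool = Pool()
viewer = pool.join()                       # e.g., a wall display joins the pool
pool.deposit({"kind": "image", "path": "scan-042.png"})  # legacy app output
print(viewer.get_nowait())                 # the display receives the shared data
```

The appeal of this shape is that the legacy application never needs to know who is listening; that is what keeps the adaptation code small.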
The third part is 'real-world pixels': the platform can recognise real-world objects and accept input from them, and it can work across multiple screens.
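One way to picture 'real-world pixels' is that every screen is registered in room coordinates, so each pixel has a physical position that hands and objects can point at. The sketch below assumes exactly that; Screen and pixel_to_room are invented names for illustration:

```python
# Hypothetical geometry sketch of "real-world pixels": each screen is
# registered in room coordinates, so any pixel maps to a physical point.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Screen:
    origin_mm: Tuple[float, float, float]  # room position of pixel (0, 0)
    right_mm: Tuple[float, float, float]   # room direction of +x pixels (unit)
    up_mm: Tuple[float, float, float]      # room direction of +y pixels (unit)
    pitch_mm: float                        # physical size of one pixel

def pixel_to_room(screen: Screen, px: int, py: int) -> Tuple[float, float, float]:
    """Return the room-space position (mm) of a pixel, so pointing rays
    from hands -- or from tracked physical objects -- can hit it."""
    ox, oy, oz = screen.origin_mm
    rx, ry, rz = screen.right_mm
    ux, uy, uz = screen.up_mm
    return (ox + (rx * px + ux * py) * screen.pitch_mm,
            oy + (ry * px + uy * py) * screen.pitch_mm,
            oz + (rz * px + uz * py) * screen.pitch_mm)
```

Once every screen's pixels live in the same coordinate system as the tracked hands, multiple displays and physical objects all become targets of the same spatial input.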
In the video below, John Underkoffler demonstrates the g-speak platform and explains the origins of g-speak. Another mind-blowing video:
And here is an overview of the g-speak:
g-speak overview from john underkoffler on Vimeo.
Story sources: http://oblong.com/
http://www.ted.com/talks/lang/eng/john_underkoffler_drive_3d_data_with_a_gesture.html