Doctoral Degrees (Computer Science and Informatics)
Browsing Doctoral Degrees (Computer Science and Informatics) by Subject "Automatic speech recognition"
Item (Open Access): Enhancing the user experience for a word processor application through vision and voice (University of the Free State, 2011)
Beelders, Tanya René; Blignaut, P. J.
English: Multimodal interfaces may herald a significant improvement over the GUIs that have been commonplace until now. A multimodal interface could also provide a more intuitive and natural means of interaction while removing the reliance on traditional, manual means of interaction. Eye gaze and speech are common components of natural human-human communication, and this study proposed them for use in a multimodal interface for a popular word processor. For a combination of eye gaze and speech to be a viable interface for a word processor, it must provide a means of text entry and facilitate editing and formatting of the document contents. In this study, a simple speech grammar was used to activate common word processing tasks, to select text and to navigate through a document. For text entry, an onscreen keyboard was provided, the keys of which could be pressed by looking at the desired key and then uttering an acceptable verbal command. These functionalities were provided in an adapted Microsoft Word 2007® to increase the customisability, and possibly the usability, of the word processor interface and to provide alternative means of interaction. The proposed interaction techniques also had to be able to execute typical mouse actions, such as point-and-click. The usability of eye gaze and speech was determined using longitudinal user testing and a set of tasks specific to the functionality. Results indicated that the use of a gravitational well increased the usability of the speech and eye gaze combination when used for pointing and clicking. The use of a magnification tool did not increase the usability of the interaction technique.
The gravitational well did, however, result in more incorrect clicks, owing to natural human behaviour and the ease of target acquisition it afforded. Participants nevertheless learnt to use the interaction technique over time, although the mouse remained the superior pointing device. Speech commands were found to be as usable as, or even more usable than, the keyboard and mouse for editing and selection purposes, although navigation was hindered to some extent. For text entry, the keyboard far surpasses eye gaze and speech as an input method, as it is both faster and results in fewer errors. Although the participants were required to complete a number of sessions and a number of text entry tasks per session, more practice may be required for using eye gaze and speech for text entry. Subjectively, participants felt comfortable with the multimodal interface and indicated that they felt they improved as they progressed through their sessions. Observations of the participants also indicated that, as time passed, they became more adept at using the multimodal interface for all necessary interactions.
In conclusion, eye gaze and speech can be used instead of a pointing device, and speech commands are recommended for use within a word processor to accomplish common tasks. For text entry, more practice is advocated before a recommendation can be made. Together with progress in hardware development and availability, this multimodal interface may allow the word processor to further exploit emerging technologies and be a forerunner in the use of multimodal interfaces in other applications.