An Isolated Word Recognizer for the Swahili Dialect Using Sphinx4-Neural Network Hybrid
Abstract
Speech is one of the most effective means of communication. Efforts to develop an automatic speech recognizer for the Swahili dialect have faced numerous challenges due to the lack of a standardized acoustic model for the dialect. This study aimed to develop an isolated-word speech recognizer for automatic transcription of the Swahili dialect using a Sphinx4-neural network hybrid. The objectives of the study were to design and develop a neural network on the front-end module of Sphinx4 and to compare the performance of Sphinx4 with that of the Sphinx4-neural network hybrid. The study was guided by statistical and probabilistic theory and the theory of linguistics. It followed a positivist philosophical paradigm and adopted an experimental research design. To achieve these objectives, the researcher developed a small corpus of the dialect containing twenty words. A total of two hundred speech recordings were made from ten volunteers. The words and the volunteers were selected using purposive and convenience sampling methods, respectively. Results obtained from both Sphinx4 and the Sphinx4-neural network hybrid were evaluated using descriptive analysis techniques. The study established that a neural network could be integrated with Sphinx4. It also established that the Sphinx4-neural network hybrid, with its neural network trained to an error of less than 0.0175, performed statistically on par with Sphinx4. The study concluded that, for isolated word recognition in Swahili, the Sphinx4-neural network hybrid can be used in place of the Sphinx4 HMM recognizer. Recommended areas of further study include using the tool for continuous speech recognition, evaluating it on a larger speech sample, comparing it with other tools like NICO, and establishing ways of improving its accuracy.
Copyright (c) 2023 Africa Journal of Technical and Vocational Education and Training

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright Notice
Copyright of published articles is held by AfriTVET. No limitation will be placed on the personal freedom of authors to copy, or to use in subsequent work, material contained in their papers. Please contact the Publisher for clarification if you are unsure about the use of copyright material. Apart from fair dealing for the purposes of research and private study, or criticism or review, this publication may only be reproduced, stored, or transmitted, in any form or by any means, with the prior written permission of the Publisher.