Testing of the voice communication in smart home care
© Vanus et al. 2015
Received: 29 September 2014
Accepted: 1 June 2015
Published: 11 June 2015
This article describes a method for testing the implementation of voice control over the operating and technical functions of Smart Home Care. Custom control over operating and technical functions was implemented in a Smart Home model equipped with KNX technology. A sociological survey focused on the needs of seniors was carried out to justify the implementation of voice control in Smart Home Care. In the real environment of Smart Home Care, unwanted signals and additive noise usually degrade voice communication with the control system and significantly lower the success rate of recognizing voice commands that control the operating and technical functions of an intelligent building. This article therefore also describes the addition of a system for filtering additive background noise out of the voice communication with the control system. Within the scope of the proposed application, a system based on neuro-fuzzy networks, specifically ANFIS (Adaptive Neuro-Fuzzy Inference System), was created for adaptive suppression of unwanted background noise. The functionality of the designed system was evaluated by both subjective and objective criteria (SSNR, DTW). Experimental results suggest that the studied system has the potential to refine the voice control of technical and operating functions of Smart Home Care even in a very noisy environment.
Technologies that help seniors live a quality life and be more self-sufficient in everyday life are being developed. One field of this development and innovation is the Smart Home, which aims to ensure independent living for senior citizens or disabled persons in a household environment with the option of assistive care (Smart Home Care).
A sociological survey was carried out to find out the needs of seniors in the field of modern technology. The survey was aimed at persons over 45 years old and thus also at people who are not yet of senior age. The total number of respondents was 98. The goal of the survey was to find out the attitude towards modern technology not only of seniors, but also of middle-aged people preparing for retirement. The survey shows that seniors use technology more and more, and that prejudices against and fears of new machines and devices are fading. The survey was handed out in homes for the elderly, in nursing homes and to people living in private homes. The conditions for using modern technology in their personal lives that respondents mentioned most often were: “the need to understand modern technology, easy usability of the technology, usefulness of the technology for everyday life”.
69 % of the respondents are satisfied with their current situation in terms of living and would not want to change it,
5 % of respondents would want to move from a home for the elderly or a nursing home to a private home,
4 people (4 %) would move to a home for the elderly in the future and
6 people (6 %) would move to a nursing home,
16 people (16 %) would live in an intelligent house.
CCTV in common areas (64 %)
A SOS button (56 %)
Sensor equipment (smoke detectors, gas and water leakage detectors), (43 %)
Medical equipment (check of physiological functions) and cameras in private areas scored the fewest points (13 % and 2 %, respectively)
46 % of respondents still prefer manual control over the house/apartment
24 % of respondents would want to control Smart Home by voice
12 % of them would control it via a computer
8 % by touchscreen and
10 % of respondents would use it through portable devices
In real-life implementations of the designed systems for voice communication [3-6] with the control system, it is necessary to resolve the issue of additive noise in the particular environment [7,8]. Voice control in smart apartments is preferred mainly by older or handicapped citizens. This paper focuses on the design and implementation of comfortable voice control of operational and technical functions in an intelligent building with the KNX technology system. The work can be notionally divided into three parts: development of the application for voice recognition, programming and start-up of the school model, and implementation of a communication interface between these parts. The recognition of voice commands is implemented on the .NET Framework 4.0 platform using the C# programming language. Recognition results are sent as coded messages over UDP through a KNX/IP router from the host to the school model. After the model of the intelligent building was completed and voice control was successfully implemented, functional and quality testing of voice recognition was carried out.
The task of the designed software G.H.O.S.T for voice control of operational and technical functions of a Smart House is to provide users with easy and simple access to control of their house. This application can serve as a complement to other technologies supporting Tele Care in Smart Home Care, for example, for determining the position of the senior citizen in Smart Home Care.
Description of development environment
The application was created in the SharpDevelop environment, an open-source integrated development environment.
Application appearance and basic features
In the upper part of the screen there is a button in the form of a microphone icon, which turns voice recognition on and off. The basic menu consists of six buttons (Visualization, Statistics, Voice commands, Settings, Help and Close) that switch to other screens. Each of the submenus lets the user control functions of Smart Home Care or the G.H.O.S.T program itself.
Indication of blinds movement relates to bedroom, kitchen and living room, where these components are used. After entering a particular voice command, the application is able to evaluate the command and visualize the changes performed on this screen.
Help screen - contains brief description of the individual tabs and their functions. Individual commands for voice control that have been used are also listed here.
There are two ways to quit the application: either by a voice command, or by clicking the Close button.
Application structure and description of source codes
The graphical part consists of roughly a hundred graphical components. The text part, which implements the main functionality of the whole application (Fig. 6), comprises about seven hundred lines of commented code.
If a control phrase is recognized in the buffer, the whole buffer is cleared and an operational action is performed. The operational action is selected using a switch statement. After the action is chosen and performed, the program responds to the user through the speech synthesizer. The operational action is implemented by the method “lights blinds change of the state ()”, which forms the basis for the visualization screen and is again built around a switch statement. Using this switch statement, the method chooses a block of code that performs the respective changes both in the visualization part and in the smart building model by sending the respective commands to the KNX fieldbus through the UDP server.
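The buffer-and-dispatch behaviour described above can be illustrated with a minimal sketch. It is written in Python rather than the C# of the original application, and every name in it (`make_dispatcher`, `feed`, the phrase "lights on" as a dictionary key) is hypothetical; a dictionary lookup stands in for the switch statement, and the returned string stands in for the synthesizer response.

```python
from typing import Callable, Dict, List, Optional, Tuple


def make_dispatcher(actions: Dict[str, Callable[[], str]]) -> Callable[[str], Optional[str]]:
    """Build a word-by-word recognizer front end (illustrative only).

    `actions` maps a control phrase to a callable that performs the
    operational action and returns the text the synthesizer would speak.
    """
    # Pre-split phrases into word tuples so matching is done on whole words.
    phrases: Dict[Tuple[str, ...], Callable[[], str]] = {
        tuple(phrase.lower().split()): action for phrase, action in actions.items()
    }
    buffer: List[str] = []

    def feed(word: str) -> Optional[str]:
        """Add one recognized word; run an action if a phrase is complete."""
        buffer.append(word.lower())
        for words, action in phrases.items():
            if tuple(buffer[-len(words):]) == words:
                buffer.clear()      # the whole buffer is cleared on a match
                return action()     # perform the action, return the reply
        return None

    return feed
```

For example, `feed = make_dispatcher({"lights on": lambda: "lights switched on"})` returns nothing for `feed("lights")` and the reply text for the following `feed("on")`.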
Interconnection of the application and KNX communication fieldbus
Connection of the entire communication network
The connection is very simple if only the wiring is considered. Adapting the software to the individual protocols (KNX and UDP) is much more difficult. A slight disadvantage is that the KNX/IP router module uses the simple UDP communication protocol, which does not guarantee reliable transfer. For the developed application, however, this solution is more than sufficient. The difference between the TCP and UDP protocols is as follows. TCP is a connection-oriented protocol, which means that establishing “end-to-end” communication requires so-called “handshaking” between the client and the server. After a connection is established, data may be transferred in both directions. UDP, by contrast, is connectionless: datagrams are sent without a handshake and without any guarantee of delivery.
IP address: 220.127.116.11
Message content: 06:10:05:30:00:11:29:00:bc:d0:11:1a:09:01:01:00:81
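The captured message above can be inspected with a few lines of code. The sketch below is illustrative only: it assumes the capture is a KNXnet/IP frame, whose six-byte header carries a header length, a protocol version, a two-byte service type and a two-byte total length. The function names are hypothetical, and the remaining bytes are left uninterpreted.

```python
from typing import Dict

# Message content exactly as listed in the text.
CAPTURED = "06:10:05:30:00:11:29:00:bc:d0:11:1a:09:01:01:00:81"


def parse_hex_message(text: str) -> bytes:
    """Convert a colon-separated hex dump such as 'aa:bb:cc' into raw bytes."""
    return bytes(int(part, 16) for part in text.strip().split(":"))


def describe_header(frame: bytes) -> Dict[str, int]:
    """Read the first six bytes as a KNXnet/IP header (assumed layout)."""
    if len(frame) < 6:
        raise ValueError("frame too short for a six-byte header")
    return {
        "header_length": frame[0],
        "protocol_version": frame[1],
        "service_type": int.from_bytes(frame[2:4], "big"),
        "total_length": int.from_bytes(frame[4:6], "big"),
    }


if __name__ == "__main__":
    frame = parse_hex_message(CAPTURED)
    for name, value in describe_header(frame).items():
        print(f"{name}: 0x{value:04x}")
```

Under this assumption the total-length field equals the number of bytes in the capture, which is a quick consistency check on the dump.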
Implementation of interconnection directly in the G.H.O.S.T application
The KNX technology uses its own communication protocol for communication between the individual peripheral devices. An algorithm to generate the codes sent over the fieldbus would be very complicated, and the whole solution would be quite cumbersome. Input data in the form of component addresses and states of the individual modules are not fully static parameters and may be changed using the ETS application. That is why a more straightforward way was chosen, based on mapping the IP addresses and UDP messages sent over the fieldbus on every change in the sensor part of the system. The Wireshark application, version 1.10.5, was very helpful for this. Only a UDP client was programmed in the developed application, which is able to send these basic messages in hexadecimal format. The application can therefore emulate pressing any key through software (it is able to emulate any change both in the sensor part and in the operational part of the system). This made it possible to avoid the complicated generation of messages from variable parameters and to control the whole system in the most straightforward and, in principle, the most natural way for the fieldbus. A total of 31 similar operational actions were mapped into the application.
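The replay approach can be sketched as a table of captured messages plus a small UDP client. This is a minimal Python illustration, not the C# client of the application: the table key `example_action` is a placeholder (the text does not say which action the listed message triggers), and the host and port are parameters the caller must supply from their own capture.

```python
import socket
from typing import Dict

# Captured messages keyed by action name. Only the single message listed in
# the text is included; the key is a hypothetical placeholder.
ACTIONS: Dict[str, str] = {
    "example_action": "06:10:05:30:00:11:29:00:bc:d0:11:1a:09:01:01:00:81",
}


def send_udp(message_hex: str, host: str, port: int) -> int:
    """Send one captured message as a single UDP datagram.

    Returns the number of bytes handed to the network stack. UDP gives no
    delivery confirmation, so a successful return does not prove receipt.
    """
    payload = bytes.fromhex(message_hex.replace(":", ""))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

A call such as `send_udp(ACTIONS["example_action"], host, port)` then replays the stored bytes unchanged, which is why no knowledge of the fieldbus message format is needed on the sending side.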
Experimental part - testing and command recognition success rate statistics
Testing is part of the created application. Fig. 14 illustrates an example of the testing screen, where the number of repetitions and the tested command are selected. After the test is completed, a notification window is displayed with information about the particular test. Although the number of correctly recognized commands is one hundred out of one hundred, the average recognition success rate is only 94.61 %. This is because the number of recognized commands is a real, counted value, while the average recognition success rate is the recognition reliability estimated by the program. Ambient noise is a major factor influencing the average recognition success rate. Ten persons of different sex and age, both men and women in the age range of 22–50 years, participated in testing this system. A microphone integrated in a PC and a wireless microphone (Logitech Wireless Headset H600) were used for testing. Every speaker tested a randomly chosen voice command repeatedly; in the first case (lights on), each of the ten speakers tested the command 100 times. In order to compare the results of the individual speakers, testing was performed on the same command for each of them. The result is the average estimated success rate of the tested command and its real recognition success rate.
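The difference between the two reported figures can be made concrete with a short sketch. It assumes, as an illustration only, that each test repetition yields a recognized/not-recognized flag and a confidence value between 0 and 1, and that the program averages the confidence over the recognized repetitions; the function name and the data layout are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def summarize(results: List[Tuple[bool, float]]) -> Tuple[float, float]:
    """Return (real success rate, average estimated reliability) in percent.

    `results` holds one (recognized, confidence) pair per repetition.
    The real rate counts recognized commands; the estimated figure
    averages the recognizer confidence of those recognized commands.
    """
    if not results:
        raise ValueError("no test repetitions supplied")
    confidences = [conf for ok, conf in results if ok]
    real = 100.0 * len(confidences) / len(results)
    estimated = 100.0 * mean(confidences) if confidences else 0.0
    return real, estimated
```

With every repetition recognized but confidences below 1, the first value is 100 % while the second is lower, which is the pattern reported for the test above.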
Microphone integrated in PC (without distance, without ambient noise) – test 1
Microphone integrated in PC (distance 3 m, without ambient noise) – test 2
Younger speakers achieved slightly worse results, which may be due to incorrect English pronunciation. The program-estimated recognition reliability is also lower because of ambient noise picked up by the microphone, so the microphone is not able to catch the entire command.
Microphone integrated in PC (with ambient noise) – test 3
Wireless microphone (without ambient noise) – test 4
The only restriction is the range of the selected wireless microphone. All five speakers repeated the testing using the “shower” voice command. Using a portable microphone gave almost a hundred percent success rate. This could be the result of several factors, the most important being the use of a high-quality microphone that the user wears very near to the mouth. The lower percentage for the program-estimated reliability may also be due to the incorrect pronunciation of certain speakers.
Wireless microphone (with ambient noise) – test 5
Even in this case, the success rate of voice command recognition is very high despite the fact that testing was done with the presence of ambient noise. This is mainly due to the use of a high-quality microphone and due to the fact that the speakers were very close to the microphone.
Suppression of additive background noise for voice control over operating and technical functions in the smart home care
The implemented application for voice communication with the control system was supplemented with a system for filtering additive background noise from the real environment of Smart Home Care [7,11]. Background noise significantly degrades command recognition in voice control over operating and technical functions inside an intelligent building. Within the scope of the designed application, a system based on neuro-fuzzy networks, specifically ANFIS (Adaptive Neuro-Fuzzy Inference System), was developed for adaptive suppression of unwanted background noise. The designed system was tested on real speech signals, and it is an important part of the developed application because unwanted background noise is to be expected in the real environment of Smart Home Care (e.g. a TV, a radio, so-called urban noise from outside, children, and household devices such as a vacuum cleaner, washing machine, fan or refrigerator). The Least Mean Squares (LMS) adaptive algorithm is currently the one most commonly used for suppressing background noise. The LMS algorithm is simple and computationally modest. In real applications, however, it reaches a lower convergence speed and a higher error rate during the filtration process. For these reasons, the ANFIS system was used instead. A detailed description of the technology is given in [12,13].
Description of the reference room used for the experiments
Description of experiments for voice communication inside the reference room to suppress the additive background noise
The experiments drew on current knowledge in voice recognition, voice recognition with additive background noise (to determine the signal-to-noise ratio), language recognition, the use of the ANFIS system to process speech signals, and the filtering of noise from speech signals using the ANFIS system. The DTW criterion was used to evaluate the quality of the speech signal at the output of the filter with the LMS algorithm. To reach the designated goal, a simulation model of the ANFIS system with an application for filtering additive noise out of the speech signal was used. A numerical simulation of the ANFIS system model, verifying the influence of individual parameter settings on its behavior, was also conducted in MATLAB. A method for setting the optimal parameters of the adaptive filter with the LMS algorithm was then designed.
The designed system has two inputs (a primary and a reference microphone). The first input is the reference microphone, which picks up the unwanted noise; this signal is denoted n1(n) (Fig. 21). The second input is the primary microphone (measured signal), which picks up the useful signal (voice commands) plus unwanted background noise; this signal is denoted m(n). A detailed description of the designed system is given in [12,13].
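The two-input arrangement can be illustrated with the simpler LMS canceller mentioned in the text; this is not the authors' ANFIS system, only a sketch of the same structure. The primary signal m(n) contains speech plus noise, the reference n1(n) contains noise only, and the filter learns to predict the noise in m(n) from n1(n) so that the error output approximates the speech. The synthetic signals, filter length and step size below are assumptions chosen for the demonstration.

```python
import math
import random
from typing import List, Sequence, Tuple


def lms_cancel(primary: Sequence[float], reference: Sequence[float],
               taps: int = 4, mu: float = 0.05) -> List[float]:
    """Adaptive noise cancellation with the LMS update w <- w + mu*e*x."""
    w = [0.0] * taps
    history = [0.0] * taps              # recent reference samples, newest first
    cleaned = []
    for m, n1 in zip(primary, reference):
        history = [n1] + history[:-1]
        noise_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = m - noise_estimate          # error signal = speech estimate
        w = [wi + mu * e * xi for wi, xi in zip(w, history)]
        cleaned.append(e)
    return cleaned


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo(n: int = 4000, seed: int = 1) -> Tuple[List[float], List[float], List[float]]:
    """Synthetic test: a sine as 'speech', filtered white noise as interference."""
    rng = random.Random(seed)
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # The noise reaching the primary microphone is a filtered copy of n1(n).
    noise = [0.8 * reference[k] + (0.3 * reference[k - 1] if k else 0.0)
             for k in range(n)]
    primary = [s + v for s, v in zip(speech, noise)]
    return speech, primary, lms_cancel(primary, reference)


if __name__ == "__main__":
    speech, primary, cleaned = demo()
    half = len(speech) // 2
    print("MSE before:", mse(primary[half:], speech[half:]))
    print("MSE after: ", mse(cleaned[half:], speech[half:]))
```

The error is compared on the second half of the recording only, so that the initial convergence phase of the filter does not dominate the result.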
Information about ANFIS structures used
Table columns: Building the ANFIS model; number of nodes (NN); number of linear parameters (NLP); number of nonlinear parameters (NNP); total number of parameters (TNP); number of fuzzy rules (NFR)
Figure 23 (a-c) shows 3D spectrograms of the examined signals. A 3D spectrogram is a spectrogram displayed in three dimensions: compared to the classic spectrogram, the intensity of the respective frequencies is displayed on the Z-axis. A section through the 3D spectrogram at a certain time, parallel to the frequency axis and the Z-axis, yields the spectrum of the signal at that time.
Figure 24 (a-f) shows spectrograms of the analysed speech signals. The underlying graph is three-dimensional with two independent axes, frequency and time (the order of the sectional spectra). A 2D spectrogram is used here, which is a top view of the original 3D graph.
Description of the methods for evaluating the quality of filtering the additive noise out of the speech signal
K is the number of segments in speech activity,
VADi is the information about speech activity in segment i (values zero and one); xi(n) = x(m·i + n) and ni(n) = n(m·i + n) are segments of length M selected with step m,
SNRi is Local SNR (Signal to Noise Ratio).
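The symbols listed above match a commonly used form of the segmental SNR, in which the local SNR of each segment, 10·log10 of the ratio of speech energy to noise energy, is averaged over the segments marked as speech by the VAD. The sketch below implements that common form as an assumption; the exact equation used in the experiments is not reproduced in the text, and the function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def ssnr(speech: Sequence[float], noise: Sequence[float], vad: Sequence[int],
         frame_len: int, step: int) -> float:
    """Segmental SNR in dB, averaged over speech-active segments.

    speech, noise : separately available signals x(n) and n(n)
    vad           : one 0/1 flag per segment (VADi)
    frame_len     : segment length M
    step          : segment step m
    """
    total_db = 0.0
    k = 0                                   # number of segments actually used
    for i, active in enumerate(vad):
        start = i * step
        x_i = speech[start:start + frame_len]
        n_i = noise[start:start + frame_len]
        if not active or len(x_i) < frame_len or len(n_i) < frame_len:
            continue
        energy_x = sum(v * v for v in x_i)
        energy_n = sum(v * v for v in n_i)
        if energy_x == 0.0 or energy_n == 0.0:
            continue                        # local SNR undefined; segment skipped
        total_db += 10.0 * math.log10(energy_x / energy_n)   # SNRi
        k += 1
    return total_db / k if k else float("nan")
```

An SSNR improvement is then the difference between the value computed on the filter output and the value computed on the noisy input.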
Resulting Values of the SSNR Improvement and DTW Criterion for 180-s Recording
reference vector: P = [p(1), …, p(P)] of length P,
test vector (output speech signal from ANFIS): O = [o(1), …, o(T)] of length T.
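The DTW criterion compares two sequences of possibly different lengths by finding the cheapest alignment between them. The following is a minimal textbook sketch for scalar sequences with an absolute-difference local cost; the local distance, any path constraints and the features used in the experiments are not stated in the text, so these choices are assumptions.

```python
from typing import List, Sequence


def dtw_distance(reference: Sequence[float], test: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two scalar sequences."""
    if not reference or not test:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    rows, cols = len(reference), len(test)
    # cost[i][j] = cheapest alignment of reference[:i] with test[:j]
    cost: List[List[float]] = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(reference[i - 1] - test[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch test
                                     cost[i][j - 1],      # stretch reference
                                     cost[i - 1][j - 1])  # step both
    return cost[rows][cols]
```

A smaller distance means the filtered signal is closer to the reference, and identical sequences give zero.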
Table 2 shows that, of the tested structures, ANFIS D filtered the noise out of the speech signal best.
Conducted experiments confirmed the functionality of the implemented system for filtering the background noise out of the voice control of operating and technical functions of an intelligent building. The experiments showed that the designed technology can successfully extract voice commands even when they are fully contaminated by background noise.
The aim of this work was to develop, implement and test voice control of operational and technical functions in a smart building. The final implementation was performed on a simulation model of a smart apartment. This model is able to simulate control of lighting, sun-blinds and air conditioning. Another task was to design the connection of the voice recognition system to the KNX bus and to the smart building simulation model. After an analysis of available and usable voice recognition systems, a set of criteria was established for the final implementation of the created system. The main criterion for the selection of the final solution was the price of the application. Use of the Microsoft SAPI module is free in the English version of the Windows system; however, an application communicating with the user on one side and with the KNX bus system on the other had to be created.
A voice synthesis feature was implemented in the application. The final application is able to recognize not only the user’s voice commands, but the application may also answer back. In order to test the functionality of the entire work, a statistics module was added to the application. To improve user comfort, a settings menu was added which enables the user to adjust the basic properties of the synthesis and voice recognition modules.
The next step in the development was the actual start-up and programming of the Smart Home Care simulation model. ETS software was used to program the KNX technology. Functionality was demonstrated by controlling lights and blinds. After the Smart Home Care model was successfully programmed, communication between this model and the created application was tested. A KNX/IP router was used to connect the KNX network to the personal computer. The advantage of this solution over a USB connection is that a Wi-Fi router could in theory be inserted between the KNX/IP router and the computer, allowing wireless communication with the entire network. Communication was done via Ethernet. The application was additionally programmed with a UDP client, and a KNX/IP router was integrated into the Smart Home Care simulation model. The Wireshark program was used to obtain communication addresses and ports and to capture the transmitted messages. The most difficult part was the actual creation of the voice recognition application and establishing communication with the KNX bus. The designed and implemented solution works according to the requirements. Its indisputable advantage is zero cost, provided that the user is running Windows Vista (or higher) in the English version. The results achieved by the proposed voice communication method show that the voice recognition feature works very well. Ten people participated in the testing. Each person conducted five tests and uttered 100 voice commands. The average real success rate of voice recognition over all tests is approximately 98.78 % (note that interference, noise or a greater distance between the speaker and the microphone were present during certain tests). Some minor errors occasionally occurred when the application was in idle mode (waiting for the activation phrase) and a conversation in the Czech language was taking place in the room. This problem is mitigated by adding a restrictive condition to the program, which accepts the initiation phrase only if the application is at least 90 % sure that the phrase was correctly received. Despite this, the program may sometimes recognize a Czech word as the initiation phrase. As a possible improvement for real-world use, it is advisable to use a microphone network covering the entire area of the building.
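The 90 % condition amounts to a simple threshold on the recognizer's confidence. The sketch below shows the idea in Python under stated assumptions: the recognizer is taken to report a confidence between 0 and 1 together with the recognized text, and the activation phrase is passed in as a parameter because it is not given in the text. The function name is hypothetical.

```python
ACTIVATION_THRESHOLD = 0.90   # minimum confidence, as stated in the text


def accept_activation(recognized_text: str, confidence: float,
                      activation_phrase: str,
                      threshold: float = ACTIVATION_THRESHOLD) -> bool:
    """Accept the activation phrase only when the recognizer is confident.

    Both conditions must hold: the recognized text equals the activation
    phrase (ignoring case and surrounding spaces) and the reported
    confidence reaches the threshold.
    """
    same_phrase = recognized_text.strip().lower() == activation_phrase.strip().lower()
    return same_phrase and confidence >= threshold
```

Raising the threshold reduces false activations from background conversation at the cost of rejecting more genuine but poorly captured activations, which is the trade-off behind the residual errors described above.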
The article describes the design and implementation of voice control of operating and technical functions inside an intelligent building with assistive care for seniors (Smart Home Care) in a real environment with additive noise. The work was divided into five main parts: programming and starting the Smart Home Care simulation panel with the KNX technology; implementing communication between the voice control tool and the Smart Home Care simulation panel; programming a software application for voice control of operating and technical functions inside an intelligent building; statistically evaluating the speech recognition of the created software application; and implementing a tool for filtering noise out of speech. This article presents a complex solution to voice control of operating and technical functions of Smart Home Care.
This paper has been elaborated in the framework of the project Opportunity for young researchers, reg. no. CZ.1.07/2.3.00/30.0016, supported by Operational Programme Education for Competitiveness and co-financed by the European Social Fund and the state budget of the Czech Republic. This work is partially supported by the Science and Research Fund 2014 of the Moravia-Silesian Region, Czech Republic. This research was supported in part by VSB-Technical University Ostrava, FEECS under the project SGS registration number SP 2015/181, SP 2015/154.
- Merz H, Hansenmann T, Hubener C (2008) Automatizované systémy budov: Sdělovací systémy KNX/EIB, LON a BACnet. 2008. vyd. Grada Publishing, a.s, Praha, ISBN 978-80-247-2367-9
- Vanus J, Koziorek J, Hercik R (2013) Design of a smart building control with view to the senior citizens’ needs. In: ‘Book Design of a smart building control with view to the senior citizens’ needs’, 1st edn., pp 422–427
- Park KH, Bien Z, Lee JJ, Kim BK, Lim JT, Kim JO, Lee WJ (2007) Robotic smart house to assist people with movement disabilities. Autonomous Robots 22(2):183–198
- Hsu CL, Chen KY (2009) Practical design of intelligent remote-controller with speech-recognition and self-learning function. In Machine Learning and Cybernetics, 2009 International Conference on (Vol. 6, pp. 3361–3368). IEEE. (2009, July).
- Soda S, Nakamura M, Matsumoto S, Izumi S, Kawaguchi H, Yoshimoto M (2012) Implementing virtual agent as an interface for smart home voice control. In Software Engineering Conference (APSEC), 2012 19th Asia-Pacific (Vol. 1, pp. 342–345). IEEE. (2012, December).
- Verma P, Singh R, Singh AK (2013) A framework to integrate speech based interface for blind web users on the websites of public interest. Human-centric Computing and Information Sciences 3:21, doi:10.1186/2192-1962-3-21
- Vanus J, Styskala V (2011) Application of variations of the LMS adaptive filter for voice communications with control system. Tehnički vjesnik – Technical Gazette 18(4):553–560
- Martinek R, Al-Wohaishi M, Zidek J (2010) Software based flexible measuring systems for analysis of digitally modulated systems. In Roedunet International Conference (RoEduNet), 2010 9th (pp. 397–402). IEEE. (2010, June).
- Vanus J, Koziorek J, Hercik R (2013) The design of the voice communication in smart home care. In Telecommunications and Signal Processing (TSP), 2013 36th International Conference on (pp. 561–564). IEEE. (2013, July).
- Luo Y, Hoeber O, Chen Y (2013) Enhancing Wi-Fi fingerprinting for indoor positioning using human-centric collaborative feedback. Human-centric Computing and Information Sciences 3:2, doi:10.1186/2192-1962-3-2
- Martinek R, Zidek J (2010) Use of Adaptive Filtering for Noise Reduction in Communication systems. In Conference Proceeding: The International Conference Applied Electronics (AE). Pilsen, Czech Republic, 8–9 September 2010, pp. 215–220, ISBN 978-80-7043-865-7, ISSN 1803–7332, INSPEC Accession Number: 11579482.
- Martinek R, Zidek J (2014) The Real Implementation of ANFIS Channel Equalizer on the System of Software-Defined Radio. In: IETE Journal of Research, vol 60. Taylor & Francis, London, UK, Issue 2, pages 183–193, ISSN 0377–2063 (Print), 0974-780X (Online), doi:10.1080/03772063.2014.914698
- Martinek R, Manas J, Zidek J, Bilik P (2013) Power Quality Improvement by Shunt Active Performance Filters Emulated by Artificial Intelligence Techniques. In Conference Proceedings: 2nd International Conference on Advances in Computer Science and Engineering (CSE 2013). Los Angeles, CA, USA, July 1–2, 2013, pp. 157–161, ISSN 1951–6851, ISBN 978-90786-77-70-3, doi:10.2991/cse.2013.37.
- Martinek R, Zidek J (2012) Refining the diagnostic quality of the abdominal fetal electrocardiogram using the techniques of artificial intelligence. In Journal: Przeglad Elektrotchniczny (Electrical Review), Volume 88, Issue 12B, Warszawa, Poland, pp. 155–160, ISSN 0033–2097.
- ITU-T Test Signals for Telecommunication Systems, ITU-T P. 501, web. http://www.itu.int/net/itu-t/sigdb/genaudio/Pseries.htm.
- ITU-T Recommendation P.34 was revised by the ITU-T Study Group XII (1988–1993) and was approved by the WTSC (Helsinki, March 1–12, 1993), web. http://www.itu.int/ITU-T/recommendations/rec.aspx?rec=1726&lang=en.
- Vonasek M, Pollak P (2005) Methods for Speech SNR Estimation: Evaluation Tool and Analysis of VAD Dependency. Radioengineering 14(1):6–11, ISSN 1210–2512.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.