A framework to integrate speech based interface for blind web users on the websites of public interest
© Verma et al.; licensee Springer. 2013
Received: 18 April 2012
Accepted: 29 November 2013
Published: 4 December 2013
Keywords: Online TTS on web; Speech service; Speech interfaces; Accessibility; Usability; Navigability
The original purpose of the World Wide Web was to represent documents on different platforms and through different user interfaces, including text-based and auditory interfaces, in a single computer network. It was then planned to convert each document into Braille. Today, in the 21st century, the Internet is one of the most important tools easing everyday life. It offers alternative ways of performing tasks that were otherwise tedious and time consuming. Without traveling a long distance or waiting in a queue for hours, one can access information or perform a task in seconds. Unfortunately, the benefits of this powerful tool still do not reach blind users, who rely on screen readers or similar assistive tools to surf the web. Findings of a study reveal that accessing web content was “frustratingly difficult” for blind users, implying the need for more accessible and usable web content and better assistive technology.
To assess the status of web accessibility and usability among blind web users, we conducted a study on a group of 50 blind users at the Adult Training Centre, National Institute of Visually Handicap, Dehradun, India, during February 2012. For this study, we categorized web usage by a blind user as simple, intermediate or complex. A usage is simple if a blind user browses for a news article or e-book, or collects information on some topic. Screen readers may serve well for all such simple usages. Tasks like sending or receiving e-mails, or performing simple queries such as finding the examination result of a student by entering his/her roll number, may be considered of intermediate complexity. Tasks like reserving a travel ticket or shopping online fall in the complex category because they require filling multiple forms that may be spread across several web pages in a complex structure.
The participants were comfortable using normal keyboards to provide input to the computer. They used the screen reader JAWS for web browsing and email. They admired JAWS and admitted that they were able to use the internet only because of this software. However, they also reported that they were sometimes unable to access all the components of a webpage using JAWS. Most often, they could not find where to click on the webpage and, as a result, could not proceed further. They also faced problems in selecting dates from a calendar while filling forms, accessing information from social networking sites like Facebook, chatting etc. Thus, using JAWS they were able to perform simple tasks, e.g. newspaper reading, general surfing, knowledge gathering and simple queries, but they were not comfortable performing complex tasks involving multiple form filling. Besides, JAWS is not freeware, and its cost is too high for an average Indian individual to afford. Thus, the web usage of the participants was limited to the institute laboratory.
In this paper, some of these issues and challenges are taken up. We propose a framework with which the owner of an existing website of public interest may provide a dedicated speech based interface offering its important services to blind users. A blind user can thus perform important tasks independently on such a website without using any assistive tool like a screen reader.
Existing systems and related work
Researchers have followed two popular approaches to speech based web access for blind users. The first approach employs a client based assistive tool (e.g. a screen reader) to speak out the web content in some desired order. The other approach uses an online Text to Speech (TTS) service through a proxy server to convert the web data and send it in mp3 or another format to the client, where it is played by a browser plug-in. In both cases, a transcoder may be used to render the web content after converting it to a more accessible form. Unfortunately, neither approach provides a perfect solution to the accessibility problem, and each suffers from its own limitations. The usability of screen readers is mainly constrained by the complex structure and poor accessibility of web pages. The proxy server based approach may not be treated as reliable because the proxy servers are maintained by a third party. Besides, they may not work on secure sites. Thus, there are many important web based tasks which, at present, cannot be performed satisfactorily by blind users with any of the available assistive tools.
Various systems have been developed using approaches like content analysis, document reading rules, context summary, summary/gist based, semantic analysis, sequential information flow in web pages etc. But these systems have a number of issues which make them less usable. First, they are essentially screen readers or their extension. Second, they provide only browsing and do not support other applications like mail, form-filling, transaction, chat etc. A brief survey of some important Screen Readers is given here.
Emacspeak  is a free screen reader for Emacs developed by T. V. Raman and first released in May 1995; it is tightly integrated with Emacs, allowing it to render intelligible and useful content rather than parsing the graphics.
Brookes Talk is a web browser developed at Oxford Brookes University in the 1990s. It provides function keys for accessing the web page and reads out the page using speech synthesis in word, sentence and paragraph modes by parsing the web page content. It also provides a mechanism for searching for suitable results using search engines and supports a conceptual model of the website. It models the information on a web page and summarizes the web page content.
Csurf was developed at Stony Brook University. It is a context based browsing system that brings together content analysis, natural language processing and machine learning algorithms to help blind users quickly identify relevant information. Csurf is composed of an interface manager, context analyzer, browser object, frame tree processor and dialog generator. The Csurf web browser uses VoiceXML, JSAPI, FreeTTS, Sphinx, JREXAPI, etc.
Aster (Audio system for technical reading), developed by T. V. Raman, permits visually disabled individuals to define their own document reading rules manually. Aster is implemented using Emacs as the main component for reading. Internally, it recognizes the markup language as the logical structure of the web page. The user can then listen either to the entire document or to any part of it.
Some researchers have also proposed to extract the web content using semantics .
Hearsay was developed at Stony Brook University. It is a multimodal dialog system in which the browser reads the webpage under the control of the user. It analyzes the web page content (HTML and DOM tree), segments the web page and, on this basis, generates VoiceXML dialogues.
A Vernacular Speech Interface for People with visual Impairment named ‘Shruti’ has been developed at Media Lab Asia research hub at IIT Kharagpur, India. It is an embedded Indian language Text-to-Speech system that accepts text inputs in two Indian languages - Hindi and Bengali, and produces near natural speech output.
Shruti-Drishti  is a Computer Aided Text-to-Speech and Text-to-Braille System developed in collaboration with CDAC Pune and Webel Mediatronics Ltd, (WML) Kolkata. This is an integrated Text-to-Speech and Text-to-Braille system which enables persons with visual impairment to access the electronic documents from the conference websites in speech and braille form.
The screen reading software SAFA (Screen Access For All) has been developed in vernacular languages by the Media Lab Asia research hub at IIT Kharagpur, in collaboration with the National Association for the Blind, New Delhi, to enable visually disabled persons to use a PC through speech output. It provides speech output support for the Windows environment and for both English and Hindi scripts.
As far as general surfing is concerned, the above mentioned screen readers are important and useful tools for blind users. But for complex tasks like information queries, complex navigation, form filling or transactions, they do not work to a satisfactory level. Some elements of websites that do not comply with the accessibility guidelines may be inaccessible to screen readers. Besides, screen readers provide accessibility through abundant use of shortcut keys, for which blind users have to be trained. Also, screen readers need to be purchased and installed on the local machine, which prevents their users from using the internet on any public terminal.
Speech based web browsers
The prospects of Text-to-Speech (TTS) on the web are gradually gaining momentum. At present, fetching mp3 from a remote web service is the only standard way of converting text to speech. The APIs used for this purpose are proprietary. For example, BrowseAloud is a TTS service with which a web site can be speech enabled, and Google Translate Service also has a TTS feature. Although many websites have a provision for reading out their contents, it is limited to playing the content as a single mp3 file. Most of them have no provision for interactive navigation and form filling.
Like screen readers, WebAnyWhere reads out the elements in sequential order by default. Although a few shortcut keys are assigned to control the page elements, the user has to assess the whole page in order to proceed further. On websites with poor accessibility design, the user may get trapped during a complex navigation.
Although WebAnyWhere is a step forward in the direction of online, installation-free accessibility, it has certain limitations. As the contents in WebAnyWhere are received through a third party, they may not be treated as reliable, and fear of malware attacks, phishing etc. is associated with such access. Secure sites cannot be accessed using this approach as they allow only restricted operations on their contents. This is a major drawback since most important tasks, like bank transactions, filling examination forms and using e-mail services, are performed over secure sites. These drawbacks compromise the usability of WebAnyWhere and limit it to an information access tool only.
ReadSpeaker Form Reader is an online TTS service offered to website providers. It claims to help their customers/users fill out online forms.
Task based approaches
A few websites, e.g. State Bank of India (http://www.onlinesbi.com), offer key shortcuts to perform certain tasks such as fund transfer or checking the balance. These tasks are simple and limited to a single page. Still, no provision is made for speech synthesis, on the assumption that blind users will use a screen reader.
SICE is a framework for developing speech based web applications that provide specific functionality to blind users. The framework is based on the VoiceXML specifications for developing web based interactive voice applications. Communication uses VoIP (Voice over Internet Protocol); therefore, the user does not depend on a telephony interface as required by existing VoiceXML specifications. Unlike telephony, complex web data can be handled conveniently and in a controlled way using a customized two way dialogue based access system. The framework is more effective for developing speech based web access systems for dedicated functionalities than for developing generalized speech interfaces. Its drawback is that, being a heavyweight system, it requires a huge investment in hardware and software on the server side.
System design and architecture
So far, researchers have emphasized finding generic solutions to accessibility issues. Expectations from website owners or content providers have been limited to providing accessible content that is usable with screen readers. Being client side tools, screen readers often fail to perform a task because the web author's intentions may not be well understood by the assistive tool. As a result, blind users fail to perform important tasks like reservation booking, tax deposition, bill payment, online shopping etc. We propose that, for all such important utilities, the accessibility solution could be automated on the server side by the website author/owner. Thus, there is a need for an authoring tool that could provide an integrated solution for sighted and blind users using a single website.
The issues to be tackled are manifold. While surfing the internet using a client based assistive technology like a screen reader, a blind user faces problems in both intra webpage and inter webpage navigation. The situation becomes worse for complex web pages. First, the website's navigational structures need to be traversed on a webpage each time a screen reader user wishes to locate a relevant link. Besides, the user may get stuck when something goes wrong during form validation on pressing the submit button of a form. Inter webpage navigational issues are primarily caused by possible multiple entry/exit points on related web pages. After a form is submitted on a webpage, control may not reach the relevant form or link on the next page. As a result, a visually disabled user has to search for the relevant form or link sequentially. Efforts have been made to tackle the issue using semi-automatic client based approaches, e.g. Csurf uses a context based approach to reach the most relevant link on the next page. Unfortunately, these approaches do not serve our purpose.
Our proposed framework is inspired by online TTS systems like WebAnywhere, BrowseAloud, ReadSpeaker Form Reader etc. The working of a client-based tool, iMacros, has equally inspired us to automate user activity, though from the server side in our case. Although iMacros is not an assistive technology, as it has been used to automate bulk form filling on the client machine, our idea is to provide similar functionality to the client from the server side.
Although direct speech enabling of public websites seems to be the best strategy in terms of the usability and accuracy with which blind users can interact with complex form elements, it has not been a popular approach in industry. Certain fears must be overcome to make the approach feasible in general. The first and most prominent one is that web authors would need to create two documents for everything they write, which is obviously an overhead. We frame the issue as follows: is it possible to send the text in speech (mp3) form to the blind user from the same webpage that caters to other users? If so, the overhead of maintaining two copies of the same webpage can be eliminated.
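One way to picture serving both audiences from a single page is to derive the spoken prompts directly from the form markup that sighted users already see. The following is only a minimal sketch under stated assumptions, not the framework's implementation: the `CONTROL_NAMES` mapping, the use of the `title` attribute as the label source, and the sample login form are all hypothetical. The resulting strings would then be handed to whatever TTS service produces the mp3.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML input types to the spoken control names
# used in the case-study walk-through ("Text Box", "Submit Button", ...).
CONTROL_NAMES = {
    "text": "Text Box",
    "password": "Text Box",
    "submit": "Submit Button",
    "radio": "Option Button",
    "checkbox": "Checkbox",
}


class PromptExtractor(HTMLParser):
    """Collects one spoken prompt per announced <input>, in document order."""

    def __init__(self):
        super().__init__()
        self.prompts = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        control = CONTROL_NAMES.get(attributes.get("type", "text"))
        if control is None:
            return  # e.g. hidden fields are not announced
        # Assumption: the label comes from "title", else "value", else "name".
        label = (attributes.get("title") or attributes.get("value")
                 or attributes.get("name") or "")
        self.prompts.append(f"{label}, {control}")


def speech_prompts(html):
    """Return the list of prompts to synthesize for one page of markup."""
    parser = PromptExtractor()
    parser.feed(html)
    parser.close()
    return parser.prompts


# Made-up login form standing in for a page that serves all users.
LOGIN_FORM = """
<form action="/login" method="post">
  <input type="text" name="user" title="User Name">
  <input type="password" name="pass" title="Password">
  <input type="hidden" name="token" value="abc">
  <input type="submit" value="Login">
</form>
"""

print(speech_prompts(LOGIN_FORM))
```

Because the prompts are computed from the same markup, the page author maintains one document only; the speech view is generated rather than written.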
The other fear relates to the hardware cost of maintaining the speech server and other interfaces. A lightweight solution is imperative in terms of hardware and software architecture. Meeting these two requirements can make direct speech enabling of websites a feasible solution.
Design goals and decisions
The goal of this work is to design a framework that enables the owner of a website to provide a speech interface to its important services for blind users. The framework should provide an alternate access system on the fly on the same website so that the overheads involved are minimal. It should be scalable, i.e. expandable with additional functionalities over time. Accessibility, usability and navigability should be enhanced considerably. Access time/usage time for a given task should be reduced. Instances of confusion or indecisiveness during navigation should be eliminated completely. The system should be able to run over secure connections, or at least in a mix of secure and insecure connections, with extra security measures provided by restricting access to the alternate speech based system.
Local installations should not be required, so that the user is able to access and use the website from any public terminal. However, the system should work seamlessly, without any conflict, with a screen reader like JAWS if one is available. Speech synthesis for the text on the web and voice feedback for keyboard operations should be provided, as they are prerequisites for a blind user to use the web.
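The voice feedback for keyboard operations mentioned above can be illustrated with a small sketch that maps each typed character to the phrase to be spoken. This is an assumption-laden illustration: the `SPOKEN_KEY_NAMES` table, the "capital" prefix and the fallback word "symbol" are hypothetical choices, not part of the described system.

```python
# Hypothetical spoken names for keys that have no natural pronunciation.
SPOKEN_KEY_NAMES = {
    " ": "space",
    "\b": "backspace",
    "\t": "tab",
    "\n": "enter",
    ".": "dot",
    "@": "at",
    "-": "dash",
}


def key_feedback(key):
    """Return the phrase to speak for a single typed character."""
    if len(key) != 1:
        raise ValueError("key_feedback expects exactly one character")
    if key in SPOKEN_KEY_NAMES:
        return SPOKEN_KEY_NAMES[key]
    if key.isalpha() and key.isupper():
        return f"capital {key}"  # distinguish case audibly
    if key.isalnum():
        return key
    return "symbol"  # assumed fallback for unnamed punctuation


def echo_typing(text):
    """Return the spoken feedback for every character typed, in order."""
    return [key_feedback(character) for character in text]


print(echo_typing("Ab 1@"))
```

Each returned phrase would be passed to the speech synthesis service as the user types.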
The conceptual architecture
Case-study: Indian railways website
Indian Railways, being fully State controlled, is a major means of local transport in India. At present, Indian Railways provides various services viz. enquiry, online reservation, tour planning & management and hotel booking through its exclusive website, i.e. http://www.irctc.co.in. Prior to this service being operational, the travelers had to physically visit the reservation centers to avail the reservation and other services. They had to wait in long queues to get the services. Now, as a convenient choice, a user can avail the service promptly, without making any physical movement, through online access to the Indian Railways Website. A large number of persons are taking advantage of this facility every day.
Unfortunately, approximately seven million visually disabled people in the country cannot avail themselves of this facility because the website offers no suitable interface for them. Compared to their sighted counterparts, they are in greater need of such a facility, which could empower them by sparing them the physical travel to a reservation centre. To treat security related issues with high priority and to prevent malpractices and abuses by agents or others, the site owners have imposed various restrictions on using the website or its database. This prevents the use of third party approaches like ‘WebAnyWhere’ to access the site through speech.
The website has several noted instances of inaccessibility where a screen reader user may get trapped during a task, e.g. Book a Berth. Providing a separate dialogue based system for this task, accessible from the homepage of the Indian Railways website, would be well worth the effort.
Navigation related issues
Information in table data may not be semantically correlated, since a screen reader speaks out all the table headings followed by all the table data. For a big table, it becomes difficult to remember which data belongs to which heading.
A blind user may accidentally click a link or picture of an advertisement, which may take him/her away from the website of interest.
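The table issue noted above arises because headings and data are spoken in two separate runs. A dedicated speech interface could instead pair every cell with its heading before synthesis. The sketch below illustrates this idea only; the function name `speak_table` and the sample availability values are made up for the example.

```python
def speak_table(headers, rows):
    """Return one spoken sentence per row, pairing every cell with its heading."""
    sentences = []
    for number, row in enumerate(rows, start=1):
        if len(row) != len(headers):
            raise ValueError(
                f"row {number} has {len(row)} cells, expected {len(headers)}"
            )
        pairs = ", ".join(
            f"{heading}: {cell}" for heading, cell in zip(headers, row)
        )
        sentences.append(f"Row {number}. {pairs}.")
    return sentences


# Illustrative data only; not taken from the railway website.
HEADERS = ["Date", "Availability"]
ROWS = [
    ["12 March", "Available 25"],
    ["13 March", "Waiting list 4"],
]

for sentence in speak_table(HEADERS, ROWS):
    print(sentence)
```

With this ordering the listener never has to remember a heading across more than one cell.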
Identify the key functionalities of the website
The first step is to identify the key functionalities on the Indian Railways website that should be made accessible and usable through a speech based interface. The functionalities may be speech enabled in a phased manner, the more important ones first. To begin with, Book a Berth is a heavily used functionality on the site. Thus, it is a good starting point for speech enabling.
Speech enabling the book a berth functionality
Home Page (Figure 9).
Blind user presses the designated Functionality Key shortcut for Book a Berth
Control transfers to “Login Form” on the left frame.
System speaks: User Name, Text Box.
Blind user enters the user name in the text box currently in focus and listens to speech feedback for each key press.
System speaks: Password, Text Box.
User enters the password in the text box currently in focus and listens to the key presses.
System speaks: Login, Submit Button.
On user click, Control transfers to linked page at “Plan My Travel” Form.
Plan My Travel Page (Figure 5).
System speaks: From, Text Box.
User enters the code for the Source Station in the text box currently in focus and listens to the key presses.
System speaks: To, Text Box.
User enters the code for the Destination Station in the text box currently in focus and listens to the key presses.
System speaks: Date, Date Table Object.
System speaks the headers (month names) from each table of months.
User clicks the month name in the table header currently in focus.
System speaks each date of the selected Month table sequentially.
User clicks the desired date in the table currently in focus.
System speaks: Ticket Type, Option Button.
System speaks options for Ticket Type, sequentially.
User clicks the desired Ticket Type in the Option Button currently in focus.
System speaks: Quota, Option Button.
System speaks options for Quota, sequentially.
User clicks the desired Quota Type in the Option Button currently in focus.
System speaks: Find Trains, Submit Button.
On user click, Control transfers to linked page at “List of Trains” Page.
List of Trains Page (Figure 6).
System speaks: List of Trains, Option Button.
System speaks options for Trains and berth availability.
User clicks the desired Train & Class in the Option Button currently in focus.
On user click, Control transfers to the page having “Train Details”.
Train Details Page (Figure 7).
System speaks: Availability, Table with Links.
System speaks the table data sequentially for date, availability and Link for “Book.”
User clicks the “Book” link on the desired date currently in focus.
Control transfers to the “Passenger Details” page
Passenger Details Page (Figure 8).
System speaks: Name, Text Box.
User enters the name in the text box currently in focus and listens to the key presses.
System speaks: Age, Text Box.
User enters the age in the text box currently in focus and listens to the key presses.
System speaks: Sex, Option Box.
System speaks the options.
User selects the Sex option currently in focus.
System speaks: Berth Preference, Option Box.
System speaks the options.
User selects the Berth Preference option currently in focus.
System speaks: Senior Citizen, Checkbox.
User checks the checkbox currently in focus, if eligible.
System speaks: Verification Code, Text Box.
System speaks the code.
User enters each character of the code in the text box currently in focus after listening to it.
System speaks: Go, Submit Button.
On user click, Control transfers to linked page at “Ticket Details” Page.
Ticket Details Page (Figure 10).
System speaks the details of the ticket.
System speaks: Make Payment, Submit Button.
On user click, Control transfers to the “Make Payment” Page.
Make Payment Page (Figure 11).
System speaks: Types for Payments, Option Button.
System speaks each available payment type
User clicks the preferred type currently under the focus.
System speaks: Options available, Radio Button.
System speaks the available options for the type of payment selected by the user.
User selects the preferred option under focus. Control transfers to the bank page for payment.
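The walk-through above follows one repeating pattern: the system speaks a label and a control type, optionally speaks the options, waits for input, and moves to the next page on a submit button. That pattern can be sketched as a data-driven dialogue script. The code below is a hedged illustration covering only the first two pages; the class names `Step` and `Page`, the function `run_dialogue`, the option labels and the sample answers are all hypothetical, and real input would come from the keyboard rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    label: str      # spoken field name, e.g. "From"
    control: str    # spoken control type, e.g. "Text Box"
    options: list = field(default_factory=list)  # spoken one by one, if any


@dataclass
class Page:
    title: str
    steps: list


# First two pages of the walk-through; the Quota options are placeholders.
BOOK_A_BERTH = [
    Page("Home", [
        Step("User Name", "Text Box"),
        Step("Password", "Text Box"),
        Step("Login", "Submit Button"),
    ]),
    Page("Plan My Travel", [
        Step("From", "Text Box"),
        Step("To", "Text Box"),
        Step("Quota", "Option Button", ["Option A", "Option B"]),
        Step("Find Trains", "Submit Button"),
    ]),
]


def run_dialogue(pages, answers):
    """Walk the pages in order; return (spoken transcript, collected inputs)."""
    transcript, collected = [], {}
    for page in pages:
        for step in page.steps:
            transcript.append(f"System speaks: {step.label}, {step.control}.")
            transcript.extend(
                f"System speaks option: {option}." for option in step.options
            )
            if step.control == "Submit Button":
                # A click on the submit button moves control to the next page.
                transcript.append(f"Control transfers from {page.title} page.")
                continue
            value = answers[step.label]
            if step.options and value not in step.options:
                raise ValueError(f"{value!r} is not an option for {step.label}")
            collected[step.label] = value
    return transcript, collected


# Made-up answers standing in for keyboard input.
ANSWERS = {
    "User Name": "demo",
    "Password": "secret",
    "From": "AAA",
    "To": "BBB",
    "Quota": "Option A",
}

transcript, collected = run_dialogue(BOOK_A_BERTH, ANSWERS)
print("\n".join(transcript))
```

Keeping the flow in data rather than code means that further pages, such as List of Trains or Passenger Details, could be added as new `Page` entries without changing the dialogue loop.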
System assessment and performance analysis
In this paper we have outlined an alternative approach to the usability of websites of public interest. There are important tasks, like the one described in our case study, which are difficult for blind users to perform using any of the existing web surfing tools. The proposed framework is intended to provide workable and robust solutions to such complex tasks, so that a blind user can perform the task conveniently on any available public terminal.
Using a screen reader on the Indian Railways website, none of the participants could complete the task without assistance. Four participants got stuck on the ‘Plan my Journey’ page. Only one participant reached the ‘List of Trains’ page, where he could not locate the train details using the screen reader. Participants were then asked to perform the same task on our speech-enabled dummy website for railway reservation. All the participants completed the task without difficulty. In fact, they were amazed by the simplicity and ease with which they could perform this tedious task.
Overall results of this evaluation can be summarized as follows:
Usability is visibly enhanced as all the participants could perform the task without difficulty.
Time taken to perform the ‘Book a Berth’ functionality is considerably reduced as compared to the screen reader JAWS.
However, an exhaustive assessment of the framework with the target group is still to be made; this will be possible only after the system is developed completely.
Conclusions and future directions
Traditionally, the industry criterion for investing in a project has been the number of use cases offered by the intended product. Thus, work at such a micro level may not gain due importance. Our work is primarily addressed to the responsible owners of big, heavily accessed websites meant for public usage. A little effort on their part could help blind users to a great extent. Thus, owners of websites of public interest should come forward to add provisions for dedicated speech based interfaces for blind users.
Not all websites on the Web are accessed equally. Therefore, it seems important that the websites which account for heavy traffic are made more accessible. Owners of public utility websites should feel a greater sense of responsibility for providing feasible, error free and workable functionality from the point of view of blind users, rather than leaving them to struggle with their screen readers. Usability, in its best form, at least for the most important functionalities, must be incorporated by the owners of these websites at design time itself. Other functionalities may be added incrementally as the need arises.
It is hoped that the goal of bringing the benefits of the internet to all, including visually disabled users, as desired by its original propagators, can be achieved to some extent by providing dedicated speech based functionalities on important websites of public interest.
Prabhat Verma works as Asst. Professor in the CSE Department, Harcourt Butler Technological Institute, Kanpur. He has about 13 years of experience in academics. He received his Ph.D. in Computer Science and Engineering from Uttarakhand Technical University, Dehradun in January 2013, M. Tech. in Computer Science and Engineering from U. P. Technological Institute, Lucknow in 2008 and B. E. in Computer Technology from Barkatullah University, Bhopal in 1992. His interests include Object Oriented Systems, Human Computer Interaction, and Web Technology.
Raghuraj Singh works as Professor in the CSE Department, Harcourt Butler Technological Institute, Kanpur. He has about 22 years of experience in academics. He received his Ph.D. in Computer Science and Engineering from U. P. Technical University, Lucknow in 2006, M. S. in Software Systems from B.I.T.S. Pilani in 1997 and B. Tech. in Computer Science and Engineering from Kanpur University, Kanpur in 1990. He has graduated five Ph.D. students. His interests include Software Engineering, Artificial Intelligence, and Human Computer Interaction.
Avinash Kumar Singh has worked as a Research Fellow in the CSE Department, Harcourt Butler Technological Institute, Kanpur. He completed his M. S. in Software Systems from BITS, Pilani in 2013 and Master in Computer Application from IGNOU, New Delhi in 2008. His interests include Brain Computer Interaction, Machine Learning, and Robotics.
The authors would like to thank the University Grants Commission, New Delhi, Uttarakhand Technical University, Dehradun and the Adult Training Centre, National Institute of Visually Handicap, Dehradun for their support of this research work.
- Steinmetz R, Nahrstedt K: Multimedia applications. X media Publishing Series, Springer; 2004.
- Enabling Dimensions: Usage of computers and internet by the visually disabled: issues and challenges in the Indian context, findings of a study conducted by Enabling Dimensions, January 2002, New Delhi. 2002. http://www.enablingdimensions.com/downloads
- Freedom Scientific. http://www.freedomscientific.com/ (access date 18.02.2012)
- Harper S, Patel N: Gist summaries for visually disabled surfers. ASSETS’05: Proceedings of the 7th international ACM SIGACCESS conference on Computers and accessibility, New York, NY, USA, 2005, 90–97.
- EMACSPEAK. http://emacspeak.sourceforge.net/smithsonian/study.html
- Zajicek M, Powel C, Reeves C: Web search and orientation with brookestalk. In Proceedings of Tech. and Persons with Disabilities Conf. (CSUN’99). Los Angeles; 1999.
- Mahmud J, Borodin Y, Ramakrishnan IV: Csurf: A context-driven non-visual web-browser. Proceedings of the International Conference on the World Wide Web (WWW’07), Banff, Alberta, Canada, May 08–12, 2007.
- Raman TV: Audio system for technical reading, Ph.D. Thesis. Springer; 1998.
- Huang A, Sundaresan N: A semantic transcoding system to adapt web services for users with disabilities. In Fourth International ACM Conference on Assistive Technologies, November 13–15, 2000, Virginia, ACM-SIGCAPH. Published by ACM; 2000.
- Ramakrishnan I, Stent A, Yang A: Hearsay: enabling audio browsing on hypertext content. In Proceedings of the 13th International Conference on World Wide Web (2004). ACM Press; 2004:80–89.
- Media Lab Asia. http://medialabasia.in/index.php/shruti-drishti
- Screen Access For All (SAFA). http://punarbhava.in/index.php?option=com_content&view=article&id=919&Itemid=291 (access date 18.02.2012)
- BrowseAloud. http://www.browsealoud.com (access date 18.02.2012)
- Bigham J, Prince C, Ladner R: WebAnywhere: a screen reader on-the-go. In W4A2008 - Technical, April 21–22, 2008. Beijing, China: Co-Located with the 17th International World Wide Web Conference; 2008.
- ReadSpeaker. http://www.readspeaker.com/readspeaker-formreader/ (access date 22.09.2013)
- Verma P, Singh R, Singh A: SICE: an enhanced framework for design and development of speech interfaces on client environment. Int J Comp Appl (0975 – 8887) 2011, 28: 3.
- Internet Macros. http://www.iopus.com/imacros/ (access date 22.09.2013)
- Bigham J, Ladner R: Accessmonkey: a collaborative scripting framework for web users and developers. In W4A2007 technical paper, May 07–08, 2007. Banff, Canada: Co-Located with the 16th International World Wide Web Conference; 2007.
- WebInsight. http://webinsight.cs.washington.edu/accessibility/ (access date 18.02.2012)
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.