Internet Speech



Rendering Whitepaper - Talking and Listening to the Internet: An Application of Website Rendering

Introduction
The idea of listening to the Internet may at first sound a bit like watching the radio. How does a visual medium rich in icons, text, and images translate itself into an audible format that is meaningful and pleasing to the ear? The answer lies in an innovative integration of three distinct technologies that render visual content into short, precise, easily navigable, and meaningful text that can be converted to audio.

The three technologies employed to accomplish this feat are:

1. Speech recognition
2. Text-to-speech translation, and
3. Web content rendering

Background
For background information on voice/audio Internet, the InternetSpeech Web site contains an Industry White Paper, which discusses the Digital and Language Divides, voice portals, Wireless Application Protocol (WAP), and how the netECHO® technology developed by InternetSpeech overcomes the limitations of VXML, SALT, and other web-based programming solutions employed to voice-enable Internet web content. A Marketing White Paper discusses in detail the markets for voice/audio Internet, applications for its use, an overview of the technology used in netECHO, and possible partnership mechanisms. This paper serves as a technical supplement to the Industry and Marketing White Papers described above. It examines the current state of the art in converting visual Web content into meaningful and concise text, which in turn can be rendered into audio or displayed on a handheld device better suited to textual presentation than to the rich visual imaging of traditional Internet Web pages. An illustration of each of these applications is presented.

Definition
What is rendering? Merriam-Webster Unabridged defines rendering as “a work forming a presentation, expression, or interpretation (as of an idea, theme, or part)”. Information technology uses this term to refer to how information is presented according to the medium, for example, graphically displayed on a screen, audibly read using a recording device, or printed on a piece of paper.

In the context of voice/audio Internet, Web content rendering entails the translation of information originally intended for visual presentation into a format more suitable to audio. Conceptually this is a straightforward process, but carrying out the translation in practice poses some daunting challenges. What are those challenges and why are they so difficult to overcome? These questions are explored in the next section of this paper.

The rendering problem
Computers possess certain superhuman attributes that far outstrip those of mortal man, most notably their computational capabilities. The common business spreadsheet is a testament to this fact. Other seemingly more mundane tasks, however, present quite a conundrum for even the most sophisticated of processors. Designing a high-speed special purpose computer capable of defeating a grandmaster at chess took the computing industry over 50 years to perfect. Employing strategic thinking is not a computer’s forte. That is because in all the logic embodied in their digitized ones and zeroes, there is no inherent cognitive thought. This one powerful achievement of the brain, along with our ability to feel and express emotion, separates the human mind from its computerized equivalent: the central processing unit (CPU).

It is doubtful (and academically arguable) whether computers may one day be capable of expressing emotion, but computer scientists have been struggling with the issue of developing cognitive thought and have made surprising progress over the years. The branch of computing involved in this pursuit is known as Artificial Intelligence. It is a field rife with “buzz words” such as fuzzy logic, neural networks, adaptive control, and stochastic reasoning.

The relevance of cognitive thought to text rendering may not be immediately obvious, but it is one of the major challenges faced when taking information designed for one medium and rendering it in another. This is because there are no hard and fast objective rules to follow. Computers are very good at following instructions when they can be reduced to very objective decision points. They are not so good when value judgments are involved. A human being can readily distinguish a cat from a dog, or a relevant news link on a Web page from a link for an advertisement. For a computer this simple exercise is significantly more challenging than applying the Taylor expansion formula to a set of polynomials, something a computer can do quite handily.

Navigating a Web page is very similar in process to reading a newspaper. We immediately note the lead story, scan headlines, and then select a story of interest to us. Once we begin reading a news item we inevitably must turn to an inside page and specific column to finish the story. This is the Internet equivalent of linking to another page with a mouse click. Having performed that link our brain processes the visual clues on the next Web page to begin reading where the previous page left off—easy for us but not so easy for a computer.

Solving the problem
To solve the rendering problem, some intelligent techniques must be applied. The relevant data must be selected, navigated to its conclusion, and reassembled for presentation by a different medium. All of this must be done for all web pages, dynamically, in real time, and in an automated fashion. We have used an Intelligent Agent (IA) that applies various intelligence techniques, including “artificial intelligence”.

Using Visual Clues
Understanding the process that our brains go through in making qualitative choices is key to developing an artificially intelligent solution. In the example of Web page navigation, we know that our brains do not attempt to read and interpret an entire page of data; rather, they take their cues from the visual clues implemented by the Web designer. These clues include such things as placement of text, use of color, size of font, and density of content. From these clues, potential areas of interest can be identified and presented as a list of candidates.
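As an illustration only, the idea of ranking candidates by visual clues can be sketched in a few lines of Python. The Block fields, the weights, and the function names below are assumptions made for this example; they are not taken from netECHO, whose scoring method is not described in this paper.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One visible region of a page (all fields are illustrative assumptions)."""
    text: str
    font_px: int   # rendered font size in pixels
    top_px: int    # distance from the top of the page in pixels
    is_link: bool  # True when the whole block is a hyperlink


def clue_score(block: Block) -> float:
    """Combine several visual clues into one number; the weights are arbitrary."""
    size_clue = block.font_px / 12.0                     # larger type suggests a headline
    position_clue = 1.0 / (1.0 + block.top_px / 500.0)   # content near the top ranks higher
    word_count = len(block.text.split())
    density_clue = 1.0 if 3 <= word_count <= 15 else 0.5  # favor headline-length text
    link_clue = 1.2 if block.is_link else 1.0
    return size_clue * position_clue * density_clue * link_clue


def page_highlights(blocks: list[Block], limit: int = 3) -> list[str]:
    """Return the text of the highest-scoring blocks, best first."""
    ranked = sorted(blocks, key=clue_score, reverse=True)
    return [block.text for block in ranked[:limit]]
```

Given a large headline near the top of a page, a mid-sized headline further down, a two-word advertisement link, and a small-print footer, this sketch would list the two headlines first. A real system would need many more clues than these four.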

Upon selecting an item of interest it is common to have to navigate to another Web page to read all the data of interest (just like in the newspaper example). To do so we click on a Web link. When following a page link the problem of continuity of thought is encountered because almost assuredly the newly linked page contains data in addition to the thread of information we are attempting to follow. In order to maintain continuity with the item from the previous page a contextual correlation must be made. Once again, this cognitive process poses a formidable challenge for the computer and requires application of Intelligent Agent (or artificial intelligence) principles to solve.
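One simple way to picture the contextual correlation step is to compare the words of the selected link with each block of text on the linked page and continue with the best match. The sketch below uses Jaccard word overlap, a deliberately basic stand-in chosen for this example; the function names are hypothetical and the technique is not claimed to be the one the Intelligent Agent uses.

```python
import re


def words(text: str) -> set[str]:
    """Lower-case word set, ignoring words of one or two characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def continue_story(link_text: str, linked_blocks: list[str]) -> str:
    """Pick the block on the linked page that best matches the selected link."""
    return max(linked_blocks, key=lambda block: overlap(link_text, block))
```

Word overlap fails when the linked page paraphrases the headline, which is one reason the problem calls for more than a purely mechanical match.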

Simplifying for speech
The first step involves dynamically removing all the programming constructs and coding tags that comprise the instruction to a Web browser on how to visually render the data. HTML, CHTML, XML, and other languages are typically used for this purpose. Because the data is now being translated or rendered to a different medium, these tags no longer serve any purpose.
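For HTML, this tag-removal step can be sketched with the HTMLParser class from Python's standard library. The class and function names are chosen for this example, and the sketch handles only HTML; it simply drops markup, scripts, and style rules and keeps the visible text.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping markup plus script and style bodies."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_markup(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Note that stripping tags this way also discards the visual clues (font size, placement, color) that the tags carried, so any clue-based selection has to happen before or alongside this step.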

It is doubtful that every single data item on a page will be read. Just as when reading a newspaper, we read only items of interest and generally skip advertisements completely. Thus, we need to automatically render the important information on a page, and then, when a topic is selected, present only the relevant information from the linked page corresponding to that topic. Rendering is achieved by using Page Highlights (a method to find and speak the key contents on a page), finding the right, and only the relevant, contents on a linked page, assembling those contents, and providing easy navigation.

Finding and Assembling Relevant Information
To find relevant information, the Intelligent Agent (IA) uses various deterministic and non-deterministic algorithms that use contextual and non-contextual matches, semantic analysis, and learning. This is again very similar to how we use our eyes and brain to find the relevant contents. To ensure real-time performance, the algorithms are simplified as needed while still producing very satisfactory results. Once the relevant contents are determined, they are assembled in an order that makes sense when listened to in audio or viewed on a small screen.
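The find-then-assemble sequence can be illustrated with a deliberately simplified sketch: keep only the blocks that share enough words with the selected topic, and preserve their original page order so the result still reads sensibly. The function name, the shared-word test, and the min_shared threshold are assumptions for this example, far simpler than the matching and learning techniques named above.

```python
import re


def assemble(topic: str, blocks: list[str], min_shared: int = 2) -> str:
    """Keep blocks sharing at least min_shared words with the topic, in page order."""
    topic_words = set(re.findall(r"[a-z0-9]+", topic.lower()))
    kept = []
    for block in blocks:
        block_words = set(re.findall(r"[a-z0-9]+", block.lower()))
        if len(topic_words & block_words) >= min_shared:
            kept.append(block)
    # Page order is preserved because blocks are visited in their original order.
    return " ".join(kept)
```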

How Well Does the Rendering Work?
To answer how well the ‘rendering’ and Voice Internet can provide meaningful contents from today’s Internet, we need to answer the following questions:

(1) can the contents really be provided from any web site on the Internet?

(2) can the existing Internet contents be rendered so that the rendered content can be obtained in real time and is short, precise, easy to navigate, meaningful in audio, and pleasant to listen to?

The answers to both questions are “yes”. Depending on the site, the “yes” can be a very strong “yes”, a strong “yes”, or a weak “yes”. A content-rich page with a small number of links makes rendering and navigation easy since there are only a few choices, and one can quickly select a particular topic or section. If the site is rich in content, links, and images/graphics, the problem is more difficult, but a good solution still exists through careful use of a built-in feature called “Page Highlights”. The most difficult case is when a page is very rich in images/graphics and links. In such cases, the main information is located several levels down from the home page, so navigation becomes more difficult as one has to go through multiple levels. Using multi-level Page Highlights and customized Highlights, the content can still be rendered well, but it is not as easy to navigate as in the other two cases. Most Internet content falls under the first and second categories.

Rendering to a new medium
The two key media to render into are audio, using any phone, and visual, using a cell phone screen or PDA. There is good synergy between these two modes from a rendering standpoint. Both need a small amount of meaningful information at a time that can be heard or viewed with ease and navigated easily. This is achieved by using the Page Highlights mentioned above and by finding relevant contents one column at a time, as we do when we read a newspaper or website.

A column of text information can be converted to audio that can be heard with ease. The rate of hearing, i.e. content delivery, can be controlled to suit the user’s needs. The selection of a website, Page Highlight, speed of hearing, etc. can all be done by Voice Commands. The result is Voice Internet, i.e. talking and listening to the Internet.

The same column of text can be displayed on a small screen and viewed with ease, since a small screen can easily display a column but not a whole page. The contents are then automatically scrolled at a selectable speed and hence can easily be viewed and absorbed. The result is a MicroBrowser, or “true” wireless Internet, that does not need any re-write of the website and presents contents in a meaningful way.
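To make the small-screen case concrete, the sketch below wraps a column of rendered text to a narrow width and groups the lines into screens that could be shown one after another. It uses Python's standard textwrap module; the function name, the 24-character width, and the four-line screen are assumptions for this example, not properties of any particular device.

```python
import textwrap


def to_screens(text: str, width: int = 24, lines_per_screen: int = 4) -> list[list[str]]:
    """Wrap text to a narrow column, then group the lines into screens."""
    lines = textwrap.wrap(text, width=width)
    return [
        lines[i:i + lines_per_screen]
        for i in range(0, len(lines), lines_per_screen)
    ]
```

Scrolling speed then reduces to how long each screen is displayed before the next one replaces it.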

netECHO: a voice Internet application
Here’s how InternetSpeech has given a voice to the Internet. Voice Internet technology, netECHO®, uses an Intelligent Agent (IA), as described above, that transforms an ordinary telephone into a high-tech tool for accessing the Internet. A user calls the service and is greeted by the IA. The IA then provides a menu of items from which to choose Internet content. The user can surf any website, search for sites or information using search word(s), send and receive email, and conduct e-commerce. In addition, common voice portal features, such as news, weather, horoscopes, and directions, can be quickly accessed from a menu of items. Users can give simple commands, such as “go to Yahoo” or “read my email”, to get to the net-based information they want, when they want it, whether they’re out on an appointment, stuck in traffic, sitting in an airport, or cooking dinner. Users can quickly locate information, such as late-breaking news, traffic reports, directions, or anything else they’re interested in on the World Wide Web.
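Once a speech recognizer has turned the caller's words into text, mapping simple commands such as “go to Yahoo” or “read my email” to actions can be sketched as a small pattern table. The patterns, action names, and the added “search for” command below are hypothetical; the actual netECHO command grammar is not described in this paper.

```python
import re

# Hypothetical command patterns, checked in order against the recognized text.
COMMANDS = [
    (re.compile(r"^go to (?P<site>.+)$"), "open_site"),
    (re.compile(r"^read my e-?mail$"), "read_email"),
    (re.compile(r"^search for (?P<query>.+)$"), "search"),
]


def parse_command(utterance: str) -> tuple[str, dict[str, str]]:
    """Map recognized text to an (action, arguments) pair."""
    text = " ".join(utterance.lower().split())  # normalize case and spacing
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    return "unknown", {}
```

An unrecognized utterance returns "unknown", at which point a service would typically repeat the menu of choices.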

This automated way of accessing any content from the Internet using an Intelligent Agent is the key to creating a voice Internet that doesn’t require re-writing the web sites. The IA dynamically translates the accessed web pages into speech. There is no limit to the sites you can access, since all common markup languages (HTML, WML, XML, VoiceXML, etc.) are supported. The IA evaluates the site and determines which information is most useful and meaningful (“rendering”), then presents the content in easy-to-follow chunks using the “Page Highlights” feature. Navigation is easy: after being given a short list of choices, the caller simply says which link he or she wants, and the system goes to the selected content on the linked page.

A similar software Intelligent Agent is also used for business-to-business and government applications that let a company’s customers hear and interact with its web site (or web-based applications) from any phone, without a computer. The software allows customers to retrieve product and pricing information, check an order or account status, purchase products, or obtain product support using their own voice.

The work that has been accomplished to date makes the majority of sites very useful, and with customization, allows for further interaction with the site for e-commerce and other complex form filling applications. Further development of the IA technology will increase the usefulness of all sites.

Robotics and Beyond
The basic web “rendering” technology can be applied in many other applications, including Robotics, Distance Learning, and Question and Answer (Q & A) systems. A robot can easily access the Internet, get the desired information, and provide it to its clients. A person can ask a question using natural language (instead of giving a list of search words) and the IA can find the answer(s) in a short, precise, and meaningful way from the vast amount of information available on the Internet.

Let’s Get Started
We would like to forge a strategic relationship with you to address the opportunities outlined above.
Call us toll free today at 877-231-9286 or send email to corporate@internetspeech.com.
Location 5942 Foligno Way, San Jose, CA 95138, U.S.A.
Web site InternetSpeech, Inc

netECHO® is a registered trademark of InternetSpeech.