Here are a few questions for Igor Jablokov, CEO of Yap:
Current status (roadmap, progress, number of users): The first client implementation of Yap's platform is a next-generation SMS and web services application written in J2ME. We originally intended to open it up as a public beta shortly after the TechCrunch event, but we were surprised by the volume of interest from operators, media, social networks, and web portals. The San Francisco Chronicle was especially kind, running the headline "Why type when you can yap?" As a result, we are now in discussions with a number of major entities attempting to secure distribution rights. It is a good problem to have! That said, we have been in closed beta for many months now, which has helped shape our feature set and iron out remaining issues. We are focused on making the interface as easy to use as possible; if our mother cannot use it, it is junk. By the end of the year, we are comfortable stating that we will operate at a scale similar to Microsoft's Tellme speech recognition platform. We also hope to bring this to the European market as fast as possible, but will do so with strategic partners. That raises another question we are working through: do we stay independent, or accelerate the work with the help of strategic investors or venture capitalists? We are already in, and accepting, new discussions in that realm to help set our geographical and technical priorities going forward.
Your know-how: I was formerly the head of IBM's research and development focused on multimodal interaction and voice portals. Multimodality is an area of computer science focused on blending different user interfaces, in our case graphics and speech, to accomplish tasks. It is more natural for a human to say something and see the results (think about how you prefer to interact with your contacts). This is especially well suited to small form factor devices, which have small screens and keyboards. Given that Google reports that an average mobile activity takes 12 clicks, and an AT&T executive once said they lose 90% of eyeballs with each click, you can see how one-button access to *anything* might be exciting! Our team previously worked on many other advanced projects, such as Apple's iPod, GM's OnStar, Honda's navigation systems, etc., with dozens of patents between us all. So we are fairly confident we are innovating in an area with few to no competitors.
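Taken at face value, those two figures compound dramatically. The back-of-the-envelope sketch below is our own illustration of that compounding, not a calculation published by Google or AT&T:

```python
# Back-of-the-envelope: audience retention under per-click drop-off.
# Takes the quoted figures literally: 12 clicks per average mobile
# activity (Google) and 90% of eyeballs lost per click (AT&T).

def remaining_fraction(clicks, loss_per_click=0.90):
    """Fraction of the audience still engaged after `clicks` clicks."""
    return (1.0 - loss_per_click) ** clicks

print(remaining_fraction(1))   # ~0.1   -> 10% remain after one click
print(remaining_fraction(12))  # ~1e-12 -> effectively nobody after 12 clicks
```

Under those assumptions, a single voice command that replaces a 12-click flow is not a small optimization; it is the difference between keeping a tenth of your audience and keeping essentially none of it.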
Can you tell us more about your service?
Our little adventure starts with our family background; my brother and I were born in Greece to artist parents, so we were always interested in rich user interfaces. We connected those dots and founded this company when our sister started commuting to university and we noticed that she and her friends were texting while driving! It is quite unsafe, and when we researched it further we found that over 66% of teenagers in the United States admitted to doing just that…quite scary! I know that governments throughout the world are trying to ban it where possible. After creating that implementation, we discovered the experience was well suited to many other activities, like asking for the latest news, checking weather conditions, searching the web, checking travel schedules, etc. That is when our platform strategy was born. So while most people are attracted to our mobile client, the true secret of our ambitions is that we created a very accurate, high-speed, and vastly scalable freeform speech recognition platform. What this means is that an end user can say *anything* and we recognize it, unlike the current state of the art in speech recognition, where you can only select from grammars (which are virtual lists). For example, you could use our SMS application to "write" a message with your voice (the text appears on your screen a couple of seconds later) or post it to your blog.
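The grammar-versus-freeform distinction can be sketched in a toy example (the command list and matching logic here are purely illustrative, not Yap's engine): a grammar-based system can only snap input onto a fixed list or reject it, while a freeform system accepts open vocabulary.

```python
# Toy contrast between grammar-constrained and freeform recognition.
# The grammar, matcher, and threshold are illustrative only.
import difflib

GRAMMAR = ["check weather", "read news", "call home"]  # a "virtual list"

def grammar_recognize(utterance):
    """Force the utterance onto the grammar, or reject it outright."""
    match = difflib.get_close_matches(utterance, GRAMMAR, n=1, cutoff=0.6)
    return match[0] if match else "<rejected>"

def freeform_recognize(utterance):
    """A freeform system can return whatever was actually said."""
    return utterance

print(grammar_recognize("check the weather"))           # "check weather"
print(grammar_recognize("meet me at the library at 6")) # "<rejected>"
print(freeform_recognize("meet me at the library at 6"))
```

Dictating an arbitrary SMS is exactly the case a grammar cannot cover: the sentence is not on any list, so a grammar-based recognizer has nothing to return.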
What is your business model?
We are still evolving these strategies, but it will end up being a mixture, given that we can support multiple application types with our platform. We are quite lucky that the former head of business development for Amazon.com is on our board, since he is helping shape our thinking in this area. Our SMS application could be ad supported, we could take revenue shares on third-party applications that we speech-enable, or we could collect transactional fees on voicemail-to-text conversions. There are other models we are unable to discuss publicly yet, but we are being fairly creative in ensuring this will be a sustainable, if not wildly profitable, venture. In fact, a major executive at one of the top web portals labelled us one of the few mobile startups with a "real" business model. We take that as a good sign! Independent of how we make money, we are compelled to focus on how the rest of the ecosystem participates as well. Since our Chairman is simultaneously the Chairman of an American wireless operator, he is keeping us focused on how operators can participate in our contextual advertising revenues while also supporting their business through increased SMS volumes and data consumption. We also help differentiate device makers as they create ever smaller mobiles that must remain usable. Case in point: touch-based designs such as the iPhone still lack satisfactory text input.
What are the projects and additional features you're preparing?
We remain focused on securing investments to add talent and advance our internal initiatives in speech recognition, business intelligence, and supercomputing. Unfortunately, there are not many specifics I can disclose yet, as we are now working with a number of developers to speech-enable their mobile applications (for consumers and enterprises), web services to add them into our client, wireless operators to customize versions for them, and device makers to optimize for their products. Independent of those activities, we are continually upgrading the client and are happy to be partnered with Tricastmedia (they are based in the UK); they make our crazy visions for next-generation visual user interfaces possible on low-resource devices. By the time we are finished, a child will be able to open up their parent's mobile phone, ask "Why is the sky blue?", and we will answer them a second later. This dream of bringing all human knowledge to you within a single click makes all these late nights worth it…
What are the current users' feedback and your learnings on the service's usage?
The students who had access during the closed beta were stunned when they first experienced the application, since they thought we used "magic" to convert anything they said! They found it especially useful while walking through campus between classes, chatting with their friends or dictating class notes. One student was especially lazy and realized that it worked quite well as a spellchecker. Why? It was easier to say a complex word and get the written result back a couple of seconds later than to open up a computer and fumble through attempts to enter a word they did not know how to spell in the first place! Beyond the normal features you would expect being well received, like threaded SMS, we did have to step back and figure out how to let you select a contact to message when you have hundreds in your address book. That is one of the largest areas we are improving, since everything else was accepted as we originally envisioned. Some of our most humbling moments were when our user interface was complimented by folks at frog design and iPod product managers. Considering how experienced they are in the field of human-friendly design, we were quite honored.
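One plausible way to frame that contact-selection problem is the sketch below (entirely hypothetical: the names, threshold, and fuzzy-matching strategy are ours, not Yap's design). Match the recognized name against the address book, message directly when one entry stands out, and fall back to a short on-screen pick list when several entries are close:

```python
# Hypothetical sketch of voice-driven contact selection in a large
# address book; matching strategy and data are illustrative only.
import difflib

CONTACTS = ["Ana Silva", "Anna Smith", "Andrew Stone", "Igor Jablokov",
            "Maria Lopez", "Mark Low", "Marta Long"]

def select_contact(spoken_name, contacts=CONTACTS, max_choices=3):
    """Return one confident match, or a short list to confirm on screen."""
    lowered = {c.lower(): c for c in contacts}
    hits = difflib.get_close_matches(spoken_name.lower(), lowered.keys(),
                                     n=max_choices, cutoff=0.5)
    choices = [lowered[h] for h in hits]
    if len(choices) == 1:
        return choices[0]  # unambiguous: message this contact directly
    return choices         # ambiguous: show a short pick list instead

print(select_contact("igor jablokov"))  # "Igor Jablokov"
print(select_contact("maria"))          # short list: Maria Lopez, Marta Long
```

The design point is simply that with hundreds of contacts, the interface has to plan for near-collisions in names rather than assume the recognizer's top hypothesis is always the right person.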
Demo: An example of using the Yap9 client for text messaging, injecting relevant ads within the stream (Starbucks in this case), getting travel info from Orbitz, weather results from Yahoo!, and location-based services from Google. It is designed to be hyper-efficient on low-end devices and connects to the high-powered Yap platform for remote speech recognition. (Please excuse our poor voice acting!)