Speech to Text with ATT API

Talking to your computer, your mobile device, or even your wristwatch is something geeks have always wanted.

I remember when IBM's dictation software became real and I started dictating pages and pages of nonsense words, just for the pleasure of seeing the words magically appear on the screen.

A decade later, speech to text (the not-so-fun standard name for ‘My own Jarvis software’) has become so widely available that companies now offer it as SaaS.

You can use ATT, IBM and other SaaS services to understand your clients' voice commands and make your software obey them.

I had a brief experience with this kind of service while developing a package that uses the ATT speech to text API. This post comes from that experience and describes it.

Here is my list of 5 highlights from this experience.

1 – Insufficient documentation and samples
When you start working with something like speech to text, you hope for extensive documentation and samples, because most of the time you have no other references. I am not talking about code you can simply copy and paste; I am talking about usage scenarios where you need to understand the details. Here is an example: anyone can easily use the ATT API for English, but samples showing how to use it for other languages are hard to come by, and the documentation on this point is a bit confusing.

2 – Grammar syntax
If you go with a language other than English, you need to understand how to declare a grammar for what you expect to “translate”, and you have many ways to do this.

You can declare a whole sentence, ordering the words and setting options, like “Today is a rainy|sunny|cloudy day”, or declare just the terms, [rainy|sunny|cloudy], detached from the sentence/paragraph style.
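
To make the difference between the two styles concrete, here is a minimal sketch in Python. It is not the ATT grammar format; the `expand` helper and its toy "word|word" pattern syntax are my own assumptions, used only to show which phrases each style would accept.

```python
from itertools import product


def expand(pattern: str) -> list[str]:
    """Expand a toy pattern in which a word may hold '|'-separated options.

    Hypothetical helper for illustration only; it does not parse any real
    grammar format.
    """
    # Each whitespace-separated token becomes the list of its alternatives.
    slots = [token.split("|") for token in pattern.split()]
    # The cartesian product of all slots gives every accepted phrase.
    return [" ".join(words) for words in product(*slots)]


# Sentence style: the options sit inside a fixed, ordered sentence.
sentences = expand("Today is a rainy|sunny|cloudy day")

# Terms style: only the options, detached from any sentence.
terms = expand("rainy|sunny|cloudy")
```

The sentence style accepts three full sentences, while the terms style accepts only the three bare words, so the choice depends on whether you expect the user to speak a full phrase or a single keyword.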

3 – Community Forum/Support
The community around the ATT API services gave me the impression of answering questions quickly, even when they are a little more complex than “how do I use curl to request oauth credentials?”. That is good and not so good at the same time, because it explicitly shows that ATT knows about its lack of documentation and relies on the forum to handle it instead of investing time and effort in fixing the docs.

4 – Content negotiation
In the pursuit of giving developers a bunch of options, ATT fails to document them. I would prefer that ATT had chosen ONE format and documented it extensively.
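
To show the cost of supporting several formats on the client side, here is a minimal sketch, assuming a service that may answer in either JSON or XML. The `parse_body` helper and the payload shapes are hypothetical and do not describe the real ATT responses.

```python
import json
import xml.etree.ElementTree as ET


def parse_body(content_type: str, body: str) -> dict:
    """Parse a response body according to its Content-Type header.

    Hypothetical helper: the flat payloads handled here are assumptions
    made for illustration, not the actual ATT response schema.
    """
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type in ("application/xml", "text/xml"):
        # Flatten the direct children of the root element into a dict.
        root = ET.fromstring(body)
        return {child.tag: child.text for child in root}
    raise ValueError(f"unsupported media type: {media_type}")


from_json = parse_body("application/json; charset=utf-8", '{"text": "sunny"}')
from_xml = parse_body("application/xml", "<result><text>sunny</text></result>")
```

Every extra format means one more branch like these to write, test, and look up in the docs, which is why a single well-documented format would be easier to work with.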

5 – My lack of knowledge
In the interest of fairness, I need to make this note. I have zero knowledge of how things work in the speech to text world, meaning the industry standards and interfaces so far. But I think of SaaS as something that should be usable by its audience, and the audience for speech to text has become literally any developer with a sense of user experience.

If you would like to know more about ATT speech to text, visit its site at https://developer.att.com/apis/speech and have fun!

If you are a PHP developer, here is the GitHub repository for the ATT code kit: https://github.com/attdevsupport/codekit-php

If you are interested in a real, raw sample, see http://developerboards.att.lithium.com/t5/Apps/Simple-Example-of-Speech-To-Text-in-PHP/td-p/35630
