Writing

A Brief History of ASR: Automatic Speech Recognition

This moment has been a long time coming. The technology behind speech recognition has been in development for over half a century, going through several periods of intense promise — and disappointment. So what changed to make ASR viable in commercial applications? And what exactly could these systems accomplish, long before any of us had heard of Siri?

The story of speech recognition is as much about the application of different approaches as the development of raw technology, though the two are inextricably linked. Over a period of decades, researchers would conceive of myriad ways to dissect language: by sounds, by structure — and with statistics.

Early Days

Human interest in recognizing and synthesizing speech dates back hundreds of years (at least!) — but it wasn’t until the mid-20th century that our forebears built something recognizable as ASR.

1961  —  IBM Shoebox

Among the earliest projects was a “digit recognizer” called Audrey, created by researchers at Bell Laboratories in 1952. Audrey could recognize spoken numerical digits by looking for audio fingerprints called formants — the distilled essences of sounds.

In the 1960s, IBM developed Shoebox — a system that could recognize digits and arithmetic commands like “plus” and “total”. Better yet, Shoebox could pass the math problem to an adding machine, which would calculate and print the answer.

Meanwhile researchers in Japan built hardware that could recognize the constituent parts of speech like vowels; other systems could evaluate the structure of speech to figure out where a word might end. And a team at University College in England could recognize 4 vowels and 9 consonants by analyzing phonemes, the discrete sounds of a language.

But while the field was taking incremental steps forward, it wasn’t necessarily clear where the path was heading. And then: disaster.

October 1969  —  The Journal of the Acoustical Society of America

A Piercing Freeze

The turning point came in the form of a letter written by John R. Pierce in 1969.

Pierce had long since established himself as an engineer of international renown; among other achievements he coined the word transistor (now ubiquitous in engineering) and helped launch Echo I, the first-ever communications satellite. By 1969 he was an executive at Bell Labs, which had invested extensively in the development of speech recognition.

In an open letter³ published in The Journal of the Acoustical Society of America, Pierce laid out his concerns. Citing a “lush” funding environment in the aftermath of World War II and Sputnik, and the lack of accountability that came with it, Pierce admonished the field for its lack of scientific rigor, asserting that there was too much wild experimentation going on:

“We all believe that a science of speech is possible, despite the scarcity in the field of people who behave like scientists and of results that look like science.” — J.R. Pierce, 1969

Pierce put his employer’s money where his mouth was: he defunded Bell’s ASR programs, which wouldn’t be reinstated until after he resigned in 1971.

Progress Continues

Thankfully there was more optimism elsewhere. In the early 1970s, the U.S. Department of Defense’s ARPA (the agency now known as DARPA) funded a five-year program called Speech Understanding Research. This led to the creation of several new ASR systems, the most successful of which was Carnegie Mellon University’s Harpy, which could recognize just over 1,000 words by 1976.

Meanwhile efforts from IBM and AT&T’s Bell Laboratories pushed the technology toward possible commercial applications. IBM prioritized speech transcription in the context of office correspondence, and Bell was concerned with ‘command and control’ scenarios: the precursors to the voice dialing and automated phone trees we know today.

Despite this progress, by the end of the 1970s ASR was still a long way from being viable for anything but highly specific use cases.

The ‘80s: Markovs and More

A key turning point came with the popularization of Hidden Markov Models (HMMs) in the mid-1980s. This approach represented a significant shift “from simple pattern recognition methods, based on templates and a spectral distance measure, to a statistical method for speech processing”—which translated to a leap forward in accuracy.

A large part of the improvement in speech recognition systems since the late 1960s is due to the power of this statistical approach, coupled with the advances in computer technology necessary to implement HMMs.

HMMs took the industry by storm — but they were no overnight success. Jim Baker first applied them to speech recognition in the early 1970s at CMU, and the models themselves had been described by Leonard E. Baum in the ‘60s. It wasn’t until 1980, when Jack Ferguson gave a set of illuminating lectures at the Institute for Defense Analyses, that the technique began to disseminate more widely.

The success of HMMs validated the work of Frederick Jelinek at IBM’s Watson Research Center, who since the early 1970s had advocated for the use of statistical models to interpret speech, rather than trying to get computers to mimic the way humans digest language: through meaning, syntax, and grammar (a common approach at the time). As Jelinek later put it: “Airplanes don’t flap their wings.”

These data-driven approaches also facilitated progress that had as much to do with industry collaboration and accountability as individual eureka moments. With the increasing popularity of statistical models, the ASR field began coalescing around a suite of tests that would provide a standardized benchmark for comparison. This was further encouraged by the release of shared data sets: large corpora that researchers could use to train and test their models.

In other words: finally, there was an (imperfect) way to measure and compare success.

November 1990, Infoworld

Consumer Availability — The ‘90s

For better and worse, the 90s introduced consumers to automatic speech recognition in a form we’d recognize today. Dragon Dictate launched in 1990 for a staggering $9,000, touting a dictionary of 80,000 words and features like natural language processing (see the Infoworld article above).

These tools were time-consuming (the article claims otherwise, but Dragon became known for prompting users to ‘train’ the dictation software to their own voice). And they required that users speak in a stilted manner: Dragon could initially recognize only 30–40 words a minute; people typically talk around four times faster than that.

But it worked well enough for Dragon to grow into a business with hundreds of employees, and customers spanning healthcare, law, and more. In 1997 the company introduced Dragon NaturallySpeaking, which could capture words at a more fluid pace and, at $150, came with a much lower price tag.

Even so, there may have been as many grumbles as squeals of delight: to the degree that there is consumer skepticism around ASR today, some of the credit should go to the over-enthusiastic marketing of these early products. But without the efforts of industry pioneers James and Janet Baker (who founded Dragon Systems in 1982), the productization of ASR may have taken much longer.

November 1993, IEEE Communications Magazine

Whither Speech Recognition— The Sequel

Nearly 25 years after J.R. Pierce’s letter was published, the IEEE published a follow-up titled Whither Speech Recognition: the Next 25 Years⁵, authored by two senior employees of Bell Laboratories (the same institution where Pierce had worked).

The latter article surveys the state of the industry circa 1993, when the paper was published — and serves as a sort of rebuttal to the pessimism of the original. Among its takeaways:

  • The key issue with Pierce’s letter was his assumption that in order for speech recognition to become useful, computers would need to comprehend what words mean. Given the technology of the time, this was completely infeasible.
  • In a sense, Pierce was right: by 1993 computers had only a meager understanding of language, and in 2018 they’re still notoriously bad at discerning meaning.
  • Pierce’s mistake lay in his failure to anticipate the myriad ways speech recognition can be useful, even when the computer doesn’t know what the words actually mean.

The Whither sequel ends with a prognosis, forecasting where ASR would head in the years after 1993. The section is couched in cheeky hedges (“We confidently predict that at least one of these eight predictions will turn out to have been incorrect”) — but it’s intriguing all the same. Among their eight predictions:

  • “By the year 2000, more people will get remote information via voice dialogues than by typing commands on computer keyboards to access remote databases.”
  • “People will learn to modify their speech habits to use speech recognition devices, just as they have changed their speaking behavior to leave messages on answering machines. Even though they will learn how to use this technology, people will always complain about speech recognizers.”

The Dark Horse

In a forthcoming installment in this series, we’ll be exploring more recent developments and the current state of automatic speech recognition. Spoiler alert: neural networks have played a starring role.

But neural networks are actually as old as most of the approaches described here: they were introduced in the 1950s! It took the computational power of the modern era (along with much larger data sets) for them to change the landscape.

But we’re getting ahead of ourselves. Stay tuned for our next post on Automatic Speech Recognition by following Descript on Medium, Twitter, or Facebook.

Timeline via Juang & Rabiner

This article was originally published at Descript.

Overcoming Writer’s Block with Automatic Transcription

If you’re a writer — of books, essays, scripts, blog posts, whatever — you’re familiar with the phenomenon: the blank screen, a looming deadline, and a sinking feeling in your gut that pairs poorly with the jug of coffee you drank earlier.

If you know that rumble all too well: this post is for you. Maybe it’ll help you get out of a rut; at the very least, it’s good for a few minutes of procrastination.

Here’s the core idea: thinking out loud is often less arduous than writing. And it’s now easier than ever to combine the two, thanks to recent advances in speech recognition technology.

Of course, dictation is nothing new, and plenty of writers have taken advantage of it. Carl Sagan’s voluminous output was facilitated by his process of speaking into an audio recorder, to be transcribed later by an assistant (you can listen to some of his dictations in the Library of Congress!). And software like Dragon NaturallySpeaking has offered automated transcription for people with the patience and budget to pursue it.

But it’s only in the last couple of years that automated transcription has reached a sweet spot of convenience, affordability, and accuracy that makes it practical to use more casually. And I’ve found it increasingly useful for generating a sort of proto-first draft: an alternative approach to the painful process of converting the nebulous wisps inside your head into something you can actually work with.

I call this process idea extraction (though these ideas may be more accurately dubbed brain droppings).

Part I: Extraction

Here’s how my process works. Borrow what works for you and forget the rest — and let me know how it goes!

  • Pick a voice recorder. Start talking. Try it with a topic you’ve been chewing on for weeks, or whenever an idea flits through your head. Don’t overthink it. Just start blabbing.
  • The goal is to tug on as many threads as you come across, and to follow them as far as they go. These threads may lead to meandering tangents— and you may discover new ideas along the way.
  • A lot of those new ideas will probably be embarrassingly bad. That’s fine. You’re already talking about the next thing! And unlike with text, your bad ideas aren’t staring you in the face.
  • Consider leaving comments to yourself as you go — e.g. “Maybe that’d work for the intro”. These will come in handy later.
  • For me, these recordings run anywhere from 20–80 minutes. Sometimes they’re much shorter, in quick succession. Whatever works.

Part II: Transcription

Once I’ve finished recording, it’s time to harness ⚡️The Power of Technology⚡️

A little background: over the last couple of years there’s been an explosion of tools related to automatic speech recognition (ASR) thanks to huge steps forward in the underlying technologies.

Here’s how ASR works: you import your audio into the software, and the software uses state-of-the-art machine learning to spit back a text transcript a few minutes later. That transcript won’t be perfect (the robots are currently in the ‘Write drunk’ phase of their careers). But for our purposes that’s fine: you just need it to be accurate enough that you can recognize your ideas.

Once you have your text transcript, your next step is up to you: maybe you’re exporting your transcript as a Word doc and revising from there. Maybe you’re firing up your voice recorder again to dictate a more polished take. Maybe only a few words in your audio journey are worth keeping — but that’s fine too. It probably didn’t cost you much (and good news: the price for this tech will continue to fall in the years ahead).

A few more tips:

  • Use a recorder/app that you trust. Losing a recording is painful — and the anxiety of losing another can derail your most exciting creative moments (“I hope this recorder is working. Good, it is… @#*! where was I?”)
  • Audio quality matters when it comes to automatic transcription. If your recording has a lot of background noise or you’re speaking far away from the mic, the accuracy is going to drop. Consider using earbuds (better yet: AirPods) so you can worry less about where you’re holding the recorder.
  • Find a comfortable space. Eventually you may get used to having people overhear your musings, but it’s a lot easier to let your mind “go for a walk” when you’re comfortable in your environment.
  • Speaking of walking: why not go for a stroll? The pains of writing can have just as much to do with being stationary and hunched over. Walking gets your blood flowing — and your ideas too.
  • I have a lot of ideas, good and bad, while I’m thinking out loud and playing music at the same time (in my case guitar, though I suspect it applies more broadly). There’s something about playing the same four-chord song on autopilot for the thousandth time that keeps my hands busy and leaves my mind free to wander.

The old ways of doing things — whether it’s with a keyboard or pen — still have their advantages. Putting words to a page can force a sort of linear thinking that is otherwise difficult to maintain. And when it comes to editing, it’s no contest: QWERTY or bust.

But for getting those first crucial paragraphs down (and maybe a few keystone ideas to build towards)? Consider talking to yourself. Even if you wind up with a transcript full of nothing but profanity — well, have you ever seen a transcript full of profanity? You could do a lot worse.

This article was originally published by Descript.

Best Tips for Freelance Writers in 2018

Setting out to become a successful freelance writer is no bed of roses, especially in the first few months, when you are still adapting to a new way of life. However challenging it gets, there are ways to make your life easier and your freelance writing more fun. To that end, here are our best tips for making your freelance content creation ventures a success in 2018.

Evaluate your short-term and long-term goals

It is imperative to evaluate your short-term and long-term goals so that you can organize your work and time properly. As exciting as it is to think about all of your plans, it is important to concentrate on the tasks at hand to keep your writing services running. This doesn’t mean you have to forget your ambitions, only that you should spend about 95% of your time on current tasks and the remaining 5% planning big future projects. You will never reach your long-term goals unless you manage your short-term ones properly.

Prioritize

Once you have evaluated your short-term and long-term goals, prioritize the tasks at hand so that you can accomplish them. Work out which projects and clients matter for your growth and suit both your time and your budget. Those that don’t fit your criteria may be a distraction; move them to a ‘future possibility list’ to revisit when you are in a better position. This step will help you narrow down your goals further.

Determine the price for your services

Many people lack the talent for translating their ideas into cohesive writing and need the help of professional writing services to do so. This is where you have to be confident in your skill set and set a price for your services. It doesn’t need to be an overwhelming process.

Exhibit your dedication to producing high-quality work

Solid proof of excellent work supports your professional rates and creates a win-win situation. Quality work exhibits your dedication and shows your clients how good it is to work with you. The right blend of content marketing and copywriting can help you achieve this.

Evaluate all projects thoroughly

As a premium freelancer, you won’t be able to accept every project that is thrown your way. The work and the writer need to be a good fit for each other, or else all the effort will go to waste. Gather all the information you can about each project, consider whether you can produce quality work on it and whether it is likely to lead to regular work, and then decide whether to take it.

Provide a compelling proposal

After evaluating a particular project, outline the content you would create for it to give the prospect a reason to choose you. Explain how your services are better and how they will help the client achieve what they want. An appealing proposal sparks excitement in the client and makes them want you to do the work.

Meet the important deadlines

Set deadlines for yourself that are well ahead of the ones the client gives you. This leaves you time to handle unexpected events while still delivering the work on time. It will not only help you manage your work better but also show the client that you handle work professionally. If the client doesn’t provide a deadline, set one anyway and let them know when you will deliver.

Recognize your ideal client

If you don’t specify what kind of work you want as a freelance writer, you may end up with low-paying tasks and dissatisfying jobs. So it is important to recognize your ideal client: that way you land jobs that match your caliber, do creative work, and get the right price for it, rather than going below your standard and taking jobs that are unfulfilling.

Learn content marketing tips and tricks

Good writers are great content marketers as well, because they know how to communicate clearly when delivering a company’s message. So learn content marketing strategies and techniques in order to meet prospects and guide them toward what they want and where to find it. The answer should revolve around why your business is the most reasonable choice for them.

Keep your email inbox organized

An immaculate inbox will help you keep track of things and respond to prospects promptly. Use labels and folders to keep track of all inquiries. This will also let you refer back to them at any time.

Be flexible

Your early days as a freelancer might not go as smoothly as you anticipate. It is perfectly normal to make some mistakes at first, and they will help you become a better writer. So keep moving forward and don’t be too hard on yourself; otherwise you might get discouraged and give up altogether.

These tips can do wonders for freelance writers looking to succeed in 2018. Follow them and you can become a successful content writer this year, earning considerable amounts of money while working right from your home. So go ahead and use them to become an outstanding freelance writer.