If you work in eLearning, you might be making video. If you’re not making video, you are certainly making slides with narration. Slides have been around for at least 75 years. Video hasn’t been around nearly as long, and it’s still a growing phenomenon in eLearning. It took long enough. OK, we’re making more video, but are we making really good video? Are we using the medium to its best advantage? How can we help our learning audience really get engaged with the video?

What I want to suggest in this article is that what makes really good video is more than what people see. Much of what makes video engaging, what makes it real, is what people hear during the video. While we’re making video we record some sound, to be sure, but we don’t often take the sound we record and make that into a “soundtrack” to go with the video. I am going to share some secrets with you. People win Academy Awards for sound editing, and there’s a reason for that! (Maybe we should give awards for sound editing in eLearning video.)

What is a soundtrack, and why should you care?

Is a soundtrack different from an audio track? Both are audio tracks. How is a soundtrack different from a sound recording? A sound recording is exactly that: a recording of what a microphone “hears,” most often the human voice. The concept of a soundtrack is to create a sonic environment. A sonic environment is built of all sorts of different sounds: the spoken word, ambient sounds of all sorts, and perhaps sound effects. By itself, the human voice, even when well recorded, can be flat and two-dimensional, at least in the setting of a video. If you record in a studio, the goal is to keep any background noise out of the sound you record. You hear a voice. That’s all. Does a voice by itself carry a story for video? On the surface, yes, but it doesn’t tell the whole story. Are you making your learning audience “buy into” the story you’re trying to tell? Damien Bruyndonckx, a speaker at DevLearn 2015, said, “An audio soundtrack gives video the third dimension.” So how do we go about creating the third dimension for our video projects?

First, you have to write it

An eLearning script starts with the written word. First, write everything you hear, including all the sound effects and ambient sounds as well as the dialogue. As a writer, you need to “visualize” the soundtrack in your head. This is something that professional video and cinema script writers do. Why? What’s the best way to think about your audio and sounds? Where is your audience? Transport them there by thinking what it would sound like without dialogue or narration.

Frequently during training design and development projects, the instructional designer is the writer as well. While most instructional designers receive training in writing instructional material, it’s not the same as writing dialogue or narration for video. Unless you’re an experienced dialogue writer, writing dialogue is one of the most difficult things you can do. Good narration and dialogue sound natural. They feel spontaneous. Good narration and dialogue propel the story. Bad dialogue isn’t really heard. It’s just bad.

The other half of writing a script for eLearning is the sonic environment. Is the action taking place on the floor of a factory? Or in a big room where people are working in a call center? The sonic environment is what “sells” the story to the learner.

Script development starts with the budget!

eLearning project budgets usually allocate too few dollars for script development and sound recording. There are lots more dollars for Storyline, Captivate, or other development tools. Clients expect the script to come with the instructional design. The two are treated as one and the same. They’re not, but seldom is there an expectation that the script is a separate piece of work from the instructional design. Frequently, it feels (and sounds) to me as though the script and sounds are an afterthought.

Dialogue or narration, along with sound recording, should be separate items in a budget. How much of the budget? A good rule of thumb would be 25 to 30 percent. A good script can take the learning concepts and goals and give them a life outside of rote training. A good script doesn’t go by the numbers. At many conferences, I’ve heard speakers make statements or teach about sound, but not about the sonic environment. Perhaps that’s because video is a recent arrival in our eLearning mix, which used to be just slides and voice-over to make our training point. It’s a hard sell to get the right money allocated to script writing and dialogue writing and to assign the correct amount of funding for the soundtrack. Bad scripts and a lack of sound design do not help the success of our projects, that’s for sure.
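To make the 25 to 30 percent rule of thumb concrete, here is a minimal sketch of the arithmetic in Python. The function name and the $40,000 project total are hypothetical, chosen only for illustration. The percentages are the ones suggested above.

```python
def script_and_sound_budget(total_budget):
    """Return the (low, high) dollar amounts for 25% and 30% of a whole-dollar budget."""
    low = total_budget * 25 // 100   # 25 percent, rounded down to whole dollars
    high = total_budget * 30 // 100  # 30 percent, rounded down to whole dollars
    return low, high


# Hypothetical example: a $40,000 eLearning project.
low, high = script_and_sound_budget(40_000)
print(f"Script and sound: ${low:,} to ${high:,}")
# prints "Script and sound: $10,000 to $12,000"
```

Seeing the range as a dollar figure, rather than a percentage, can make it easier to argue for a separate line item.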

Soundtracks are different from recorded audio

What do I mean by a soundtrack? It’s a good question. A voice recorded in a studio is technically a soundtrack. A nice vocal recording is most assuredly not a soundtrack. I call it flat. I call it boring. I call it not teaching anything, but just pushing information. Yes, information push is part of what we do. We will never get away from that, since it’s a primary part of learning and training. Video, though, has the power to be immersive and transformative. If we don’t make it immersive, then it’s our bad, so we should look for ways to do it.

So what exactly is a soundtrack? First, listen to these two soundtracks.

Mad Men:


The Firesign Theatre:


The first soundtrack was from Mad Men. Many of the Emmys awarded to the show during its run were for sound recording. In this particular soundtrack, I counted at least 15 different sounds. Listen to it again very closely. Put on headphones just like a sound engineer would. It starts out with a Bellini aria that then goes under the dialogue; there are all sorts of other sounds in the mix that make up the sonic environment you’re listening to. Listen to it again. If you are or were a fan of Mad Men, you’ll feel like you could walk into the scene. There are sounds of menu pages being turned, subdued dialogue from different tables, dishes and silverware clanking on the plates, a waiter walking away after asking a question. You can hear footfalls and clothing rustle. It goes on. It’s a soundtrack. There are more sounds in it as well. Many sound engineers add some distortion in their tracks to make the voices more real. One caveat: Using music alone can be deadly to your learners. If you’re just using music as a background, it will take away from your message.

The Firesign Theatre recording is a different story on so many levels. First, it’s not a soundtrack that was part of a video. It’s from a standalone album made in 1970. Yes, 1970. It’s analog audio taken to an extreme. And it’s a true sonic environment. The story is told in a vivid sonic manner—so vivid, you can visualize it if you close your eyes. Every single one of the sounds other than the voices was recorded separately; then, they were spliced together with something that looked suspiciously like Scotch tape and recorded again on a mix tape. It was a long, tedious, and laborious process.

Today this is a different story. We can just record or find the sounds we want to use, drop them on a timeline in a program like Adobe Audition or Sound Forge (Sony just sold this to Magix), and put it all together and make an mp3. Whew! Things were a lot more difficult to accomplish back in the days of analog recording.
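For readers curious about what "dropping sounds on a timeline" amounts to under the hood, here is a rough sketch in Python using only the standard library. It is not how Audition or Sound Forge work internally. The sine tones are stand-ins for real recordings, and the function names, start times, gain values, and output file name are all assumptions made for illustration. It writes a WAV file, since producing an MP3 would need a separate encoder.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 44100  # samples per second (assumed: mono, CD-quality rate)


def tone(freq_hz, seconds, amplitude=0.5):
    """Stand-in for a recorded clip: a sine tone as floats in [-1, 1]."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def mix_timeline(clips):
    """Mix (samples, start_seconds, gain) clips onto one mono timeline."""
    total = max(int(start * SAMPLE_RATE) + len(samples)
                for samples, start, _ in clips)
    timeline = [0.0] * total
    for samples, start, gain in clips:
        offset = int(start * SAMPLE_RATE)
        for i, s in enumerate(samples):
            timeline[offset + i] += gain * s  # overlapping sounds simply add
    # Clamp to [-1, 1] so the sum cannot overflow the 16-bit range.
    return [max(-1.0, min(1.0, s)) for s in timeline]


def write_wav(path, samples):
    """Write float samples in [-1, 1] as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(SAMPLE_RATE)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767))
                               for s in samples))


# Hypothetical clips: room ambience, a voice, and a short "clink" effect.
room = tone(60, 3.0)
voice = tone(220, 2.0)
clink = tone(2000, 0.1)

# The low gain on the ambience keeps it underneath the voice.
mix = mix_timeline([(room, 0.0, 0.2), (voice, 0.5, 0.8), (clink, 1.5, 0.3)])
out_path = os.path.join(tempfile.gettempdir(), "soundtrack_sketch.wav")
write_wav(out_path, mix)
```

In a real project you would load your recorded narration and effects instead of generating tones, but the idea is the same: each sound gets a start time and a level, and the tracks are summed into one file.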

True enough, eLearning doesn’t need this much sound. But it needs some. Sometimes you only need one or two environmental sounds to make the soundtrack realistic enough to get learners believing they’re in the situation you want them to believe in.

Two schools of sound

There seem to be two schools of sound in the eLearning industry.

The first school likes a clean announcer style with no other sounds, except an occasional effect that is loud and obvious. A trained announcer records the script in a sound booth or somewhere similarly quiet, and that sound is played behind slides or video in the lesson. I encounter it frequently in eLearning. Too frequently. Does this kind of sound actually take away from the learning experience? I believe it does.

The second kind of soundtrack, which I’ll call “new school,” uses non-obviously-trained voices in natural environments. The new school adds ambient sounds, like the air around us, along with appropriate background sounds that bring the speaker or speakers and their environment to life. This is a more realistic and immersive way to create sound, especially when we have to “build” slides because of the instructional design.

There’s a third way that is sort of in between these approaches, and that would look like “clean” voice sound with a couple of effects thrown in because they seem appropriate. This hybrid can actually be the worst of all three. The listeners don’t believe they’re in an environment for a nanosecond. The developers just spent time making something that actually made the lesson worse than it was before. Why did they bother?

Reality isn’t just for reality TV

Today we live in an extended reality environment. That marvelous soundtrack from Mad Men was accurate. The clanking of forks and knives on china was a real recorded sound. The background dialogue was recorded for the scene. Twenty or even 10 years ago, we really weren’t “plugged in” to environmental sounds to the extent we are today. We live in a rich sonic environment. We listen to all sorts of sounds without being aware they are there. A big difference today is that we can fairly easily create the environment we want with sound effects. And anyone can do it easily, which makes me wonder why it’s not being done for every eLearning session.

Soundtrack creation should be the standard rather than the exception. It takes hard thought and good execution to create a real soundtrack—or a surreal soundtrack, if your lesson calls for it (and some do!). You have to think of everything that might be in the environment you’re creating as a writer and then find those sounds online or make them yourself. The thought process is complex, and even though it doesn’t take all that long, it’s nonetheless a process.

So why not take advantage of the soundtrack-building potential available in any number of programs? These programs range in price from freeware to thousands of dollars. Sound programs are really easy to use—in fact, a whole lot easier than video editing and effects programs.

Making a script

I wish there were a way to just tell you how to write a script for voice and sonic environment, but that isn’t really what this article is about. One simple way is to write the words and then record your own voice; or, even simpler, just listen to yourself speaking the words you wrote. Do they sound natural? Are they a part of an ecosystem of words you hear when you’re actually talking to someone else? Do the two or more people having a conversation really sound like they’re having a conversation? In most cases, you’ll probably do tons of edits on the script. An easy way is to speak the dialogue with someone else. Does it really sound natural? Is it a conversation you could be having with someone else that just happens to teach them the lesson?

If you want to learn about script writing, MasterClass is offering a screenwriting class by Aaron Sorkin. It offers many other classes in the creative arts as well. I’m taking a filmmaking class by Werner Herzog. The classes are $90, and while not really in depth, there’s still (to me anyway) a lot of value in them for the price.

If it seems like there are more questions than answers, there are. It’s something you need to be aware of as an eLearning designer and developer. You can’t just write up a lesson anymore, record it, and hand it off to the Captivate or Storyline developer. You have to create an environment. That’s part of the evolving job of an eLearning designer and developer. Welcome to Hollywood.