Subtitles and captions with WebVTT
One drawback of HTML5 multimedia is accessibility. For hearing-impaired
users, audio and video content is nearly useless without a text
alternative. This is where the track element and WebVTT come in handy.
WebVTT, short for “Web Video Text Tracks”, can be used to provide timed
subtitles and/or captions for multimedia content. WebVTT files are
plain text, but must be served with a Content-Type: text/vtt header.
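How you set that header depends on your server. As a rough sketch, assuming a response object that behaves like the one in Node's http module (with writeHead and end methods), a handler might look something like this. The names contentTypeFor and sendVtt are hypothetical and not part of any library.

```javascript
// Hypothetical helper: choose a Content-Type from the file extension.
function contentTypeFor(fileName) {
  return fileName.toLowerCase().endsWith('.vtt')
    ? 'text/vtt'
    : 'application/octet-stream';
}

// Hypothetical handler body. `res` is assumed to behave like the response
// object from Node's http module (it has writeHead and end methods).
function sendVtt(res, fileName, body) {
  res.writeHead(200, { 'Content-Type': contentTypeFor(fileName) });
  res.end(body);
}
```

Most web servers can also be configured to map the .vtt extension to text/vtt, which avoids handling it in application code.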
Even though it's plain text, WebVTT does adhere to a special format. The
first line must be WEBVTT, and it must be separated from the series of
cues that follows by a blank line. Each cue is made of a start time, an
end time, and some descriptive text: subtitles, translated dialogue, or a
description of background audio. Below is an example of dialogue from an
excerpted clip of Nina Paley's Sita Sings the Blues.
WEBVTT

00:00:05.000 --> 00:00:11.000
When? I don't remember what year. There's no year.
How do you know there's a year for that?

00:00:11.000 --> 00:00:12.800
I think they say the 14th century.
The first timing line marks the boundaries of our first cue: it starts 5 seconds into the clip and ends at 11 seconds. During that time, the text below it will appear on screen. Cues must be separated by a blank line. Our next cue begins at 11 seconds and ends at 12.8 seconds.
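If your cue data lives somewhere else, such as a spreadsheet or a database, the file format is simple enough to generate. The sketch below is one way to do it, assuming each cue is an object with start and end times in seconds plus its text. The function names formatTimestamp and buildVtt are made up for this example. It always writes the hours with two digits, which is what the WebVTT format expects when hours are included.

```javascript
// Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the text of a WebVTT file from an array of { start, end, text } cues.
function buildVtt(cues) {
  const blocks = cues.map(
    (cue) =>
      `${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}\n${cue.text}`
  );
  // The WEBVTT header comes first; a blank line separates it from the
  // first cue and each cue from the next.
  return ['WEBVTT', ...blocks].join('\n\n') + '\n';
}

const vtt = buildVtt([
  {
    start: 5,
    end: 11,
    text: "When? I don't remember what year. There's no year.\nHow do you know there's a year for that?",
  },
  { start: 11, end: 12.8, text: 'I think they say the 14th century.' },
]);
```

Joining the blocks with two newlines is what produces the required blank lines, so there is no need to track them by hand.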
Cues and CSS
In Chrome, Safari on iOS 7, and Opera 16+, we can style our cues using
CSS and the ::cue pseudo-element.
::cue {
font: 18px / 1.5 verdana, sans-serif;
}
Firefox and Internet Explorer don't support this just yet. Firefox's
support for ::cue is in progress. I assume the same is true of
Internet Explorer.
Simple WebVTT Markup
WebVTT supports a subset of HTML tags and a few of its own elements. We
can bold or italicize text using <b> and <i> elements. It's also
possible to specify the language of a particular snippet of cue text
using the <lang> element.
Perhaps most useful is the ability to mark up different speakers using
voice elements, which are written with the <v> tag.
00:00:16.500 --> 00:00:20.499
<v Man1>That's when the Moguls were ruling. Babur was in India.</v>
<v Woman>The 11th then...</v>
Then we can style them using the ::cue pseudo-element as a function.
::cue(v[voice=Man1]){
color:#9f0;
}
::cue(v[voice=Woman]){
color:#ece;
}
Browser support for this is still a bit scattershot, though. Chromium
and its derivatives (Chrome and Opera) have the most robust support for
WebVTT features. Those browsers support most of WebVTT's tags, and allow
the most control over the appearance of captions and subtitles with CSS.
Chromium-based browsers even support using @font-face with WebVTT
cues.
Using WebVTT with <track>
To use WebVTT with the track element, set the src attribute to the path
of your WebVTT file. By default, track elements are treated as
subtitles. If you'd like the browser to treat them as captions, set the
kind attribute to captions. Though a label isn't required, some browsers
(notably Internet Explorer) will display less-than-helpful defaults, so
make your label a descriptive name.
<track kind="captions" srclang="en-US" src="dialogue.vtt" label="English">
The srclang attribute is only required when kind="subtitles".
Without it, subtitles won't work. The value of srclang should be a
BCP 47 language code. We've included it here even though our track is a
captions track. Safari will prioritize the srclang as a label when both
are present.
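If you generate your track markup from a script or template, the srclang rule is easy to enforce up front. Here is a minimal sketch, assuming a hypothetical trackTag helper (with an equally hypothetical escapeAttr) that returns the markup as a string. It defaults kind to subtitles, mirroring the browser default, and refuses to build a subtitles track without srclang.

```javascript
// Hypothetical helper: returns <track> markup for a WebVTT file.
// `kind` defaults to "subtitles", mirroring the element's default.
function trackTag({ src, kind = 'subtitles', srclang, label }) {
  if (kind === 'subtitles' && !srclang) {
    throw new Error('srclang is required when kind is "subtitles"');
  }
  const attrs = { kind, srclang, src, label };
  const text = Object.entries(attrs)
    .filter(([, value]) => value !== undefined) // skip attributes not supplied
    .map(([name, value]) => `${name}="${escapeAttr(value)}"`)
    .join(' ');
  return `<track ${text}>`;
}

// Escape the characters that would break a double-quoted attribute value.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

const captions = trackTag({
  kind: 'captions',
  srclang: 'en-US',
  src: 'dialogue.vtt',
  label: 'English',
});
```

With these arguments, the helper produces the same track element used in the example above.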
Most browsers support the track element for the video element only.
For audio, there are two options. Either:
- include a text transcript in the reference document along with your audio media; or
- serve audio files with the video tag
You can see how it all comes together in the related demo.
Want more?
I cover HTML5 audio and video, as well as the ins and outs of WebVTT, in Jump Start HTML5 Multimedia from Learnable and SitePoint.