Subtitles and captions with WebVTT
One drawback of HTML5 multimedia is accessibility. For users who are
deaf or hard of hearing, audio and video content is nearly useless
without a text alternative. This is where the track element and WebVTT
come in handy.
WebVTT, short for “Web Video Text Tracks”, can be used to provide timed
subtitles and/or captions for multimedia content. WebVTT files are
plain text, but must be served with a Content-Type: text/vtt header.
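How you set that header depends on your server. As a rough sketch, assuming a small Node.js server that picks the header from the file extension, the lookup could look like this. The CONTENT_TYPES table and the contentTypeFor name are made up for illustration; only the text/vtt value comes from the text above.

```javascript
// Hypothetical lookup table: file extension -> Content-Type value.
// Extend it with whatever other media types your site serves.
const CONTENT_TYPES = {
  ".vtt": "text/vtt",
  ".mp4": "video/mp4",
  ".webm": "video/webm",
};

// Return the Content-Type for a file path, falling back to a
// generic binary type when the extension is unknown.
function contentTypeFor(filePath) {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  return CONTENT_TYPES[ext] || "application/octet-stream";
}

// Inside a Node http request handler this might be used as:
//   res.writeHead(200, { "Content-Type": contentTypeFor(filePath) });
```

Most production servers (Apache, nginx and so on) have their own MIME type configuration, so a lookup like this is only needed when you are writing the server yourself.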
Even though it's plain text, WebVTT does adhere to a specific format.
The first line must start with the string WEBVTT, and it must be
separated from the cues that follow by a blank line. Each cue is made
of a start time, an end time, and some descriptive text: subtitles,
translated dialogue, or a description of background audio. Below is an
example of dialogue from an excerpted clip of Nina Paley's Sita Sings
the Blues.
WEBVTT

00:00:05.000 --> 00:00:11.000
When? I don't remember what year. There's no year.
How do you know there's a year for that?

00:00:11.000 --> 00:00:12.800
I think they say the 14th century.
The timing line at the top of each cue marks its boundaries. Our first cue starts 5 seconds into the clip and ends at 11 seconds. During that time, the text below the timing line will appear on screen. Cues must be separated from each other by a blank line. Our next cue begins at 11 seconds and ends at 12.8 seconds.
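To make the timing format concrete, here is a minimal sketch of how a timing line could be turned into start and end times in seconds. It is not a full WebVTT parser: the function names are invented for this example, it does not range-check minutes or seconds, and it ignores any cue settings that follow the end time. The pattern accepts timestamps with or without an hours component.

```javascript
// One timestamp: optional hours, then mm:ss.ttt
const TIMESTAMP = /^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$/;

// Convert a timestamp such as "00:00:12.800" to seconds.
function timestampToSeconds(stamp) {
  const match = TIMESTAMP.exec(stamp);
  if (!match) throw new Error(`Invalid timestamp: ${stamp}`);
  // hours is undefined when omitted, so default it to "0"
  const [, hours = "0", minutes, seconds, millis] = match;
  return (
    Number(hours) * 3600 +
    Number(minutes) * 60 +
    Number(seconds) +
    Number(millis) / 1000
  );
}

// Split "start --> end" into numeric start and end times.
function parseTimingLine(line) {
  const parts = line.trim().split(/\s+-->\s+/);
  if (parts.length !== 2) throw new Error(`Invalid timing line: ${line}`);
  const start = timestampToSeconds(parts[0]);
  // Drop anything after the end timestamp (cue settings).
  const end = timestampToSeconds(parts[1].split(/\s+/)[0]);
  return { start, end };
}

const cue = parseTimingLine("00:00:11.000 --> 00:00:12.800");
console.log(cue.start, cue.end);
```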
Cues and CSS
In Chrome, Safari on iOS 7, and Opera 16+, we can style our cues using
CSS and the ::cue pseudo-element.
::cue {
  font: 18px / 1.5 verdana, sans-serif;
}
Firefox and Internet Explorer don't support this just yet. Firefox's
support for ::cue is in progress. I assume the same is true of Internet
Explorer.
Simple WebVTT Markup
WebVTT supports a subset of HTML tags and a few of its own elements. We
can bold or italicize text using <b> and <i> elements. It's also
possible to specify the language of a particular snippet of cue text
using the <lang> element.
Perhaps most useful is the ability to mark up different speakers using
voice spans, which are written with the <v> tag.
00:00:16.500 --> 00:00:20.499
<v Man1>That's when the Moguls were ruling. Babur was in India.</v>
<v Woman>The 11th then...</v>
Then we can style each voice by using the ::cue pseudo-element as a
function.
::cue(v[voice=Man1]) {
  color: #9f0;
}
::cue(v[voice=Woman]) {
  color: #ece;
}
Browser support for this is still a bit scattershot, though. Chromium
and its derivatives (Chrome and Opera) have the most robust support for
WebVTT features. Those browsers support most of WebVTT's tags, and allow
the most control over the appearance of captions and subtitles with CSS.
Chromium-based browsers even support using @font-face with WebVTT cues.
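If you want to know which voice names a file uses, for instance to decide which ::cue(v[voice=...]) rules to write, a small script can pull them out of the cue text. The sketch below is an illustration only: extractVoices is a made-up helper, and its simple pattern stops at the first nested tag, so text that contains <b> or <i> inside a voice span would be cut short.

```javascript
// Matches the start of a voice span such as <v Woman> or <v.loud Woman>
// and captures the voice name plus the plain text that follows it.
// The pattern does not require a closing </v>.
const VOICE_SPAN = /<v(?:\.[^\s>]+)?\s+([^>]+)>([^<]*)/g;

// Return every voice span found in a cue's text.
function extractVoices(cueText) {
  const spans = [];
  for (const match of cueText.matchAll(VOICE_SPAN)) {
    spans.push({ voice: match[1].trim(), text: match[2].trim() });
  }
  return spans;
}

const cueText = [
  "<v Man1>That's when the Moguls were ruling. Babur was in India.</v>",
  "<v Woman>The 11th then...</v>",
].join("\n");

for (const span of extractVoices(cueText)) {
  console.log(`${span.voice}: ${span.text}`);
}
```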
Using WebVTT with <track>
To use WebVTT with the track element, set the src attribute to the path
of your WebVTT file. By default, tracks are treated as subtitles. If
you'd like the browser to treat them as captions, set the kind
attribute to captions. Though a label isn't required, some browsers
(notably Internet Explorer) will display less-than-helpful defaults, so
give each track a descriptive label.
<track kind="captions" srclang="en-US" src="dialogue.vtt" label="English">
The srclang attribute is only required when kind="subtitles". Without
it, subtitles won't work. The value of srclang should be a BCP 47
language tag. We've included it here even though our track is a
captions track. Safari will prioritize the srclang as a label when both
are present.
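If you generate your markup from a script, the attribute rules above are easy to enforce in one place. Here is a minimal sketch, assuming a hypothetical buildTrackTag helper that returns the tag as a string. It treats a missing kind as subtitles and refuses to build a subtitles track without srclang.

```javascript
// Escape characters that would break a double-quoted attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Build a <track> tag from an options object (hypothetical helper).
function buildTrackTag({ src, kind = "subtitles", srclang, label }) {
  if (!src) throw new Error("src is required");
  if (kind === "subtitles" && !srclang) {
    throw new Error('srclang is required when kind="subtitles"');
  }
  // Attributes are emitted in this order; undefined ones are skipped.
  const attrs = { kind, srclang, src, label };
  const parts = Object.entries(attrs)
    .filter(([, value]) => value !== undefined)
    .map(([name, value]) => `${name}="${escapeAttr(value)}"`);
  return `<track ${parts.join(" ")}>`;
}

console.log(
  buildTrackTag({
    kind: "captions",
    srclang: "en-US",
    src: "dialogue.vtt",
    label: "English",
  })
);
```

The helper does not insist on a label, since it is optional, but given the browser defaults mentioned above you may want to require one in your own version.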
Most browsers support the track element for the video element only. For
audio, there are two options. Either:
- include a text transcript in the same document as your audio media; or
- serve audio files with the video tag
You can see how it all comes together in the related demo.
Want more?
I cover HTML5 audio and video, as well as the ins and outs of WebVTT, in Jump Start HTML5 Multimedia from Learnable and SitePoint.