Monday, November 8, 2010

Voice Acting and Games

The addition of voice acting into games is pretty much standard nowadays for big titles. Players expect characters to be voiced, and moreover, they expect the people delivering those voiced lines not to deliver hackneyed performances. The gaming community is no longer content to put up with overacting or flat line delivery. And beware the collective opinion of your game's audience should it have a small number of voice actors like Oblivion.

We can safely say that voiced games are not going to disappear any time soon, and if anything, they are going to become more popular and prevalent. Now that games are attracting famous actors to provide VO work for titles, acting seems set to become increasingly important. Given my gaming preference for RPGs, this post could easily head towards the topic of Dragon Age 2 and the effect that a voiced protagonist will have on the game compared to Origins. However, that opens a whole different can of worms that I don't want to play with right now.

Hawke, Dragon Age 2's voiced protagonist

Instead, let's focus on why voiced characters have become so popular and what they add to games. With the realistic graphics of modern games modelling characters, their movements and facial expressions accurately, it is only natural that a player will want to hear these characters speak as well. Compare the impact of a static character portrait with a line of text next to it against the scenario of watching a character's face close up as they deliver every word with matching facial and tonal emotion. Games are giving us a cinematic experience almost parallel to movies at times, so the impact of those visual scenes would fall flat if characters were not voiced.

It has been suggested that when communicating with people in person, 55% of communication is non-verbal (facial cues, body language, gestures, etc), 38% voice quality (tone, speed, etc), and 7% the actual words. If a game gives us only text for character dialogue, then we are potentially missing around 93% of the context that would accompany those words in real life. If we are able to see characters within a game in full 3D, complete with animations, facial expressions and the tone of their voice, then our interaction with that character more closely mimics our real-life expectations, which increases the player's level of engagement. Anything that brings characters to life makes the interactive experience of a game more engrossing, which is the designer's ultimate aim.

A face can speak a thousand words

While the benefits of voicing are clear, is it always necessary? Can games get away without voice-over work and use other methods to convey their story? Would a game's budget, in terms of both money and disk space, be better spent providing additional playable content for the player to experience rather than voice-over work for a reduced amount of material? There is something to be said for games that decide to ditch VO for the sake of additional content, communicating entirely via written text. The potential increase would be significant once the costs of hiring a studio, recording, re-recording, mastering, lip-syncing, facial animation/capture, animations, etc., are all taken into account. Providing voice content is not cheap. Moreover, if dialogue is delivered badly, it can actually be more grating than if we were just to read the lines as text. Bad voice acting or continually repeated voice cues or lines can quickly aggravate the player. Virtually all players of S.T.A.L.K.E.R. quickly became annoyed by the line "Get out of here, stalker."

The question that remains to be asked is simple: Is it commercially viable to create games with large amounts of text? Planescape: Torment did not achieve significant commercial success despite critical acclaim and its transition into RPG "cult classic" status. Games are not books, and removing the audio-visual cues that games borrow from cinematic presentation greatly reduces their potential impact. A game could describe a scene with text on a black screen, or simply present the scene and deliver it in a matter of seconds.

Is it easier to describe or see this scene?

This is not to say that text cannot be used effectively in games, but if it is, then a balance must be struck. Anything that the player has to interact with arguably must be seen, because they will have to see it while playing the game. Pure text can be used to deliver other information, whether observations to augment the visuals, backstory delivered through the game's user interface (as opposed to in-game elements) or any in-game elements that would not be voiced. If a player finds a book or paper in the game world, then allow them to read it. Plain text can be used to enhance the believability of the game environment every bit as much as voice acting can.


  1. I don't really disagree in a Bioware-esque context, but your post reads like there's a choice between V/O and lots of text. That's only true if you have a lot of dialogue, which is far from necessary. There are superb RPGs with very little dialogue (the Etrian Odyssey series where your intimacy is with the environment rather than characters is a great example), and games with strong story that do not rely on conversation (Ico/Shadow of the Colossus spring to mind).

    If anything, full voice belongs to a niche - a mature, AAA niche, but largely restricted to in-home core gaming. It's not an expectation in the huge social/iOS/DS markets, and even PSP games frequently go "partially voiced".

  2. "Compare the impact of a static character portrait with a line of text next to it against the scenario of watching a character's face close up as they deliver every word with matching facial and tonal emotion"
    Admittedly, I look to be in the minority but I just don't get this. How does a character living in the same environment and talked to in the same way by the different people assume different roles - you know, making this a role-playing game? Are all characters' 'roles' inherently decided before one starts the game and then, it is all a matter of how you can influence the game rather than having both give and take to/from the story/environment/characters?

    When I see that static character portrait and a text dialogue, I imagine how the character will say it - based on the portrait, the past interactions with him/her and other NPCs and then, base my choice off of that.
    In a textual context, a simple 'Oh?' by an NPC can be construed in different ways (based, again, on past interactions and other factors) which can influence your future decisions. That is never possible with fully voiced NPCs.
    With voiced characters, this goes a step further, IMO, in alienating the character. Irrespective of which era/region Dragon Age (or any other game) is set in, when I play, I like to think of my dialogue lines as I would speak them - not hear how someone else would speak them. I seriously don't understand how this lends even an iota of extra immersion into the game or expand on the role-playing possibilities - unless you speak similar to the voiced character.

  3. Jye: I'd agree that full VO is currently somewhat niche, but any time there's cinematic presentation, players (somewhat rightfully) expect VO. Imagine GTA IV, or Red Dead Redemption without voice overs given their presentation. It's a matter of matching presentation and audio - if the visuals are cinematic, the audio should support that.

    Timelord: That is indeed an interesting point. I'll reiterate the call to steer clear of discussion regarding a voiced protagonist, as that's a complex and thorny subject when we're talking about roleplaying games. However, if we just limit the discussion to NPCs that the player talks to, it's still very interesting. I was going to post a response in the comments here, but it turned out my response was long enough that it pretty much warranted an entirely new blog post...

  4. "It has been suggested that when communicating with people in person, 55% of communication is non-verbal (facial cues, body language, gestures, etc), 38% voice quality (tone, speed, etc), and 7% the actual words."

    Solution: Simlish. 93% as good.