Forum 2: Discourses on Diegesis
On the Relevance of Terminology
To complete the section of this issue dedicated to the cinema proper, we have a forum addressing an ongoing debate regarding the continuing relevance of the term diegesis and its attendant distinctions between diegetic and non-diegetic sound. This forum arose out of various discussions that have taken place on the sound-article email list hosted by filmsound.org, the now legendary site created by Sven E. Carlsson which has become the internet’s foremost resource on matters of film sound. These discussions were often founded upon a basic split between theorists and practitioners, the latter generally feeling that this terminology is of little value in the actual making of films. Many such practitioners expressed the idea that such terminology is strictly the domain of academics who like to use elitist jargon while over-interpreting their objects of scrutiny. I can certainly understand where such sentiments might stem from. I have no problem believing that industry film sound people don’t use the terms diegetic and non-diegetic while working on their projects. And I certainly have no problem agreeing that some academics are guilty of throwing jargon around unnecessarily. Yet as an academic who also practices the art of sound recording and mixing (though admittedly on a non-professional level), I can’t help but think there is some value in holding onto the terminology founded upon the concept of the diegesis.
So I invited opinions on the matter from members of the sound-article list and beyond. The resulting forum presents five takes on this terminology and its usefulness to the theory and practice of film sound. Henry M. Taylor kicks things off by reminding us that the use of this terminology in film theory has been somewhat misappropriated from its origins, while ultimately conceding that its successful adaptation has earned it a vibrant place amongst film scholars, clearly illustrating its continued relevance. We then follow with three essays that point to interesting areas of film sound that this terminology helps us to flesh out: Martin F. Norden examines the role of diegetic sound as provider of narrative commentary, a role that belies its often perceived status as indifferent part of the story world; Mark Kerins asks how the auditory construction of the diegesis has changed in the era of multi-channel sound; and I postulate what value there might be in distinguishing between two kinds of diegetic sound not often discussed. And the last word goes to Academy Award-winning sound designer Randy Thom. He brings a view from the side of industry, taking a position that reflects the opinions held by many of the folks working as professional sound people: that this terminology simply isn’t needed when filmmakers discuss sound during the production process. However, he takes it one step further, offering up an intriguing hypothesis as to why the term is irrelevant when considering the actual uses to which film sound is so often put. As such, Thom crosses the boundary line between theorist and practitioner, ultimately suggesting that the reasons why sound designers don’t use these terms might well be the same reasons why theorists should put them to bed once and for all.
When all is said and done, this forum serves as an interesting view of the boundary that exists between film theorists and those who produce our objects of study. For some, these lines are not to be crossed. However, as many of the ideas presented in this forum suggest, there may yet be methods of reconciling the ways in which theorists and practitioners think about their work. In the end, perhaps issues of language need not stand in the way of celebrating those cinematic moments so enjoyed by theorists and practitioners alike.
The Success Story of a Misnomer
By Henry M. Taylor
Diegesis refers to narration, the content of the narrative, the fictional world as described inside the story. In film it refers to all that is really going on on-screen, that is, to fictional reality.
- Susan Hayward
Diegesis, and its various adjectival forms, diegetic, non-diegetic, meta-diegetic, homo-diegetic etc., have long functioned in literary and film studies as fetish or code words separating the cognoscenti of these disciplines from the general, “ignorant” public. To those on the outside, these terms must sound like cryptic jargon, a situation somewhat reminiscent of the mid-80s, when only the initiated knew what the acronym MS-DOS stood for. Yet it may come as something of a surprise that the highly successful term diegetic is really a misnomer. In the third book of The Republic (Politeia), Plato distinguishes between two kinds of narrative: the simple narrative (haple diegesis), featuring a narrator speaking directly in his own, undisguised voice, and mimesis, or imitative representation, in which the author speaks indirectly, i.e. through other characters. Plato is critical of imitation as imitation, regarding it as dangerous since it simply copies the appearance of the real, providing us only with reproductions of shadows; hence mimesis is an inferior, degraded form of storytelling. This is, of course, ironic, as Plato himself in his dialogues does not speak to us directly, but through the voice of Socrates.
All this changes with Plato’s pupil Aristotle, who, in his Poetics, recontextualizes and expands the significance of mimesis and mimetic narrative. Mimesis now does not reproduce reproductions (or shadows), but reality itself, and hence it is a first and not second order imitation. Reading Aristotle with Paul Ricœur (in the latter’s Time and Narrative), we could say that any kind of creation of poetic worlds is mimetic. Aristotle still adheres to Plato’s term diegesis, but reassigns it to the mode of mimesis: hence, while all narrative is mimesis in the wider sense, the simple or direct narrative (as in voice-over narration in film) is diegetic mimesis, whereas dramatic representations (of actors in a scene, for instance) are, strictly speaking, mimetic mimesis. Therefore, mimesis in Aristotle is the umbrella term designating all poetic representation.
Alas, for a variety of reasons, what should have been called mimesis and mimetic came to be designated by the terms diegesis and diegetic. In film studies, before the academic discipline was established as such, the French term diégèse was introduced around 1950 by Etienne and Anne Souriau, even if there is some dispute about its precise origin. According to Gérard Genette, it was Etienne Souriau who first used the term in 1948; whereas Anne Souriau claims in the Vocabulaire d’esthétique to have coined it herself in 1950; and David Bordwell in his Narration in the Fiction Film refers to Etienne Souriau’s introduction of the concept in a widely known 1953 publication, which is presumably appropriate regarding the expression’s historical reception. This dispute notwithstanding, the English terms diegesis and diegetic, referring to the spatial story worlds primarily of fictional texts/films, are translations of the French words diégèse and diégétique — the matter being complicated by the fact that Genette (aligning himself with Etienne Souriau) asserts that these terms are not derived from the Greek diegesis.
Of course, by now this terminology has been so well established that it would be futile not to use it in its accustomed sense. It has been particularly useful in designating aspects and features of filmic sound as it relates to the relatively closed story-worlds of fiction (regarding non-fiction, and the documentary in particular, the terms remain somewhat problematic). A variant that I first heard of in the early 90s, and that may have been coined at Zurich University’s film studies department, is the expression transdiegetic, referring to sound’s propensity to cross the border from the diegetic to the non-diegetic and to remain unspecific. A good example can be found in Francis Ford Coppola’s Apocalypse Now (USA 1979): when the PT boat crew member played by Laurence Fishburne turns up the radio playing the Stones song “Satisfaction,” the music (at first simply located in on-screen space) swells, thereby encompassing on- and off-screen space; a similar phenomenon can be observed in Kenneth Branagh’s Henry V (UK 1989), when, after the victorious battle at Agincourt, one of the English soldiers (played by the film’s composer, Patrick Doyle) starts singing “Non Nobis,” before the rousing chant is picked up by what is in effect a phantom choir and orchestra on the soundtrack. The term transdiegetic, therefore, reveals that filmic sound, unlike the image, is not place-specific and delimitated, and acts not solely to weld the images together (across shot edits), but also to engulf the spectator in the filmic experience. Plato would not have approved.
Aristotle. (1997). Poetics (Malcolm Heath, Trans.). Penguin Classics ed. Reprint. London: Penguin.
Bordwell, David. (1985). Narration in the Fiction Film. Madison, Wisconsin: University of Wisconsin Press.
Genette, Gérard. (1988). Narrative Discourse Revisited (Jane E. Lewin, Trans.). Ithaca, N.Y.: Cornell University Press. – Originally published as: Nouveau discours du récit. Paris: Seuil, 1983.
Hayward, Susan. (1996). Key Concepts in Cinema Studies. London: Routledge.
Plato. (1996). The Republic (Richard W. Sterling and William C. Scott, Trans.). Norton paperback ed. New York: Norton. – This ed. first published: 1985.
Ricœur, Paul. (1990). Time and Narrative, vol. 1 (Kathleen McLaughlin and David Pellauer, Trans.). Paperback ed. Reprint. Chicago: University of Chicago Press. – Previously published: 1984. Originally published as: Temps et récit, tome 1. Paris: Seuil, 1983.
Souriau, Etienne. (1990). Vocabulaire d’esthétique. Publié sous la direction de Anne Souriau. Paris: Presses Universitaires de France.
Diegetic Commentaries
By Martin F. Norden
As a professor who has taught intro-to-film classes for many years, I find critical and pedagogical value in maintaining the distinction between diegetic and non-diegetic sound, as blurry as that line often is. I also think, however, that assumptions about these two broad variants of sound usage sometimes interfere with our understanding of what filmmakers can actually do with them.
To illustrate this general point, let us first consider a fundamental observation about non-diegetic sound: that it provides commentary on a film’s characters, events, locations, etc., but only for the benefit of the audience. This commentary can be literal—as in the case of voice-over narration—but can also take the form of general mood music or specific musical “indicators.” Whether it’s the narration spoken by Morgan Freeman in War of the Worlds (2005), Jerry Goldsmith’s ominous strings-and-organ music in Seconds (1966), or John Williams’ bass-fiddle motif in Jaws (1975), non-diegetic sound places the audience in a privileged position: we are privy to information—often crucial information—of which the characters are frequently unaware.
Though I would argue that this assumption holds up reasonably well, it would be a mistake to conclude that diegetic sound cannot be used in a similar way. Indeed, there are a number of instances in which sound, audible to both characters and audience, conveys information that the characters may only be dimly aware of, if at all.
In some films, the commentary may be rather heavy-handed, such as the use of several Tammy Wynette songs to mirror the emotions of C&W fan Rayette DiPesto (Karen Black) in Five Easy Pieces (1970). Other “diegetic commentaries” are subtler and perhaps more rewarding. Consider, for instance, the scene in The Conversation (1974) in which professional wiretapper Harry Caul (Gene Hackman) lies on a bed in his workshop while listening to an audio recording of the titular conversation. He and the audience hear a woman’s taped voice (provided by Cindy Williams) commenting on a homeless man asleep on a park bench. At the moment the woman begins the phrase, “he was once somebody’s baby boy,” the film cuts to a shot of Harry, who now resembles the homeless man due to the way he’s photographed on the bed. As the camera slowly dollies in on the emotionally childlike protagonist, it’s apparent that the woman’s statement applies as much to him as to the man she’s observing. This image-sound juxtaposition, which occurs entirely within the film’s diegetic space, is a filmic utterance created solely for the audience. The Conversation offers further Harry-as-baby commentary when Caul awakens in a hotel room to the sound of a television set playing an episode of The Flintstones. The program’s visuals are obscured, but its characters are clearly talking about the impending birth of a child. Once again, Harry has little if any clue that this commentary applies, however obliquely, to him.
For this type of diegetic sound usage to work, the characters and the audience have to ascribe differing values to the things that they hear, just as people do in everyday life. The whistled tune in Fritz Lang’s M (1931) and the varied responses to it serve well as a case in point. To Hans Beckert (Peter Lorre), the murderer who whistles “In the Hall of the Mountain King,” it may simply be the product of a nervous habit; in fact, he may be unaware that he’s doing it. To his young victims, it’s a quirk of a seemingly kind gentleman. To the blind balloon seller, it’s the primary means of identifying the killer. To the film’s audience, the tune becomes a frisson-inducing motif; whenever we hear it, we know the murderer is about to strike again. Diegetic commentaries have the potential to be a more poignant use of sound than their non-diegetic counterparts, in that certain characters have a chance to learn something simultaneously with the audience. If these characters ignore the sound or downplay its significance, a particularly resonant form of dramatic irony often ensues.
Constructing the Diegesis in a Multi-Channel World
By Mark Kerins
On the rare occasions they pay attention to the aural portion of film, theorists often mention whether sounds are diegetic or non-diegetic. The continued use of these terms despite their limitations demonstrates that making this distinction has some value. Yet the focus on differentiating sounds based on their existence inside or outside of the diegesis seems to have overshadowed the question (perhaps more interesting in recent years) of how diegetic sounds are used. In particular, the last decade or so has seen a major shift in where diegetic sounds appear in relation to the screen and, indeed, how the diegesis itself is constructed.
In the monophonic era, all sounds — diegetic and otherwise — came from the screen. In the 1970s the widespread adoption of Dolby Stereo and its rear (or “surround”) channel allowed the diegetic space to spread out into the theater itself. Atmospheres and ambiences could envelop the audience, enhancing the aural illusion that the theater space itself had been replaced with an environment matching the one seen onscreen. Yet this stage of cinema required a curious disconnect between the aural space of the spectator and the diegetic space of the narrative world. Thanks to Dolby Stereo’s technological limitations on what types of sounds could be placed in each channel, only “non-essential” elements like room tone, backgrounds, and music could make their way out into the space of the theater. Other elements more crucial to the story, like dialogue, were forced to remain tightly anchored to the screen. So while the sounds of the diegetic world could theoretically envelop the audience, it was really only the “background” portions of that world that were given the freedom to leave the screen. The result was a filmic environment where just about all important sounds emerged from the same place as the image, regardless of where their sources were supposed to be located in relation to the world represented by the image.
With the adoption of digital surround systems (from 5.1 onward) as the exhibition standard in the 1990s, those rules were thrown out the window; filmmakers gained the ability to place and move any sounds throughout the space of the theater. For the first time, a coherent diegetic world can be constructed, where the location of sounds in space reflects their logical position with respect to the screen. Rather than the screen being the focus of both the eye and the ear, it now becomes merely a point of departure for the audience to understand the multi-dimensional aural world through their visual “window” into that space. The spatial component of the diegesis, in other words, is constructed by the multi-channel soundtrack, while the screen acts as a point of reference.
Some filmmakers have been afraid of the so-called “exit door effect,” where spot sounds in the surround channels distract the audience’s attention away from the screen. This can be particularly problematic if the sounds are interpreted by the audience as “non-diegetic,” such as when the sound of a door slamming in the surrounds is mistaken for the theater door. The result has been a hesitancy by some to abandon the old models of screen-centered soundtrack mixing. Nevertheless, the last ten years have seen a variety of films usher in this new model of diegesis creation by taking advantage of the new multi-channel systems to create complicated spaces primarily through sound. The opening battle scene of 1998’s Saving Private Ryan, for instance, places the audience right in the middle of the fray, with bullets whizzing and explosions erupting all through the theater space (including directly behind the audience). The emotional effect is striking – we feel as if we are in the battle, not merely watching it onscreen. 1999’s The Matrix frequently employs the same strategy, but relies even more heavily on its multi-channel soundtrack to create the narrative space. With several key sequences employing few wide shots and no establishing shots, we understand the location of key characters and other objects through the constantly shifting soundspace, which changes with nearly every cut to maintain a consistent spatial match between image and sound. Even less action-driven movies have found new ways to build diegetic spaces; Being John Malkovich (1999), to cite one example, uses complex multi-channel mixing to help us distinguish between diegetic, non-diegetic, and voiceover sound for the scenes within Malkovich’s head.
In the end, these films still use the idea of diegetic sound as a frame of reference, but they exploit this idea in new ways: to create the diegesis itself, and to move the movie from “what’s going on in front of us” to “what’s going on all around us.” From one perspective, then, the important question becomes not whether sounds are diegetic, but how both diegetic and non-diegetic sounds are used. This is not to diminish the use value of the diegetic / non-diegetic distinction, but to point out that today there is more to the relationship than mere nomenclature. In a multi-channel world, many (though not all) films are willing to let diegetic sounds spread out into the theater to create a more “complete” space.
But what about non-diegetic sounds, such as music and voiceover? Should they envelop the audience along with the diegetic sound, or should the theater space be left to the diegetic world? What about sounds like music that may start out as non-diegetic and then become diegetic — where should they be located? These are questions without clear-cut answers that at the moment are being addressed differently by different filmmakers. And they will only become more complicated as sound systems continue to evolve through the adoption of 6.1-channel and 7.1-channel systems, with even more complex arrays on the horizon. Perhaps only one thing can be said for sure: the concept of the “diegesis” will only grow richer as filmmakers experiment with new relationships between the onscreen image, the soundscape, and the filmic world.
Does Anybody Hear?
By Randolph Jordan
If a tree falls in the forest, does it make a sound? When Bart Simpson was asked this question by his sister Lisa as part of his Zen training for an upcoming miniature golf tournament, he answered by mimicking the sound of a falling tree crashing into its neighbors. “But Bart,” Lisa continued, “how can it make a sound if there’s nobody there to hear it?” Here Bart has a moment of clarity, and is now able to move to the next level of perception. Prior to this enlightenment, Bart seems never to have considered the idea that sound might only come into being in the presence of ears. This, of course, taps into phenomenological questions about the nature of perceived reality. It is also of some importance when considering the concept of diegetic sound.
When explaining the concept of diegetic sound to neophytes (students, neighbours, in-laws, etc.), the question that I have found most useful is this: could a character in the film hear the sound under discussion? If so, then it’s diegetic. After all, it’s easy for most of us to understand that while we watch Darth Vader striding through the halls of the Death Star, John Williams isn’t conducting an orchestra through the “Imperial Theme” just outside the frame (though the Imperials might well enjoy such a thing, and could certainly afford it). Yet here my use of the word could is crucial. To ask if a character could hear the sound is different from asking if this same character does hear the sound. The first scenario suggests sound that comes from a character’s environment, regardless of whether or not anyone is there to hear it. The second suggests that diegetic sound is that which is, in fact, heard by a character.
The cinema provides us with endless examples of environmental sound that exists solely for our benefit as an audience, with no ear-bearing creatures within earshot of the environment in question. By one way of thinking, such sounds might only be considered sounds because WE are there to hear them while seated in the theatre. We might accept this when considering non-diegetic sounds that are intended for our ears only. But within the diegesis, the world of the story, can sounds heard by nobody really be considered sounds at all? Surely this question is preposterous, for it is the possibility of these sounds being heard that constitutes their categorization as diegetic. Yet if we leave the question of defining sound out of the equation, might there be some use in distinguishing between diegetic sounds that are actually heard, and those that have only the potential to be heard?
I came to this question when presented with what seems to be a very unusual sonic moment from Gus Van Sant’s Elephant. Very early in the film, we meet photographer Eli just before he encounters a couple on the field next to his school and takes a few shots of them for his portfolio. As he walks we hear him sniffle as though afflicted with some light nasal congestion. At this point the camera is positioned at a distance roughly matched to the auditory perspective we are given. As he then takes his leave of the couple, the camera moves right to follow him briefly, then halts and remains still as we watch him recede into the distance. The sound of his footsteps diminishes accordingly until we are left alone with a few moments of ambient soundscape, grounding us in a firm congruence between the perspective of the camera and our auditory positioning. Yet just before the end of this shot, another nasal sniff is heard along with some light shuddered breathing: a sob. Who are we hearing here? The sound is close-miked, indicating that Eli might be wearing a lapel mic which continues to record him in close proximity even as he pulls farther away from the camera. Yet there are no other sounds of his breathing or clothing rustling to suggest this. So I find this moment disturbing, a gently ominous presence revealing itself to be just outside of view, yet which was not acknowledged by any of the characters previously in his or her vicinity. And we can wait the entire film for one of its signature temporal replays that show us the same events unfolding from different perspectives: the source of this sobbing is never revealed.
We can, however, search for its significance. Is it the sound of another student, seated on the field in tears, perhaps in some precognizant awareness of the horror about to unfold in the school? Or perhaps the tears are for some other personal turmoil, one of the many inner worlds into which the film grants little access, leaving us with no real sense of what motivates any of the characters we see on screen, much less the motivations for the shootings that take place shortly thereafter. Perhaps the shootings are equivalent to tears, pain externalized and made perceptible to others. The subtlety of rendering this gentle sobbing audible stands in stark contrast to the overt acts of violence that the film centers upon. This is a necessary contrast, indicating the tension between inside and outside that drives the film forward, culminating in a mad scramble amongst the students to find a way out of the building which has been turned inside out by the gunmen.
The gunshots are heard by everyone, but by then it’s too late. Did anyone hear the soft sounds of the mysterious sobber on the field? This question is facilitated by a distinction between the two categories of diegetic sound that I have discussed here. It is important that the actually heard and the potentially heard are both grounded within the diegesis, for it is the potential locked within the latter category that bears the most ethical weight when considering the case of Elephant. It seems to me that these are the fundamental questions of the film: Could the shootings have been stopped? Did anyone hear the pain of the two killers? Could anyone see the Elephant in the room? To my mind, the question of sound that has the potential to be heard, but which remains unheard within the diegesis, is of great importance when considering Van Sant’s film. In turn, the term diegesis remains an important part of this exercise, for without it the realm of unheard sound bleeds outward into areas that have no bearing on these questions. So I’ll end by reprising the question posed to Bart Simpson by turning to the words of Bruce Cockburn: “If a tree falls in the forest, does anybody hear?”
Acoustics of the Soul
By Randy Thom
In the thirty years of conversations I’ve had with co-workers on feature films in the USA and Britain, nobody has ever used the word diegetic except to deride it as an academic term of little practical use. I’ve never heard anyone ask a director a question like, “is this gunshot diegetic?” or “is this saxophone solo diegetic?” I suspect one reason is that there are more straightforward ways to ask such questions when working on a film: “Does Jim hear the gunshot?” “Does Angela hear the saxophone?”
But there may be a deeper and more interesting reason for sound practitioners to avoid using the word “diegetic.” I think it’s a term more appropriate for analyzing a film than for making one. Most filmmakers, whether they are directors, composers, or sound designers, are minimally analytical about their own work. We’ve all witnessed Q&A sessions with directors who, when asked a learned question about the theoretical foundation for a certain moment in one of their films, have said something to the effect of, “Interesting… I don’t know.”
Storytellers tend to live and work on gut feelings, intuition, and their own raw nerve endings. They thrive on finding new ways to use ambiguity to their advantage. If you ask them, “Does Angela hear the saxophone?” they are likely to say, “Maybe, what do you think?”
Most filmmakers simply don’t find it very interesting, and even less useful, to ask the question, “Is it theoretically possible for Angela to hear the saxophone when she is lying on her bed in the scene after the party?” The music mixer may ask whether the sax should be treated as “source” or “score” in order to know if it should be muffled and treated with artificial reverb to make it sound like it’s coming from an adjacent room (“source”), or if it should be played cleanly and crisply to give the impression that it is not coming from a place where it could be heard. The director will usually answer that question with a response like, “Play it as source.” Or, “Play it as score.” Or, “Try it halfway between.”
I have another, more radical suspicion: I think the question of whether a sound in a given scene is diegetic or not is often irrelevant to the effect the story has on its audience. I suspect that the audience intuits that Angela hears the saxophone regardless of whether or not it’s theoretically possible for her to do so. Not only does she hear it, but we hear it through her. It emanates from her. She is the saxophone at that moment. Any artificial reverb we may add to it in an attempt to make it sound like it’s coming from the hotel room down the hall will tend to be interpreted instead by the audience as the acoustics of Angela’s soul, making the question of the diegesis moot.
- The Success Story of a Misnomer by Henry M. Taylor
- Diegetic Commentaries by Martin F. Norden
- Constructing the Diegesis in a Multi-Channel World by Mark Kerins
- Does Anybody Hear? by Randolph Jordan
- Acoustics of the Soul by Randy Thom