Sounding Transnational Flows: Schizophonic Reflexivity in Takashi Miike’s The Bird People in China

by Randolph Jordan. Volume 21, Issue 3 / March 2017

Takashi Miike’s 1998 film The Bird People in China explores a realm of reflexivity that exposes the surface nature of the cinema while refusing to give way to overt self-referentiality. There are no men with movie cameras here, and we are far from the bait-and-switch anti-illusionism extolled by Alejandro Jodorowsky as he declares “move back camera!” from high atop his Holy Mountain. Rather, The Bird People in China builds an interpenetrating form and narrative that questions the role of mediation in our experience of the world, and posits a simultaneity of immersion and distanciation that places its spectators, like its characters, suspended somewhere between belief and disbelief in the world it creates both on and off screen. The main strategy Miike uses to achieve this effect is a gradual build-up to a fundamental disjunction between sound and image that ultimately calls attention to the artificiality of the boundaries we draw between diegetic and non-diegetic sound. It is along the boundary of the diegesis that the film’s reflexivity rests, and Miike succeeds in fashioning a film whose narrative drive towards a gentle blurring of the distinction between fantasy and reality is mired within a tale of complex cultural and temporal cross-pollination and mirrored by a formal treatment which lends itself well to a state of liminal experience. I’m going to call this sound-induced liminality between immersion and distanciation a “schizophonic reflexivity,” and here’s why.

The film’s essential narrative and formal exploration of sound/image disjunction is best described in terms of the concept of “schizophonia,” coined in the late 1960s by R. Murray Schafer, the founder of the World Soundscape Project at Simon Fraser University. In The Tuning of the World (1977) he uses the term schizophonia to refer to “the split between an original sound and its electroacoustical transmission or reproduction” (90). Schafer believes that the separation of a sound from its source through “unnatural” means has resulted in a psychological malaise akin to what Fredric Jameson would later describe as the schizophrenic breakdown in the signifying chain attributed to postmodern culture’s loss of historical context (1991, 27). Schafer’s version of this cultural critique suggests that such over-representation is akin to a desire to “transcend the present tense,” effectively blending the past with the present in what can be thought of as a Bergsonian dissolution of the virtual/actual split, a concept that would later become important to Gilles Deleuze in Bergsonism and the Cinema books (1986, 1988, 1989).

Schafer exhibits a clear bias towards the idea of a pre-industrial soundscape, one in which he supposes schizophonia could not exist. This bias is linked to his distaste for the idea of transcending the present tense. Yet the soundscapes that he would have us return to are a product of a distant past that we can only glimpse under certain conditions in today’s world. He relies heavily on written ear-witness testimony from times past, and has himself pioneered the use of recording technology for the purposes of documenting, analyzing, and ultimately hoping to preserve certain of today’s changing soundscapes. Schafer’s line of thinking thus exhibits an incongruity between his appeal to a time other than the present and his engagement with various modes of representation within his own work. However, this incongruity is only apparent if schizophonia retains the negative connotations that he intended.

What is crucial to note about Schafer’s concept of schizophonia is that it is based on the idea that a represented soundscape can effectively replace an existing soundscape. The idea here is that the listener loses grounding within the context of the listening environment and enters the time and place of the recording rather than that in which the recording is transmitted. While something approaching this space-replacement model of schizophonic experience (Jordan 2007) might exist in very controlled circumstances, such as acoustically treated recording studios, movie theatres, or the use of headphones, the idea is essentially impossible in the context of our general experience of represented sound within existing soundscapes. No reproduced soundscape can ever fully replace the pre-existing soundscape of the place in which it is being transmitted. What can happen, however, is a layering effect whereby the soundscape of a given place is mixed with a represented soundscape, thus causing an interplay between the two that calls attention to itself as such. Our awareness of the way a space should sound creates an awareness of how it sounds differently in the face of represented sound. This is a simultaneity of immersion and distanciation, and this is how I think the idea of schizophonia is best applied as a conceptual tool for examining such simultaneity in the world, particularly within the sonic arts. It will be our level of awareness that dictates the extent to which we experience schizophonia as a doubling effect. Our ability to recognize this doubling might help keep us grounded in our environments, or it might bring us that much further out of our comfort zones and towards a feeling of space-replacement. Most often, however, the experience of schizophonia will be a mixture of both. It is with this understanding of schizophonia that I will be exploring its manifestation within Miike’s film.

The Bird People in China is a great starting point for such a discussion as it intertwines problems in the methodology of Schafer’s soundscape project with similar problems that exist in ideas about permeable borders between cultures and, just as importantly, between the past and the present. The ancient and the contemporary co-exist rather seamlessly in the little Chinese village where the Bird People reside, a village that turns out to be far more cosmopolitan than one would initially suspect. Of particular importance here is that the permeability of cultural and temporal borders is represented in the film, at least in part, through the use of sound and the inclusion of sound reproduction technologies within the film’s narrative. It is in these uses of sound and sound technologies that the film’s reflexive strategies reside, and these will be exposed as we sort through the conclusion the film draws with respect to issues related to the idea of schizophonia.

Transcending the Present Tense

The film’s theme of juxtaposing different times and cultures is apparent right from the outset, as is the film’s strategy of juxtaposing seemingly contradictory sounds and images. Our first image in the film is of a cave painting depicting a winged anthropoid of some kind, followed by a shot of a man in contemporary clothing with cloth wings strapped to his back. Then the sound of a jet plane is heard, first accompanied by a shot of one of the cloth wings passing across the frame, followed by a commercial airliner landing at an airport. The cave painting is suggestive of the ancient; the airplane is clearly modern; and the man is caught somewhere between the two. In these few shots the entire narrative drive of the film is established along with its connection to the film’s formal strategies. The sound of the jet juxtaposed with the cloth wing presents a discontinuity that seems initially deceptive. But this moment serves as the beginning of a story in which apparent discontinuities are revealed to be the very fabric of a fully formed dissolution of cultural and temporal boundaries which is made manifest in the small Chinese village at the heart of the film. A past of long ago will come face to face with contemporary experience, a meeting that will explore its own mediation through technologies of sound reproduction and come to the conclusion that the deception that Schafer suggests is at the heart of schizophonia may not be such a bad thing after all.

The man with the wings is Wada, a young Japanese businessman living in a major urban center. At the beginning of the film he sets out to investigate possible jade deposits in the remote mountains of China’s Yun Nan province. He carries a portable tape recorder wherever he goes and has a propensity towards recording both his own musings and the sounds of his external environments as he travels.

He is followed by Ujiie, a Yakuza hoping to muscle in on the potential jade fortune in order to collect a debt owed by the company for which Wada works. Upon arriving in their destination village, they discover that the villagers operate a school that supposedly teaches children to fly on home-made wings (though no evidence of actual flight is suggested until late in the film). They also discover that the area is, in fact, rich with jade.

Torn between his duty as a gangster and his desire to leave the life, Ujiie decides that he must not let foreign influences destroy the village and its traditions. But the villagers are excited about the possibility of wealth finding its way to their homes, a wealth epitomized by the coming of electricity. A middle ground must be found, and Ujiie ends up staying in the village to act as a mediator between the villagers and outside forces so as to allow them some modernization while not entirely sacrificing their traditional ways.

Mediating Modernity

Ujiie’s role as mediator can be understood in terms of a more general concept of mediation represented by the role that sound reproduction technology plays in the narrative of the film. The primary subplot involves Wada’s quest to understand the lyrics to a song sung by a young woman who heads the village school and who remains unnamed throughout the film. Both Wada and Ujiie recognize the tune she sings, but the words are difficult to comprehend because they are in English, a second language to both men. To make matters worse, the words are being sung by someone who doesn’t understand the language herself. Wada entices the woman to record her song on his tape recorder so that he can set to work transcribing each word with the help of his computer’s dictionary. Alas, his recorder’s batteries are failing and he isn’t able to complete his task before the end comes.

Wada’s desire to access the past through its representation, both in the woman’s song and his recording of it, raises important questions about the idea of understanding history through its mediated manifestation in the present. As we’ll discover, this song is appropriately at the heart of a set of relationships between cultural and temporal contexts, as well as offering a decisive insight into the artificiality of the distinctions we make between diegetic and non-diegetic sound in the discourse of film sound theory.

Wada and Ujiie discover that the song was taught to the young woman by her grandfather: a pilot in the British air force who crash-landed in the region during the Second World War. The tail of his plane rises out of a local pond like a crucifix watching over the village. She tells them that belief in the ability to fly had existed since ancient times but was lost in the recent past, only to be rekindled by the appearance of her grandfather out of the sky. This ancient tradition thus finds rebirth as the result of contact with the contemporary, a fact nicely illustrated by the rock that serves as the headstone for the pilot’s burial plot, on which we find a cruciform mark representing the tail of the aircraft next to the mark of the winged being that we saw in the film’s opening moments.

The richness of the symbolic markings in the film becomes even more apparent in light of an early scene in which the travelers meet a young Japanese man before they arrive at the village. He carries a photo of the cave painting that we see in the film’s opening shot and later on the pilot’s headstone. We learn from this man that it is an image that has been found at both the northern and southern tips of Japan. He has come to see if he can find it in the Yun Nan region of China because it is here, he says, that many believe the roots of Japanese culture are to be found. And so the plot thickens, the little village’s status as a nexus of intercultural relationships increasing all the while.

This element of the story illustrates the complicated forces at work in understanding the history of the village. The fact that the young man has come in search of the roots of Japan in another country, combined with the idea that the ancient village, whose traditional ways must be preserved, is in part a product of foreign influence during wartime, suggests that the idea of the village as isolated in space and time is far from the truth. Though Wada attempts to sort out some of this complex lineage with the help of his recording device, we soon find that his efforts are as unstable as the workings of memory itself.

Sounding Borders

The problems of accessing other times and cultures through technologies of representation are illustrated throughout the film in Miike’s distinctions between diegetic and non-diegetic sound, particularly with respect to the woman’s voice in song. There are three basic treatments of the woman’s voice throughout the film. There is the diegetic sound of her singing while we see her on screen, characterized by what Rick Altman would call its “spatial signature”; there is the sound of her singing in playback on the tape, a category of sound that Michel Chion refers to as “on-the-air”; and there is the non-diegetic sound of her voice which is generally pristine and uncharacterized by the particular qualities of the diegetic and/or on-the-air versions.

The idea of on-the-air sound is particularly important. In Audio-Vision (1994) Chion describes on-the-air sounds as “sounds in a scene that are supposedly transmitted electronically…by radio, telephone, amplification, and so on – sounds that consequently are not subject to ‘natural’ mechanical laws of sound propagation” (76). Chion argues that on-the-air sound, especially in the case of music, is interesting because it “can transcend or blur the zones of onscreen, offscreen, and nondiegetic” (77). It can also play with our understanding of time in the film. Does on-the-air sound in film “refer to the time of its production or to the time at which we are hearing it?” (77). Chion tells us that the answer to this question lies in the emphasis of a sound’s particularities: does it sound as though we are hearing it at the time of its production, or in the time of a reproduction? Does the sound indicate we are hearing its initial source or its terminal source? (77)

In “The Material Heterogeneity of Recorded Sound” (1992), Altman similarly stresses the importance of paying attention to sound’s “discursive dimensions” when dealing with film, the reality that sound is “multiple, complex, heterogeneous, and three-dimensional” (16). For Altman, every sound is a unique narrative event that can contain great significance when understood correctly. He calls for a narrative analysis of sound based on what the sound can tell us about itself. One of the main characteristics a sound will have is its “spatial signature,” a set of properties that tells us in what kind of context the sound is being produced and consumed (24). To complicate matters, Altman notes that sound recordings also carry signatures of their own, “some record of the recording process, superimposed on the sound event itself” (26). On-the-air sound has the possibility to be removed from the contexts of both time and space because of its sonic qualities of being electronically reproduced, qualities that involve an emphasis on recording signature and a de-emphasis on spatial signature. The capacity for on-the-air sound to blur our understanding of time and space in film is an essential point of consideration for the machinations of historical representation in Miike’s film.

When Wada and Ujiie first hear the woman singing, the voice is presented as though it is being delivered in the space represented on screen. Then we cut to a different location at a different point in time but her singing continues uninterrupted, a clear sign that it has become non-diegetic. This is a technique generally used in film to bridge a change in setting, as it is here. Interestingly, this technique suggests schizophonia, for we begin with a sound whose source is visible on screen, and are then moved away from the source while the sound carries on.

This very common audiovisual strategy is interesting when considered in the context of later scenes which complicate this basic distinction between the inside and outside of the diegesis. In one key scene, Wada records the girl singing. In the middle of the song we cut to him listening to the recording at a later time, complete with an on-the-air quality to the sound that suggests its mediated nature. The difference between this example and the previous one is that here the mediation is clearly identified within the narrative, whereas the mediation provided by the cinematic apparatus necessary for non-diegetic music is meant to be kept hidden, as in the earlier scene.

The importance of these two scenes is that they set up formal treatments of sound that will later become convoluted, just as the distinctions between cultural and temporal contexts do. This convolution can be understood as a product of Wada’s experience of the woman’s voice becoming increasingly represented. As he hears it on the tape, transcribes it into written language, and compares it with his memory of the original that he has heard in the past, her voice becomes ever more disembodied, just as the song for her has become abstracted from the stranded pilot who passed it on to her village. She essentially acts as a technology of sound reproduction, which removes the song another step from its original context in much the same way as Wada’s tape recorder.

Spatio-Temporal Slippage

The increasingly represented experience of the woman’s voice culminates in a scene that finds Wada standing outside one of the village huts at night. He turns his head suddenly as though hearing something, but there is no corresponding sound to be heard on the soundtrack. Shortly after this, however, we begin to hear the woman’s voice singing with the pristine characteristic now associated with non-diegetic sound. Its status as non-diegetic seems to be confirmed by the gradual emergence of accompanying instrumentation. So, the situation is a little bizarre. Wada has apparently heard something and ventured off to see what it is. But the sound that he should be hearing doesn’t start on the soundtrack at the time that he turns his head, nor does it sound as though it should be diegetic. Thus we have the makings of a schizophonic moment that takes the idea further than we have encountered thus far, and offers potential new interpretations of the more mundane crossing of the diegetic/non-diegetic boundary line found in the earlier examples.

After walking some distance, Wada catches sight of the woman singing while seated next to the village lute player. So it seems that she is singing within range of Wada’s hearing after all, and the presence of the other musician lends some credibility to the other instruments that we hear. However, the stringed accompaniment on the soundtrack is not that of a single lute, and her lip movement is clearly not in sync with the voice that we hear. This is not a simple case of poor post-production sound work. Given that such discrepancies exist nowhere else in the film, Miike is deliberately confusing the categories of diegetic and non-diegetic sound. This confusion culminates later in the scene when the image actually turns to slow motion and becomes a lyrical representation of the act of singing that now has even less synchretic correspondence with the sound of her voice. Wada holds up his recorder to capture the sound, but its little red battery light is fading and we are shown close-ups of the tape struggling. In mid-song the recorder finally dies, but Wada continues the recording gesture anyway in a move that suggests the medium’s abilities to transcend its own limitations.

This is a moment where sound slips off the image in a way that creates neither a total break from the diegesis nor full confidence in it. We see her singing and we hear the sound of her singing. But these two events are taking place in different times and spaces. Indeed, the non-diegetic sound of her voice suggests that it takes place outside of time and space altogether, just as the song’s relationship to the village is at the center of a complex historical conflation of the ancient and the contemporary. The categories of diegetic, non-diegetic, and on-the-air sound all converge in this moment in the film, and our experience as spectators may well be similar to that of Wada. Perhaps the woman is not actually singing the same song that she has been throughout the rest of the film. We can’t be sure because we are not given access to the sound of the diegesis at this point. Perhaps the non-diegetic presentation of the song suggests Wada’s obsession with it, playing it over and over in his mind as well as on his tape recorder. The sound is not treated with an on-the-air quality, but it does offer the kind of freedom from the ordinary conditions of film sound that Chion suggests is one of the key potential characteristics of sound emanating from technologies of representation. And here Schafer would agree, for it is this unbounded quality of on-the-air sound that lies at the heart of the schizophonia that he decries.


The failure of the tape recorder in this scene suggests the inevitable failure of all representation to properly provide access to that which it represents, another downfall of schizophonic experience. Yet representation is the only means we have to access that which is not within our direct experience, so its limitations become an inextricable part of the fabric of this very experience. Continuing the recording gesture after the batteries die is a positive move that suggests the medium’s abilities to transcend its own limitations, a faith in technology that Schafer would likely never be able to muster.

It is in this spirit of faith that the film’s ambiguous concluding shot becomes important: we finally get to see the villagers caught in an act of interspecies mediation between the Earth and the sky, circling their highest peak on wings made of cloth. We don’t know if this image is grounded in the context of the village’s material reality, or if it is a representation of the village spirit breaking free of its earthly constraints. This lack of certainty corresponds with the instability that Schafer finds in the realm of the schizophonic, the point illustrated by Wada and his tape recorder all along. But in these final frames the feeling is overwhelmingly positive, suggesting an upside to the kind of unbound freedom that Schafer so laments. Finally, this is a schizophonically reflexive moment, for it plays with the idea of suspension of disbelief. We can choose for ourselves whether or not to make the suspension, but I suggest that neither choice will have the effect of remaining fully immersed in the film or becoming fully distanced from it. We will hover in our own act of possible flight, occupying the space between immersion and distance which has been delineated by the line of the diegesis that Miike has sought to render with transparent opacity.

Works Cited

Altman, Rick. 1992. “The Material Heterogeneity of Recorded Sound.” Sound Theory, Sound Practice. Rick Altman, ed. New York: Routledge.

Chion, Michel. 1994. Audio-Vision: Sound on Screen. Claudia Gorbman, trans. New York: Columbia UP.

Deleuze, Gilles. 1988. Bergsonism. Hugh Tomlinson and Barbara Habberjam, trans. New York: Zone Books.

__________. 1989. Cinema 2: The Time-Image. Hugh Tomlinson and Robert Galeta, trans. Minneapolis: University of Minnesota Press.

Jameson, Fredric. 1991. Postmodernism, or, The Cultural Logic of Late Capitalism. Durham: Duke University Press.

Jordan, Randolph. 2007. “Film Sound, Acoustic Ecology, and Performance in Electroacoustic Music.” In Music, Sound and Multi-Media. Jamie Sexton, ed. Edinburgh: Edinburgh University Press, 121-141.

Schafer, R. Murray. 1977. The Tuning of the World. Toronto: McClelland and Stewart.


Randolph Jordan is a Montreal-based film scholar, educator, and multimedia practitioner. He earned his first and second cycle degrees in philosophy and film studies respectively before completing his Ph.D. in the interdisciplinary Humanities program at Concordia University. His ongoing research examines the intersection between acoustic ecology and film sound theory/practice, and he has recently completed a postdoctoral research fellowship in the School of Communication at Simon Fraser University where he was investigating geographical specificity in Vancouver-based film and media by way of sound studies and critical geography. His research has been published widely and his photographs, films and sound works have been exhibited internationally. He is now writing a book for Oxford University Press provisionally titled, Reflective Audioviewing: An Acoustic Ecology of the Cinema (expected 2016). He has been covering Montreal film, music and new media festivals for Offscreen since 2001.
