
Music, Image Schemata and “The Hidden Art”

At the center of Kant’s first Critique lies the schematism. The problem Kant faces concerns the relationship between pure concepts of the understanding and empirical intuitions. “How,” Kant writes, “is the subsumption of intuitions under pure concepts, the application of a category to appearances, possible?”1 Because the pure concepts of the understanding are different in kind from empirical intuitions, Kant’s theory requires some “third thing” which is homogeneous with both concepts and intuitions. He proffered the schematism as the solution to this problem; for, “the schemata…are the true and sole conditions for providing [the pure concepts of the understanding] with a relation to objects, thus with significance.”2 Although Kant requires a homogeneous mediator if his theory is to remain intact, he is famously vague about the schematism’s nature and operation. In a notorious sentence, he was forced to admit, “This schematism of our understanding with regard to appearances and their form is a hidden art in the depths of the human soul, whose real modes of activity nature is hardly likely ever to allow us to discover…”3

Despite Kant’s inability to offer an explanatory mechanism, the schematism found a surprising reception in the work of philosopher Mark Johnson. Johnson often invokes research in cognitive science in order to defend a monist theory of the mind, where image schemata, derived from ongoing interaction between the organism and the environment, are posited as structures that organize and ground experience at an embodied, non-propositional, pre-conceptual level. In 1987, when Johnson elaborated his theory in The Body in the Mind, the schematism was explicitly invoked because Kant’s theory of the schematizing activity of the imagination, “offers us the most promising foundation for an adequate theory.”4 Although Johnson’s account ultimately diverges from Kant’s, a connection is intentionally drawn. He writes, “My use of the term [schema] derives from its original use as it was first elaborated by Immanuel Kant.”5

Even granting both the derivation of Johnson’s theory from Kant and its divergence from him, a problem remains. Johnson’s theory of image schemata, much like the Kantian schematism that inspired it, is compromised by a theoretical demand that relies on an inexplicable, but necessary, hidden art. Although Johnson’s task will no longer be the synthesis of concepts and intuitions in some homogeneous “third thing,” his account of image-schemata, which are instantiated in patterns of neural activation and claimed to structure and organize experience, relies on its own “hidden art.”

Rather than raise my objections at the general level of Johnson’s theory, I will follow Johnson into the realm of musical analysis in order to draw out my critique in a specific context. In Johnson’s most recent book, The Meaning of the Body, music is a privileged domain because of the way that music offers “exemplary cases of embodied, immanent meaning.”6 “When we turn to music…we will see just how much…embodied meaning is operating below the level of words and propositional content.”7 In particular, my critique focuses on the relation between description and explanation in Johnson’s account of musical meaning-making.

Johnson claims that image-schemata are relevant for shaping musical meaning. Generally, schemata are necessary because, “in order for us to have meaningful, connected experiences…there must be a pattern and order to our actions, perceptions and conceptions. A schema is a recurrent pattern, shape, and regularity in, or of, these ongoing ordering activities.”8 More specifically, certain kinds of schemata, which Johnson calls orientational metaphors, are useful for addressing a classic problem of musical perception: why do we refer to pitches as being high and low when frequencies are in reality fast and slow, and (moreover) our means of producing pitches may run completely counter to this conceptualization?

According to Johnson, orientational schemata, like the VERTICALITY schema, “arise from the fact that we have bodies of the sort we have and that they function as they do in our physical environment.”9 Thus, despite culturally distinct ways of applying this schema, orientational schemata are non-propositional, embodied, and fundamental in structuring concepts and experiences with respect to one another. According to music theorist Larry Zbikowski, whose work is methodologically indebted to Johnson, the VERTICALITY schema is “invoked by the various conceptual metaphors that use vertical space as a source domain through which to structure target domains…[such as] musical pitch.”10 The VERTICALITY schema is grasped “repeatedly, in thousands of perceptions” and reinforced daily in such experiences as, “perceiving a tree, our felt sense of standing upright, [or] the activity of climbing the stairs.”11 The pervasiveness of these experiences is intended to account for the pervasiveness of VERTICALITY as a structuring schema for pitch perception.

Figure 1. The VERTICALITY schema

The schema presented in Figure 1, used to map the source domain VERTICALITY onto the pitch domain, possesses three salient aspects that are not explicitly addressed by Johnson but are necessarily entailed in any perceptual correlation of pitch with verticality. The three aspects are:

  • First, one-dimensionality, in that pitches are structured in terms of a one-dimensional Euclidean line.
  • Second, unique correspondence, meaning that each pitch corresponds to a particular point on the line.
  • And third, the preservation of equivalence, meaning that pitches perceived to stand in the same psychological relation correspond to points separated by the same spatial distance.12

Johnson relies on all three of these aspects of the VERTICALITY schema when offering his first example of the process of “musical meaning-making”, the melody of “Over the Rainbow.”13

Figure 2. Arlen and Harburg, “Over the Rainbow,” opening phrase.

The first aspect, one-dimensionality, is invoked when Johnson describes the opening octave leap as a “move from the lower pitch to the higher pitch”;14 the second aspect, unique correspondence, is invoked when describing the motion of the melody back to the original, unique starting point, the lower E-flat. Finally, the third aspect, preservation of equivalence, is invoked when Johnson points out how measures 5 and 6, “structurally mirror the pattern of the opening two measures.”

Johnson adds one other aspect to the VERTICALITY schema: tension. Johnson claims, “The slide from ‘Some’ (E-flat) up to ‘where’ (the octave) creates a tension, the felt tension as we move from the lower pitch to the higher pitch and feel the strain and increased energy required to reach the higher note.”15 The addition of musical tension introduces the kinds of bodily entailments that Johnson wants, for not only do pitches move from low to high, they are felt to do so in a gravitational field where lifting an object requires work and where the potential energy stored in such efforts demands release. Johnson claims that the tension of the melody is not fully resolved until the end of the phrase, when the E-flat returns to its original position. Perhaps one could say that musical tension is being understood as a conceptual metaphor for the buildup and release of potential energy.

As an experiment, suppose I alter just one note of the melody, raising the low E-flat with which the melody begins by one semitone, to E-natural.

Figure 3. “Over the Rainbow,” with E-natural replacing E-flat.

In terms of distance, the opening interval is barely altered—reduced from 12 semitones to 11 (a reduction of roughly 8 percent). But listen to how the whole melody has been altered in terms of tension.
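The arithmetic behind this comparison can be sketched in a few lines. This is only an illustration under stated assumptions: the note numbers follow the MIDI convention and the register chosen for the opening E-flat is hypothetical, since neither is specified in the discussion above.

```python
# Illustrative sketch only. Note numbers follow the MIDI convention, and the
# register of the opening E-flat is a hypothetical choice for this example.
E_FLAT_LOW = 63      # the original opening note
E_NATURAL_LOW = 64   # the altered opening note, one semitone higher
E_FLAT_HIGH = 75     # the E-flat one octave above the original opening note


def interval_in_semitones(lower: int, upper: int) -> int:
    """Size of a leap measured purely as vertical distance."""
    return upper - lower


original_leap = interval_in_semitones(E_FLAT_LOW, E_FLAT_HIGH)
altered_leap = interval_in_semitones(E_NATURAL_LOW, E_FLAT_HIGH)

# Fraction by which the leap shrinks when the opening note is raised.
relative_reduction = (original_leap - altered_leap) / original_leap

print(original_leap, altered_leap)      # 12 11
print(f"{relative_reduction:.1%}")      # 8.3%
```

Measured as distance alone, the altered leap is simply a slightly smaller version of the original one.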

Yet this change in tension cannot be accounted for in terms of the VERTICALITY schema and its association of musical tension with potential energy. Listen to the opening interval in isolation.

We still have a leap, yet the leap is now tenser than before, although it requires less work to leap there. Moreover, you may now feel that the leap wants to resolve the tension by moving upwards (like so…).

Figure 4. Resolution of modified opening interval.

According to Johnson’s entailments, this opening interval should feel less tense because the potential energy has been decreased by one semitone. But that is not the case. Doesn’t the felt tension of the altered interval directly contradict the structuring role of the VERTICALITY schema, by contradicting thousands of everyday experiences of potential energy and gravitational pull? I would argue that melodic tension is not sufficiently explained by conceptual metaphors of verticality and entailments of gravitation alone.

Similarly, Johnson’s account of “Over the Rainbow” founders on the problem of octave equivalence. When I changed the opening note of the melody, I changed the opening interval from a perfect octave to a diminished octave (or, had I spelled it differently, a major seventh). There is something important in the fact that this alteration now makes the opening interval leap from pitch class E-natural to E-flat, rather than remain on the pitch class E-flat. In other words, the original opening leap has both sameness and difference about it: the pitch class (E-flat) remains the same but the register changes. Verticality alone does not capture this sense of sameness and difference. Moving upwards along a chromatic scale is not merely an ascent; the chromatic scale, when heard in relation to a tonic, produces the experience of circularity, or so it could be conceptualized. Perhaps we could say that our understanding of pitch chroma (which is the technical name for the phenomenon at issue here) is based on a CIRCULARITY schema, where a listener perceives pitches as if they move around a wheel, returning again and again to their starting point. Music psychologist Roger Shepard offers a more robust spatial model than the VERTICALITY schema alone, one that captures both the circularity of pitch chroma and the verticality of pitch height by combining both schemata into a simple helix.

Figure 5. A helix representing pitch height and chroma. (From Shepard 1982: 353.)
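A simple helix of this kind can be sketched as a mapping from pitches to points in three-dimensional space. The sketch below is not Shepard’s own formulation: the function names, the radius, and the rise per octave are hypothetical choices made only for illustration, and pitches are counted in semitones above an arbitrary reference note.

```python
import math


def helix_point(semitones: float, radius: float = 1.0,
                rise_per_octave: float = 1.0) -> tuple[float, float, float]:
    """Place a pitch on a helix: the angle encodes chroma, the z-axis encodes height.

    The radius and rise_per_octave values are arbitrary illustrative choices.
    """
    angle = 2 * math.pi * semitones / 12        # one full turn per octave
    x = radius * math.cos(angle)
    y = radius * math.sin(angle)
    z = rise_per_octave * semitones / 12        # steady climb with pitch height
    return (x, y, z)


def chroma_steps(a: int, b: int) -> int:
    """Shortest distance between two pitches around the chroma circle."""
    d = (a - b) % 12
    return min(d, 12 - d)


low = helix_point(0)        # the reference pitch
octave = helix_point(12)    # one octave higher: same chroma, greater height

# Directly above one another on the helix: x and y agree, only z differs.
print(math.isclose(low[0], octave[0], abs_tol=1e-9),
      math.isclose(low[1], octave[1], abs_tol=1e-9))   # True True
print(octave[2] - low[2])                               # 1.0

# An octave leap returns to the same chroma; an 11-semitone leap does not.
print(chroma_steps(0, 12), chroma_steps(1, 12))         # 0 1
```

On such a model two pitches an octave apart sit directly above one another, so their sameness of chroma and their difference of height are represented at once, which a single vertical line cannot do.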

I will return to Shepard in a moment, but not before noting one further insufficiency of Johnson’s account. When describing measures 3 and 4 of the opening phrase, Johnson understands the melody’s tension according to his gravitational model, so the repose on B-flat “resolves the tension somewhat, but not completely…”

Figure 6. “Over the Rainbow,” first half of opening phrase.

Suppose that I, acting as an arranger, wanted to improve the melody by slackening the tension a bit. According to Johnson’s reasoning, I might think that A-natural would be a better note upon which to repose than B-flat, because it is a semitone lower. Yet upon hearing it, I might be disappointed in the results.

Figure 7. “Over the Rainbow,” first half of first phrase, with A-natural replacing B-flat.

Johnson’s dependence on the VERTICALITY schema cannot explain why B-flat is a better note than A-natural, because the VERTICALITY schema and its gravitational theory of tension cannot account for the role played by fifth-relations in tonal music. In other words, it cannot account for the unusual fact that notes separated by the interval of a perfect fifth are perceived to be more closely bound together than notes separated by other intervals. Fifth-relations can be represented on another circle, like the pitch chroma circle, known commonly as the circle of fifths.
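The contrast between vertical distance and fifth-relatedness can be made concrete with a small sketch. It assumes the common convention of numbering pitch classes from C = 0; the function names are hypothetical and serve only this illustration.

```python
# Pitch classes numbered from C = 0 (a common convention, assumed here).
E_FLAT, A_NATURAL, B_FLAT = 3, 9, 10


def fifths_position(pitch_class: int) -> int:
    """Position of a pitch class on the circle of fifths, with C at 0.

    Multiplying by 7 (the size of a perfect fifth in semitones) modulo 12
    reorders the twelve pitch classes so that neighbours lie a fifth apart.
    """
    return (pitch_class * 7) % 12


def clock_distance(a: int, b: int) -> int:
    """Shortest number of steps between two positions on a 12-point circle."""
    d = (a - b) % 12
    return min(d, 12 - d)


def fifths_distance(pc_a: int, pc_b: int) -> int:
    """Number of fifth-steps separating two pitch classes."""
    return clock_distance(fifths_position(pc_a), fifths_position(pc_b))


# Vertical distance above E-flat, in semitones.
print(B_FLAT - E_FLAT, A_NATURAL - E_FLAT)       # 7 6

# Distance from E-flat around the circle of fifths.
print(fifths_distance(E_FLAT, B_FLAT))           # 1
print(fifths_distance(E_FLAT, A_NATURAL))        # 6
```

A-natural lies lower than B-flat, yet it sits at the farthest possible point from E-flat on the circle of fifths, while B-flat is an immediate neighbour.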

Again, Shepard offers a model. By combining the circularity of fifth-relations with the salient verticality of pitch height, Shepard offers another helix. (Note that the helix in Figure 8 is double because Shepard is tracing two pitches on opposite sides of the circle.)

Figure 8. A double helix representing pitch height and fifth-relations (From Shepard 1982: 362.)

This model captures the perceptual proximity of B-flat to E-flat, which are only one hour apart on the clockface, but this new model no longer represents the circularity of pitch chroma. To capture all the features addressed simultaneously (that is, verticality, chroma, and fifth-relatedness) requires a much more complicated, but much more robust, model. The first helix would get wrapped around the second, resulting in a helix wrapped around a helical cylinder, in five dimensions!

Figure 9. A helix wrapped around a helical cylinder in five dimensions, representing pitch height, chroma and fifth-relations. (From Shepard 1982: 364.)

Shepard’s final model captures the perceptual saliencies at work in “Over the Rainbow” that Johnson addresses, yet the proponent of image-schemata would hardly feel comfortable arguing that this model is embodied in the way the VERTICALITY schema is embodied. What kind of everyday experiences could ever support this five-dimensional model? However, as a spatial representation, it is far more accurate at modeling the relevant aspects of pitch perception that concern Johnson.


How would an image-schematic theorist respond to this challenge? He or she would likely respond by saying that, unlike Shepard, who is interested in representing as much of the phenomenology of pitch perception as possible in a single model, they have a different goal in mind. Their theory claims not that there is a single unified schema, but that cognition is based on multiple, overlapping and simultaneous embodied metaphors. Johnson argues that image-schematic theory depends on a pluralistic ontology of metaphors—that “typically there are multiple inconsistent metaphors for any given phenomenon…Each of these different, and often inconsistent, metaphorical structurings of a concept gives us different logics that we need in order to understand the richness and complexity of our experience.”16

I could agree with Johnson, if his claim were simply descriptive in character. That is, I would find nothing objectionable if he were only claiming that we require multiple, often inconsistent structures (or habits, or preferences, or norms) to describe musical works, given the overdetermined multiplicity of aspects inherent in even the simplest of musical phenomena.17 But Johnson’s claim is not intended to be descriptive; image-schematic theory is intended to explain how such experiences are conceptualized in the first place, i.e. how they are structured.

In Johnson’s original theory of 1987, he claimed the term schema was derived from the Kantian schematism. Like Kant, Johnson could not locate a mechanism for explaining his schemata; both placed all responsibility for their respective schemata under the governance of the imagination. But the story had changed by 2007, the year Johnson’s The Meaning of the Body was published. With the intervening rise of research in cognitive science, the imagination was discarded in favor of neural origins. Johnson often cites the work of Antonio Damasio to justify the neural basis of image-schemata. Although Damasio’s work addresses “images” and does not refer to image-schemata per se, Johnson elides this difference. At one point Johnson writes of his own work, “Image schemas appear to be realized as activation patterns (or ‘contours’) in human topological neural maps.”18 And later, when describing Damasio’s images, “images…are patterns of neural activation that result from ongoing interaction of organism and environment.”19 Thus, for Johnson, image-schemata, “do not so much ‘picture’ or ‘represent’ objects and events as they simply are the patterns of our experience of those objects and events. Consequently, when we talk about meaning in music, it will be in terms of the way auditory images and their relations evoke feeling-thinking responses in us.”20

To bring these statements to bear on the musical example above, one might reconstruct Johnson’s claims as follows: when we experience variations in pitch, those variations evoke feeling-thinking responses in us; in particular, those variations evoke patterns of neural activation that are also shared, or related to, or somehow associated with, other patterns of neural activation that are bound to experiences of verticality, such as “perceiving a tree, our felt sense of standing upright, [or] the activity of climbing the stairs.” But if the perception of pitch triggers the brain into habitual patterns that are associated with verticality, how does it pick or choose which parts of those total experiential patterns will be relevant? For instance, why is gravitational pull not consistently relevant to pitch perception, e.g., why can we alter one pitch and now defy all of those overwhelmingly repeated feelings and experiences? What mechanism constrains which features of the neural pattern are to be exploited and which are neglected? What makes one aspect of the source domain salient when transferred (or associated, or compared, or related…) and another one irrelevant?

As an explanation, Johnson’s theory is ad hoc. A critical reader is not offered principles for how distinct instances of neural activation patterns are related to one another, nor how such patterns can be responsible for justifying Johnson’s ostensible phenomenological evidence for cross-domain mapping. Rather, Johnson offers a hand-waving appeal to neural patterns as some sort of explanatory catch-all. If Johnson constrained image-schemata to operating as descriptions of musical works, I would be less troubled by the theory’s ad hoc character. For I could take that bit of description and compare it to my own understanding of the work. I could treat it as a claim, demanding a certain way of hearing some stretch of sound. And I could also reject this description as inaccurate, or proffer my own description. But, qua explanation, Johnson’s theory oversteps its legitimacy by claiming that it is an account of how that phenomenon is structured. For all of Johnson’s monistic revision of the Kantian schematism, in the end we are forced to accept a bit of the old hidden art: a mechanism, concealed in the depths of the human soul, demanded by the exigencies of the theory; a mysterious third term, which mediates the materiality of neural activation and the phenomenality of lived experience; and an explanation that amounts to little more than a solicitation of the reader’s faith.


1. Immanuel Kant, Critique of Pure Reason, tr. Norman Kemp Smith (Bedford: St. Martin’s Press, 1969), A138/B177.
2. Ibid., A146/B185.
3. Ibid., A141/B180.
4. Mark Johnson, The Body in the Mind (Chicago: University of Chicago Press, 1987), p. 140.
5. Ibid., 19.
6. Mark Johnson, The Meaning of the Body (Chicago: University of Chicago Press, 2007), p. 234.
7. Ibid.
8. Johnson, The Body in the Mind, p. 29.
9. George Lakoff and Mark Johnson, Metaphors We Live By (Chicago: University of Chicago Press, 1980), p. 14.
10. Lawrence M. Zbikowski, Conceptualizing Music (New York: Oxford University Press, 2002), 69.
11. Ibid., 68.
12. These aspects of the perception of musical height are borrowed from Roger Shepard, “Structural Representations of Musical Pitch,” in The Psychology of Music, ed. Diana Deutsch (New York: Academic Press, 1982).
13. Johnson, The Meaning of the Body, p. 239.
14. Ibid., 240.
15. Ibid.
16. Ibid., 258-9.
17. The complexity of Shepard’s model registers something of music’s perceptual complexity, and it must be noted that it only captures three aspects of pitch perception, to say nothing of harmonic motion, dissonance and consonance, duration, rhythm, meter, timbre, form, phrasing, genre, style, etc.
18. Johnson, The Meaning of the Body, p. 143, my emphasis.
19. Ibid., 243, my emphasis.
20. Ibid.