Chapter 3
Music-Cultural Evolution
in the Light of Memetics

‘But it isn’t Easy’ [to make up a Pooh song about Owl’s old house], said Pooh to himself, as he looked at what had once been Owl’s House. ‘Because Poetry and Hums aren’t things which you get, they’re things which get you. And all you can do is to go where they can find you’. – Winnie the Pooh. (Milne & Shepard, 2016, p. 146, emphasis in the original)

3.1 Introduction: Cultural Replicators, Vehicles and Hierarchies

One of the most difficult conceptual leaps to be made when understanding music in an evolutionary context is to move from considering – as Chapter 1 and Chapter 2 have done – the evolution of humans as musical creatures and the associated role of music in our individual development and daily lives, to considering the evolution of music itself. As will be argued in Chapter 4 , the concept of evolution has played a largely metaphorical role in scholarly discourses on music, but my aim in this chapter is to take music’s relationship with evolution literally. That is, I consider here the evolution of music itself from a systemic standpoint, arguing that its changes over time are driven by the same evolutionary forces, those of the VRS algorithm, that have driven evolution in the natural world (Jan, 2007). In this sense, I am again adopting a Universal Darwinian standpoint (§1.5.1 ), arguing that there is no meaningful distinction, on an algorithmic level, between biological evolution – as manifested, for instance, in the difference between a Flutist wren (Microcerculus ustulatus) and a Superb lyrebird (Menura novaehollandiae) – and cultural evolution – as manifested, for instance, in the difference between the style of Mozart and that of Beethoven (no avian analogy intended).

The main focus of this chapter will be upon memetics, as the leading candidate theory of cultural evolution (see Blackmore, 1999 for an overview and Dennett, 2017, Ch. 11 for rebuttals of criticisms). While the adherents of various theories of cultural evolution assert that there is clear blue water between them, for such a theory to be truly Darwinian – as memetics most certainly is – it would have to cleave to the notion that cultures evolve because they implement the VRS algorithm; that is, they change as a consequence of the variation, replication and selection of particulate chunks of cultural information. To this principle, memetics adds the epidemiological notion of cultural information moving through communities like a bacterium or virus (§3.4.1). Harari, for instance, argues that

[e]ver more scholars see cultures as a kind of mental infection or parasite, with humans as its unwitting host.… [Analogously to organic parasites,] cultural ideas live inside the minds of humans. They multiply and spread from one host to another, occasionally weakening the hosts and sometimes even killing them.… [C]ultures are mental parasites that emerge accidentally, and thereafter take advantage of all people infected by them. (Harari, 2014, p. 242)

When engaging with memetics it is important not to accept the potential limitation of its scope that has arisen in recent years. As Figure 3.1a indicates, in contemporary popular and internet culture a meme has been reduced to the status of a comic image with a large-font caption, one usually mocking the hapless target of the latest online faux-outrage (Shifman, 2013). As Figure 3.1b indicates, even so-called “music” memes bear little relationship to the replicated sound patterns that are discussed in this chapter. While such images certainly testify to the infective power of memes – this considerably augmented in the digital world (§7.6.1) – to regard them as the only entities that exemplify the cultural replicator would significantly limit the scope and subtlety of Dawkins’ (1989) original concept, which covers phenomena of great diversity. In music, a meme can encompass any replicable entity, from a short three-note pattern, to a structural archetype hidden from immediate perception but engendered by more tractable lower-level patterns, to an abstract idea for manipulating a particular class of musical patterns (what might be termed a “musico-operational/procedural meme” (Jan, 2011b, pp. 242–243)).

[Figure 3.1: Internet Memes. (a) Cat Meme. (b) “Music” Meme.]

This chapter continues by addressing the question of why cultural replicators are required in the first place (§3.2), arguing that biological replicators alone are insufficient to explain the origin and complexity of human musics. It then looks at certain precursor theories to memetics, in order to identify common threads in cultural-evolution models (§3.3). Thereafter, it turns to memetics itself, exploring certain key themes pertinent to the understanding of music and musicality as well as to other cultural forms (§3.4). The next section looks at certain specifically musical issues from the perspective of memetics (§3.5). The following section returns to the issue of taxonomy covered in §1.7, attempting to extend certain principles of cladistic taxonomy to music-cultural evolution (§3.6). Having covered music-memetic evolution, the issue of dual-replicator coevolution is addressed next, in order to explore how genes and memes affect each other’s evolutionary opportunities (§3.7). Lastly, the chapter returns to the issue of music and language (co)evolution, exploring how semantics and syntax might have arisen from memetic processes and how these dimensions of music and language might be implemented in the brain (§3.8).

3.2 Why the Need for Cultural Replicators?

One can answer the question at the head of this section by referring back to the quotation by Harari on bee societies (page 8). In bee and many other insect species, most behaviours are genetically, not culturally, transmitted. In extreme cases, if an interconnected sequence of behaviours is interrupted, the animal will repeat the sequence of actions from the start, mechanistically. As a famous example, certain wasps of the genus Sphex deposit their prey (usually a paralysed insect) at the entrance of their nest and then enter the nest in order to check it. If the prey is moved away (by a human experimenter), the wasp will move the prey back to the nest entrance, but will also repeat the nest-inspection behaviour, a pattern that seems to be replicable ad infinitum. Indeed, the creature is truly enslaved by its genes – more specifically, by the patterns of behaviour-generating neuronal firing those genes motivate – to the extent that it is difficult to speak of it possessing any free will. This condition is aptly termed, after these gene-shackled wasps, “sphexishness” by Hofstadter (1985, p. 529).

To speak of free will is to presume, if not a consciousness capable of self-reflection (§7.3 ), then at least a capacity for weighing up options and deciding upon alternative courses of action. While such decision-making can also be genetically determined – by means of hard-wired option-choice circuits – much of it in humans is driven by learning (nurture) rather than instinct (nature). That is, decisions are based on ideas of utility and correctness which, while they generally correlate with the genetic “good”, are ultimately cultural, not biological. As summarised in Table 3.1 , Dennett (1995) expands upon this notion, identifying four categories of “creature” that occupy concentric circles of increasingly smaller magnitude (see also Dennett, 2017, pp. 98–99). These represent a progression from the application of the VRS algorithm in the domain of nature towards its application in that of culture.

Creature | Attributes
Darwinian | Those subject to the operation of the VRS algorithm, operating (only) upon genes (Dennett, 1995, p. 374, Fig. 13.1).
Skinnerian (after ideas of B. F. Skinner) | Those endowed with “conditionable (phenotypic) plasticity”, such that operant conditioning (or instrumental conditioning) – a form of the VRS algorithm that acts upon genetically controlled behaviours – reinforces (i.e., favours for future deployment) actions that, on testing, result in benefits to the organism (1995, pp. 374–375, Fig. 13.2).⁸⁹
Popperian (after ideas of Karl Popper) | Those possessing an evolutionarily designed internal (virtual) selective environment able to preview and mentally pre-test candidate actions in order to determine, without risk, which would be most advantageous to deploy in specific real-world situations (1995, pp. 375–377, Fig. 13.3).
Gregorian (after ideas of the psychologist Richard Gregory) | Those whose internal selective environment is able to draw upon culturally transmitted information, such as tool design/use and language (itself a higher-order, cognitive tool), in previewing and mentally pre-testing candidate actions (1995, pp. 377–378, Fig. 13.4).⁹⁰

Table 3.1: Dennett’s Four Types of Creature.

These creatures broadly correspond with, and are indeed products of, what Plotkin and Odling-Smee (1981) term the “four levels of evolution”, characterised by different modes of “information gain and storage” (1981, p. 229). These are: (i) the level of the gene, where “the site of [information] storage is a population’s gene pool”, changes in gene frequencies being a function of interactions between phenotypes and environments (giving rise to Darwinian creatures) (1981, pp. 228–229); (ii) the level of “variable epigenesis”,⁹¹ where phenotypes are modifiable during epigenesis by environmental factors, leading to polymorphism, i.e., alternative-track phenotypes driven by specific alleles (giving rise to Skinnerian creatures) (1981, p. 229); (iii) the level of the “learning phenotype”, where an individual is capable of transcending its inherited genetic information – and thus of solving the “uncertain futures” problem (Plotkin & Odling-Smee, 1981, p. 230; Plotkin, 1995, p. 144) – by acquiring additional, non-genetic, information via learning over the course of its lifespan, but where this information is confined to that individual (giving rise to Popperian creatures) (1981, pp. 229–230); and (iv) the level of “sociocultural” evolution, where non-genetic information acquired by an individual via learning can additionally be (memetically) transmitted to others (giving rise to Gregorian creatures) (1981, pp. 230–231).⁹²

Gregorian creatures ostensibly have the greatest survival advantage, for not only do they have millennia of evolutionarily wired survival knowledge from their Darwinian, Skinnerian and Popperian heritage – Dennett’s “Smart Moves” (1995, p. 374) – they can also draw upon various culturally transmitted tools for survival and problem-solving. In this sense a coevolutionary perspective (§3.7 ) is needed to understand them – to understand us, given that we are the prime exemplar of this creature on earth – one that attempts to reconcile gene with meme and nature with nurture, or at least to hypothesise which might have the upper hand in any particular context. As summarised in Table 3.2 , the interactions between these two domains have been modelled by four main theories, broadly in terms of dominance hierarchies.

Discipline | Privileged Dimension | Precepts
Sociobiology (E. O. Wilson, 2000) | nature/gene | Culture is on the “leash” of the genes and serves adaptation; gene-based natural selection is all-powerful.
Evolutionary Psychology (Pinker, 1997) | nature/gene | Culture is determined and constrained by genetically evolved psychological predispositions.
Gene-Culture Coevolutionary Theory (GCC) (Boyd & Richerson, 1985) | nature/gene and culture (re-produced cultural information) | Human behaviour is the result of subtle interactions between genes and inherited cultural information.
Memetics (Blackmore, 1999) | culture/meme | Culture is transmitted by memes that are partially independent of genes and sometimes in control of them.

Table 3.2: Four Perspectives on Nature and Culture.

Moving down Table 3.2, the four theories shift from a gene-centred to a meme-centred orientation. The extremes are demarcated by Wilson’s famous dictum that “[t]he genes hold culture on a leash. The leash is very long, but inevitably [cultural] values will be constrained in accordance with their effects on the human gene pool” (1978, p. 167), and Blackmore’s theory of memetic drive (1999), whereby meme replication is hypothesised to have shaped human genetic-cognitive development in ever more imitative and culture-fostering directions (§3.7.1). The via media is perhaps best represented by Richerson and Boyd (2005, pp. 237–238), who, paraphrasing Dobzhansky (1973), assert that “nothing about culture makes sense except in the light of [biological and cultural] evolution”.

As will be explored more fully in Chapter 4, the existence of cultural replicators is alluded to in the musicological literature, although rarely in explicitly evolutionary terms. One manifestation of this awareness is the idea of the composer ab/extracting a lexicon of patterns by exposure to the music of his/her culture and assortatively recombining elements of this lexicon in order to create “new” music (Ratner, 1970). A flavour of this tendency is given by Mattheson in his Der vollkommene Capellmeister of 1739 (Mattheson & Harriss, 1981), when he asserts that “[t]he composer, through much experience and attentive listening to good work, must have assembled something now and then on modulations, little turns, clever events, pleasant passages and transitions, which, though they are only isolated items, nevertheless could produce usual and whole things through appropriate combination” (1981, p. 283, para. 15; see also Ledbetter, 2013).

While it is necessary to be sensitive to the cultural situatedness of this view – the eighteenth century is a time when discussion of assortative recombination as a compositional principle reaches its zenith – it is arguably broadly applicable to most if not all human musics. This is on account of the fact, discussed in §2.7.6, that gestalt segmentation forces in conjunction with the limitations of STM will – from both a poietic and an esthesic standpoint, as Nattiez (1990) would frame it – impose strong (evolutionary-psychological) pressures in favour of music’s existing as discrete particles. The latter attribute, together with the tendency of the VRS algorithm to “feed upon” such particles, means that a purely sphexish explanation is both inadequate and unnecessary to explain the richness and diversity of human musics.

3.3 Pre- and Proto-Memetic Theories of Cultural Evolution

Given that the evolution of music – as distinct from the evolution of musicality – relies upon our status as Gregorian creatures, it is useful briefly to review the history of the concept of the cultural replicator, before examining in more detail what such a notion can offer to our understanding of music. The following subsections consider, necessarily selectively, three key stages in the development of cultural replicator theory since the early-twentieth century, seeing them as stepping-stones towards the modern theory of memetics. These theories generally focus on replication in verbal culture, but their precepts are applicable in principle to any medium of culture, including the visual and the sonic.

3.3.1 The Mneme

Dawkins maintained that the name for his cultural replicator, the meme, arose from a contraction of “mimeme” (Dawkins, 1989, p. 192), itself derived from mimeisthai (μιμεισθαι; to imitate) (Laurent, 1999, p. 1). Laurent argues that a “more straightforward source” for “meme” is “mneme”, which he maintains derives from mimneskesthai (μιμνεσκεσθαι; to remember), and which is related to Mnemosyne (Μνημοσυνη), the Greek goddess of memory (1999, p. 1). Laurent locates an appearance of “mneme” in Maeterlinck’s entomological study The life of the white ant of 1927 (Maeterlinck, 1927).⁹³ He notes that the white ant (i.e., the termite) is regularly referred to by Dawkins (see, for example, Dawkins, 1989, p. 171; Dawkins, 2006, p. 151), and hypothesises that this may have influenced Dawkins’ development of the term “meme” (Laurent, 1999, p. 1).

Before Maeterlinck (and indeed Marais), however, and at the turn of the twentieth century, the German zoologist Richard Semon was also using the term “Mneme” (Semon, 1909; Semon, 1911; Semon, 1921; Semon et al., 1923). Despite the seemingly different etymology of Dawkins’ “meme” (mimeisthai–mimeme–meme) and Semon’s “Mneme” (mimneskesthai/Mnemosyne–Mneme), the concepts are broadly similar. That is, both refer to a particulate unit of information that is stored in an organic form – in the substance of the brain. Dawkins makes this clear when he says – drawing on ideas of Delius (1989, 1991) (see also §3.8.3 ) – that memes are “self-replicating brain structures, actual patterns of neuronal wiring-up that reconstitute themselves in one brain after another” (1989, p. 323). This formulation aligns with Semon’s belief that the experiences undergone by an organism lead to the formation of memory traces – engrams – that record the event and that can subsequently be re-activated. As Semon explains,

I use the word engram to denote this permanent change wrought by a stimulus; the sum of such engrams in an organism may be called its ‘engram-store’, among which we must distinguish inherited from acquired engrams. The phenomena resulting from the existence of one or more engrams in an organism I describe as mnemic phenomena. The totality of the mnemic potentialities of an organism is its ‘Mneme’. (Semon, 1921, p. 24, emphasis in the original)

Aside from the fact that Semon is using the term Mneme here to refer not to a single stimulus-driven memory change but to the totality of an organism’s engrams (i.e., what I term the memome; Table 1.3 ), there is a more significant difference between Semon’s and Dawkins’ conceptions. This is the former’s Lamarckian belief that such memory structures can be transmitted biologically, from one generation to another – his “inherited engrams” – as well as culturally, from one person to another – his “acquired engrams”. Dawkins, by contrast, maintains that memes are not transmitted biologically, but only culturally; and that the latter process is Darwinian, not Lamarckian. There are, nevertheless, what might be termed epimemetic complications relating to this point, discussed in §3.4.3 .

3.3.2 Evolutionary Epistemology

Although nineteenth-century commentators – even before the publication of the Origin of species – made the connection between the development of living things and the growth of human intellectual constructs, Donald Campbell, developing ideas of Karl Popper’s, was arguably the first to set such speculations on a firm footing (Popper, 1959; Campbell, 1960; Campbell, 1965; Campbell, 1974; Campbell, 1990). One of Campbell’s important early contributions was to distinguish clearly between a number of contrasting approaches to the application of evolutionary theory to human culture. These fall into two broad categories.

The first category is concerned with the “interaction of culture and social organization with man’s biological evolution” (Campbell, 1965, p. 19), which Campbell subdivides into: (i) “genetic influence upon culture” (1965, p. 19), in which cultural change is a manifestation of processes occurring at the genetic level; and (ii) its converse, “cultural influence upon genetics” (1965, p. 20), in which genes are affected by cultural changes. The second category is the most pertinent here, being concerned with “socio-cultural evolution of socio-cultural forms independent of changes in genetic stock” (1965, p. 20). This second category is also subdivided, into: (i) a number of “theories descriptive of the facts and course of socio-cultural evolution” (1965, p. 21); and (ii) a “theory descriptive of the process of evolution: variation and selective retention” (1965, p. 22). It is this latter principle – variation and selective retention (the latter essentially a form of replication) – that forms the basis of Campbell’s application of biological models to cultural change.

Asserting that this “evolutionary epistemology” is grounded on the “psychological and epistemological point that all processes leading to expansions of knowledge involve a blind-variation-and-selective-retention [“BVSR”] process” (Campbell, 1960, p. 397) – note the attenuation of agency and intentionality implied by the adjective “blind” – Campbell takes the mechanism of evolution by natural selection and applies it directly to the growth of human culture. He identifies that “[t]hree conditions are necessary: a mechanism for introducing variation, a consistent selection process, and a mechanism for preserving and reproducing the selected variations” (1960, p. 381). This closely parallels Dennett’s, Calvin’s and Plotkin’s summaries of evolution given in §1.5.1, echoing their articulation of the three terms of the VRS algorithm. As the VRS algorithm (g-t-r) in another guise, BVSR represents the same fundamental paradigm – subsumed under the aegis of Universal Darwinism (§1.5) – that underpins all increases in complexity in the universe.
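Campbell’s three conditions lend themselves to a brief computational illustration. The following Python sketch is a toy under stated assumptions, not a model drawn from Campbell or from the memetics literature: patterns are represented as tuples of pitch classes, the selection criterion (FAVOURED, a hypothetical pattern standing in for whatever pressures a real cultural environment exerts) is invented for the purpose, and all function names are my own. Variation is “blind” in the sense that blind_variation never consults the selection criterion.

```python
import random

# Hypothetical favoured pattern (pitch classes); an assumption for illustration only.
FAVOURED = (0, 2, 4, 5, 7)


def score(pattern):
    """Toy selection criterion: number of positions agreeing with FAVOURED."""
    return sum(1 for a, b in zip(pattern, FAVOURED) if a == b)


def blind_variation(pattern, rng):
    """Variation: alter one randomly chosen position without consulting score()."""
    position = rng.randrange(len(pattern))
    varied = list(pattern)
    varied[position] = rng.randrange(12)
    return tuple(varied)


def bvsr_generation(population, rng, keep=0.5):
    """One cycle of selection, retention/replication and blind variation."""
    ranked = sorted(population, key=score, reverse=True)
    n_keep = max(1, int(len(ranked) * keep))
    retained = ranked[:n_keep]                      # selective retention
    copies = [rng.choice(retained)                  # replication
              for _ in range(len(population) - n_keep)]
    variants = [blind_variation(c, rng) for c in copies]  # blind variation
    return retained + variants


def run(generations=50, size=20, seed=0):
    """Return the final population and the best score after each generation."""
    rng = random.Random(seed)
    population = [tuple(rng.randrange(12) for _ in FAVOURED) for _ in range(size)]
    history = []
    for _ in range(generations):
        population = bvsr_generation(population, rng)
        history.append(max(score(p) for p in population))
    return population, history


population, history = run()
```

Because the best-ranked patterns are retained unchanged in each generation, the highest score in the population can never fall from one generation to the next, which is the ratchet-like accumulation that selective retention implies.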

While Campbell’s illustrations – in keeping with their Popperian foundations – often focus upon the growth of verbally mediated scientific knowledge, any human conceptual system that can sustain complex mental constructs, irrespective of medium or symbolic system, is amenable in principle to evolutionary-epistemological processes.⁹⁴ Moreover, in emphasising the blindness of the process, Campbell foregrounds the lack of agency and intentionality – at best, the golden serendipity; at worst the hapless fumbling – that very often attends the inception of insights in both the scientific and the artistic realms, and that has a direct parallel in biological evolution’s lack of “strategic” long-term goals (Dawkins, 2006).

Lastly, understanding Campbell’s model in terms of the VRS algorithm challenges Sereno’s assertion that evolutionary epistemology is an example of the organism/concept analogy (1991, p. 476) (§1.6.2). This is because replicators, and not vehicles, are subject to the operation of the VRS algorithm (§1.6.1), and thus evolutionary epistemology’s focus upon discrete units of blind variation and selective retention – single ideas, albeit often organised into complexes – implies that the gene (as replicator), not the organism (as vehicle), is the appropriate analogue to the particulate unit of knowledge.

3.3.3 Cultural Ethology

Asking “is a cultural ethology possible?”, Cloak (1975) anticipated many of Dawkins’ (1989) precepts of memetics, and aspects of its later development by others. These precepts include: (i) the digital nature of cultural information, which Cloak maintained exists as “tiny, unrelated snippets” (1975, p. 167), or “corpuscles of culture” (1975, p. 168); (ii) a distinction between (in memetic terms) the memomic and the phemotypic forms of a meme, in Cloak’s terms between “specific interneural instructions culturally transmitted from generation to generation” and their material products, or between “i[nstruction, internal]-culture” and “m[aterial, external]-culture” (1975, pp. 167–168); (iii) the control of m-culture by i-culture in order to foster the latter’s replication (“the natural selection of instructions”) (1975, p. 169); (iv) the assembly (or co-replication) of units of cultural information to form complexes, or “cooperating cultural instructions” (1975, p. 169); and (v) the view that a unit of i-culture is “more analogous to a viral or bacterial gene than to a gene of the carrier’s own genome”, so is at best symbiotic with and, at worst, parasitic upon, its human “hosts” (1975, p. 172).

Central to Cloak’s thesis is the idea (point (iii)) that the human behaviour (leading to the production of m-culture artefacts) that is the concern of (cultural) ethology is controlled by replicators – corpuscles of (i-)culture – in ways that foster their replication. In a manner that is directly analogous to gene-based natural selection, Cloak argues that,

[a]s a system of instructions [i.e., a memeplex] proliferates in a given environmental subregion, its several instantiations come into ‘constructive’ competition with each other. Any instantiation of the system which is fortuitously modified – usually by the acquisition of a novel component instruction – so that the m-culture feature it produces is better able to help determine the occurrence of the whole set in certain locations will often thereby exclude the other instantiations from surviving or propagating in those locations. Then it is only a matter of time before the modified instantiation becomes typical of the system. As this competition process is repeated, of course, the system becomes more complex and, as a rule, the m-culture feature becomes more elaborate and more ‘powerful’ in terms of its particular environmental effects. (Cloak, 1975, p. 169)

Of course, to equate a unit of cultural information with a “corpuscle” is to align it with a cell and not, as Dawkins proposed, with a sub-cellular molecule (a gene; level seven of Table 1.4 ). Nevertheless, the reference is presumably metaphorical, being made to stress the indivisible, particulate nature of cultural inheritance: Cloak implies that, like “genetically programmed instructions”, the units of cultural replication are “fixed and discontinuous”, not “plastic [and] continuously variable” (1975, p. 166). Thus, the fundamental units of cultural information are the “specific interneural instructions” referred to in point (ii) above.

3.4 Key Issues in Memetics

For all their different origins, the pre- and proto-memetic theories of culture outlined in §3.3 have several features in common, generally hypothesising a particulate basis for culture in which variant forms of units arise quasi-randomly and are selected according to some set of (conscious or unconscious) criteria for further replication. In this sense, memetics – to the extent that it has been theorised – is not fundamentally different from its precursor theories. It does, however, appear to have greater traction, certainly in popular culture, compared with its antecedents. This is perhaps the result of Dawkins’ wise formulation of the word “meme” as an analogue of “gene” (§3.3.1), and of the word’s arguably considerable sonorous appeal and concision (and, for francophones, its similarity to “même”). In this sense, the acceptance of Dawkinsian memetics is not necessarily the result of its intrinsically greater explanatory power compared with, for example, Cloak’s (1975) hypothesis. Rather, it arises, at least in part, from the kinds of cultural-saliency effects memetics predicts, this salience to some extent serving to validate the theory itself. In short, the “‘meme’ meme” (Costall, 1991) is a good replicator; the rest of the theory of memetics – the wider verbal-conceptual memeplex – piggybacks on the selfishness of this “index” term. In this section, I consider three aspects of memetics that seem key to the idea of cultural replicators, illustrating some aspects of them by reference to musical examples.

3.4.1 Qualitative versus Quantitative Memetics

Memetics celebrated its fortieth birthday in 2016, if the publication of the first edition of Dawkins’ The selfish gene (Dawkins, 1989) is taken to be the inception of this particular incarnation of cultural replicator theory. How high is its intellectual capital at the time of writing, and how has this changed over the last four decades? Perhaps a more tractable question might be: “how widely replicated is the ‘meme’ meme and what might this tell us about the esteem in which memetics is (or is not) held”? Of course, any current salience of the term does not necessarily mean that memetics is an established academic discipline, nor, more importantly, that it necessarily captures some or all of the truth. Indeed, repeated citations of a term might indicate attempts to bury it, rather than to praise it, as Mark Antony might have said.⁹⁵ Nevertheless, one way of measuring its changing impact, if not its veracity, is by tracking citations of terms such as “meme(s)” and “memetic(s)” (Jan, 2015a, pp. 71–72, Fig. 2). These occurrences serve as markers of the “meme” meme – as noted above, it is strictly a verbal-conceptual memeplex, indexed by “meme” – in the sense that their appearance is normally correlated with expositions, discussions and critiques – and even endorsements – of the concept(s) encompassed by the memeplex.

The justification for undertaking such tracking is that, as a verbal-conceptual memeplex, memetics is as subject to the operation of the VRS algorithm as any other memeplex. Tracking citations explicitly measures the “R” element of the algorithm and implicitly captures the “S” element. The “V” element is not directly measurable using such approaches, because the search terms are, as noted, merely markers of the larger memeplex and do not evidence internal structural changes within it – these occurring by means, as Cloak (1975, p. 169) would have it, of “the acquisition of a novel component instruction”. Only more detailed study of such sub-terms of the memeplex can allow one to track changes in its wider complexion and structure over time. The Mark-Antony caveat notwithstanding, selection is often a marker of some level of acceptance of the concept selected.
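The measurement of the “R” element can be made concrete with a simple tally. The Python sketch below counts, year by year, the records in which a marker term appears; the records, the field names and the function term_counts_by_year are hypothetical placeholders of my own and do not reflect the structure of any actual database export. Matching is by case-insensitive substring, so that a search for “memetic” also captures “memetics”.

```python
from collections import Counter

# Invented toy records; titles, fields and years are placeholders, not real publications.
RECORDS = [
    {"year": 1996, "title": "Toy record A: a memetic reading", "abstract": "", "keywords": ""},
    {"year": 1998, "title": "Toy record B", "abstract": "Remarks on memetics", "keywords": ""},
    {"year": 1998, "title": "Toy record C", "abstract": "", "keywords": "coevolution"},
    {"year": 1998, "title": "Toy record D", "abstract": "", "keywords": "Memetic drive"},
]


def term_counts_by_year(records, term):
    """Count, per year, the records whose title, abstract or keywords contain `term`."""
    needle = term.lower()
    counts = Counter()
    for record in records:
        text = " ".join(
            record.get(field, "") for field in ("title", "abstract", "keywords")
        ).lower()
        if needle in text:
            counts[record["year"]] += 1
    # Return a plain dict in chronological order.
    return dict(sorted(counts.items()))


print(term_counts_by_year(RECORDS, "memetic"))  # {1996: 1, 1998: 2}
```

Such a tally registers only that the marker was replicated; as noted above, it says nothing about whether a given occurrence endorses or rejects the concept, nor about variation within the memeplex.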

To illustrate how this tracking might be accomplished, Figure 3.2 shows a visual representation of the chronological and conceptual-spatial distribution of a subset of publications containing the term “memetic” – in their title, abstract, keywords and (crucially) their references – from 1980–2020 listed in the Scopus research database (Scopus, 2020) and generated by the CiteSpace citation-analysis/visualisation software (Chen, 2019b; Chen & Song, 2019).⁹⁶ CiteSpace

is designed to answer questions about a knowledge domain …. A knowledge domain is typically represented by a set of bibliographic records of relevant publications.… CiteSpace is designed to make it easy … to answer questions about the structure and dynamics of a knowledge domain[, such as] …: What are the major areas of research based on the input dataset? How are these major areas connected, i.e., through which specific articles? Where are the most active areas? What is each major area about? Which/where are the key papers for a given area? Are there critical transitions in the history of the development of the field? Where are the ‘turning points’? The design of CiteSpace is inspired by Thomas Kuhn’s [The] Structure of Scientific Revolutions [(Kuhn, 2012)⁹⁷]. The central idea is that centers of research focus change over time, sometime incrementally and other times drastically. The development of science can be traced by studying their footprints revealed by scholarly publications. (Chen, 2014, p. 4)

CiteSpace essentially maps the forms of conceptual transmission described by the epidemiological “virus-of-the-mind” (Brodie, 1996) and “thought-contagion” (Lynch, 1996) formulations common in the memetics literature of the 1990s (see also Rosati et al., 2021). By “a visual representation of the chronological and conceptual-spatial distribution” in the paragraph before the quotation above is meant a depiction of the cultural-transmission relationships between sources dealing with the chosen concept and the groupings they form. Sources are termed “nodes” in CiteSpace, and are represented by small coloured dots in the “visualisations” it generates. Groupings are termed “clusters”, and are represented by collections of nodes of varying density connected by coloured lines emanating from one or two central nodes, the latter being identified by associated author-date citations. Clusters therefore arise when certain relatively discrete, highly interconnected constellations of nodes develop as a result of their drawing upon one or two seminal (highly-cited) nodes at their notional “centre”, creating a network of many citers connected to few citees. In this sense, “[e]ach cluster corresponds to an underlying theme, a topic, or a line of research” (Chen, 2020, sec. 4.2).

From a Darwinian perspective, the connections binding together clusters essentially trace replication relationships from intellectual antecedents to their consequents. To map these epistemological spaces, clusters are identified by a number (starting at “#0”, in descending order of cluster size) and a verbal label, these being associated with one or two node labels identifying the most important sources in each cluster. Cluster labels are generated by CiteSpace using title, index/keyword, or abstract terms, utilising specific statistical-weighting models.98 Cluster #0 in Figure 3.2 , for instance, represents sources linked by the noun-phrase “evolutionary ecology” and its cognates, the analysis extracting this label using a log-likelihood ratio (LLR) distribution from node-titles (other statistical-analysis methodologies may alternatively be utilised for this purpose). The analysis and representation of cluster distribution by CiteSpace is extensively configurable using a considerable array of mathematical functions, and one could compare and contrast the outcomes of several of them in order to understand more fully the cultural-transmission dynamics of the knowledge domain in question. For present purposes, however, Figure 3.2 represents the results of employing the default settings of CiteSpace and of following the guidance for use given in Chen (2019a).
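By way of illustration, the label-extraction step described above can be sketched computationally. The following Python fragment is a minimal sketch and not CiteSpace's own implementation, which is not detailed here: it assumes that the log-likelihood ratio in question is the standard G² statistic calculated over a two-by-two contingency table, and it uses invented titles and candidate phrases. The function names `llr` and `label_cluster` are hypothetical.

```python
import math

def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: cluster titles containing the term
    k12: cluster titles lacking the term
    k21: other titles containing the term
    k22: other titles lacking the term
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2

def label_cluster(cluster_titles, other_titles, candidates):
    """Return the candidate phrase most characteristic of the cluster."""
    scores = {}
    for term in candidates:
        k11 = sum(term in t.lower() for t in cluster_titles)
        k21 = sum(term in t.lower() for t in other_titles)
        # G^2 is symmetric, so score only terms over-represented in the cluster.
        if k11 * len(other_titles) > k21 * len(cluster_titles):
            scores[term] = llr(k11, len(cluster_titles) - k11,
                               k21, len(other_titles) - k21)
        else:
            scores[term] = 0.0
    return max(scores, key=scores.get), scores

# Invented titles, for illustration only.
cluster = ["Evolutionary ecology of science",
           "Memes and evolutionary ecology",
           "Cultural selection"]
others = ["Selfish meme theory",
          "The selfish meme revisited",
          "Memes in cartoons"]
best, scores = label_cluster(cluster, others,
                             ["evolutionary ecology", "meme", "selfish meme"])
```

On these invented data the phrase “evolutionary ecology” is selected, because it is concentrated in the cluster's titles, whereas “meme”, being common across the whole dataset, does not discriminate between clusters.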

Returning to the dataset, using “memetic” as the search term will also locate “memetics”, and will avoid confusion of “meme” with “même” in literature in French. At the time of searching, and using the search-parameters selected, the total number of publications containing this term was 4,158, the earliest being Ball (1984) and not Dawkins (1989). This is because Dawkins (1989) (the first edition of which was published in 1976), while it coins the term “meme”, does not use the term “memetic” in its title. To constrain the search results to a reasonable size, CiteSpace analysed a subset of these 4,158 publications, namely entries in Scopus’s Arts and Humanities category, which, at the time of the query, contained 160 records. The justification for this constraint is that this subset represents a clear disciplinary boundary from other subsets, such as the Mathematics category (1,609 records), or the Biochemistry, Genetics and Molecular Biology category (93 records). The Arts and Humanities subset does not include Ball (1984) as a record because it is not assigned to this category, but this source is (as the citee) referenced in an article (as the citer) from 1998. The earliest record in this category of the Scopus database to contain the search term relates to an article dating from 1996.99

PIC
Figure 3.2: CiteSpace Visualisation of Citations of “Memetic” per year 1980–2020 in Scopus.

Having explained the necessary context, what does Figure 3.2 reveal about the chronological and conceptual-spatial distribution of the selected literature on memetics? Before examining the visualisation itself, CiteSpace’s analysis of the number of unique records (of the total 119 given in note 99 on page 430) per year, graphed in Figure 3.3 , shows a halting but clear increase, indicating growing dissemination of the “meme” meme.

PIC
Figure 3.3: Number of Records Containing “Memetic” Per Year 1996–2019.

Turning back to Figure 3.2 , and to summarise a complex set of relationships,100 one might make the following observations:

1. While there are 116 notional clusters and eleven clusters graphed in the default layout (#0–#6, #10, #14, #16 and #24), using CiteSpace’s facility to display only the largest of them reduces this number to eight principal clusters (#0–#6 and #10). The presence of a number of clusters, entirely typical of CiteSpace visualisations, indicates that, as with most knowledge domains, transmission here does not occur in orderly concentric circles from a single central point in the manner of ripples in a pond, but rather in the form of various semi-discrete breakout “infections”, which spawn their own local progeny. Another way to regard the non-concentric layout of Figure 3.2 is to invoke the concept of speciation. While the verbal-conceptual memeplexes underpinning the different clusters are not, according to Figure 1.4, analogous to species (memeplexes occupy level six; species occupy level three), a similar process is at work in that once a cluster has broken away from its “parent”, it tends not to re-aggregate with it.101
2. As might be expected from a nascent discipline, some of these clusters arise from sources that appear to have (co-)fostered the development of more than one cluster. Those sources are Boyd and Richerson (1985) (clusters #2 and #6) and Blackmore (1999) (clusters #1, #2 and #4). In addition to these highly cited sources, highly cited authors include (unsurprisingly) Dawkins, represented by Dawkins (1989) (labelled on Figure 3.2 by the date of publication of the first edition, 1976) (cluster #1), and Dawkins (1983a) (labelled by its first-edition date of 1982) (cluster #3); and Aunger, represented by Aunger (2000) (cluster #0), and Aunger (2002) (cluster #5). CiteSpace’s term for such pivotal sources is “centrality”, which “quantifies the importance of the node’s position in a network” (Chen, 2006, p. 362). The program’s “narrative summary” of this network identifies the three most central nodes as (in decreasing order of centrality) Aunger (2000), Dawkins (1989) and Blackmore (1999). Moreover, the summary identifies, in its “citation count” ranking, the three most cited nodes as (in decreasing order of citations) Dawkins (1989), Blackmore (1999) and Aunger (2000).
3. As noted above, cluster #0 is associated with the concept of evolutionary ecology, and Aunger (2000) is the central node. Its intellectual focus is exemplified by one of the “hidden” nodes – i.e., one not explicitly labelled with an author-date citation – of Figure 3.2. On Blute’s definition, evolutionary ecology “seeks a theoretical halfway house between the near-universal tautology of the fitness-selection nexus and the near-complete historical specificity of the myriad details of what is adaptive in locally prevailing circumstances” (2002, sec. 1, see also Tab. 1). In ways that are directly applicable to memetics (specifically the evolution of science, in the case of Blute (2002)), the discipline considers the effects on evolution of population density (i.e., fixed boundaries, variable energy) (Blute, 2002, sec. 2, sec. 3), and of growth rate (i.e., variable boundaries, fixed energy) (2002, sec. 5).
4. Cluster #1 relates to the extension of the “selfish gene” metaphor coined in Dawkins (1989) to cultural replicators, this “selfish meme” cluster being particularly distinct. As the layout of Figure 3.2 suggests, while the initial impetus for this cluster was provided by Dawkins (1989), it was further impelled by Blackmore (1999). The label of cluster #2 lacks the adjective “selfish” of cluster #1. This absence might account for the smaller size of cluster #2 in comparison with cluster #1 (as noted above, the lower the number, the larger the cluster), and might, indirectly, be taken as evidence of the selfish replicator concept itself.
5. Cluster #3 is concerned with the evolution of satirical cartoons of the catastrophic oil-slick caused by the sinking of the Prestige oil tanker off the coast of Spain in 2002. While exemplified by such sources as Domínguez (2015), and while perhaps the ultimate source of the phenomenon discussed apropos Figure 3.1, this cluster originates (as noted) from Dawkins (1983a) and also from Brodie (1996), the latter, as mentioned after the quotation on page 249, developing (as with Lynch (1996)) an epidemiological model of memetics. Associated with Aunger (2002) and Baudrillard (1988), cluster #5 relates to patenting and other intellectual-property issues understood in the light of memetics, and takes its label from the title of Bedau (2013).
6. The transmission of memetic ideas in the musicological literature is relatively peripheral to the main centres of transmission, but – at the risk of appearing immodest – the (sub)title of one of my own publications (Jan, 2012) figures as the label of cluster #4. CiteSpace extracts the phrase “Haydn chord progression”, which might suggest that the whole cluster is concerned with this subject. It is worth remembering, however, that in this network of citers and citees (and indeed all networks analysed by CiteSpace), a wide range of sources may be referenced, and a significant portion of this literature may not necessarily be about Haydn, this specific chord progression, or even music theory more generally. In this sense, and although potentially illuminating, a cluster label may often represent the tip, as opposed to the main body, of an iceberg.
7. Two clusters, #6 and #10, are marked by the appearance of the phrase “natural myside bias”, which relates to issues of belief-transmission in knowledge communities. Cluster #6 (which develops as an outgrowth of cluster #2) is centred on a study of children’s awareness and understanding of adult thought-processes (i.e., of children’s possession of a “Theory of Mind”; §3.7.1, §3.8.2) in Cameroonian pygmies (Avis & Harris, 1991). Cluster #10 relates to issues of authority and controversy in science, represented by Hull (1988b) and Gould (1997), the latter node representing a specific skirmish in a protracted conflict between Dennett and Gould over Darwinism pitting “fundamentalists” (principally Dawkins and Dennett) against “moderates” (as Gould implicitly presents himself).
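The observation in point 2 that the most central node need not be the most cited one can be illustrated with a toy network. The sketch below is hypothetical throughout: the citation pairs are invented, the network is treated as a simple undirected graph of citer-citee links, and centrality is assumed to be of the betweenness kind (the number of shortest paths between other nodes that pass through a given node), computed here using Brandes's algorithm. It does not attempt to reproduce CiteSpace's own network construction.

```python
from collections import Counter, deque

# Invented (citer, citee) pairs: three hubs, each cited by four sources,
# plus a bridging source "B" cited once from each hub's neighbourhood.
citations = []
for hub in ("HubA", "HubC", "HubD"):
    for i in range(1, 5):
        citations.append((f"{hub}-citer{i}", hub))
    citations.append((f"{hub}-citer4", "B"))

# Citation count: how often each source appears as a citee.
citation_count = Counter(citee for _, citee in citations)

# Undirected adjacency sets linking each citer to its citees.
graph = {}
for citer, citee in citations:
    graph.setdefault(citer, set()).add(citee)
    graph.setdefault(citee, set()).add(citer)

def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes) for an undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        paths = dict.fromkeys(graph, 0)   # number of shortest paths from source
        dist = dict.fromkeys(graph, -1)
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                      # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(graph, 0.0)
        for w in reversed(order):         # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair of endpoints has been counted twice.
    return {v: s / 2 for v, s in score.items()}

centrality = betweenness(graph)
most_central = max(centrality, key=centrality.get)
```

In this invented network each hub receives four citations and the bridging source only three, yet the bridging source is the most central node, since every shortest path between the three neighbourhoods passes through it.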

To recall the distinction made earlier, it seems that some of these 119 sources (and the 4,158 of which they form a subset) did indeed come to praise memetics and some came to bury it. Whether one believes the pro or contra sources, at the very least, as a hypothesis, memetics has had a successful replication history (although this is not to compare its replication with that of other theories of cultural evolution, let alone with that of other scientific theories more broadly). This history exemplifies a key precept of the theory, namely that transmission of an idea is independent of its veracity. Of course, undertaking a distributional analysis of a verbal-conceptual memeplex is only one form of what might be termed population memetics, one that aligns with, and is facilitated by, corpus-analytical/“big-data” approaches such as those exemplified by CiteSpace (see also Sharma et al., 2014; Rose et al., 2015; Jeffries, 2019).

There is, moreover, an extant tradition of computer-aided intra- and inter-work pattern-analysis in music (§6.1 ), whose methodologies can be re-purposed to serve a specifically quantitative-memetic agenda. Indeed, some of this work – Savage (2017) is a good example – has essentially studied memetic evolution, albeit generally not explicitly under that rubric. Thus, while intra- and inter-work memetics has hitherto often been conducted qualitatively – certain patterns having been identified “manually” in candidate works and ascribed a memetic status on balance-of-probability grounds – there is considerable scope for applying the technologies represented by CiteSpace to music “automatically”, in order to garner quantitative data on museme prevalence and transmission.
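As a purely illustrative indication of what such “automatic” data-gathering might involve, the following Python sketch counts the number of works in a corpus in which each short interval pattern occurs. It is a minimal sketch under stated assumptions, not the method of Savage (2017) or of any other study cited here: melodies are assumed to be encoded as lists of MIDI pitch numbers, a candidate museme is crudely equated with an exact sequence of three successive semitone intervals (so that transposed statements match), and the corpus, work names and function names are all invented.

```python
from collections import Counter

def interval_ngrams(pitches, n):
    """Return the set of n-interval patterns occurring in one melody."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}

def pattern_prevalence(corpus, n=3):
    """Count, for each interval pattern, the number of works containing it."""
    counts = Counter()
    for pitches in corpus.values():
        # A set is used so that each work counts at most once per pattern.
        counts.update(interval_ngrams(pitches, n))
    return counts

# Invented melodic fragments, for illustration only.
corpus = {
    "work_a": [67, 69, 71, 72, 74],
    "work_b": [60, 62, 64, 65],
    "work_c": [72, 71, 69, 67],
}
prevalence = pattern_prevalence(corpus)
shared = [pattern for pattern, works in prevalence.items() if works > 1]
```

Here the rising pattern of two whole tones and a semitone is found in two of the three invented works. A serious study would need to accommodate variation (inexact matches, rhythm, harmonic context) and to weigh shared patterns against chance recurrence before ascribing memetic status to them.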

3.4.2 Cultural Adaptation and Exaptation

Discussing the fact that the distinction between adaptations (aptations built by selection for their current role) and exaptations (aptations “coopted” for their current role) had not been fully recognised until their own article gave it an appropriate nomenclature (§2.5.1 ), Gould and Vrba argue that “the conceptual framework of modern evolutionary thought, by continually emphasizing the supreme importance and continuity of adaptation and natural selection at all levels, subtly relegated the issue of exaptation to a periphery of unimportance” (1982, p. 6). It is possible to understand this as an example of the replication of a particular verbal-conceptual memeplex (that defining exaptation) being constrained by the predominance of a more powerful memeplex (that defining “the [adaptation-focused] conceptual framework of modern evolutionary thought”). In this sense, the relationship between the two memeplexes is readily conceivable in terms of constraints on the selection of the weaker memeplex by the stronger.

What would constitute an adaptation in memetic terms, and how might it be distinguished from those phenomena that might more properly be regarded as exaptations? It is perhaps easier to find examples related to this issue in music than in verbal culture. Figure 3.4 shows candidates for these processes, Figure 3.4a showing the local subdominant of V (thus, a hint of the tonic, G major) in the dominant second half of an exposition; and Figure 3.4b showing the same inflexion but now as a beginning gesture, not as the arguably more normative ending gesture, to invoke Agawu’s tripartite “beginning-middle-ending paradigm” (1991, pp. 53–54).

PIC

(a) Adaptation: Mozart, Piano Trio in G major K. 496 (1786), I, bb. 75–78 (after bb. 72–74).

PIC

(b) Exaptation: Beethoven, Piano Trio in E♭ major op. 1 no. 1 (1795), I, bb. 1–9.
Figure 3.4: Adaptation and Exaptation of Musemes.

I make this claim of normativity without advancing any supporting evidence; but hypothesise that a statistical survey of the various binary forms antecedent to sonata form, and of sonata forms themselves (Rosen, 1988; Caplin, 1998; Hepokoski & Darcy, 2006), would probably show a significant predominance of the “ending V/IV–IV” over the “beginning V/IV–IV”.102 This would suggest that the ending form evolved first (i.e., it was an earlier adaptation, perhaps for reasons of its alignment with various natural and nurtural constraints); and its “cooption”, to use Gould and Vrba’s (1982) term, as a beginning gesture was a later exaptation. Nevertheless, the use of the quiescenza schema – the archetype of this pattern – as a beginning gesture in some mid- to late-eighteenth-century music (Gjerdingen, 2007a, pp. 181–182, 460) might be taken as evidence against my “end-adaptive/beginning-exaptive” claim and in favour of its inversion, although Gjerdingen believes that “[a]s a framing device, it could also appear as an opening gambit …, though this usage was less common” (2007a, p. 460).

3.4.3 Lamarckism versus Darwinism in Cultural Evolution

The key distinction between Darwinian and Lamarckian inheritance in biological evolution was discussed in §1.8 . This section considers the extent to which the distinction is applicable to cultural evolution (see also Dennett, 2017, pp. 243–247). To summarise the earlier discussion briefly, while Lamarck believed in the inheritance of acquired characteristics, the Darwinism of the Modern Synthesis insists on the distinction between a germ line and a soma line, to recall Weismann’s terms. This means that only changes motivated by the genetic “shuffling” that occurs at conception can be transmitted to an organism’s offspring, not any modifications to a parent’s body that occur during its lifetime. One apparent manifestation of Lamarckism is the phenomenon of epigenetic inheritance, which offers a set of mechanisms – perhaps most notably the chromatin-marking EIS – by means of which certain acquired attributes might not only be inherited by cells within tissues, but which might also be transmitted to an organism’s offspring. As argued on page 89, this poses no threat to Darwinism – the Lamarckism is illusory – because genes are the only replicators on earth able to carry sufficient information to build vehicles; and, perhaps more fundamentally, because whatever mechanism carries information, the VRS algorithm does not depend upon a specific architecture for its implementation, only upon the presence of its three component processes.

One point not made in §1.8 is that epigenetics is not universally accepted by evolutionary theorists, and is particularly controversial when applied to our own species. This is due not only to ongoing scientific debates about the nature and extent of epigenetic mechanisms (which remain imperfectly understood), but also because the theory has been hijacked by those who wish to use it in the service of social engineering in order to foreground nurture over nature. As Murray remarks, “[e]pigenetics seems to promise release from genetic determinism. It seems to offer new explanations for phenotypic differences and new possibilities for remediation. At the extremes, it seems to offer hope for greater equality of capabilities and outcomes across groups” (2020, loc. 5058). Yet, having considered such organisms as the mule, the hinny and the Toadflax – in which epigenetic inheritance appears to elucidate certain phenomena that defy a genetic explanation – it should be noted that the “involvement of epigenetic mechanisms in intergenerational transmission has been yet little documented in humans …, and never across several generations” (Marcaggi & Guénolé, 2018, p. 6). Nevertheless, it is important to make a distinction between epigenetic transmission/inheritance – where some attribute is inherited by non-genetic means – and the action of epigenetic factors in brain plasticity – where some ontogenetic change occurs for reasons that are not directly genetic. Of these phenomena, the latter is more accepted than the former (see note 81 on page 224). Despite this, it is possible that epigenetics in the former sense might yet be relevant to some extent to cultural, if not to (human) biological, evolution, although not necessarily in ways its more extreme proponents might envisage. As Kellermann summarises epigenetics and his application of it,

[e]pigenetics is typically defined as the study of heritable changes in gene expression that are not due to changes in the underlying DNA sequence. Such heritable changes … often occur as a result of environmental stress or major emotional trauma and would then leave certain marks on the chemical coating, or methylation, of the chromosomes. The coating becomes a sort of ‘memory’ of the cell and since all cells in our body carry this kind of memory, it becomes a constant physical reminder of past events, our own and those of our parents, grandparents and beyond.… In the same way as parents can pass on genetic characteristics to their children, they would also be able to pass on all kinds of ‘acquired’ (or epigenetic) characteristics, especially if these were based on powerful life-threatening experiences …. Such environmental conditions would leave an imprint on the genetic material … and pass along new traits even in a single generation. (Kellermann, 2013, p. 34, emphasis in the original).

Reiterating the caution that epigenetic markers can only be passed on to an organism’s descendants if they affect gametes, the type of epigenetic inheritance hypothesised here concerns a different category of traits from those generally explored by “mainstream” epigenetics. While the latter consider the transmission of morphological and physiological changes acquired during an organism’s lifetime, for Kellermann (2013) the traits in question are, it seems, primarily psychological; and they tend to result specifically from some form of violent trauma, rather than from some other environmental or idiopathic cause. Kellermann (2013) explores the specific case of the horrors suffered by holocaust survivors, which, he believes, are re-lived by first- and second-generation descendants of victims as a result of epigenetic transmission. As he claims in connection with such “transgenerational transmission of trauma” (TTT), “[i]t seems that these individuals, who are now adults, somehow have absorbed the repressed and insufficiently worked-through Holocaust trauma of their parents, as if they have actually inherited the unconscious minds of their parents” (Kellermann, 2013, p. 33, emphasis in the original; see also Franklin et al., 2010).

Kellermann asserts that epigenetic changes to parents’ DNA resulting from trauma might be transmitted to their children and grandchildren who, as a result, would have a higher propensity to suffer from post-traumatic stress disorder (PTSD), despite not having directly experienced their parents’ or grandparents’ ordeals. PTSD is often manifested in such individuals in the form of nightmares whose specific content seems to replicate their ancestors’ experiences (Kellermann, 2013, p. 35). Despite his caution that, “[whether] any specific past memory can be epigenetically transmitted or not … must be left open to speculation and we should be careful not to slip from reasonable assumptions to fantastic and unsupported scenarios” (2013, p. 35, emphasis in the original), Kellermann appears to believe that there is indeed some mechanism whereby trauma-mediated methylation can be transmitted to offspring in ways that – and here is the leap – affect neurons in such a way as to reconstitute in the child the ancestral patterns of interconnection responsible for encoding the trauma – if not the specific details of the original memory from the parent or grandparent, at least some existential shudder caused by its epigenetic echo.103 It should be clear that this claim goes well beyond what mainstream epigenetics would be prepared to countenance, adherents generally restricting themselves to considering such cases as the odd-shaped flowers of the peloric Toadflax. For harsher critics of epigenetics, or certainly of its populist appropriation, the evidence for such extended applications is “weak, circumstantial, observational, and correlative, and … warrants circumspection and careful interpretation …” (Mitchell, in Murray, 2020, loc. 5121) – this apropos a related study by Yehuda et al. (2016).

A memetic interpretation offers a different way of understanding what appears to be happening here, countering Kellermann’s (2013) implication that memories can be biologically transmitted, whether genetically or, as he suggests, epigenetically. It seems more likely that the propensity to PTSD in the descendants of holocaust survivors results from their being influenced by the memetic transmission of imagery of horror, both within the affected family and also from the wider culture, to which affected individuals are unavoidably exposed. The effects of such cultural transmission would presumably be intensified in individuals who grew up with older family members with first-hand experience of such events, whose psychological scars – perhaps manifested in the form of high general anxiety levels or excessive risk-aversion – would be evident, even though often unspoken, and would heighten the force of culturally transmitted holocaust imagery as a result of the direct personal connections involved.104

A distinction, articulated in the form of two questions, now presents itself, which will be treated briefly, and at times somewhat speculatively, in the remainder of this section: (i) what epigenetic factors, if any, affect memetic transmission?; and (ii) if the transmission of memes is held to be analogous to the transmission of genes, is there a memetic equivalent to epigenetic inheritance – what might be termed epimemetic inheritance?

On the first question, even if, contra Kellermann, a memory cannot be epigenetically (and thus neither genetically) transmitted – which, on the basis of the above discussion, seems very likely to be the case – it might be that the memetic transmission of the memory’s information-content could still be mediated in some way by epigenetic factors. Might epigenetic modifications to the peripheral and central nervous systems, if they exist, differentially advantage (or disadvantage) certain m(us)emes? If so, is there a clear qualitative or quantitative difference between the genetic mediation of memetic transmission, where genes set the environmental “frame of reference” for memes; and the epigenetic mediation of memetic transmission, where some experience in an individual’s life (or the life of one of their (grand)parents) affects their gene expression, which in turn specifically affects the kinds of memes that individual, and his/her (grand)children, are receptive (or averse) to and/or are more likely to remember and transmit?

To say that genetic mediation affects the transmission of m(us)emes is nothing new: our innate perceptual-cognitive attributes determine what may or may not be memetically replicated, and thus our cultural life is to a significant extent contingent upon what we can and cannot perceive, comprehend and remember (Lerdahl, 1992). As discussed in §3.2 , this was framed by Wilson in terms of the metaphor of genes holding culture on a leash. Gene-imposed constraints are, however, often quite coarse-grained: they specify such generic restrictions as, in music, the duration of STM for phrases, or the normative pitch intervals of melodies; they do not, for instance, privilege precise sequences of intervallic contours, or specific rhythmic patterns. By contrast, epigenetic mediation is equivalent, to adapt Wilson’s metaphor, to the (epi)genes giving the cultural dog specific commands, or eliciting certain behaviours, perhaps using particular rewards to do so. The difference between these two categories is therefore that genetic mediation inheres in the configuration and policing of the learner bottleneck; whereas epigenetic mediation inheres in the finer-grained “nudging” of movement through that bottleneck, together with a more selective degree of filtration.

Developing the latter point, and at the risk of abandoning the cautions around epigenetics advocated above, it might, at least in principle, be possible to correlate epigenetic mediation with specific m(us)emes. Could it be, for instance, that a profound emotional experience in the early life of an individual might lead to epigenetic changes in the emotion centres of their brain, such that they or their descendants are especially sensitive to certain m(us)emes, thus making them more likely to assimilate and transmit them (or, conversely, to reject them)? In music, this might perhaps be manifested in a heightened sensitivity to musemes that have a “pain-pleasure” emotional contour owing to underpinning dissonance-consonance patterns, such as that shown in Figure 3.5, with its 7–6 (c2–b1 over bass d) appoggiatura in b. 27.

PIC

Figure 3.5: Dissonance-Consonance/Pain-Pleasure Museme: Mozart, Così fan tutte K. 588 (1790), no. 4, “Ah guarda, sorella”, bb. 22–28.

The answer to this question is obviously very difficult to determine, because any increased (or decreased) propensity to replicate certain musemes differentially over others may be the result of one or more of the following four factors: (i) genetic (“culture on a leash”); (ii) epigenetic (altered gene expression mediating perceptual-cognitive propensities); (iii) memetic (multi-museme-mediated changes to a cultural environment); or (iv) epimemetic (see below). Each could produce broadly similar results to the others, and all could operate in various forms of conjunction.

On the second question raised on page 268, and occupying the distant shores of speculation, if there is a meaningful distinction between the genetic and the epigenetic, is there also a parallel distinction between the memetic and the epimemetic? One of the main hurdles this question faces relates to the quite different mechanisms of genetic and memetic inheritance: the former relies upon the complex information-architecture of patterns of nucleic acids acting, via the proteins that build bodies, to ensure their replication; the latter relies upon the complex information-architecture of patterns of neuronal interconnection acting, via behaviours and the artifacts these behaviours give rise to, to ensure their replication. Moreover, because there is not such a clear-cut (replicator-vehicle) distinction between the memome and the phemotype – between the germ line and the soma line – as there is between the genome and the phenotype, it is arguably more difficult to distinguish between the memetic and the epimemetic than it is to distinguish between the genetic and the epigenetic. Is there anything in memetics that is even remotely analogous – functionally, if not structurally – to the chromatin-marking EIS? A comparable phenomenon might perhaps be seen in the capacity of m(us)emeplexes to contain elements that are “expressed” in some instantiations and “silenced” in others.

In the verbal-conceptual realm, for instance, a given articulation of a particular constellation of ideas might include several or most of its independent memetic subcomponents; or it might restrict their expression, such that one meme stands for the whole (silenced) verbal-conceptual memeplex, as in the rhetorical trope of synecdoche. In music, a museme that forms a component of a musemeplex might stand alone, implying the other silenced musemes. As an example, Figure 3.6 shows a two-voice pattern that is also a constituent (specifically, Musemes 1 and 5) of the musemeplex shown in Figure 3.9a and Figure 3.9b on page 291 (see also Jan, 2004, p. 73). In Haydn’s phrase, these two musemes form components of a different structure, itself possibly a musemeplex. This might be understood as suppressing the expression of those (three) other musemes, and thus the musemeplex as a whole, from the chronologically and possibly aetiologically antecedent Mozart phrases that are not shared with Haydn’s phrase.

PIC

Figure 3.6: Musemes from “Silenced” Musemeplex: Haydn, String Quartet in F major, op. 74 no. 2 (1793), II, bb. 1–8.

A putative epimemetics is also tied up with the issue of mutation/variation. In the case of genes, mutations may confer advantages upon their possessor that may differentially affect their survival. The same is true of epigenetic changes that, while they do not alter a given gene, may nevertheless mediate its expression and thus have an aptive effect via the resultant phenotype. In the case of m(us)emes, a comparable situation might be found in the aptive benefits that accrue from the (eventual) expression of what might be termed “suppressed mutations”. An intriguing passage in Narmour (1977), an early statement of his Implication-Realisation (I-R) model, serves as an illustration of this principle, and also affords an objective mechanism for certain processes often understood purely metaphorically in historiographic discourses on music (§4.3.3 ). Figure 3.7 (a much simplified version of Narmour, 1977, 127–129, Ex. 44, ignoring certain rhythmic aspects) hypothesises how implicative forces in musical patterns – a form of agency reinscribed in cognitive-psychological terms – can, if realised, become consolidated as new (historical-) stylistic norms that themselves, as a result of newly available implications, motivate further style-expanding realisations.

PIC

(a) Implication.

PIC

(b) Realisation with Further Implication.
Figure 3.7: Realisation of Implicative Forces as a Factor in Musical Style-Change.

Here, pattern x arises from the realisation in Figure 3.7b of the implication for further upward motion from the g1 in Figure 3.7a. Pattern x then carries within it the implication for further upward continuation from the a1. All these implications are instances of the structure Narmour terms “Process”, symbolised by “[P]” – i.e., they are stepwise (or small skip-wise) motions that are continued in the same direction and by similarly small intervals (1990, p. 89). The opposite structure is termed “Reversal”, symbolised by “[R]” – i.e., stepwise (or small skip-wise) motions that are interrupted by a large interval moving in the opposite direction (or vice versa) (Narmour, 1990, p. 151; see also Narmour, 1999). An epimemetic interpretation of this process of style-expanding mutation would see such changes as being initially suppressed by various closural forces, before eventually overwhelming those constraints and reifying that which was previously latent.
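The distinction between Process and Reversal, as summarised above, lends itself to a simple computational statement. The following Python sketch classifies a three-note pattern accordingly. It is a considerable simplification of the I-R model, which recognises further structures not considered here; the threshold separating “small” from “large” intervals (five semitones) is an illustrative assumption, as are the function name `classify` and the encoding of pitches as MIDI numbers.

```python
SMALL_MAX = 5  # assumed upper limit, in semitones, of a "small" interval

def classify(p1, p2, p3):
    """Label a three-note pattern "P" (Process), "R" (Reversal) or "other"."""
    first, second = p2 - p1, p3 - p2
    if first == 0 or second == 0:
        return "other"  # repeated notes fall outside this simplified scheme
    same_direction = (first > 0) == (second > 0)
    first_small = abs(first) <= SMALL_MAX
    second_small = abs(second) <= SMALL_MAX
    if same_direction and first_small and second_small:
        return "P"  # small intervals continued in the same direction
    if not same_direction and first_small != second_small:
        return "R"  # a small and a large interval in opposite directions
    return "other"

# Hypothetical patterns, encoded as MIDI pitch numbers (g1 = 67, a1 = 69).
rising_steps = classify(65, 67, 69)    # step up, then step up again
step_then_leap = classify(60, 62, 55)  # step up, then large leap down
leap_then_step = classify(60, 69, 67)  # large leap up, then step down
```

The first pattern, which continues by step to a1 after reaching g1, is classed as a Process; the second and third, each combining a small and a large interval in opposite directions, are classed as Reversals.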

3.5 Memetics and Music

Although §3.4 included some consideration of music, this section considers in more detail three areas in which memetics might be brought to bear specifically on its evolutionary understanding. After a brief overview of some key precepts of “musicomemetics” (§3.5.1), the first area (§3.5.2) concerns the assemblage of musemes, a process that creates the large-scale hierarchic structures characteristic of most human musics. The second (§3.5.3) expands upon the first, regarding improvisation and composition as exemplifications of the processes discussed in §3.5.2. The third (§3.5.4) considers the relationship between musemes and what might be termed “gestemes” – the culturally transmitted gestures intrinsic to musical performance.

3.5.1 Overview of Musicomemetics

I have covered elsewhere various aspects of memetics as it relates to music (Jan, 2007; Jan, 2010; Jan, 2011a; Jan, 2011b; Jan, 2012; Jan, 2013; Jan, 2014; Jan, 2016b; Jan, 2015b; Jan, 2016c; Jan, 2016a; Jan, 2018a; Jan, 2018b). The following discussion will serve as a very concise summary of some of the issues covered in these publications, and as an attempt to relate them to some of the ideas covered in Chapter 1 and Chapter 2 . By way of a starting-point, Figure 3.8 shows a candidate museme at various stages of its hypothesised evolutionary history.

PIC

(a) Mussorgsky: Boris Godunov (1872), Prologue, Rehearsal no. 28, bb. 8–13.

PIC

(b) Tchaikovsky: The Sleeping Beauty op. 66 (1889), Panorama, bb. 23–26.

PIC

(c) Stravinsky: The Firebird (1910), Tableau II, Rehearsal No. 200, bb. 1–4.

PIC

(d) Middleground Schema.
Figure 3.8: Museme in Three Russian Composers.

Figure 3.8a (after Mussorgsky, 1987) shows a passage that, over a dominant pedal, features the lower-auxiliary motion $\hat{2}$–$\hat{1}$–$\hat{2}$.105 The middle element of this pattern, the $\hat{1}$, is harmonised by a chord that, if one assigns a local harmonic designation, is an implied vi$^4_2$ – the “6”, e2, is not stated – within the local dominant prolongation of the auxiliary. Figure 3.8b (after Tchaikovsky, 1900) shows a similar $\hat{2}$–$\hat{1}$–$\hat{2}$/V structure in which the middle element is harmonised by a full vi$^4_2$ in which all components of the central seventh chord are present, giving the pattern a subtly different sonority – the “6”, here b, markedly alters the effect – to Mussorgsky’s version. Figure 3.8c (after Stravinsky, 2006) has essentially the same progression as Tchaikovsky, save that the auxiliary motion is incomplete, being $\hat{2}$–$\hat{1}$–($\hat{4}$). A schema is shown in Figure 3.8d, (i). An alternative method of harmonising such a $\hat{2}$–$\hat{1}$–($\hat{2}$) auxiliary is shown in the abstract of Figure 3.8d, (ii), whereby the $\hat{1}$ is harmonised by, on one interpretation, a V$^{11}$, created by overlaying a IV chord over the dominant bass. The central “IV + V” element of this form is termed the “rock dominant” by Spicer (2004, p. 38), owing to its prevalence (not just in auxiliary structures) in rock and pop songs. If Figure 3.8d, (i) represents what might be termed the “Russian auxiliary” progression, then Figure 3.8d, (ii) might be termed the “Rock auxiliary”.106

Naturally these two variants (three, if Mussorgsky’s version is distinguished from Tchaikovsky’s and Stravinsky’s) of the auxiliary museme – like the “German”, “French” and “Italian” augmented sixth chords – have different aural/phenomenological properties: their different note-structure, represented by different notational symbology and explicable using different theoretical terms, gives rise to different aural effects. While it is always difficult to use verbal language to capture musical effects, there is something, to my ears at least, very striking and singular about the Russian auxiliary. Even if cultural familiarity did not perhaps lead us to associate it with such extra-musical concepts as the onion domes of Saint Basil’s Cathedral, the incense of Russian Orthodoxy, or the chill of a Siberian winter, it would perhaps impress itself upon our perception as something particularly vibrant and “colourful”. Thus, it is potentially a good museme, because it inveigles its way into our memories as something pleasurable to recall and savour. However it arose – as a series of intersecting melodic schemata or as a distinct harmonic phenomenon – it exemplifies perfectly the tendency of musical material to engender its replication in direct proportion to its perceived/cognised salience, whether this is assessed qualitatively or quantified objectively.

Of course, I have not quantified the prevalence of this museme, merely hypothesised that it might be widely replicated in this repertoire, and possibly in French music, from which Russian music drew extensively at this time.107 I have done this on the basis of the cultural context of these three composers and their use of a Russian folk melody (the direct source of Figure 3.8a) for inspiration. Naturally, one could indeed conduct a quantitative survey – a corpus-analytical investigation along the lines of that discussed in §3.4.1 – searching a dataset of (usually symbolically) encoded music using a pattern-finding utility such as the Humdrum Toolkit (Huron, 2002; Huron, 2022; see also Velardo et al., 2016). But there is room also for the kind of qualitative intuition represented by Figure 3.8 because in some ways it validates the hypotheses on which memetics rests: if one knows a passage such as Figure 3.8b, then hearing Figure 3.8a and/or Figure 3.8c, either for the first time or on re-hearing, will perhaps “cue” one’s internal representation of the pattern, adding the new instance(s) to the extant (internal representation of the) museme allele-class.
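By way of illustration only, the kind of corpus search envisaged here can be sketched in a few lines of Python. This is not the Humdrum Toolkit, and it makes strong simplifying assumptions: each work is reduced to a list of melodic scale degrees, the corpus contents are invented, and the function name `find_pattern` is hypothetical. The pattern sought is the lower-auxiliary scale-degree motion (2, 1, 2) discussed above.

```python
def find_pattern(melody, pattern):
    """Return the start indices at which `pattern` occurs contiguously in `melody`."""
    n = len(pattern)
    return [i for i in range(len(melody) - n + 1) if melody[i:i + n] == pattern]


# Hypothetical corpus: each work is a list of scale degrees (invented values).
corpus = {
    "work_a": [5, 2, 1, 2, 3],
    "work_b": [1, 2, 3, 2, 1, 2],
    "work_c": [3, 4, 5],
}

# Map each work to the positions where the auxiliary pattern is found.
hits = {title: find_pattern(degrees, [2, 1, 2]) for title, degrees in corpus.items()}
```

A realistic survey would also need to take account of harmony, metre and transposition, all of which this sketch ignores; it shows only the counting step on which a prevalence claim would rest.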

3.5.2 Musemic Hierarchies: Recursive-Hierarchic Structure-Generation via Allele-Parataxis

As an observed principle of pedagogy, composition, improvisation and analysis, discrete musical patterns may combine in a variety of ways in order to form longer musical sequences. In some musics, such as that based on the Galant schemata of the eighteenth century, a relatively small repertoire of clearly defined patterns combines in ways that are statistically predictable, in a Markovian sense108 (Gjerdingen, 2007a, p. 372, Fig. 27.1). In other traditions, the nature of the units is more variable, and the range of combinations more extensive; but in presumably all musics there are certain more or less statistically likely, or unlikely, juxtapositions. Such concatenation is determined by two ostensibly opposing forces: the bottom-up attributes of the constituent musemes, specifically how their initial and terminal nodes (their first and last pitches) affect their patterns of (re)combination, what might be termed their conglomerative grammar (Jan, 2010, p. 13); and the top-down constraints of some structural schema, which, because such models recur consistently in cultures, are themselves musemes, at a higher structural-hierarchic level.
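What “statistically predictable, in a Markovian sense” might mean operationally can be sketched as follows. The fragment estimates first-order transition probabilities between schema labels. The phrase sequences are invented for illustration and are not drawn from Gjerdingen’s data; the function name is likewise hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from label sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each (current label, following label) pair.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    # Normalise the counts for each label into probabilities.
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical phrases, each a sequence of schema labels (illustrative only).
phrases = [
    ["Romanesca", "Prinner", "Cadence"],
    ["Romanesca", "Prinner", "Cadence"],
    ["Romanesca", "Fonte", "Cadence"],
    ["Meyer", "Prinner", "Cadence"],
]
probs = transition_probabilities(phrases)
```

In this toy corpus a Romanesca is followed by a Prinner in two cases out of three, which is the sense in which some juxtapositions are more statistically likely than others.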

In terms of bottom-up forces, the harmonic and voice-leading attributes of a museme fit it for playing a particular role in a larger-scale musical structure – it might serve to modulate to a new key, to consolidate that key, or to fulfil any one of a number of other such structural/functional roles. These functions tend to occur in a specific order – a movement will not normally modulate in its final bars, for instance – and so a span of music can be thought of as a series of structural-sequential loci or nodes, each of which will tend preferentially to be filled by members of a certain set of musemes that are all broadly similar in their underpinning contrapuntal-harmonic and voice-leading framework, but which might be somewhat different in their surface details. In this sense, the set of musemes capable of occupying/instantiating a structural locus l can be thought of as museme alleles (or “allomemes” (Durham, 1991, p. 194)) of each other – they form an allele-class of (so to speak) same-shaped but different-coloured pegs that, by virtue of the first of these two properties, can fit securely into the same hole – in the same way that the class of DNA segments capable of occupying a locus l on a chromosome and controlling the expression of some phenotypic characteristic are genetic alleles of each other.109

The phenomenon of structural-sequential locus-instantiation means that certain types of museme-sequence will tend, all other things being equal, to recur, and certain others will not. As a consequence of this museme parataxis, certain “higher-order” structures will be repeatedly reinstantiated, bottom-up, from the recurrent patterns of “lower-order” museme concatenation. These higher-order structures are capable – as types of memes (see below) – of exerting a top-down regulatory role, by determining the nature and sequence of structural loci and thus biasing the likelihood of an exemplar of a particular museme allele-class appearing at a given locus. The interaction between bottom-up and top-down forces is represented in Figure 3.9 (Jan, 2010, p. 14, Fig. 1).

PIC

Figure 3.9: Recursive-Hierarchic Structure-Generation via Allele-Parataxis.

A higher-order structure may arise in one of two ways:

  • It may arise from the repeated (≥ 2 instances) recombination of (more or less) the same lower-order museme-sequence. Such paratactic assemblage of (broadly) the same set of musemes forms what might be termed a “real” musemeplex.
  • It may be reinstantiated by configurationally different but allelically equivalent (locus to corresponding locus) sequences (≥ 2 instances) of lower-order musemes. Such paratactic assemblage of different but allelically equivalent musemes forms what might be termed a “virtual” musemeplex.

That two passages might contain a variable-proportion mixture of the same musemes and of museme alleles at each locus suggests that the real and virtual types are actually end-points on a continuum, and not two mutually exclusive categories. This proviso notwithstanding, the same higher-order structure will arise in each category for ≥ 2 instances of a given set of pattern-combinations. Figure 3.9 gives examples of these two scenarios, with Figure 3.9a, 3.9b and 3.9c (after Jan, 2007, 86–90, Ex. 3.12) representing the first (therefore showing a real musemeplex), and Figure 3.9d, 3.9e and 3.9f representing the second (therefore showing a virtual musemeplex).
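The real/virtual continuum can be made concrete with a small sketch. It assumes, purely for illustration, that each museme has a label and that a lookup table assigns each label to an allele-class; the labels, the table and the function name `musemeplex_profile` are all hypothetical. The function returns the proportion of loci at which two passages use the same museme: 1.0 corresponds to the “real” end-point and 0.0 to the “virtual” end-point.

```python
def musemeplex_profile(passage_a, passage_b, allele_class):
    """Compare two museme sequences locus by locus.

    Returns the proportion of loci filled by the same museme, or None when
    the passages are not allelically equivalent at every locus.
    """
    if not passage_a or len(passage_a) != len(passage_b):
        return None
    same = 0
    for a, b in zip(passage_a, passage_b):
        if allele_class[a] != allele_class[b]:
            return None  # different loci, so no shared higher-order structure
        if a == b:
            same += 1
    return same / len(passage_a)


# Hypothetical museme labels mapped to hypothetical allele-classes.
allele_class = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}

real = musemeplex_profile(["a1", "b1", "c1"], ["a1", "b1", "c1"], allele_class)
virtual = musemeplex_profile(["a1", "b1", "c1"], ["a2", "b2", "c2"], allele_class)
mixed = musemeplex_profile(["a1", "b1", "c1"], ["a1", "b2", "c1"], allele_class)
```

Intermediate values, such as the mixed case here, represent passages that share some musemes and substitute alleles at the remaining loci.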

PIC

(a) Mozart: Flute Quartet in A major K. 298 (1787), II, Minuet, bb. 0–8.

PIC

(b) Mozart: Adagio in C major for Glass Harmonica K. 356 (617a) (1791), bb. 1–8.

PIC

(c) Middleground Schema.

PIC

(d) Mozart: Piano Concerto no. 27 in B♭ major K. 595 (1791), I, bb. 107–112.

PIC

(e) Chopin: Piano Concerto no. 1 in E minor op. 11 (1830), II, bb. 63–67.

PIC

(f) Middleground Schema.
Figure 3.9: Musemes and Musemeplexes.

The higher-order structures schematised in Figure 3.9c and Figure 3.9f form what might be termed – after museme and Ursatz – a musemesatz (Jan, 2010). This is an abstract, replicated (therefore memetic) structure of loci/nodes and their associated infill-types that, however represented, indexes a particular configuration of pattern (re)combination. It is the outcome of the process described by the somewhat unwieldy title of this section: recursive-hierarchic structure-generation via allele-parataxis, hereafter abbreviated to “RHSGAP” and represented in Figure 3.9 . The process is recursive-hierarchical because it is not necessarily limited to the illustrative two levels here: a “higher-order” structure on a given level might, in combination with other structures at that level, become a “lower-order” structure in relation to the generation of an even more abstract structure at a yet higher level.

For this reason, it is not necessary to specify the number of levels in such a hierarchy, or to fix them absolutely (as opposed to relativistically). What matters is the underlying principle that a sequence of “level-1” musemes a + b + c (or their alleles $a_n + b_n + c_n$) might, for instance, generate a more abstract “level-2” structure, ABC, which goes on to occupy the “a” (or the $a_n$) locus of the next-higher, “level-3”, structure – and so on, ever “upwards”. Here, levels 2 and 3 represent musemesätze, in a macrocosm of the microcosmic process by which, in Narmour’s terms, sets of style shapes – the same or a different set of shape-alleles for each structure-instantiation – assemble to generate a set of instances of the same style structure (1990, p. 34) (levels seven and six, respectively, of Table 1.4).

Figure 3.9c and Figure 3.9f represent phrase-length examples of a musemesatz, exemplifying in this case the common antecedent-consequent pattern; but the concept can be extended to encompass more extended section- and movement-length structures. In the latter cases, the musemesatz loci may be instantiated not only by members of particular museme allele-classes, as seen in Figure 3.9c and Figure 3.9f , but also by members of particular musemeplex allele-classes. To illustrate the scope of this process – the power of large-scale structure-generation via interactions between bottom-up and top-down memetic forces – Figure 3.10 (Jan, 2010, 38, Ex. 8) illustrates a significantly more extended musemesatz than that in Figure 3.9 , showing a musemesatz – aligned with a more normative Schenkerian Ursatz (Schenker, 1979) – common to three keyboard-sonata first-movement expositions.110

PIC

Figure 3.10: Ursatz and Musemesatz in Three Keyboard-Sonata First-Movement Expositions.

While Figure 3.10 does not necessarily verify the assertions made in this subsection, these three movements offer suggestive evidence of its basic intuition: that musical material cannot appear in a random order in a composition, and that the tendency for what is essentially narrative (thus psychological) coherence is the result of coevolutionary interactions between “natural” human perceptual-cognitive constraints, including those of memory, and the “nurtural” evolution of musemes to optimise their survival by means of cooperative alliances with other musemes in large-scale structures. This cooperation presumably extends even beyond the scale of Figure 3.10 , with a movement-length musemesatz presumably being abstractable from (and so operative in) a set of sonata-form movements, and therefore being able to represent key aspects of the form’s configuration at a particular point in its evolutionary history.

3.5.3 Improvisation and/as Composition

The model outlined in §3.5.2 is both synchronic and diachronic: it is synchronic in the sense that it offers a means by which the detailed hierarchic structure of a movement can be understood in terms of the memetic forces that gave rise to it; and it is diachronic in that it offers an account of the processes of music generation, in composition and improvisation. Well before the formalisation of Tinctoris’s distinction, made in the late-fifteenth century, between componere (improvised music) and compositor (notated music) (Dunsby & Whittall, 1988, p. 15), improvisation occupied a central place in the world’s musical cultures. Indeed, it is perhaps only in the post-Enlightenment West, with its fetishisation of the composer and of the notation that preserves his or her masterworks immutably for posterity, that compositor has attained (an increasingly unstable) primacy. The notionally “pure” and unmediated nature of improvisation is complicated by the extent to which it draws upon culturally transmitted models of structure and process. Thus, a third category, the transmission of common structures and associated rhetorical schemata, elaborated and varied by (group) improvisation, dominates many non-Western musical cultures. Yet the latter is difficult to separate from “pure” improvisation, which, as will be argued below, also draws upon inherited schemata. Given the similarity of many “traditional” musical cultures to the hypothesised earliest human musics (§2.5.5 ), the group-improvisatory embellishment of culturally shared and valued (ritualistic) formulae is likely to have a long ancestry in our species.

Whereas composition might be regarded as a process in which musical ideas organise themselves sequentially with the potential for subsequent reflective revision (whereby certain musemes in the sequence may be replaced by their alleles, or whereby the resultant/regulatory musemesatz may itself be mutated), clearly there is no scope for such editorial (synchronic) reworking in the real-time (diachronic) unfolding of an improvisation. Given this difference, it is legitimate to ask whether the process of (solo) improvisation operates broadly according to the structural principles outlined in §3.5.2, or whether it requires a fundamentally different theoretical model for its explication. My contention is that, given the nature of musemic replication, the former is likely to be the case, despite the obvious complicating factors, in improvisation, of the constraints of real-time decision-making processes and the associated need to incorporate real-time sensory and motor feedback. It is nevertheless perhaps more realistic to conceive these issues in terms of a music-generative continuum, with composition and improvisation situated at the extremes and various hybrid stages located in between, orientated according to: (i) the degree to which prior planning and notation (or the lack thereof) are factors in generation; and (ii) the structural-hierarchic depth of the regulatory musemesätze – these being deep and all-encompassing in the case of composition, and relatively shallow and time-contingent in the case of improvisation.

To support this claim, I shall review Pressing’s (1988) model of improvisation, arguably the most detailed extant formulation, which demonstrates certain alignments with the RHSGAP model, at the same time offering a critique of its most significant weakness: the lack of any notion of the role of replication in moment-to-moment pattern selection as a key feature of improvisation as much as it is of composition. Essentially, Pressing’s elegant model describes improvisation in highly formalised detail, but does not fully explain the cultural-evolutionary processes underlying it. The heart of the model is the concept of the “event cluster” (Pressing, 1988, p. 153). Represented in Pressing’s quasi-mathematical notation by E, this is a self-contained (but arbitrary length) section of an improvisation containing a number of musical events. An improvisation is therefore a sequence of such event clusters, as symbolised in Equation 3.1 (1988, p. 153).

$$I = E_1, E_2, \ldots, E_n \tag{3.1}$$

While the two terms do not map onto each other precisely, the E seems broadly comparable to a museme or, depending on the extent of the E, to a musemeplex. For Pressing, each E “may be decomposed into three types of analytical representation: objects, features, and processes” (1988, p. 154). Objects are a “unified cognitive or perceptual entity” (1988, p. 154); they are, in my terms, a museme or a musemeplex. Features are “parameters that describe shared properties of objects” (1988, p. 154); they are an enumeration of the component elements (i.e., the “atomic” pitch and rhythm primitives) of a (“molecular”) museme (level eight of Table 1.4 ), or the elements (i.e., the musemes) of a musemeplex (level seven). Processes are “descriptions of changes of objects or features over time” (1988, p. 154); they represent the musico-operational/procedural memes regulating intra-museme/musemeplex element-connections. These three descriptors are represented using “variable-dimension arrays O, F, and P” (Pressing, 1988, pp. 154, 156, Fig. 7.1), which map objects, features and processes against (somewhat arbitrary) “cognitive strength” ratings (Pressing, 1988, p. 155). Pressing argues that

the fundamental nature of the improvisation process is … the stringing together of a series of ‘event clusters’ during each of which a continuation is chosen, based upon either the continuing of some existing stream of musical development (called here an event-cluster class [K]) by association of array entries, or the interruption of that stream by the choosing of a new set of array entries that act as constraints in the generation of a new stream (new event-cluster class). (Pressing, 1988, p. 168)

These two modes of continuation – associative generation (itself divided into similarity and contrast), and interrupt generation (Pressing, 1988, pp. 155–157) – differ according to the number of array (museme/musemeplex) components changing from $E_i$ to $E_{i+1}$, and the extent of the cognitive-strength changes as quantified by their respective OFP arrays.

Pressing’s concept of the “event-cluster class” is analogous to the notion of the musemeplex allele-class (§3.5.2), in that it makes diachronic what, in memetics, is an abstract synchronic alignment; and it opens up the further theoretical possibility of the musemesatz allele-class – the recurrent parataxis of a set of musemes and/or musemeplexes (and/or their alleles) that engenders a common underlying structural framework that is nevertheless elaborated differently on each improvisation-instantiation. Thus, to summarise these mappings between Pressing’s model and structures theorised in memetics, an event equates to a museme or a musemeplex; an event cluster equates to a museme-sequence or a musemeplex-sequence; and an event-cluster class equates to a musemeplex allele-class or a musemesatz allele-class.

Pressing understandably encounters difficulty in theorising the details of “how one continuation comes to be chosen over all other possible ones” (1988, p. 164). He wraps this problem into two abstractions: “a set of current goals”, symbolised by 𝒢 ; and the “referent”, R, which is “an underlying piece-specific guide or scheme”, these being held in long-term memory, M, for the duration of the improvisation. They are integrated in Equation 3.2 , which represents the “process of event-cluster generation” and, as the arrow implies, event-cluster parataxis (1988, p. 153).

$$(\{E\}, R, \mathcal{G}, M)_i \rightarrow E_{i+1} \tag{3.2}$$
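The iterative character of Equation 3.2 can be sketched, under heavy assumptions, as a simple loop in which each new event cluster is chosen in the light of the clusters so far, the referent, the goals and memory. Pressing does not specify the selection rule, so the `choose_next` function below is a toy stand-in (it simply takes the highest-scoring candidate for the next referent locus). All names, labels and scores are hypothetical.

```python
def generate_improvisation(first_cluster, referent, goals, memory, choose_next, length):
    """Iterate ({E}, R, G, M)_i -> E_{i+1} until `length` event clusters exist."""
    clusters = [first_cluster]
    while len(clusters) < length:
        clusters.append(choose_next(clusters, referent, goals, memory))
    return clusters


def choose_next(clusters, referent, goals, memory):
    """Toy rule (an assumption): pick the best-scoring candidate for the next locus."""
    locus = referent[len(clusters) % len(referent)]
    return max(memory[locus], key=goals)


# Hypothetical referent (sequence of loci), memory (candidates per locus) and goals.
referent = ["opening", "continuation", "cadence"]
memory = {
    "opening": ["o1", "o2"],
    "continuation": ["c1", "c2"],
    "cadence": ["k1", "k2"],
}
preference = {"o1": 0.5, "o2": 0.1, "c1": 0.2, "c2": 0.8, "k1": 0.9, "k2": 0.3}

result = generate_improvisation("o1", referent, preference.get, memory, choose_next, 3)
```

Replacing the deterministic `max` with a weighted random choice would be one way of admitting an element of chance into the selection, but that too would be an assumption rather than part of Pressing’s formulation.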

In acknowledging that improvisation may be guided by “a vast panorama of culturally and cognitively based musical processes and stylistic preferences” (1988, p. 164), Pressing admits the role of schemata (R) in shaping generation (𝒢 ) (1988, p. 152). Some of these schemata are cognitive but, to a significantly greater extent than in composition, others are motor: i.e., they are patterns of motor-control memes and memeplexes, discussed in §3.5.4 under the rubric of “gestemes” or “gesture-control memes”. As an illustration of the role of schemata in improvisation, Pressing considers the work of Parry (1930, 1932) and Lord (1964, 1965) on “formulaic composition” in folk epics, a genre that “is created anew at each performance by the singer from a store of formulas, a store of themes, and a technique of composition” (Pressing, 1988, p. 146). He argues that

[a] ‘formula’ is a group of words regularly employed under the same metrical conditions to express a given essential idea; it has melodic, metric, syntactic, and acoustic dimensions. By choosing from a repertoire of roughly synonymous formulas of different lengths and expanding or deleting subthemes according to the needs of the performance situation, the experienced performer is able to formulaically compose (in real-time, hence improvise) a detailed and freshly compelling version of a known song epic. As a result of the composition system, instances of pleonasm and parataxis are common.… In the words of Lord …: ‘the really significant element in the process is … the setting up of various patterns that make adjustment of phrase and creation of phrases by analogy possible’ …. In addition, the permutation of events and formulas may occur, as well as the substitution of one theme for another. (Pressing, 1988, p. 146)

This account affords clear parallels, in a different medium of memetic replication, to the operation of the RHSGAP model in music: (i) the notion of “a repertoire of roughly synonymous formulas” is equivalent to the idea of the museme allele or musemeplex allele; (ii) the concept of “expanding or deleting subthemes” is analogous to the modification, reordering, interpolation or deletion of structural loci that drives musemesatz mutation; and (iii) the “essential idea” corresponds to the musemesatz itself, generated by, yet also regulating, the lower-level processes it subsumes. That the literary process also appears analogous in several ways to musical improvisation – not least in their real-time unfolding – allows us to hypothesise that common processes of memetic conglomeration and structuralisation relate these realms, despite their different media and dissimilar phemotypic manifestations.

Within the broad structural constraints imposed by a musemesatz, those attributes of musemes and musemeplexes determining their parataxis affect their compatibility with other musemes and musemeplexes in both memomic and phemotypic forms. These factors partly decide which member of a potentially locus-generating museme allele-class or musemeplex allele-class is successful, vis-à-vis its rivals, in expressing that locus in any real-time instantiation of the improvisation’s musemesatz. Yet invoking the operation of “formulaic composition” – or, in my terms, the RHSGAP model – in improvisation, as in composition, arguably still does not fully account for the “residual decision-making” of “how one continuation comes to be chosen over all other possible ones” (Pressing, 1988, p. 164). Pressing advances four hypotheses to explain the source of this continuity: “intuition”, “free will”, “physicalism” and “randomness” (1988, p. 165). While the first can be dismissed as mystical (or, more charitably, as devolving to the third and/or fourth), the RHSGAP model aligns most closely with the third, while admitting, in keeping with the precepts of the overarching VRS algorithm, the role of the fourth. In physicalism,

complex decision making is seen to be an emergent property of the fantastically complex physical system known as a human being, in interaction with a series of environments. Free will in this perspective is either illusory, or simply a somewhat misleading metaphor for certain complex characteristics of the system. (Pressing, 1988, p. 165)

Recast in terms of the standpoint argued for in this book, physicalism suggests that memomic musemes and musemeplexes are in a state of constant competition for phemotypic expression – and thus for potential further replication – and therefore those that are most successful in this quest will, self-evidently, prevail (this being the “tautology” referred to by Dawkins in §1.6.2 ). Inherent in this is a tension between top-down and bottom-up factors: in the former, a musemesatz, often only dimly apprehended by the composer or improviser, “seeks” (in Dawkins’ rhetorical language of selfish intentionality) to select those musemes or musemeplexes that will articulate its structural loci; in the latter, musemes and musemeplexes, “aware” of this constraint, “compete” with their rivals for the survival-enhancing benefits such “victory” brings. One element of this success is a propensity for cooperative interaction – coadaptation – between replicators, both synchronically and diachronically. In summary, the sequential ordering of musemes and musemeplexes, and the configuration of the resultant musemesatz, is arguably less the product of conscious intentionality or agency on the part of the composer or improviser and more an “emergent property” of blindly algorithmic/mechanistic lower-level processes – Pressing’s notion of free will as an illusion. Indeed, Pressing’s physicalism aligns closely with Dennett’s “Multiple Drafts Model” of consciousness, discussed in §7.3 , which offers an algorithmic view of consciousness in which intentionality is framed as an illusion arising from the operation of the VRS algorithm.

3.5.4 Performance

The performance of music brings together a number of processes that can be understood in the light of evolution. Performance (including improvisation and conducting, and extending to include dance and drama) obviously utilises the body, and so depends upon, and illuminates, attributes – sensory/perceptual, cognitive and motor – shaped by millennia of evolution. Indeed, the evolutionary aspects of musical performance are predicated on the principle that the spatial movements of an organism in relation to its (geological) environment, or to another organism, are optimised to facilitate the imperatives of gene-survival, namely risk-avoidance (evasion of predators and other environmental hazards) and reward-garnering (securing shelter, food and mates). These have become hard-wired into brains so they are accessible at a split-second’s notice. Such reflex actions modulate the movements underpinning musical performance, which have become stylised microcosms and re-playings of encounters and conflicts encoded into us in our distant evolutionary past. These propensities are covered by Crewdson (2010), who formalises them under the rubric of an “etiological perspective”. Essentially, for Crewdson, when we listen to music we are transported back to our evolutionary prehistory, perceiving the virtual kinesis of music in a way analogous to that deployed when we perceive the real kinesis of an approaching predator or thunderstorm. Here, I attempt to apply this perspective to the motor actions of performance.

While innate (evolutionarily wired) movements are often preferred in nature, because they constitute optimum ways of quickly achieving certain physical goals, other movements, particularly the fine-grained actions involved in musical performance, are learned as specific motor skills, often as a result of years of painstaking practice, and often in defiance of what the body finds easy or natural.111 Such learned body movements are types of memes or, rather, they are the phemotypic effects of memes. One might term them “gestemes”, or “gesture memes” (see also Gritten & King, 2006; Gritten & King, 2011). Like all other categories of meme, they are subject to the operation of the VRS algorithm, being varied in response to cognition or discovery of different strategies for executing the gestures in question; replicated, via visual and/or oral instruction from teacher to pupil or from peer to peer (who might take the form of a recording), as part of a pedagogic interaction; and selected according to their perceived utility and efficiency in rendering the music in question.

Recent research in the study of recorded music has indicated that tempi vary significantly within individual performances, from performance to performance of the same work by the same performer, and from performer to performer in the same work over time, as do certain global baseline tempi in some repertoires (Leech-Wilkinson, 2009a). This fluctuation might be regarded as controlled, in part, by gestemes, which regulate the physical tendency to move the hands and fingers more or less quickly, or to pivot the torso in certain ways and in certain directions. One might also hypothesise that gestemes are coadapted with the musemes that code for the music performed, whether these are primarily score-based, as in the performance of notated music; or largely brain-based, as in the creation of improvised music (§3.5.3). Performance thus appears to rely on an interplay between culturally transmitted sound patterns (musemes) and culturally transmitted gesture patterns (gestemes); and a memetics of musical performance should therefore attempt to determine how this interplay functions and to understand how the evolutionary pressures affecting each domain reinforce or contradict each other.

Two questions arising from the issue of tempo-fluctuation are: (i) is such rubato the consequence of some attribute of musemes that might motivate intra-museme tempo changes (thus, are gestemes created in part by musemes);112 and (ii) if so, once this tendency is realised in one performance, can the effect be consolidated, indeed augmented, on its cultural transmission to other performers by the synergy between the relevant museme(s) and the newly associated gesteme(s)? Extending this, if the attributes of musemes do motivate tempo changes, then presumably these might be coordinated when musemes assemble to form a musemeplex, engendering a parallel gestemeplex. Moreover, if a musemesatz is generated by the tendency of members of certain museme and musemeplex allele-classes to instantiate the structural-sequential loci of a movement (§3.5.2 ); and if members of each of these allele-classes are potentially coadapted/coaligned with members of allele-classes of gestemes; then a higher-order sequence of gestemes will arise, which might be termed a gestemesatz.

One might hypothesise that such gesteme-generating museme tempo fluctuations are driven partly by innate (natural) forces and partly by learned (nurtural) forces, in complex interactions. In the former category, the effect is partly the result of image-schematic factors (§4.2, §4.5) and partly the result of the I-R forces illustrated in Figure 3.7 (see also Narmour (1990, 1992)).113 In the case of image-schematic factors, a quasi-gravitational force operating in three-dimensional musical “space” upon the metaphorical “mass” of the constituent musemes might be assumed to affect certain aspects of their tempo. In the case of I-R forces, the various implications intrinsic to a museme might be understood to impel the tempo forward, whereas both realisations and frustrations might conceivably act to retard the tempo.

The operation of these natural, and certain nurtural, factors is summarised in the following two-part list. Beyond being incomplete (there are presumably many more factors affecting the dynamics of performance than are identified here),114 this list is clearly over-simplistic, because: (i) the two domains cannot be entirely separated (the learned stabilities of pitch and rhythm hypothesised in the second part are underpinned by natural predispositions shaped by acoustic and morphological regularities); and (ii) multiple factors within and between each category may reinforce and/or contradict each other in complex ways (nature is modulated by nurture, and vice versa). Moreover, the effect ascribed to a particular cause might be manifested prospectively (in anticipation of the cause) or retrospectively (after the cause has been processed in cognition). This distinction itself relies upon the difference between sight-reading and performance based upon practice and reflective engagement. In the “natural” sub-list, “IS” symbolises situations where (innate) image-schematic factors are hypothesised to be dominant; “IR” symbolises situations where (innate) implication-realisation forces are hypothesised to be dominant; and combinations of these symbols indicate that the tempo-altering effect results broadly from a synergy (“IS+IR”) or a conflict (“IS-IR”) between them.

1. Natural; primarily genetically transmitted factors:
(a) If a museme segment or museme-museme interface is moving downwards in pitch, there may be a tendency to acceleration, in terms of shortening of inter-onset interval (IOI) and/or offset-to-onset interval (OOI) (Temperley, 2001, p. 68) (IS).
(b) The effect of point 1a may be augmented if the museme articulates a [P] (IS+IR); and it may be diminished or counteracted if the museme articulates a [R] (IS-IR).115
(c) If a museme segment or museme-museme interface is moving upwards in pitch, there may be a tendency to deceleration, in terms of lengthening of IOI and/or OOI (IS).
(d) The effect of point 1c may be diminished or counteracted if the museme articulates a [P] (IS-IR); and it may be augmented if the museme articulates a [R] (IS+IR).
(e) If a museme segment or museme-museme interface encompasses a decrease in note-length (e.g., from crotchets to quavers, or from “straight” quavers to triplet quavers), there may be a tendency to acceleration that exceeds the “measured” acceleration governed by the note durations (IS).
(f) The effect of point 1e may be augmented if the museme articulates a [P] (IS+IR); and it may be diminished or counteracted if the museme articulates a [R] (IS-IR).
(g) If a museme segment or museme-museme interface encompasses an increase in note-length (e.g., from quavers to crotchets, or from triplet quavers to “straight” quavers), there may be a tendency to deceleration that exceeds the “measured” deceleration governed by the note durations (IS).
(h) The effect of point 1g may be diminished or counteracted if the museme articulates a [P] (IS-IR); and it may be augmented if the museme articulates a [R] (IS+IR).
(i) The octave may have a multivalent effect, sometimes increasing and sometimes decreasing tempo depending on the context. Rising octaves might impel a sense of “momentum-building” to surmount the “height” of the octave, whereas falling octaves might call upon a “precipice-avoiding” steadiness (IS).116
2. Nurtural; primarily memetically transmitted factors related to style-specific aspects of scale and chord degree and to metrical/rhythmic position:
(a) There may be a tendency to decelerate around/into relatively stable chord-notes (the root, third or fifth) of the locally prevailing triad.
(b) There may be a tendency to accelerate around/into relatively unstable non-chord notes sounding in conjunction with the locally prevailing triad.
(c) There may be a tendency to decelerate around/into relatively stable scale degrees (1, 3 and 5) and/or triads (I|i, IV|iv, vi|VI and V versus I6/4|i6/4) of the locally prevailing key.117
(d) There may be a tendency to accelerate around/into relatively unstable scale degrees (2, 4, 6 and 7) and/or triads (I6/4|i6/4, ii|ii°, iii|III, V|v and vii°|VII) of the locally prevailing key.118
(e) There may be a tendency to decelerate around/into rhythmically strong/accented beats (beats 1 and 3 of a 4/4 bar or beat 1 of a 3/4 bar).
(f) There may be a tendency to accelerate around/into rhythmically weak/unaccented beats (beats 2 and 4 of a 4/4 bar or beats 2 and 3 of a 3/4 bar).119
(g) There may be a tendency to decelerate at phrase and sub-phrase endings (followed by a compensatory acceleration at the start of the following phrase or sub-phrase), this motivated in part by the (learned) closural force of imperfect or perfect cadences.
(h) There may be a tendency to return (via acceleration or deceleration) to the original tempo of a museme on its return, if the tempo immediately preceding the point of return has decreased or increased.
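
The logic of the “natural” sub-list (points 1a to 1h) can be sketched computationally. The following Python fragment is a minimal illustration only, not a model proposed in the sources cited: the names IS_TENDENCY, IR_TENDENCY and natural_label are hypothetical, and the signs encode nothing more than the qualitative directions hypothesised above, on the assumption that a [P] favours acceleration and a [R] favours deceleration. The multivalent octave effect (point 1i) and all nurtural factors are deliberately omitted.

```python
# Hypothetical encoding of points 1a-1h: +1 = tendency to accelerate,
# -1 = tendency to decelerate. These signs are qualitative assumptions.
IS_TENDENCY = {
    "pitch_down": +1,     # point 1a
    "pitch_up": -1,       # point 1c
    "shorter_notes": +1,  # point 1e
    "longer_notes": -1,   # point 1g
}

# Assumed I-R contribution: a [P] favours acceleration, a [R] deceleration.
IR_TENDENCY = {"P": +1, "R": -1, None: 0}


def natural_label(motion, ir=None):
    """Return (image-schematic tendency, relation label) for one segment.

    The relation is "IS" when no I-R structure is considered, "IS+IR" when
    the two tendencies agree, and "IS-IR" when they conflict. In the
    conflicting case the IS tendency may be diminished or counteracted, so
    the returned direction is a tendency, not a prediction.
    """
    is_sign = IS_TENDENCY[motion]
    ir_sign = IR_TENDENCY[ir]
    direction = "acceleration" if is_sign > 0 else "deceleration"
    if ir_sign == 0:
        relation = "IS"
    elif ir_sign == is_sign:
        relation = "IS+IR"
    else:
        relation = "IS-IR"
    return direction, relation
```

On this encoding, natural_label("pitch_up", "P") yields a decelerative tendency labelled “IS-IR”, which corresponds to point 1d.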

To illustrate how these factors might operate, a passage from Chopin’s Mazurka in F minor op. 7 no. 3, shown in Figure 3.10a , will be examined. One outcome of the Mazurka Project (CHARM, 2019b), conducted under the aegis of the CHARM Research Centre (CHARM, 2017), was a set of analyses of recordings of this mazurka made by Ignaz Friedman in 1930 and by Charles Rosen in 1989, which graphed beat-to-beat tempo fluctuations (CHARM, 2019a; see also N. Cook, 2007a). The graphs of bb. 9–17 of these recordings are shown aligned in Figure 3.10b .120 This phrase is chosen for analysis here over bb. 1–8 because, as the main melody, it has greater variety and movement than the more static introductory material, and so motivates more diversity in tempo. In the graphs, red dots indicate the beginning of the first beat of each bar and the following two blue dots indicate the beginning of the second and third beats, representing beat-onsets equidistantly on the x axis (the upper x-scale counts bars, the lower counts beats). Because of the tempo fluctuations, beats are not located equidistantly in performance: the position of the dots on the y axis represents the measured tempo of the time-slice demarcated by a beat, the left-hand scale representing beat-duration in milliseconds (ms) and the right-hand scale in beats per minute (BPM). Note that the layout of these scales means that the lower the dot on the graph, the faster the tempo, and vice versa. The various lines connecting the dots represent data from listener tempo-tapping trials that, being estimates of tempo (unlike the measurement-related dots), are not directly relevant to present concerns.
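
The relationship between the two y-axis scales is simple arithmetic, since a beat lasting d milliseconds corresponds to 60,000/d beats per minute. The short Python sketch below makes this explicit and shows why a shorter beat duration means a faster tempo. The function names and the optional tolerance parameter are hypothetical conveniences, not part of the Mazurka Project's tooling.

```python
def beat_ms_to_bpm(beat_ms):
    """Convert the duration of one beat in milliseconds to beats per minute."""
    if beat_ms <= 0:
        raise ValueError("beat duration must be positive")
    return 60_000.0 / beat_ms


def classify_change(prev_ms, next_ms, tolerance_ms=0.0):
    """Describe the tempo change between two successive beat durations.

    A shorter following beat is an acceleration; a longer one is a
    deceleration. Differences within tolerance_ms (an assumed threshold)
    count as no significant change.
    """
    diff = next_ms - prev_ms
    if abs(diff) <= tolerance_ms:
        return "no change"
    return "acceleration" if diff < 0 else "deceleration"
```

For instance, a beat of 500 ms corresponds to 120 BPM and a beat of 1000 ms to 60 BPM.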

(a) Chopin: Mazurka op. 7 no. 3 (1830–1832), bb. 1–17.

(b) Tempo Graphs of bb. 9–17: Friedman 1930 (upper); Rosen 1989 (lower).
Figure 3.10: Two Performances of Chopin, Mazurka op. 7 no. 3 (1830–1832), bb. 9–17.

The intra-museme tempo fluctuations within and between bb. 9–17 of these two recordings are summarised in Table 3.2 . While bars are not necessarily coterminous with musemes, bar lines here do demarcate perceptually-cognitively salient (melodic) units, and can therefore be taken as markers of initial and terminal museme-nodes.121 Table 3.2a shows the antecedent phrase (bb. 9–12) and Table 3.2b shows the consequent phrase (bb. 13–16). The two-bar sub-phrases within each phrase are separated by double lines. The table also takes inter-museme tempo fluctuations into account, which occur in the context of the closural force of the musemes’ terminal node (see the rows for bb. 9–10, bb. 10–11, etc.). The assessed magnitude of beat-to-beat tempo change is represented by “S” = small; “M” = medium; and “L” = large. It is nevertheless not always easy to distinguish between no change and a small change, or between small and medium-sized changes. The direction of tempo change from beat to beat is indicated by “” = acceleration; “” = deceleration; and “=” = no significant change. An ellipsis (…) separates observations pertinent to the beat 1–beat 2 span from those pertinent to the beat 2–beat 3 span within a given bar/museme. Significant cross-recording overlaps of tempo-profile between parallel musemes or museme components are indicated in bold.

Number/letter combinations in brackets refer to those hypotheses in the list on page 309 judged most relevant to explain the observed tempo variation, adopting the most parsimonious interpretation in each case.122 Sometimes these require nested brackets in order to clarify the combination of factors, thus demarcating combinations from the relationships between the combined forces and some other force or set of combined forces. For these higher-order relationships, a plus (“+”) sign indicates synergistic augmentation, or contrastive neutralisation, of two factors or combination of factors; whereas the separator “>” indicates that, in the case of contradictory factors or combinations of factors, the former is judged to outweigh the latter in any particular instance of tempo change. If there is a change of tempo direction within a bar (an increase followed by a decrease, or vice versa), hypotheses pertinent to each are separated by an ellipsis.123

Phrase, Bar | Friedman 1930 | Rosen 1989
9 | … S (1d > 1c) | … S (1d > 1c)
9–10 | (1a + 1b + 1e + 1f + 2b)
10 | … M ((1g > 1h) + ((2a + 2c) > 2f) … 1i) | … = ((1g > 1h) + ((2a + 2c) > 2f) … 1i)
10–11 | (2g) | (1e)
11 | = … S (1c > 1d 1c > 1d) | … S (1c 1d > 1c)
11–12 | (1e) | (1e + 1f)
12 | = … S ((2a + 2c) > (1a + 1b) … (2a + 2c) > (1a + 1b)) | … S ((1g + 2a + 2c) > (1a + 1b) … (2a + 2c) > (1a + 1b))
12–13 | (1a + 1b) | ((2a + 2c + 2e) > (1a + 1b))

(a) Bars 9–12.

Phrase, Bar | Friedman 1930 | Rosen 1989
13 | … S (1d > 1c 1c > 1d) | … = (1d > 1c 1c > 1d)
13–14 | (1a + 1b + 1e + 1f + 2b) | = (2e > (1a + 1b))
14 | = ((1g > 1h) + ((2a + 2c) > 2f) … 2f > (1c + 1d)) | = ((1g > 1h) + ((2a + 2c) > 2f) … 2f > (1c + 1d))
14–15 | = (1i > 1a) | (1a > 1i)
15 | … S (1d > 1c 2c > 2d) | … = (1d > 1c 2c)
15–16 | (2a + 2c) | (2a + 2c + 2e)
16 | = … (2c + 2d 2g > (1e + 2b + 2d + 2f)) | (2d 2g > (1e + 2b + 2d + 2f))
16–17 | (2h)

(b) Bars 13–17.
Table 3.2: Intra- and Inter-Museme Tempo Fluctuations in Chopin, Mazurka op. 7 no. 3, bb. 9–17.

There is a good deal of data in Table 3.2 , and, perhaps unsurprisingly, some of it is contradictory. For one thing, identical figures are not always performed in the same manner, even by each pianist, as in the case of b. 9 and b. 13, especially in Friedman’s recording. Nor, indeed, are analogous figures, such as b. 9 and b. 11, rendered similarly, again particularly in the case of Friedman. More broadly, justifying the relationships posited in Table 3.2 between the tempo data and the hypotheses in the list on page 309 is beyond the scope of this chapter, so three examples from Table 3.2 must suffice for particular mention. These are outlined below:

1. In b. 9 of both recordings there is an acceleration, L (Friedman)/M (Rosen) … S. This suggests the counteraction of the potential deceleration motivated by an ascent (point 1c ) by the countervailing “energy” of the [P] (point 1d ). The museme in b. 9 ends with a (prospective) Intervallic Process ([IP]) (Narmour, 1990, p. 350), not with a [R]. Assuming it would have a tempo-mediating effect, albeit one weaker than a [R], the [IP] occurs after the start of the third beat of the bar, and so appears not to factor into the tempo calculation. Apropos points 1b , 1d and 1f , only [R]s where the change-of-direction note is the second or the fourth quaver (in 3/4 time) are likely to affect the intra-bar tempo, unless there is in play the prospective cognition referred to on page 309. The effect of a [R] or an [IP] might be evident, however, on inter-bar/museme tempo, although this is not relevant in the case of bb. 9–10 here.124
2. Comparison of the analogous b. 10 and b. 14 shows illuminating differences. In Friedman, both bars decelerate into the second beat, perhaps motivated by the “trumping” by note-length increase (point 1g ) of [P]-motivated acceleration (point 1h ); and by the combined domination of harmonic stability factors (points 2a and 2c ) over rhythmic factors (point 2f ). Bar 10 has a compensatory acceleration on the f1–f2 ascent, whereas b. 14 has no change on the analogous f1–c2 ascent. The former (octave) change might be the result of image-schematic “aspirational” forces (point 1i ), whereas the latter (fifth) change might be the result of the trumping by accelerative rhythmic forces (point 2f ) of the decelerative [R]-related forces here (points 1c and 1d ), this conflict motivating not an acceleration but tempo stability here. In Rosen, b. 10 also has a deceleration in the same place as Friedman (presumably motivated by the same factors), but no compensatory acceleration on the f1–f2 ascent; whereas b. 14 has a small deceleration (perhaps arising from weaker action of the forces attendant upon the Friedman segment) and, like Friedman, no change on the f1–c2 ascent (presumably motivated by the same factors). The significant difference in connection with the octave leap of b. 10 might be the result of the issues discussed in note 116 on page 430, with Friedman being motivated primarily by image-schematic factors and the arguably more cerebral Rosen hearing it as the “same” note owing to “Narmourean” octave-equivalence.
3. There is a large deceleration at the end of b. 16 in both recordings (they are the largest tempo changes in Figure 3.10b ), followed by an acceleration into b. 17.125 This deceleration suggests a strong (nurtural) phrase-ending effect here (point 2g ), one that contradicts the (natural) implication of acceleration on rhythmic diminution (point 1e ), even in the absence of any (natural) acceleration-inhibiting [R] here (point 1f ). The deceleration would also appear to overrule the (nurtural) tendencies to accelerate around/into relatively unstable non-chord notes (point 2b ), around/into relatively unstable scale degrees and triads (point 2d ), and around/into weak beats (point 2f ).

Constraints of space in this section have prevented my elaborating a fully developed, evolutionarily grounded theory of musical performance. A few suggestive conclusions have emerged, although these need to be evidenced more substantively, perhaps using large-scale computer-aided correlation of tempo-fluctuation data with museme-contour analysis. Given the multiparametric nature of music, and the complex mixture of natural and nurtural factors involved in its performance, what are clear behavioural trajectories in the realm of biological actions often become entangled in musical performance. As a result, and in a parallel to the particulate nature of genetic inheritance, one might paraphrase Dawkins and suggest that “[t]his does not mean that the [natural and nurtural factors] concerned are not [discrete and] particulate. It is just that there are so many of them …, each one having such a small effect, that they seem to blend” (1989, p. 195, emphasis in the original) when combined in the heat of the performance situation. Nevertheless, it seems that both biologically evolved patterns of physical movement and culturally evolved habits of nuancing those patterns play a significant role in shaping musical performance.
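
As an indication of what such computer-aided correlation might involve at its simplest, the Python sketch below computes a Pearson coefficient between successive melodic contour steps and the corresponding changes in beat duration. It is an assumption-laden illustration, not a method drawn from the studies cited: the function names are hypothetical, one contour step per beat-to-beat transition is assumed, and a real study would need many performances and a museme-level segmentation.

```python
from math import sqrt


def beat_deltas(beat_ms):
    """Return the change in beat duration between successive beats (in ms).

    Positive values indicate a lengthening beat, that is, a deceleration.
    """
    return [later - earlier for earlier, later in zip(beat_ms, beat_ms[1:])]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least two")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)
```

If contour steps are measured in semitones (positive for ascent), a positive coefficient between those steps and beat_deltas would be consistent with points 1a and 1c, whereas a coefficient near zero would suggest that other factors dominate.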

3.6 Music-Cultural Taxonomies

The discussion of taxonomy in §1.7 considered not only the great diversity of the natural world – as evidenced by the number of taxonomic ranks (Table 1.5 ) and their internal richness – but also the conflicting views among biologists as to how sense might be made of this heterogeneity by systems of categorisation. As an approach that seeks strictly to trace evolutionary relationships, using the evidence of molecular biology as a validation of apparent connections suggested by morphological resemblances, cladistic taxonomy (§1.7.2 ) is arguably the optimal way of mapping the operation of Darwinism in nature.

On the logic of Universal Darwinism, cladism would appear also to be the optimal way of charting the operation of Darwinism in culture. Here the aspiration – one well beyond the scope of this book – would be the formulation of a complete taxonomy of human (and potentially animal and machine) culture to rival that assembled by biologists for the natural world (Jan, 2014, sec. 6). That this would in principle be possible – that there is an intrinsic connection between biological and cultural taxonomies – was recognised by Darwin, when he observed the similarities between language families and human genealogy. In a passage in which “musics” might readily be substituted for “languages”, he argued that

[i]t may be worth while to illustrate this [dendritic] view of classification, by taking the case of languages. If we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world; and if all existing languages, and all intermediate and slowly changing dialects, had to be included, such an arrangement would, I think, be the only possible one. Yet it might be that some very ancient language had altered little, and had given rise to few new languages, whilst others (owing to the spreading and subsequent isolation and states of civilisation of the several races, descended from a common race) had altered much, and had given rise to many new languages and dialects. The various degrees of difference in the languages from the same stock, would have to be expressed by groups subordinate to groups; but the proper or even only possible arrangement would still be genealogical; and this would be strictly natural, as it would connect together all languages, extinct and modern, by the closest affinities, and would give the filiation and origin of each tongue. (Darwin, 2008, p. 311; see also Sereno, 1991, pp. 471–472)126

Thus, a cladistic orientation appears to be the most logical basis upon which to develop music-cultural taxonomies, given the concern of memetics with the operation of the VRS algorithm at several structural-hierarchic levels and across various interconnected geographical domains over time. The most obvious musical implementation of cladism, and a good model for a more thoroughgoing cladistic memetics, is the tradition of musical text-criticism, one of the most venerated elements of the “old” musicology. Deriving from palaeography and classical philology, it offers a highly systematic and formalised methodology based on transmission and mutation for uncovering the filiation, as Darwin would say, of pieces, particularly music in manuscript sources, and for generating its own form of taxonomic trees, stemmata (Grier, 1996, Ch. 3).

While it is clear what are the significant taxonomic units of biology – of the levels discussed in §1.7 , the most important from a cladistic perspective is arguably the species – it is not so clear what are the significant taxonomic units of culture. This ambiguity is the result of fundamental differences between the dynamics of biological and cultural evolution, and of the enormous variety of forms sustained by culture – both of which result from key mechanistic differences. For the former factor, and in biology, there is a clear separation between replicators and vehicles (§1.6.1); and the associated constraints of a fixed life-cycle (whatever its length) mean there is a clear rhythm of generations resulting from the time-lag between birth and the readiness of the vehicle to reproduce. In culture, no such rhythm occurs, and cultural replicators can be copied rapidly and “arhythmically”. In short, this is the difference between the primarily periodic, “vertical” (parent-to-offspring) nature of biological transmission versus the primarily aperiodic “horizontal” (peer-to-peer) nature of cultural transmission.127 “Oblique” transmission is sometimes used to refer to intergenerational transmission between adults and (unrelated) children, and is a significant mode of transmission in musical culture, as well as in most other formalised educational systems (Shennan, 2002, pp. 48–51; see also Blute, 2006, pp. 156–157). For the latter factor, the absence in cultural evolution of a mechanism connecting replicators deterministically with vehicles analogous to that – DNA-mediated protein-synthesis – in biological evolution leads to the relatively unconstrained diversity of cultural phemotypes, as against the relatively constrained uniformity of biological phenotypes.

When pursuing the application of taxonomy to memetics, it is necessary to consider correspondences between comparable levels of the nature-culture analogy hypothesised in §1.6.2 . As illustrated in Table 1.4 , there are four main levels to the analogy. At the highest level, the correspondence operates between biological species and cultural dialects (Meyer, 1996, p. 23) (level three); below this, groups within a species might be mapped onto idioms (particular composers’ styles (Meyer, 1996, p. 24)), genres, and formal-structural types (level four); at a still lower level, the equivalence is arguably between the individual organism and the individual movement or work (level five); and at the lowest level one might compare operons/genes with m(us)emeplexes/m(us)emes – I conflate levels six and seven of Table 1.4 here, given their structural and functional similarities.128 At which of these culture-hierarchic levels might one most appropriately develop methodologies for a music-memetic taxonomy?

In the case of the approach mentioned above, the stemmata of musical text-criticism, the object of investigation and classification is usually the work,129 which equates to the individual organism in biology. Clearly this is too low a level for biological taxonomy, which generally regards species (dialect) – together with (sub)species, varieties, or other such “infraspecific” taxa – as the lowest manageable units of classification; and it does not appear useful for cultural taxonomy either, for there is arguably no meaningful sense in which a work can be equated to a parent lineage that bifurcates to create child lineages, even though particular works may well serve as inspiration, models even, for the efforts of later composers.

Mappings at the four levels are discussed in the following. Further implications of this issue are explored in §4.3.1 .

3.6.1 Species-Dialect

In biology, cladistic-taxonomic discussion is primarily focused on the phenomenon of speciation, which might find its analogue in culture in the breaking-off of separate and distinct “child” dialects from a “parent” dialect. While this is a central area of cladistics in biology, the picture is somewhat more mixed in culture. Much depends upon how a dialect is defined: the options broadly devolve to some combination of the geographical/“horizontal”/synchronic (“Viennese Classicism”, “the Mannheim School”); and the chronological/“vertical”/diachronic (the “style of the 1780s”). Nevertheless, music-cultural dialects have considerably greater musemic and configurational diversity than the potentially analogous genetic and morphological consistencies that are required for the determination of species: members of a species must manifest certain genomic and phenotypic regularities, which both result from and facilitate gene replication, regularities that are not required for the propagation of musical dialects.

For cultural speciation to occur, dialects require cultural-ecological “niches” within which potential child dialects could arise and flourish. The studies of bird-song transmission in §5.4.1 suggest that this can, in principle, be engendered by geographical separation and, certainly before the twentieth century, the predominant concentration of music in urban centres meant that distinct geographical dialects, each drawing upon their own subset of a wider museme pool, could survive and flourish. As an example, while there was a generic European Galant style, distinct French, German and Italian “subspecies” coexisted, each with its own subtle variants on standard practices (Heartz, 2003). Nevertheless, it seems that the force of the species-dialect mapping is primarily as a verbal-conceptual memeplex (i.e., it is metaphorical; §4.3.3 ), and not directly music-memetic.

3.6.2 Group-Idiom/Genre/Formal-Structural Type

The nearest cultural equivalent to cladistic taxonomy’s study of speciation might be found in the study of evolving musico-structural types and categories within and across dialects – examples include the evolution of binary-form dance genres over the seventeenth and eighteenth centuries, and that of the various types of sonata forms and their associated multi-movement sequences over the eighteenth and nineteenth centuries – which corresponds with the group of organisms in biology. This of course breaks the level-mappings of Table 1.4 – which is not intended to be regarded as absolute and immutable – aligning level three in nature (species) with level four in culture (idiom/genre/formal-structural type). But given that a sub-group can form the basis of a new species, and given that the distinction between species and sub-species is not always clear, the evolution of these particular cultural categories might constitute a meaningful field for cultural taxonomy. As an example of potential bifurcation at this level, the often “monothematic” – or “P[rimary theme]-based S[econdary theme]” (Hepokoski & Darcy, 2006, pp. 135–136) – sonata movements of Haydn, for instance, might be regarded as a different branch to the often “bithematic” practice of Mozart and Beethoven. But – at the risk of oversimplifying a complex range of practices (there are various hybrid types) – the fact that Haydn also wrote bithematic sonata forms muddies these particular waters and separates this candidate cultural example of speciation from the more clearly demarcated lineages of biology.

3.6.3 Organism-Movement/Work

Cladistic taxonomy only considers individual organisms as tokens of the type represented by the species, recognising that to categorise them on an individual basis is meaningless in taxonomy (but not necessarily so in other domains of biology). The same holds true in culture: movements and works, as analogues of organisms, are tokens of higher-order categories, not types in themselves; and attempting to treat them cladistically, as akin to species, would again break the level-mapping of Table 1.4 by aligning, in this case, level three of nature with level five of culture. As argued in the discussion of the unit(s) of selection in §1.6.2 , musemes, not whole works, are transmitted from composer to composer. There is therefore no sense in which a work itself is subject to the operation of the VRS algorithm: this mechanism applies only to (some of) the musemes that constitute a work. Thus, it applies only indirectly, via bottom-up forces, to the idioms, genres, and formal-structural types that a work tokens. Nevertheless, the attributes of these level-four categories might additionally be shaped via the action of musico-operational/procedural memes.

3.6.4 Operon/Gene-M(us)emeplex/M(us)eme

Most cultural change at the dialect (species) level is perhaps due less to the geographical and/or chronological bifurcation of child dialects than to the evolution of the system itself brought about by internal musemic mutation, an issue covered more fully in §7.5 . Some biologists assert that the ultimate driver of evolution is gene selection, yet this is always mediated by interactions between phenotypes and environments. While this probably also holds true for culture, measuring the effects of interactions between phemotypes and environments – human perceptual-cognitive constraints acting in conjunction with effects arising from the wider culture – is difficult, whereas measuring m(us)eme-level change is more feasible. In this sense, the level equivalent to that of the gene – the m(us)eme – is arguably the most tractable for cultural taxonomies. At this level, however, the configuration of a gene-pool is, strictly, the province of population genetics, not of taxonomy; and mutation, not evolution, is the appropriate concept when considering its reconfiguration (because genes mutate whereas species evolve). Similarly, a study of the constituents of a m(us)eme pool – a classification of antecedent forms and their mutational descendants in terms of their spatio-temporal position on what would be a vast tree of transmission relationships – is one that falls, strictly, within the purview of population memetics, even though one might ostensibly conduct it under the rubric of a memetic taxonomy.

Cope’s concept of the lexicon is pertinent to this issue (Cope, 2001, p. 94; Cope, 2003, p. 20; Jan, 2016c). While he does not explicitly invoke memetics, a lexicon is essentially the outcome of assigning museme alleles – a set of structurally/functionally analogous musemes any of which might occupy a particular locus in an instantiation of a specific structural archetype (§3.5.2 ) – to their parent museme allele-class. Lexicons impinge on Cope’s work in computer-generated composition – most notably in his Sorcerer and Experiments in Musical Intelligence (EMI) systems (§6.5.1.1 ) – in that a member of a given lexicon can be inserted interchangeably with other lexicon-members into a specific position in a composition, thus reconciling high levels of pattern richness with algorithmic parsimony. Mattheson suggests that composers perform such museme allele-class assignment semi-automatically. He advises that

[t]hese particulars must not be taken so strictly that one would perhaps write down an index of like fragments, and, as is done in school, make a proper invention box out of them; but one would do it in the same way as we stock up a provision of words and expressions for speaking, not necessarily on paper nor in a book, but in one’s head, through which our thoughts, be they verbal or written, can then be quite easily produced without always consulting a lexicon. (Mattheson & Harriss, 1981, p. 284, para. 17)

This indexing of “like fragments” is a function of the sophisticated sorting and comparison powers of the human brain to group patterns that are similar according to various criteria, and operates both consciously and unconsciously. It is argued in §3.8.4 to be a function of the “hashing” formalised in Calvin’s Hexagonal Cloning Theory, whereby shared attributes of two or more cortically encoded patterns are connected by neural links to a “centrally located representation” (CLR) that serves to abstract and index their defining features. In this sense, hashing is a form of cortical taxonomy, because it creates higher-level categorical groupings that associate phenomena that are perceptually and cognitively similar in certain respects.
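
The interchangeability that defines a lexicon, as described above, can be illustrated with a small Python sketch. This is a toy model under stated assumptions and not a reconstruction of Cope's systems: the locus names, the allele labels and the functions realise and count_realisations are all hypothetical, and musemes are represented merely as strings.

```python
import random
from math import prod

# Hypothetical lexicons: each locus of a structural archetype maps to the
# museme alleles (placeholder labels) that may occupy it interchangeably.
EXAMPLE_LEXICONS = {
    "opening": ["a1", "a2", "a3"],
    "continuation": ["b1", "b2"],
    "cadence": ["c1", "c2"],
}

# Hypothetical archetype: an ordered sequence of loci to be filled.
EXAMPLE_ARCHETYPE = ["opening", "continuation", "cadence"]


def realise(archetype, lexicons, rng):
    """Fill each locus with one allele chosen from that locus's lexicon."""
    return [rng.choice(lexicons[locus]) for locus in archetype]


def count_realisations(archetype, lexicons):
    """Count the distinct sequences obtainable by free substitution."""
    return prod(len(lexicons[locus]) for locus in archetype)
```

With the example data, seven stored alleles suffice for twelve distinct realisations, which gives a rough sense of how a lexicon can reconcile pattern richness with algorithmic parsimony. Calling realise(EXAMPLE_ARCHETYPE, EXAMPLE_LEXICONS, random.Random(0)) returns one such sequence.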

The phenomenon of one-way binary branching, while intrinsic to biological speciation, is difficult to apply to population memetics (as a proxy for a memetic taxonomy). While cladistic taxonomy takes as a cardinal principle the notion of strict hierarchic inclusion – the “perfect nesting” of monophyly (Dawkins, 2006, p. 367) – a taxonomy of culture must account for the hybridising interaction between members of different lineages, a phenomenon arguably applicable to several of the levels at which nature-culture alignments are hypothesised to exist. Hybridisation is evident, for instance, in the Galant schemata with which Gjerdingen (1988, 2007a) is concerned. The variety of changing-note patterns replicated by composers in the eighteenth century were presumably not the result of successive branchings in a lineage that began with a single primary schema; rather, they are more likely to have resulted from the intermixing (hybridisation) of initial and terminal schema-events from several coexistent schemata (Jan, 2013).

3.6.5 Distinguishing Homologies from Homoplasies in Music-Cultural Evolution

Having argued in §3.6.4 that m(us)emes are the most tractable units with which to construct cultural taxonomies, it is instructive to attempt to apply to them the three categories outlined in §1.7.2 used to organise biological similarities – namely, homoplasy, ancestral homology, and derived homology (page 75). To review these briefly, a homoplasy is “a character shared between two or more species that was not present in their common ancestor” (Ridley, 2004, pp. 427–428, 480), most often resulting from convergent evolution arising “when the same selection pressure has operated in two lineages” (2004, p. 429); an ancestral homology is “present in the common ancestor of the group of species under study” (2004, p. 431) and “found in some but not all of the descendants of the common ancestor” (2004, p. 480); and a derived homology “evolved after the common ancestor, within the group of species under study” (2004, p. 431) and is “found in all the descendants of the common ancestor” (2004, p. 480).

Like the palaeontologist with his or her fossil record, the musicologist has at his or her disposal the phemotypic forms of musemes, preserved as notated and recorded music. As with the fossil record, however, this account is incomplete; but whereas the palaeontologist can see slow-moving biological evolution reflected in exposed rock strata and build taxonomic trees from them (and from molecular-biological evidence), the speed of cultural evolution is so rapid, and the number of interacting individuals sustaining it so large and diverse, that only a comprehensive sequential account of all the interactions among all participants in a dialect over a given segment of geography and/or chronology can securely establish chains of museme transmission and, therefore, trees of cultural evolution. This constraint suggests that, while not impossible, developing musemic taxonomies will be difficult and time-consuming. As §6.1 and §7.5.3 suggest, computer technology may well expedite such research.

I consider here how homologies might be distinguished from homoplasies at the level of the museme, extending the discussion in Chapter 1 apropos Figure 1.1. One fundamental issue here is that biological phylogenies take account of both morphology and molecules, which, in cultural phylogenies, equate to structure and musemes, respectively.130 This would imply an approach that attempts to identify different structural loci (analogous to morphology in biological classification), and the various museme alleles that instantiate those loci (analogous to molecules in biological classification) (§3.5.2). In this sense, one is recuperating the taxonomy of formal-structural types (§3.6.2) under the aegis of an ostensibly museme-level perspective.

Ridley lists three principal criteria by which homologies can be distinguished from homoplasies in biological evolution (2004, p. 430), and I list them here in order that inferences on the treatment of cultural homologies versus homoplasies might be made:

Structural Similarity: homologies have the same fundamental structure, not merely surface similarity. Bird and bat wings look superficially similar, but are structurally quite different, and are in fact homoplasious (Ridley, 2004, p. 428, Fig. 15.3).

Relations to Surrounding Characters: homologous features are usually related to surrounding structures, such as a given bone to its surrounding bones, in broadly similar ways.

Embryonic Development: homologies normally follow similar lines of embryonic development; similar adult characteristics arrived at by different embryological routes tend to be homoplasies.

How might these three criteria be applied in cases of similarity between musemes and between musemeplexes, in order to distinguish cultural homologies (ancestral and derived) from cultural homoplasies? Table 3.3 attempts to rework the biological criteria just listed for application to musical contexts; and Figures 3.11a–3.11f provide candidate musical examples (taken mainly from the Viennese classical repertoire) of homoplasies and homologies.131

| Criterion | Homoplasy | Ancestral Homology | Derived Homology |
| --- | --- | --- | --- |
| Structural Similarity | (i) Foreground-level pitch similarity not supported by middleground-level similarity; and/or (ii) few rhythmic resemblances; and/or (iii) few contextual/poietic connections (Figure 3.11a). | (i) Foreground-level pitch similarity with some middleground-level similarity or vice versa; and/or (ii) some rhythmic resemblances; and/or (iii) some contextual/poietic connections (Figure 3.11b). | (i) Foreground-level pitch similarity underpinned by significant middleground-level similarity; and/or (ii) significant rhythmic resemblances; and/or (iii) significant contextual/poietic connections (Figure 3.11c). |
| Relations to Surrounding Characters | No or limited instantiation of a virtual musemeplex (after the distinction on page 289) and (thus) no or limited instantiation of a musemesatz (Figure 3.11d).132 | Some instantiation of a virtual musemeplex or limited instantiation of a real musemeplex and (thus) some instantiation of a musemesatz (Figure 3.11e). | Significant instantiation of a virtual musemeplex or of a real musemeplex and (thus) significant instantiation of a musemesatz.133 |
| Embryonic Development | No evidence of derivation from antecedent musemes in a composer’s sketch materials or other poietic documents. | Some evidence of derivation from antecedent musemes in a composer’s sketch materials or other poietic documents. | Strong evidence of derivation from antecedent musemes in a composer’s sketch materials or other poietic documents. |

Table 3.3: Criteria for Distinguishing Between Musemic Homoplasies and Homologies.

(a) Structural Similarity: Homoplasy. J. S. Bach: Das wohltemperirte Clavier Book II (c. 1740), Praeludium V, BWV. 874, bb. 1–2 (upper); Mozart: Symphony no. 41 in C major K. 551 (“Jupiter”) (1788), II, bb. 28–29 (lower).

(b) Structural Similarity: Ancestral Homology. Haydn: String Quartet in C major op. 76 no. 3 (“Emperor”) (1797), II, bb. 12–14 (upper); Beethoven: Piano Concerto no. 4 in G major op. 58 (1807), I, bb. 10–14 (lower).

(c) Structural Similarity: Derived Homology. Mozart: La clemenza di Tito K. 621 (1791), no. 7, “Ah perdona al primo affetto”, bb. 44–46 (upper); Ferdinand David: Concertino for Trombone in E♭ major op. 4 (c. 1837), I, bb. 1–4 (lower).

(d) Relations to Surrounding Characters: Homoplasy. Mozart: Don Giovanni K. 527 (1787), no. 13, “Signor, guardate un poco”, bb. 249–253 (upper); Schubert: String Quintet in C major D. 956 (1828), I, bb. 138–142 (lower).

(e) Relations to Surrounding Characters: Ancestral Homology. Beethoven: Symphony no. 9 in D minor op. 125 (1824), I, bb. 74–80 (upper); Schubert: Symphony no. 9 in C major D. 944 (1828), III, Trio, bb. 57–64 (lower).

(f) Relations to Surrounding Characters: Derived Homology. Haydn: String Quartet in E♭ major op. 9 no. 2 (1769), II, bb. 1–10 (upper); Mozart: Requiem K. 626 (1791), “Hostias”, bb. 3–10 (lower).

Figure 3.11: Musemic Homoplasies and Homologies.

As might be expected from Table 3.3, Figures 3.11a, 3.11b and 3.11c relate to musemes, whereas Figures 3.11d, 3.11e and 3.11f relate to musemeplexes. Beginning with the “Structural Similarity” criterion, Figure 3.11a shows two patterns that, although spanning the melodic interval of a fifth, are structurally different in that the Bach passage prolongs tonic harmony whereas the Mozart outlines a 1̂/I–4̂/V progression, the g¹ (5̂) in b. 29¹ being an échappée from the preceding f¹ (4̂). The rising-fifth line is an example of the kinds of “good tricks” (Dennett, 1995, pp. 77–78) (§5.5.1), or “commonalities” (Cope, 2003, p. 17), which form the generic connective tissue of much tonal music.

Figure 3.11b shows a more closely related pair of passages, the upper-line museme in the second half of the Haydn phrase appearing at the start of the Beethoven passage. The different harmonisation of the penultimate element (V⁷ in b. 14¹ of the Haydn; V⁴₃ in b. 11¹⁻² of the Beethoven), the result of different coadapted lower-line musemes, is the motivation for assigning this relationship to the category of ancestral homology, rather than derived homology, on the assumption that both passages derive from a common ancestor, but have diverged to some extent from it. Nevertheless, Beethoven’s museme is followed by another (b. 12) that occurs in the analogous position in the Haydn passage (b. 14³⁻⁴), suggesting some “relation to surrounding characters”.

Figure 3.11c shows less divergence between the two passages, with the outline of the Mozart phrase being replicated very closely in that by David. Both are instances of the Romanesca schema, which constitutes their common ancestor.134 The two alleles in Figure 3.11c represent variants that mutate the Romanesca’s core (enclosed by a dashed-line box in both passages) by rising to the upper 1̂ followed by a descent to an imperfect cadence, further mutated in the David passage in its local emphasis on vi in b. 3³⁻⁴. As a derived homology, these two passages “evolved after the common ancestor, within the group of species under study” (Ridley, 2004, p. 431), the group being these two examples and, possibly, others.

Turning to the “Relations to Surrounding Characters” criterion, Figure 3.11d shows a cadential figure given a minuet-topic (§3.8.5) treatment in Mozart and a march-topic garb in Schubert (Ratner, 1980, pp. 9–11, 16; see also Monelle, 2006). Other than this museme, there are no further musemic-structural alignments, as the interpolated emphasis on E minor in bb. 139–140 of the Schubert passage – which has no parallel in the Mozart – might imply. Therefore, there is no musemeplex common to these two passages, no musemesatz, and thus there are no relations (certainly in terms of the parameters considered) to surrounding characters.

Figure 3.11e shows two passages that might initially seem as dissimilar as those in Figure 3.11d. Nevertheless, as the overlay indicates, their component musemes at each locus are allelically equivalent, and therefore a virtual musemeplex, and thus a musemesatz, is generated.135 This suggests the passages are related in terms of an ancestral homology, although the close chronological proximity of the two works, and the clear cultural influence Beethoven had on Schubert, might afford counter-evidence in favour of a derived homology.136

Despite the greater chronological distance between the two works in Figure 3.11f compared with Figure 3.11e (twenty-two years as against four years, respectively), the passages in Figure 3.11f show greater structural similarities, hence the ascription of a derived homology rather than an ancestral homology. Not only does the Mozart passage reinstantiate the musemeplex, and thus the musemesatz, of the Haydn, but the generative foreground-level musemes in these two passages are more similar than is the case in Figure 3.11e, and these similarities are based upon a greater number of museme-museme correspondences. Thus, the passage is arguably closer to the “real” than to the “virtual” end of the musemeplex-type continuum identified in connection with the definitions on page 289.137

While the precepts outlined in Table 3.3 are ultimately subjective, and while Figure 3.11 applies them using relatively informal and intuitive judgements, it is useful in some situations to be able formally to measure and quantify relationships between musemes. On this logic, above a certain similarity-threshold, two similar musemes might be held to be homologous, not homoplasious, and vice versa if they are below the threshold. Various computational approaches have been developed in order to quantify similarity in music (Velardo et al., 2016). Some of these aim to model perception and cognition, in that two passages ranked according to their underlying algorithms as closely related are also perceived as such by listeners. Müllensiefen and Frieler (2004) evaluated some forty-eight similarity-detection algorithms, comparing them with the responses of listeners in tests of melodic similarity (see also Müllensiefen & Frieler, 2006). Their findings suggest that some of the most psychologically robust metrics of melodic similarity are of the “edit-distance” type (Müllensiefen & Frieler, 2004, p. 168), whereby the cost of moving from one pattern-form to another is quantified.

A well established example of this type is the metric proposed by Damerau and Levenshtein (Levenshtein, 1966; see also Orpen & Huron, 1992). This assesses the notional costs, according to some predetermined scale of values, of the operations of insertion (adding a new component), deletion (removing a component), and substitution (replacing one component by another that is equivalent to the original), by means of which a source text is transformed into a target, or a target is understood to be derived from a source. A related approach, the Earth Mover’s Distance (EMD) metric, first developed in the context of image-retrieval research (Rubner et al., 2000) and then applied to music (Typke et al., 2003; Wiering et al., 2004; Typke, 2007; Typke et al., 2007),

determines the minimum amount of work that is needed for converting one set of weighted points into another.138 The required work grows with the amount of weight that needs to be moved to different positions, and with the distance over which the weight needs to be moved. (Typke et al., 2007, pp. 154–155)

Put more simply,

[o]ne pattern … is represented as heaps of earth, the sizes of which correspond to the weights of the dots; the other pattern … as holes with a certain capacity, likewise corresponding to the dots’ weights. The task is to fill the holes with as little effort (that is, ground distance times weight) as possible. (Wiering et al., 2004, p. 117).

The EMD is defined by the following equation (Typke et al., 2007, p. 155):

\[
\mathrm{EMD}(A, B) = \frac{\min_{F \in \mathcal{F}} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij}}{\min(W, U)} \tag{3.3}
\]

Unpacking this,

A [source] and B [copy] are sets of weighted points. ℱ is the set of all possible flows that would convert A into B …. Every flow consists of one flow element for each pair of points out of the m points in A and the n points in B. Every flow element carries a weight of fᵢⱼ over a ground distance of dᵢⱼ from one point in A to one point in B. W and U are the sums of weights in set A and B, respectively. Therefore, the EMD is the sum of distances in the optimum flow, weighted with the corresponding weights, normalized with the total weight of the lighter point set. (Typke et al., 2007, p. 155)
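Equation 3.3 can be made concrete with a short computational sketch. The following Python function is an illustration only, not the implementation of Typke et al.: it assumes that each note is encoded as a hypothetical (onset, pitch, weight) triple, that weights are positive integers, and that the ground distance is Euclidean. It finds the optimum flow by successive shortest paths, a standard minimum-cost-flow technique chosen here for brevity and suitable only for small patterns.

```python
import math


def emd(points_a, points_b):
    """Earth Mover's Distance between two sets of (x, y, weight) points.

    Assumption: weights are positive integers (e.g. durations in ticks).
    """
    m, n = len(points_a), len(points_b)
    source, sink, size = m + n, m + n + 1, m + n + 2
    # graph[u] holds edges as [target, capacity, cost, index of reverse edge]
    graph = [[] for _ in range(size)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, (_, _, w) in enumerate(points_a):
        add_edge(source, i, w, 0.0)          # "heaps of earth"
    for j, (_, _, u) in enumerate(points_b):
        add_edge(m + j, sink, u, 0.0)        # "holes"
    for i, (xa, ya, wa) in enumerate(points_a):
        for j, (xb, yb, ub) in enumerate(points_b):
            # ground distance d_ij between point i of A and point j of B
            add_edge(i, m + j, min(wa, ub), math.hypot(xa - xb, ya - yb))

    total_a = sum(p[2] for p in points_a)    # W
    total_b = sum(p[2] for p in points_b)    # U
    remaining = min(total_a, total_b)        # weight that must be moved
    work = 0.0
    while remaining > 0:
        # Bellman-Ford: cheapest augmenting path in the residual graph
        dist = [math.inf] * size
        prev = [None] * size
        dist[source] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        # Push as much weight as the path allows
        push, v = remaining, sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        work += push * dist[sink]
        remaining -= push
    return work / min(total_a, total_b)


# Two hypothetical two-note patterns, each note shifted by one unit of onset
pattern_a = [(0, 0, 1), (10, 0, 1)]
pattern_b = [(1, 0, 1), (9, 0, 1)]
print(emd(pattern_a, pattern_b))
```

Because the total work is divided by min(W, U), a heavier pattern can be partially matched against a lighter one, which is the normalisation described in the quotation above.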

Such approaches align well with the mappings of levels seven and eight of Table 1.4, in that differences between genes and between m(us)emes can be represented in terms of edit-distance metrics used to quantify the operations of insertion, deletion and substitution (Hoeschele & Fitch, 2022; Savage et al., 2022; see also §3.6.6). As the mechanisms of replicator mutation, these three operations act on nucleotides – in a process termed “point mutation” (Ridley, 2004, p. 28, Fig. 2.4) – serving to move a gene away from other genes, including its alleles, in a multidimensional genetic hypervolume. Such hypervolumes are vast conceptual-potential spaces encompassing, in this case, all possible genes and all their possible alleles (Jan, 2007, pp. 197–199; see also §5.5.2, §6.5 and §7.5.3). A genetic hypervolume is the biochemical equivalent of Borges’ “Library of Babel” (1970; see also note ?? on page ??). The same three operations, acting upon pitches and rhythms, create museme mutations (Jan, 2007, pp. 116–117), these serving to move a museme away from other musemes, including its alleles, in a multidimensional musemic hypervolume.
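The three operations of insertion, deletion and substitution can be illustrated with a minimal sketch of an edit-distance calculation. The code below implements the plain Levenshtein distance (it omits the transposition operation associated with Damerau), and it rests on several assumptions that are mine rather than the literature’s: musemes are encoded as strings of pitch letter-names, each operation carries a unit cost by default, and the normalised similarity function is merely one simple way of deriving a score against which a similarity-threshold could be set.

```python
def edit_distance(source, target, ins=1, dele=1, sub=1):
    """Levenshtein distance: minimum total cost of transforming source into target."""
    # prev[j] is the cost of turning the first i-1 symbols of source
    # into the first j symbols of target
    prev = [j * ins for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * dele]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + dele,                         # delete s
                curr[j - 1] + ins,                      # insert t
                prev[j - 1] + (0 if s == t else sub),   # substitute (or keep)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical normalised similarity in the range 0.0-1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Two hypothetical museme alleles differing by a single substituted pitch
print(edit_distance("cceed", "ccegd"), similarity("cceed", "ccegd"))
```

Weighting the three costs differently (for instance, penalising substitution by a distant pitch more heavily than by a neighbouring one) would be the obvious refinement, but any such scale of values is a modelling decision rather than something the metric itself supplies.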

3.6.6 Cultural Cladograms

Whether using informal judgement or formal similarity/difference quantification between musemes to distinguish between homoplasies and homologies, it is useful to represent the latter graphically, for just as the long-term outcomes of biological evolution can be represented in terms of branching lineages on (by convention) a tree diagram, so can those of cultural evolution. Applying the principles of cladistic taxonomy (§1.7.2), one might arrive at a representation, a cultural cladogram, not just of the evolutionary relationships between dialects (Savage, 2019, pp. 4–6), but also of those between musemes. As noted in §3.6.4, the latter enterprise, population memetics, is closer to population genetics than it is to the taxonomy of species.

As a first word of caution, attempting to calculate cultural phylogenies – what might be termed phylomemies – risks falling foul of what might be termed the distinction between real and virtual phylogen/memies.139 A real phylogen/memy is one that is objectively evolutionarily correct, indicating the transmission relationships between the replicators at various positions on the cladogram. A virtual phylogen/memy is one that arrives – perhaps as a consequence of a restricted sample-size – at a “pseudo-cladogram”. This, while a logical and parsimonious representation of the patterns under investigation, is nevertheless potentially not evolutionarily true, and is therefore not properly cladistic, because it does not take into account patterning outside the sample under consideration that, if included, might alter the relationships represented by the cladogram. It would appear considerably easier to arrive at a real phylogeny – where groups of potentially related organisms are often relatively geographically localised, morphologically distinct and, nowadays, genetically tractable – than it is to arrive at a real phylomemy – where groups of potentially related cultural forms are often scattered across space and time.

Yet this enterprise is worth pursuing, if only to illustrate the possibilities of the approach, one that C. J. Howe and Windram (2011) term “phylomemetics”, the cultural equivalent of phylogenetics. As they acknowledge (2011, p. 1), this is by no means a new methodology in the humanities, where philologists in both linguistic and musical research have long attempted to reconstruct stemmata showing relationships of transmission and mutation in sources as diverse as biblical texts and Medieval music manuscripts (K. M. Cook, 2015). Conducted under (or, some might fear, annexed by) the rubric of phylomemetics, such research can incorporate all the intellectual infrastructure of Darwinism – the notions of variation, replication and selection; concepts of fitness; and ideas of lineage bifurcation – in attempting to trace connections between the phenomena under investigation.140

Using the phylogeny-calculation software Phylip (Felsenstein, 2018) – which essentially performs edit-distance calculations on symbolic representations of genetic data – six versions of the folk ballad “The two brothers” are analysed (Jan, 2018a).141 This analysis is based on the input data shown in Figure 3.12 (Jan, 2018a, p. 11, Fig. 3a), which is a date-ordered list of the melodies consisting of a sequence of their constituent pitches, grouped into two-bar-long museme alleles (“v” represents a variant form of the melody).142 It should be stressed that this is an illustrative calculation only, designed to outline a methodology that might be adopted and developed in larger studies. The highly restricted dataset naturally limits the scope of the conclusions – potentially limited to a virtual phylomemy – that can be drawn.

6  26
15sept1916 0ccegagg cceed ddbcddb bggabc
15sept16-v 0ccegagg cceed bbgbddd deeabc
16sept1916 cccegaag ccegd ddbbddb bggabc
18sept1916 gccegagg ccegd bbddedd dggabc
18sept16-v gccegagg ccegd ddbbddb gggabc
03sept1918 0ccccaag ccegd ddbcddb bggabc

Figure 3.12: Input Data for Phylomemetic Tree.

The phylomemetic tree shown in Figure 3.13 (Jan, 2018a, p. 12, Fig. 4a) is generated using the Phylip Pars utility, which “is a general parsimony program which carries out the Wagner parsimony method [(Eck & Dayhoff, 1966)] with multiple states. Wagner parsimony allows changes among all states. The criterion is to find the tree which requires the minimum number of changes” (Felsenstein, 2018). For ease of comparison, the text-based output of Pars (strictly, that of the Phylip graphics-generating utility DrawGram) is augmented in Figure 3.13 by images of the relevant melodies, in which boxed numbers distinguish museme alleles.143

Figure 3.13: Output Phylomemetic Tree.
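The parsimony criterion quoted above, that of finding the tree requiring the minimum number of changes, can be illustrated for a single, fixed tree. The following sketch is not Phylip’s Pars, which also searches across candidate topologies; it merely counts the minimum number of state changes that a given topology requires, using Fitch’s small-parsimony algorithm for unordered multistate characters. The two topologies at the end are hypothetical and are not those of Figure 3.13; treating “0” as an ordinary character state is likewise an assumption.

```python
# Input data of Figure 3.12 (taxon name followed by 26 pitch characters)
DATA = """\
15sept1916 0ccegagg cceed ddbcddb bggabc
15sept16-v 0ccegagg cceed bbgbddd deeabc
16sept1916 cccegaag ccegd ddbbddb bggabc
18sept1916 gccegagg ccegd bbddedd dggabc
18sept16-v gccegagg ccegd ddbbddb gggabc
03sept1918 0ccccaag ccegd ddbcddb bggabc
"""


def parse(text):
    """Return {taxon: sequence}, ignoring the blanks that separate museme alleles."""
    sequences = {}
    for line in text.strip().splitlines():
        name, *chunks = line.split()
        sequences[name] = "".join(chunks)
    return sequences


def _site_cost(node, sequences, site):
    """Fitch's algorithm at one site: (possible ancestral states, changes so far)."""
    if isinstance(node, str):                 # leaf: the observed state
        return {sequences[node][site]}, 0
    left, right = node
    left_states, left_cost = _site_cost(left, sequences, site)
    right_states, right_cost = _site_cost(right, sequences, site)
    shared = left_states & right_states
    if shared:                                # no change needed at this node
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Minimum number of changes required by a binary tree of nested 2-tuples."""
    length = len(next(iter(sequences.values())))
    return sum(_site_cost(tree, sequences, s)[1] for s in range(length))


melodies = parse(DATA)

# Two hypothetical topologies, for comparison only
tree_1 = ((("15sept1916", "03sept1918"), "16sept1916"),
          (("18sept1916", "18sept16-v"), "15sept16-v"))
tree_2 = ((("15sept1916", "18sept1916"), "16sept1916"),
          (("03sept1918", "18sept16-v"), "15sept16-v"))

for label, tree in (("tree 1", tree_1), ("tree 2", tree_2)):
    print(label, parsimony_score(tree, melodies))
```

Under the parsimony criterion the topology with the lower score would be preferred; a full analysis would compare all candidate topologies rather than two chosen by hand.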

Such cladograms represent Darwin’s “descent with modification” (2008, p. 129), whereby items located to the left (bottom/past) are hypothesised to be evolutionarily earlier than those located to the right (top/present); and where proximity to points of bifurcation (branch-length) represents relative evolutionary distance. While parsimony is a powerful constraint on evolutionary possibilities, and is a key element of Phylip’s analytical algorithm, it does not invariably align with evolutionary reality, particularly in the case of cultural cladograms. Thus, a parsimonious cladogram – one that proceeds from left to right by minimal branching and short mutational distances – is not necessarily “real”, in terms of the distinction made above. Moreover, as suggested in §2.5.1 , evolution is fundamentally a process of adaptive change (Ridley, 2004, p. 4) and not necessarily one where that change leads to an increase in “the logarithm of the total information content of the biosystem (genes plus memes)” (Ball, 1984, p. 154).144 In this light, musemic simplicity does not necessarily correlate with chronological anteriority, any more than musemic complexity corresponds with chronological posteriority.

As a second word of caution to add to the first given on page 366 – one that applies more broadly to any attempt to analyse music by means of the kinds of symbolic representations used in Phylip – in order to perform the phylomemetic analysis, the musical sounds of these melodies, already converted to their traditional western letter-name notation by Bronson (1959), were rendered as a series of ASCII characters to form the input to the Pars utility. In this way, the sounds of these extracts are treated as a text. This means that the analysis is operating on a representation two stages removed from a living performance: not only has a vocal rendition been regularised and shoehorned into western notation, a form of “lossy” compression; but this representation has itself been further divorced from its connection with sound by its reduction to an abstract symbol-set. Perhaps more fundamentally, while the Phylip software to some extent “understands” genetics, in that it is based on a formalisation of the dynamics of the biochemistry underpinning it, it has little conception of music and the dynamics of pitch and rhythm combination underpinning it. Nevertheless, the symbols offered as input bear at least some connection with their long-distant musical antecedents, and so permit a provisional phylomemetic analysis based on parsimony relationships to be conducted. This issue is considered further in §6.4.

§4.4.1 considers to what extent cladograms can be related to the prolongational trees in the Generative Theory of Tonal Music (GTTM) of Lerdahl and Jackendoff (1983) (see also §3.8.6).

3.7 Gene-Meme Coevolution

The evolution of Homo sapiens was driven by a number of selection pressures. Many of these, certainly initially, were biological-environmental: our species had to adapt to harsh and varied climates; we had to develop strategies to counteract predators and rival hominin species; and we had to find means to communicate and cooperate as part of our communal lifestyle. These selection pressures acted upon our genes – in whose service the evolution of Homo sapiens ultimately took place – causing us to become stronger, faster, more cunning and more sociable. In the process, our genes evolved to become more replicable (they coded for features that enhanced the statistical likelihood of their replication), and in some respects they shaped their wider environment in order to make it more conducive to them, for example by destroying rival species and by reshaping the world in favourable ways. But a second type of selection pressure, cultural-environmental, also operated upon us, certainly from the beginning of the Cognitive Revolution (§2.5.5), if not earlier in the hominin lineage. Here, the memes that populated our brains began to exert pressures on the biological systems that sustained them in order to create a better environment for themselves. In the process, our memes evolved to become more replicable (they became leaner, fitter, more memorable and more beneficial to their hosts; or they capitalised on their hosts’ hopes and fears (see note 88 on page 224)), and in some respects they shaped their wider environment in order to make it more conducive to them, for example by leveraging three-dimensional space to provide opportunities for their aural and visual expression.

This short overview suggests that there are various ways in which intra- and inter-replicator-class relationships might operate. These are considered in this section, which encompasses some of the means by which musicality and music are shaped by gene-meme (as opposed to gene-gene or meme-meme) coevolution. Coevolution is an important topic in evolutionary theory – the key texts are Lumsden and Wilson (1981), Cavalli-Sforza and Feldman (1981), Boyd and Richerson (1985), Durham (1991), and Richerson and Boyd (2005) – even when only considering a single (genetic) replicator, not least because the evolution of complex organisms was an (intra-replicator-class) coevolutionary process. Moving from single-celled organisms to the complex multicellular structures of which we are perhaps the supreme example required collaboration between ostensibly selfish replicators. Over the course of evolutionary history, selection rewarded those replicators that joined forces to create a single, encompassing vehicle, one that served the interests of all the replicators it carried (§1.6.1). Typically, such coevolution was associated with the division of labour, such that certain replicators coded for vehicle-features that served one function, while others coded for features with a different, complementary, function.

Whether dealing with interactions between replicators of one class or of two, there are three fundamental categories into which their relationships fall: “cooperation or mutualism, in which both parties benefit from the interaction (a plus/plus relationship); competition in which both parties lose (a minus/minus relationship); and conflict or antagonism, in which one party benefits and the other loses (a plus/minus relationship)” (Blute, 2006, p. 154, emphases mine). There are, moreover, two broad strategies by which such coevolution has been formalised: population-genetics models and optimisation/game-theory models. “The essential difference is that population genetics attempts to model underlying informational structures, whether genetic or memetic, while optimization (for non-social situations) and game theory (for social situations) model surface or observable characteristics, including behavior, which are commonly called ‘strategies’” (2006, p. 153). It is game theory that perhaps offers the best means by which gene-meme coevolution can be understood.

Developed in mathematics by John von Neumann – his contribution to computing is discussed in §7.3.1 – game theory is concerned with competitive situations in which agents (replicator-driven vehicles) adopt a range of strategies in order to maximise their share of a finite resource. The “Prisoner’s Dilemma” game is a simple example of some of the ideas underpinning game theory. Here, two players, A and B, can choose to “cooperate” or to “defect” (i.e., to break an implicit trust, leading to competition or conflict, in the terms of the first quotation from Blute (2006) above). The four outcomes resulting from their combination are often represented in a two-by-two grid. The outcomes (and their pay-offs in Dawkins’ summary of the game) are: (i) A: cooperate–B: cooperate (A and B both gain $300 as a “[r]eward for mutual cooperation”); (ii) A: defect–B: defect (A and B both lose $10, as “[p]unishment for mutual defection”); (iii) A: cooperate–B: defect (A (the “sucker”) loses $100 and B gains $500, reflecting the “[t]emptation to defect”); and (iv) A: defect–B: cooperate (in an inversion of (iii), A gains $500 and B, now the sucker, loses $100) (Dawkins, 1989, pp. 203–204). There are many variants of this game, some differing in the allocation of the pay-offs. More fundamentally, some variants move away from the determinism of simpler variants in favour of more complex-dynamic-system models (Blute, 2006, p. 162).145
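The logic of the game can be seen by tabulating the four outcomes and asking what each player does best given the other’s choice. The sketch below encodes the pay-offs from Dawkins’ summary; the function names are hypothetical, and the notion of a Nash equilibrium (a pair of moves from which neither player gains by deviating unilaterally) is a standard game-theoretic concept introduced here for illustration rather than one drawn from the sources cited.

```python
from itertools import product

MOVES = ("cooperate", "defect")

# (A's move, B's move) -> (pay-off to A, pay-off to B), in dollars
PAYOFF = {
    ("cooperate", "cooperate"): (300, 300),    # (i) reward for mutual cooperation
    ("defect", "defect"): (-10, -10),          # (ii) punishment for mutual defection
    ("cooperate", "defect"): (-100, 500),      # (iii) A is the sucker
    ("defect", "cooperate"): (500, -100),      # (iv) B is the sucker
}


def best_response_a(move_b):
    """A's most profitable move, given B's move."""
    return max(MOVES, key=lambda move_a: PAYOFF[(move_a, move_b)][0])


def best_response_b(move_a):
    """B's most profitable move, given A's move."""
    return max(MOVES, key=lambda move_b: PAYOFF[(move_a, move_b)][1])


def nash_equilibria():
    """Outcomes in which each move is a best response to the other."""
    return [
        (a, b)
        for a, b in product(MOVES, repeat=2)
        if best_response_a(b) == a and best_response_b(a) == b
    ]


for move in MOVES:
    print(f"If B chooses to {move}, A does best to {best_response_a(move)}")
print("Equilibria:", nash_equilibria())
```

With these pay-offs defection is the better reply to either move, so mutual defection is the only stable outcome in a single round, even though both players would do better under mutual cooperation: this is the dilemma of the game’s title.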

Game theory was extended to evolutionary theory by John Maynard Smith (Maynard Smith, 1982), who considered the mechanics of the interactions between organisms in order to understand in what circumstances it is advantageous for them to be cooperative or to be antagonistic. Here the behaviours of cooperation and antagonism are understood as phenotypic manifestations of genes – “strategies of the kind that genes might preprogram” (Dawkins, 1989, p. 208) – so, as always in evolution, any real advantage arising from behaviours accrues to the replicator, not to the vehicle. Because cooperative and antagonistic interactions between individuals occur in the context of multiple factors, not least numerous similar interactions between other conspecifics, they – like the VRS algorithm that subsumes them – constitute a complex dynamic system, which, by nature, is intrinsically non-linear (i.e., variations in input and output are not proportional). Sometimes, such systems are constantly unstable, oscillating from one state to another. In other situations, however, they reach an equilibrium, in which one state prevails and becomes resistant to perturbation. Maynard Smith coined the notion of the Evolutionarily Stable Strategy (ESS) to describe such situations of equipoise in the evolution of interactive behaviour. This is

a strategy which, if most members of a population adopt it, cannot be bettered by an alternative strategy.… the best strategy for an individual depends on what the majority of the population are doing. Since the rest of the population consists of individuals, each one trying to maximize his own success, the only strategy that persists will be one which, once evolved, cannot be bettered by any deviant individual.… once an ESS is achieved it will stay: selection will penalize deviation from it. (Dawkins, 1989, p. 69, emphasis in the original)
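The condition that an ESS “cannot be bettered by an alternative strategy” can be tested mechanically for a small set of pure strategies. The sketch below uses the standard two-part test associated with Maynard Smith: a resident strategy S resists a mutant T if S does better against S than T does, or, where they do equally well against S, if S does better against T than T does against itself. The function name is hypothetical, mixed strategies are ignored, and the pay-offs are simply those of Dawkins’ Prisoner’s Dilemma summary, pressed into service as an assumption for the purpose of illustration.

```python
def is_ess(strategy, strategies, payoff):
    """True if no alternative pure strategy can invade a population playing `strategy`.

    payoff[(x, y)] is the pay-off to an individual playing x against one playing y.
    """
    for mutant in strategies:
        if mutant == strategy:
            continue
        resident_vs_resident = payoff[(strategy, strategy)]
        mutant_vs_resident = payoff[(mutant, strategy)]
        if mutant_vs_resident > resident_vs_resident:
            return False          # the mutant does better in the resident population
        if (mutant_vs_resident == resident_vs_resident
                and payoff[(mutant, mutant)] >= payoff[(strategy, mutant)]):
            return False          # a tie that the resident cannot break
    return True


STRATEGIES = ("cooperate", "defect")

# Pay-off to the first-named strategy when meeting the second
PAYOFF = {
    ("cooperate", "cooperate"): 300,
    ("cooperate", "defect"): -100,
    ("defect", "cooperate"): 500,
    ("defect", "defect"): -10,
}

for s in STRATEGIES:
    print(s, "is an ESS:", is_ess(s, STRATEGIES, PAYOFF))
```

On these figures a population of cooperators can be invaded by a defector, whereas a population of defectors cannot be invaded by a cooperator; selection therefore “penalize[s] deviation” only from the latter.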

The types of dimorphism arising from sexual selection (§2.5.3) represent a category of ESS, although it is one that can be perturbed by those events subsumed by the third category of computer-simulation model of the Fisher process (“invasion analysis”) on page 144.

While game theory was initially applied to model interactions between replicators of the same class (i.e., gene-gene), it has subsequently been extended to gene-meme interactions. The three categories of cooperation, competition and conflict outlined on page 375 work as follows in the case of such dual-replicator coevolution. In cooperation, genes and memes are “favored to match” (Blute, 2006, p. 155). That is, whatever circumstance or situation serves the interests of certain genes also serves the interests of certain memes: their interests match, or align (equivalent to Dawkins’ outcome (i) of the Prisoner’s Dilemma game in the list on page 376). In competition, genes and memes are “favored to unmatch” (2006, p. 155). That is, whatever serves the interests of certain genes does not serve the interests of certain memes, and vice versa (Dawkins’ outcome (ii); “mutual defection”). In conflict, “one is favored to match and the other to avoid matching, two ways” (2006, p. 155). That is, there are two possible sub-scenarios, where: (i) certain memes are favoured if they match certain genes, but in this matching those genes are themselves disfavoured (so memes “chase” genes, which try to “run away”, in an evolutionary sense); and (ii) certain genes are favoured if they match certain memes, but in this matching those memes are themselves disfavoured (so genes “chase” memes, which try to “run away”) (Dawkins’s outcomes (iii) and (iv), respectively, where his player “A” represents genes and his player “B” represents memes, and where one replicator gains and the other is the “sucker”) (2006, pp. 155–156).

These scenarios are represented in Table 3.4, a Prisoner’s-Dilemma-type two-by-two grid after Blute (2006, p. 155, Tab. 1, p. 156, Tab. 2), with Table 3.4a representing cooperation and competition and Table 3.4b representing conflict (and where “G” and “g”, and “M” and “m” represent gene and meme alleles, respectively; and “h” and “l” indicate “high” and “low” pay-offs, respectively).146

        G              g
M   h|l / h|l      l|h / l|h
m   l|h / l|h      h|l / h|l

(a) Cooperation (pay-off shown to the left of the vertical line) and Competition (pay-off shown to the right of the vertical line).

        G          g
M     h / l      l / h
m     l / h      h / l

(b) Conflict.
Table 3.4: Pay-off Matrix for Gene-Meme Coevolution. In each cell, the entry before the slash occupies the upper-right half of the diagonally divided cell and the entry after the slash occupies the lower-left half.
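The logic of Table 3.4 can be made explicit in a short Python sketch. It is an illustrative encoding only, not Blute’s (2006) own formalism: the dictionary names, the functions is_match, favoured and classify, and the reading of “matching” as the pairing of like-cased alleles (M with G, m with g) are assumptions introduced here. Because it is not stated above which replicator owns which half of each cell, the two pay-offs are kept positional (upper-right, lower-left) rather than being labelled as gene or meme.

```python
# Pay-offs per cell as (upper_right, lower_left); "h" = high, "l" = low.
# Keys are (meme allele, gene allele). The halves are positional only:
# which replicator owns which half is an open assumption, so no labels.
COOPERATION = {
    ("M", "G"): ("h", "h"), ("M", "g"): ("l", "l"),
    ("m", "G"): ("l", "l"), ("m", "g"): ("h", "h"),
}
COMPETITION = {
    ("M", "G"): ("l", "l"), ("M", "g"): ("h", "h"),
    ("m", "G"): ("h", "h"), ("m", "g"): ("l", "l"),
}
CONFLICT = {
    ("M", "G"): ("h", "l"), ("M", "g"): ("l", "h"),
    ("m", "G"): ("l", "h"), ("m", "g"): ("h", "l"),
}


def is_match(cell):
    """Assumed reading of 'matching': like-cased alleles (M with G, m with g)."""
    meme, gene = cell
    return meme.isupper() == gene.isupper()


def favoured(table, half):
    """Report whether one half of the cells is favoured to 'match' or 'unmatch'."""
    matched = {pays[half] for cell, pays in table.items() if is_match(cell)}
    unmatched = {pays[half] for cell, pays in table.items() if not is_match(cell)}
    if matched == {"h"} and unmatched == {"l"}:
        return "match"
    if matched == {"l"} and unmatched == {"h"}:
        return "unmatch"
    return "neither"


def classify(table):
    """Name the scenario implied by the preferences of the two halves."""
    preferences = {favoured(table, 0), favoured(table, 1)}
    if preferences == {"match"}:
        return "cooperation"
    if preferences == {"unmatch"}:
        return "competition"
    if preferences == {"match", "unmatch"}:
        return "conflict"
    return "unclassified"


for name, table in [("a, left", COOPERATION), ("a, right", COMPETITION), ("b", CONFLICT)]:
    print(name, classify(table))
```

Swapping the two entries in every cell of CONFLICT gives the second of the “two ways” in which conflict can arise; classify returns “conflict” for that variant as well, because only the pattern of preferences, not the identity of the chaser, is tested.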

An example of gene-meme coevolution has already been given in §2.3.5 and §2.5.4 – namely, the case of the invention of fire changing the types of food humans were able to eat, thereby shaping the evolution (directly) of our digestive tract and (indirectly) of our brain. To this example one might add that of dairy farming. Patel (2018, p. 116) notes that around 8,000–11,000 years ago humans began herding animals for milk production. Previously, humans predominantly drank milk from their mothers and, after weaning, the enzyme lactase, for the digestion of the milk sugar lactose, was switched off. A genetic mutation for continued lactase production appears to have spread in human populations under the selection pressure of the cultural practice of dairying. In other words, without the memes for dairying, the genes for continued production of lactase would likely not have been replicated. In turn, the genetic support for lactose-digestion fostered the further cultural evolution of memes for dairying, leading to the evolution both of better technologies for farming and of more varied uses for milk (such as cheese, yoghurt, etc.).

3.7.1 Memetic Drive

Both of the examples of gene-meme coevolution given in the last paragraph of the previous subsection are relevant to the subject of this book, because anything that augments brain capacity (enhanced nutrition, in these cases) is likely to enhance musicality and thus provide an ever more fertile environment for musemes. This section considers another example of brain-augmenting gene-meme coevolution, the hypothesis of memetic drive (or memetic driving), whereby encephalisation – the increase in absolute and/or relative brain size – is argued to have been fostered by the selection pressures imposed by memes (Blackmore, 1999, pp. 76–80; Blackmore, 2000a, pp. 31–33; Blackmore, 2001, pp. 243–245).147 Specifically, memetic drive concerns the encephalising responses made by genes to the survival advantages conferred by memes, which result in memes acquiring ever greater autonomy from genes and eventually turning the tables on genes by driving genetically sub-optimal but memetically optimal additional encephalisation. Thus, memetic drive represents a variant of sub-scenario (i) on page 378, whereby memes “chase” genes to force them to provide an ever more conducive environment for their own replication. It is possible that memetic drive worked in conjunction with other encephalisation-driving processes, including the nutritional examples just given.

There are many factors limiting the indefinite expansion of brain size in an animal, but the two most important are the fact that the brain consumes a disproportionate amount of resources (it is c. 2–3% of the human body by mass but draws c. 25% of the resting body’s energy (Harari, 2014, p. 9)); and the fact that, in humans, a large brain in the uterus makes passage down the birth canal difficult and risky for both mother and infant. The latter factor may account for the relatively long period of infant care in Homo sapiens compared with many other primate species: the human infant needs such protracted care, during which brain size increases, because an infant could not have been safely delivered at a more advanced stage of brain development (Dissanayake, 2008, p. 172) (§2.3.4 ).

Given these various constraints on brain size, it is necessary to account for encephalisation in the hominin line – not just the increase in the absolute size of the brain, but the increase in its size relative to the body as a whole (as measured by the encephalisation quotient), and its associated lateralisation in humans (§2.7.7). After all, many organisms survive perfectly well with much smaller brains, so why do humans have such large, complex and physiologically expensive brains? Blackmore argues that this may be explained in terms of memetic drive. In summary, the three-stage process she hypothesises is as follows (a concise review is given in §5.2 ; see also Jan (2007, pp. 242–244)). Note that, while animals can of course copy actions, the “capacity to imitate” in the account below is arguably most potent in the domain of sound, in the form of the vocal learning discussed by Merker (2012) (§2.7.5).

Selection for Imitation:

“Capacity-to-imitate” (hereafter “CtI”) genes (those controlling the perceptual-cognitive and vocal-motor substrates for imitation) will tend to spread in a gene-pool because of the fitness advantages imitation confers on an individual compared with trial-and-error learning (Blackmore, 1999, p. 77). Those who are most adept at imitation – the quick learners – are termed “meme fountains” (hereafter “MF”) by Blackmore (2000a, p. 32). This mechanism alone can explain an increase in brain size, because it binds encephalisation to survival advantage via Darwinian natural selection (Blackmore, 2000a, p. 32). This is because imitation is a cognitively demanding skill and therefore requires substantial brain capacity; those with the biggest brains will tend to be the best imitators and will tend, via the survival advantage imitation-transmitted knowledge confers, to have more viable offspring.148 The mechanism for this process, an element of vocal learning (§2.7.5), is outlined in point 7 of the list on page 188, and in point 18 of the list on page 200.

Selection for Imitating the Imitators:

A genetically controlled ability to identify and preferentially imitate MFs may confer a “borrowed” gene-fitness advantage on this ability-detector’s possessor, leading to a differential increase of such “imitate-the-meme-fountains” (hereafter “ItMF”) genes (Blackmore, 1999, pp. 77–78) – i.e., genes for knowing who is a good bet to imitate. Memetic evolution and the expansion of culture gather pace in this phase (Blackmore, 2000a, p. 32), perhaps engendering, among other replicator-types, the protemes of musilanguage.

Selection for Mating with the Imitators:

Here, advantages to genes and advantages to memes diverge. While the imitation described in the first and second points above would probably have been built on a substrate of innate primate capacities that arose initially via natural selection to fulfil a number of functions, it may subsequently have been augmented by sexual selection (Dennett, 2017, p. 266), leading to the appearance of coevolutionary sexual selection (§2.5.3).149 As with all coevolutionary processes, there may come a point, as appears to have been the case here, where a replicator’s interests are best served not by continued cooperation but by defection, to use the terminology of the Prisoner’s Dilemma game.150

  • From the point of view of genes: (i) it is advantageous for a female to mate with a male MF because of the fitness advantages (accruing from a high capacity to imitate memes) conferred on her offspring (and grandoffspring) by the CtI genes (Blackmore, 1999, pp. 78–79) (as predicted by the “sexy sons” hypothesis (page 146)). As an instance of sexual selection, this preferential mating process will tend to lead not only to a differential increase of CtI genes (the ornament), but also of “mate-with-the-meme-fountains” (hereafter “MwtMF”) genes (the preference). Moreover, (ii) there will be an enhanced advantage for any alleles of the CtI genes that privilege replication of the most currently “favoured” memes (Blackmore, 1999, p. 80) – assuming such memes are initially gene-replication-enhancing – and, thus, an associated advantage for females to mate with those males with these specific alleles.
  • From the point of view of memes, this initially gene-beneficial privileging of the most “favoured” memes will initiate a process whereby: (i) memetic evolution is further expedited, in the form of ever more diverse and extreme ornaments; (ii) the ornament-memome may give rise to an ornament-phemotype that is detrimental to the replication of genes (such as reckless behaviours); and (iii) such gene-detrimental ornaments will tend to evolve much more rapidly than genes can evolve to control them, meaning that memes, capitalising on genetically mediated preferences, are able to “outwit” genes (Blackmore, 1999, p. 78). In this sense, memetic evolution has escaped the genes’ “leash” (Table 3.2 ) and is harnessing increased encephalisation to its own ends (Blackmore, 1999, p. 80).151

The advantage of Blackmore’s memetic drive hypothesis, specifically its third stage, is that it instantiates the type of Fisher-process linkage disequilibrium underpinning sexual selection, albeit across two replicators rather than in terms of the single-replicator perspective underpinning classical sexual selection. While the alignment of memetic drive’s proposed mechanism with a biological-evolutionary process that has been extensively modelled mathematically and computationally – using the approaches outlined on page 144 – does not in itself prove the existence of memetic drive, it is certainly suggestive that the process is credible. Indeed, to my knowledge, three studies broadly support the hypothesis, in different ways. First, a mathematical model of memetics confirms that ItMF genes can indeed spread within a population (Kendal & Laland, 2000, sec. 3). Second, adapting for a dual-replicator perspective the NKCS model of coevolving species (Kauffman, 1993), Bull et al. (2000) assert that

for most degrees of dependence between the two replicators, regardless of the dependence within the populations, a phase transition-like dynamic occurs as the relative rate of replication is varied. Within our model, until the rate of meme evolution is 1/30 that of genes, genes remain unaffected by their presence. From then on, until the memes evolve 10 times faster than the genes, the genes experience increasingly negative effects from the presence of the memes, and thereafter are unable to evolve effectively [i.e., auto-beneficially]. Conversely, the memes do not experience any benefit from increasing their rate of evolution until it is around 1/10 that of the genes. From then on, until they evolve 30 times faster than the genes, they experience increasing benefit from increasing their rate of evolution. Thereafter they suffer no beneficial or detrimental effects from any increase. (Bull et al., 2000, p. 234, emphasis in the original)

Third, Blackmore argues that a study on mirror neurons by Iacoboni (2005) supports three memetic-drive-related hypotheses. Mirror neurons have been reported in certain primates, including humans, and in song-birds. They are “multimodal association neurons that increase their activity during the execution of certain actions and while hearing or seeing corresponding actions being performed by others” (Keysers, 2009, p. 971), and have therefore been proposed as implicated in gestural and vocal imitation, social and emotional affiliation, and the capacities described by the Theory of Mind – that is, the ability to understand the motivations of others on the assumption that their mental processes are not dissimilar to our own (Fitch, 2010, p. 452; Harvey, 2017, pp. 56–58). The three hypotheses are as follows:

[(i)] if brain size has been meme-driven, then within groups of similar species brain size should correlate with the ability to imitate.… More specifically, I predicted that [(ii)] brain scans of people either initiating or imitating actions should reveal that ‘imitation is the harder part – and also that the evolutionarily newer parts of the brain should be especially implicated in carrying it out’ [(Blackmore, 2000b, p. 73)]. This implies that the parts of the brain that differ most between chimpanzees and humans should be those involved in imitation (assuming that present-day chimpanzees are closer to our common ancestor than humans are).152 Finally [(iii)], if memetic drive is responsible for the evolution of language, then we should expect the language areas in the human brain to be derived from areas originally used for imitation. This is what Iacoboni [(2005)] and his colleagues have demonstrated, thus confirming these predictions. (Blackmore, 2005b, p. 204)

Memetic drive is considered further in §5.4.1.3 , in connection with learned bird-song.
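The phase-transition-like thresholds reported by Bull et al. (2000) in the quotation above can be summarised as two piecewise classifications of the relative rate of meme to gene evolution. The following Python sketch is a restatement of that verbal summary only, not a reimplementation of their NKCS model. It assumes that the two fractional thresholds are to be read as one-thirtieth and one-tenth; the function names, the regime labels and the treatment of values lying exactly on a threshold are likewise assumptions made here for illustration.

```python
from fractions import Fraction

# ratio = (rate of meme evolution) / (rate of gene evolution).
# Thresholds follow the verbal summary quoted from Bull et al. (2000);
# reading the fractions as 1/30 and 1/10 is an assumption of this sketch.
GENES_LOWER = Fraction(1, 30)
GENES_UPPER = 10
MEMES_LOWER = Fraction(1, 10)
MEMES_UPPER = 30


def gene_regime(ratio):
    """Effect on genes of coevolving memes at the given relative rate."""
    if ratio < GENES_LOWER:
        return "unaffected"
    if ratio <= GENES_UPPER:  # boundary handling is an arbitrary choice
        return "increasingly harmed"
    return "unable to evolve effectively"


def meme_regime(ratio):
    """Effect on memes of increasing their own relative rate of evolution."""
    if ratio < MEMES_LOWER:
        return "no benefit"
    if ratio <= MEMES_UPPER:  # boundary handling is an arbitrary choice
        return "increasing benefit"
    return "no further effect"


for ratio in (Fraction(1, 100), Fraction(1, 20), 1, 20, 50):
    print(ratio, "| genes:", gene_regime(ratio), "| memes:", meme_regime(ratio))
```

On this reading there is a window, between ratios of 10 and 30, in which genes can no longer evolve effectively while memes continue to benefit from evolving faster, which is the asymmetry that the memetic drive hypothesis requires.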

3.8 The (Co)evolution of Music and Language II: Semantics, Syntax and Thought

Having outlined in §2.7 how musilanguage might have become articulated into discrete segments, and how any segments that became freighted with meaning might have gone on to constitute the foundations of language, I now consider certain issues in the philosophy of language that have a bearing upon later stages of this hypothesised process. Because one selection pressure driving the bifurcation of musilanguage was the need to communicate thoughts and desires with ever greater precision, it follows that language is associated in some way with the thoughts it evolved to help communicate. Moreover, because much of human thought is conscious (a lot is not, shading into our automatic behaviours and reactions that, in a broad definition, are categories of thought), language is deeply implicated in the problem of consciousness (§7.2.1). I attempt to deal here, and in §7.4 , with the thorny question of the relationship between language, thought and consciousness, insofar as they apply to the evolution of musicality and music. I take certain ideas of Peter Carruthers and integrate them with precepts from memetics and neuroscience. Building a synthesis between the two main dimensions of music and language – external sound structures and internal brain implementation of musemes and lexemes – allows one to explore deep structural and functional similarities between the syntactic and semantic dimensions of language and music.

3.8.1 Language and Cognition

Considerable debate surrounds the issue of how language and thought relate to each other (Dennett, 2017, Ch. 9). Is language the mechanism for thought, the medium through which it is (exclusively) conducted, the so-called “cognitive conception” of language; is it simply a vehicle for, or translation of, thoughts conducted more fundamentally, in some kind of brain-language or “mentalese”, the so-called “communicative conception” of language; or does it occupy some intermediate position between these extremes (Carruthers, 2002, p. 657)? The cognitive conception of language, hereafter “cognitivism”, is associated with the “relativism and radical empiricism” of Whorf’s (1956) view of language – “the Standard Social Science Model”, in Pinker’s somewhat dismissive opinion (Carruthers, 2002, pp. 661, 664). By contrast, the communicative conception of language, hereafter “communicativism”, is generally more strongly advocated by cognitive scientists and evolutionary psychologists.

In part, the distinction devolves to one of nurture (cognitivism) versus nature (communicativism). For cognitivists, such as Dennett (1995), the mind exists because the tabula rasa of the new-born child is shaped (bottom-up, inductively, a posteriori) by the nurtural power of language (indeed, in Dennett’s view, by the power of memes themselves). For communicativists, such as Pinker (1997), much of the mind is naturally and innately pre-formed (top-down, deductively, a priori) at birth by natural selection, so memes, if they are implicated at all in cognition, do not do the heavy lifting; rather, they act merely as epiphenomena of more fundamental processes. Seen in these terms, cognitivism intersects partly with “constructionist” approaches to language, which assert that “[g]rammar does not involve any [innate] transformational or derivational component”; rather, “learned [memetic] pairings of form [lexemic sound-pattern] and function [meaning/concept]” constitute structures “in a network in which nodes are related by inheritance links” and in which “[s]emantics is associated directly with surface form” (Goldberg, 2013, p. 15; see also Goldberg, 2003; Boas & Sag, 2012; Gjerdingen & Bourne, 2015).

There is currently no consensus on this particular nature-nurture question, despite the two positions not being mutually exclusive; and responses to the issues involved tend, as suggested, to be split along disciplinary lines. A fuller understanding certainly requires an interdisciplinary integration of neuroscience, psychology and philosophy. The argument advanced in Carruthers (2002) (see also the peer commentaries, 2002, pp. 674–705, and Carruthers’ response 2002, pp. 705–718) is perhaps one of the most convincing attempts to unpick the issues involved, and his preferred analysis of where on the cognitivism-communicativism continuum the most robust explanation for language and/as thought lies will be taken as the basis for much of what follows, not least because of its ready accordance with the memetic interpretation advanced in this book. Essentially, Carruthers, a moderate cognitivist, attempts to chart a via media between cognitivist claims of different strengths, ranging from weak (language is necessary for at least some kinds of thought) to strong (language is essential for all types of thought) and, by doing so, implicitly illuminates the communicativist inversion of this continuum.

3.8.2 Modularity, Language and Thought

Carruthers starts from the position that while “some thoughts are carried by sentences (namely, non-domain-specific thoughts which are carried by sentences of natural language), others [i.e., domain-specific thoughts] might be carried [non-linguistically] by mental models or mental images of various kinds” (Carruthers, 2002, p. 658, emphasis in the original). His hypothesis is that

non-domain-specific [conscious and unconscious] thinking operates by accessing and manipulating the representations of the language faculty. More specifically, the claim is that non-domain-specific [conscious and unconscious] thoughts implicate representations in what Chomsky … calls ‘logical form’ (LF). Where these representations are only in LF, the thoughts in question will be non-conscious ones. But where the LF representation is used to generate a full-blown phonological representation (an imagined sentence), the thought will generally be conscious. (Carruthers, 2002, p. 658, emphasis in the original, see also p. 666)

To accept this, one has to endorse a modular view of mental structure similar to (but not necessarily in complete accordance with) the views expressed in, for example, Fodor (1983). In Carruthers’ account, “besides a variety of input and output modules (including, e.g., early vision, face-recognition, and language), the mind also contains a number of innately channeled conceptual modules, designed to process conceptual information concerning particular domains” (2002, p. 663). These modules, for which strong selection pressures existed in early hominins, “include a naïve physics system … a naïve psychology or ‘mind-reading’ system … a folk-biology system … an intuitive number system … a geometrical system for reorienting and navigating in unusual environments … and a system for processing and keeping track of social contracts” (2002, p. 663).

By LF is understood here the unconscious mentalese structures underpinning and motivating the various connections possible between the components of natural language, in particular the relationships between verbs and the other sentence-elements required to combine with verbs in order to make a sentence grammatical (the mechanism for which is considered in §3.8.4 ), which some grammarians discuss under the rubric of “valency” (Durrell et al., 2015, Ch. 8). As Carruthers argues, a LF, that is, “a non-conscious tokening of a natural language sentence would be … a representation stripped of all imagistic-phonological features, but still consisting of natural language lexical items and syntactic structures” (2002, p. 666). Such “imagistic-phonological” features would appear to equate to the lexemes associated with a given LF. As discussed in §2.7.6 , a lexeme is the imagined (internally heard) or spoken (physically produced) sound pattern of a word. While not framed by him in evolutionary terms, this category of replicator is broadly analogous to Saussure’s notion of the “sound image” (§3.8.5).

While domain-specific thought operates independently of language (using mental models or images), non-domain-specific (i.e., domain-general) thought, in being tokened by language (Carruthers, 2002, p. 660), draws upon language’s syntactic structure – mediated by the underlying Chomskyan LF – to constitute it, not merely to express it (Carruthers, 2002, p. 664). Essentially, LF impels the generative-transformational aspect of language (Chomsky, 1965; Lerdahl & Jackendoff, 1983), whereby a finite set of recursive and hierarchical syntactic structures can underpin an infinity of content-specific utterances (§3.5.2 ). In particular, Carruthers suggests that “distinct domain-specific sentences might be combined into a single domain-general one” by means of “multiple embedding of adjectives and phrases” (2002, p. 669), giving as an example “The toy is in the corner with a long wall on the left and a short wall on the right”, produced initially in mentalese as a mental model or image by the geometrical module; and “The toy is by the blue wall”, similarly produced by the “object property” module dealing, among other things, with colour.153 These become integrated (unconsciously) by LF as the basis for the non-domain-specific/domain-general, and potentially lexemically (consciously) manifested, “The toy is in the corner with a long wall on the left and a short blue wall on the right” (Carruthers, 2002, p. 669).154

Figure 3.14 (a visualisation and extension of certain aspects of Carruthers (2002), after Jan (2016b, p. 478, Fig. 1)) hypothesises how the various language-related input and output systems, and their associated modules, might be organised and how they might interact.

Figure 3.14: Modularity, Language and Thought.

The domain-specific modules – such as (naïve) physics, (folk) biology and (naïve) psychology, the latter termed here “ToM” (Theory of Mind) – are shown in the intermediate (middleground) layer.155 While these and other modules are represented here as discrete “silos”, they are presumably highly interconnected in neurobiological reality. Moreover, while conceived in terms of input-output connections, modules also store information and so involve memory, of varying degrees of volatility. This memory is hypothesised to be encoded in the brain in accordance with the precepts of the Hexagonal Cloning Theory, discussed in §3.8.3 .

The domain-specific modules receive perceptual-sensory input processed by the hearing and vision centres (and also the centres responsible for taste, touch and smell), shown in the background layer; and they can also “back-project” to these sensory inputs, as in situations where aural and visual imagination is used to recreate or generate sounds and images (Carruthers, 2002, pp. 658, 666, 670). For clarity, not all linkages from sensory input to the domain-specific modules are shown in Figure 3.14 . The language module, shown in the foreground layer, consists of comprehension and production sub-modules/sub-systems and it receives inputs from, and sends outputs to, the domain-specific modules. As Carruthers argues,

[The] production sub-system must be capable of receiving outputs from the [domain-specific] conceptual modules in order to transform their creations into speech. And its comprehension sub-system must be capable of transforming heard speech into a format suitable for processing by those same [domain-specific] conceptual modules. Now when LF representations built by the production sub-system are used to generate a phonological representation, in ‘inner speech’, that representation will be consumed by the comprehension sub-system, and made available to central [domain-specific] systems. One of these systems is a theory of mind module.… perceptual and imagistic states get to be phenomenally conscious by virtue of their availability to the higher-order thoughts generated by the theory of mind system …. this is why inner speech of this sort is conscious: It is because it is available to higher-order [ToM] thought.156 (Carruthers, 2002, p. 666)

In Figure 3.14 , the production sub-system (“P”, and the associated blue arrows) is shown receiving outputs of the Number and Geometry modules after the receipt of some visual stimulus (purple arrows).157 These mentalese inputs are synthesised into a LF that potentially serves as the foundation and cue for a lexeme – in this case, perhaps one articulating some notion of the quantity of a certain environmental shape or regularity. Whether verbalised or not (the former indicated by the arrow to “produced speech”), the production sub-system may generate a phonological representation in “inner speech” (the lexeme sounding internally, perhaps by recruiting auditory-system neurons). Over time, and as a result of enculturation, the establishment of evolutionarily stable associations (coadaptations) between certain LFs and certain lexemes – in a kind of “lock-and-key” process – constitutes language acquisition, both ontogenetically and phylogenetically. This phonological representation is “consumed” by the comprehension sub-system (“C”, and the associated green arrows). Its availability to higher-order thought via the ToM module (indicated by the arrow from the comprehension sub-system to the ToM module) renders it conscious, even though (as Carruthers’ remarks might be taken to imply) consciousness (and therefore language) is not necessary for comprehension.158 This “zone of consciousness” is approximated by the dotted ellipse in Figure 3.14 . In language reception, perceived speech (red arrows, initially from “Auditory Input (Heard Speech)”) is directed towards the comprehension sub-system via the hearing centre and cognised by means of “deconstruction” of its inferred LF into the aforementioned “mental models or mental images of various kinds” (Carruthers, 2002, p. 658) and by reference to the relevant domain-specific modules necessary to understand it. In the case of Figure 3.14 , these are Biology and Number – appropriate, for example, for a sentence articulating some notion of the quantity of a particular animal or fruit.
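The flow of information just described for language production can be caricatured in a few lines of Python. This is a deliberately crude sketch of the architecture hypothesised above, not a cognitive model: the class LogicalForm, the functions produce, inner_speech, comprehend and think, and the representation of module outputs as strings are all hypothetical devices introduced here. The one rule the sketch is intended to capture is Carruthers’ claim that a thought tokened only in LF is non-conscious, whereas one for which a phonological representation is generated, consumed by the comprehension sub-system and thereby made available to the ToM module is conscious.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalForm:
    """Hypothetical stand-in for an LF: lexical and syntactic content, no sound."""
    content: str
    sources: tuple  # names of the domain-specific modules that fed production


def produce(module_outputs):
    """Production sub-system (P): integrate module outputs into a single LF."""
    sources = tuple(sorted(module_outputs))
    content = " + ".join(module_outputs[name] for name in sources)
    return LogicalForm(content, sources)


def inner_speech(lf):
    """Generate a phonological representation (the lexeme 'sounding' internally)."""
    return f"<{lf.content}>"


def comprehend(phonological):
    """Comprehension sub-system (C): consume the representation and pass it to ToM.

    Availability to the ToM module is what, on this account, renders it conscious.
    """
    return {"representation": phonological, "conscious": True}


def think(module_outputs, phonological=True):
    """Token a domain-general thought, with or without inner speech."""
    lf = produce(module_outputs)
    if not phonological:
        # LF only: the thought is tokened but remains non-conscious.
        return {"representation": lf.content, "conscious": False}
    return comprehend(inner_speech(lf))


outputs = {"Number": "three", "Geometry": "triangles"}
print(think(outputs))
print(think(outputs, phonological=False))
```

The sketch omits back-projection to the sensory centres and the reception pathway, both of which would require a reverse mapping from heard speech to the relevant domain-specific modules.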

Having explained how underlying LF mentalese may be associated with an “imagistic-phonological” lexeme, I argue in §3.8.6 for a musical equivalent to this process: an association between LF mentalese and similarly “phonological” – but perhaps less overtly “imagistic” – musemes.

3.8.3 The Hexagonal Cloning Theory (HCT)

Is there a known mechanism of neural information encoding that might be consonant with Carruthers’ hypothesis of language outlined in §3.8.2 and also accommodate lexemes and musemes? One candidate is a family of related theories that stem primarily from Donald Hebb’s work in the 1940s on the columnar organisation of neurons and the formation of representations via neuronal interconnections, the latter process sometimes called Hebbian Learning (see also §6.5.1.2 ). Hebb (1949) argued, and subsequent work has confirmed, that certain cells, the pyramidal neurons, are arranged within the cerebral cortex in discrete columns, each of which is implicated in the encoding and representation of an element of perception or cognition. Rather than being distributed randomly, these columns are, in certain brain regions, broadly equidistant, giving a cortical polka-dot pattern when viewed from above (Calvin, 1998, p. 29). Subsequent research confirmed this hypothesis (Mountcastle, 1978), determining that columns of pyramidal neurons tend to form interconnected, co-resonating arrays – “cell assemblies” (Calvin, 1998, p. 13) – in the geometrically optimal form of the triangle (Leng et al., 1990; Leng & Shaw, 1991).159

An extension of triangular-array models, Calvin’s Hexagonal Cloning Theory (HCT) (1998; see also Jan, 2011a) asserts that coordinated pyramidal-neuron “minicolumns” (1998, p. 29) forming triangular arrays “interdigitate”, allowing several attributes of a percept or concept to be represented, via association of each attribute with a specific array. Again the result of geometrical parsimony, coordinated triangular arrays are themselves optimally encompassed by (virtual) hexagonal zones of cortex, these encompassing some array-implicated and some “silent” minicolumns (1998, pp. 43–45, 62). While they are synchronic structures of relatively stable neuronal connectivity, cortical hexagons also have a diachronic dimension, in that they encode a “spatiotemporal firing pattern” (SFP) (1998, p. 47) – a characteristic sequence of array-activation. Borrowing a concept from chaos theory, Calvin argues that the minicolumns forming the vertices of the triangular arrays constituting a hexagon create “basins of attraction” in cortex, these representing sensitisation (learning) resulting from perceptual input (1998, p. 68). Encoded patterns are reactivated as the phenomenon of recognition if the same input is subsequently encountered, and they may be internally or externally triggered as recollection and memory, the latter albeit of varying degrees of coherence and durability over time (Bonnici & Maguire, 2018; Gonzales et al., 2019). Multiple basins of attraction can be overlaid upon the same region of cortex, likened by Calvin to the layers of fish in sashimi; the deeper the layer, the more strongly encoded the pattern (1998, p. 107).

The cortical hexagons of the HCT afford a mechanism by which complex perceptual and cognitive information can be implemented and integrated at the neural level. The notion of integration is important here, because this architecture is particularly characteristic of “association cortex” – those regions of the brain where input from different sensory and motor areas is brought together and reconciled (Calvin, 1998, p. 42). This integration includes the parameters of musical pitch and rhythm, and presumably other attributes of music; indeed, Calvin often illustrates his theory using various musical concepts, seeing individual triangular arrays as “notes”, hexagons as melody-playing “ensembles”, and areas of co-resonating cortex as a “chorus” (1998, p. 39). Calvin’s exposition makes it clear, however, that such alignments go well beyond metaphor: the synchronic and diachronic aspects of music are, in reality, implemented this way. Thus, a hexagon is the minimal cell-assembly – the “cerebral code” (Calvin, 1998) – for representing the neural encoding (the memome) of a museme.

For such encoded information to constitute a museme it must be: (i) perceptually-cognitively salient; and (ii) replicated. The first condition is readily satisfied, because incoming perceptual information is often pre-segmented into discrete units by gestalt processes operating at “lower” levels of the perceptual input system (represented by the background level of Figure 3.14 ). Thus music-auditory data encoded by cortical hexagons generally constitutes (but is not necessarily limited to) patterns that are at least potentially musemes (Jan, 2011a, sec. 4.1.1). The second condition is satisfied when the original brain-encoded hexagonal pattern is reconstituted in a second brain, via another individual’s engagement with those phemotypic products to which the original memome gives rise (Table 1.3 ).

Alignment between input stimuli and extant basins of attraction leads to the activation and replication – “cloning” (Calvin, 1998, p. 40) – of a hexagon’s pattern, forming territories of interlocking plaques – “mosaics of the mind” – on the surface of the cortex. For this to occur, at least two abutting hexagons, representing the “minimal cell-assembly”, are required (1998, p. 47). As the mechanism for recognition or remembering, cloning underpins another significant element of Calvin’s theory, that of competition between rival hexagons – each form representing a candidate for the optimal encoding of a multi-component percept or thought – for the conquest of cortical terrain. Indeed, the HCT regards the architecture of the brain as enabling the operation of a Darwin machine in its connectivity (1998, pp. 33–34) (§1.5.4 ; see also the quotation from Calvin (1987b) on page 832). This neural Darwinism supports the VRS algorithm by means of: (i) the “variations on the cloned pattern” potentially arising from the failure of “error-correction” mechanisms (perhaps caused by “dead-key” missing notes, by hybridisation of two patterns that encounter each other in cortical “no-man’s-land”, or by hexagons attempting to pass through corrupting “barriers” in cortex) (1998, pp. 58–59, 88); (ii) the replication of successful hexagons across cortex; and (iii) their selection according to the criterion of fit between encoded/remembered information and incoming stimuli (see also McNamara, 2011; Fernando et al., 2012).
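As a purely illustrative aid, the three components of the VRS algorithm just listed can be sketched as a toy simulation. Nothing in it is drawn from Calvin's own formalism; every name and parameter is an assumption made for the sake of the example. Patterns are short tuples of integers standing in for SFPs, "cortex" is a fixed-size list of hexagon slots, `fit` simply counts matching positions between an encoded pattern and the incoming stimulus, and variation is modelled as a fixed per-element copying-error probability.

```python
import random


def fit(pattern, stimulus):
    """Selection criterion (assumed): number of positions at which the
    encoded pattern matches the incoming stimulus."""
    return sum(p == s for p, s in zip(pattern, stimulus))


def clone(pattern, error_rate, alphabet, rng):
    """Replication with occasional copying errors (variation)."""
    return tuple(
        rng.choice(alphabet) if rng.random() < error_rate else p
        for p in pattern
    )


def vrs_step(cortex, stimulus, error_rate, alphabet, rng):
    """One generation: each slot is re-plated by a clone of the fitter of two
    randomly chosen rival patterns (a hypothetical stand-in for competition
    over cortical terrain)."""
    new_cortex = []
    for _ in cortex:
        a, b = rng.choice(cortex), rng.choice(cortex)
        winner = a if fit(a, stimulus) >= fit(b, stimulus) else b
        new_cortex.append(clone(winner, error_rate, alphabet, rng))
    return new_cortex


def mean_fit(cortex, stimulus):
    """Average fit of all patterns currently occupying the toy cortex."""
    return sum(fit(p, stimulus) for p in cortex) / len(cortex)


def simulate(generations=40, slots=60, length=8, seed=0):
    """Return the mean fit per generation for a randomly initialised cortex."""
    rng = random.Random(seed)
    alphabet = list(range(7))  # assumed: seven possible values per element
    stimulus = tuple(rng.choice(alphabet) for _ in range(length))
    cortex = [
        tuple(rng.choice(alphabet) for _ in range(length))
        for _ in range(slots)
    ]
    history = [mean_fit(cortex, stimulus)]
    for _ in range(generations):
        cortex = vrs_step(cortex, stimulus, 0.02, alphabet, rng)
        history.append(mean_fit(cortex, stimulus))
    return history


if __name__ == "__main__":
    h = simulate()
    print(f"mean fit, first generation: {h[0]:.2f}; last generation: {h[-1]:.2f}")
```

Pairwise tournaments are used here only because they are the simplest way of expressing rivalry between two patterns for one slot; any other selection scheme that favours better-fitting patterns would serve the illustration equally well.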

While the argument of this book is not contingent upon there being a specific topography of neuronal structures – it requires only that discrete phenomena in the world are encoded discretely in the brain – subsequent work on spatial location encoding in the entorhinal cortex has supported Calvin’s model (Fuhs & Touretzky, 2006; Shrager et al., 2008; Burak & Fiete, 2009; Doeller et al., 2010; Mhatre et al., 2012; Killian et al., 2012; Stensola et al., 2012). Indeed, this research, together with accounts of the tonotopic organisation of the auditory cortex (Zatorre, 2003, p. 233), and of the phototopic/retinotopic organisation of the visual cortex (Braitenberg & Braitenberg, 1979; Reichl et al., 2012a; Reichl et al., 2012b), not only suggests deep similarities between brain representations of a variety of sensory inputs, but also indicates that, for all the astonishing complexity of neuronal interconnections, a triangular-hexagonal disposition of cortical minicolumns activated in a SFP is a recurrent structural-topographical configuration. Thus, while the HCT is some twenty-five years old, a considerable period when seen in the light of the rapid progress of neuroscience, more recent work has nevertheless supported its claims and evidenced its applicability to a number of areas of brain function of relevance to music and musicality.

Recalling Marin and Perry’s (1999) assertion cited in §2.7.7 that language and music are “hemispheric specialisations” of a previously bi-lateral organisation of musilinguistic vocalisation (page 213); and understanding this differentiation in the light of the HCT, it might be hypothesised that, over the course of hominin evolution, right-hemisphere hexagonal plaques representing increasingly discrete (FOXP2-segmented?) sonic units (protemes) were yoked (by means of connections to be discussed in §3.8.4 ) to left-hemisphere hexagonal plaques regulating their syntactic interrelationship and semantic content – these perhaps even implementing a proto-LF – thereby engendering the lexemes of compositional language.

3.8.4 Implementation of Linguistic Syntax in the Light of the HCT

Carruthers’ suggestion that “distinct domain-specific sentences might be combined into a single domain-general one” by means of “multiple embedding of adjectives and phrases” (2002, p. 669) (§3.8.2) – a means for the implementation of his central hypothesis – has a ready mechanism in the HCT. Calvin suggests that hexagons encoding certain kinds of mental data in one part of cortex are connected to others encoding different kinds of data in other regions. Moreover, and invoking an idea of Damasio’s (1989), he argues that “there are specialized places in the cortex, called ‘convergence zones for associative memories’ [or ‘association cortex’], where [representations in] different modalities come together” (Calvin, 1998, pp. 129–130; from Calvin, 1996, p. 117). Calvin speaks of “hashing” or indexing – abstracting the attributes of a “distributed [domain-specific] ‘data base’” in order to create a “centrally located [domain-general] representation” (CLR) – the mechanism for which appears to be hash/index-hexagonal overlapping/interdigitation in association cortex (1998, pp. 17, 135, 207).

The connections between domain-specific hexagonal codes (a sub-committee, to adapt one of Calvin’s metaphors (1998, p. 45)) and the fully “associated” domain-general LF code (a master committee) are achieved by certain types of “corticocortical projections” that go beyond the localised connectivity responsible for supporting triangular/hexagonal arrays and that involve links that “can go long distances, as from one hemisphere to another …, though most only make a U-shaped passage through the white matter of one gyrus and then terminate in a nonadjacent patch of cortex that’s only a few centimeters away” (1998, p. 131). Because such links are able to reconstitute the hexagonal plating of one area of cortex in another, Calvin terms them a “faux fax” and, writing in the mid-1990s, likens them to hyperlinks in the then nascent internet (1998, pp. 125, 131).

Figure 3.15 (Jan, 2011a, sec. 4.3.2, Fig. 13; see also Jan, 2016c, p. 459, Fig. 4) shows how the process might function in general terms. Note that the entities in the North-West, North-East and South-West quadrants might represent variously musemes, lexemes, or domain-specific thought; and that the structure in the South-East quadrant (the CLR) is a higher-level museme, musemeplex or a musemesatz (§3.5.2), or a domain-general LF.160

Figure 3.15: Calvinian Implementation of Structural-Hierarchic Abstraction/Integration.

The following is an overview of how certain key aspects of language syntax are implemented by the HCT, faux-fax linkages, and abstraction to a CLR:

1. The adjectival modification of a noun may be accounted for by “simple borderline superposition of hexagons” (Calvin, 1998, p. 193). Beyond a certain point (several adjectives and, perhaps, prepositions), however, superposition runs the risk of creating an unspecific – Bickertonian lexical protolinguistic (1998, p. 193) – mix of words, the solution to the potential chaos of which is recursive hierarchical embedding (see point 4 below).
2. The binding of a pronoun to its referent may be accomplished by a faux-fax link that connects the representations of these two words, even if they are in different sentences (1998, p. 194).
3. The long-range dependencies of wh- questions are similarly implemented (1998, p. 194). The assumption for both point 2 and point 3 is that the faux-fax linkages are bidirectional. Using the metaphor of a choir, Calvin argues that “[b]ack projections … can use the same code, and so immediately contribute to maintaining a chorus above a critical size …. A backprojected spatiotemporal pattern might not need to be fully featured, nor fully synchronized, to help out with the peripheral site’s chorus” (1998, p. 194).
4. Recursive embedding – which is “at the very top of [linguists’] Universal Grammar wish list” (1998, p. 194) – is implemented by faux-fax links that allow higher-level concepts to connect representations of subsidiary parts of a sentence intelligibly.161 According to Calvin, “if either subchorus [a discrete clause] falters, the top-level one [the integrity and sense of the sentence as a whole] stumbles” (1998, p. 194). Calvin gives the example of the sentence “I think I saw him leave to go home” (computationally/hierarchically, X://I think/I saw him/leave/to go/home), wherein the Darwinian success of the hexagonal colonies representing the top-level think verb is dependent upon the survival of the saw and leave verb colonies connected to it via faux-fax links. In a process of “stratified stability”, “[i]f the leave link stumbles, the saw hexagons might not compete very effectively and so the top level [think] dangles” (1998, p. 195). For this system to work, “[e]ach verb has a characteristic set of links: some required, some optional, some prohibited” (1998, p. 195) – termed valency in §3.8.2.
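The dependency described in point 4 can be made concrete with a small, hedged sketch. It is an illustration only: the `Chorus` class, the numerical `strength` values, the `THRESHOLD` for a chorus’s “critical size”, the role names and the particular assignment of required and optional links to each verb are all hypothetical, introduced here simply to show how the failure of a required lower-level link propagates upwards while the failure of an optional one does not.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.5  # assumed "critical size" below which a chorus falters


@dataclass
class Chorus:
    """Hypothetical stand-in for a hexagonal colony encoding a word or phrase."""
    label: str
    strength: float                              # assumed 0..1 measure of colony size
    links: dict = field(default_factory=dict)    # role name -> linked Chorus
    required: frozenset = frozenset()            # roles that must be filled and stable
    prohibited: frozenset = frozenset()          # roles that must not be filled


def stable(node):
    """A chorus is stable if it is above threshold, has no prohibited links,
    and every required link leads to a chorus that is itself stable.
    Optional links (neither required nor prohibited) are ignored."""
    if node.strength < THRESHOLD:
        return False
    if any(role in node.links for role in node.prohibited):
        return False
    for role in node.required:
        child = node.links.get(role)
        if child is None or not stable(child):
            return False
    return True


def build_sentence(leave_strength=0.9, go_strength=0.9):
    """'I think I saw him leave to go home', with an assumed valency per verb."""
    home = Chorus("home", 0.9)
    go = Chorus("to go", go_strength, links={"goal": home},
                required=frozenset({"goal"}))
    leave = Chorus("leave", leave_strength, links={"purpose": go})  # optional link
    him = Chorus("him", 0.9)
    saw = Chorus("I saw", 0.9, links={"object": him, "event": leave},
                 required=frozenset({"object", "event"}))
    think = Chorus("I think", 0.9, links={"content": saw},
                   required=frozenset({"content"}))
    return think


if __name__ == "__main__":
    print(stable(build_sentence()))                    # all choruses above threshold
    print(stable(build_sentence(leave_strength=0.2)))  # "leave" falters
    print(stable(build_sentence(go_strength=0.2)))     # only an optional link falters
```

On these assumptions, weakening the leave chorus makes the top-level think chorus unstable, whereas weakening the optionally linked to go chorus leaves it intact.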

Such connections and their associated hierarchic relationships appear to be key to the nature of LF. Moreover, the various references to specific parts of speech here arguably apply primarily to their LF representations, as functional encodings, and only secondarily to the associated (tokening) lexemes.

To summarise, the HCT (and with it faux-fax linkage and the Darwinian competition between cortical hexagons) is a candidate mechanism for Carruthers’ central hypothesis of language as the medium for domain-general thought (§3.8.2 ). This is because it affords a means by which hexagons encoding domain-specific representations of “mental models or mental images” in various regions of the brain can be interconnected to (left-hemisphere-situated?) domain-general/LF conglomerations. These LF structures can then be similarly associated with those (right-hemisphere-situated?) hexagons encoding the coadapted lexemes that render the LF conscious.

3.8.5 Semantic Homologies between Language and Music

One might extend and support the discussion in §3.8.4 by considering how musemes might also bear semantic content by virtue of mechanisms analogous to those linking linguistic LF structures – which integrate domain-specific meanings to form a domain-general representation – to lexemes. Of course, many would argue that music has a semantic as well as an affective dimension (see, for example, Nattiez, 1990; Scruton, 1997; L. Kramer, 2002). What I am hypothesising here is that the mechanism by which this operates is parallel with that operating in language. In this sense, music is understood as acting as a kind of degraded language, retaining some of the semantic capacity of musilanguage by virtue of its ability, like the sound patterns of its antecedent, to become associated (sometimes arbitrarily, sometimes not) with extra-musical concepts, but lacking the kind of rich, semantically implicative syntax of language (point 13 of the list on page 198). Clearly music has its own highly sophisticated syntax, but this is, to recall Agawu’s distinction from §2.7.3 , generally more introversive than extroversive (1991, p. 23); so whereas the inversion of words in a sentence might have global syntactic and semantic effects, a comparable inversion in music might only perturb the local syntax (see also Patel, 2008, p. 259). I consider this issue further in §3.8.6 , arguing, nevertheless, that the LF structures that lexemes token might have an analogue/parallel in music, their neural substrates perhaps being partially interconnected.

To help focus the discussion, I concentrate here primarily on the topics of late-eighteenth century music (Agawu, 1991; Ratner, 1991; Allanbrook, 1992; Caplin, 2005; Monelle, 2006; Mirka, 2014), which, in Meyer’s terms, are broadly understood and widely held “connotations” afforded by musical patterns (1956, p. 258). Topics are abstracted and sustained by educated listeners from the historically contingent, indexical connections between certain musical patterns and specific extra-musical ideas. The former include dance-associated rhythmic sequences (“types”), together with more intangible associations of pitch and texture (“styles”) (Ratner, 1980, p. 9); the latter include generic notions of social hierarchy and specific concepts and images. The mechanisms that afford semantic content to topics seem applicable in principle to more private associations, such as those individual composers and listeners might form between particular passages and pieces of music and certain extra-musical ideas, and so they may be generalisable beyond the frame of reference considered here.

One means of mediating between music and language in this respect is through classical semiology, specifically its association of a signifier with a signified. As Saussure argued in his celebrated definition,

[t]he linguistic sign unites not a thing and a name, but a concept [the signified] and a sound-image [the signifier]. The latter is not the material sound – a purely physical thing – but the psychological imprint of the sound, the impression that it makes on our senses: the sound-image is sensory, and if I happen to call it ‘material’, it is only in that sense, and by way of opposing it to the other term of the association, the concept, which is generally more abstract. (in Nattiez, 1990, p. 3, emphasis in the original)

Mapping this onto the two conceptions of language and thought of §3.8.1 – communicativism and cognitivism – the following (mutually exclusive) assertions might be made:

1. In the communicativist view, which aligns elegantly with Saussure’s definition, the “concept” is a domain-general, LF-implemented (unconscious) thought, whereas the “sound image” is one or more internally-heard (conscious) lexemes (and, it is argued, musemes).
2. In a cognitivist interpretation, which arguably aligns less well with Saussure’s definition, the “concept” (broadly speaking the “function”, in constructionist terms) would be regarded as existing purely (and simultaneously) in the shape of one or more (presumably) unconsciously active and consciously internalised lexemes (the constructionist “form”), and not as a LF.

Figure 3.16 162 generalises the topical association between a museme m and a lexeme l (or a “lexemeplex” or complex of lexemes). By this, I mean that m is functioning in a broadly equivalent manner to l, in that both are internal/external sound-sequences that have the capacity to token LF-underpinned semantic associations. How one conceives the detailed operation of this process is nevertheless dependent upon whether one adopts a cognitivist or a communicativist standpoint; as noted earlier, the latter perspective is adopted here. Note the following:

1. From a cognitivist viewpoint, because most or all thought is understood to be conducted by means of the manipulation of language, any semantic content that might be possessed by m is wholly parasitic upon language, as the more fundamental medium.
2. From a communicativist viewpoint, m’s semantic content may:
   (a) draw indirectly – i.e., via or mediated by language – upon the semantic elements of LF mentalese; but it may also
   (b) draw directly – i.e., unmediated by language – upon the semantic elements of LF mentalese.

As will be argued in §3.8.6 , music might also draw directly upon the syntactic element of LF mentalese for its sequential structuring, in a manner that parallels language’s recursive-hierarchical organisation by LF mentalese.

Figure 3.16: The Memetic-Semiotic Nexus of an m-l Music-Language M(us)emeplex.

Figure 3.16 is organised according to three different dimensions. As will be evident as the discussion progresses, these relate in various ways to the hemispheric localisation of music’s and language’s neural substrates, discussed in §2.7.7. One of these three dimensions is semiotic, in that it attempts to represent three distinct meaning levels, termed “Level One”, “Level Two” and “Level Three”. Another dimension represents the memome-phemotype (somatic-extrasomatic) distinction, whereby a (bold-type) formulation such as “m” refers to the memomic form of a museme m (§3.8.3) and where the boxed “m” refers to its phemotypic expression. Note that the memomic level is in principle conscious and is to be distinguished from the unconscious mentalese/LF structures with which it is associated and which it tokens. The third dimension makes a distinction between the two evolutionary outcomes of musilanguage, music and language.163

In Figure 3.16, (i) a, columns 1 and 3, and at the lowest level of referring, m – the physical sonority, through which m, via the intercession of voices (or musical instruments), impinges upon us most directly – is represented, in a “horizontal” memetic-semiotic relationship, as the phemotypic (coded-for) meme-product of the memomic (coding-for) m. Thus, the phemotypic m acts as a (somewhat abstract) signifier for the memomic m. The memomic m is often associated with a grapheme Gm, which partly governs the arguably superficial matter (from Carruthers’ point of view) of notating m and which, while not essential for its existence, is nevertheless (in the case of literate cultures) often significant for its transmission. The same principle is true, of course, in the case of lexemes.

By analogy with m, columns 2 and 4 of Figure 3.16, (i) a illustrate analogous relationships for the memomic lexeme l, which codes for the spoken (phemotypic) expression l. Paralleling Gm, the memomic Gl is a grapheme coding for the written (phemotypic) expression Gl. As with the music-related memes, the phemotypic forms l and Gl act as signifiers (again somewhat abstractly) for the associated memomic signified forms l and Gl, respectively.

As represented in Figure 3.16 , (i) b, columns 1 and 3, and at an intermediate level of referring, Gm also exists, now as a signifier, in “vertical” semiotic coadaptation with m, even though it is essentially independent of it (their relationship is “arbitrary” (Nattiez, 1990, p. 4)). m is similarly associated, as signified, with the corresponding phemotypic signifier meme, Gm.

Analogously, l and Gl function as signifiers of the signified language “interpretant-lexemeplex” Il. By this is meant the wider network of cognate lexemes that provides the context for l and that anchors it in a broader web of signification.164 The components of Il ultimately devolve, in a communicativist view, to the “back-end” LF-integrated “mental models and images” for which l (and Il) are the “front-end”. In this sense, Il is the essence of the “conscious propositional thought” (Carruthers, 2002, p. 664) tokened by l. As with the m-related memes, l and Gl function as signifiers of the signified Il.165

As represented in Figure 3.16, (ii), and at the highest level of referring, the “diagonal” association between m, as signifier, and Il, as signified, forms a m-l m(us)emeplex, one either confined to a particular individual166 or shared more widely (topically) within a cultural community. In such associations, the presence of the musical element triggers/cues the verbal in consciousness (or vice versa). In this sense, level-three semiosis corresponds not only to scenario 2a in the (second) list earlier in this subsection (page 407), but also potentially to scenario 2b – that is, the linking of musemes directly to the semantic elements of LF mentalese, displacing (or supplementing, in an intermediate state between scenarios 2a and 2b) their normal lexemic token. Such “semantic elements” are the meanings arising from the interconnected mentalese codes for nominal, adjectival, verbal, prepositional, etc. functions – the “natural language lexical items and syntactic structures … stripped of all imagistic-phonological features” (Carruthers, 2002, p. 666) – that constitute LF.

This might be particularly the case with musemes that, on account of their strong image-schematic/embodied properties, link primarily iconically (§2.7.6) with LF representations deriving from one or more of the domain-specific modules of Figure 3.14. Nevertheless, in the case of topics, indexical linkages might also arise, because many topics have real-world (co-occurrent, albeit not always arbitrarily so) referents underpinning them, such as the emulations of horn and trumpet dotted rhythms that constitute the “military” style, or the bagpipe-like drones that define the “musette/pastorale” style (Ratner, 1980, pp. 18–19, 21; see also Monelle, 2006). In such cases, a context in which the instrument (or the dance rhythm, in the case of Ratner’s rhythmic types) is used affords meaning to the topic.

The various cells in Figure 3.16 are connected by double-headed arrows, which represent the associations or linkages between phenomena in different domains and substrates by which understanding and meaning emerge. While the representation of patterns and their linkages on a two-dimensional page is useful to foster clarity of exposition and discussion, it also appears to be the case that this mirrors, to some extent, real functional and structural localisation and interconnection in the brain. As intra-brain linkages, all the vertical and diagonal connections linking columns one and two of Figure 3.16 (shown as red arrows) can potentially be accounted for by the HCT (§3.8.3). Naturally, the horizontal connections from columns one to three and from two to four, and the vertical and diagonal connections between columns three and four (shown as blue arrows) cannot be accounted for in this way, because they are not intra-brain linkages but rather somatic-extrasomatic (inter-brain) associations. In the case of columns one and two, however, the red double-headed arrows are the graphical equivalent of the faux-fax links that Calvin (1998) argues connect representations in one region or functional domain of the brain with those in another.

If the argument of this subsection is true, then one might ask why music is not as semantically specific as language. One reason might be that what might be termed an evolutionary “wedge” effect came into play after the bifurcation of music and language from musilanguage. That is, after separation their evolutionary paths diverged ever more widely because of the need for compositional language, as the information-communicating successor to musilanguage, to remain broadly coherent and specific to all members of a socio-linguistic group, and the concomitant relaxation of this constraint upon music once language had begun to bear this burden.167 Put another way, the Humboldtian nature of language – its compositional recombination of a relatively small number of component elements to form a near infinity of conceptual/propositional utterances (§5.6) – developed along more syntactically and semantically circumscribed lines than was the case in music.

Freed of its precursor’s obligation to encompass referentiality, music was increasingly able to fulfil less tangible – but no less evolutionarily important – roles, particularly the fostering of group cohesion through (holistic and multimodal) communal physicality and pleasure (§2.5.2 ), still alive today in the throbbing beats of clubs or, virtually, in the speakers of an MP3 player. This observation accords broadly with critical views on non-vocal/non-texted music from the early-Romantic period, which celebrated it precisely because it lacked the conceptual precision of language and instead communicated more generalised, holistic phenomena. For E.T.A. Hoffmann (1776–1822), author of perhaps the most celebrated of such statements (Chantler, 2006), instrumental music

is the most romantic of all the arts – one might almost say, the only genuinely romantic one – for its sole subject is the infinite. The lyre of Orpheus opened the portals of Orcus – music discloses to man an unknown realm, a world that has nothing in common with the external sensual world that surrounds him, a world in which he leaves behind him all definite feelings [and concepts] to surrender himself to an inexpressible longing [Sehnsucht]. (in Strunk et al., 1998, p. 151)

This is not to argue that music is a “universal language”, even though there are clearly certain “musical universals” (§2.5.5 ) resulting from various evolutionarily shaped physical and perceptual-cognitive constraints (Lerdahl, 1992; Velardo, 2014). Nevertheless, whereas we can glean very little linguistic information from speakers of languages with which we are unfamiliar, the music of other cultures often speaks to us directly and powerfully, despite its initial strangeness to us and our unfamiliarity with the details of its semantic and syntactic conventions. Moreover, while we might be oblivious to the grammatical structure of an unfamiliar language, we can discern a good deal of emotional information from its specifically musical elements – from the musilanguage-derived intonation of the speaker in conjunction with their facial expressions and body language. In such situations, we are transported back to the world of our hominin ancestors and compelled to activate our capacity to engage with the holistic, the manipulative, the multi-modal, the musical and – perhaps most important – the memetic.

3.8.6 Implementation of Musical Syntax in the Light of the HCT

If the communicativist view of language is one of left-hemisphere LF tokened by imagined and spoken right-hemisphere lexemes, could introversive/syntactic musical “thought” also be conducted in a form of mentalese – a left-hemisphere LF grammar of music – before association with the right-hemisphere musemes that give rise to imagined and vocalised (conscious) music? This question is an extension of point 2b – the potential for music to “draw directly – i.e., unmediated by language – upon the semantic elements of LF mentalese” – of the list on page 407, whereby not (just) the semantic but also the syntactic elements of LF are drawn upon. This extension is more problematic, because while these two dimensions are closely interconnected in language, they are clearly more independent in music.

It seems to be the case that processes covered under point 4 of the list on page 403 might also account for the representation of syntactic-hierarchic structure in music, such as that encompassed by the RHSGAP model (§3.5.2). In the same way that “faux-fax links … allow higher-level concepts to connect representations of subsidiary parts of a sentence intelligibly” (to form a fully associated domain-general LF code), they might also connect subsidiary parts of a musical phrase together under some overarching “higher-level concept”, which might be represented by such music-theoretical models as a framework harmonic progression, a “structural-melodic line” (Ratner, 1980, p. 89, Exx. 6–7), a Schenkerian Zug (Schenker, 1979, pp. 43–46), or some other schema (Leman, 1995; Byros, 2009). Moreover, in the same way that the structure of a clause is replicated recursively at the level of the sentence, and the multiply embedded clausal structure of a sentence is replicated at a higher level across a number of sentences, the same may be true for music. Deliège’s notion of cue abstraction and/or Gjerdingen’s concept of Il filo (the “thread”, along which a discrete series of schemata are arranged) might be candidate psychological models of this neurobiological process (Deliège, 2000; Cambouropoulos, 2001; Gjerdingen, 2007a, p. 369; see also Jan, 2010).

In this sense, music’s syntax – which has been the subject of extensive language-orientated speculation ranging from the rhetorical schemata of the seventeenth century (Bonds, 1991) to the Chomskyan applications of the 1980s (Lerdahl & Jackendoff, 1983) (§4.5 ) – might, as suggested in §3.8.5 , be to some extent dependent upon:

  • Some degree of interconnection with (linguistic) LF (the two systems operating in parallel, to use the model of Brown et al. (2006) in the quotation on page 210); or upon
  • A dedicated musical analogue to linguistic LF – musical LF – perhaps proximally located to linguistic LF in the brain (this notion going beyond the semantically orientated claims of §3.8.5 ) (domain-specific); or indeed upon
  • Some hybrid (musilinguistic) LF-system (shared).

Thus, while music-language homologies were discussed in §3.8.5 in terms of semantics, it is possible that syntax might also be implicated, given the close alignment of the latter with the former in LF. While further research is needed – this being to some extent contingent upon ever finer resolution in neuroimaging technologies – there is some neurobiological evidence for a LF-underpinned syntax of music, in that Brodmann areas 44 (pars opercularis) and 45 (pars triangularis) – Broca’s area in the left hemisphere – appear to implement a parallel “syntax/phonology interface area” subserving these functions in both domains (see the shaded cell in Table 2.1 ); and BA 22 – Wernicke’s area in the left hemisphere – appears to implement a parallel “phonology/syntax interface area” (Brown et al., 2006, p. 2798, Fig. 5). Moreover, Patel goes so far as to propose a “shared syntactic integration resource hypothesis” (SSIRH), which asserts that language and music “have distinct and domain-specific [parallel] syntactic representations (e.g., chords vs. words), but that they share neural resources for activating and integrating these representations during syntactic processing” (2008, p. 268, emphases mine; see also Fitch, 2010, p. 477).

The argument for an LF syntax of music runs as follows, and requires three coordinated “ifs”:

1. If music and language did share a common ancestor in the form of musilanguage; and
2. If sonically depleted but semantically rich language is a reflection of an underlying brain-language (the communicativist claim); and
3. If the latter attribute was present originally in musilanguage

then sonically rich but semantically depleted music could have retained some element of this communicativist attribute. In this way, both evolutionary descendants of musilanguage might have retained certain elements of an ancestral, now to some extent shared, LF mentalese.

The third “if” is perhaps the most problematic in that, in its archetypal form, musilanguage (as discussed in §2.7.6 ) was likely a syntactically undeveloped form of communication, lacking the compositionality of fully developed language. As Carruthers argues, “it is natural language syntax which is crucially necessary for inter-modular integration” (Carruthers, 2002, p. 658, emphasis in the original). If his model is taken to hinge upon the underpinning and constitution of language by some form of mentalese-level, syntax-articulating LF, then perhaps the non-compositional musilanguage does not in fact implement it, and the argument for any evolutionarily persisting communicativism in music therefore falls. But if some form of communicativism does not require a fully developed syntax – if, in other words, it allows various shades of syntax, including the “protosyntax” potentially underpinning later, more developed forms of musilanguage (and, indeed, Bickertonian lexical protolanguage) – then musilanguage, and with it its evolutionary descendant, music, might indeed be amenable to a communicativist interpretation.

The latter would appear to be the more likely scenario, because – recalling the gradualistic reframing of Chomsky’s “great leap forward” in §1.3 , and Merker’s account of the vocal learning constellation in §2.7.5 and §2.7.6 – musilanguage likely evolved into language and music over many millennia by means of gradualistic cumulative selection, and not by means of saltationist single-step selection. This accords with the general view in evolutionary theory that even a little bit of a good thing is preferable to none of it (Dawkins, 2006, pp. 125–126). One piece of evidence in favour of such “shades of syntax” might be derived from the earlier discussion on segmentation (§2.7.6 ). Once the processes engendering segmentation had started to have their effect on musilanguage, the medium would be in a transitional phase – one presumably lasting many hundreds of thousands of years – where attributes of both older musilanguage and newer compositional language were simultaneously present, musilanguage acting as a framework or scaffold for the newer form of communication before finally being supplanted by it (the “safety net” phenomenon discussed in point 19 of the list on page 200). The argument advanced here is that this “post-musilanguage” possessed just enough syntax – as a proto-LF – to give rise both to compositional language, communicatively understood, and to music evolving on the basis of an underlying communicativist dualism between some form of perhaps partially shared LF mentalese and imagined (musemic) sound.

If, on the basis of the above, the third “if” is held to be true, then both language and music would appear to draw upon some form of (partially shared) LF representation. In language, this can be represented in terms of Chomsky’s generative-transformational grammar. In the literature of music theory, there are, as mentioned towards the beginning of this subsection, various music-theoretical representations of the syntactic basis of music, with one in particular, Lerdahl and Jackendoff’s GTTM (1983), being the most explicitly (Chomskyan-)linguistic, although it is one that is in its very formulation chronologically and stylistically circumscribed. Despite the common parasitism of music theory upon models derived from language, and as the final part of §2.7.7 suggests, the likely evolutionary precedence of music over language suggests that linguistic syntax is derived from musical syntax, not vice versa.

Evidence for the SSIRH – as a corollary of a shared music-language LF representation – may be found in studies, some involving neuroimaging, of music-language co-processing, where violations of musical or linguistic syntax (and linguistic semantics) are observed to affect processing speed and/or acuity in the other domain (T. Collins et al., 2014, p. 51). The mechanism for this activation/integration in music might thus involve the same kind of (faux-fax) connections between right-hemisphere music centres and left-hemisphere semantic-syntactic LF centres discussed in §2.7.7 . This reinforces the view articulated in connection with the quotation from Harvey (2017) on page 213 that, even after the bifurcation of musilanguage, both domains continued to retain significant structural and functional homologies, because it was evolutionarily inefficient for them wholly to implement a separation in their input, syntactic-semantic representation, or output systems.

3.8.7 Escaping Determinism via Evolution

As a final issue, and by way of drawing together some observations in the preceding subsections, there are clear alignments between Carruthers’ (2002) model of the mechanism of thought and consciousness, its possible neural implementation via the HCT, and the “symbolic-representational system” (1991, p. 489, Fig. 6) underpinning Sereno’s cell/person (1991, p. 478) discussed in §1.6.2 . Proposing a common mechanism for protein synthesis and language reception/production, Sereno argues that

a unique single-celled symbolic-representational system first arose from a prebiotic chemical substrate at the origin of life, permitting Darwinian evolution to occur. Subsequently, multicellular organisms evolved and they developed more and more elaborate humoral and neural control mechanisms. But … a similar, autonomous symbolic-representational system did not reemerge on any intermediate level until the origin of thought and language from the substrate of prelinguistic neural activity patterns in the brains of Pleistocene hominids. (Sereno, 1991, p. 484)

The motivation for this “reemergence” – which does not result from homology (evolutionary descent) but from homoplasy (convergent evolution) alighting upon another implementation of the same robust solution at a different structural-hierarchic level – is that

the apparatus involved in cellular protein synthesis, and the neural patterns underlying human language comprehension are both mechanisms for escaping ‘determinism’.… The pre-existing (prebiotic, prelinguistic) states can be described as complex, highly interactive, but deterministically evolving, ‘soups’ containing a number of different types of dynamically stable units (prebiotic molecules, prelinguistic neural activity patterns). The problem is simply to encode, use, and reproduce information about how to make certain ‘reactions’ (chemical reactions, alteration and recombination of neural activity patterns) in this soup happen. … In this sense, the resulting system is ‘intentional’. (Sereno, 1991, p. 484)

The main elements of this “apparatus” in cell metabolism and language processing are briefly summarised as follows, with key functions/structures italicised: (i) a collection of symbols (DNA triplets; word-sounds) exists in a chain (DNA sequence; word-sound sequence (lexeme phemotype)); (ii) this chain is converted to a symbol representation (transfer RNA (tRNA) sequence; secondary auditory cortex (Wernicke’s area) activity pattern); (iii) a chain assembler (ribosome; secondary auditory cortex activity pattern) builds a parallel “thing” representation (amino acid; secondary visual cortex activity pattern (objects and phenomena in the world represented in visual memory and presumably in other memory modalities)); (iv) the “thing” representation is linked to the symbol representation by a 3-D connector (aminoacyl-tRNA synthetase; secondary visual cortex activity pattern); (v) a reaction controller is built from the “thing” representation (enzyme; STM/working memory pattern (internalised lexeme sounds)) in order to act on internal objects (various enzyme substrates; mental activity patterns in various domains and modalities); and (vi) just as a non-arbitrary relationship connects symbols with symbol representations, a similarly non-arbitrary relationship connects “thing” representations to external “things” (prebiotic chemical compounds; prelinguistic activity patterns in the primate brain) (after Sereno, 1991, p. 489, Fig. 6, p. 491, Tab. 1).

In this outline, the “thing” representation in language is implemented by LF (the domain-general integration of domain-specific representations) and the symbol representation is the lexeme-sequence (the tokening of LF by internalised word-sounds). All the various mental representations, as might be expected, are able to be encoded according to the precepts of the HCT, with the necessary faux-fax linkages providing longer-range connections between brain regions (such as the communication between auditory and visual cortex and inter-hemispheric connections). While Sereno (1991) does not consider the finer details of neural organisation, such connections between right-hemisphere lexeme-sound representations and left-hemisphere syntax and semantic centres implementing LF appear to be key here, and subserve the chain assembler and 3-D connector functions (and presumably interconnect with the visual cortex centres identified by Sereno). As argued in §2.7.7 , much of this neural infrastructure initially arose in response to the evolution of musicality. Sereno’s model is thus also congruent with music as a phonological-syntactic-semantic system, and references to “lexemes” in the summary above can potentially be replaced by “musemes” as the original mechanism for escaping determinism in cultural evolution.

Sereno’s is primarily a model of language comprehension, represented in biology by the “understanding” (translation) of the DNA code (symbols) in order to produce protein “meanings” (“thing” representations). The reverse process, language production, would contradict the Central Dogma (§1.8). If it obtained, a mechanism for Lamarckism would exist, because proteins could back-alter DNA, giving rise to “a more thoroughgoing, minute-to-minute Lamarckianism than has ever been conceived for biological organisms” (Sereno, 1991, p. 487). Production is, of course, fundamental in language and music, because communication is a two-way process requiring that “things” (prelinguistic, domain-specific meanings) can be used to generate symbols for them that are comprehensible to others. Thus, whereas comprehension is the exclusive mode in cells, both comprehension and production operate in language. Nevertheless, Sereno, following Sapir, emphasises what he regards as the primary motivation for language, namely comprehension via symbolisation (Sereno, 1991, p. 486). This motivates the question of how the symbolic-representational system of language evolved. The account presented here of the evolution of compositional language from musilanguage implies – contra Sereno – that (domain-specific) meanings were represented in hominin brains before a (musi)linguistic system evolved for integrating and symbolising them, initially via protemes and subsequently via musemes and lexemes.

3.8.8 Summary of Music-Language (Co)evolution

To summarise the main conclusions of §2.7 and §3.8 , the following has been argued:

1. Music and language are two sides of the same evolutionary coin. The appearance in the hominin line of the capacity to produce and control vocalisations was the result of numerous interacting aptive factors. Eventually, a holistic musilanguage arose that subserved a number of functions, initially aptive only for genes but increasingly also aptive for memes.
2. Once the neural substrates for the segmentation of musilanguage were in place, it was inevitable that the chunks of sonorous information resulting from this process would be subject to the operation of the VRS algorithm. Computer simulation of the mechanism, and of the ever tighter association of meanings with sound-segments, offers telling evidence of its likely validity.
3. The replicated sound patterns of language are arguably proxies of a more fundamental mental language, LF. Structures in this medium foster the integration of concepts in different domains to form multi-modal syntactic-semantic complexes that, in conjunction with sonic replicators, are not only amenable to consciousness but that also confer significant evolutionary advantages upon individuals who possess this facility.
4. Lexemes and musemes appear to be encoded in the brain in broadly similar ways – by means of hexagonal encoding, cloning and Darwinian competition – and they are predominantly right-hemisphere localised. This constitutes further evidence for their common evolutionary origin in musilanguage. The syntactic structures encoding LF appear to be predominantly left-hemisphere localised. Faux-fax links connect the two types of representation, allowing the cross-hemispheric tokening of LF by musemes and lexemes.
5. The mechanisms by which language acquires semantic content appear broadly replicated (albeit more loosely) in music, and might be understood in terms of a multi-level semiotic process spanning different replicator domains (memome, phemotypic). Moreover, it may be the case that elements, or analogues, of LF structures also subserve music’s syntactic organisation.
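The appeal in point 2 to computer simulation of the VRS algorithm can be made concrete with a deliberately toy sketch. It is not a reconstruction of any simulation cited in this book: the pitch alphabet, the copying-error rate and, above all, the fixed `target` pattern used to score candidates are illustrative assumptions, the last standing in crudely for whatever perceptual-cognitive pressures actually select sound-segments.

```python
import random

PITCHES = "abcdefg"  # hypothetical pitch alphabet for the toy patterns

def vary(pattern, rng, rate=0.1):
    """Copy a pattern, occasionally substituting a random pitch (variation)."""
    return "".join(rng.choice(PITCHES) if rng.random() < rate else p
                   for p in pattern)

def fitness(pattern, target):
    """Count positions matching the target; a crude stand-in for selection pressure."""
    return sum(a == b for a, b in zip(pattern, target))

def vrs_generation(population, target, rng):
    """Run one round of variation, replication and selection."""
    # Replication with variation: every pattern is copied twice, imperfectly.
    copies = [vary(p, rng) for p in population for _ in range(2)]
    # Selection: keep only the better-matching half, so population size is constant.
    copies.sort(key=lambda p: fitness(p, target), reverse=True)
    return copies[:len(population)]

# Usage: twenty random eight-note patterns evolved over fifty generations.
rng = random.Random(0)
population = ["".join(rng.choice(PITCHES) for _ in range(8)) for _ in range(20)]
for _ in range(50):
    population = vrs_generation(population, "cdefgabc", rng)
```

The fixed target makes the sketch teleological in a way that cultural evolution is not; it serves only to show how imperfect copying and differential survival can together shift a population of patterns.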

3.9 Summary of Chapter 3

Chapter 3 has argued that:

1. To consider musicality and its products in purely biological terms is inadequate. A dual-replicator coevolutionary model is needed that takes account of both gene-based biological/musicality evolution and museme-based cultural/music evolution as instantiations of the VRS algorithm.
2. A number of theories of cultural change were developed in the twentieth century, in an attempt to find cultural equivalents to the gene and the structures it engendered. Of these, the memetics conceived by Dawkins and championed by Dennett arguably shows the greatest potential.
3. Issues relevant to the ongoing development of memetics include the nature and status of qualitative versus quantitative evidence; how the biological concepts of adaptation and exaptation might be applied to (music-)cultural evolution; and the extent to which the status of memetics as a Darwinian model is undermined by potentially Lamarckian factors.
4. A significant contribution memetics can make to our understanding of music is in its formalisation of pattern-replication at multiple structural-hierarchic levels. This is relevant to many dimensions of music, including the generation of recurrent higher-level structures, and to the creation of music in improvisation and to its recreation in performance.
5. While slavishly applying biological-taxonomic principles to music is unwarranted, the recursive ontology driven by the VRS algorithm leads to certain systemic-structural parallels between the processes and products of biological and cultural evolution that can illuminate a sensitive cultural taxonomy.
6. Coevolutionary accounts of human musicality and music attempt to reconcile the sometimes conflicting interests of each replicator system and to understand how their genomic/memomic levels interact with their phenotypic/phemotypic levels to produce the musical competences and products that depend on both replicators. More than any other species, humans are defined by the rich cultures encephalisation made possible, this brain-augmentation having itself perhaps been impelled by culture, via the mechanism of memetic drive. As a result, music’s development has far transcended what might have been predicted on purely biological-morphological grounds.
7. Memetics fosters a deeper understanding of the structural-evolutionary relationships between music and language, arguing that both are made up of discrete, replicated sound-parcels that are amenable (language more so than music) to association with objects and meanings. A hypothesis for musical and linguistic syntax and semantics is afforded by the Logical Form of Chomsky and Carruthers, this perhaps being implemented by the Hexagonal Cloning Theory of Calvin.

Chapter 4 will build upon the extension of gene-based Darwinism to the meme-based Darwinism outlined in this chapter in order to explore how evolutionary metaphors have been employed in scholarship on music to explain the style and structure of music over time. Taking the implications of this chapter to their logical conclusion, Chapter 4 considers how discourses on music (evolutionary and indeed non-evolutionary) are themselves amenable – as music-historical and music-theoretical/analytical verbal-conceptual memeplexes – to the VRS algorithm. It will: explore the issue of metanarratives and metaphor in musical scholarship; examine evolutionary metaphors in music historiography and music theory and analysis; consider – as part of the ongoing discussion of music-language coevolution – how linguistic tropes have been used in music-scholarly discourses; discuss how the evolution of music-scholarly discourses can be theorised and quantified; and explore the complex coevolution of music, the socio-cultural structures that sustain it, and the discourses that seek to comprehend it.

89 In addition to the (Darwinian) operant conditioning theorised by Skinner (1953), there exists (non-Darwinian) classical (Pavlovian) conditioning, where a neutral stimulus elicits expectation of a reward, the former having previously been associated with the latter.

90 Only this category of creature would appear fully able to deploy the intentional stance described in the quotation on page 100.

91 Not to be confused with epigenetics (§1.8, §3.4.3 and §4.4.1.1), epigenesis is the “[o]rigin during ontogeny of structures from undifferentiated material” (E. Mayr, 1982, p. 958).

92 Plotkin (1995) conflates levels (i) and (ii) into the “primary” – “genetic-developmental” – heuristic (1995, p. 138). Level (iii) is termed the “secondary heuristic” (1995, p. 149), and level (iv) the “tertiary heuristic” (1995, p. 206).

93 On its first mention, Laurent (1999, p. 1) mistakenly gives the name of Maeterlinck’s text as The soul of the white ant, which is in fact a work of 1925 by the ethologist Eugène Marais (Marais, 2017), from which Maeterlinck (1927) plagiarised his text.

94 Appendix I of Campbell (1974, pp. 457–458) lists sources on “trial-error and natural-selection models for creative thought”; see also Appendix II (1974, pp. 458–459), which lists sources on “natural selection as a model for the evolution of science”.

95 In some disciplines, such as anthropology, memetics is often cited in the context of criticism (Kuper, 2000), in part because memetics counters the holistic and static view of culture offered by anthropology with its own particulate and dynamic alternative.

96 “Scopus is the largest abstract and citation database of peer-reviewed research literature including … [o]ver 24,000 titles, including 4,200 Open Access journals from more than 5,000 international publishers” (Scopus, 2020).

97 Kuhn generally terms such “turning points” “paradigm changes” but “paradigm shifts” has become more common (2012, pp. xxiii, 52), perhaps because it is a superior meme.

98 CiteSpace can minimise node- and cluster-label overlaps in visualisations, but this function is not used in Figure 3.2 (or in Figure 4.10 (§4.6), which explores publications containing the terms “music” and “gender”), in order to associate as closely as possible the centres of clusters with their generative node(s). Sources obscured by overlapping labels are clarified in the text.

99 The report detailing the outcome of CiteSpace’s extraction of data from the .ris bibliographic citation file exported from Scopus states that “159 records [were] converted …. Total References [i.e., citations of literature within sources]: 6,880[;] Valid References: 6,859 (99.0%)”. It should be noted that, as is often the case with Scopus records, there is a certain amount of duplication in the data (i.e., the same article is listed as two ostensibly separate records), and so a further stage of processing was undertaken, which reduced the sample size to 119 unique records.

100 There is a risk in enumerating the analytical outcomes of programs such as CiteSpace that one ends up in the position of Borges’ map-makers in his short story On exactitude in science (1946), who decided that only a map of scale 1:1 would be adequate; thus, “the Cartographers[’] Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it” (1998, p. 325).

101 The standard layout of CiteSpace visualisations prioritises the conceptual-spatial over the chronological, in that clusters further away from the centre are not necessarily later in their formation (one can extract an average year for each cluster, which “indicates whether it is formed by generally recent papers or old papers” (Chen, 2020, sec. 4.2)). The program’s “timeline view” inverts this prioritisation.

102 Note that these are essentially the same museme – strictly, all instantiations of either form of the pattern belong in the same museme allele-class (§3.5.2) – and that they differ primarily in respect of their structural location (Jan, 2010, pp. 11–13).

103 A variant of this situation – the biological transmission of memory – features in an episode of the Paramount Television series Star Trek: Voyager (“Flashback”, Season 3, Episode 2, originally broadcast 11 September, 1996). The Vulcan Tuvok suffers from distressing memories caused not by observation or learning (memetic transmission) but by a virus (parasitic transmission) that created a person-specific (false) memory so horrible its bearer represses it, allowing the virus to survive undisturbed.

104 The same arguments might also be made in regard to claims of alien abduction: they are memeplexes acquired from others and from the wider culture, not repressed memories of traumatic past real-life events; and they are (sometimes) triggered by sleep paralysis, which heightens (by analogy with the family-unit repercussions of holocaust trauma) the susceptibility of individuals to the alien-abduction memeplex (Blackmore, 1999, pp. 176–178).

105 This melody is based on the Russian folk song “Slava bogu” (“Praise to God [in the highest]”) (Dearmer et al., 1928, 219, no. 107), used by Beethoven as a “Thème russe” in the third movement of his String Quartet in E minor op. 59 no. 2 (“Rasumovsky”) of 1806, and by Rimsky-Korsakov in his Overture on Three Russian Themes op. 28 of 1880.

106 Such parallel harmonies over the dominant, here 6/3 chords, are also found in the piano writing of Stravinsky’s Petrushka of 1911, for example b. 1 of the Russian Dance (Rehearsal no. 33). I am grateful to Nicholas Bannan (personal communication) for this point.

107 I am grateful to David Fanning (personal communication) for this point.

108 At its most basic, a Markov chain is one in which event n of a sequence determines the range of options for event n+1 (§6.5.1.3 ).
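By way of illustration only, the definition in note 108 can be sketched as a first-order Markov chain over pitch names. The melody below is invented for the purpose, and the function names `build_transitions` and `generate` are hypothetical; the sketch shows no more than how event n constrains the options for event n+1.

```python
import random
from collections import defaultdict

def build_transitions(sequence):
    """Map each event to the list of events observed to follow it."""
    transitions = defaultdict(list)
    for current, following in zip(sequence, sequence[1:]):
        transitions[current].append(following)
    return dict(transitions)

def generate(transitions, start, length, rng=None):
    """Build a new sequence in which each event is drawn only from the
    options that followed the previous event in the source sequence."""
    rng = rng or random.Random()
    output = [start]
    while len(output) < length:
        options = transitions.get(output[-1])
        if not options:  # no observed continuation, so stop early
            break
        output.append(rng.choice(options))
    return output

# Usage: an invented nine-note melody serves as the source sequence.
melody = ["C", "D", "E", "C", "E", "G", "E", "D", "C"]
table = build_transitions(melody)
new_melody = generate(table, "C", 8, random.Random(0))
```

Because repeated continuations are stored as repeated list entries, more frequent successions in the source are proportionally more likely to be chosen in the output.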

109 Cope captures this idea with his notion of seemingly different “signatures” – formulaic, often cadential, patterns – that may be regarded as allelically equivalent because, when their embellishments are stripped away, their common structural core is revealed (2001, p. 48).

110 These movements are Haydn: Sonata in F major Hob. XVI: 23 (1773), I; Mozart: Sonata in C major K. 279 (189d) (1775), I; and Beethoven: Piano Sonata no. 3 in C major op. 2 no. 3 (1795), I.

111 The popularity of such therapies as the Alexander Technique (Woodman & Moore, 2012) among musicians testifies to the consequences of systematic deviation from natural body positions intrinsic to the mastery of certain instruments.

112 While the term “rubato” is often applied in a narrow sense to the performance of certain nineteenth-century piano repertoires, I am using it here more broadly, to refer to any deviation from “metronomic” tempo.

113 The empirical testing of the I-R model in relation to performance is not advocated in the “twenty experimental questions suggested by the Implication-Realization Model” (there are actually twenty-one questions listed) that form the conclusion of Narmour (1990, pp. 418–423).

114 These factors include, but are not limited to, the intrinsic constraints of musical instruments, such as the need, on many “non-pretuned” instruments, to hesitate/elongate whilst a pitch is consolidated (Nicholas Bannan, personal communication), an effect that might potentially transfer to other (“pretuned”) instruments via a player’s familiarity with both types of instrument, or even via hearing this effect.

115 To restrict this consideration to [P] and [R] is clearly to oversimplify Narmour’s (1990) complex theory, but it nevertheless gives a flavour of how it might be applied to this issue. See also Jan (2007, pp. 129–133, Tab. 4.1).

116 Narmour argues that, in terms of I-R theory, inexperienced listeners hear the octave as a large interval, implying prospective [R]; whereas experienced listeners hear it as a register transfer (i.e., as the “same” note), with the option of perceiving it as a retrospective [(R)] (1990, p. 234).

117 Despite the ostensible stability of the tonic, Rosen gives an example (bb. 23–28 of the first movement of Beethoven’s Piano Concerto no. 4 in G major op. 58 (1807)) where rhythmically accelerating tonic-dominant alternations mean that Beethoven “turns this most consonant of chords … into a dissonance. … almost by rhythmic means alone …, the tonic chord of G major in root position clearly requires a resolution into the dominant” (1997, pp. 387–388, emphasis in the original).

118 The status of chord V is problematic in that, despite being a major triad situated in close (psycho)acoustic proximity to the tonic, it is often (contextually) relatively unstable in many styles.

119 Points 2e (decelerate around strong/accented beats) and 2f (accelerate around weak/unaccented beats) may reinforce points 2c (decelerate around/into relatively stable degrees) and 2d (accelerate around/into relatively unstable degrees) in this list, respectively, because there appears to be a correlation between the use of triads I, IV and V on strong beats and triads ii|ii°, iii|III, vi|VI and vii°|VII on weak beats (C. W. White, 2017).

120 Perhaps more than most other composers, the works of Chopin exist in numerous versions (many sanctioned by the composer) and associated editions, so Friedman and Rosen may have been playing from different editions (N. Cook, 1998, pp. 84–85).

121 This analysis focuses on the melodic line, while acknowledging that the arpeggiated left-hand accompaniment may have a potential (dragging) effect on the tempo in places.

122 Assigning a hypothesis to the equals sign (i.e., no significant tempo change at that point) is often problematic. In some cases, it represents a moment of stasis before a continuation of the tendency (acceleration or deceleration) represented by the immediately preceding symbol. In other cases, it is an apex point, before a subsequent movement in the opposite direction to that represented by the immediately preceding symbol (acceleration following deceleration or vice versa).

123 In keeping with the principle of parsimony just outlined, not every possible hypothesis (and its opposing hypotheses) is enumerated as an explanation for each observation. The reader will hopefully be able to identify the nurtural opponent(s) to a given natural force, and vice versa.

124 While many factors may break a sound-stream into discrete musemes – thus turning two adjacent pitches into initial and terminal museme nodes, respectively – often this juncture, and the resulting museme-parataxis, is articulated by I-R forces. See Jan (2010, pp. 19–22).

125 The word “rubato” appears in b. 17 of the first edition and in subsequent editions, which might imply a suggestion to return to the baseline tempo towards the end of, rather than at the beginning of, b. 17, but which Friedman and Rosen, with their rapid return to the previous tempo-range at the start of b. 17, do not take up.

126 A dendritic diagram, the only illustration in the Origin, is given in Darwin (2008, p. 90).

127 In some traditional societies, much of culture is transmitted vertically, from parent to young adult, and this is certainly true for early-years enculturation in most societies; but it is not the norm in technologically advanced societies, where children generally assimilate culture-fragments from peers from a relatively early age.

128 In the quotation on page 329, Darwin equates a language with a “race”, i.e., a group within a species.

129 I am using this term here in its broadest, least historically and aesthetically/philosophically loaded sense (Goehr, 1992).

130 The first elements of these pairs misalign the “Genetic-Structural” level five and the “Memetic-Cultural” level four of Table 1.4 ; and the second elements are aligned at the Genetic/Memetic-Structural level seven. See also §4.3.1 for a related issue.

131 It must be stressed that the criteria advanced in Table 3.3 are not hard-and-fast, and there are therefore many potential uncertainties. Moreover, examples are not given for the three categories of the Embryonic Development criterion, partly owing to space-constraints on presenting such evidence, and partly owing to the more fundamental issue – a challenge to this criterion in the case of its application to culture – that absence of evidence does not constitute evidence of absence.

132 Recall that a real musemeplex arises from the re-assembly of (more or less) the same museme-sequence; and that a virtual musemeplex arises from the re-assembly of the same museme-allele-sequence.

133 On account of this RHSGAP, this category overlaps, at a higher structural-hierarchic level, with “structural similarity/derived homology”.

134 The Romanesca schema consists of the melodic/bass scale-degree sequence 1|3/1, 5/7, 1/6, 1/3 (Gjerdingen, 2007a, pp. 39–40, 454).

135 To avoid clutter, certain musemes in Figure 3.11e and 3.11f are not given the analytical overlay-symbology used in other music examples (§ ); instead, they are shown boxed.

136 Whereas the “Structural Similarity” row of Table 3.3 has criteria related to contextual/poietic connections, the “Relations to Surrounding Characters” row does not. Such connections should, however, not be disregarded when considering the latter criterion.

137 Figure 3.9a and Figure 3.9b arguably represent another instance of a derived homology, although it is debatable as to whether this category is tenable in the case of relationships between two passages by the same composer.

138 A set of weighted points is a group of discrete entities occupying multidimensional space, such as the notes of a museme, each assigned a relative weighting.

139 At the risk of terminological explosion, it is potentially useful to identify – by analogy with ontogeny – the concept of ontomemy, which might be defined as the accumulation and development of an individual’s meme complement/profile via education and enculturation over the course of their lifetime.

140 It might be argued that phylomemies differ from phylogenies in their potential for “cross-fertilisation”, whereby two lineages may share material, or even rejoin, after bifurcation. But this is also true, to a lesser extent, in nature, where gene-transfer between recently bifurcated lineages remains possible for a limited time.

141 The texts of this ballad, and of many others originating in the British Isles, were collected by Child (1904); the associated melodies were collated by Bronson (1959).

142 This method of encoding might be further developed by incorporating rhythmic values, whereby, for instance, “bbb” =  and “b” = .

143 Note that these are “rooted” phylomemies: there is assumed to be an unidentified common ancestor to the left of the tree (Ridley, 2004, p. 439).

144 This may often be the case with oral transmission, where the principle of lectio difficilior potior – “the more difficult reading is the stronger” (Robinson, 2001) – might support one in ascribing chronological anteriority to a more complex form.

145 A UK television game show, charmingly named Shafted (2001), hosted by Robert Kilroy-Silk, was based on the Prisoner’s Dilemma game (IMDb, 2019). Its brief life-span – it was cancelled after only four episodes – was perhaps a result, among other deficits, of the nastiness of the defections and the evident distress of those who, in seeking to cooperate, were “shafted” and thus denied a monetary prize. Even in this manifestation, nature is revealed, in Tennyson’s phrase from In Memoriam A. H. H. (1849), as “red in tooth and claw” (Tennyson, 2007, 135 (Canto 56)).

146 Another issue inherent in dual-replicator coevolution is the seemingly greater speed of memetic versus genetic evolution. Blute argues that evolution rate (a function of fitness-enhancing variation) should not be confused with generation time; and that, for memes, the latter is significantly shorter in horizontal transmission than in vertical or oblique transmission (page 330) (2006, p. 160). For the latter two modes of transmission, “genetic and cultural generation times are necessarily equal, and all else being equal, rates of genetic and cultural evolution are necessarily identical” (2006, p. 160).

147 In addition to encephalisation, it is not unreasonable to hypothesise that memetic drive also fostered increasing brain plasticity (note 81 on page 224).

148 Blackmore notes that this stage is “a version of the Baldwin effect [§1.8] …, which applies to any kind of learning – once some individuals become able to learn something, those who cannot are disadvantaged and genes for the ability to learn, therefore, spread” (2001, pp. 243–244; see also Podlipniak, 2017a).

149 Computer simulation broadly supports this hypothesis. While their model of song-evolution can be criticised for arguably not fully implementing clearly separate biological and cultural replicators, and (as they acknowledge) for not incorporating culturally acquired song-preferences, Werner and Todd (1997, p. 441) determined that “[w]ithout sexual selection, … simulation models have evolved little diversity in communication signals [i.e., songs; the ornament]. When instead we replace natural selection with sexual selection, signal diversity within and across generations blossoms. Our simulations here lend strong support for the role of co-evolving songs and directional (surprise-based) preferences in maintaining diversity over time …”.

150 In Blackmore (1999), this third stage of “selection for mating with the imitators” (1999, p. 78) is followed by a fourth stage of “sexual selection for imitation” (1999, p. 79), these two stages being conflated in Blackmore (2000a, pp. 32–33). The two phenomena are broadly equivalent, however, in that “selection for mating with the imitators” becomes “sexual selection for imitation” when one sex becomes established as the imitators (the bearer of the ornament) and the other sex becomes established as desirous of mating with them (the bearer of the preference).

151 Note that this is not a zero-sum game: increased encephalisation can benefit both genes and memes, although the benefits to the former need to balance the advantages – greater cognitive flexibility, including “Gregorian” (Table 3.1) situational modelling – with the disadvantages – increased danger during birth, higher nutritional demands – of greater brain size. Memetic drive hypothesises that there is a differential benefit to encephalisation, in favour of memes.

152 That humans and chimpanzees have followed c. six million years of separate evolution (Schaefer et al. (2021, pp. 7, 13) suggest a figure as high as c. 13 million years) might be regarded as making it impossible to triangulate the attributes of these three points – LCA, chimpanzee, human – in evolutionary time and space. Yet the clearly superior imitative abilities of humans in comparison with chimpanzees suggest that Blackmore’s point is valid.

153 I adopt here Carruthers’ convention of using small capitals for concepts in mentalese and using italics for internalised and vocalised language utterances.

154 The integration of domain-specific representations by domain-general LF is essentially the process of representational redescription discussed in the quotation on page 11.

155 Structures located at the background, middleground and foreground layers are somatic; those elsewhere are extrasomatic. This hierarchic representation (after Schenker, 1979) is for expository clarity and is not intended to represent the topography of these functions in the brain, insofar as this is known (§2.7.7).

156 This relates directly to Levitin’s assertion, in the quotation on page 214, that “[d]uring 20 million years of evolution, it is not too difficult to imagine a new function evolving … where [a number of] regions [controlling music and language] meet, gradually enabling the brain to report what it is holding in consciousness – to start talking or singing about what it is thinking about” (2009, p. 292).

157 For the sake of expository clarity, the discussion suggests an element of unidirectionality; but in reality (and as implied by the double-headed arrows) it seems more likely that continuous bi-directional feedback loops connect structures at all three levels.

158 Blackmore also argues that consciousness presupposes a theory of mind and the associated capacity to ask “[a]m I conscious now?” (2005a, loc. 582; 2009, p. 41).

159 It should be stressed, however, that columns are not uniformly constructed across all brain regions, and that their architecture and connectivity differ substantially from region to region (Alan Harvey, personal communication; see also Tischbirek et al. (2019)).

160 The symbols “F1”–“F4” are relevant to a discussion in §6.5.1.2.

161 These connections are encompassed by the issue of perceptual binding, which concerns the integration of information in different sensory modalities and brain regions into coherent representations (L. C. Robertson, 2005).

162 After Jan (2007, p. 104, Tab. 3.1) and Jan (2016b, p. 489, Fig. 2); the associated discussion is an extension of this earlier material.

163 For clarity, Figure 3.16 ignores the motor-control memes (a subset of which are the gestemes considered in §3.5.4) that govern the muscular actions engendering writing, speaking, and the production of musical sounds, many of which are learned as “implicit memory” (Snyder, 2000, pp. 72–74) and which might also be regarded as memes.

164 The term “interpretants” is Charles Sanders Peirce’s (Nattiez, 1990, pp. 5–6). In Gottlob Frege’s terminology, it aligns with the “sense” that qualifies and mediates the relationship between a term (a signifier/museme/lexeme) and its reference (a signified/object/concept) (Cross & Woodruff, 2009, p. 25).

165 In language, l, Gl, and Il give rise to an essentially unary product: the concept is effectively inseparable from its l, Gl, or Il manifestations, as symbolised by the curved brackets in column 4 of Figure 3.16, (i) a/b. In music, however, a separation is maintained, because Gm and m give rise to separate products: the notation (Gm) and, separately, the sounds that the notation motivates and regulates (m). Thus, unlike language, these two musical replicators preserve the level-two signifier-signified dualism at the phemotypic level.

166 Strictly, such an (initially, perhaps eternally) unreplicated complex should be termed a mnemonplex.

167 This is a general phenomenon in evolution, primarily observable in the inability of two species with a common ancestor to interbreed after a certain period of separate development has elapsed.
