This paper presents an object lesson in the challenges and considerations involved in assembling a musical corpus for empirical research. It develops a model for the construction of a representative corpus of classical music of the “common practice period” (1700-1900), using both specific composers and broader historical styles and musical genres (e.g., symphony, chamber music, songs, operas) as its sampling parameters. Five sources were used in the construction of the model: (a) The Oxford History of Western Music by Richard Taruskin (2005), (b) amalgamated Orchestral Repertoire Reports for the years 2000-2007, from the League of American Orchestras, (c) a list of titles from the Naxos.com “Music in the Movies” web-based library, (d) Barlow and Morgenstern’s Dictionary of Musical Themes (1948), and (e) counts of the recordings available from Amazon.com for each composer listed in sources (a)-(d). General considerations for these sources are discussed, and specific aspects of each source are then detailed. Intersource agreement is assessed, showing strong consensus among all sources, save for the Taruskin History. Using the Amazon.com data to determine weighting factors for each parameter, a preliminary sampling model is proposed. Including adequate genre representation leads to a corpus of ≈300 pieces, suggestive of the minimum size for an adequately representative corpus of classical music. The approaches detailed here may be applied to more specialized contexts, such as the music of a particular geographic region, historical era, or genre.
- Received October 21, 2012.
- Accepted March 13, 2013.
- © 2013 by The Regents of the University of California