A number of attempts have been made in the past 10 to 15 years to construct artificial systems that can simulate human expressive performance, but few systematic studies of the relationship between model output and comparable human performances have been undertaken. In this study, we assessed listeners' responses to real and artificially generated performances. Subjects were asked to identify and evaluate performances of two differently notated editions of two pieces, played by a panel of experienced pianists and by an artificial performer. The results suggest that expressive timing and dynamics do not relate to one another in the simple manner that is implemented in the model (Todd, 1992) used here, that small objective differences in the expressive profiles of different performances can lead to distinctly different judgments by listeners, and that what appears to be the same expressive feature in performance can fulfill different functions. Although one purpose of such a study is to assess the model on which it is based, more important is its demonstration of the general value of comparing human data with a model. As is often the case, it is what the model does not explain that is most interesting.