The surface structures of music and speech should coincide in musical prosody. Two processing systems must therefore be integrated: one devoted to the surface structure of speech, the other to that of music. This article is in two parts: a review of data on speech and music production and perception, and two experimental studies on the synchronization between a rhythm and spoken sounds. The first part compares some of the intensity and timing parameters that characterize the unfolding of spoken strings and of musical sequences. Data from studies of performers (speakers, musicians) and listeners are compared with regard to spontaneous rates, the location and duration of pauses, the duration of sounds, and the periodic occurrence of accents. The second part examines the ability to control the correspondence between taps and words. Two experimental studies of 6-year-old children focus on the role of musical training. Reproductions of simple rhythms and of simple sentences or onomatopoeias were analyzed, as was the coordination between a rhythmic sequence of taps and a spoken string. Young musicians synchronized their verbal production with their motor accompaniment more successfully than nonmusicians of the same age, mainly because they anticipated more markedly the musical string into which they subsequently integrated the spoken sounds. The results are discussed in relation to the acoustic, motor, and cognitive processes involved in coordinating the two temporal strings.