This study compared four common methods for scoring a popular working memory span task, Daneman and Carpenter's (1980) reading span test. More continuous measures, such as the total number of words recalled or the proportion of words per set averaged across all sets, were more normally distributed, had higher reliability, and had higher correlations with criterion measures (reading comprehension and Verbal SAT) than did traditional span scores that quantified the highest set size completed or the number of words in correct sets. Furthermore, creation of arbitrary groups (e.g., high-span and low-span groups) led to poor reliability and greatly reduced predictive power. It is recommended that researchers score span tasks with continuous measures and avoid post hoc dichotomization of working memory span groups.
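To make the four measures concrete, the following is a minimal Python sketch of how each might be computed from one participant's data. It is an illustration under stated assumptions, not the scoring procedure used in the study: the `Trial` record, the function names, and the example data are all hypothetical, and the rule used for "highest set size completed" (more than half of the sets at a level recalled perfectly) is a simplifying assumption, since criteria for passing a level differ across labs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One set of sentences (hypothetical record layout)."""
    set_size: int  # number of sentences, and therefore target words, in the set
    recalled: int  # number of target words recalled correctly


def total_words(trials):
    """Continuous measure: total number of words recalled across all sets."""
    return sum(t.recalled for t in trials)


def proportion_per_set(trials):
    """Continuous measure: proportion of words recalled per set, averaged over sets."""
    return sum(t.recalled / t.set_size for t in trials) / len(trials)


def correct_set_words(trials):
    """Traditional measure: number of words in perfectly recalled sets only."""
    return sum(t.set_size for t in trials if t.recalled == t.set_size)


def highest_set_size(trials):
    """Traditional measure: largest set size the participant 'completed'.

    Assumption: a level counts as completed when more than half of its
    sets were recalled perfectly. Returns 0 if no level was completed.
    """
    perfect_by_size = defaultdict(list)
    for t in trials:
        perfect_by_size[t.set_size].append(t.recalled == t.set_size)
    completed = [
        size
        for size, hits in perfect_by_size.items()
        if sum(hits) > len(hits) / 2
    ]
    return max(completed, default=0)


# Made-up data for one participant: two sets at each of sizes 2, 3, and 4.
example = [
    Trial(2, 2), Trial(2, 2),
    Trial(3, 3), Trial(3, 2),
    Trial(4, 2), Trial(4, 4),
]

scores = {
    "total_words": total_words(example),
    "proportion_per_set": proportion_per_set(example),
    "correct_set_words": correct_set_words(example),
    "highest_set_size": highest_set_size(example),
}
```

In this made-up example, the two traditional measures give no credit for the partially recalled sets at sizes 3 and 4, whereas the two continuous measures do.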
Complex working memory span tasks, which require participants to meet both processing and storage demands, are widely used in various areas of psychology (Miyake, 2001). Despite their popularity, no standard scoring method exists; rather, researchers typically adopt a scoring method used in previous research. In some cases, researchers may try different scoring methods and select the one that seems best, but the criteria used can vary widely across research labs. So far, little systematic effort has been made to compare and evaluate different scoring methods. In this report, we present in-depth comparisons of four common methods for scoring working memory span tasks, focusing on what is arguably the most widely used working memory task of all, Daneman and Carpenter's (1980) reading span test.
In the reading span test, participants read aloud sets of two to five or six sentences and attempt to remember the last word of each sentence. Usually, the participants begin with the easiest trials (those with two sentences) and work up to the most difficult ones (those with five or six sentences), but variations in which the trials are randomized also exist (e.g., Engle, Cantor, & Carullo, 1992; Friedman & Miyake, 2004b). In many cases, the task is terminated once a participant "fails" a level (e.g., if a participant fails to recall a majority of the trials in a level; Daneman & Carpenter, 1980), but some researchers prefer to administer all the trials to all the participants (e.g., Shah & Miyake, 1996). The latter procedure permits a...