MODELING SEMANTIC AND
ORTHOGRAPHIC SIMILARITY
EFFECTS ON MEMORY FOR INDIVIDUAL WORDS
Mark Steyvers
Submitted to the faculty of the University Graduate School
in partial fulfillment of the requirements for the degree
Doctor of Philosophy
in the Department of Psychology
Indiana University
September 2000
© 2000
Mark Steyvers
ALL RIGHTS RESERVED
Abstract
Many memory models assume that the semantic and physical features of words can be represented by collections of features abstractly represented by vectors. Most of these memory models are process-oriented: they explicate the processes that operate on memory representations without explicating the origin of the representations themselves, and the different attributes of words are typically represented by random vectors that have no formal relationship to the words in our language. In Part I of this research, we develop Word Association Spaces (WAS) that capture aspects of the meaning of words. This vector representation is based on a statistical analysis of a large database of free association norms. In Part II, this representation, along with a representation of the physical aspects of words such as orthography, is combined with REM, a process model for memory. Three experiments are presented in which distractor similarity, the length of studied categories, and the directionality of association between study and test words were varied. With only a few parameters, the REM model can account qualitatively for the results. Developing a representation incorporating features of actual words makes it possible to derive predictions for individual test words. We show that the moderate correlations between observed and predicted hit and false alarm rates for individual words are larger than can be explained by models that represent words by arbitrary features. In Part III, an experiment is presented that tests a prediction of REM: words with uncommon features should be better recognized than words with common features, even if the words are equated for word frequency.
Acknowledgments
First and foremost, I would like to thank Rich Shiffrin, who has been a great advisor and mentor. His influence on this dissertation work has been substantial, and his insistence on aiming for only the best scientific research will stay with me forever. Also, Rob Goldstone has been an integral part of my graduate career with our many collaborations and stimulating conversations. I would also like to acknowledge my collaborators in the research presented in Part III of the dissertation, Ken Malmberg and Joseph Stephens, as well as Tom Busey, who provided both ideas and encouragement for any project of shared interest. I would also like to thank Eric-Jan Wagenmakers, Rob Nosofsky, and Dan Maki for their support and many helpful discussions. Last but not least, my friends Peter Grünwald, Mischa Bonn, and Dave Huber have always been supportive, and I can highly recommend going out with these guys.
Contact: Mark Steyvers at msteyver@psych.stanford.edu, Stanford University, Building 420, Jordan Hall, Stanford, CA 94305-2130, Tel: (650) 725-5487, Fax: (650) 725-5699
Part I: Creating Semantic Spaces for Words based on Free Association Norms
Methods to Construct Semantic Spaces
Word Frequency and the Similarity Structure in WAS
Predicting the Output Order of Free Association Norms
Semantic/Associative Similarity Relations
Capturing Between/Within Semantic Category Differences in WAS
Predicting Extralist Cued Recall
Part II: Predicting Memory Performance with Word Association Spaces
Semantic and Physical Similarity Effects in Memory
Word frequency effects in recognition memory
A memory model for semantic and orthographic similarity effects
Recognition and Similarity Judgments
Predicting Individual Word Differences
Appendix A
Words of Experiment 1
Appendix B
Words of Experiment 2
Appendix C
Words of Experiment 3
Part III: Feature Frequency Effects in Recognition Memory
Model B: orthographic features
Appendix A
Words of Experiment 1
Appendix B
Means and standard deviations of the word frequencies and feature frequencies A and B
Part I: Creating Semantic Spaces for Words based on Free Association Norms
It has been proposed that various aspects of words can be represented by separate collections of features that code for temporal, spatial, frequency, modality, orthographic, acoustic, and associative aspects of the words (Anisfeld & Knapp, 1968; Bower, 1967; Herriot, 1974; Underwood, 1969; Wickens, 1972). In part I of this research, we will focus on the associative/semantic aspects of words.
A common assumption is that the meaning of a word can be represented by a vector that places the word in a multidimensional semantic space (Bower, 1967; Landauer & Dumais, 1997; Lund & Burgess, 1996; Morton, 1970; Norman & Rumelhart, 1970; Osgood, Suci, & Tannenbaum, 1957; Underwood, 1969; Wickens, 1972). The main requirement of such spaces is that words that are similar in meaning should be represented by similar vectors. Representing words as vectors in a multidimensional space allows simple geometric operations such as the Euclidean distance or the inner product to compute the semantic similarity between arbitrary pairs or groups of words. This makes it possible to predict performance in psychological tasks in which the semantic distance between pairs or groups of words is assumed to play a role.
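To make the two geometric measures concrete, here is a minimal Python sketch (assuming NumPy) that computes both for pairs of word vectors. The three vectors are invented for illustration and do not come from any of the semantic spaces discussed here.

```python
import numpy as np

def inner_product(u, v):
    """Similarity as the inner (dot) product: larger means more similar."""
    return float(np.dot(u, v))

def euclidean_distance(u, v):
    """Euclidean distance between two vectors: smaller means more similar."""
    return float(np.linalg.norm(u - v))

# Hypothetical 3-dimensional word vectors, made up for illustration only.
cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.4])
car = np.array([-0.2, 0.9, -0.5])

print(inner_product(cat, dog), inner_product(cat, car))
print(euclidean_distance(cat, dog), euclidean_distance(cat, car))
```

Both measures rank "dog" as closer to "cat" than "car" is here, but they need not agree in general, because the inner product also grows with vector length.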
The main goal of part I of this research is to introduce a new method for creating psychological spaces that is based on an analysis of a large free association database collected by Nelson, McEvoy, and Schreiber (1998) containing norms for first associates for over 5000 words. This method places over 5000 words in a psychological space that we will call Word Association Space (WAS).
We believe such a construct will be very useful in modeling episodic memory phenomena, since it has been shown that the associative structure of words plays a central role in recall (e.g. Bousfield, 1953; Cramer, 1968; Deese, 1959a,b, 1965; Jenkins, Mink, & Russell, 1958), cued recall (e.g. Nelson, Schreiber, & McEvoy, 1992) and priming (e.g. Canas, 1990; see also Neely, 1991). For example, Deese (1959a,b) found that the inter-item associative strength of the words on a study list can predict the number of words recalled, the number of intrusions, and the frequency with which certain words intrude.
In this paper, we will first introduce four methods to create semantic spaces, based on the semantic differential, multidimensional scaling of similarity ratings, LSA, and HAL. Then, we will introduce WAS, the approach of placing words in a high dimensional space by analyzing free association norms. The similarities and differences between WAS and free association norms are discussed. Two demonstrations are given that WAS is useful in predicting memory performance. First, we will show that the intrusion rates in the free recall experiments of Deese (1959b) can be predicted on the basis of the similarity structure in the vector space. Second, we will show that WAS can predict to some degree the percentage of correctly recalled words in extralist cued recall tasks (Nelson & Schreiber, 1992; Nelson, Schreiber, & McEvoy, 1992; Nelson, McKinney, Gee, & Janczura, 1998; Nelson & Xu, 1995). We will contrast the predictions from WAS with predictions made by the LSA approach.
Semantic differential. This method was developed by Osgood, Suci, and Tannenbaum (1957). Words are rated on a set of bipolar rating scales. The bipolar rating scales are semantic scales defined by pairs of polar adjectives (e.g. “good-bad”, “altruistic-egotistic”, “hot-cold”). Each word that one wants to place in the semantic space is judged on these scales. If numbers are assigned from low to high for the left to right word of a bipolar pair, then the word “dictator”, for example, might be judged high on the “good-bad” scale, high on the “altruistic-egotistic” scale, and neutral on the “hot-cold” scale. For each word, the ratings averaged over a large number of subjects define the coordinates of the word in the semantic space. Because semantically similar words are likely to receive similar ratings, they are likely to be located in similar regions of the semantic space. The advantage of the semantic differential method is its simplicity and intuitive appeal. The problem inherent in this approach is the arbitrariness in choosing both the set and the number of semantic scales.
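The averaging step can be sketched as follows, assuming a hypothetical array of ratings on a 1-7 scale (1 = left adjective, 7 = right adjective of each pair). The words, scales, and ratings are illustrative only and are not data from Osgood et al.

```python
import numpy as np

# Hypothetical ratings[subject, word, scale] on a 1-7 scale.
scales = ["good-bad", "altruistic-egotistic", "hot-cold"]
words = ["dictator", "saint"]
ratings = np.array([
    [[7, 6, 4], [1, 1, 4]],   # subject 1
    [[6, 7, 4], [2, 1, 3]],   # subject 2
    [[7, 7, 5], [1, 2, 4]],   # subject 3
], dtype=float)

# Averaging over subjects (axis 0) gives one coordinate per scale for each word,
# i.e. each word becomes a point in a 3-dimensional semantic space.
coordinates = ratings.mean(axis=0)

for word, coord in zip(words, coordinates):
    print(word, dict(zip(scales, coord.round(2).tolist())))
```

Adding or dropping a scale changes the dimensionality of the space directly, which is the arbitrariness the text points out.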
MDS on similarity ratings. In this method, participants rate the semantic similarity of pairs of words. These similarity ratings can then be subjected to multidimensional scaling analyses to derive vector representations in which similar vectors represent words similar in meaning (Caramazza, Hersch, & Torgerson, 1976; Rips, Shoben, & Smith, 1973; Schwartz & Humphreys, 1973). While this method is straightforward and has led to interesting applications (e.g. Caramazza et al., 1976; Romney et al., 1993), it is clearly impractical for large numbers of words, as the number of ratings that must be collected grows quadratically with the number of stimuli.
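The quadratic growth, and one way to turn a matrix of dissimilarities into coordinates, can be sketched in Python (assuming NumPy). The sketch uses classical (Torgerson) metric MDS, which is only one of several MDS variants; the cited studies may have used others, and similarity ratings would first have to be converted into dissimilarities (that conversion is not shown). The function names are hypothetical.

```python
import numpy as np

def n_pairs(n):
    """Number of distinct word pairs that would each need a similarity rating."""
    return n * (n - 1) // 2

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dim coordinates whose Euclidean
    distances approximate the dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[idx], 0, None))
    return eigvecs[:, idx] * scale

print(n_pairs(100), n_pairs(5000))
```

For 5000 words, `n_pairs` gives 12,497,500 pairs, which illustrates why rating-based MDS does not scale to a vocabulary of that size.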
Latent Semantic Analysis (LSA). A method to derive high-dimensional semantic spaces that does not rely on judgments by participants is Latent Semantic Analysis, or LSA (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998). The assumption Landauer and Dumais (1997) make is that similar words occur in similar contexts. A context can be defined by any connected set of text from a corpus such as an encyclopedia, or samples of text from textbooks. For example, a textbook with a paragraph about “cats” might also mention “dogs”, “fur”, “pets” etc. This knowledge can be used to infer that “cats” and “dogs” are related in meaning. However, some words that are clearly related in meaning, such as “cats” and “felines”, might never occur together in the same context. There might nonetheless be indirect links between “cats” and “felines” through their context words, i.e., the words share similar contexts. The technique of singular value decomposition (SVD) can be applied to the matrix of word-context co-occurrence statistics. This method analyzes the direct and indirect relationships between words and contexts in the matrix based on simple matrix-algebraic operations. The result of the SVD analysis is a high dimensional space in which words that appear in similar contexts are placed in similar regions of the space. Landauer and Dumais (1997) applied the LSA approach to the 68,000 words of a large encyclopedia and placed these words in a high dimensional space with the number of dimensions chosen between 100 and 400. The LSA representation has been successfully applied to multiple choice vocabulary tests, domain knowledge tests and content evaluation (see Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998).
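The indirect-link idea can be illustrated with a small Python sketch (assuming NumPy) on a hypothetical six-word, five-context count matrix in which “cats” and “felines” never share a context. The sketch keeps the first k left singular vectors, scaled by their singular values, as word vectors and omits any weighting of the raw counts, so it is not the published LSA pipeline.

```python
import numpy as np

words = ["cats", "felines", "fur", "pets", "stocks", "bonds"]
# Hypothetical word-by-context counts (columns are five short passages).
# "cats" and "felines" never occur in the same passage.
X = np.array([
    [1, 1, 0, 0, 0],  # cats
    [0, 0, 1, 0, 0],  # felines
    [1, 0, 1, 0, 0],  # fur
    [0, 1, 1, 0, 0],  # pets
    [0, 0, 0, 1, 1],  # stocks
    [0, 0, 0, 1, 1],  # bonds
], dtype=float)

def truncated_svd_vectors(M, k):
    """Rank-k word vectors: first k left singular vectors times singular values."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

W = truncated_svd_vectors(X, k=2)
i, j = words.index("cats"), words.index("felines")
print(round(cosine(X[i], X[j]), 3))  # raw co-occurrence vectors
print(round(cosine(W[i], W[j]), 3))  # reduced 2-dimensional space
```

In the raw counts the two words are orthogonal, while in the two-dimensional space they point in the same direction because both are linked to “fur” and “pets”.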
Hyperspace Analogue to Language (HAL). The HAL model develops high dimensional vector representations for words that, like those of LSA, are based on a co-occurrence analysis of large samples of written text (Burgess, Livesay, & Lund, 1998; Lund & Burgess, 1996; see Burgess & Lund, 2000 for an overview). For 70,000 words, co-occurrence statistics were calculated in a 10-word window that was slid over a corpus of over 320 million words (gathered from Usenet newsgroups). For each word, co-occurrence statistics were calculated with respect to each of the 70,000 words, separately for words appearing before and after that word within the 10-word window. The resulting 140,000 values for each word were the feature values for the words in the HAL representation. Because the representation is based on the contexts in which words appear, the HAL vector representation is also referred to as a contextual space: words that appear in similar contexts are represented by similar vectors. The HAL and LSA approaches share one major assumption: similar words occur in similar contexts. In both HAL and LSA, the placement of words in a high dimensional semantic space is based on an analysis of the co-occurrence statistics of words in their contexts. In LSA, a context is defined by a relatively large segment of text, whereas in HAL, the context is defined by a window of 10 words.¹
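A HAL-style co-occurrence count can be sketched in Python as below. The function names are hypothetical, the counts are unweighted (the published model's treatment of distance within the window is not reproduced), and a toy sentence with a 2-word window stands in for the Usenet corpus and the 10-word window. A word's vector concatenates counts for words that preceded it and words that followed it, which is why a vocabulary of V words yields 2V values per word.

```python
from collections import defaultdict

def hal_counts(tokens, window=10):
    """Map (target, neighbor) -> number of times neighbor occurred
    within `window` positions BEFORE target."""
    counts = defaultdict(int)
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), i):
            counts[(target, tokens[j])] += 1
    return counts

def hal_vector(word, vocab, counts):
    """Concatenate 'preceded by v' counts and 'followed by v' counts."""
    before = [counts.get((word, v), 0) for v in vocab]  # v occurred before word
    after = [counts.get((v, word), 0) for v in vocab]   # v occurred after word
    return before + after

tokens = "the cat sat on the mat".split()
counts = hal_counts(tokens, window=2)
vocab = sorted(set(tokens))
print(vocab)
print(hal_vector("cat", vocab, counts))
```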
One great advantage of LSA and HAL over approaches depending on human judgments is that almost any number of words can be placed in a semantic/contextual space. This is possible because these methods rely solely on samples of written text (of which there is a virtually unlimited amount) as opposed to ratings provided by participants. Even though the working vocabulary of 5000 words in WAS is much smaller than the 68,000- and 70,000-word vocabularies of LSA and HAL, respectively, we believe it is large enough for our purpose of modeling performance in memory tasks.
Deese (1962, 1965) asserted that free associations are not haphazard processes in our brain and that there is regularity underneath them. He laid the framework for studying the meaning of linguistic forms that can be derived by analyzing the correspondences between distributions of responses to free association stimuli: "The most important property of associations is their structure - their patterns of intercorrelations" (Deese, 1965, p. 1). The SVD method has been successfully applied in LSA to uncover the patterns of intercorrelations of the co-occurrence statistics for words appearing in contexts. We will also use the SVD method but apply it to a different database: a large database of free association norms collected by Nelson, McEvoy, and Schreiber (1998) containing norms for first associates for over 5000 words.
In total, more than 6000 people participated in the collection of this database. An average of 149 (SD = 15) participants were presented with 100-120 English words. These words served as cues (e.g. “cat”) for which participants had to write down the first word that came to mind (e.g. “dog”). Because each cue was presented to many participants, the relative associative strength of each response could be calculated as the proportion of participants who gave that response to the cue (e.g. 60% responded with “dog”, 15% with “pet”, 10% with “tiger”, etc.).
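Computing associative strength as a proportion of participants is straightforward. In the Python sketch below, the responses are a hypothetical sample of 20 participants chosen to reproduce the percentages in the example; they are not data from Nelson et al.

```python
from collections import Counter

def associative_strengths(responses):
    """Proportion of participants who produced each response to one cue,
    ordered from strongest to weakest."""
    counts = Counter(responses)
    n = len(responses)
    return {resp: c / n for resp, c in counts.most_common()}

# Hypothetical first responses of 20 participants to the cue "cat".
responses = ["dog"] * 12 + ["pet"] * 3 + ["tiger"] * 2 + ["mouse"] * 2 + ["fur"]
print(associative_strengths(responses))
```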
The idea is to apply the SVD method to place words in a high dimensional space by analyzing the direct and indirect associative relationships between words. The details of this procedure are discussed in the Appendix; the basic approach is illustrated in Figure 1. The free association norms were represented in matrix form, with the rows representing the cues and the columns representing the responses. An entry in the matrix represents the relative frequency with which a response was generated for the particular cue (i.e., associative strength). Before SVD was applied, the matrix was preprocessed in two ways. First, the indirect associative strengths between words were calculated and added to the matrix⁶. Then, the matrix was symmetrized such that the associative strength between cue A and response B equaled the associative strength between cue B and response A. After these preprocessing steps, the matrix was subjected to SVD. The result of SVD is the placement of words in a high dimensional space, which we call Word Association Space (WAS).
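The pipeline can be sketched in Python (assuming NumPy), but only under explicit assumptions, since the exact preprocessing is given in the Appendix and is not reproduced here: (1) cues and responses share one vocabulary, so the strength matrix is square; (2) indirect strength is taken to be the sum over two-step paths, `S @ S`, with a hypothetical weight; (3) symmetrization adds the matrix to its transpose; (4) word vectors are the first k left singular vectors scaled by the singular values. The toy strengths are invented.

```python
import numpy as np

def was_preprocess(S, indirect_weight=1.0):
    """S[i, j]: associative strength from cue i to response j.
    ASSUMPTION: indirect strength = two-step paths i -> m -> j (S @ S);
    the dissertation's own formula may differ."""
    S_full = S + indirect_weight * (S @ S)   # direct + assumed indirect strength
    return S_full + S_full.T                 # symmetrize: A->B equals B->A

def was_vectors(S, k, indirect_weight=1.0):
    """ASSUMPTION: k-dim vectors = left singular vectors scaled by singular values."""
    U, s, _ = np.linalg.svd(was_preprocess(S, indirect_weight))
    return U[:, :k] * s[:k]

words = ["blue", "jeans", "pants", "sky"]
S = np.zeros((4, 4))
S[0, 3], S[0, 1] = 0.5, 0.3   # blue -> sky, blue -> jeans
S[1, 2] = 0.6                 # jeans -> pants
S[2, 1] = 0.5                 # pants -> jeans
S[3, 0] = 0.7                 # sky -> blue
W = was_vectors(S, k=2)
print(was_preprocess(S)[0, 2])  # blue-pants: no direct link, nonzero via "jeans"
```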

Figure 1. Illustration of the creation of Word Association Spaces (WAS). By applying singular value decomposition to a large database of free association norms, words are placed in a high dimensional semantic space. Words with similar associative relationships are placed in similar regions of the space.
In WAS, words that have similar associative structures are represented by similar vectors. Words that are not direct associates of each other can also be represented by similar vectors if their associates are related (or if the associates of the associates of the words are related).
The representation of words in WAS depends on the method with which the free association norms are analyzed. With the SVD method, words are represented by vectors with continuous feature values that have a symmetric distribution around zero. A suitable measure for the similarity between two words is the inner product of the two word vectors: two words that are similar in meaning, or that have similar associative structures, should have a high inner product.
An important variable (which we will call k) is the number of dimensions of the space². One can think of k as the number of feature values for each word. We vary k between 10 and 400. The number of dimensions determines how much the information in the free association database is compressed. With too few dimensions, the similarity structure of the resulting vectors does not capture enough detail of the original associative structure in the database. With too many dimensions, the similarity structure of the vectors does not capture enough of the indirect relationships in the associations between words.
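One way to quantify how much a k-dimensional space compresses a matrix is the relative error of its best rank-k approximation, which by the Eckart-Young theorem depends only on the discarded singular values. The Python sketch below (hypothetical function name, random symmetric test matrix standing in for a preprocessed association matrix) illustrates this. A lower reconstruction error is not the goal in itself: as noted above, keeping too many dimensions loses the indirect relationships.

```python
import numpy as np

def relative_error(M, k):
    """||M - M_k||_F / ||M||_F for the best rank-k approximation M_k,
    computed from the discarded singular values (Eckart-Young)."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sqrt(np.sum(s[k:] ** 2)) / np.sqrt(np.sum(s ** 2)))

rng = np.random.default_rng(0)
M = rng.random((50, 50))
M = M + M.T                       # symmetric, like a symmetrized norm matrix
for k in (5, 10, 20, 50):
    print(k, round(relative_error(M, k), 3))
```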
To get an understanding of what the similarity structure of WAS is like, we performed four analyses. In the first analysis, the similarity structure of low and high frequency words is compared, and it is shown that in WAS, high frequency words are more similar to other high frequency words than to low frequency words. In the second analysis, we compared the ordering of neighbors in WAS to the ordering of the strength of associates in the free association norms. In the third analysis, we address whether WAS captures semantic or associative relationships (or both) and argue that it is difficult to distinguish between the two kinds of relationships. In the fourth analysis, we analyze the ability of WAS to capture the differences between and within semantic categories. We will now discuss these four analyses in turn.
Word frequency can be defined by the number of times words occur in large samples of written text (Kucera & Francis, 1967). The frequency of words in samples of written text correlates with the frequency with which words are produced in free association norms. High frequency words are produced more often as responses in free association norms³. We investigated the similarity structure of low and high frequency words in WAS by calculating the similarity between groups of words from different frequency ranges. The top panel of Figure 2 shows the average inner product between random words from different Kucera and Francis frequency ranges. The highest similarity was obtained between high frequency words. Lower similarities were obtained between high and low frequency words, and the lowest similarity was obtained between low frequency words. The average similarity is higher between high frequency words because high frequency word vectors in WAS have larger magnitudes than low frequency word vectors, as shown in the bottom panel of Figure 2. Vectors with larger magnitudes, on average, lead to larger inner products.
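The frequency-range comparison can be sketched in Python as an average inner product over word pairs. The vectors below are synthetic stand-ins, not WAS vectors: all of them share a common direction and the hypothetical “high frequency” group is simply scaled up, which is enough to reproduce the ordering described above. For two distinct groups, the average over all pairs equals the inner product of the two group means.

```python
import numpy as np

def mean_similarity(A, B=None):
    """Average inner product between row vectors.
    One group: all distinct pairs (self-similarity excluded).
    Two groups: all pairs (a in A, b in B)."""
    if B is None:
        G = A @ A.T
        n = len(A)
        return float((G.sum() - np.trace(G)) / (n * (n - 1)))
    return float((A @ B.T).mean())

def mean_length(A):
    """Average vector magnitude (Euclidean norm) of the rows of A."""
    return float(np.linalg.norm(A, axis=1).mean())

# Synthetic stand-ins: shared positive direction; "high" = longer vectors.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 20)) + 1.0
high, low = 2.0 * base[:50], 0.5 * base[50:]
print(mean_similarity(high), mean_similarity(high, low), mean_similarity(low))
print(mean_length(high), mean_length(low))
```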

Figure 2. The effect of word frequency on the similarity structure of WAS and the length of the word vectors. In the top panel, the average similarity (measured by inner product) between random words from different Kucera and Francis word frequency ranges is plotted. The similarity is highest when high frequency words are compared with high frequency words. The similarity decreases as the frequencies of the compared words decrease. The bottom panel shows that the vector lengths are larger for high frequency words than for low frequency words.
Of course, it is the combination of the vector magnitudes and the correlation between the feature values that determines the similarity as computed by the inner product. Because high frequency words on average have larger magnitudes, they are placed more at the outskirts of the semantic space, while low frequency words are placed more in the center of the space. Because an inner product measure of similarity is used, the average similarity between the high frequency words that lie at the outskirts of the space is higher than between words that lie more in the center of the space. A different similarity measure could, however, lead to different results. For example, using Euclidean distance as a measure of (inverse) similarity should lead to lower similarities between high than between low frequency words. This observation becomes important for Part II of this research.
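The contrast between the two measures can be checked with a toy Python example (assuming NumPy; the 2-dimensional vectors are invented): scaling a pair of vectors outward, as happens for high frequency words, increases their inner product but also increases their Euclidean distance.

```python
import numpy as np

def inner(u, v):
    return float(u @ v)

def euclid(u, v):
    return float(np.linalg.norm(u - v))

# Two hypothetical unit vectors 60 degrees apart.
u = np.array([1.0, 0.0])
v = np.array([0.5, np.sqrt(3) / 2])

for scale in (0.5, 1.0, 2.0):   # small scale ~ center of space, large ~ outskirts
    su, sv = scale * u, scale * v
    print(scale, round(inner(su, sv), 3), round(euclid(su, sv), 3))
```

The inner product grows with the square of the scale while the distance grows linearly, so the two measures disagree about which pair is “more similar”.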
Because the word vectors in WAS are based explicitly on the free association norms, it is of interest to check whether the output order of responses (in terms of associative strength) can be predicted by WAS. We took the 10 strongest responses to each of the cues in the free association norms and ranked them according to associative strength. For example, the response ‘crib’ is the 8th strongest associate to ‘baby’ in the free association norms, so ‘crib’ has rank 8 for the cue ‘baby’. Using the vectors from WAS, the rank of a specific cue-response pair was computed by ranking the similarity between the cue and that response among the similarities between the cue and all other possible responses. For example, the word ‘crib’ is the 2nd closest neighbor to ‘baby’ in WAS, so ‘crib’ has rank 2 for the cue ‘baby’. In this example, WAS has put ‘baby’ and ‘crib’ closer together than might be expected on the basis of the free association norms. In Table 1, we compare the ranks from WAS to the ranks in the free association norms by computing the average of the ranks in WAS for the 10 strongest responses in the free association norms. The average was computed as the median so that a few very high ranks would not skew it excessively. An additional variable tabulated in Table 1 is k, the number of dimensions of WAS.

There are three trends to be discerned in Table 1. First, for 400 dimensions, the strongest responses to the cues in the free association norms are predominantly the closest neighbors to the cues in WAS. Second, responses that have higher ranks in free association have on average higher ranks in WAS. However, the output ranks in WAS are in many cases far higher than the output ranks in free association. For example, with 400 dimensions, the third strongest response in free association is on average the 10th closest neighbor in WAS. Third, for smaller dimensionalities, the difference between the output order in free association and WAS becomes larger. To summarize, given a sufficiently large number of dimensions, the strongest response in free association is represented (in most cases) as the closest neighbor in WAS. The other close neighbors in WAS are not necessarily associates in free association (at least not direct associates).
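The ranking procedure can be sketched in Python, assuming a NumPy matrix `W` whose rows are word vectors and integer indices for cues and responses (both hypothetical inputs). Rank 1 is the closest neighbor by inner product, the cue itself is excluded, and the median is taken over cue-response pairs.

```python
import numpy as np

def neighbor_rank(W, cue, response):
    """Rank (1 = closest) of `response` among all words other than the cue,
    ordered by inner product with the cue vector."""
    sims = W @ W[cue]                         # new array of similarities
    sims[cue] = -np.inf                       # exclude the cue itself
    order = np.argsort(-sims, kind="stable")  # descending similarity
    return int(np.where(order == response)[0][0]) + 1

def median_rank(W, pairs):
    """Median neighbor rank over (cue, response) index pairs."""
    return float(np.median([neighbor_rank(W, c, r) for c, r in pairs]))

# Hypothetical 2-dimensional vectors for four words.
W = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]])
print([neighbor_rank(W, 0, r) for r in (1, 2, 3)])
print(median_rank(W, [(0, 1), (0, 2), (0, 3)]))
```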
To get a better idea of the kinds of neighbors words have in WAS, Table 2 lists the first five neighbors in WAS (using 400 dimensions) for 40 cue words. For each listed neighbor that was an associate in the free association norms of Nelson et al., the corresponding rank in the norms is given in parentheses. Since all 40 cue words were also used in the free association norms of Russell and Jenkins (1954), we also list the ranks in those norms in square brackets. The comparison between these two databases is interesting because Russell and Jenkins allowed participants to generate as many responses as they wanted for each cue, while the norms of Nelson et al. contain first responses only. We suspected that some close neighbors in WAS are not direct associates in the Nelson et al. norms but would have been valid associates if participants had been allowed to give more than one association per cue. In Table 3, we list the percentages of neighbors in WAS of the 100 cues of the Russell and Jenkins norms (only 40 were shown in Table 2) that are valid/invalid associates according to the norms of Nelson et al. and/or the norms of Russell and Jenkins.
The last row shows that a third of the 5th closest neighbors in WAS are not associates according to the norms of Nelson et al. but are associates according to the norms of Russell and Jenkins. Therefore, whether some close neighbors in WAS count as valid associates depends on which norms are consulted. However, some close neighbors in WAS are not associates according to either set of norms. For example, ‘angry’ is the 2nd neighbor of ‘anger’ in WAS. These words are obviously related by word form, but they do not appear as associates in free association tasks because associations of the same word form tend to be edited out by participants. Because these words have similar associative structures, WAS puts them close together in the vector space.
Also, some close neighbors in WAS are not direct associates of each other but are indirectly associated through a chain of associates. For example, the pairs ‘blue-pants’, ‘butter-rye’, and ‘comfort-table’ are close neighbors in WAS but are not directly associated with each other. It is likely that, because WAS is sensitive to the indirect relationships in the norms, these word pairs were put close together through the indirect associative links via the words ‘jeans’, ‘bread’, and ‘chair’, respectively. In a similar way, ‘cottage’ and ‘cheddar’ are close neighbors in WAS because ‘cottage’ is related (in one meaning of the word) to ‘cheese’, which is an associate of ‘cheddar’.

Table 1 also shows the correspondence between the similarities in the LSA space of Landauer and Dumais (1997) and the order of output in free association. As can be observed in the table, the rank of the response strength in the free association norms clearly has an effect on the ordering of similarities in LSA: strong associates are closer neighbors in LSA than weak associates. However, the overall correspondence between predicted output ranks in LSA and ranks in the norms is weak. This weaker correspondence for LSA than for WAS highlights one obvious difference between the two approaches. Because WAS is based explicitly on free association norms, it is expected, and shown here, that words that are strong associates are placed close together in WAS, whereas in LSA, words are placed in the semantic space more independently of the norms.
In the priming literature, several authors have tried to distinguish between semantic and associative word relations in order to tease apart different sources of priming (e.g. Burgess & Lund, 2000; Chiarello, Burgess, Richards & Pollock, 1990; Shelton & Martin, 1992). Burgess and Lund (2000) have argued that word association norms confound many types of word relations, among them semantic and associative word relations. Chiarello et al. (1990) give “music” and “art” as examples of words that are semantically related because they are rated as members of the same semantic category (e.g. Battig & Montague, 1969). However, they claim these words are not associatively related because they are not direct associates of each other (according to the various norm databases they used). The words “bread” and “mold” were given as examples of words that are only associatively related: they are not rated as members of the same semantic category, but “bread” is an associate of “mold”. Finally, “cat” and “dog” were given as examples of words that are both semantically and associatively related.
We agree that the responses in free association norms can be related to the cues in many different ways, but it seems very hard and perhaps counterproductive to classify responses as purely semantic or purely associative⁴. For example, word pairs might be associated not directly but indirectly, through a chain of associates. The question then becomes: how much semantic information do the free association norms contain beyond the direct associations? Since WAS is sensitive to the indirect associative relationships between words, we took the various examples of word pairs given by Chiarello et al. (1990) and Shelton and Martin (1992) and computed the WAS similarities between these words for different dimensionalities, as shown in Table 4.

In Table 4, the interesting comparison is between the similarities for the semantic-only related word pairs⁵ (as listed by Chiarello et al., 1990) and 200 random word pairs. The random word pairs were selected to have zero forward and backward associative strength.

It can be observed that the semantic-only related word pairs have higher similarity in WAS than the random word pairs. Therefore, even though Chiarello et al. (1990) tried to create word pairs that were only semantically related, WAS can distinguish between these not directly associated word pairs and not directly associated random word pairs. This is because WAS is sensitive to indirect associative relationships between words. The table also shows that for low dimensionalities, there is little difference between the similarity of semantic-only and associative-only related word pairs. For higher dimensionalities, this difference becomes larger as WAS represents more of the direct associative relationships.
To conclude, it is difficult to distinguish between purely semantic and purely associative relationships. What some researchers have previously considered to be purely semantic word relations were word pairs that were related in meaning but not directly associated with each other. This does not mean, however, that these words are not associatively related, because the information in free association norms goes beyond that of direct associative strengths. In fact, the similarity structure of WAS turns out to be sensitive to the similarities that some researchers argued to be purely semantic.
In this section, we give an additional demonstration that the space formed by WAS is sensitive to semantic information. Murdock (1976) collected 32 semantic categories, each with 32 category members. Examples of categories are ‘body parts’, ‘ships’, ‘birds’, ‘fruits’, and ‘tools’. Members of the first category were, for example, ‘leg’, ‘arms’, ‘head’, and ‘eye’, and members of the second category were, for example, ‘sailboat’, ‘destroyer’, and ‘battleship’. If WAS is sensitive to the categorical structure of these semantic norms, then the within category similarity should on average be higher than the between category similarity. Similarity was computed as the inner product between word vectors. The within category similarity was calculated by averaging the similarities of all possible word pairs within a category. Similarly, the between category similarity was calculated by averaging the similarities of all possible word pairs that fell in different categories. Table 5 shows the between and within category similarities. Note that the within category similarity is 18 times higher than the between category similarity, suggesting that the similarity structure of WAS is well suited to representing semantic categorical information. The row labeled ‘not normalized’ refers to the space used in Part I of the research, where the vector lengths are not normalized. The second row shows that when the vector lengths are normalized, the ratio of within to between category similarity is equally high. This observation becomes important in Part II of this research, where we do normalize the vector lengths.
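The within/between comparison can be sketched in Python, assuming a matrix of word vectors and one category label per word (hypothetical inputs). Pairs are pooled across categories; with equal category sizes, as in norms of 32 categories with 32 members each, this equals averaging the per-category means. Setting `normalize=True` rescales each vector to unit length before the inner products are computed.

```python
import numpy as np

def category_similarities(W, labels, normalize=False):
    """Return (mean within-category, mean between-category) inner product."""
    W = np.asarray(W, dtype=float)
    if normalize:
        W = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-length vectors
    labels = np.asarray(labels)
    G = W @ W.T                                   # all pairwise inner products
    same = labels[:, None] == labels[None, :]     # same-category mask
    off_diag = ~np.eye(len(W), dtype=bool)        # exclude self-pairs
    within = G[same & off_diag].mean()
    between = G[~same].mean()
    return float(within), float(between)

# Hypothetical vectors for two categories of two words each.
W = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
within, between = category_similarities(W, labels)
print(within, between, within / between)
```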
In a classic study by Deese (1959b), the goal was to predict the intrusion rates of words in free recall. Participants studied the 15 strongest associates to each of 36 critical lures, while the critical lures themselves were not studied. In a free recall test, some critical lures (e.g. “sleep”) were falsely recalled about 40% of the time, while other critical lures (e.g. “butterfly”) were never falsely recalled. Deese was able to predict the intrusion rates for the critical lures on the basis of the average associative strength from the studied associates to the critical lures and obtained a correlation of R = 0.8. Since Deese could predict intrusion rates with word association norms, we expected that the WAS vector space derived from the association norms could also predict intrusion rates. The idea is that critical items that are closely related to list words are more likely to appear as intrusions in free recall than critical items that are not closely related to list words. The average similarity between each critical lure vector and its list word vectors was computed using different dimensionalities. In Figure 3, a scatter plot shows the relationship between the similarity and the intrusion rates observed by Deese (here, the number of dimensions was 400). The obtained correlation was R = 0.775. Table 6 lists the correlations for other dimensionalities. The correlation decreases as the number of dimensions decreases. This might happen because a lower dimensional space has less room to place 5000 words, so that the resulting similarity structure captures the differences in observed intrusion rates less well. The table also shows the correlations when the vectors are taken from LSA. The similarity structure of LSA does not correlate as well with the intrusion rates as that of WAS. Also, for LSA, varying the number of dimensions does not seem to affect the correlations.

Figure 3. The average similarity between critical item and list item in WAS can predict the intrusion rates for the critical item as observed by Deese (1959b).
.
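The lure-to-list similarity measure and its correlation with intrusion rates can be sketched as follows. This is a hedged illustration: the two-dimensional vectors, the lure named `hypothetical_lure`, and its 20% intrusion rate are invented toy values (only the approximate 40% and 0% rates for “sleep” and “butterfly” come from the text), and Pearson's r is computed directly from its definition.

```python
from statistics import mean


def dot(u, v):
    """Inner product between two word vectors."""
    return sum(a * b for a, b in zip(u, v))


def lure_list_similarity(lure_vec, list_vecs):
    """Average inner product between a critical lure and its studied list words."""
    return mean(dot(lure_vec, v) for v in list_vecs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# (lure, lure vector, studied list-word vectors, observed intrusion rate)
# All vectors are toy values, not real WAS coordinates.
lures = [
    ("sleep", [1.0, 0.0], [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1]], 0.40),
    ("hypothetical_lure", [0.0, 1.0], [[0.6, 0.5], [0.5, 0.4], [0.4, 0.6]], 0.20),
    ("butterfly", [0.6, 0.8], [[0.0, 0.1], [0.1, 0.1], [0.1, 0.0]], 0.00),
]

sims = [lure_list_similarity(vec, lists) for _, vec, lists, _ in lures]
rates = [rate for *_, rate in lures]
r = pearson_r(sims, rates)
print(f"r = {r:.3f}")
```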

In extralist cued recall experiments, after studying a list of words, subjects are presented with cues that can be used to retrieve words from the study list. The cues themselves are novel words that were not presented during study, and they are typically associatively related to one of the studied words. The degree to which a cue succeeds in retrieving a particular target word is of interest because it might be related to the associative/semantic overlap between cues and their targets. Research in this paradigm (e.g., Nelson & Schreiber, 1992; Nelson, Schreiber, & McEvoy, 1992; Nelson, McKinney, Gee, & Janczura, 1998; Nelson & Xu, 1995) has already shown that the associative strength between cue and target is one important predictor of the percentage of targets correctly recalled. Therefore, we expected the WAS similarity between cues and targets to be correlated with the percentages of correct recall in these experiments. We used a database containing the percentage of correct recall for 1115 cue-target pairs from more than 29 extralist cued recall experiments from Doug Nelson’s laboratory (Nelson & Zhang, submitted; Nelson, personal communication). Table 7 shows the correlations between the WAS similarity and the observed recall rates for different dimensionalities.
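One way to obtain correlations for several dimensionalities is to keep only the first k coordinates of each word vector, assuming the coordinates are ordered by decreasing singular value (as in the decomposition described later in this section). The sketch below does this; the cue-target vectors and recall rates are hypothetical toy values, and truncating a single high-dimensional solution is an assumption of the sketch, since the original analyses may instead have recomputed the space for each dimensionality.

```python
from statistics import mean


def dot(u, v):
    """Inner product between two word vectors."""
    return sum(a * b for a, b in zip(u, v))


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlations_by_dimension(pairs, recall_rates, dims):
    """For each k in dims, correlate cue-target similarity computed on the
    first k coordinates with the observed proportion of correct recall."""
    result = {}
    for k in dims:
        sims = [dot(cue[:k], target[:k]) for cue, target in pairs]
        result[k] = pearson_r(sims, recall_rates)
    return result


# Hypothetical (cue vector, target vector) pairs in a 4-dimensional toy space.
pairs = [
    ([0.9, 0.3, 0.2, 0.1], [0.8, 0.4, 0.1, 0.2]),
    ([0.5, 0.6, 0.1, 0.3], [0.4, 0.5, 0.3, 0.1]),
    ([0.2, 0.1, 0.7, 0.4], [0.3, 0.2, 0.1, 0.6]),
    ([0.1, 0.8, 0.2, 0.5], [0.2, 0.1, 0.6, 0.1]),
]
recall_rates = [0.62, 0.45, 0.30, 0.12]  # hypothetical proportions correct

result = correlations_by_dimension(pairs, recall_rates, dims=(1, 2, 4))
for k, r in result.items():
    print(f"k={k}: r={r:.3f}")
```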

By a statistical analysis of a large database of free association norms, we developed the Word Association Space (WAS). In this space, words that have similar associative structures are placed in similar regions. We showed that the output order of words in free association norms is preserved to some degree in WAS: first associates in the norms are likely to be close neighbors in WAS. There are, however, some interesting differences between the similarity structure of WAS and the associative strengths of the words in the norms. Words that are not directly associated can be close neighbors in WAS when they are indirectly related through a chain of associates. Conversely, in some cases, words that are directly associated in the norms are not close neighbors in WAS at all (although these cases are exceptions). For this reason, WAS is not a good model for predicting free association data. However, it is important to realize that WAS was not developed as a model of free association (e.g., Nelson, McEvoy, & Dennis, in press) but rather as a model based on free association.
The WAS approach is an additional method available to place words in a psychological space. It differs from the LSA and HAL approaches in several ways. LSA and HAL are automatic methods and do not require any extensive data collection of ratings or free associations. With LSA and HAL, tens of thousands of words can be placed in the space, whereas in WAS, the number of words that can be placed depends on the number of words that can be normed. It took Nelson et al. (1998) more than a decade to collect the norms, highlighting the enormous human overhead of the method.
Another difference is that LSA and HAL have the potential to model the learning process a language learner goes through. For example, by feeding the LSA or HAL model successively larger chunks of text, one can simulate the effect of learning on the similarity structure of words in LSA or HAL. In WAS, it is in principle possible to model a language learning process by collecting free association norms from participants at different stages of the learning process. In practice, however, such an approach would be difficult to carry out.
We think that the WAS, LSA, and HAL approaches to creating semantic spaces are all useful for theoretical and empirical research, and the usefulness of a particular space may depend on the task to which it is applied. Since free association norms have been an integral part of predicting episodic memory phenomena (e.g., Cramer, 1968; Deese, 1965; Nelson, Schreiber, & McEvoy, 1992), we assumed that a vector space based on free association norms would be an especially useful construct for modeling memory phenomena. In this research, we have already shown that simple geometric operations on WAS can predict, to some degree, the intrusion rates observed by Deese (1959b) in his classic false memory experiment as well as the percentages of correct recall in cued recall experiments. This suggests to us that WAS forms a useful representational basis for memory models that store and retrieve words as vectors of feature values. In part II of this research, we will combine the semantic space of WAS with a process model for recognition memory. This will allow us to model the processes of recognition memory and will give us a principled way to represent words by vectors. The assumption that words are represented by vectors in memory models is relatively old. However, in most memory modeling, the vectors representing words are chosen arbitrarily and are not derived from any analysis of the meaning of actual words in our language. In part II, we expect that a memory model based on these semantic vectors from WAS will be useful for making predictions about the effects of varying semantic similarity in memory experiments.
Let the matrix A represent the information from the free association norms, with Aij representing the relative frequency with which participants generate response j to cue i. The idea is to use the information in this matrix to place the n words in a high-dimensional space by applying singular value decomposition. We first transformed A to a new matrix T by symmetrizing A and by adding the two-step indirect associative strengths⁶ from cue to response and from response to cue:
$$T = A + A' + AA + (AA)' \tag{1}$$
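The transformation from A to T can be sketched with plain nested lists. This sketch assumes that the two-step indirect strength from cue i to response j is the matrix-product term (AA)ij = Σk Aik Akj, so that T = A + A′ + AA + (AA)′; the helper names and the 3 x 3 matrix below are hypothetical.

```python
def transpose(m):
    """Transpose a nested-list matrix (A -> A')."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(*ms):
    """Element-wise sum of equally sized matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*ms)]


def build_t(a):
    """Symmetrize A and add two-step strengths: T = A + A' + AA + (AA)'.
    (Assumed form of the transformation.)"""
    at = transpose(a)
    aa = matmul(a, a)  # (AA)ij = sum_k Aik * Akj: path cue i -> k -> response j
    return add(a, at, aa, transpose(aa))


# Hypothetical norms for 3 words: A[i][j] = relative frequency of response j to cue i.
A = [
    [0.0, 0.6, 0.1],
    [0.3, 0.0, 0.5],
    [0.2, 0.0, 0.0],
]
T = build_t(A)
```

In this toy matrix, word 0 reaches word 2 through word 1 (0.6 x 0.5), so the indirect path raises T[0][2] from 0.3 (direct strengths in both directions) to 0.6.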
The matrix T is symmetric: Tij = Tji. It is possible to decompose any square symmetric matrix T into a product of three matrices by using a special case of the singular value decomposition method⁷:
$$T = U_0 D_0 U_0' \tag{2}$$
Here, U0′ denotes the transpose of U0. When the matrix T has size n x n (i.e., n rows and n columns), U0 and D0 are also of size n x n. The columns of U0 are orthonormal and contain the n eigenvectors. The matrix D0 is diagonal and contains the n singular values. It is customary to let the first diagonal entry contain the largest eigenvalue, followed by the remaining eigenvalues in decreasing order.
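For a symmetric T, the decomposition described above is an eigendecomposition; when T is also positive semi-definite, its eigenvalues coincide with its singular values. The sketch below computes it with the cyclic Jacobi eigenvalue method, a standard technique swapped in here for illustration (the text does not say which numerical routine was used), and sorts the eigenpairs so the largest eigenvalue comes first. The 3 x 3 test matrix is hypothetical.

```python
import math


def jacobi_eigen(t, tol=1e-20, max_sweeps=100):
    """Eigendecomposition T = U0 D0 U0' of a symmetric matrix by cyclic Jacobi
    rotations. Returns (u0, d0): u0 has orthonormal eigenvectors as columns,
    d0 the eigenvalues, sorted so the largest comes first."""
    n = len(t)
    a = [row[:] for row in t]  # working copy, rotated toward diagonal form
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                tan = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(tan * tan + 1.0)
                s = tan * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J' A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    d0 = [a[i][i] for i in order]
    u0 = [[v[r][i] for i in order] for r in range(n)]
    return u0, d0


T = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]  # hypothetical symmetric matrix
u0, d0 = jacobi_eigen(T)
n = len(T)
# Reassemble T = U0 D0 U0' to check the decomposition.
T_rebuilt = [[sum(u0[i][c] * d0[c] * u0[j][c] for c in range(n)) for j in range(n)] for i in range(n)]
```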
The purpose of this linear decomposition is to approximate matrix T by matrices with a much smaller number of singular values and singular vectors:
$$T \approx U D U' \tag{3}$$
Here, D is the k x k diagonal matrix containing only the k largest (k << n) singular values of D0. U is the n x k matrix that contains only the first k eigenvector columns of U0. We represent words by the column vectors of the matrix X, which is formed by weighting the eigenvectors with the eigenvalues:
(4)
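The truncation to the k largest singular values and the weighting of eigenvectors by eigenvalues can be sketched with power iteration plus deflation, which extracts only the k leading eigenpairs and therefore suits the k << n case. This is a minimal sketch that assumes T is symmetric positive semi-definite and that X = UD, stored with one row per word (equivalently, one column per word in the transpose); the toy matrix is hypothetical, and an analysis of 5000 words would more likely use a dedicated sparse eigensolver.

```python
import math


def matvec(m, x):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def top_k_eigen(t, k, iters=1000):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation).
    Returns (u, d): u is n x k with eigenvectors as columns, d the k eigenvalues
    in decreasing order."""
    n = len(t)
    m = [row[:] for row in t]
    d, columns = [], []
    for _ in range(k):
        # Non-uniform start vector; a uniform one can be orthogonal to an eigenvector.
        x = [1.0 / (i + 1) for i in range(n)]
        for _step in range(iters):
            y = matvec(m, x)
            norm = math.sqrt(sum(v * v for v in y))
            if norm == 0.0:
                break
            x = [v / norm for v in y]
        lam = sum(a * b for a, b in zip(x, matvec(m, x)))  # Rayleigh quotient
        d.append(lam)
        columns.append(x)
        # Deflation: remove the pair just found so the next run finds the next one.
        m = [[m[i][j] - lam * x[i] * x[j] for j in range(n)] for i in range(n)]
    u = [[columns[c][i] for c in range(k)] for i in range(n)]
    return u, d


T = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]  # hypothetical toy matrix
k = 2
u, d = top_k_eigen(T, k)

# Rank-k approximation: T is approximately U D U'.
n = len(T)
approx = [[sum(u[i][c] * d[c] * u[j][c] for c in range(k)) for j in range(n)] for i in range(n)]
error = math.sqrt(sum((T[i][j] - approx[i][j]) ** 2 for i in range(n) for j in range(n)))

# Word vectors: eigenvectors weighted by eigenvalues (assumed X = U D), one row per word.
X = [[u[i][c] * d[c] for c in range(k)] for i in range(n)]
```

With k = 2 the Frobenius approximation error equals the single discarded eigenvalue, 2 − √2 ≈ 0.586, which illustrates why keeping only the largest singular values loses little when the remaining ones are small.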
The matrix X represents the high dimensional vector space that is called ‘Word Association Space’. Each column vector of X represents the location of a word i