Why do children learn nouns such as cup faster than dimensional adjectives such as big? Most explanations of this phenomenon rely on prior knowledge of the noun-adjective distinction or on the logical priority of nouns as the arguments of predicates. In th
Learning Nouns and Adjectives: A Connectionist Account

Michael Gasser
Computer Science and Linguistics Departments
Lindley Hall 215
Indiana University
Bloomington, IN 47405, USA
gasser@indiana.edu

Linda B. Smith
Psychology Department
Psychology 332
Indiana University
Bloomington, IN 47405, USA
smith4@indiana.edu
Abstract

Why do children learn nouns such as cup faster than dimensional adjectives such as big? Most explanations of this phenomenon rely on prior knowledge of the noun-adjective distinction or on the logical priority of nouns as the arguments of predicates. In this paper we examine an alternative account, one which relies instead on properties of the semantic categories to be learned and of the word learning task itself. We isolate four such properties: the relative size, the relative compactness, and the degree of overlap of the regions in representational space associated with the categories, and the presence or absence of lexical dimensions (what color?) in the linguistic context of a word. In a set of five experiments, we trained a simple connectionist network to label input objects in particular linguistic contexts. The network learned categories resembling nouns with respect to the four properties faster than it learned categories resembling adjectives.
Young children learn nouns more rapidly and with fewer errors than they learn adjectives. The nouns that children so readily learn typically label concrete things such as BLOCK[1] and DOG. The adjectives that young children learn with greater difficulty label the perceptible properties of these same objects, for example, RED and WET. Why are concrete nouns easier for young children to learn than dimensional adjectives?

To whom correspondence should be addressed.
[1] We will use uppercase for concepts, italics for linguistic forms, and double quotes for utterances.
It is common in the study of cognitive development to explain such differences in learning by positing domain-specific mechanisms dedicated to that learning. Thus one might explain the noun advantage by looking for conceptual structures that specifically constrain or promote the learning of nouns and the lack of such specific structures for adjectives. In this paper, we pursue an alternate idea. We propose that common nouns and dimensional adjectives are initially acquired by the very same processes in the very same way. But, we argue, many mundane factors conspire to make names for common things more easily learned than labels for the properties of those things. We test our account by examining how a general category learning device, a multi-layer feedforward connectionist network, learns concrete nouns and dimensional adjectives.
1 The Phenomenon

Three kinds of evidence point to the initial priority of names for things over labels for the attributes of those same things. The first concerns the kinds of words that comprise early productive vocabularies. Nouns dominate; dimensional adjectives are rare or nonexistent. For example, in Stern's diary study of the acquisition of English (Gentner, 1978), 78% of the words produced at 20 months were nouns while none were adjectives. Similarly, in Nelson's (1973) study of 18 children learning English, fewer than 7% of the first 50 words were adjectives. The priority of nouns over adjectives in early vocabularies is evident in other languages as well. In Dromi's (1987) study of one child learning Hebrew, only 4 of the first 337 words were adjectives. In a longitudinal study of the acquisition of Spanish by 328 children, Jackson-Maldonado et al. (1993) found only one adjective among the 88 most common words. The finding that adjectives are infrequent in early vocabularies is remarkable given that common dimensional adjectives such as size and color terms are among the most frequently used words in adult language.

The second class of evidence concerns studies of artificial word learning. In this commonly used method, experimenters present a novel object to a child and label it with a novel word (e.g., "this is a dax"). Children's interpretation of the word is measured by the kinds of other objects to which they generalize the newly learned label. Considerable evidence indicates that by 18 months (and quite possibly before), children interpret novel nouns as referring to taxonomic categories (Markman, 1989; Waxman, 1994). Further, the evidence suggests that children remember what they have learned over several days and weeks (Woodward, Markman, & Fitzsimmons, 1994). There have been a number of attempts to use these methods to teach novel adjectives. In these studies, the novel word is placed in an adjectival context (e.g., "this is a daxy one") or is explicitly contrasted with a known adjective (e.g., "this is ecru, not red"). Learning in these instances has proved modest at best, even in children as old as 36 months (Au & Laframboise, 1990; Au & Markman, 1987; Carey, 1978; Smith, Jones, & Landau, 1992; Taylor & Gelman, 1988).
Cross-linguistic studies of artificial word learning also suggest that names for concrete things are special in early language learning (Imai & Gentner, 1993; Waxman, 1994) in that there are considerable similarities in the nature of children's noun extensions across languages and considerable variability across (and within) languages in young children's interpretation of novel adjectives. Other evidence from children learning English suggests
that the initial meanings of dimensional terms may be highly context specific (Keil & Carroll, 1980). In sum, whereas names for things appear to be "fast mapped" (Carey, 1982) to potential categories, the extension of a novel adjective appears more slowly and more variably determined.

The third class of evidence concerns children's errors with nominal and adjectival meanings. There are extensive literatures in both areas although they are difficult to compare because of vastly different methods, ages of subjects, and empirical questions asked. These differences derive directly from the noun advantage over adjectives. The key question for researchers who study early noun acquisition is how it is that children learn so many nouns so rapidly and with so few errors. The only errors consistently studied in this literature are the overextension errors typically noticed at about the time productive vocabulary first begins to accelerate. However, there is a debate as to whether these errors are category errors. Instead, these overextensions (for example, calling a zebra "doggy") may reflect pragmatic strategies or retrieval errors (Gershkoff-Stowe & Smith, 1996; Huttenlocher, 1974). Consistent with this idea is the rarity of overextensions in comprehension (see, for example, Naigles & Gelman, 1995).

In contrast, the key question for researchers who study the acquisition of dimensional adjectives is why they are so difficult to learn. The central phenomena are comprehension errors. Long after children begin to use dimensional words, when they are as old as 3, 4, or even 5 years, their interpretations of dimensional adjectives are still errorful. This literature is replete with examples of both within- and between-dimension errors, interpreting big to mean TALL (Maratsos, 1988), big to mean BRIGHT (Carey, 1978, 1982), dark to mean LOUD (Smith & Sera, 1992), and blue to mean GREEN (Backscheider & Shatz, 1993). Although plentiful, these errors are constrained. They consist of confusions within the semantic domain of dimensional terms. That is, children may confuse dark and loud but they do not confuse dark and room. The category specificity of these errors means that at the same time children are rapidly learning nouns and commonly misinterpreting adjectives, they have some idea that nouns and adjectives span different categories of meaning.
In sum, the phenomena to be explained are (1) why common nouns are acquired by young children earlier, more rapidly, and with fewer errors than are dimensional adjectives and (2) how, during the protracted course of learning dimensional adjectives, young children seem to recognize that the dimensional adjectives comprise a class.
2 Rationale for a Similarity-Based Approach

One way of construing the problem is in terms of category learning. Why are common noun categories more easily learned than common adjective categories? Several proposals have been offered suggesting a foundational conceptual distinction between objects and their attributes. For example, Gentner (1978), Maratsos (1988), and Macnamara (1982) have all suggested that nouns are logically prior. They point out that predicates presuppose arguments but that the reverse is not true. The suggestion, then, is that children need not understand shaggy to figure out what dog means from examples like the dog is shaggy but must know dog to figure out shaggy from the same
sentence. Similarly, Markman (1989; see also, Carey, 1994) proposed that children's initial hypotheses about word meanings adhere to a "whole-object principle," namely that children assume that novel labels refer to individual whole objects rather than to their component properties or to collections of objects. Thus, by this account, children's initial hypotheses about meanings are noun-like.

Although these proposals are probably somewhat correct, they seriously underspecify the processes through which knowledge about the differences between nouns and adjectives is instantiated or acquired. We seek such specification in a similarity-based account. Our idea is that the noun advantage and an initial segregation of nouns and adjectives as distinct classes of words is the result of the most general and ordinary processes of associative learning. There are two arguments for this approach which we find compelling. First, whatever else children know or believe, similarity-based associative learning is part of their biology and thus a good place to begin looking for a mechanistic account. Second, similarity-based learning would seem crucial at the front-end when children know no language. At this point, children learn many words by ostensive definition (Mervis, 1987). Parents point to an object and say, for example, "that's a dog" or "that's big." This associative task of mapping words to perceptible properties would seem to be the very same for the learning of dimensional adjectives as for the learning of nouns. Even if the child possessed some pre-existing conceptual distinction between objects and their properties, the child could not use that knowledge at this stage because the child has no words and thus no knowledge of the syntactic frames that would distinguish whether a novel word is a noun or an adjective. In the beginning, the young child can only associate novel labels with the properties of things so labeled. Doing so will yield a representation of dog as things with DOG properties and a representation of wet as things with WET properties. While incomplete, such meanings are in fact on the right track.
Given these assumptions, we ask: Why are common nouns learned more readily than common adjectives?
Previous researchers have pointed to three kinds of difference between common noun and dimensional adjective categories.

2.1 Differences in Similarity Structure between Nouns and Adjectives

2.1.1 Many vs. Few Similarities
Gentner & Rattermann (1991), Markman (1989), Medin & Ortony (1989), and Rosch (1973a) have all argued that common nouns label objects similar across many interrelated and correlated properties. In contrast, dimensional adjectives label objects that are alike on only one property. This difference between nouns and adjectives has important conceptual consequences (see especially Markman, 1989). For example, knowing that an object is a bird allows predictions about many different properties of the object but knowing that an object is a member of the category WHITE-THINGS supports only predictions about the object's color.
This difference also has important implications for similarity-based learning, as illustrated in Figure 1. This figure represents the extensions of idealized nouns and adjectives as regions in a multidimensional space of all possible objects. The relevant spaces are hyperspaces of many dimensions, all of those along which noun and adjective meanings vary, but for ease of illustration we confine ourselves to three dimensions. For example, the dimensions shown could represent SIZE, SMOOTHNESS, and SHININESS. Each of the outlined regions within the large cube represents a hypothetical category associated with a single word, and instances of the category would be points within the region. As can be seen in the figure, categories organized by many dimensional similarities (cubes with thick outlines) are small and compactly shaped relative to those that are organized by similarity on just one property. Thus, the idealized noun is uniformly and closely bounded in all directions. It is a hypercube or hypersphere. In contrast, members of an adjective category are tightly constrained in only one direction (the relevant dimension) but extend indefinitely in all others. The idealized dimensional-adjective category thus may be thought of as a "hyperslab." Further, the volume of idealized noun categories, compact in all dimensional directions, is relatively small whereas the volume of adjective categories, extending indefinitely in all directions but one, is great.
Figure 1: Typical Noun and Adjective Categories. Only three dimensions from the set of dimensions distinguishing the categories are shown. Noun categories appear in thick outline, adjective categories in thin outline.
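The contrast between compact noun regions and slab-like adjective regions can be made concrete with a small sketch. The Python below is illustrative only: the function names (`volume`, `contains`), the scaling of each dimension to the interval [0, 1], and the particular ranges are assumptions chosen for the example, not values taken from the simulations reported here.

```python
from math import prod

# A category is a list of (low, high) ranges, one per dimension.
# Every dimension is assumed to be scaled to [0, 1] (an assumption).
ANY = (0.0, 1.0)

def volume(category):
    """Fraction of the unit hypercube covered by the category."""
    return prod(high - low for low, high in category)

def contains(category, point):
    """True if the point lies inside the category on every dimension."""
    return all(low <= x <= high for (low, high), x in zip(category, point))

# Hypothetical noun: tightly bounded on all three dimensions (a small cube).
noun = [(0.2, 0.4), (0.5, 0.7), (0.1, 0.3)]
# Hypothetical adjective: bounded on one dimension only (a slab).
adjective = [(0.8, 1.0), ANY, ANY]

print(volume(noun), volume(adjective))
print(contains(adjective, (0.9, 0.1, 0.5)), contains(noun, (0.9, 0.1, 0.5)))
```

With these made-up ranges the slab covers roughly 25 times the volume of the cube, and an object can fall inside the adjective region while lying far from every member of the noun region.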
Given ordinary ideas about similarity and generalization, these differences clearly favor nouns. The within-category similarity is greater for the nouns than the adjectives in Figure 2. Further, for nouns, generalization can be non-selective in all directions, but for adjectives generalization must be selectively inhibited in one direction. Learning
about adjectives but not nouns thus requires discovering and selectively attending to one relevant direction in the multi-dimensional space.
2.1.2 Category Overlap

Nouns and adjectives also differ in the relatedness of one category to another. Common nouns all classify objects at one level (Rosch, 1973a). An object is a dog or a house or a watch or a car or a leaf. Thus the question what is it? is answerable by one basic noun. Markman (1989) incorporated this notion in her proposal that children adhere to a mutual exclusivity assumption in early word learning. Although this idea of a one-object, one-name rule is imperfect and complicated by a hierarchical taxonomy and synonyms, it also captures something quite real about the way common nouns are commonly used (Clark, 1973; Markman, 1989; Markman & Hutchinson, 1984; Mervis, 1987; Mervis, Mervis, Johnson, & Bertrand, 1992; Rosch, 1973a). Dimensional adjectives present a markedly different structure. They are (typically) mutually exclusive within a dimension but overlap completely across dimensions. Objects in the category BIG may also be in the categories WET and FURRY.

An idealization of this difference between common nouns and dimensional adjectives is depicted in Figures 2 and 3. Relatively small noun categories fill all reaches of the space but rarely overlap with one another. In contrast, the extensions of dimensional adjectives create a dense grid-work of overlapping slabs that cut through the space in multiple directions as illustrated. Again, under the ordinary assumptions of similarity-based learning, these differences in category structure favor nouns: between-category similarity among nouns is minimal but between-category similarity among adjectives is great.

Figure 2: Noun Categories. Only three dimensions from the hyperspace of possible dimensions are shown. Noun categories tend to be small and compact and not to overlap with one another.

2.1.3 Linguistic Associations

Nouns and adjectives also differ in their association with the linguistic form of questions about objects. Different words, for example what is it? versus what color is it?, are used to ask about object categories and object properties. Dimensional adjectives also differ among themselves in this regard: what color is it? asks for a color word as an answer; how does it feel? asks for a description of texture. Backscheider & Shatz (1993) have shown that young children are sensitive to these associations between questions and the class of possible answers prior to their understanding of the meanings of the individual words. Thus in learning common nouns and adjectives, learners do not just map objects to words but they also map linguistic inputs to linguistic outputs.

It is not immediately clear whether these word-to-word associations favor nouns or adjectives. However, given the overlap among the to-be-learned categories, we can be certain that they are crucial to learning. A big, red, furry dog is a member of the category BIG, the category RED, the category FURRY, and the category DOG. It is the linguistic input, the question "what is it?" or "what color is it?," that specifies the relevant class of linguistic outputs.

These word-to-word maps partition all the categories that the child is learning into larger proto-syntactic categories: "noun categories," "color categories," "size categories," and "texture categories." In stages of incomplete learning, do these word-to-word maps also create a distinction between nouns and adjectives such that adjectives are confused across dimensions but are not confused with nouns?

In what follows, we demonstrate that a simple associative device that approaches the task of learning about nouns and adjectives in the very same way will nonetheless show a noun advantage and also the pattern of within-category confusions shown by children. In addition, we separately investigate the roles of category shape, volume, overlap, and word-word associations in forming this developmental trajectory.
3 A Connectionist Categorizer

To test our hypothesis that the noun advantage in early acquisition derives from the associative structure of the learning task, we used the most common similarity-based learning procedure in the literature, a three-layer connectionist network trained with back-propagation. Such a general learning device embodies no prior knowledge about differences between nouns and adjectives, and learning is purely associationist and error-driven. As in several other recent modeling studies (Plunkett, Sinha, Møller, & Strandsby, 1992; Schyns, 1992), we investigate the behavior of a simple connectionist network which is trained to label a set of patterns representing perceptual inputs to the system. The
Figure 3: Adjective Categories. Only three dimensions are shown. Adjective categories tend to be large and elongated and to overlap with one another.
goal in these studies is to show how the facts of lexical development emerge from the interaction between the learning device and the regularities inherent in the input patterns. In our case, the relevant facts concern the relative ease of learning nouns and adjectives, and the regularities in the patterns concern differences in the way noun and adjective categories carve up the space of input dimensions and co-occur with particular linguistic contexts.

The main difference between our network and other simple connectionist models is our use of a modified form of back-propagation. Back-propagation is suitable in that early word learning in children is "supervised." Adults ask children questions about objects (e.g., "what is that?," "what color is that?") and they provide feedback (e.g., "that's not a dog; it's a horse") (Callanan, 1990; Mervis, 1987; Snow, 1977; Wood, 1980). Supervision for categorization tasks such as our word-learning task, as typically realized in connectionist networks, however, is psychologically unlikely. If separate output units represent the different category responses, standard back-propagation changes the connection weights on each learning trial in a way that encourages the correct response and discourages all other potential responses. This is like the parent saying to the child, "This is a dog, not a plate, not a cat, not an apple, not a house..." Parents do not do this but instead explicitly reinforce correct answers ("yes, that's a doggy") and provide negative feedback only when the child explicitly gives the wrong answer ("that's not a doggy; it's a horse"). This form of back-propagation is also inappropriate in the present case because in
the combined task of naming objects and labeling their attributes, possible responses are not just right or wrong. There are kinds and degrees of wrongness. Consider a big, black, wet dog and the question "what color is it?" The answers "dog" and "red" are both wrong. However, it seems unlikely that parents would respond to these errors in the same way. A toddler who answers the question "what color is it?" by correctly naming the dog "dog" seems likely to hear a parental response of "yes, it's a dog, a black dog." A toddler who answers the same question by saying "red" is likely to hear, instead, a parental response of the sort "it's not red, it's black."

Accordingly, we modified the back-propagation algorithm to fit these assumptions about the kinds of feedback provided by parents. Briefly, we provided targets only for a limited number of output words, and we distinguished the kinds of errors by using distinct targets for them. In the next two sections, we provide a detailed description of the network and the learning rule.

3.1 The Network Architecture

Figure 4 shows the network architecture. Each thin arrow represents complete connectivity between two layers of processing units. The network is designed to take objects and a linguistic context as inputs and to produce a noun or adjective as output. Inputs to the network are presented to two layers of processing units, one for the representation of the object itself and one for a linguistic context corresponding to a question the network is asked.

Input objects consist of patterns of activation representing a perceptually present object in terms of a set of sensory dimensions. For the simulations discussed in this paper, the inputs are specified in terms of four or five dimensions. We require that the network learn to associate points along each dimension with particular words, so the simplest possible representation of a dimension, that is, a single unit, is excluded because it would only permit the association, to different degrees, of the dimension as a whole with each word. Therefore each dimension takes the form of a group of units in the input layer of the network. That is, input to the network along a given dimension consists of a vector of numbers, each between the minimum and maximum activation values of the units in the input layer of the network.

There are several ways to represent dimensional input in the form of a vector, varying in the extent to which they make explicit the ordering of points along the dimension. At one extreme is a completely localized encoding, in which each dimensional vector contains one maximum value and the remainder of the numbers take on the minimum value. This form of encoding completely obscures ordering along the dimension because there is no correlation between the numbers in different positions in the vector (or the activations of units in each dimension group). At the other extreme is a "thermometer" encoding (Harnad, Hanson, & Lubin, 1991). In a thermometer representation, each of the positions in the vector corresponds to a point along a scale, and the value to be encoded normally falls between two of the positions. All of those positions to the "right" of this point take on their minimum values, the first position to the "left" of this point takes on an intermediate value, and all of the other leftward positions take on their maximum values.
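The two extremes of dimensional encoding described in this section, localized and thermometer, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the number of units per dimension, the activation bounds `lo=0.0` and `hi=1.0`, and the choice of the fractional remainder as the intermediate value are all hypothetical and are not specified in the text.

```python
def localist(value, n_units, lo=0.0, hi=1.0):
    """Localized encoding: one unit at maximum, all others at minimum.

    `value` is assumed to lie in [0, 1]. Ordering along the dimension
    is not visible in the resulting vector.
    """
    index = min(int(value * n_units), n_units - 1)
    return [hi if i == index else lo for i in range(n_units)]

def thermometer(value, n_units, lo=0.0, hi=1.0):
    """Thermometer encoding of a value assumed to lie in [0, 1].

    Units to the left of the value are at maximum, units to the right
    are at minimum, and the first unit to the left of the value takes
    an intermediate activation (here, the fractional remainder).
    """
    scaled = value * n_units   # position of the value on the unit scale
    full = int(scaled)         # number of units at maximum
    frac = scaled - full       # how far the value extends past unit `full`
    vector = []
    for i in range(n_units):
        if i < full:
            vector.append(hi)
        elif i == full:
            vector.append(lo + frac * (hi - lo))
        else:
            vector.append(lo)
    return vector

print(localist(0.625, 4))      # [0.0, 0.0, 1.0, 0.0]
print(thermometer(0.625, 4))   # [1.0, 1.0, 0.5, 0.0]
```

In the thermometer vector, nearby values on the dimension produce similar vectors, whereas two localized vectors for different units share no active position.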
Figure 4: Network architecture. The two input layers are labeled Object and Linguistic Context.
Note that, because the network is given no actual syntactic context, the noun context (what is it?) is indistinguishable from the adjective contexts (what color is it?, etc.) at the start of training. In terms of the network's architecture, there are just several equally different linguistic context inputs that might be viewed as corresponding to noun, color, size, and texture. There is no hierarchical organization of the adjective terms in the architecture; that is, there is nothing that groups the adjectives as a class in opposition to the nouns. Critically, from the perspective of the network, there is also no distinction between the input activation that corresponds to the object and that which corresponds to the question. From the network's point of view, there is just one input vector of 66 numbers jointly specifying an event in the world in terms of the five perceptual dimensions and the linguistic context input that co-occurs with the presentation of the object.

The hidden layer of the network compresses the input patterns into a smaller set of units, 15 to 24 units in the experiments we report here.[3] Thus at this level, the system no longer has direct access to the input dimensions. This is an important aspect of the architecture and an important theoretical claim. It means that input dimensions that are distinct at input are not (at least not without learning) represented separately. This aspect of the architecture is based on considerable research indicating that young children have difficulty attending selectively to individual dimensions (Aslin & Smith, 1988) and on our past use of this architecture to model developmental changes in selective attention to dimensions (Gasser & Smith, 1991; Smith, 1993). We will discuss more fully the wider implications of this aspect of the network in the General Discussion.

[3] Increasing the number of units in the hidden layer of the network both speeds up performance and leads to improvement in the asymptotic level of performance.

The output layer consists of a single unit for each adjective and noun. A +1 activation on an output unit represents the network's labeling the input object with the corresponding word. A -1 activation represents the network's decision that the corresponding word is inappropriate for the input object, and a 0 activation represents an intermediate response, one that might be made if an object is described by the category but that is not an appropriate answer to the linguistic input question, for example, if "red" were the response to the question "what is it?" for a red dog.

3.2 The Learning Rule

The specific learning rule used operates as follows. During training, a target is associated with each input pattern; this target represents the appropriate response to the input. In ordinary back-propagation, each output unit receives a target on each trial. But, as noted above, this is an implausible procedure, as it means that all possible responses which are not appropriate are punished. Further, as noted above, not all wrong answers are wrong in the same way, and they are unlikely to be responded to in the same way by parents. Accordingly, we give the network feedback for only two sorts of words, the correct word and any incorrect words to which the network has made a significant response. We defined a "response threshold" for the word units, 0.05 in all of the experiments reported on here; only activations above this threshold are treated as overt responses for which feedback is possible. Further, the target for these explicit errors depends on the input as follows.

1. The target for a correct response is +1.
2. For a response which is not a correct label for the input object under any circumstances (e.g., "small" for a large, red object), the target for the corresponding output unit is -1.
3. For a response which would be a correct label for the input object if it matched the lexical dimension input (e.g., "large" for a large, red object when the input question is "what color is it?"), the target for the corresponding output unit is 0.
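The target assignment described by the three rules above can be sketched as a small function. This is an illustrative reading of the rule, not the original implementation: the function name `feedback_targets`, the dictionary representation of output activations, and the example activations are assumptions. Output units that do not appear in the returned dictionary receive no target and therefore contribute no error on that trial.

```python
RESPONSE_THRESHOLD = 0.05  # threshold value reported in the text

def feedback_targets(activations, correct_word, true_of_object,
                     threshold=RESPONSE_THRESHOLD):
    """Return {word: target} for the output units that receive feedback.

    activations    -- dict mapping each word to its output activation
    correct_word   -- the word that answers the question that was asked
    true_of_object -- set of all words that describe the object,
                      regardless of the question (includes correct_word)
    """
    # Rule 1: the correct word always receives a target of +1.
    targets = {correct_word: 1.0}
    for word, activation in activations.items():
        if word == correct_word or activation <= threshold:
            continue  # no overt response, so no feedback for this unit
        if word in true_of_object:
            targets[word] = 0.0   # Rule 3: true of the object, wrong question
        else:
            targets[word] = -1.0  # Rule 2: not a correct label at all
    return targets

# A large, red object with the question "what color is it?".
# The activations are made up for illustration.
activations = {"red": 0.3, "large": 0.4, "small": 0.2, "dog": 0.01}
print(feedback_targets(activations, "red", {"red", "large"}))
```

In this example "dog" stays below the response threshold and so receives no target, whereas standard back-propagation would have pushed it toward -1 on every trial.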
4 Experiments
4.1 Experiment 1: Nouns vs. Adjectives in General

In Experiment 1, we investigate how this simple three-layer network simultaneously learns many categories organized to be like nouns and to be like adjectives with respect to the properties of shape, volume, overlap, and number of different categories. The central question is whether there will be a noun advantage early in learning and whether, prior to complete learning, the network will show partial knowledge that nouns and adjectives are distinct classes of words.
4.1.1 Stimuli
The input to the network consisted of an object described on ve perceptual dimensions and the question accompanying the object. The input objects were instances of 30 possible categories. Each input object had a value for each of the ve perceptual dimensions, and each category was de ned in terms of the range of values that its instances could take along each of the dimensions. Twenty of these categories were organized to be noun-like and 10 were organized to be adjective-like. Each noun was de ned in terms of a range of 1/10 of the possible values along each of the ve input sensory dimensions. Each adjective category was de ned in terms of a range of 1/5 of the possible values along one of the input dimensions and any value along the other four. Thus each noun spanned 1 10 1 10 1 10 1 10 1 10= 0 00001 of the multi-dimensional space of all possible categories whereas each adjective spanned 1 5 of the space. Table 1 shows ranges of possible values on the ve dimensions for two of the noun and three of the adjective categories. Note that the noun categories may overlap on one or more dimensions (dimensions 2 and 5 in the example categories). No noun categories overlap completely, however. This is not so for the adjective categories. In Table 1, adjective 1 overlaps with both adjective 2 and 3 because it is possible to create an object which is an instance of both adjective 1 and adjective 2 or both adjective 1 and adjective 3. The ten adjective categories were organized into ve lexical dimensions by association with the speci c input dimension whose values were constrained within the adjective category and by association with a speci c linguistic context input, e.g.,\what size=====:=
         Perceptual Dimensions
         v1             v2             v3               v4               v5
Noun 1   0.9 < v1 < 1   0 < v2 < 0.1   0 < v3 < 0.1     0 < v4 < 0.1     0 < v5 < 0.1
Noun 2   0 < v1 < 0.1   0 < v2 < 0.1   0.4 < v3 < 0.5   0.4 < v4 < 0.5   0 < v5 < 0.1
Adj 1    any            any            any              0.8 < v4 < 1     any
Adj 2    0 < v1 < 0.2   any            any              any              any
Adj 3    0.8 < v1 < 1   any            any              any              any

Table 1: Experiment 1: Ranges of Values on Perceptual Dimensions for 5 Input Objects. v1, etc. represent the values on the five dimensions. Each range is expressed in terms of proportions of the distance from the minimum to the maximum value.
is it?" Thus the ten adjectives were structured into five dimensions each with two contrasting terms.4 In Table 1, adjectives 2 and 3 belong to the same lexical dimension.

For each training instance, the inputs were generated as follows. First an output category was selected at random from the set of 30 possible outputs (the 20 nouns and the 10 adjectives). The selection of the relevant output determined the linguistic context input. Then for each of the five perceptual dimensions, a possible value was picked at random consistent with the selected output. The linguistic context input consisted of the pattern representing a question that would be appropriate for the selected category, each question corresponding to a lexical dimension. For example, if the category was big, the input unit representing "what size is it?" was turned on (that is, its output was set to 1.0), and the other linguistic context units were turned off. If the category was dog, the input unit representing "what is it?" was turned on, and the other linguistic context units were turned off. Because there was randomness in the selection of output categories and corresponding input objects, because the input objects varied continuously, and because the targets depended in part on the network's response, the network was never trained more than once on a particular input-target pair.
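The generation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The placement of the 20 noun regions (random, decile-aligned, all distinct), the placement of the two terms of each adjective dimension at the lowest and highest fifth (as Adj 2 and Adj 3 are placed in Table 1), the numbering of the context units, and all function names are hypothetical. The construction of training targets, which depended in part on the network's response, is not modeled here.

```python
import random

N_DIMS = 5  # five perceptual dimensions, as in Experiment 1


def make_categories(rng):
    """Build 20 noun-like and 10 adjective-like categories.

    Each category holds one (low, high) range per perceptual dimension.
    Context unit 0 stands for "what is it?"; units 1..5 stand for the
    five adjective questions (assumed numbering).
    """
    # Assumption: noun regions are decile-aligned and pairwise distinct,
    # so no two nouns overlap on all five dimensions.
    deciles = set()
    while len(deciles) < 20:
        deciles.add(tuple(rng.randrange(10) for _ in range(N_DIMS)))

    categories = []
    for cell in sorted(deciles):
        ranges = [(d / 10, (d + 1) / 10) for d in cell]  # width 1/10 each
        categories.append({"pos": "noun", "context": 0, "ranges": ranges})

    # Assumption: each lexical dimension has a low term and a high term.
    for dim in range(N_DIMS):
        for low, high in ((0.0, 0.2), (0.8, 1.0)):  # width 1/5
            ranges = [(0.0, 1.0)] * N_DIMS            # "any" elsewhere
            ranges[dim] = (low, high)
            categories.append({"pos": "adj", "context": dim + 1, "ranges": ranges})
    return categories


def volume(category):
    """Proportion of the input space covered by a category."""
    v = 1.0
    for low, high in category["ranges"]:
        v *= high - low
    return v


def sample_trial(categories, rng):
    """Pick a category at random, then an object consistent with it."""
    target = rng.randrange(len(categories))
    cat = categories[target]
    obj = [rng.uniform(low, high) for low, high in cat["ranges"]]
    context = [0.0] * (N_DIMS + 1)
    context[cat["context"]] = 1.0  # one question unit on, the rest off
    return obj, context, target


if __name__ == "__main__":
    rng = random.Random(0)
    cats = make_categories(rng)
    print(volume(cats[0]), volume(cats[-1]))
    print(sample_trial(cats, rng))
```

The `volume` helper reproduces the arithmetic in the text: a noun covers roughly 0.00001 of the space and an adjective roughly 1/5, up to floating-point rounding.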
4.1.2 Method
On each training trial, the network was presented with an input (object plus linguistic context), generated as just described, and an appropriate target on the output. The weights in the network, other than those feeding output units for which no targets were available, were then adjusted according to the back-propagation algorithm. Following each presentation of 1000 input patterns, the network was tested on 500 novel inputs generated in the same fashion as the training patterns. There are several options for evaluating the network's performance. We chose to look only at the output unit with the highest activation, unless this unit's activation was not above the response threshold, in which case the network was viewed as not making any overt response at all. Our assumption was that production processes not modeled in our network would

4 As we will see in subsequent experiments, the noun advantage in the network does not depend on there being only two terms for each adjective dimension.
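The response rule used for evaluation can be illustrated with a short Python sketch. The threshold value of 0.5 and the function name are assumptions made for illustration; this section does not state the value of the response threshold.

```python
def overt_response(activations, threshold=0.5):
    """Return the index of the most active output unit, or None.

    The network is treated as making no overt response when even the
    most active unit is not above the response threshold.
    The default threshold of 0.5 is a hypothetical value.
    """
    best = max(range(len(activations)), key=lambda i: activations[i])
    if activations[best] > threshold:
        return best
    return None


if __name__ == "__main__":
    print(overt_response([0.10, 0.85, 0.30]))  # unit 1 is above threshold
    print(overt_response([0.10, 0.20, 0.30]))  # no unit is above threshold
```

Under this rule a test item counts as an error only when the network produces an overt response that differs from the target; a below-threshold output is a non-response.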
[Figure: Performance (0.00 to 1.00) plotted against Training Patterns x 10^3 (0.00 to 2.00); curve label: Adjectives.]
                   0 Training Patterns           1000 Training Patterns
Incorrect output   Noun Context   Adj Context    Noun Context   Adj Context
Nouns              .66            .34            .65            .35
Adjectives         .70            .30            .37            .63

Table 2: Experiment 1: Within- and Between-Part-of-Speech Errors. Figures represent the proportion of incorrect overt responses in different part-of-speech categories.
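The two error measures used in this experiment, within-part-of-speech errors and within-dimension errors, can be tallied as in the Python sketch below. The `Category` record, the function names, and the toy trials are hypothetical; the sketch only illustrates the definitions and does not reproduce the figures in Table 2.

```python
from typing import NamedTuple, Optional


class Category(NamedTuple):
    word: str
    pos: str                  # "noun" or "adj"
    dimension: Optional[str]  # lexical dimension for adjectives, else None


def within_pos_error_rate(trials):
    """Among incorrect overt responses, the proportion whose part of
    speech matches that of the target. Non-responses (None) are skipped."""
    errors = [(t, r) for t, r in trials if r is not None and r != t]
    if not errors:
        return None
    return sum(r.pos == t.pos for t, r in errors) / len(errors)


def within_dimension_error_rate(trials):
    """Among adjective targets answered with an incorrect adjective, the
    proportion of responses on the target's own lexical dimension.
    Noun targets and noun responses do not contribute."""
    adj_errors = [
        (t, r) for t, r in trials
        if r is not None and r != t and t.pos == "adj" and r.pos == "adj"
    ]
    if not adj_errors:
        return None
    return sum(r.dimension == t.dimension for t, r in adj_errors) / len(adj_errors)


# Toy data (illustrative only): (target, response) pairs.
BIG = Category("big", "adj", "size")
SMALL = Category("small", "adj", "size")
WET = Category("wet", "adj", "wetness")
DOG = Category("dog", "noun", None)
CUP = Category("cup", "noun", None)

TRIALS = [
    (BIG, SMALL),  # wrong adjective, same dimension
    (BIG, WET),    # wrong adjective, different dimension
    (BIG, DOG),    # between-part-of-speech error
    (DOG, CUP),    # wrong noun
    (DOG, DOG),    # correct
    (WET, None),   # no overt response
]

if __name__ == "__main__":
    print(within_pos_error_rate(TRIALS))
    print(within_dimension_error_rate(TRIALS))
```

With two contrasting terms per dimension, a within-dimension error is necessarily a response with the opposite term of the pair.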
different dimensional terms, analogous to knowing, for example, that wet and dry are attributes of one kind and that rough and smooth are attributes of another kind? These are important questions because children show clear evidence of the first distinction in their early errors but not of the second (see Carey, 1994; Smith, 1984; Smith & Sera, 1992; but see Backscheider & Shatz, 1993).

To answer the first question, we defined "within-part-of-speech errors" as the proportion of cases with an incorrect response (above threshold) for which the response was the correct "part of speech" (adjective or noun). Table 2 shows the proportion of within- and between-part-of-speech errors at the start of learning and after 1000 training trials. At the start of learning, when the network knows nothing, the relative frequency of noun and adjective responses (2:1) corresponds to the relative number of noun and adjective output units (2:1) and is unrelated to the linguistic context input. However, as learning progresses, the character of the error becomes associated with the linguistic input that specifies the class of possible answers. After 1000 training trials, when the network has not yet fully acquired the adjective terms, the network shows implicit knowledge that all the adjectives form a class.

To answer the second question, we defined "within-dimension errors" as the proportion of cases in which adjective questions received incorrect adjective responses and the response was on the right dimension. Noun questions and noun responses to adjective questions did not contribute to this measure. At the start of training, such within-dimension errors were rare, occurring .08 of the time. The frequency of within-dimension errors increased with training, reaching a maximum of .23 of the time after 2000 trials. Thus the network shows little implicit knowledge of which terms refer to attributes on the same dimension.

4.1.4 Discussion
The central result of this simulation is that a simple connectionist network, when simultaneously trained on adjective-like and noun-like categories, learns the nouns faster, just as children do. Yet this difference is not due to any built-in preferences on the part of the network nor to any pre-training representation of a difference between nouns and adjectives. It is due entirely to the similarity structure inherent in the learning task, that is, to the nature of the categories which the network learns and the linguistic input which specifies which of several classes of overlapping categories is the relevant one. In brief, a learner can show a marked advantage for the learning of one kind of category over another without any built-in distinction between them. The developmental precedence of nouns over adjectives in children thus need not derive from a priori conceptual distinctions, as commonly assumed, but rather from quite general similarity-based learning mechanisms.

During the course of learning, the network, like young children, also exhibits a structured pattern of errors: dimensional terms are confused with each other and not with nouns. This distinction emerges as a consequence of simultaneously learning not a single adjective class but several different adjective categories. The most likely possibility is that this is accomplished by the rapid learning of noun categories. That is, what the network "really knows" may essentially be that adjectives are "not nouns." The implication is that this may be all that young children know too (see Smith, 1995 for a similar suggestion based on empirical evidence from children).

The network did not show strong learning of the connection between pairs of terms on a single dimension. This is also consistent with the evidence from children. With the exception of color terms, between-dimension rather than within-dimension confusions characterize children's initial errors (Backscheider & Shatz, 1993; Carey, 1994; Smith & Sera, 1992).

This experiment thus demonstrates the viability of a similarity-based approach to the noun advantage in children's early lexical acquisitions. In the following experiments, we examine the specific contributions of the volume and shape of category extensions, overlap, and word-word associations in creating the noun advantage by examining unnaturally structured classes of categories that differ only in their volume, shape, overlap, or associations between linguistic context inputs and outputs.

4.2 Experiment 2: Category Volume
In this experiment, we investigate the role of volume differences. We create small categories and large categories that are both like nouns in being defined by similarities on many dimensions. We ask whether smaller categories of this kind have an advantage over larger ones.
4.2.1 Stimuli and method
Stimuli for this experiment were generated analogously to those in Experiment 1. There were two types of categories, those which spanned relatively wide regions of the space of all possible input objects and those which spanned relatively narrow regions. Both the Small set and the Large set contained 18 words. In the Small set, each word was defined in terms of a range of 1/6 of the possible values along each input dimension. Thus the extension of each of these categories covered 1/6 × 1/6 × 1/6 × 1/6 = 0.00077 of the space of possible inputs. In the Large set, each word was defined in terms of a range of 1/3 of the possible values along each input dimension, a total of 1/3 × 1/3 × 1/3 × 1/3 = 0.012 of the space of possible objects, that is, 16 times the size of the region occupied by the extension of each of the categories in the Small set. Note that the volumes of the two sets are closer than in the first experiment. The Large and Small categories overlapped in the space of all possible categories. Two linguistic context inputs were used to signal the relevant kind of category, one for which the Large-volume words were