Principia Cybernetica Web

Cluster Analysis of Word Associations

The associative network of 150 nouns resulting from our adaptive hypertext experiment was analysed by means of cluster analysis. This statistical technique, which is equivalent to some of the methods proposed for knowledge structuring, finds higher-order classes or categories containing similar nodes or, in this case, semantically related words.

Although we selected the words to be a priori independent, by not restricting them to any particular domain, the network's evolution discovered a number of strongly related semantic "families". A cluster analysis of the matrix of connections, which used powers of that matrix to find strongly interconnected groups of nodes, revealed a number of stable and separable clusters corresponding to highly general categories. The following 9 clusters of associated words, each denoted by an intuitive label for the underlying conceptual category, were found in the second experiment's final structure:

"Time":
age, time, century, day, evening, moment, period, week, year
"Space":
place, area, point, stage
"Movement":
action, change, movement, road, car
"Control":
authority, control, power, influence
"Cognition":
knowledge, fact, idea, thought, interest, book, course, development, doubt, education, example, experience, language, mind, name, word, problem, question, reason, research, result, school, side, situation, story, theory, training, use, voice
"Intimacy":
love, family, house, peace, father, friend, girl, hand, body, face, head, figure, heart, church, kind, mother, woman, music, bed, wife
"Vitality":
boy, man, life, health
"Society":
society, state, town, commonwealth
"Office":
building, office, work, room
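
The idea of using powers of the connection matrix to find strongly interconnected groups can be illustrated with a minimal sketch. It is not the procedure actually used in the experiment, whose details are not given here. The sketch assumes that the sum W + W^2 + ... + W^k measures direct plus indirect association along paths of up to k links, and that two words belong together when this summed strength exceeds a threshold in both directions. The link weights, the path length, the threshold and all function names are hypothetical.

```python
# Illustrative sketch only: weights, path length and threshold are invented,
# and this is not claimed to be the procedure used in the experiment.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def summed_powers(w, max_len):
    """Return W + W^2 + ... + W^max_len (strength over paths up to max_len links)."""
    n = len(w)
    total = [row[:] for row in w]
    power = [row[:] for row in w]
    for _ in range(max_len - 1):
        power = mat_mul(power, w)
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return total

def clusters(words, w, max_len=3, threshold=1.0):
    """Group words whose summed path strength reaches the threshold both ways."""
    s = summed_powers(w, max_len)
    n = len(words)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:  # depth-first search over mutually strong pairs
            i = stack.pop()
            group.append(words[i])
            for j in range(n):
                if j not in seen and s[i][j] >= threshold and s[j][i] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups

# Hypothetical link strengths; row = source word, column = target word.
WORDS = ["day", "week", "year", "power", "control", "authority"]
W = [
    [0.0, 0.8, 0.6, 0.1, 0.0, 0.0],
    [0.7, 0.0, 0.9, 0.0, 0.0, 0.0],
    [0.5, 0.6, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
]

print(clusters(WORDS, W))
```

On this toy matrix the weak one-way link from "day" to "power" is not enough to merge the two groups, so the sketch separates the "Time" words from the "Control" words. Raising the threshold splits clusters into smaller groups; lowering it merges them.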

Although the learning algorithms only work on links and not on groups of nodes, the resulting clusters fit intuitive categories remarkably well. With rare exceptions (e.g. "side" in the "Cognition" cluster), all of these words seem to be located in the right class. This again seems to confirm that the set-up achieves its aim of absorbing the common semantics of a heterogeneous group of users. The "Cognition" cluster contains about a third of all clustered words (29 of the 83 listed above), indicating its central position and importance in the network. This prominence may, however, be due to the specific selection of texts in the LOB corpus, from which we took the most frequent words and which may have been biased towards more "intellectual" activities.

Note also that at least the first 5 categories clearly correspond to fundamental philosophical or cybernetic concepts (thus, they are also "categories" in the Kantian sense). It is these kinds of fundamental, "ontological" concepts that the Principia Cybernetica Project attempts to define. Whereas the project's methods basically consist of traditional philosophical analysis and theorizing, the present experiment shows that such a semantic analysis may also be carried out automatically, by using the structure implicit in networks of intuitive associations.


Copyright © 1997 Principia Cybernetica

Author
J. Bollen & F. Heylighen

Date
Jan 22, 1997 (modified)
Dec 9, 1996 (created)
