Bootstrapping Methods for Knowledge Structuring
Principia Cybernetica Web


Knowledge, expressed as a network of nodes and links, can be structured better by bootstrapping the distinctions between nodes, which leads to the merging, differentiation or integration of ambiguously distinguished concepts.


Assume that we have a mental model in the form of a semantic network, which, for example, has been elicited from an individual or a group of users. How can we make sure that the model is as simple, as complete and as coherent as possible? In other words, how can we optimize the structure of the knowledge expressed by the model? To support the necessary knowledge structuring, we have developed an algorithm based on a bootstrapping principle. "Bootstrapping", in this case, means improving the quality of knowledge by relying only on the knowledge that is already available in the model itself, without consulting external sources. The bootstrapping techniques assume that the model is organized as a network of nodes, representing concepts and instances, connected by a variety of semantic links, representing if-then rules and IS_A hierarchies. The algorithm searches for similarities between the sets of incoming and outgoing links of two nodes.
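
The similarity search can be illustrated with a minimal sketch. It is not the algorithm's actual implementation: the representation of the network as (source, link type, target) triples, the names `LINKS`, `link_profiles`, `overlap` and `ambiguous_pairs`, the use of the Jaccard index as the overlap measure, and the 0.8 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy model: (source, link type, target) triples.
LINKS = [
    ("pet", "IS_A", "animal"),
    ("pet", "IS_A", "lives with humans"),
    ("domesticated animal", "IS_A", "animal"),
    ("domesticated animal", "IS_A", "lives with humans"),
    ("Fido", "IS_A", "pet"),
    ("Fido", "IS_A", "domesticated animal"),
]


def link_profiles(links):
    """Map each node to the set of its links, tagged by direction."""
    profiles = defaultdict(set)
    for source, link_type, target in links:
        profiles[source].add(("out", link_type, target))
        profiles[target].add(("in", link_type, source))
    return profiles


def overlap(a, b):
    """Jaccard index: shared links divided by all links of either node."""
    return len(a & b) / len(a | b)


def ambiguous_pairs(links, threshold=0.8):
    """Return (node, node, score) for pairs whose link sets overlap strongly."""
    profiles = link_profiles(links)
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        score = overlap(profiles[a], profiles[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


for a, b, score in ambiguous_pairs(LINKS):
    print(f"{a!r} and {b!r} are ambiguously distinguished (overlap {score:.2f})")
```

On this toy model the sketch flags two pairs: "domesticated animal" and "pet", but also "animal" and "lives with humans", since nothing in these six links distinguishes the latter two either.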

For example, suppose that we have two concepts, "pet" and "domesticated animal". Suppose the superclasses (outgoing IS_A links) of these concepts are the same: both are "animals" and both "live with humans". Suppose now that their subclasses or instances are also the same: "Fido" is a "pet" but is also a "domesticated animal". The algorithm interprets this situation as an ambiguity that can be resolved in one of three ways:

Identification (merging):
"pet" and "domesticated animal" might be considered as synonyms within the model, and the two different nodes should be identified, leaving a single node "pet-domesticated animal".
Differentiation:
The two concepts are actually different, but the model lacks the links to distinguish them, e.g. "pet" should have the property "lives in the house", which "domesticated animal" lacks; or, there are instances of "domesticated animal", such as "Moo the cow", which are not instances of "pet". In that case, the algorithm asks the user to provide the missing information.
Integration (clustering):
Even if the two concepts differ in details, they may have so many properties in common that it is worth integrating them into a new higher-order concept. For example, both concepts are special cases of the larger category of "animals that are dependent on people", which also includes some other categories, such as "rats" and "sparrows". In that case, the algorithm may cluster the concepts that have a high overlap in their links, and suggest that cluster to the user as a new concept.
Each of these operations will change the pattern of nodes and links in the mental model, and thus elicit a new round of searching for similarities. For example, if two nodes A and B are identified, two nodes C and D that were previously distinguished by their links to A and B respectively will now point to the same node A-B. Therefore, C and D may themselves need to be identified, differentiated or integrated. Thus, bootstrapping operations will cascade through the network, triggering an ongoing process of restructuring, until all ambiguities have been resolved.
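
The cascade in the A, B, C, D example can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm itself: it resolves every ambiguity by identification, whereas the algorithm described above would let the user choose between identification, differentiation and integration, and it only treats nodes as ambiguous when their link sets are exactly equal. The names `merge_nodes`, `profile`, `find_indistinguishable` and `cascade`, the triple representation, and the extra node "X" are hypothetical.

```python
def merge_nodes(links, a, b):
    """Identify nodes a and b: redirect all their links to one merged node."""
    merged = f"{a}-{b}"

    def rename(node):
        return merged if node in (a, b) else node

    return {(rename(s), kind, rename(t)) for s, kind, t in links}


def profile(links, node):
    """Outgoing and incoming links of a node, as hashable sets."""
    outgoing = frozenset((kind, t) for s, kind, t in links if s == node)
    incoming = frozenset((kind, s) for s, kind, t in links if t == node)
    return outgoing, incoming


def find_indistinguishable(links):
    """Return one pair of nodes with identical link sets, or None."""
    nodes = sorted({s for s, _, _ in links} | {t for _, _, t in links})
    seen = {}
    for node in nodes:
        key = profile(links, node)
        if key in seen:
            return seen[key], node
        seen[key] = node
    return None


def cascade(links, a, b):
    """Merge a and b, then keep merging until no ambiguity remains."""
    links = merge_nodes(set(links), a, b)
    history = [(a, b)]
    # Each merge removes one node, so the loop terminates.
    while (pair := find_indistinguishable(links)) is not None:
        links = merge_nodes(links, *pair)
        history.append(pair)
    return links, history


# C and D are distinguished only by their links to A and B.
LINKS = [
    ("C", "IS_A", "A"),
    ("D", "IS_A", "B"),
    ("A", "IS_A", "X"),
    ("B", "IS_A", "X"),
]

final_links, history = cascade(LINKS, "A", "B")
print(history)
print(sorted(final_links))
```

Before the first merge no two nodes have identical links, because A and B are told apart by C and D and vice versa. Once A and B are identified, C and D both point only to A-B, so the second round merges them as well.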

Bootstrapping can be applied not only to mental models derived from individual knowledge, but also to the collective mental models that are implicit in the structure of the web. For example, two web documents may have a high overlap in both their incoming and outgoing links, or their semantic associations. This may mean: 1) the two documents are merely copies, stored at different addresses, of the same text, and should be treated by a web agent as identical; 2) the pattern of linkages does not sufficiently reflect the intrinsic differences between the documents, and therefore it is worth creating additional links that differentiate them; 3) the documents belong to a class of similar documents, and it is worth generating a new, "overview" or "index" document, that groups links to all such documents in a single place.
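
For the web case, the third option (grouping similar documents under an index) might be sketched as below. Everything here is an assumption for illustration: the page names, the `PAGES` dictionary of outgoing links, the function names `overlap` and `suggest_index_groups`, the Jaccard index as overlap measure, the 0.5 threshold, and the choice to treat a document with no links in a given direction as providing no evidence of similarity.

```python
from itertools import combinations


def overlap(a, b):
    """Jaccard overlap of two link sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_index_groups(outgoing, threshold=0.5):
    """Group documents whose outgoing AND incoming links overlap strongly."""
    # Derive each document's incoming links from the outgoing ones.
    incoming = {doc: set() for doc in outgoing}
    for doc, targets in outgoing.items():
        for target in targets:
            if target in incoming:
                incoming[target].add(doc)

    # Union-find: documents joined by a similar pair share one root.
    parent = {doc: doc for doc in outgoing}

    def find(doc):
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]  # path halving
            doc = parent[doc]
        return doc

    for a, b in combinations(sorted(outgoing), 2):
        if (overlap(outgoing[a], outgoing[b]) >= threshold
                and overlap(incoming[a], incoming[b]) >= threshold):
            parent[find(a)] = find(b)

    groups = {}
    for doc in outgoing:
        groups.setdefault(find(doc), []).append(doc)
    # Only groups of two or more documents are candidates for an index page.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical site: each page mapped to the pages it links to.
PAGES = {
    "hub.html": {"cats.html", "dogs.html", "about.html"},
    "portal.html": {"cats.html", "dogs.html"},
    "cats.html": {"animals.html", "pets.html"},
    "dogs.html": {"animals.html", "pets.html"},
    "about.html": set(),
    "animals.html": set(),
    "pets.html": set(),
}

print(suggest_index_groups(PAGES))
```

Whether a suggested group is really a set of copies, a pair that needs differentiating links, or a class that deserves an index document cannot be decided from the link pattern alone; that choice would still rest with the user or a web agent with access to the document contents.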

Copyright © 1999 Principia Cybernetica

Author
F. Heylighen

Date
Mar 29, 1999 (modified)
Aug 2, 1994 (created)
