Unsupervised adaptive clustering for data prospecting and data mining
By Ben A. Hitt, Ph.D. Senior Fellow, American Heuristics Corporation
There was a time, well within the scope of living memory, when the dream of market analysts was to have as much data about their prospective customers as they could possibly get. It seemed clear that if one knew everything possible about a customer or group of customers, the correct marketing approach would become evident. Now that dream has not only come true, but the mass of data concerning us all has become overwhelming, hence the emergence of data mining.
Indeed, it can be said that the marketer's dream has become something of a nightmare. The means of collecting data appear to have outpaced the ability to analyze it. What may have begun as an analytic backlog rapidly became a veritable flood, resulting in the present-day amorphous mass of both meaningful and meaningless facts.
It is now clear that means for sifting through this mass of data are required. It is also clear that the tools developed for gleaning information must be intelligent to some degree. The basic requirement is the location of useful nuggets of information in an otherwise chaotic dataspace. For example, a human trying to associate demographics in a database of one million records would quickly get lost in one or both of two ways.
First, the search would generate so many demographic profiles that the associations would rapidly grow beyond comprehension. Second, narrowing the objectives to reduce the risk of being overwhelmed would likely result in the loss of critical relationships.
Fortunately, a group of techniques has been developed to find information in the data glut and present it to an analyst in support of decision making. This set of tools supports a practice called data mining. The tools consist of a variety of statistical techniques, logical methods, neural networks and some new unsupervised adaptive clustering techniques.
The importance of non-linearity
The fact is, we humans generated the data glut and continue to do so. That is indeed the source of the chaos and the masking of those elusive nuggets of information. Humans do not behave linearly, but rather exhibit discrete patterns of behavior. Therefore, any attempt to extract information about human behavior using linear techniques will ultimately miss some, if not most, of the information contained in the data. The old jokes about the average American family are not without substance. None of us have fractional children.
The problem is compounded when one realizes that different lifestyles may exhibit similar external behaviors. A double-income, no-kids family may eat at a pizza parlor as frequently as a single mom with a couple of kids. The reasons are different but the behavior is the same. A linear modeling technique would miss the appropriate correlation. Statistical methods, for the most part, depend on linearity. There are statistical methods that account for and even take advantage of non-linearity, but they are highly sophisticated and work well, at least at this juncture, only in the hands of professional statisticians.
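The pizza parlor point can be illustrated with a small sketch. The records below are hypothetical, invented only for illustration and not drawn from any real database. They assume households with no children and households with two children visit often, while one-child households visit rarely. A linear measure (the Pearson correlation coefficient, computed here by hand) reports no relationship at all, while a simple breakdown by group shows a very clear one.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient: a purely linear measure of association."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical records: (children in household, pizza parlor visits per month).
records = [
    (0, 8), (0, 7), (0, 9),   # double-income, no-kids households
    (1, 2), (1, 3), (1, 1),   # one-child households
    (2, 8), (2, 9), (2, 7),   # households with a couple of kids
]

kids = [k for k, _ in records]
visits = [v for _, v in records]

# The linear view: correlation between number of children and visits.
r = pearson(kids, visits)

# The pattern view: average visits within each discrete group.
by_kids = {k: mean(v for kk, v in records if kk == k) for k in sorted(set(kids))}

print(f"linear correlation: {r:.2f}")
print(f"average visits by number of children: {by_kids}")
```

With this made-up data the correlation is zero even though the number of children plainly separates heavy visitors from light ones, which is the kind of discrete pattern a linear technique overlooks.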
Enter neural networks
A successful approach to modeling non-linear relationships has been the so-called neural networks. These algorithms are the result of cognitive science's attempts to understand and mimic learning and memory in the human brain. Humans are pattern recognizers by design. We are naturally able to recognize discrete patterns in our environment, correlate them with events and alter our behavior accordingly. Neural networks can, in a limited sense, do the same thing. One particular neural network type, the back-propagation algorithm, has performed very well in this regard and is now accepted as a reliable method for data mining.
However, it has its shortcomings. The major difficulty lies in the fact that the relationships between specific variables and the neural network results are difficult, at best, to explain. It would be beneficial to understand something of the pattern/outcome correlation to assist in an overall marketing approach, for example. Additionally, the range of data with which it was trained limits the neural network. New patterns in the data will likely be classified incorrectly. Finally, the neural network is a supervised technique. A cause and effect relationship is required, and historical outcomes must be known.
An issue that requires some attention is that an analyst may not always know what the relationships are or even if any exist. How can one tell if a database is worth mining in the first place? Supervised techniques cannot address this issue. The search of data for undisclosed relationships is often referred to as knowledge discovery. In keeping with the mining metaphor, however, I suggest the term data prospecting. The first goal here is to gain an understanding of the actual patterns of behavior captured in the data. Once that is done, correlation of those patterns of behavior with specific events can take place in a straightforward fashion.
Introducing unsupervised adaptive pattern recognition
There are now several useful tools that are applicable to data prospecting. These do not require knowledge of a relationship or historical outcomes. All that needs defining are the specifics of the pattern, i.e. the definition of the variables that define the pattern. These may be a set of demographic variables or data relating to buying history, or both. The unsupervised algorithms then sort through the data and classify records in the database according to similarities in the patterns. This is, in fact, a method of clustering. The newer techniques are also adaptive or vigilant in that they can recognize novel patterns as they appear in the data.
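The general idea of unsupervised, vigilant clustering can be sketched in a few lines. What follows is a minimal leader-style clustering routine written for illustration only. It is not the implementation of any named commercial or published algorithm, and the function name `leader_cluster`, the `vigilance` distance threshold, and the sample records are all assumptions. Each record joins the nearest existing pattern if it is close enough. Otherwise it is treated as a novel pattern and starts a new cluster.

```python
from math import dist


def leader_cluster(records, vigilance):
    """Assign each record to the nearest prototype, or start a new cluster
    when no prototype lies within the vigilance distance (illustrative sketch)."""
    prototypes = []  # running mean of each cluster's members
    counts = []      # number of members in each cluster
    labels = []      # cluster index assigned to each record
    for rec in records:
        # Find the closest existing prototype, if any.
        best, best_d = None, None
        for i, proto in enumerate(prototypes):
            d = dist(rec, proto)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is None or best_d > vigilance:
            # Novel pattern: nothing known is similar enough.
            prototypes.append(list(rec))
            counts.append(1)
            labels.append(len(prototypes) - 1)
        else:
            # Familiar pattern: nudge the prototype toward the new record.
            counts[best] += 1
            n = counts[best]
            prototypes[best] = [p + (x - p) / n
                                for p, x in zip(prototypes[best], rec)]
            labels.append(best)
    return labels, prototypes


# Hypothetical records: (age, income), each already scaled to the 0-1 range.
records = [
    (0.20, 0.30), (0.22, 0.28), (0.80, 0.90),
    (0.21, 0.31), (0.78, 0.88), (0.50, 0.05),
]

labels, prototypes = leader_cluster(records, vigilance=0.15)
print(labels)
print(len(prototypes), "patterns found")
```

No outcome column is supplied anywhere, which is what makes the approach unsupervised. The last record resembles neither earlier group, so it opens a third cluster rather than being forced into an existing one. A smaller `vigilance` value would yield more, tighter clusters, and a larger one fewer, looser clusters.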
A major advantage of all of these methods is that the weights are easily translated into real-world values. If relationships exist between variables, they are easily expressed in normal, understandable terms. The algorithms include Fuzzy Adaptive Resonance Theory, Lead Clustering or Feature Mapping, and American Heuristics' Adaptive Fuzzy Feature Map. They all provide for easily understood pattern recognition and allow for mapping into a supervised technique.
A detailed technical description of them is beyond the scope of this article. However, those who choose to prospect and mine data for information should be aware of their existence and look for tools that incorporate them or their capabilities. Proper application of these algorithms will reward the data prospector/miner with more comprehensive information to support the decision process.
Ben A. Hitt, Ph.D., has many years of experience using pattern recognition technologies and intelligent software tools to solve business problems. He has taught thousands of students in the use and principles of advanced software and machine learning technologies. Dr. Hitt is Senior Fellow at American Heuristics Corporation (AHC), an advanced software technology company based in Triadelphia, WV. (This article was previously published in the Gordian's Quarterly Newsletter, June 1, 1998 issue. The parent company of the Gordian Institute is American Heuristics Corporation (AHC). The Gordian Institute can be contacted at: 1-800-405-2114, or E-mail: [email protected])