Analytical Data Mining Techniques

Several Analysis Methodologies Can Be Used to Analyze Data
The techniques employed range from classification algorithms to the association of sets of records, and also include clustering, estimation and sequence-based analysis.
Classification is perhaps the most commonly employed data mining technique. It uses a set of predefined, pre-classified examples to build a model that can classify the population of records at large.

Classification Algorithms
The use of classification algorithms begins with a sample set of pre-classified example transactions. For a fraud detection application, this would include complete records of both fraudulent and valid transactions, determined on a record-by-record basis. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper identification. The algorithm then encodes these parameters into a model called a classifier, or classification model. In the fraud detection case, the classifier would be able to identify probable fraudulent activities. Another example would be a financial application in which a classifier capable of identifying risky loans helps decide whether or not to grant a loan to an individual.

Association
Association is an operation performed against a set of records: a collection of items and a set of transactions, each of which contains some number of items from that collection. The operation returns ‘affinities’ that exist among the collection of items. Association tools discover rules based on items that occur together in a given event or transaction. ‘Market basket’ analysis is a common application of association techniques. Retailers run an association function over the point-of-sale transaction log to determine affinities among the items shoppers buy together. Another example of association discovery would be an application that analyzes the claim forms submitted by patients to a medical insurance company, to discover patterns in the claimants’ treatments.
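Market basket rules are commonly measured by two quantities. Support is the fraction of transactions that contain a set of items together. Confidence is how often the right-hand item appears in transactions that contain the left-hand item. The sketch below is a minimal, brute-force illustration that assumes single-item rules only. The function `pair_rules`, its thresholds and the toy transaction log are hypothetical and are not taken from the article. Production tools typically use algorithms such as Apriori or FP-growth to search larger item sets efficiently.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Hypothetical helper: find single-item association rules lhs -> rhs.

    support(A, B)    = fraction of transactions containing both A and B
    confidence(A->B) = count(A and B) / count(A)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each unordered pair once

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical point-of-sale log: each inner list is one transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]

for lhs, rhs, support, confidence in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Counting every pair in every basket grows quadratically with basket size. That is acceptable for a toy log, but it is the cost that Apriori-style pruning is designed to avoid.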
Sequence-based Analysis
Sequence-based analysis is often used as a variation of the association technique when additional information is available to tie together a sequence of purchases. An account number, a credit card, or a frequent shopper number, for example, can all be used to track multiple purchases in a time series. Rules that capture these relationships can be used to identify a typical set of precursor purchases that might predict the subsequent purchase of a specific item. In the software example above, sequence-based mining could determine the likelihood that a customer who purchases a particular software product will subsequently purchase complementary software, or a hardware device such as a joystick or a video card. Sequence-based mining can also be used to detect the set of customers associated with frequent buying patterns.

Clustering
Clustering is often one of the first steps in data mining analysis. It identifies groups of related records that can be used as starting points for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segments. Additional analysis using standard analytical and other data mining techniques can determine the characteristics of these segments with respect to some desired outcome. Clustering segments a database into different groups. The goal is to find groups that differ from one another while the members within each group are similar to each other. The clustering approach assigns records with a large number of attributes to a relatively small set of groups, or “segments”. This assignment is performed automatically by clustering algorithms that identify the distinguishing characteristics of the data set and then partition the space defined by the data set's attributes along natural “boundaries.”

Estimation
Estimation is a variation on the classification technique, involving the generation of scores along various dimensions in the data.
Rather than employing a binary classifier to determine whether a loan applicant, for instance, is approved or classified as a risk, the estimation approach generates a credit-worthiness ‘score’ based on a pre-scored sample set of transactions. That is, sample data (complete records of approved and risky applicants) are used to determine the credit-worthiness of every record in the data set. Each data mining technique has a place in the analysis of customer data, and should be assessed for its characteristics as they relate to specific data mining requirements.
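The difference between a binary classifier and an estimation score can be illustrated with a k-nearest-neighbor sketch. This technique was chosen only for illustration; the article does not name a specific algorithm. Each new applicant is compared against pre-classified sample records. The score is the fraction of its k closest samples that were approved. The classifier simply thresholds that score into an approve or reject decision. The two features (scaled income and debt ratio, assumed pre-normalized to [0, 1]), the 1 = approved / 0 = risk label encoding, the 0.5 threshold and the sample records are all hypothetical.

```python
import math


def knn_score(samples, applicant, k=3):
    """Estimation: a credit-worthiness score in [0, 1] for one applicant.

    samples: list of (features, label) pairs, label 1 = approved, 0 = risk
    (a hypothetical encoding). The score is the fraction of the k nearest
    samples, by Euclidean distance, that are labelled approved.
    """
    nearest = sorted(samples, key=lambda s: math.dist(s[0], applicant))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def classify(samples, applicant, k=3, threshold=0.5):
    """Classification: collapse the score into a binary approve decision."""
    return knn_score(samples, applicant, k) >= threshold


# Hypothetical pre-classified sample: ((scaled_income, debt_ratio), label)
samples = [
    ((0.9, 0.1), 1),
    ((0.8, 0.2), 1),
    ((0.7, 0.3), 1),
    ((0.3, 0.7), 0),
    ((0.2, 0.8), 0),
    ((0.4, 0.6), 0),
]

for applicant in [(0.75, 0.25), (0.5, 0.45)]:
    score = knn_score(samples, applicant)
    print(applicant, round(score, 2), classify(samples, applicant))
```

The score keeps information that the binary decision throws away. A borderline applicant with a score of 0.33 and a clearly risky one with a score of 0.0 are both simply "rejected" by the classifier.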
The copyright of the article Analytical Data Mining Techniques in Customer Relations is owned by Duane Sharp. Permission to republish Analytical Data Mining Techniques in print or online must be granted by the author in writing.