Analytical Data Mining Techniques

Several Analysis Methodologies Can Be Used to Analyze Data

© Duane Sharp

Mar 16, 2009
Analytical Data Mining Techniques , photorack
The techniques employed range from classification algorithms to association of record sets, and include clustering, estimation, and sequence-based analysis.

Classification is perhaps the most often employed data mining technique. It uses a set of predefined, pre-classified examples to build a model that can classify the population of records at large.

Classification Algorithms

The use of classification algorithms begins with a sample set of pre-classified example transactions. For a fraud detection application, this would include complete records of both fraudulent and valid transactions, determined on a record-by-record basis. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper identification. The algorithm then encodes these parameters into a model called a classifier, or classification model.

In the fraud detection case, the classifier would be able to identify probable fraudulent activities. Another example would involve a financial application where a classifier capable of identifying risky loans could be used to aid in the decision of whether or not to grant a loan to an individual.
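As a rough illustration of the train-then-classify flow described above, the following sketch builds a very simple classifier from pre-classified transactions. It is not the method of any particular product: the nearest-centroid approach, the function names, the two features, and the sample records are all assumptions chosen for brevity.

```python
from statistics import mean

def train_classifier(examples):
    """Encode pre-classified examples as one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_label.items()}

def classify(model, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

# Hypothetical pre-classified transactions:
# (amount in hundreds of dollars, purchases in the last hour), label
training = [
    ((0.5, 1), "valid"), ((1.2, 1), "valid"), ((0.8, 2), "valid"),
    ((9.0, 6), "fraud"), ((7.5, 8), "fraud"), ((8.2, 5), "fraud"),
]

model = train_classifier(training)       # the "classifier" is just two centroids
print(classify(model, (8.0, 7)))         # a large, rapid-fire transaction
print(classify(model, (1.0, 1)))         # a small, isolated transaction
```

Here the parameters encoded in the model are the per-class averages. A production classifier would typically use a richer model (a decision tree or neural network, for instance) and would scale the features before comparing distances.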

Association

Association is an operation performed against a set of records -- a collection of items and a set of transactions, each of which contains some number of items from a given collection. The operation returns ‘affinities’ that exist among the collection of items. Association tools discover rules based on items that occur together in a given event or transaction.

‘Market basket’ analysis is a common application of association techniques. Retailers perform it by running an association function over the point-of-sale transaction log. The goal is to determine affinities among the items that shoppers buy together.

Another example of the use of association discovery would be an application that analyzes the claim forms submitted by patients to a medical insurance company, to discover patterns among the claimants’ treatment.
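The idea of discovering affinities among items can be sketched with two commonly used measures, support (how often two items appear together) and confidence (how often the second item appears when the first does). The function name, the threshold, and the baskets below are illustrative assumptions, and the sketch only considers pairs of items.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.4):
    """Return {(a, b): (support, confidence)} for rules 'a -> b' over item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))            # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n                 # share of baskets containing both items
        if support < min_support:
            continue
        rules[(a, b)] = (support, together / item_counts[a])  # P(b | a)
        rules[(b, a)] = (support, together / item_counts[b])  # P(a | b)
    return rules

# Hypothetical point-of-sale log: one list of items per transaction
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]

rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Real association tools extend the same counting idea to larger item sets and prune the search so that it remains feasible on large transaction logs.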

Sequence-based Analysis

Sequence-based analysis is often used as a variation of the association technique when there is additional information to tie together a sequence of purchases. An account number, a credit card, or a frequent shopper number, for example, can all be used to track multiple purchases in a time series.

Rules that capture these relationships can be used to identify a typical set of precursor purchases that might predict the subsequent purchase of a specific item. In software retailing, for example, sequence-based mining could determine the likelihood that a customer who purchases a particular software product will subsequently purchase complementary software, or a hardware device such as a joystick or a video card. Sequence-based mining can also be used to detect the set of customers associated with frequent buying patterns.
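A minimal sketch of this idea follows. It groups purchases by a customer identifier, orders them in time, and estimates how often one item is followed later by another. The record layout (customer id, day number, item), the function name, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def follow_on_rates(purchases):
    """Estimate P(customer later buys B | customer bought A) for each ordered pair (A, B).

    purchases: iterable of (customer_id, timestamp, item) records.
    """
    # Rebuild each customer's purchase history in time order
    history = defaultdict(list)
    for customer, when, item in sorted(purchases, key=lambda p: (p[0], p[1])):
        history[customer].append(item)

    bought = defaultdict(set)        # item -> customers who bought it
    bought_later = defaultdict(set)  # (first, later) -> customers who bought 'later' after 'first'
    for customer, items in history.items():
        for i, first in enumerate(items):
            bought[first].add(customer)
            for later in items[i + 1:]:
                if later != first:
                    bought_later[(first, later)].add(customer)

    return {pair: len(customers) / len(bought[pair[0]])
            for pair, customers in bought_later.items()}

# Hypothetical purchase log keyed by a frequent shopper number
log = [
    ("c1", 1, "flight_sim"), ("c1", 5, "joystick"),
    ("c2", 2, "flight_sim"), ("c2", 3, "video_card"), ("c2", 9, "joystick"),
    ("c3", 1, "joystick"),   ("c3", 4, "flight_sim"),
    ("c4", 2, "flight_sim"),
]

rates = follow_on_rates(log)
print(rates[("flight_sim", "joystick")])  # share of flight_sim buyers who later bought a joystick
```

Unlike plain association, the order matters here: customer c3 bought both items but in the reverse order, so that history does not count toward the flight_sim to joystick rule.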

Clustering

Clustering is often one of the first steps in data mining analysis. It identifies groups of related records that can be used as starting points for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segments. Additional analysis using standard analytical and other data mining techniques can determine the characteristics of these segments with respect to some desired outcome.

Clustering segments a database into different groups. The goal is to find groups that differ from one another but whose members are similar to each other. The clustering approach assigns records with a large number of attributes to a relatively small set of groups, or “segments.” This assignment is performed automatically by clustering algorithms that identify the distinguishing characteristics of the data set and then partition the space defined by the data set attributes along natural “boundaries.”
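One widely used way to perform this kind of automatic segmentation is k-means clustering, sketched below in plain Python. The article does not name a specific algorithm, so the choice of k-means, the naive starting centroids, and the customer records are assumptions for illustration only.

```python
from statistics import mean

def kmeans(points, k, iterations=20):
    """Partition points into k clusters by repeatedly assigning each point to its
    nearest centroid and then moving each centroid to the mean of its cluster."""
    centroids = list(points[:k])  # naive start: the first k points (an assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [
            tuple(mean(col) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer records: (age, annual spend in thousands of dollars)
customers = [(22, 1.5), (58, 9.0), (25, 2.0), (61, 8.5), (24, 1.0), (55, 9.5)]

centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are supplied in advance; the two segments emerge from the data alone. In practice the attributes would usually be scaled to comparable ranges first, and the number of segments would be chosen by experiment.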

Estimation

Estimation is a variation on the classification technique, involving the generation of scores along various dimensions in the data. Rather than employing a binary classifier to determine whether a loan applicant, for instance, is approved or classified as a risk, the estimation approach generates a credit-worthiness ‘score’ based on a pre-scored sample set of transactions. That is, complete records of approved and risk applicants, each with a known score, serve as the basis for estimating the worthiness of all records in a data set.
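The difference from binary classification can be sketched as follows: instead of returning a label, the function returns a numeric score derived from the most similar pre-scored records. The nearest-neighbour averaging used here is only one possible estimation method, and the features, scores, and names are hypothetical.

```python
from statistics import mean

def estimate_score(scored_samples, features, k=3):
    """Estimate a score as the average score of the k most similar pre-scored samples."""
    def dist(sample):
        sample_features, _ = sample
        return sum((a - b) ** 2 for a, b in zip(sample_features, features))
    nearest = sorted(scored_samples, key=dist)[:k]
    return mean([score for _, score in nearest])

# Hypothetical pre-scored applicants:
# (income in tens of thousands of dollars, years of credit history), score 0-100
scored = [
    ((3, 1), 40), ((4, 2), 50), ((5, 3), 60),
    ((8, 10), 85), ((9, 12), 90), ((10, 15), 95),
]

print(estimate_score(scored, (9, 11)))  # applicant resembling the high-scoring group
print(estimate_score(scored, (4, 2)))   # applicant resembling the low-scoring group
```

A lender could then apply its own cut-off to the score, which gives finer control than a fixed approve-or-reject classifier.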

Each data mining technique has a place in the analysis of customer data, and should be assessed for its characteristics as they relate to specific data mining requirements.


The copyright of the article Analytical Data Mining Techniques in Customer Relations is owned by Duane Sharp. Permission to republish Analytical Data Mining Techniques in print or online must be granted by the author in writing.

