Separation of Warehouse Data

Data Separation Needs Monitoring to Ensure Data Accuracy

© Duane Sharp

Apr 1, 2009
Data Quality, photorack
As data warehouses grow in volume and complexity, data in the warehouse may separate into two classes of data: active and inactive.

For example, a one-terabyte warehouse may hold 50 GB that are actively used and 950 GB that are accessed perhaps only once a month or once a quarter. The organization pays the same for the data regardless of how frequently it is used. The data warehouse administrator can either archive the inactive data or place it in near-line storage. Separation consists of accessing the inactive data, moving it to near-line storage, and then deleting it from the data warehouse.
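As a minimal sketch of how the active and inactive classes might be told apart, the Python below splits warehouse partitions by how recently they were last read. The `Partition` type, the partition names, and the 30-day window are all assumptions made for illustration; the article does not prescribe a threshold or a data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Partition:
    """Hypothetical description of one slice of warehouse data."""
    name: str
    size_gb: float
    last_accessed: date


def separate(partitions, today, active_window_days=30):
    """Split partitions into (active, inactive) by last access date.

    The 30-day default is an assumed threshold, not a recommendation.
    """
    cutoff = today - timedelta(days=active_window_days)
    active = [p for p in partitions if p.last_accessed >= cutoff]
    inactive = [p for p in partitions if p.last_accessed < cutoff]
    return active, inactive


# Illustrative figures matching the 50 GB / 950 GB example in the text.
partitions = [
    Partition("sales_2009_q1", 50.0, date(2009, 3, 30)),
    Partition("sales_2008", 400.0, date(2009, 1, 5)),
    Partition("sales_2007", 550.0, date(2008, 12, 15)),
]

active, inactive = separate(partitions, today=date(2009, 4, 1))
active_gb = sum(p.size_gb for p in active)
inactive_gb = sum(p.size_gb for p in inactive)
print(f"active: {active_gb:.0f} GB, inactive: {inactive_gb:.0f} GB")
```

The inactive list is the candidate set for archiving or near-line storage; what actually gets moved remains an administrator's decision.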

Although all data warehouses face separation, its degree varies from warehouse to warehouse, depending on these factors:

  • Size of the warehouse
  • Type of business the warehouse supports
  • Who uses the warehouse
  • What kind of processing is being done
  • Level of sophistication of end-user analysts

Critical Success Factors

There are three critical success factors that each company needs to identify before moving forward with the issue of data quality:

  • Commitment by senior management to the quality of corporate data
  • Definition of data quality
  • Quality assurance of data

The senior management commitment to maintaining the quality of corporate data can be achieved by instituting a data administration department that oversees the management of corporate data. The role of this department will be to establish data management standards, policies, procedures, and guidelines pertaining to data and data quality.

Data Quality

In addition to being useful, quality data must meet the following five criteria:

  1. Complete
  2. Timely
  3. Accurate
  4. Valid
  5. Consistent

The definition of data quality must include the degree of quality that is required for each element being loaded into the data warehouse. If, for example, customer addresses are stored, it might be acceptable that the four-digit extension to the ZIP code, or the three-digit extension to a postal code, is missing. However, the street address, city, and state or province are of much higher importance. This required degree of quality must be identified by each individual company and for each item that is used in the data warehouse.
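The address example can be sketched as a per-element rule set in which some fields are required and others are allowed to be missing. The field names and the split into `REQUIRED` and `OPTIONAL` below are assumptions chosen to mirror the example; each company would define its own.

```python
# Assumed field names; the article only names the concepts.
REQUIRED = ("street", "city", "state_or_province")
OPTIONAL = ("zip_extension",)


def check_address(record):
    """Return (acceptable, missing_required, missing_optional).

    A record is acceptable when no required field is missing;
    missing optional fields are reported but tolerated.
    """
    def is_missing(field):
        value = record.get(field)
        return value is None or str(value).strip() == ""

    missing_required = [f for f in REQUIRED if is_missing(f)]
    missing_optional = [f for f in OPTIONAL if is_missing(f)]
    return (not missing_required, missing_required, missing_optional)


# Illustrative records, not real data.
good = {"street": "12 Main St", "city": "Toronto",
        "state_or_province": "ON", "zip_extension": ""}
bad = {"street": "", "city": "Toronto",
       "state_or_province": "ON", "zip_extension": "1234"}

print(check_address(good))
print(check_address(bad))
```

Keeping the rules in plain data structures, rather than buried in load logic, makes the agreed degree of quality for each element easy to review and change.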

The third factor that needs to be considered is the quality assurance of data. Because data is moved from transactional and legacy systems to the data warehouse, its accuracy needs to be verified and, if necessary, corrected, which often involves cleansing existing data. Because no company is able to rectify all of its unclean data, procedures have to be put in place to ensure data quality at the source.
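One way to picture verification during the move from source systems to the warehouse is a load step that accepts rows passing every check and sets the rest aside, with reasons, for correction at the source. This is only a sketch: the function name, the row layout, and the two checks are hypothetical and stand in for whatever rules a company actually defines.

```python
def load_with_verification(rows, validators):
    """Split rows into accepted rows and (row, failed_checks) pairs.

    `validators` maps a check name to a function returning True
    when the row passes that check.
    """
    accepted, rejected = [], []
    for row in rows:
        failures = [name for name, check in validators.items()
                    if not check(row)]
        if failures:
            rejected.append((row, failures))
        else:
            accepted.append(row)
    return accepted, rejected


# Hypothetical checks for two of the criteria: complete and valid.
validators = {
    "complete": lambda r: bool(r.get("customer_id")),
    "valid": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}

# Illustrative source rows, not real data.
rows = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "", "amount": 35.5},
    {"customer_id": "C3", "amount": -10.0},
]

accepted, rejected = load_with_verification(rows, validators)
print(len(accepted), "accepted;", len(rejected), "rejected")
for row, failures in rejected:
    print(row, "failed:", failures)
```

The rejected list doubles as a record of which source processes produce unclean data, which is where the corrective procedures need to be applied.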

Modify Business Processes

This task can only be achieved by modifying business processes and designing data quality into the system. By identifying every data item and its usefulness to the ultimate users of this data, data quality requirements can be established. One might argue that this is too costly, but it has to be kept in mind that increasing the quality of data as an after-the-fact task is five to ten times more costly than capturing it correctly at the source.

If companies want to use a data warehouse for competitive advantage and reap its benefits, the issue of data quality is extremely important. Only when data quality is recognized as a corporate asset by every member of the organization will the benefits of data warehousing and CRM initiatives be realized.


The copyright of the article Separation of Warehouse Data in Customer Relations is owned by Duane Sharp. Permission to republish Separation of Warehouse Data in print or online must be granted by the author in writing.

