Testing Data Warehouse DSS

Here are my notes on how to manage data in computers within a large enterprise.


Topics this page:

  • Functional Testing
  • Performance Testing
  • Architecture
  • Inflows
  • Upflows
  • Downflows
  • Outflows
  • Software
  • Related:

  • Data Management
  • SQL
  • Data Entry using Excel



    Functional Testing of Decision Support Systems

      DSS Testing

      A "black box" approach to testing would start from the end result of all the processing: summary reports created from multi-dimensional cubes, which are equivalent to tables in a relational database. The decision support system's output would be compared against detail data found in the cubes, displayed using a tool such as the Excel Add-in for Hyperion Essbase Analytic Services.

      The add-in may connect to an Essbase Spreadsheet Web Service (a Java application), part of Hyperion's Essbase Spreadsheet Services product (since Release 7.1.2).

      Because existing software to compare multi-dimensional cubes is not reliable, verification of multi-dimensional cubes requires the generation of a format that is common to both the data mart and the cubes so that the data can be compared. Such an approach is problematic because two sets of complex extract code would need to be created (an extract from a relational database and an export from a multi-dimensional database). Oracle 10g Data Pump technology can be used to simplify the extract.
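      The common-format comparison described above can be sketched in a few lines: flatten both extracts into (dimension-tuple, value) pairs and diff them. All row data below is a hypothetical illustration, not output from any real mart or cube.

```python
# Sketch: compare a relational-mart extract against a cube export by
# normalizing both into a common format of (dimension-tuple, value)
# pairs. The sample rows are hypothetical.

def normalize(rows):
    """Map each (region, product, amount) row to a comparable key/value."""
    return {(region, product): amount for region, product, amount in rows}

def diff_extracts(mart_rows, cube_rows, tolerance=0.01):
    """Return the dimension keys whose values disagree beyond a tolerance."""
    mart, cube = normalize(mart_rows), normalize(cube_rows)
    mismatches = []
    for key in sorted(set(mart) | set(cube)):
        m, c = mart.get(key), cube.get(key)
        if m is None or c is None or abs(m - c) > tolerance:
            mismatches.append((key, m, c))
    return mismatches

mart_extract = [("East", "Widgets", 100.0), ("West", "Widgets", 250.0)]
cube_export  = [("East", "Widgets", 100.0), ("West", "Widgets", 249.0)]
print(diff_extracts(mart_extract, cube_export))  # → [(('West', 'Widgets'), 250.0, 249.0)]
```

      An empty result would mean the two extracts agree at every dimension intersection.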

      Oracle's Warehouse Builder has a Data Quality Option used to detect, correct, and remove bad data.



    Performance Testing of Decision Support Systems



    Basic Concepts

    • Members are the basic elements of a dimension that contain values.
    • Up to 63 dimensions are supported by MS-SQL7/2000.
    • Virtual dimensions are based on member properties, such as the number of children in the household.
    • Aggregations are totals of member values, such as sales by store manager.
    • Write-back is the ability to change member values in order to analyze "what if" effects.
    • This is why the system is sometimes called Decision Support Services.
    • A Logical Cube uses measures and dimensions from physical cubes, usually to compartmentalize security permissions. Example is payroll information for department x.
    • Each table is used to store rows and columns of information.
    • A column (field) is a single item of information kept for each record in the table.
    • A row of columns (also called a record) contains the information for a single entry.
    • A cursor points to the record being processed.
    • A primary key is a field, or set of fields, that is used to uniquely identify a specific entry record in a table.
    • A foreign key is used to create a link between the table it is in and another table. The relationship between a primary and foreign key is that a foreign key in one table 'points' to a primary key in another table to create a link or relationship between a record from each table.
    • Referential Integrity describes the validity of the relationship between Parents and child data.
    • A variable is a container to store values which can be changed.
    • Variables must be declared before being referenced (no forward referencing).
    • SQL statements can be embedded into C, COBOL, FORTRAN, and other programming source between EXEC SQL and END-EXEC for processing by the pre-compiler.
    • References to host variables from within a SQL block begin with a colon (:).
    • Database engines such as "SQL Server" do the queries, additions, deletions, etc that are performed on the data.
      SQL = Structured Query Language. The first implementation of SQL was on IBM's DB2 mainframe databases.
    • A data warehouse collects historical subject-oriented data for non-volatile time-series analysis using analytical engines such as Excel 2000. Such databases are kept separate from operational data stores to avoid degrading the performance of on-line transactions. This separation also allows for integration of data from various sources.
    • A trigger is a procedure that is stored in the database and executed automatically (implicitly) before or after insert, update, or delete commands are issued against the associated table. Triggers are used to transparently create event logs which can be used to generate statistics or to review security enforcement. Trigger conditions must be a SQL condition, not a PL/SQL condition.
    • Scalar datatypes have no internal components.
    • Composite datatypes have internal components which can be manipulated.
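
      Several of the concepts above (primary keys, foreign keys, referential integrity, and a trigger that transparently writes an event log) can be demonstrated with Python's built-in sqlite3 engine. The schema and names are hypothetical examples, not from any system described on this page.

```python
# Sketch: primary/foreign keys, referential integrity, and an
# AFTER INSERT trigger that writes an event log, using sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
con.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,         -- primary key: uniquely identifies each row
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)  -- foreign key 'points' to the parent
);
CREATE TABLE event_log (entry TEXT);     -- populated transparently by the trigger
CREATE TRIGGER log_new_employee AFTER INSERT ON employee
BEGIN
    INSERT INTO event_log VALUES ('hired ' || NEW.name);
END;
""")
con.execute("INSERT INTO department VALUES (10, 'Payroll')")
con.execute("INSERT INTO employee VALUES (1, 'Ada', 10)")

# Referential integrity: a child row with no matching parent is rejected.
try:
    con.execute("INSERT INTO employee VALUES (2, 'Bob', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

print(con.execute("SELECT entry FROM event_log").fetchall())  # → [('hired Ada',)]
```

      Note that the trigger never fires for the rejected row: the foreign-key check stops the insert before the AFTER INSERT trigger runs.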
     

    Architectures Compared

      Microsoft's SQL Server Analysis Services uses Decision Support Objects (DSO) to communicate with the PivotTable Service on the client. Its repository metadata is stored by default in the msmdrep.mdb Access database.

      On clients, VB applications communicate with the PivotTable Service via OLE DB or ADO MD (Microsoft's ActiveX Data Objects Multi-Dimensional). This is also how web servers access the database.

      Microsoft looks to Excel's PivotTable as the primary client. However, Excel is limited to 65,536 rows per worksheet (before Excel 2007).



    Inflows

      Data from OLTP and legacy systems provides the Inflow into the staging servers of a data warehouse.

      For example, in a bank, data is gathered from loan processing, pass book processing, and accounting systems. In a retail store, data is gathered from point-of-sale devices, cash registers, and entry/exit monitors.

      The first step is typically data cleansing.

      MDDBs (Multi-Dimensional Databases) address the use of data generated by on-line transaction processing (OLTP) systems.

      Neil Raden's "Modeling the Data Warehouse" article, excerpted in the January 29, 1996 issue of Information Week, identified these differences:

      Relational Model        Multi-Dimensional Model
      Transaction View        Slice of Time View
      Local Consistency       Global Consistency
      Audit Trail             Big Picture
      Explicit Relationships  Implied Relationships

      Collected data typically go through cleaning and transformation before being mapped and loaded into the warehouse. Examples of data cleaning include removing inconsistencies, adding missing fields, and cross-checking for data integrity. Examples of data transformation include adding date/time stamp fields, summarizing detailed data, and deriving new fields to store calculated data.
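      The cleaning and transformation steps named above can be sketched as a short pipeline: fill a missing field, drop an inconsistent row, stamp a load date/time, and derive a calculated field. The field names and sample rows are hypothetical.

```python
# Sketch of data cleaning and transformation before loading a warehouse.
from datetime import datetime, timezone

raw_rows = [
    {"store": "S1", "units": 3,  "unit_price": 9.50},
    {"store": None, "units": 2,  "unit_price": 4.00},  # missing field
    {"store": "S2", "units": -1, "unit_price": 5.00},  # fails integrity cross-check
]

def clean(rows, default_store="UNKNOWN"):
    """Cleaning: add missing fields, drop rows that fail integrity checks."""
    cleaned = []
    for r in rows:
        if r["units"] < 0:                    # cross-check: units must be non-negative
            continue
        cleaned.append(dict(r, store=r["store"] or default_store))
    return cleaned

def transform(rows):
    """Transformation: add a load timestamp and a derived 'revenue' field."""
    stamp = datetime.now(timezone.utc).isoformat()
    return [dict(r, load_ts=stamp, revenue=r["units"] * r["unit_price"])
            for r in rows]

warehouse_rows = transform(clean(raw_rows))
print([(r["store"], r["revenue"]) for r in warehouse_rows])  # → [('S1', 28.5), ('UNKNOWN', 8.0)]
```

      A real ETL tool performs the same categories of work, but driven by metadata rather than hand-written code.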

      Information about data in the warehouse (such as the location, description, and other information about the data structures) is referred to as Metadata. Like a library card catalog, metadata helps the reader determine whether an item of information exists in the library and, if it does, provides a description of it and points to its location. The technical description for each item of information includes file name, file type, location, data source, rules used in cleaning or mapping, and date of creation or last access. The business catalog enables end users to interpret the contents of a data warehouse by:

      • listing pre-defined queries and reports
      • presenting non-technical descriptions of data sources and formulas for defining computed fields used in ad hoc querying, and
      • providing a GUI tool for exploring information (an Information Explorer).

      [Image: Cube (blue), 1991 (Serigraph) by Sol Lewitt]



    Upflows

      Data warehouses provide an Upflow of summaries rolled-up from the detailed data. Averages and summaries are pre-calculated and stored as a separate unit to answer common queries raised in decision-making.
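      The upflow idea can be sketched as pre-calculating totals and averages from detail rows and storing them as a separate unit, so common queries read the summary rather than re-scanning the detail. The store names and amounts below are hypothetical.

```python
# Sketch of an "upflow": roll detail rows up into a stored summary.
from collections import defaultdict

detail = [  # hypothetical detail rows: (store, sale amount)
    ("S1", 100.0), ("S1", 150.0), ("S2", 80.0), ("S2", 120.0), ("S2", 100.0),
]

def roll_up(rows):
    """Aggregate detail into per-store total, count, and average."""
    totals = defaultdict(lambda: [0.0, 0])
    for store, amount in rows:
        totals[store][0] += amount
        totals[store][1] += 1
    return {store: {"total": t, "count": n, "avg": t / n}
            for store, (t, n) in totals.items()}

summary = roll_up(detail)      # stored separately from the detail data
print(summary["S2"]["avg"])    # answers a common query without a detail scan → 100.0
```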

      Companies offering products to perform ETL (Extraction, Transformation, and Loading) include:

      • Ascential, acquired by IBM
      • Oracle Warehouse Builder (OWB), which takes advantage of Oracle database set-based and row-based operations, PL/SQL bulk processing and table functions, foreign key constraint manipulation, use of inline views to speed loading, partition exchange loading, external table support, multi-table insert, merge, direct path insert, and parallel operations.
      • Sunopsis (acquired by Oracle in late 2006) favors the "E-LT" approach.
      • Informatica
      • Cognos

      These tools are faster and less error prone than manual scripting because they generate SQL code based on metadata about changing sources or targets. NOTE: The OWB repository can be exported as a metadata loader (.mdl) file type.

      ETL tools also provide a visual record of the process that can be adjusted when sources or targets change.

      A variant of ETL is ELT (Extract, Load, and then Transform).

    Outflows

      The greatest value (payback) from a data warehouse is in the way it provides quick and flexible Outflows of transformed data that lead its users to actionable insights useful for forecasting and planning decisions. This is why such systems are called "Business Intelligence".

      Zyga consultants say:

      Data Warehousing with OLAP tools also enables companies to manage by exception. Managers today are deluged with status reports on company operations. Often this information either comes too late or does not require managers to take any corrective action. By using threshold analysis and intelligent agents that trigger exception alarms, a Data Warehouse provides managers timely access to only the critical information they require in order to take action.
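      The threshold analysis mentioned in the quote can be sketched as a scan over summary metrics that raises alarms only for values outside their acceptable band. The metric names and thresholds below are hypothetical.

```python
# Sketch of threshold analysis for management-by-exception:
# alert only on metrics that breach their (low, high) band.

thresholds = {"inventory_days": (10, 45), "margin_pct": (20, 100)}

def exceptions(metrics):
    """Return alarms only for metrics outside their threshold band."""
    alarms = []
    for name, value in metrics.items():
        low, high = thresholds[name]
        if not (low <= value <= high):
            alarms.append(f"{name}={value} outside [{low}, {high}]")
    return alarms

print(exceptions({"inventory_days": 60, "margin_pct": 25}))
# → ['inventory_days=60 outside [10, 45]']
```

      An intelligent agent would run such a scan on a schedule and deliver only the non-empty results, so managers see exceptions rather than full status reports.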

      The heart of a data warehouse is its planning and analysis applications called On-Line Analytical Processing (OLAP). The term "MOLAP" for Multi-dimensional OLAP is also used because, unlike OLTP entity-relationship models consisting of two-dimensional tables, data warehouses use a multi-dimensional model for storing data.

      Information in a Data Warehouse is organized into various dimensions. For example,

      • a sales analysis database is organized by product, time, territory, and other dimensions.
      • an invoice database could use time, customer, product, and supplier dimensions.
      • A flat dimension has dimension members that are equivalent (such as a Category of Actual, Budget, Forecast, What-if).
      • A hierarchical dimension has dimension members that are parts of a whole, such as PreTaxIncome and Taxes totaling to NetIncome.

      The most common basis for summarization are the dimensions:

      • Measures

        allow changes to the view of data, such as Periodic, Week To Date (WTD), Month To Date (MTD), Quarter To Date (QTD), and Year To Date (YTD). Daily transactions are aggregated to provide consolidated weekly or monthly comparisons viewed using a Calendar interface.
      • Dimensions

        such as Service or Product, whose properties (such as COLOR and SIZE) can be totaled into a hierarchy of BRAND, MANUFACTURER, CATEGORY, or other aggregate.
      • Currency

        (InputCurrency and RptCurrency).
      • Location (Geography)

        Each retail store's data is summarized into CITY, REGION, STATE, or COUNTRY levels of granularity.
      • To group applications such as Finance, Sales, etc. which can share common dimensions, SAP's Business Planning and Consolidation (BPC) application uses the concept of Application Sets, equivalent to a single MS Analysis Services database.

      Each dimension, such as time, can be structured in a hierarchy of consolidation levels -- years, quarters, months, weeks, individual days, or other level of data granularity. But "day of the week" is an extended attribute.

      The lower (finer, more detailed) the level of granularity available for analysis, the more costly it is to store and process the data.
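      The time hierarchy above can be sketched as a roll-up of daily detail to coarser levels of granularity, such as month or quarter. The dates and values are hypothetical.

```python
# Sketch: roll daily detail up the time hierarchy to month or quarter.
from collections import defaultdict
from datetime import date

daily = {date(2010, 1, 5): 10.0, date(2010, 2, 9): 20.0, date(2010, 4, 1): 30.0}

def roll_up_time(rows, level):
    """Summarize daily values to 'month' or 'quarter' granularity."""
    out = defaultdict(float)
    for d, v in rows.items():
        if level == "month":
            out[(d.year, d.month)] += v
        elif level == "quarter":
            out[(d.year, (d.month - 1) // 3 + 1)] += v
    return dict(out)

print(roll_up_time(daily, "quarter"))  # Q1 2010 combines the Jan and Feb values
# → {(2010, 1): 30.0, (2010, 2): 30.0}
```

      Storing only the quarter level would be cheap, but any question about individual days would then be unanswerable; that is the granularity trade-off.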

      Other dimensions depend on business needs:

      • Customer Entity

        who purchased, which can be totaled into a hierarchy of INDIVIDUAL BUYER, MARKET CHANNEL, LIFESTAGE, or other aggregate.
      • Diagnosis / Need:

        (e.g., medical ICD codes).
      • Entity Organization

        providing the service/product, which can be totaled into a hierarchy of DEPARTMENT, STORE, DISTRICT, DIVISION, CORPORATION, or other grouping.
      • Supplier

      • Account

        (Income Statement, Balance Sheet, Cash Flow, KPI)
      • Product Inventory

      • Product Returns

      • Event

      • DataSrc

        (for Data Source).
      • IntCo

        Inter-Company Eliminations

      At the center of the data model, measures (numeric attributes such as sales dollars, invoice amount, etc.) are stored in a fact table. To make information accessible multi-dimensionally, fact tables also contain several foreign keys used to join facts to several dimension tables.

      Dimension tables organize and index the data stored in a fact table. A visual representation of this connection between fact tables and dimension tables appears as a star.
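      The star arrangement can be sketched with sqlite3: a fact table holding a measure plus foreign keys, joined to the dimension tables that organize it. The tables and rows below are hypothetical.

```python
# Sketch of a star schema: fact table at the center, dimension tables
# around it, queried with a star join.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (                  -- center of the star
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    sales_dollars REAL                     -- the measure
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (1, 'Atlanta'), (2, 'Denver');
INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0);
""")

# A star join: facts summarized through a dimension table.
rows = con.execute("""
    SELECT p.name, SUM(f.sales_dollars)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # → [('Gadget', 75.0), ('Widget', 150.0)]
```

      Adding a time or geography dimension is just another small table plus another foreign key on the fact table, which is why the diagram of such a schema looks like a star.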

     



    Updates to Slowly Changing Data



    Portions ©Copyright 1996-2011 Wilson Mar. All rights reserved. | Privacy Policy |

