dmdw lesson 03 - data warehouse theory
DESCRIPTION
I'm planning to give you a detailed introduction to the concepts of the data warehouse world. We will also see why data mining and data warehouses are closely connected to each other.TRANSCRIPT
STAATLICHANERKANNTEFACHHOCHSCHULE
Author I: Dip.-Inf. (FH) Johannes HoppeAuthor II: M.Sc. Johannes HofmeisterAuthor III: Prof. Dr. Dieter HomeisterDate: 18.03.2011
STUDIERENUND DURCHSTARTEN.
STAATLICHANERKANNTEFACHHOCHSCHULE
Data Warehouse
Author I: Dip.-Inf. (FH) Johannes HoppeAuthor II: M.Sc. Johannes HofmeisterAuthor III: Prof. Dr. Dieter Homeister Date: 18.03.2011
Data Warehouse
01
3
Definition DW
4
“A data warehouse is a single source for key, corporate information needed to enable business decisions .”
Dieter Homeister (his DM Script)
Data Warehouse
OLTP and DSS Defined
› An application that updates is called an on-line transaction processing (OLTP) application
› An application that issues queries to the readonly database is called a decision support system (DSS)
5
Data Warehouse
Stovepipe vs. Integration
› When systems stand by themselves they are often referred to as stovepipes
› Systems that easily share data are called well integrated systems
6
Data Warehouse
Problems with Stovepipe Architecture (1/2)
› Problems› Users who wish to access data must query several different DSS
to find it › Data may have fundamental conflicts between DSS › a department code table in one DSS may differ in another DSS › a measurement may be stored in meters in one DSS and yards in
another
7
Data Warehouse
Problems with Stovepipe Architecture (2/2)
› Solution:› Use a data warehouse, where data is integrated from the
several different stovepipe systems › Data warehouse is really sharing-lite -- you don’t have to
co-ordinate as much when applications are built and you still reap the benefits of data sharing
8
Data Warehouse
Data Warehouse Solution
› A data warehouse is an attempt to integrate separate DSS so that users can query one place to find the answers to their questions
› A data warehouse has the key, corporate data in the organization
› A data warehouse tracks historical data
9
Selling the Data Warehouse
02
10
Data Warehouse
Selling the Data Warehouse (1/2)
› A data warehouse project will fail without corporate sponsorship
› Preferably, the project should be sponsored by the CEO › The CEO must be sold on the value to the business to improve
competitive advantage by deploying a data warehouse
11
Data Warehouse
Selling the Data Warehouse (2/2)
› If an active, corporate sponsor does not exist, data sources will be very difficult to identify
› Only add data to the warehouse that will answer key, corporate questions asked by the corporate sponsor. Otherwise, you will have a data dump
12
Data Warehouse
Building a Useful Data Warehouse
You really need: › strong executive sponsorship › good knowledge of the data › sound software engineering › stability from source systems › users who want a success › A 75 percent failure rate is often cited
13
Data Warehouse
Enterprise Information System
› An EIS (Enterprise Information System) allows users to query data in a data warehouse
› Now users can access key, corporate data in the data warehouse
14
Data Warehouse
Users of an Enterprise Information System (1/3)
› multiple EIS (or different graphical interfaces) are needed to satisfy different types of users
General users› want a tool that provides detailed data, but is easy to use › Want access to the data warehouse to do routine tasks
such as Find me Joes phone number, etc. › Simple application, not focused on large reports
15
Data Warehouse
Users of an Enterprise Information System (2/3)
Executives› Want a high-level, summary data (and a simple tool)› Must be easy to use, users want to click a few buttons and
get data they want › Results must be graphs › Users should be able to drill-down into key areas.
16
Data Warehouse
Users of an Enterprise Information System (3/3)
Analysts› want a flexible, more detailed tool › Often very knowledgeable about the data › Willing to do more work to learn about the data › Sometimes even learn SQL to issue their own ad-hoc
queries
17
Data Warehouse
Need for Data Warehouses
› Data warehouses provide a single place to store key corporate data › users can go one place to find this key data using an enterprise
information system (EIS) › also a place to store and access historical data
› Users measure performance goals for their company over a period of time› Company statistics are available › Data not stored in the same place is difficult to locate and compare, easily lost › Single query can be used to access key data
18
Data Warehouse
Security & Data Warehouses
› Building a data warehouse does increase security risk because key, corporate information are all in one place
› Risk reduction: database system components can be used to protect the data warehouse. These include › Views › Access control › Security Administration › Encryption › Audit (logging of all accesses)
19
Data Warehouse
Moving Data into the Data Warehouse
› Moving data from source OLTP systems to the data warehouse is one of the hardest tasks in data warehousing
› Updates to the data warehouse are performed periodically › weekly , nightly, monthly …
› Occasionally, real-time data is needed in a data warehouse, but this is not very common
see the document about ETL, too!
20
Data Warehouse
Data Mart
› A data mart is a subset of the data warehouse that may make it simpler for users to access key corporate data
› Sometimes, users only need a piece of data from the data warehouse
› The data mart is typically fed from the data warehouse
21
References
Data Warehouse Books and References
Ralph Kimball, Margy Rossl: The Data Warehouse Toolkit, 2nd Ed., John Wiley & Sons 2002 (Lists of pitfalls, very detailed for several applications like CRM, HR, Insurances)
W. H. Inmon: Building the Data Warehouse, 3rd Ed., John Wiley & Sons 2002 (DW design, migration, techical details)
Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger: Mastering Data Warehouse Design, John Wiley & Sons 2003 (Technical and business view, design, optimization)
Donald K. Burleson, W. H. Inmon, Joseph Hudicka: The Data Warehouse eBusiness DBA Handbook, BMC Software and DBAzine/Rampant Techpress 2003 (Available as eBook, technical details, eBusiness, focus on Oracle, DB/DW administration, tools)
22
References
Data Warehouse Books and References
Maria Sueli Almeida, Missao Ishikawa, Joerg Reinschmidt, Torsten Roeber: Getting Started with DataWarehouse and Business Intelligence, www.redbooks.ibm.com, 1999 (eBook from IBM, focus on DB2, very technical)
Mark W. Humphries, Michael W. Hawkins, Michelle C. Dy: Data Warehousing, Pearson Education, 1998 (Very technical, incl. project mgmt., architecture, hardware and parallel computing)
Chris Todman: Designing a Data Warehouse, Prentice Hall 2000 (Introduction, not very detailed)
23
THANK YOUFOR YOUR ATTENTION
24
References
Data Warehouse Books and References
David Grossman, Ophir Frieder: Introduction to Data Warehouse, Illinois Institute of Technology 2005
Dr. Andreas Geppert, Credit Suisse: Data Warehousing - Data-Warehouse-Entwurf, 2006, http://arvo.ifi.unizh.ch/dbtg/Classes/DWH/Slides/dwh-04-sl.pdf (p31: Explaination of star/snowflake/galaxy scheme, in German)
Carmela R. Balassiano: Data Warehouse Design Feb. 2007, http://academic.brooklyn.cuny.edu/cis/cbalassiano/CIS717-2%20course%20documents/week2/Data%20Warehouse%20primer.ppt (p12, p18: Explaination of star/snowflake/galaxy scheme, in English)
25