Intelligent Data Lake with Informatica, Part 2/3

© 2015 Mieschke Hofmann und Partner Gesellschaft für Management- und IT-Beratung mbH

Big Data: From Data Swamp to “Intelligent Data Lake” (IDL)

Intelligent Data Lake with Informatica, Part 2/3

Sascha Dorner & Sören Eickhoff | MHPBoxenstopp: 14.02.2017

© 2017 MHP – A Porsche Company

Upload: mhp-a-porsche-company

Post on 11-Apr-2017



Further MHPBoxenstopps: www.mhp.com/events

21.02.2017 SAP Solution Manager 7.2, Requirements Management: Use in the requirements analysis of rollout projects

07.03.2017 Data Governance with Informatica, Part 3/3: Requirements and opportunities for the use of modern data tools

21.03.2017 Mobility in the Urban Space of Tomorrow: Challenges for the smart city

Agenda

All participants are muted at the start.

1. MHPBoxenstopp presentation: Sascha Dorner, Sören Eickhoff (Informatica)

2. Press conference (questions & answers): You can submit questions during the web session via the chat function in the right-hand window.

Missed an MHPBoxenstopp? All past MHPBoxenstopps can be found here: www.youtube.de/MHPProzesslieferant and www.slideshare.net/MHPInsights

Driver Profile

Sascha Dorner
Manager, BIG DATA & IoT Technologies
Consulting (MHP)
Product Development KIS
Dipl. Informatiker (FH)
Manager (MHP)

Sören Eickhoff (Informatica GmbH)
Sales Consultant Big Data
Service Management Solution Architect (IBM)
Senior Technical Sales Professional (IBM)
Dipl. Wirtschaftsinformatiker
Sales Consultant Big Data Management

1. Everybody talks Big Data…

2. The Lake – Solution Informatica Data Lake (IDL)

3. Use Cases: Data Lake


1. Everybody talks Big Data…


#1 in 6 Data Categories …

• Big Data Management
• Cloud Data Management
• Data Integration
• Master Data Management
• Data Quality
• Data Security

Big Data Related Business Initiatives

Financial Services
• Fraud Detection
• Risk & Portfolio Analysis
• Investment Recommendations
• Customer Analytics

Retail & Telco
• Proactive Customer Engagement
• Location Based Services

Media & Entertainment
• Online & In-Game Behavior
• Customer X/Up-Sell

Manufacturing
• Connected Vehicle
• Predictive Maintenance

Healthcare & Pharma
• Predicting Patient Outcomes
• Total Cost of Care
• Drug Discovery

Public Sector
• Health Insurance Exchanges
• Public Safety
• Tax Optimization
• Fraud Detection

Big Data Journey in Phases

Data sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs

Axes: Driven by IT to Driven by Business; Lower Infrastructure Cost to Added Business Value

• Data Warehouse Optimization: Lower infrastructure costs
• Data Discovery & Analytics: Discover new insights to drive business value
• Real-Time Operational Intelligence: Manage data assets for new & better services
• First Pilot(s): Prove out initial use-cases
• Intelligent Data Lake

Outcomes: Increase Customer Loyalty; Reduce Security Risk; Improve Predictive Maintenance; Increase Operational Efficiency

Use Case: Data Lake / Data Platform Reference Architecture

Data sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs

Data Lake
• Landing Zone: Structured and unstructured enterprise and external data is landed in its raw form, normalized and ready for use
• Discovery Zone: User sandbox for self-serve access to data for exploration, data blending, hypothesis testing, analytics, and collaboration
• Production Zone: Sanitized transactional, master, and reference data & enriched data models certified for enterprise use

Data Platform

Roles: Data Modeler, Data Scientist, Data Analyst, Data Steward, Data Engineer

Business: Increase Customer Loyalty; Improve Fraud Detection; Reduce Security Risk; Improve Predictive Maintenance; Increase Operational Efficiency

Challenges Faced by the Business and IT Today

Data Analysts
• Can’t easily find trusted data
• Limited access to the data
• Frustrated by slow response from IT due to long backlog
• Constrained by disparate desktop tools and manual steps
• No way to collaborate, share, and update curated datasets

IT
• Can’t cope with growing demand from the business
• No visibility into what the business is doing with the data
• Struggling to deliver value to the business
• Losing the ability to govern and manage data as an asset

2. The Lake – Solution Informatica Data Lake (IDL)


Informatica Data Lake Management

Data Lake Management
• Enterprise Information Catalog
• Intelligent Data Lake
• Secure@Source
• Big Data Management
• Intelligent Streaming

TITAN, Blaze

Live Data Map (metadata integration)

Big Data Management (data integration)

Roles: Data Architect / Steward, Data Scientist / Analyst, InfoSec Analyst, Data Engineer, Data Engineer

Enterprise Information Catalog


Unified view into enterprise information assets

• Business-user oriented solution

• Semantic search with dynamic facets

• Detailed Lineage and Impact Analysis

• Business Glossary Integration

• Relationships discovery

• High level data profiling

• Automatic Classifications with Data domains

• Business classifications with Custom Attributes

• Broad metadata source connectivity

• Big data scale


Intelligent Data Lake

Self-service data preparation with collaborative data governance

• Collaborative project workspaces
• Automated data ingestion
• Search data asset catalog
• Rapid blend of datasets
• Crowd-sourced data asset tagging & data sharing
• Automated data asset discovery & recommendations
• Rapid ‘industrialization’ of preparation steps into re-usable workflows
• Complete tracking of usage, lineage, and security
• Easily support Data Discovery Platforms

Big Data Management

Easily integrate more data faster from more data sources

• Visual development interface accelerates developer productivity
• Near universal data connectivity
• Complex data parsing on Hadoop
• Data profiling on Hadoop
• High-speed data ingestion and extraction
• Process and deliver data at scale on Hadoop
• Dynamic schemas and mapping templates
• Data Quality and Data Governance on Hadoop

[Diagram: Informatica Big Data Management with Smart Executor; Informatica Data Transformation Engine on dedicated DI servers; Data Connectivity]

Informatica Intelligent Streaming

Collect, ingest and process data in real time and streaming

• Streaming analytics capability in the Intelligent Data Platform
• Unified UI with multiple engines under the covers
• Frictionless integration: conversion/extension of batch mappings into streaming context
• Abstracted from runtime framework

Intelligent Data Lake

Who? Data Analyst / Scientist

• Search & Discover
• Prepare & Publish
• Share and Collaborate

Intelligent Data Lake - Terminology

Data Asset
• Data you work with as a unit.

Project
• A project contains data assets and worksheets.

Recipe
• The steps taken to prepare data in a worksheet.

Data Preparation
• The process of combining, cleansing, transforming, and structuring data from one or more data assets so that it is ready for analysis.

Data Publication
• The process of making prepared data available in the data lake.

Search and Discovery - Data discovery through a powerful search engine to find relevant data

• Semantic search
• Facet filtering by asset, resource type, latest, size, custom attributes…
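The filtering idea on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory catalog; `Asset`, its fields and `filter_by_facets` are illustrative names and not part of any Informatica API.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical catalog entry; field names are illustrative only."""
    name: str
    asset_type: str          # e.g. "table", "view", "file"
    resource_type: str       # e.g. "Hive", "Oracle"
    size_bytes: int
    custom: dict = field(default_factory=dict)  # user-defined attributes


def filter_by_facets(assets, **facets):
    """Return assets whose attributes match every requested facet value.

    Facets that are not dataclass fields are looked up in `custom`.
    """
    def matches(asset):
        for key, wanted in facets.items():
            actual = getattr(asset, key, asset.custom.get(key))
            if actual != wanted:
                return False
        return True

    return [a for a in assets if matches(a)]


catalog = [
    Asset("customers", "table", "Hive", 1_200_000, {"owner": "sales"}),
    Asset("orders", "table", "Oracle", 8_500_000, {"owner": "sales"}),
    Asset("weblogs", "file", "HDFS", 90_000_000, {"owner": "marketing"}),
]

hits = filter_by_facets(catalog, asset_type="table", owner="sales")
print([a.name for a in hits])
```

Each keyword argument narrows the result set further, which is roughly how stacked facets behave in a search UI.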


Data Asset Overview - Overview with asset attributes and integrated profiling stats

• Add a data asset to a project from any exploration view
• Column profiling stats including null/unique/duplicate percentages, inferred data types and data domains
• Detailed stats include value and pattern distributions
• Asset attributes enriched by users to add business context
• Asset attributes collected from the source system
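To make the column profiling statistics concrete, here is a minimal Python sketch. The definitions are assumptions for illustration: "unique" counts values that occur exactly once, "duplicate" counts rows whose value occurs more than once, and patterns map digits to `9` and letters to `A`. The product may define these differently.

```python
import re
from collections import Counter


def pattern_of(value):
    """Map a value to a coarse pattern: digits -> 9, letters -> A."""
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", text)


def infer_type(values):
    """Very rough type inference over the non-null values."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except (TypeError, ValueError):
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "string"


def profile_column(values):
    """Compute null/unique/duplicate percentages and distributions."""
    total = len(values)
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    unique = sum(1 for c in counts.values() if c == 1)
    duplicate = sum(c for c in counts.values() if c > 1)

    def pct(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "null_pct": pct(total - len(non_null)),
        "unique_pct": pct(unique),
        "duplicate_pct": pct(duplicate),
        "inferred_type": infer_type(non_null),
        "value_distribution": dict(counts),
        "pattern_distribution": dict(Counter(pattern_of(v) for v in non_null)),
    }


stats = profile_column(["70173", "70173", "80331", None, "1010A"])
print(stats)
```

A single value that breaks the dominant pattern, such as `1010A` among five-digit codes, shows up directly in the pattern distribution.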


Data Lineage - Interactively trace data origin through summarized lineage views for analysts

• Use the Lineage and Impact sliders to drill down to the desired lineage levels on either side of the seed object.
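One way to read the two sliders is as depth limits on a graph walk: upstream from the seed for lineage, downstream for impact. The Python sketch below assumes a hypothetical edge list; asset names such as `lake.customer_orders` and the function names are invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges: source asset -> assets derived from it.
EDGES = {
    "crm.customers": ["staging.customers"],
    "erp.orders": ["staging.orders"],
    "staging.customers": ["lake.customer_orders"],
    "staging.orders": ["lake.customer_orders"],
    "lake.customer_orders": ["report.loyalty"],
    "report.loyalty": [],
}


def invert(edges):
    """Build the reverse graph so upstream lookups work like downstream ones."""
    reverse = {node: [] for node in edges}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse


def walk(graph, seed, max_levels):
    """Breadth-first walk from the seed, stopping after max_levels hops."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if seen[node] == max_levels:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[seed]
    return seen  # asset -> distance from the seed


def lineage_view(seed, lineage_levels, impact_levels):
    """Upstream (lineage) and downstream (impact) assets around a seed."""
    return {
        "lineage": walk(invert(EDGES), seed, lineage_levels),
        "impact": walk(EDGES, seed, impact_levels),
    }


view = lineage_view("lake.customer_orders", lineage_levels=1, impact_levels=1)
print(view)
```

Raising `lineage_levels` to 2 would also pull in the two source-system tables, which corresponds to moving the lineage slider one step further out.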


Relationship View - Shows ecosystem of the asset in the enterprise based on association to other assets

• Get a 360-degree view of a data asset using the relationship view. Includes related tables, views, domains, reports, users etc.
• Ability to zoom, find specific assets in the view and filter by asset types
• Expand relationship circles to get more details on relationship types and objects.

Data Preparation continued… - Excel-based data preparation on Sample data

• Advanced functionality such as Join, Merge, Aggregate, Filter, Sort etc.
• New values are calculated and shown right away
• Large number of functions available for all types of data: string, numeric, date, statistical, math etc.
• New formula definition with type-ahead

Data Preparation continued… - Excel-based data preparation on Sample data

• Column level suggestions
• Data preparation steps captured as “Recipe”
• Column value distributions
• Column level summary

Data Publication - Execution of data preparation steps on actual data using Infa mapping

• Publish the output of data preparation steps back to the lake
• The user’s credentials are used to access the underlying database
• Recipe steps are translated into an Informatica mapping
• The Informatica mapping is handed over to the BDM platform for execution on the actual data sources
• The BDM platform uses either Map/Reduce, Blaze or Spark to execute the mapping
• The mapping is available to ETL specialists to open in the Informatica Developer tool to operationalize
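The "record on a sample, replay on the actual data" idea can be sketched in a few lines of Python. This is only an in-process analogy: in the product the recipe is translated into an Informatica mapping and executed by BDM, which this sketch does not attempt to model. `Step`, `Recipe` and the example columns are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Step:
    """One recorded preparation step (names are illustrative)."""
    description: str
    apply: Callable[[List[Row]], List[Row]]


class Recipe:
    """Ordered list of steps recorded while preparing a sample."""

    def __init__(self):
        self.steps: List[Step] = []

    def record(self, description, fn):
        self.steps.append(Step(description, fn))
        return self

    def run(self, rows):
        for step in self.steps:
            rows = step.apply(rows)
        return rows


recipe = (
    Recipe()
    .record("drop rows without a customer id",
            lambda rows: [r for r in rows if r.get("customer_id")])
    .record("normalize country codes to upper case",
            lambda rows: [{**r, "country": str(r["country"]).upper()} for r in rows])
    .record("sort by customer id",
            lambda rows: sorted(rows, key=lambda r: r["customer_id"]))
)

sample = [
    {"customer_id": 2, "country": "de"},
    {"customer_id": None, "country": "fr"},
]
full_data = sample + [{"customer_id": 1, "country": "us"}]

preview = recipe.run(sample)        # interactive preview on the sample
published = recipe.run(full_data)   # same steps replayed on all rows
print(published)
```

Because the steps are stored as data rather than applied once and forgotten, the same recipe can be re-run on a different input, which is what makes a later hand-over to a batch engine possible.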


3. Use Cases: Data Lake


Use Cases: Data Lake

Data sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs

1. Load or archive batch data
2. Replicate change data
3. Stream real-time data
4. Discover & profile data
5. Mask sensitive data
6. Govern & metadata
7. Prepare data for analysis – curate data
8. Subscribe to datasets

Consumers and components: Dashboards & Mobile Apps; Visualization & Analytics; Data Integration Hub

Organizations need ONE solution that helps them…

• Easily Find & Catalog Data & Discover Relationships
• Rapidly Prepare & Share Data Exactly When it is Needed
• Get Instant Access to Trusted & Secure Data for Advanced Analytics
• Ingest, Cleanse, Integrate & Protect Data at Scale

Forrester Wave™: Big Data Fabric, Q4 ’16


Questions?

Gesellschaft für Management- und IT-Beratung mbH

MHP – A PORSCHE COMPANY

Sascha Dorner

Manager

Business Intelligence

Mobile: +4915120301647

E-Mail: [email protected]

MHPBoxenstopp: Timetable 2017

• SAP Solution Manager 7.2: 21.02.17, 11:00
• Data Governance with Informatica, Part 3: 07.03.17, 11:00
• Mobility in the Urban Space of Tomorrow: 21.03.17, 11:00

Missed an MHPBoxenstopp? No problem!

Recordings and videos: http://www.youtube.com/MHPProzesslieferant

Presentation materials: http://de.slideshare.net/MHPInsights