Architektur von Big Data Lösungen (Architecture of Big Data Solutions)
TRANSCRIPT
Architektur von Big Data Lösungen
Guido Schmutz ([email protected])
@gschmutz
Guido Schmutz
• Working for Trivadis for more than 20 years
• Oracle ACE Director for Fusion Middleware and SOA
• Co-author of several books
• Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data
• Member of the Trivadis Architecture Board
• Technology Manager @ Trivadis
• More than 30 years of software development experience
Contact: [email protected]
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
2 Architektur von Big Data Lösungen
Agenda
1. Introduction
2. Big Data Reference Architectures
• Traditional Big Data
• Event / Stream Processing
• Lambda Architecture
• Kappa Architecture
• Unified Architecture
• Microservices Architecture
3. Big Data Ecosystem – many choices sorted!
Introduction
Big Data Definition (4 Vs)
+ Time to action? – Big Data + Real-Time = Stream Processing
Characteristics of Big Data: its Volume, Velocity and Variety in combination
Reliable Data Ingestion in Big Data/IoT
How to do Big Data? Why is structure / an architecture important?
Why talk about Big Data Architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still a rather young field and therefore a “moving target”
No standard architectures are available that have been in use for years
In recent years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
Having a reference architecture in place helps in choosing the right / matching technologies
Important Properties for Choosing a (Big) Data Architecture
Latency
Keep raw and un-interpreted data “forever”?
Volume, Velocity, Variety, Veracity
Ad-hoc query capabilities needed?
Robustness & Fault Tolerance
Scalability
…
Big Data Reference Architectures - Traditional Big Data
“Traditional Architecture” for Big Data
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) feed Data Ingestion (Pushing Ingestion, Pulling Ingestion, Channel), which feeds (Analytical) Data Processing (Raw Data (Reservoir), Batch compute, Computed Information, Result Store, Query Engine), which serves the Data Consumers (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
“Traditional Architecture” for Big Data – Hadoop Technology Mapping
[Diagram: the same building blocks as in the “Traditional Architecture” diagram (Data Sources, Data Ingestion, (Analytical) Data Processing, Data Consumer), mapped to Hadoop technologies. The product names are not part of this transcript.]
“Traditional Architecture” for Big Data – Spark Technology Mapping
[Diagram: the same building blocks as in the “Traditional Architecture” diagram (Data Sources, Data Ingestion, (Analytical) Data Processing, Data Consumer), mapped to Spark technologies. The product names are not part of this transcript.]
“Traditional Architecture” for Big Data – Feeding in High-Volume Event Streams
[Diagram: the same building blocks as in the “Traditional Architecture” diagram (Data Sources, Data Ingestion, (Analytical) Data Processing, Data Consumer). Two question marks in the diagram leave open how high-volume event streams are fed in.]
Traditional Architecture for Big Data
• Batch Processing - “Data at Rest”
• Not for low latency use cases
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data
• Spark Core is a faster alternative to Hadoop MapReduce, but it is still batch processing
• The Spark ecosystem offers many additional advanced analytics capabilities (machine learning, graph processing, …)
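The batch style described on this slide can be illustrated with a minimal, framework-free sketch. It is not Hadoop or Spark code. The record layout, the sample values and the names `RAW_DATA`, `batch_average` and `result_store` are assumptions made for illustration only. The point is that the job scans the complete data set at rest, so results only reflect data that was already stored when the run started.

```python
from collections import defaultdict

# Hypothetical raw data at rest: (sensor_id, temperature) records collected earlier.
RAW_DATA = [
    ("s1", 20.0),
    ("s2", 30.0),
    ("s1", 22.0),
    ("s2", 34.0),
]


def batch_average(records):
    """Recompute the average per sensor by scanning the complete data set."""
    totals = defaultdict(lambda: [0.0, 0])  # sensor -> [sum, count]
    for sensor_id, value in records:
        totals[sensor_id][0] += value
        totals[sensor_id][1] += 1
    return {sensor: total / count for sensor, (total, count) in totals.items()}


# The result store is only refreshed when the whole batch job has finished.
result_store = batch_average(RAW_DATA)
```

Any record that arrives after the run started stays invisible in `result_store` until the next run, which is the “after the fact” behavior named above.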
Big Data Reference Architectures – Event / Stream Processing
Event / Stream Processing – “Data in Motion”
“Data in motion”
Events are analyzed and processed in real-time as they arrive
Decisions are timely, contextual and based on fresh data
Decision latency is eliminated
Event / Stream Processing Architecture
[Diagram: Data Sources (Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine) feed Data Ingestion (Channel), which feeds (Analytical) Real-Time Data Processing (Messaging, Stream/Event Processing, Batch compute, Result Store), which serves the Data Consumers (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
Challenges for Ingesting Data
Multitude of sensors
Real-Time Streaming
Multiple Firmware versions
Bad Data from damaged sensors
Regulatory Constraints
Data Quality
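Two of these challenges, multiple firmware versions and bad data from damaged sensors, can be sketched as a small normalization step at ingestion time. Everything concrete here is an assumption for illustration: the payload fields `fw`, `temp_f` and `temp_c`, the two firmware layouts and the plausible value range do not come from the slides.

```python
# Hypothetical payload layouts: firmware "1" reports Fahrenheit in "temp_f",
# firmware "2" reports Celsius in "temp_c". Both are assumptions for this sketch.
VALID_RANGE_C = (-50.0, 150.0)  # assumed plausible range for a healthy sensor


def normalize(reading):
    """Return a unified record, or None if the reading must be rejected."""
    firmware = reading.get("fw")
    if firmware == "1" and "temp_f" in reading:
        celsius = (reading["temp_f"] - 32.0) * 5.0 / 9.0
    elif firmware == "2" and "temp_c" in reading:
        celsius = float(reading["temp_c"])
    else:
        return None  # unknown firmware version or missing field
    low, high = VALID_RANGE_C
    if not (low <= celsius <= high):
        return None  # value outside the plausible range, likely a damaged sensor
    return {"sensor": reading["sensor"], "temp_c": round(celsius, 2)}


readings = [
    {"sensor": "a", "fw": "1", "temp_f": 212.0},
    {"sensor": "b", "fw": "2", "temp_c": 21.5},
    {"sensor": "c", "fw": "2", "temp_c": 9999.0},  # implausible value
    {"sensor": "d", "fw": "3", "temp_c": 20.0},    # unknown firmware
]
accepted = [r for r in map(normalize, readings) if r is not None]
```

In a real pipeline the rejected readings would more likely be routed to a separate channel for inspection than silently dropped.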
Continuous Data Ingestion
[Diagram labels: DB Source, File Source, Log, Social, IoT Sensor; REST, Native, Connect, CDC, IoT GW, CDC GW, Dataflow GW, Message GW, Dataflow; Event Hub with Topics and a Queue; Big Data, Stream Processing.]
Event / Stream Processing Architecture – Open Source Technology Mapping
[Diagram: the same building blocks as in the Event / Stream Processing Architecture diagram (Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Data Consumer), mapped to open source technologies. The product names are not part of this transcript.]
Event / Stream Processing Architecture – Oracle Technology Mapping
[Diagram: the same building blocks as in the Event / Stream Processing Architecture diagram (Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Data Consumer), mapped to Oracle technologies. The product names are not part of this transcript.]
Event / Stream Processing Architecture
The solution for low latency use cases
Process each event separately => low latency
Process events in micro-batches => increases latency but offers better reliability
Previously known as “Complex Event Processing”
Keep the data moving (Data in Motion instead of Data at Rest) => raw events are not stored
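The difference between per-event processing and micro-batching can be sketched in a few lines of plain Python. This is a simplified illustration, not the API of any stream processing engine. The `process` logic, the threshold of 100 and the count-based batch size are assumptions. Real engines usually cut micro-batches by time interval, and the reliability aspect mentioned above is not modeled here.

```python
def process(event):
    """Hypothetical per-event logic: flag readings above an assumed threshold."""
    return {"id": event["id"], "alert": event["value"] > 100}


def per_event(stream):
    """Handle every event as soon as it arrives (lowest latency)."""
    for event in stream:
        yield process(event)


def micro_batches(stream, batch_size):
    """Collect events into small batches; results appear once a batch is full."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield [process(e) for e in batch]
            batch = []
    if batch:  # flush the incomplete last batch
        yield [process(e) for e in batch]


events = [{"id": i, "value": v} for i, v in enumerate([50, 120, 80, 130, 90])]
single_results = list(per_event(events))
batched_results = list(micro_batches(events, batch_size=2))
```

In the micro-batch variant the first event has to wait for the second one before any result is emitted, which is where the additional latency comes from.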
Event / Stream Processing Architecture - Keep raw event data
[Diagram: Data Sources (Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine) feed Data Ingestion (Channel), which feeds (Analytical) Real-Time Data Processing (Messaging, Stream/Event Processing, Result Store) and, in addition, (Analytical) Batch Data Processing (Raw Data (Reservoir), Batch compute). The Data Consumers are Reports, Service, Analytic Tools and Alerting Tools. Legend: Data in Motion, Data at Rest.]
Big Data Reference Architectures - Lambda Architecture for Big Data
“Lambda Architecture” for Big Data
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) feed Data Ingestion (Channel, Pulling Ingestion), which feeds two paths: (Analytical) Batch Data Processing (Raw Data (Reservoir), Batch compute, Computed Information, Result Store) and (Analytical) Real-Time Data Processing (Messaging, Stream/Event Processing, Batch compute, Result Store). A Query Engine and the result stores serve the Data Consumers (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
Lambda Architecture for Big Data
Combines (Big) Data at Rest with (Fast) Data in Motion
Closes the latency gap left by high-latency batch processing
Keeps the raw information forever
Makes it possible to rerun analytics operations on the whole data set if necessary
• because the old run had an error, or
• because we have found a better algorithm we want to apply
Have to implement functionality twice
• Once for batch
• Once for real-time streaming
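A minimal sketch of the idea, assuming a simple page-view count as the use case. The names `batch_layer`, `SpeedLayer` and `query` follow common Lambda terminology rather than the slides, and the sample events are invented. The sketch shows that the counting logic exists twice, once as a full recomputation and once as an incremental update, and that a query merges both views.

```python
from collections import Counter

# Hypothetical master data set: the raw events are kept "forever".
raw_events = ["page_a", "page_b", "page_a"]


def batch_layer(events):
    """Batch implementation: recompute the view from the complete raw data set."""
    return Counter(events)


class SpeedLayer:
    """Streaming implementation: count events that arrived after the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event] += 1


def query(batch_view, realtime_view, key):
    """Serving step: merge the batch view and the real-time view."""
    return batch_view[key] + realtime_view[key]


batch_view = batch_layer(raw_events)
speed = SpeedLayer()
for event in ["page_a", "page_c"]:  # events since the last batch run
    raw_events.append(event)        # still stored for the next batch run
    speed.on_event(event)
```

After the next batch run, `batch_layer(raw_events)` covers all events on its own, and the real-time view can be reset.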
Big Data Reference Architectures - “Kappa” Architecture
“Kappa Architecture” for Big Data
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) feed Data Ingestion (Channel), which feeds (Analytical) Real-Time Data Processing (Messaging, Stream/Event Processing, Result Store) and a “Raw Data Reservoir” (Raw Data (Reservoir), Batch compute, Computed Information). The Data Consumers are Reports, Service, Analytic Tools and Alerting Tools. The diagram also carries the label “Queryable State”. Legend: Data in Motion, Data at Rest.]
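The slide shows only the diagram. As the Kappa architecture is commonly described, a single stream processing path reads from a log that retains events, and reprocessing means replaying that log through a new version of the same job. The sketch below illustrates this under those assumptions. `EventLog`, `run_job` and the sample events are hypothetical names and data, not a real messaging API.

```python
class EventLog:
    """Hypothetical append-only log that retains events and supports replay."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset=0):
        return iter(self._events[offset:])


def run_job(log, transform):
    """A single stream job: folds every event from the log into a result store."""
    result_store = {}
    for key, value in log.read_from(0):
        result_store[key] = result_store.get(key, 0) + transform(value)
    return result_store


log = EventLog()
for event in [("s1", 2), ("s2", 3), ("s1", 4)]:
    log.append(event)

view_v1 = run_job(log, transform=lambda v: v)       # current logic
view_v2 = run_job(log, transform=lambda v: v * 10)  # changed logic, replayed from offset 0
```

Only one code path exists here. A changed algorithm is applied by replaying the retained events instead of maintaining a separate batch implementation.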
Organizing NoSQL Data Stores – Different Types
• Key-value store: K1 = V1, K2 = V2, K3 = V3
• Document store: { k1: v1, k2: v2, k3: [v1, v2, v3] }
• Wide-column store: each row key has its own set of column key / value pairs
  RK1: CK1 = V1, CK2 = V2, CK3 = V3, CK4 = V4, …
  RK2: CK1 = V1, CK4 = V4, CK6 = V6, …
  …: CK3 = V3, …
• Graph store
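The four data models can be mimicked with plain Python structures. This is only an illustration of the shapes of the data, not of any product's API. The key and column names are taken from the slide, while the document id `doc1`, the graph example and the helper `neighbors` are assumptions.

```python
# Key-value store: one opaque value per key.
key_value = {"K1": "V1", "K2": "V2", "K3": "V3"}

# Document store: a nested, self-describing document per key ("doc1" is assumed).
document = {"doc1": {"k1": "v1", "k2": "v2", "k3": ["v1", "v2", "v3"]}}

# Wide-column store: each row key has its own, possibly different, set of columns.
wide_column = {
    "RK1": {"CK1": "V1", "CK2": "V2", "CK3": "V3", "CK4": "V4"},
    "RK2": {"CK1": "V1", "CK4": "V4", "CK6": "V6"},
}

# Graph store: nodes connected by labeled edges (hypothetical example data).
graph_edges = [("alice", "KNOWS", "bob"), ("bob", "WORKS_AT", "acme")]


def neighbors(edges, node):
    """Return the nodes reachable from `node` over one outgoing edge."""
    return [target for source, _label, target in edges if source == node]
```

For example, `neighbors(graph_edges, "alice")` follows the single outgoing edge of `alice`, whereas in the key-value model a lookup such as `key_value["K2"]` is the only access path.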
Organizing NoSQL Data Stores – and the Products
• Key-value store
• Wide-column store
• Document store
• Graph store
Big Data Reference Architectures - “Unified” Architecture
“Unified Architecture” for Big Data
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) feed Data Ingestion (Channel), which feeds two paths: (Analytical) Batch Data Processing, which calculates models of the incoming data (Raw Data (Reservoir), Batch compute, Computed Information, Result Store), and (Analytical) Real-Time Data Processing (Messaging, Stream/Event Processing, Batch compute, Result Store). Prediction Models connect the batch side with the real-time side. The Data Consumers are Reports, Service, Analytic Tools and Alerting Tools. The diagram also carries the label “Queryable State”. Legend: Data in Motion, Data at Rest.]
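The interplay shown in the diagram, where batch processing calculates models and stream processing applies them, can be sketched as follows. The “model” is deliberately trivial (mean and standard deviation used for outlier detection). It, the function names `train_model` and `score`, the tolerance and the sample values are all assumptions for illustration.

```python
from statistics import mean, pstdev


def train_model(history):
    """Batch step: derive a simple model (mean and spread) from stored data."""
    return {"mean": mean(history), "stdev": pstdev(history)}


def score(model, value, tolerance=2.0):
    """Stream step: flag a value that deviates more than `tolerance` deviations."""
    return abs(value - model["mean"]) > tolerance * model["stdev"]


history = [10.0, 12.0, 8.0, 10.0]  # hypothetical raw data at rest
model = train_model(history)       # recalculated periodically by the batch side
flags = [score(model, v) for v in [11.0, 25.0]]  # applied to each incoming event
```

The streaming side never scans the history itself. It only needs the latest model, which the batch side can replace whenever it has been recalculated.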
Event Driven (Micro-) Service Architectures
Event-Driven (Micro-) Services Architecture
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) feed Data Ingestion (Channel), which feeds several Microservices (Microservice 1, Microservice 2 and others, shown with Service, State and API elements) and a “Raw Data Reservoir” (Raw Data (Reservoir), Batch compute, Computed Information, Result Store). The Data Consumers are Reports, Service, Analytic Tools and Alerting Tools. Legend: Data in Motion, Data at Rest.]
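A minimal sketch of the event-driven microservice idea, assuming an order-processing example: each service subscribes to the same event stream, keeps its own state, and exposes that state through its own API. The in-memory `EventBus` stands in for the messaging channel. It, the two service classes and the event fields are hypothetical and not taken from the slides.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory stand-in for the messaging channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class OrderCountService:
    """Microservice 1: keeps its own state and exposes it through an API method."""

    def __init__(self, bus):
        self.state = defaultdict(int)
        bus.subscribe("orders", self.on_order)

    def on_order(self, event):
        self.state[event["customer"]] += 1

    def get_count(self, customer):  # stands in for the service API
        return self.state[customer]


class RevenueService:
    """Microservice 2: consumes the same events but keeps separate state."""

    def __init__(self, bus):
        self.total = 0.0
        bus.subscribe("orders", self.on_order)

    def on_order(self, event):
        self.total += event["amount"]


bus = EventBus()
counts, revenue = OrderCountService(bus), RevenueService(bus)
for order in [{"customer": "c1", "amount": 10.0}, {"customer": "c1", "amount": 5.5}]:
    bus.publish("orders", order)
```

Neither service calls the other. They are coupled only through the events, so a further consumer can be added without changing the existing ones.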
Big Data Ecosystem – many choices sorted!
Building Blocks for (Big) Data Processing
• Data Acquisition
• Format
• File System
• Stream Processing
• Batch SQL
• Graph DBMS
• Document DBMS
• Relational DBMS
• Visualization
• IoT
• Messaging
• Analytics
• OLAP DBMS
• Query Federation
• Table-Style DBMS
• Key Value DBMS
• Batch Processing
• In-Memory