Zühlke Meetup - May 2017
Post on 23-Jan-2018
IoT data: More and faster is not automatically better
Dr. Boris Adryan, Head of IoT & Data Analytics
@BorisAdryan
The following 4 figures are from: Abschlussbericht Arbeitskreis Industrie 4.0 (final report of the Industrie 4.0 working group)
Vertical integration: along the entire value chain
Horizontal integration: networked production system
• Internet connection
• data integration
• collective analysis
• responsiveness
from: Technical Foundations of IoT
almost incidental
this is what defines the IoT!
IoT cost expectations
many sensors
+ complicated analytics
+ expensive infrastructure
= IoT has little benefit
“…because my data scientist said the more the better”
39% of survey participants are worried about the cost of an industrial IoT solution.
“Why aren’t you doing IoT?”
peanuts: “a spoonful”
How many peanuts is that on average?
[Figure: number of peanuts per spoonful on an axis from 0 to 100, with the “on average” value estimated from 3 samples]
Do I get more peanuts at Maxie Eisen or at Logenhaus?
[Figure: “on average” peanuts per spoonful on an axis from 0 to 100, Maxie Eisen (3 samples) and Logenhaus (4 samples)]
Do I get more peanuts at Maxie Eisen or at Logenhaus?
[Figure: peanuts per spoonful on an axis from 0 to 100 for Maxie Eisen and Logenhaus, each with its “on average” value and deviation, based on n samples]
statistical power through large numbers of samples
Statisticians and data scientists LOVE larger sample sizes!
…but if sampling costs time and resources, we need a compromise.
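The trade-off between sample size and certainty can be sketched with a small simulation. This is a minimal illustration, not data from the talk: the true mean of 50 peanuts and the standard deviation of 15 are assumed values, and `draw_spoonfuls` is a hypothetical stand-in for actually counting peanuts.

```python
import random
import statistics

def mean_and_standard_error(samples):
    """Return the sample mean and its standard error (s / sqrt(n))."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / n ** 0.5
    return mean, sem

def draw_spoonfuls(n, true_mean, true_sd, rng):
    """Simulate n spoonfuls; counts are rounded and never negative (assumed model)."""
    return [max(0, round(rng.gauss(true_mean, true_sd))) for _ in range(n)]

rng = random.Random(42)
for n in (3, 30, 300):
    samples = draw_spoonfuls(n, true_mean=50, true_sd=15, rng=rng)
    mean, sem = mean_and_standard_error(samples)
    print(f"n={n:4d}  mean={mean:6.1f}  standard error={sem:5.2f}")
```

The standard error shrinks roughly with the square root of n, so each additional sample buys less certainty than the one before. That is why a compromise is possible when sampling is expensive.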
Zühlke Data Analytics Framework
precision and accuracy that can be achieved theoretically
Sampling strategy
precision and accuracy that is needed to get a job done
accurate and precise
not accurate, but precise
accurate, not precise
not what you want
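The four cases above can be expressed in numbers: accuracy as the bias of the mean against a reference value, precision as the spread of repeated readings. The sketch below assumes a hypothetical reference value of 20.0 and made-up readings, purely for illustration.

```python
import statistics

def accuracy_and_precision(readings, true_value):
    """Bias (mean error) reflects accuracy; standard deviation reflects precision."""
    bias = statistics.fmean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread

TRUE_VALUE = 20.0  # hypothetical reference value
sensors = {  # hypothetical readings, one list per case
    "accurate and precise":      [19.9, 20.1, 20.0, 20.1, 19.9],
    "not accurate, but precise": [22.9, 23.1, 23.0, 23.1, 22.9],
    "accurate, not precise":     [17.0, 23.0, 20.5, 18.5, 21.0],
    "not what you want":         [21.0, 27.0, 24.5, 22.5, 25.0],
}
for label, readings in sensors.items():
    bias, spread = accuracy_and_precision(readings, TRUE_VALUE)
    print(f"{label:27s} bias={bias:+.2f} spread={spread:.2f}")
```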
Sweetening IoT for your customer
A few recommendations from the trenches:
• how to cut down on hardware costs
• how to cut down on software costs
many sensors
+ complicated analytics
+ expensive infrastructure
= IoT has little benefit
(slide annotations: “less”, “reasonable”)
IoT - is it worth it?
The upgrade of a ‘dumb’ asset to a ‘smart’ asset is an investment.
time, money
Asset monitoring
[Figure: base with the Monday and Wednesday tours]
Traditional process
• small maintenance task (if needed)
• weekly site visits to all assets
• two independent tours
• time to reach asset is main contributor to cost
• traffic-dependent
Data sources
Let’s assume the future isn’t going to be much different than the past…
• log from past site visits: approx. likelihood for maintenance
• a collection of traffic data that’s somewhat representative
Log from previous visits
Monday tours
Wednesday tours
Maintenance likelihood
• test for dependency between Monday and Wednesday tours: none
• test for dependency within tours: none
The assumption of temporal uniformity is reasonable.
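The slides do not say which dependency test was used. One common option is a chi-square test of independence on a contingency table, sketched here under that assumption. The visit counts are hypothetical, and the table layout (tour day against whether maintenance was needed) is only one possible reading of "dependency between Monday and Wednesday tours".

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows = Monday tour / Wednesday tour,
# columns = visits with maintenance / visits without maintenance.
visits = [[18, 82],
          [21, 79]]
CRITICAL_VALUE = 3.841  # chi-square distribution, 1 degree of freedom, alpha = 0.05
stat = chi_square_2x2(visits)
print(f"chi-square = {stat:.3f}")
print("dependency detected" if stat > CRITICAL_VALUE else "no evidence of dependency")
```

A statistic below the critical value is consistent with treating maintenance need as independent of the tour day, which is what the uniformity assumption requires.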
Monte Carlo simulations
p1(need today) … p6(need today)
patterns for a demand-driven tour
‘cost function’: sum of edges
[Figure: base and default tour, with each asset labelled by its probability p(need today)]
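A Monte Carlo comparison of the default tour against a demand-driven tour might look like the sketch below. Coordinates, probabilities and the rule "visit needed assets in default-tour order" are all assumptions for illustration. The cost function is the sum of edges, as on the slide, but uses straight-line distances rather than travel times.

```python
import math
import random

# Hypothetical coordinates (km); index 0 is the base, 1..6 are assets in default tour order.
POINTS = [(0, 0), (2, 1), (4, 3), (5, 6), (3, 8), (1, 6), (-1, 3)]
# Hypothetical daily maintenance probabilities p1..p6, e.g. estimated from visit logs.
P_NEED = [0.2, 0.1, 0.3, 0.15, 0.25, 0.1]

def tour_cost(stops):
    """'Cost function': sum of edge lengths for base -> stops in order -> base."""
    route = [0, *stops, 0]
    return sum(math.dist(POINTS[a], POINTS[b]) for a, b in zip(route, route[1:]))

def simulate_demand_driven(n_runs, rng):
    """Average cost when only assets that need a visit today are visited."""
    total = 0.0
    for _ in range(n_runs):
        needed = [i + 1 for i, p in enumerate(P_NEED) if rng.random() < p]
        total += tour_cost(needed)  # an empty list costs 0: nobody leaves the base
    return total / n_runs

rng = random.Random(1)
default_cost = tour_cost(range(1, 7))
demand_cost = simulate_demand_driven(10_000, rng)
print(f"default tour:       {default_cost:.1f} km")
print(f"demand-driven tour: {demand_cost:.1f} km on average")
```

The difference between the two numbers is the saving that a 'smart' asset would have to pay for.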
Travelling salesman problem
what’s the most reasonable tour from to , visiting all ?
heuristic search is good enough, but requires a distance matrix
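The slide does not name the heuristic. A nearest-neighbour search is one of the simplest that works directly on a distance matrix, and is shown here as an assumed example. The travel times in `DIST` are hypothetical.

```python
def nearest_neighbour_tour(dist, start=0):
    """Greedy heuristic: always travel to the closest unvisited node, then return to start."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        here = tour[-1]
        nxt = min(unvisited, key=lambda node: dist[here][node])
        unvisited.remove(nxt)
        tour.append(nxt)
    tour.append(start)
    return tour

def tour_length(dist, tour):
    """Sum of the matrix entries along consecutive stops of the tour."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

# Hypothetical travel times in minutes; node 0 is the base.
DIST = [
    [0, 12, 25, 18],
    [12, 0, 9, 20],
    [25, 9, 0, 7],
    [18, 20, 7, 0],
]
tour = nearest_neighbour_tour(DIST)
print(tour, tour_length(DIST, tour), "minutes")
```

The result is not guaranteed to be optimal, but for a handful of assets per tour the gap is usually small compared with the uncertainty in the travel times themselves.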
Traffic harvesting
• based on Google API
• generate a distribution of travel times for each edge in the graph, dependent on time of day (weekdays only)
IoT - is it worth it?
[Figure: two plots of cost over weeks, annotated “awaiting confirmation!”]
Westminster Parking Trial
https://www.westminster.gov.uk/new-trial-improve-conditions-disabled-drivers
IoT solution
Service company
access to ~750 independent parking lots with a total of >3,500 individual spaces
Humans don’t scale that well…
labour: expensive
sensor: cheap
While the cost of the sensors is falling (and follows Moore’s Law), digging them in and out for deployment and maintenance is a significant cost factor.
Can we learn an optimal deployment and sampling pattern?
• sampling rate of 5-10 min
• data over 2 weeks in May 2015
• overall 2.6 million data points
Can we make customers’ budget go further by
• reducing the number of sensors in a geographic area?
• lowering the sampling rate for better battery life?
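One way to explore the sampling-rate question on existing data is to thin out a recorded series and measure how much information is lost. The sketch below uses a hypothetical 5-minute occupancy series and a simple hold-last-value reconstruction. Both are assumptions, not the method used in the trial.

```python
def downsample(series, factor):
    """Keep every `factor`-th reading (factor=6 turns 5-min into 30-min sampling)."""
    return series[::factor]

def hold_last_value(sparse, factor, length):
    """Rebuild a full-length series by repeating each retained reading."""
    return [sparse[i // factor] for i in range(length)]

def mean_abs_error(a, b):
    """Average absolute difference between two equally long series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical occupied-space counts for one lot at 5-minute resolution (2 hours).
occupancy = [3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3]
for factor in (1, 2, 6):
    rebuilt = hold_last_value(downsample(occupancy, factor), factor, len(occupancy))
    error = mean_abs_error(occupancy, rebuilt)
    print(f"every {5 * factor:2d} min: mean abs. error = {error:.2f} spaces")
```

If the error stays within what the application can tolerate, the lower rate is a candidate for longer battery life.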
A quick glimpse into the raw data
Correlation and clustering
[Figure: three example plots labelled “correlated”, “anti-correlated” and “independent”]
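The three cases can be reproduced with the Pearson correlation coefficient between two occupancy vectors. The vectors below are hypothetical. Note that a coefficient of zero only shows the absence of a linear relationship, which is used here as a stand-in for "independent".

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical occupied-space counts for one lot over eight time slots.
lot_a = [0, 2, 5, 9, 12, 9, 5, 2]
examples = {
    "correlated": [2 * v + 1 for v in lot_a],
    "anti-correlated": [12 - v for v in lot_a],
    "independent": [6, 4, 6, 4, 6, 4, 6, 4],
}
for label, other in examples.items():
    print(f"{label:16s} cc = {pearson(lot_a, other):+.2f}")
```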
lorry, coach, car, bike, skateboard
hierarchical clustering on the basis of a feature matrix
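A toy version of hierarchical clustering on a feature matrix is sketched below. The features (wheels, engine, mass) are invented for illustration and deliberately left unscaled. Single linkage with Euclidean distance is an assumed choice, since the slides do not specify the linkage or metric.

```python
import math

# Hypothetical feature matrix: (number of wheels, has engine, approx. mass in tonnes)
FEATURES = {
    "lorry":      (6, 1, 18.0),
    "coach":      (6, 1, 14.0),
    "car":        (4, 1, 1.5),
    "bike":       (2, 0, 0.015),
    "skateboard": (4, 0, 0.003),
}

def cluster_distance(c1, c2):
    """Single linkage: distance between the two closest members."""
    return min(math.dist(FEATURES[a], FEATURES[b]) for a in c1 for b in c2)

def agglomerate():
    """Merge the two closest clusters until one remains; return the merge order."""
    clusters = [(name,) for name in FEATURES]
    merges = []
    while len(clusters) > 1:
        pairs = [(cluster_distance(a, b), a, b)
                 for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        _, a, b = min(pairs)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges

for left, right in agglomerate():
    print(left, "+", right)
```

For the parking data, the rows of the feature matrix would be lots and the columns their occupancy over time, so that lots with similar temporal patterns end up in the same branch.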
Good news: temporal occupancy pattern roughly predicts neighbours
lots in Southampton
lots around the corner of each other
750 parking lots
A caveat: Is a high degree of correlation a function of parking lot size?
finding two lots of 20 spaces that correlate
finding two lots of 3 spaces that correlate
[Figure: two plots of occupancy over the day (0:00 to 23:59), annotated “more likely” and “less likely”]
Bootstrapping in DBSCAN clusters
Simulation: Swap the occupancy vectors between parking lots of similar size and test per grid cell if these lots still correlate
What makes a good spatial cluster?
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
https://en.wikipedia.org/wiki/DBSCAN#/media/File:DBSCAN-Illustration.svg
2 parameters:
• epsilon (distance)
• minPoints (in cluster)
A - core points
B, C - border points
N - noise point
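To make the two parameters concrete, here is a minimal DBSCAN sketch in plain Python. It follows the textbook algorithm rather than any particular library, and the coordinates are hypothetical. `eps` and `min_points` correspond to epsilon and minPoints on the slide.

```python
import math

NOISE = -1

def dbscan(points, eps, min_points):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes the point itself, as in the usual definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_points:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels

# Hypothetical lot coordinates: two dense groups and one isolated lot.
lots = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
print(dbscan(lots, eps=1.5, min_points=3))
```

A larger epsilon or smaller minPoints merges more lots into clusters. The isolated lot stays labelled as noise either way.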
Stratification strategy
3 lots with cc > 0.5
2 spaces 4 spaces 4 spaces
Test:
1. Take occupancy profile of ONE random 2-space parking lot and TWO random 4-space parking lots.
2. Determine cc.
3. Repeat n times and get a cc distribution for that parking lot combination.
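The three test steps can be sketched as follows. Two things are assumptions: the cc of a group of three lots is taken as the mean of the pairwise Pearson coefficients, and the occupancy profiles are random stand-in data rather than the Westminster measurements.

```python
import itertools
import math
import random
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def mean_pairwise_cc(profiles):
    """Assumption: the cc of a group of lots is the mean of all pairwise coefficients."""
    pairs = list(itertools.combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def cc_distribution(lots_by_size, sizes, n, rng):
    """Steps 1-3: draw random lots matching `sizes`, determine cc, repeat n times."""
    wanted = Counter(sizes)  # e.g. {2: 1, 4: 2}
    results = []
    for _ in range(n):
        drawn = []
        for size, count in wanted.items():
            drawn.extend(rng.sample(lots_by_size[size], count))
        results.append(mean_pairwise_cc(drawn))
    return results

# Hypothetical stand-in data: 20 random occupancy profiles per lot size, 48 time slots each.
rng = random.Random(7)
lots_by_size = {
    size: [[rng.randint(0, size) for _ in range(48)] for _ in range(20)]
    for size in (2, 4)
}
null_cc = cc_distribution(lots_by_size, sizes=(2, 4, 4), n=1000, rng=rng)
share = sum(c > 0.5 for c in null_cc) / len(null_cc)
print(f"random combinations with cc > 0.5: {share:.1%}")
```

If random lots of the same sizes rarely reach cc > 0.5, an observed cluster above that threshold is unlikely to be an artefact of lot size alone.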
Combining stats with street knowledge
Suggested technology for trials
A temporary survey would have allowed us to make the same recommendation, including the insight that the provided 5-minute resolution is probably not required.
Sweetening IoT for your customer
A few recommendations from the trenches:
• how to cut down on hardware costs
• how to cut down on software costs
many sensors
+ complicated analytics
+ expensive infrastructure
= IoT has little benefit
(slide annotations: “less”, “reasonable”)
My current pet hate: Deep Learning
Deep learning has delivered impressive results mimicking human reasoning, strategic thinking and creativity.
At the same time, big players have released libraries such that even ‘script kiddies’ can apply deep learning.
It’s already leading to uncritical use of deep learning when other methods would be more appropriate.
“I need to do real-time analytics!”
Time scale and example question:
• microseconds to seconds: am I falling? counteract
• seconds to minutes: battery level, should I land?
• minutes to hours: how many times did I stall?
• hours to weeks: what’s the best weather for flying?
Where the analytics run: on device, on stream, in batch; in process, in database
Type of insight: operational insight, performance insight, strategic insight
Example methods: Kalman filter, machine learning, rules engine, summary stats
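As an illustration of how lightweight the fastest tier can be, here is a scalar Kalman filter of the kind that fits on a device. It assumes a constant-state model with hypothetical noise variances and made-up sensor readings. A real flight controller would track several coupled states.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting value (constant-state model)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var            # predict: state unchanged, uncertainty grows
        k = p / (p + measurement_var)  # Kalman gain
        x = x + k * (z - x)            # update the estimate with the measurement residual
        p = (1 - k) * p                # update the uncertainty
        estimates.append(x)
    return estimates

# Hypothetical noisy altitude readings in metres.
readings = [10.3, 9.8, 10.1, 10.6, 9.7, 10.2, 9.9, 10.4]
smoothed = kalman_1d(readings, process_var=0.01, measurement_var=0.25, x0=readings[0])
print([round(v, 2) for v in smoothed])
```

Each update is a handful of multiplications and additions, which is why this class of method suits the microsecond-to-second range.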
Can IoT ever be real-time?
zone 1: real-time [µs]
zone 2: real-time [ms]
zone 3: real-time [s]
Edge, fog and cloud computing
Edge
Pro:
- immediate compression from raw data to actionable information
- cuts down traffic
- fast response
Con:
- loses potentially valuable raw data
- developing analytics on embedded systems requires specialists
- compute costs valuable battery life
Cloud
Pro:
- compute power
- scalability
- familiarity for developers
- integration centre across all data sources
- cheapest ‘real-time’ option
Con:
- traffic
Fog
Pro:
- same as Edge
- closer to ‘normal’ development work
- gateways often mains-powered
Con:
- loses potentially valuable raw data
Some of our examples for real-time analytics
Choosing the appropriate method and toolset on every level.
Dr. Boris Adryan @BorisAdryan
Summary
‣ Preliminary surveys and data analysis can help to minimise the number of sensors and develop an optimal deployment strategy and sampling schedule.
‣ Super-fast analytics and state-of-the-art methods are not automatically the most useful solution.
‣ A good understanding of the type of insight that is required by the business model is essential.
mobile communications series
THE TECHNICAL FOUNDATIONS OF IoT
BORIS ADRYAN, DOMINIK OBERMAIER, PAUL FREMANTLE
ARTECH HOUSE, BOSTON | LONDON
www.artechhouse.com
This comprehensive resource presents a technical introduction to the components, architectures, software, and protocols of IoT. This book was designed specifically for those interested in researching, developing, and building IoT. The book covers the physics of electricity and electromagnetism, laying the foundation for understanding the components of modern electronics and computing. Readers learn about the fundamental properties of IoT, along with security and privacy issues related to developing and maintaining connected products.
From the launch of the Internet from ARPAnet in the 1960s, to recent connected gadgets, this book highlights the integration of IoT in various verticals such as industry, smart cities, connected vehicles, and smart and assisted living. Overall design patterns, issues with UX and UI, and different network topologies related to architectures of M2M and IoT solutions are explored. Hardware development, power, sensors, and embedded systems are discussed in detail. This book offers insight into the software components that impinge on IoT solutions, their development, network protocols, backend software, data analytics, and conceptual interoperability.
Boris Adryan is the head of IoT & Data Analytics at Zühlke Engineering (Germany) and the founder of thingslearn Ltd (UK). He holds a Ph.D. in genetics from the Max Planck Institute for Biophysical Chemistry, and led academic research as a Royal Society University Research Fellow at the University of Cambridge.
Dominik Obermaier is the cofounder and CTO at dc-square company, where he created the HiveMQ MQTT broker. He received his B.Sc. in computer science from the University of Applied Sciences Landshut.
Paul Fremantle cofounded WSO2, where he was instrumental in creating the Carbon middleware platform. He studied mathematics, philosophy and computing at Oxford University, gaining B.A. and M.Sc. degrees. He is currently pursuing his Ph.D. at the University of Portsmouth, focusing on security and privacy of IoT.
ISBN 13: 978-1-63081-025-2
ISBN: 1-63081-025-8
to be published in June or July