Institute for Software Science – University of Vienna
P.Brezany
Web Services and Grid Services in Grid Computing
Peter Brezany
Media That Radically Changed Society
• 1500s: printing press
• 1840s: Penny Post
• 1850s: telegraph
• 1920s: telephone
• 1930s: radio
• 1950s: TV
• 1990s: Web
• 20xx: Grid
Grid Computing Vision
"The Internet is about
getting computers to talk together;
Grid computing is about
getting computers to work together."
Tom Hawk, IBM's general manager of Grid computing
Grid Computing Vision (2)
Asked "What did you have in mind when you first developed the Web?", Tim Berners-Lee replied:
"The dream behind the Web is of a common information space in which we communicate by sharing information."
Applied to Grid computing, this sentence can be rephrased as:
"The dream behind Grid computing is of a common resource space in which we can work together using shared resources."
The Web Compared with the Grid
[Figure: the Classical Web and the Classical Grid, separated along a "more computation" axis.]
The Web Compared with the Grid (2)
[Figure: the Classical Web and the Semantic Web, separated along a "richer semantics" axis.]
The Web Compared with the Grid (3)
[Figure: Classical Web, Classical Grid, Semantic Web, and Semantic Grid arranged along two axes, "more computation" and "richer semantics".]
Source: Norman Paton
Learning Objectives
• Motivation for Grids
• Basic concepts
• Existing architectures
• New developments
– From Web Services to Grid Services
– Further development and integration of Web Services and Grid Services
• Grid solutions
Examples and Logical Consequences
• Example: water supply
– Formerly: a private spring or well
– Today: reservoir → pipes → tap
• Example: energy supply
– Formerly: a generator
– Today: a "big generator" → power lines → socket
– Power grid → computational Grid / Grid computing (e.g. NASA's "Information Power Grid", www.ipg.nasa.gov)
• Logical consequence: Grid computing delivers computing power (and much more) from the "socket"
• Many computers connected into one large network; advantages:
– Completely new possibilities for collaboration between companies
– Hardware savings through "renting" (cf. generator / spring)
– "Renting" expensive software instead of buying it
– Offering one's own resources, e.g. computing power
Grid Computing - Definition
• Definition according to www.globus.org¹: "The Grid" is an infrastructure that enables the integrated, collaborative use of resources. Resources are not limited to computing power and storage; entire devices of any kind can be shared in the Grid, for example high-performance computers, networks, databases, telescopes, microscopes, and even electron accelerators. The goal of the Grid is to let users access devices as if they owned them, without having to buy them.
• Characteristics of Grid applications:
– Large amounts of data
– High computational cost
– Secure resource sharing between independent organizations
– Formation of Virtual Organizations (VOs)
¹ Practically all major Grid projects build on the Globus middleware (1998: Globus 1, 2001: Globus 2, 2003: Globus 3).
VO Example
• A car manufacturer commissions the following providers to carry out scenario analyses for a new factory (or site):
– Application service provider (ASP): financial forecasting
– Storage service provider (SSP): (historical) data
– Cycle providers: computing power for the analysis
VO Example (2)
Figure: An actual organization can participate in one or more VOs by sharing some or all of its resources. We show three actual organizations (the ovals), and two VOs: P, which links participants in an aerospace design consortium, and Q, which links colleagues who have agreed to share spare computing cycles, for example to run ray tracing computations. The organization on the left participates in P, the one to the right participates in Q, and the third is a member of both P and Q. The policies governing access to resources (summarized in "quotes") vary according to the actual organizations, resources, and VOs involved.
Definitions: Protocol, Service, API, SDK
• Protocol:
– A set of rules that the endpoints of telecommunication systems follow to exchange information
– A standard protocol ensures interoperability
• Service:
– A network-enabled entity that provides a specific capability
– Defined by a protocol and by its reaction to a protocol message (service = protocol + behavior)
• Application Program Interface (API):
– A standard interface for accessing functionality (one protocol can have several APIs)
– Enables portability
• Software Development Kit (SDK):
– Implements an API
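The four terms can be told apart in a small Python sketch. Everything here is made up for illustration: the "ECHO"/"OK"/"ERR" wire format, the `EchoAPI` interface, and the in-process transport are assumptions, not part of any Grid middleware.

```python
from abc import ABC, abstractmethod


# Protocol: the rules for the messages exchanged between endpoints.
# Hypothetical wire format: request "ECHO <text>", reply "OK <text>" or "ERR <reason>".
def encode_request(text: str) -> str:
    return f"ECHO {text}"


def decode_reply(reply: str) -> str:
    status, _, payload = reply.partition(" ")
    if status != "OK":
        raise ValueError(f"service reported an error: {payload}")
    return payload


# Service = protocol + behavior: how the endpoint reacts to a protocol message.
def echo_service(message: str) -> str:
    command, _, payload = message.partition(" ")
    if command != "ECHO":
        return "ERR unknown command"
    return f"OK {payload.upper()}"


# API: a standard interface to the functionality, independent of any implementation.
class EchoAPI(ABC):
    @abstractmethod
    def shout(self, text: str) -> str: ...


# SDK: one concrete implementation of the API.
class InProcessEchoSDK(EchoAPI):
    def __init__(self, transport):
        self._transport = transport  # any callable that delivers a message

    def shout(self, text: str) -> str:
        return decode_reply(self._transport(encode_request(text)))


client = InProcessEchoSDK(transport=echo_service)
result = client.shout("grid")
```

A second SDK could implement the same `EchoAPI` over a real network connection without changing client code, which is the portability argument made above.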
Grid Protocol Architecture vs. Internet Protocol Architecture
Grid protocol architecture (top to bottom):
• Application
• Collective: "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services
• Resource: "Sharing single resources": negotiating access, controlling use
• Connectivity: "Talking to things": communication (Internet protocols) & security
• Fabric: "Controlling things locally": access to, & control of, resources
Internet protocol architecture (top to bottom):
• Application
• Transport
• Internet
• Link
Grid Architecture (2)
• Fabric:
– Computers, file systems, archives, networks, sensors, ... with operations such as open, read, write, close, ...
– Hardly any restrictions at the low level, as long as the interfaces are satisfied
• Connectivity:
– Communication (IP, DNS, routing, ...)
– Security (Grid Security Infrastructure, GSI)
  - Uniform authentication
  - Single sign-on
  - Delegation
  - Public key technology
Grid Architecture (3)
• Resource layer:
– Grid Resource Allocation Management (GRAM): allocation, reservation, monitoring, and control of compute resources
– GridFTP protocol (FTP extensions): high-speed data access and transport
– Grid Resource Information Service (GRIS): access to structure and status information
– Network reservation, monitoring, and control
– Builds on the Connectivity layer (GSI & IP)
Grid Architecture (4)
• Collective layer:
– Global protocols and services
– Builds on the "neck" (the Resource and Connectivity protocols) and is completely "independent" of the resources
– Directory services
– Monitoring and diagnostic services
– Data replication services
– etc.
• Applications:
– Use services of any layer
Data Grid
• Original motivation: scientific applications are very data-intensive, and an enormous number of researchers from all over the world want fast access to this data.
State of the Art in 2002
• The concepts discussed so far are implemented by several SDKs, e.g. Globus (U.S.), UNICORE (EU project), European Data Grid (EU project), etc.
• Well known only in scientific circles, with a focus on "big science" applications.
• Almost no integration of database technologies; "flat files" are used instead.
• The Grid needs to move closer to "everyday life" (e-business, medicine, etc.).
• Web developments, in particular Web Service technologies, have been ignored.
• Large companies (IBM, Sun, Microsoft, etc.) are now starting to join in.
Grid and Web Services: Convergence?
[Figure: timeline from 1991 to 2004 with a Grid track (GT1, GT2, OGSI) and a Web track (HTTP, WSDL, WS-*, WSDL 2, WSDM). The two tracks started far apart in apps & tech. Have they been converging?]
GT – Globus Toolkit, OGSI – Open Grid Services Infrastructure
However, despite enthusiasm for OGSI, adoption within the Web community turned out to be problematic.
Grid Service – OGSA – OGSI – GT3
OGSA – Open Grid Services Architecture
Grid Service – OGSA – OGSI – GT3 (2)
• Grid Services are defined by OGSA. The Open Grid Services Architecture (OGSA) aims to define a new common and standard architecture for grid-based applications. Right at the center of this new architecture is the concept of a Grid Service. OGSA defines what Grid Services are, what they should be capable of, and what types of technologies they should be based on, but it doesn't give a technical and detailed specification (which would be needed to implement a Grid Service).
• Grid Services are specified by OGSI. The Open Grid Services Infrastructure is a formal and technical specification of the concepts described in OGSA, including Grid Services.
• The Globus Toolkit 3 is an implementation of OGSI. GT3 is a usable implementation of everything that is specified in OGSI (and, therefore, of everything that is defined in OGSA).
• Grid Services are based on Web Services. Grid Services are an extension of Web Services. We'll see what Web Services are in the next page, and what Grid Services are in the page after that.
• I still don't get it: What is the difference between OGSA, OGSI, and GT3? Consider the following simple example. Suppose you want to build a new house. The first thing you need to do is to hire an architect to draw up all the plans, so you can get an idea of what your house will look like. Once you're happy with the architect's job, it's time to hire an engineer who will make detailed blueprints that specify construction details (like where to put the master beams, the power cables, the plumbing, etc.). The engineer then passes all those blueprints to qualified professional workers (construction workers, electricians, plumbers, etc) who will actually build the house. We could say that OGSA (the definition) is the architect, OGSI (the specification) is the engineer, and GT3 (the implementation) is the workers.
OGSA - GridService
GT 3 Architecture I
• Grid Services, which we have already seen, are the 'GT3 Core' layer. Let's take a look at the rest of the layers from the bottom up:
• GT3 Security Services: Security is an important factor in grid-based applications. GT3 Security Services can help us restrict access to our Grid Services, so only authorized clients can use them. For example, we said that only our New York, Los Angeles, and Seattle offices could access MathService. We want to make sure only those offices have access to MathService and, of course, we want all the data exchanged between MathService and clients to be encrypted so we can keep malicious users from intercepting our data. Besides the usual security measures (putting the web server behind a firewall, etc.) GT3 gives us one more layer of security with technologies such as SSL and X.509 digital certificates.
• GT3 Base Services: This layer actually includes a whole lot of interesting services:
• Managed Job Service: Suppose some particular operation in MathService might take hours or even days to complete. Of course, we don't want to simply stand in front of a computer waiting for the result to arrive (especially if, after 8 hours of waiting, all we get might simply be an error message!). We need to be able to check on the progress of the operation periodically, and have some control over it (pause it, stop it, etc.). This is usually called job management (in this case, the term 'job' is used instead of 'operation'). The Managed Job Service allows us to treat our invocations like jobs, and manage them accordingly.
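The job-management idea (start a long operation, poll its progress, pause or stop it) can be sketched locally in Python. This is not the GT3 Managed Job Service API; the `ManagedJob` class, its method names, and the sleep that stands in for real work are all assumptions made for illustration.

```python
import threading
import time


class ManagedJob:
    """Runs a long operation in the background and exposes progress and control."""

    def __init__(self, total_steps: int, step_seconds: float = 0.001):
        self.total_steps = total_steps
        self.step_seconds = step_seconds
        self.completed_steps = 0
        self.state = "created"            # created -> running -> done | stopped
        self._stop = threading.Event()
        self._resume = threading.Event()
        self._resume.set()                # the job is not paused initially
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        for _ in range(self.total_steps):
            self._resume.wait()           # blocks while the job is paused
            if self._stop.is_set():
                self.state = "stopped"
                return
            time.sleep(self.step_seconds) # stands in for one unit of real work
            self.completed_steps += 1
        self.state = "done"

    def start(self):
        self.state = "running"
        self._thread.start()

    def pause(self):
        self._resume.clear()

    def resume(self):
        self._resume.set()

    def stop(self):
        self._stop.set()
        self._resume.set()                # wake a paused job so that it can exit
        self._thread.join()

    def wait(self):
        self._thread.join()

    def progress(self) -> float:
        return self.completed_steps / self.total_steps


# Start a short job, then check on it instead of blocking on the call itself.
job = ManagedJob(total_steps=5)
job.start()
job.wait()
```

A client would typically call `progress()` periodically and call `stop()` if the job is no longer needed, rather than waiting for hours on a single blocking invocation.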
GT 3 Architecture II
• Index Service: Remember from A short introduction to Web Services that we usually know what type of Web Service we need, but we have no idea where it is. This also happens with Grid Services: we might know we need a Grid Service which meets certain requirements, but we have no idea what its location is. While this was solved in Web Services with UDDI, GT3 has its own Index Service. For example, we could have several dozen MathServices all around the country, each with different characteristics (some might be better suited for statistical analysis, while others might be better for performing simulations). The Index Service will allow us to query which MathService meets our particular requirements.
• Reliable File Transfer (RFT) Service: This service allows us to perform large file transfers between the client and the Grid Service. For example, suppose we have an operation in MathService which has to crunch several gigabytes of raw data (for a statistical analysis, for example). Of course, we're not going to send all that information as parameters. We'll be able to send it as a file. Furthermore, RFT guarantees the transfer will be reliable (hence its name). For example, if a file transfer is interrupted (due to a network failure, for example), RFT allows us to restart the file transfer from the point where it broke off, instead of starting all over again.
• GT3 Data Services: This layer includes Replica Management, which is very useful in applications that have to deal with very big sets of data. When working with large amounts of data, we're usually not interested in downloading the whole thing; we just want to work with a small part of all that data. Replica Management keeps track of those subsets of data we will be working with.
• Other Grid Services: Other non-GT3 services can run on top of the GT3 Architecture.
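The restart behaviour described for RFT (continue from the point of interruption instead of starting over) can be illustrated with a minimal Python sketch. It does not use RFT or GridFTP; the `transfer` function, the `TransferInterrupted` exception, and the `fail_after` switch that simulates a network failure are hypothetical names introduced for this example, and in-memory buffers stand in for the files.

```python
import io


class TransferInterrupted(Exception):
    """Raised when the (simulated) connection drops; remembers the last good offset."""

    def __init__(self, offset: int):
        super().__init__(f"transfer interrupted at byte {offset}")
        self.offset = offset


def transfer(source, destination, offset=0, chunk_size=4, fail_after=None):
    """Copy bytes from source to destination, starting at the given offset.

    Returns the offset reached at the end of the data. `fail_after` exists only
    for this demo: it simulates a network failure after that many chunks.
    """
    source.seek(offset)
    destination.seek(offset)
    chunks_sent = 0
    while True:
        if fail_after is not None and chunks_sent >= fail_after:
            raise TransferInterrupted(offset)
        chunk = source.read(chunk_size)
        if not chunk:
            return offset
        destination.write(chunk)
        offset += len(chunk)
        chunks_sent += 1


data = b"several gigabytes of raw data"
source, destination = io.BytesIO(data), io.BytesIO()

try:
    transfer(source, destination, fail_after=3)      # the first attempt breaks off
except TransferInterrupted as failure:
    checkpoint = failure.offset                      # bytes that arrived safely

final_offset = transfer(source, destination, offset=checkpoint)  # resume, do not restart
```

The key design point is that the sender keeps a checkpoint (here the byte offset) outside the failed transfer, so the second call only moves the bytes that are still missing.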
Service Data
Notification Interfaces
Pull-Notifications
Push-Notifications
Notifications in GT3
Challenge: Advanced Grid Applications
Example: Knowledge Discovery in Grid Databases
Motivation
[Figure: a "data and data exploration cloud" with the application areas business, medicine, scientific experiments, simulations, and Earth observations.]
The Knowledge Discovery Process
[Figure labels: Cleaning and Integration; Data Warehouse; Selection and Transformation; Data Mining; Evaluation and Presentation; Knowledge; OLAP; OLAP Queries; Online Analytical Mining.]
The GridMiner Project in Vienna
• GridMiner: a knowledge discovery Grid infrastructure (http://www.gridminer.org/)
– OGSA-based architecture
– Workflow management
– Grid-aware data preprocessing and data mining services
– Data mediation service
– OLAP service
– GUI
– Implementation on top of Globus Toolkit 3.0
• Application: management of patients with traumatic brain injuries
GridMiner Architecture
[Figure: layered architecture diagram; the labels are listed below.]
• Layers: GridMiner Workflow, GridMiner Core, GridMiner Base, Grid Core, Fabric
• GridMiner services: GMMS (Mediation), GMPPS (Preprocessing), GMDMS (Data Mining), GMPRS (Presentation), GM DSCE (Dynamic Service Control), GMDIS (Integration), GMOMS (OLAM), GMIS (Information), GMRB (Resource Broker), GMCMS (OLAP / Cubes)
• Grid Core: Grid Core Services, Security, File and Database Access Service, Replica Management
• Fabric: Grid Resources, Data Source
Collaboration of GM-Services
Example 3:
The Control Layer
• Control layer
– Provides the whole knowledge discovery process to a client
• Knowledge discovery process in GridMiner
– Services to execute are not known in advance
– Order of service execution
– Sequential and concurrent execution
• Approaches investigated:
– Data Mining Query Language
– Standard workflow orchestration approach (BPEL4WS, WSFL, GSFL, …)
– Our approach: Dynamic Service Control
The Control Layer: Standard Service Orchestration Approach (BPEL4WS)
Workflow Models
[Figure panels: Composition by Service Publisher; Composition by Service Consumer]
The Control Layer - Approaches: Dynamic Service Control
• Dynamic Service Control Language (DSCL)
– Based on XML
– Easy to use
– Supports OGSA Grid Services
– Specially designed to support knowledge discovery processes
• Dynamic Service Control Engine (DSCE)
– Processes the workflow according to DSCL
[Figure: Client, DSCE, and the OGSA Grid Services A, B, C, and D. Labels: DSCL; start, stop, resume, …; (re)connect; query results; subscribe; notify; notification sink.]
Dynamic Service Control Language (DSCL)
• Features
– Control flow
» Concurrent execution of activities
» Sequential execution of activities
– Activities
» Creation of new Grid Service instances
» Invoking operations on Grid Service instances
» Querying information of Grid Service instances
» Destruction of Grid Service instances
DSCL - Structure
[Figure: a dscl document consists of a variables part and a composition part; the composition contains createService, invoke, and querySDE activities.]
DSCL - Variables
• Initializing by simple type value

<variable name="intvar">
  <ns1:value xsi:type="xsd:int"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:ns1="http://ogsa.globus.org">4711</ns1:value>
</variable>

• Initializing by arrays

<variable name="arrayvar">
  <ns1:value xsi:type="soapenc:Array" soapenc:arrayType="xsd:int[2]"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:ns1="http://ogsa.globus.org">
    <soapenc:item>23</soapenc:item>
    <soapenc:item>-112</soapenc:item>
  </ns1:value>
</variable>
DSCL - Variables
• Initializing by a complex type value

<variable name="address-var">
  <ns1:value xmlns:ns1="http://ogsa.globus.org">
    <ns1:country xmlns:ns1="http://www.gridminer.org/test/">Austria</ns1:country>
    <ns1:zip xmlns:ns1="http://www.gridminer.org/test/">1090</ns1:zip>
    <ns1:city xmlns:ns1="http://www.gridminer.org/test/">Vienna</ns1:city>
    <ns1:street xmlns:ns1="http://www.gridminer.org/test/">Liechtensteinstr.</ns1:street>
    <ns1:number xmlns:ns1="http://www.gridminer.org/test/">18</ns1:number>
  </ns1:value>
</variable>

Corresponding schema:

<xsd:schema targetNamespace="http://www.gridminer.org/test/"
    xmlns:tns="http://www.gridminer.org/test/"
    ...
  <xsd:complexType name="address">
    <xsd:sequence>
      <xsd:element name="country" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="number" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  ...
</xsd:schema>
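To show how an engine might read such a variable, here is a minimal Python sketch that parses the simple-type example with the standard library. It is an assumption about how a consumer could work, not GridMiner code: the `read_variable` function and the small converter table are hypothetical, and the type prefix (`xsd:`) is stripped by string handling rather than resolved against its namespace declaration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
OGSA = "http://ogsa.globus.org"

# The simple-type DSCL variable from the slides, with plain ASCII quotes.
DSCL_VARIABLE = """
<variable name="intvar">
  <ns1:value xsi:type="xsd:int"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:ns1="http://ogsa.globus.org">4711</ns1:value>
</variable>
"""


def read_variable(xml_text: str):
    """Return (name, value) for a DSCL variable initialized by a simple type."""
    variable = ET.fromstring(xml_text)
    value = variable.find(f"{{{OGSA}}}value")      # namespace-qualified lookup
    xsi_type = value.get(f"{{{XSI}}}type")         # e.g. "xsd:int"
    local_type = xsi_type.split(":")[-1]           # simplification: ignore the prefix
    converters = {"int": int, "string": str}       # illustrative subset of XSD types
    return variable.get("name"), converters[local_type](value.text.strip())


name, value = read_variable(DSCL_VARIABLE)
```

Arrays and complex types would need additional handling of the child elements (`soapenc:item`, or the elements declared in the schema) and are left out here.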
DSCL Control Flow
[Figure: a dscl document with a variables part and a composition part. The composition uses sequence and parallel elements: createService activityID="act1" … runs first, then invoke activityID="act2.1" … and invoke activityID="act2.2" … run inside a parallel block. Execution order shown: act1, then act2.1 and act2.2, …]
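The sequence/parallel control flow named on this slide can be sketched in a few lines of Python. This only illustrates the idea and is not the DSCE implementation: the tuple-based workflow representation and the `run` function are assumptions, and each activity is reduced to appending its ID to a log instead of calling a Grid Service.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run(node, log, lock):
    """Execute a workflow node: an activity, a sequence, or a parallel block."""
    kind, body = node
    if kind == "activity":
        with lock:                       # the log is shared between threads
            log.append(body)
    elif kind == "sequence":
        for child in body:               # children run one after another
            run(child, log, lock)
    elif kind == "parallel":
        with ThreadPoolExecutor(max_workers=len(body)) as pool:
            futures = [pool.submit(run, child, log, lock) for child in body]
            for future in futures:       # wait for all branches, surface errors
                future.result()
    else:
        raise ValueError(f"unknown node kind: {kind}")


# act1 first, then act2.1 and act2.2 concurrently (activity IDs from the slide).
workflow = ("sequence", [
    ("activity", "act1"),
    ("parallel", [
        ("activity", "act2.1"),
        ("activity", "act2.2"),
    ]),
])

log, lock = [], threading.Lock()
run(workflow, log, lock)
```

After the run, `act1` is always the first log entry, while the relative order of `act2.1` and `act2.2` depends on thread scheduling, which is exactly the difference between the two control-flow constructs.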
Grid and Web Services: Convergence: Yes!
[Figure: the Grid track (GT1, GT2, OGSI) and the Web track (HTTP, WSDL, WS-*, WSDL 2, WSDM) started far apart in apps & tech, have been converging, and meet in WSRF.]
Web Services Resource Framework - WSRF
The definition of WSRF means that Grid and Web communities can move forward on a common base.
First publications on WSRF: January 2004
References
1. F. Berman, G. Fox, T. Hey (Eds.): Grid Computing – Making the Global Infrastructure a Reality. Wiley, 2003
2. www.globus.org
3. www.gridminer.org (our research project)
4. Many documents on the Web