title: multivariate geostatistics for the predictive modelling of

19
Page 1 of 2 Title: Multivariate geostatistics for the predictive modelling of the surficial sand distribution in shelf seas Author(s): Els Verfaillie (UGent), Marc Van Meirvenne (UGent) & Vera Van Lancker (UGent) Document owner: Els Verfaillie ([email protected]) Reviewed by: Workgroup: MESH action: 4 Version: n/a Date published: File name: WE_UGent_Multivariate_geostatistics.pdf Language: English Number of pages: 17 (including these pages) Summary: Multivariate geostatistics have been used to obtain a detailed and high-quality map of the median grain-size distribution of the sand fraction at the Belgian Continental Shelf. Sandbanks and swales are the dominant geomorphological features and impose a high-spatial seafloor variability. Interpolation over complex seafloors is difficult and as such various models were investigated. In this paper, linear regression and ordinary kriging (OK) were used and compared with kriging with an external drift (KED) that makes use of secondary information to assist in the interpolation. KED proved to be the best technique since a linear correlation was found between the median grain-size and the bathymetry. The resulting map is more realistic and separates clearly the sediment distribution over the sandbanks from the swales. Both techniques were also compared with a simple linear regression of the median grain-size against the bathymetry. An independent validation showed that the linear regression yielded the largest average prediction error (almost twice as large as with KED). Unlike most static sedimentological maps, our approach allows for defining grain-size classes that can be adapted according to the needs of various applications. These relate mainly to the mapping of soft substrata habitats and of the most suitable aggregates for extraction. This information is highly valuable in a marine spatial planning context. Reference/citation: Verfaillie, E., Van Meirvenne, M. & Van Lancker, V., 2006. Multivariate geostatistics for the predictive modelling of the surficial sand distribution in shelf seas, Continental Shelf Research, 26 (19), 2454-2468. Keywords: Multivariate geostatistics; Median grain-size; Bathymetry; Habitat mapping; Resource maps; Belgian continental shelf Bookmarks:

Upload: others

Post on 12-Sep-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Title: Multivariate geostatistics for the predictive modelling of

Page 1 of 2

Title: Multivariate geostatistics for the predictive modelling of the surficial sand distribution in shelf seas

Author(s): Els Verfaillie (UGent), Marc Van Meirvenne (UGent) & Vera Van Lancker (UGent)

Document owner: Els Verfaillie ([email protected])

Reviewed by:

Workgroup:

MESH action: 4

Version: n/a

Date published:

File name: WE_UGent_Multivariate_geostatistics.pdf

Language: English

Number of pages: 17 (including these pages)

Summary: Multivariate geostatistics have been used to obtain a detailed and high-quality map of the median grain-size distribution of the sand fraction at the Belgian Continental Shelf. Sandbanks and swales are the dominant geomorphological features and impose a high-spatial seafloor variability. Interpolation over complex seafloors is difficult and as such various models were investigated. In this paper, linear regression and ordinary kriging (OK) were used and compared with kriging with an external drift (KED) that makes use of secondary information to assist in the interpolation. KED proved to be the best technique since a linear correlation was found between the median grain-size and the bathymetry. The resulting map is more realistic and separates clearly the sediment distribution over the sandbanks from the swales. Both techniques were also compared with a simple linear regression of the median grain-size against the bathymetry. An independent validation showed that the linear regression yielded the largest average prediction error (almost twice as large as with KED). Unlike most static sedimentological maps, our approach allows for defining grain-size classes that can be adapted according to the needs of various applications. These relate mainly to the mapping of soft substrata habitats and of the most suitable aggregates for extraction. This information is highly valuable in a marine spatial planning context.

Reference/citation: Verfaillie, E., Van Meirvenne, M. & Van Lancker, V., 2006. Multivariate geostatistics for the predictive modelling of the surficial sand distribution in shelf seas, Continental Shelf Research, 26 (19), 2454-2468.

Keywords: Multivariate geostatistics; Median grain-size; Bathymetry; Habitat mapping; Resource maps; Belgian continental shelf

Bookmarks:

Page 2: Title: Multivariate geostatistics for the predictive modelling of

Related information:

Page 3: Title: Multivariate geostatistics for the predictive modelling of

Page 1 of 2

Title: Multivariate geostatistics for the predictive modelling of the surficial sand distribution in shelf seas

Author(s): Els Verfaillie (UGent), Marc Van Meirvenne (UGent) & Vera Van Lancker (UGent)

Document owner: Els Verfaillie ([email protected])

Reviewed by:

Workgroup:

MESH action: 4

Version: n/a

Date published:

File name: WE_UGent_Multivariate_geostatistics.pdf

Language: English

Number of pages: 17 (including these pages)

Summary: Multivariate geostatistics have been used to obtain a detailed and high-quality map of the median grain-size distribution of the sand fraction at the Belgian Continental Shelf. Sandbanks and swales are the dominant geomorphological features and impose a high-spatial seafloor variability. Interpolation over complex seafloors is difficult and as such various models were investigated. In this paper, linear regression and ordinary kriging (OK) were used and compared with kriging with an external drift (KED) that makes use of secondary information to assist in the interpolation. KED proved to be the best technique since a linear correlation was found between the median grain-size and the bathymetry. The resulting map is more realistic and separates clearly the sediment distribution over the sandbanks from the swales. Both techniques were also compared with a simple linear regression of the median grain-size against the bathymetry. An independent validation showed that the linear regression yielded the largest average prediction error (almost twice as large as with KED). Unlike most static sedimentological maps, our approach allows for defining grain-size classes that can be adapted according to the needs of various applications. These relate mainly to the mapping of soft substrata habitats and of the most suitable aggregates for extraction. This information is highly valuable in a marine spatial planning context.

Reference/citation: Verfaillie, E., Van Meirvenne, M. & Van Lancker, V., 2006. Multivariate geostatistics for the predictive modelling of the surficial sand distribution in shelf seas, Continental Shelf Research, 26 (19), 2454-2468.

Keywords: Multivariate geostatistics; Median grain-size; Bathymetry; Habitat mapping; Resource maps; Belgian continental shelf

Bookmarks:

Page 4: Title: Multivariate geostatistics for the predictive modelling of

Related information:

Page 5: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

0278-4343/$ - se

doi:10.1016/j.csr

E-mail addre

Vera.VanLanck

Marc.VanMeirv

Continental Shelf Research 26 (2006) 2454–2468

www.elsevier.com/locate/csr

Multivariate geostatistics for the predictive modelling of thesurficial sand distribution in shelf seas

Els Verfailliea, Vera Van Lanckera, Marc Van Meirvenneb

aRenard Centre of Marine Geology (RCMG), Ghent University, Krijgslaan 281, S8, 9000 Gent, BelgiumbDepartment of Soil Management and Soil Care, Ghent University, Coupure 653, 9000 Gent, Belgium

Received 22 December 2005; received in revised form 13 July 2006; accepted 26 July 2006

Available online 18 September 2006

Abstract

Multivariate geostatistics have been used to obtain a detailed and high-quality map of the median grain-size distribution

of the sand fraction at the Belgian Continental Shelf. Sandbanks and swales are the dominant geomorphological features

and impose a high-spatial seafloor variability. Interpolation over complex seafloors is difficult and as such various models

were investigated. In this paper, linear regression and ordinary kriging (OK) were used and compared with kriging with an

external drift (KED) that makes use of secondary information to assist in the interpolation. KED proved to be the best

technique since a linear correlation was found between the median grain-size and the bathymetry. The resulting map is

more realistic and separates clearly the sediment distribution over the sandbanks from the swales. Both techniques were

also compared with a simple linear regression of the median grain-size against the bathymetry. An independent validation

showed that the linear regression yielded the largest average prediction error (almost twice as large as with KED).

Unlike most static sedimentological maps, our approach allows for defining grain-size classes that can be adapted

according to the needs of various applications. These relate mainly to the mapping of soft substrata habitats and of the

most suitable aggregates for extraction. This information is highly valuable in a marine spatial planning context.

r 2006 Elsevier Ltd. All rights reserved.

Keywords: Multivariate geostatistics; Median grain-size; Bathymetry; Habitat mapping; Resource maps; Belgian continental shelf

1. Introduction

Seabed habitats are subject to increasing pres-sures from human developments such as fisheries,aggregate extraction, dredging/dumping and wind-mill farms. In this context, the mapping of habitatsand their prediction becomes crucial, both at thelevel of baseline studies as during the monitoring

e front matter r 2006 Elsevier Ltd. All rights reserved

.2006.07.028

sses: [email protected] (E. Verfaillie),

[email protected] (V. Van Lancker),

[email protected] (M. Van Meirvenne).

and decommitment phase. There is a differencebetween the physical (or abiotic) and the biological(or biotic) part of a seabed habitat (Fig. 1).However, if a full coverage map of the physicalhabitat is available and if the relations between thephysical and the biological habitat are known, it ispossible to create a full coverage map of thebiological habitat. Nowadays there is an increasingdemand for full coverage information. ‘Filling thegaps’ and ‘predictive modelling’ or the prediction ofphysical and biological information in areas withgaps, is a hot topic (e.g. ICES, 2005) in the

.

Page 6: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 1. A seabed habitat consists of a biotic and an abiotic part. ICES (2005) defines a habitat as: ‘‘A recognizable space which can be

distinguished by its abiotic characteristics and associated biological assemblage, operating at particular spatial and temporal scales.’’ In

this paper only the abiotic part is considered.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–2468 2455

framework of habitat mapping and nature protec-tion. This is one of the aims of the project MESH(Development of a framework for Mapping Eur-opean Seabed Habitats) and BWZee (BiologicalValuation Map of the Belgian Continental Shelf), inwhich the current research plays an important role.

Data, describing the physical habitat, are availableas point information (e.g. sediment samples), as fullcoverage information (e.g. Digital Elevation Model orDEM) or as full coverage information from a model(e.g. current data, shear stress). The datasets availablein this study were sediment samples and a DEM.

For the mapping of soft substrata habitats, it hasbeen shown that the sedimentology (mainly themedian grain-size and the silt-clay percentage) is animportant parameter to explain and predict theoccurrence of (macro)benthos (seabed organismslarger than 1mm) (e.g. Wu and Shin, 1997; Lee-caster, 2003; Van Hoey et al., 2004). Althoughsediment samples are generally more available thanbiological samples, it remains difficult to predict (orinterpolate) their distribution and this particularlyover complex seafloors. As such, a sound methodol-ogy for the interpolation of these data is necessary.

The general aim of this paper was to produce ahigh-quality map of the median grain-size at theBelgian Continental Shelf (BCS), looking for thebest interpolation method. This map is a valuableproduct in the context of aggregate extraction,habitat mapping, ecological valuation, spatial plan-ning and sediment transport.

2. Material and methods

2.1. Data description

Grain-size data was derived from a sedimentolo-gical database (‘sedisurf@’) hosted by RenardCentre of Marine Geology, Ghent University. The

dataset is a compilation of sample information since1976 and contains more than 6000 samples.

As a second variable, a high-resolution DEM wascompiled based on data from the Ministry of theFlemish Community (Department of Environmentand Infrastructure, Waterways and Marine AffairsAdministration, Division Coast, Hydrographic Of-fice) and completed with data from the Hydro-graphic Office of the Netherlands and the UnitedKingdom. Regarding the Belgian shelf, this is a veryvaluable source of information as its very largedensity allowed an interpolation to a resolution of80m, using a simple inverse distance algorithm.From the DEM, a slope map was derived. Based onthe DEM and the slope map, homogeneous zones atthe BCS could be defined (Fig. 2). These zones allowa distinction between sandbanks, swales and fore-shore zones. The delimitation of the zones was doneby alternatively inspecting the DEM and the slopemap and the visual drawing of polygons in ageographical information system (GIS). From eachzone the amount of samples, the variation of thegrain-size (e.g. mean value, variance, sum,y) canbe queried in GIS. In this way it is possible to get animpression of the variation of the samples withineach zone and to carry out a quality control of thegrain-size values. The quality control of thesediment samples was done assuming that samplesinside the same zone, are more similar than samplesfrom different zones. Extreme values can beidentified and if necessary, removed. For thatpurpose a sound knowledge of the sedimentologicaldata is needed. On this basis 83 points were removedout of the dataset.

2.2. Linear regression

A well-known approach consists of modelling therelation between the median grain-size and the

Page 7: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 2. Large-scale zonation, distinguishing swales, sandbanks and foreshores.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–24682456

depth using a linear function of the type

zðxÞ ¼ a�0 þ a�1yðxÞ

with z(x) equal to the measurement of median grain-size at location x, a�0 the intercept constant value, a�1the slope constant value; y(x) the measurement ofdepth at location x.

With this relation, each depth value can beconverted into a median grain-size value. This typeof regression has the major shortcoming that themedian grain-size is only derived from the depth atthe same location x, regardless of the surroundingvalues (Goovaerts, 1999).

2.3. Geostatistical approach

Geostatistical interpolation techniques (generallyknown as kriging) have the advantage that they are

stochastic in contrast with deterministic techniqueslike trend surfaces. The latter predict an unknownvalue in a unique way without an associatedmeasure of uncertainty. Stochastic techniques pro-vide a number of possible values, with a probabilityof occurrence. A unique solution cannot beexpected (Goovaerts, 1997). Moreover, geostatisti-cal techniques have the advantage that they makeuse of the spatial correlation between neighbouringobservations, to predict values at unsampled places(Goovaerts, 1999). These techniques give an indica-tion of the errors and uncertainties associated withthe interpolated values, based on a variance surfaceof the estimated values (Burrough and McDonnell,1998). Multivariate geostatistics can be used if thereis a relation between the predicted variable (e.g.median grain-size) and a secondary variable (e.g.bathymetry). It is possible to include this secondary

Page 8: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESSE. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–2468 2457

information into the interpolation. This additionalinformation results in a more accurate and completeprediction of the variable than without the second-ary information. In practice, secondary informationis often cheaper or easier to obtain, and as such cancomplement the sparsely sampled (primary) obser-vations. More details on the applied geostatisticalanalysis can be found in Goovaerts (1997), Deutschand Journel (1998) and Wackernagel (1998).

2.3.1. Variogram analysis

The variogram g(h) represents the average variancebetween observations separated by a distance h. Thevalue plays an important role in the description andinterpretation of the structure of the spatial variabilityof the investigated regionalized variable. It isestimated by (Journel and Huijbregts, 1978):

gðhÞ ¼1

2NðhÞ

XNðhÞa¼1

zðxaÞ � zðxa þ hÞ� �2

with z(xa) equal to the measurement at location xa,z(xa+h) the measurement at location xa+h, g(h) thevariogram for distance vector ( ¼ lag) h betweenmeasurements z(xa) and z(xa+h), the N(h) number ofcouples of measurements separated by h.

A variogram is presented as a graph (Fig. 3),where the calculated variogram values (dots)represent the experimental variogram. The fittingof a theoretical variogram (curve) is an importantstep in the variogram analysis. Hereby, the ‘sill’ isthe total variance s2 of the variable, the ‘range’ is themaximal spatial extent of spatial correlation be-tween observations of the variable and the ‘nugget’is the random error.

Fig. 3. Experimental variogram (black dots) and theoretical

variogram (curve) (from Burrough and McDonnell, 1998).

The theoretical variogram can be composed ofnested models or structures. Common models arethe nugget model, spherical model, exponentialmodel, Gaussian model and power model. Directiondependant variograms can be set up in the case ofanisotropic variability. The formulas of thesemodels can be found in e.g. Journel and Huijbregts,1978; Wackernagel, 1998.

For the variogram analysis the programmeVariowin 2.21 (Pannatier, 1996) was used.

2.3.2. Interpolation with kriging

A univariate and a multivariate variant of krigingwere compared:

Ordinary kriging (OK) of median grain-size (d50)using directional variograms; � Kriging with an external drift (KED) of d50 with

bathymetrical values as secondary informationand with an omnidirectional variogram.

For the geostatistical analysis the softwareGSLIB 1998 (Deutsch and Journel, 1998) was used.

OK is the most frequently used kriging technique.The OK algorithm uses a weighted linear combina-tion of sampled points situated inside a neighbour-hood (or interpolation window) around the locationx0, where the interpolation is conducted. Anunderlying assumption is that the mean value (m)is locally stationary (i.e. that it has a constant valueinside the interpolation neighbourhood). The algo-rithm can be written as

Z�ðx0Þ ¼Xnðx0Þ

a¼1

la ZðxaÞ �m½ �� �

þm

¼Xnðx0Þ

a¼1

laZðxaÞ� �

þ 1�Xnðx0Þ

a¼1

la

" #m

with la equal to the weights attributed to the n(x0)observations z(xa); n the total number of observa-tions z(xa); n(x0) the subset of n, lying inside theinterpolation window.

The weights la are obtained by solving a set ofequations (the kriging system) involving knowledgeof the variogram (see e.g. Goovaerts, 1997). Theseweights are constrained to sum to one, leading tothe elimination of the parameter m from theestimator which is thus written as

Z�OK ðx0Þ ¼Xnðx0Þ

a¼1

laZðxaÞ withXnðx0Þ

a¼1

la ¼ 1.

Page 9: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESSE. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–24682458

KED is a multivariate variant of ‘Kriging with aTrend Model’ (KT), formerly called ‘UniversalKriging’. KED and KT are non-stationary meth-ods, meaning that the statistical properties of thevariable are not constant in space (i.e. no constantmean within the interpolation neighbourhood).With KT, the trend is modelled as a function ofthe spatial coordinates, whilst for KED the trendm(x0) is derived as a local linear function of thesecondary variable z2(x0), which is formulated ineach interpolation window (Goovaerts, 1997)

mðx0Þ ¼ b0 þ b1z2ðx0Þ

with m(x0) the trend at location x0, b0, b1 theunknown parameters of the trend, calculated ineach interpolation window from a fit to theobservations, z2(x0) the secondary variable atlocation x0.

The KED estimator has the same form as the OKestimator.

For non-stationary geostatistics such as KED,Z(x) can be decomposed into a deterministicfunction or a drift m(x) and a residual randomfunction Y(x) (Wackernagel, 1998)

Y ðxÞ ¼ ZðxÞ �mðxÞ.

The underlying variogram associated with Y isdirectly accessible, when the drift is not active in aparticular direction of space. The variogram in thisdirection can be extended to the other directionsunder an assumption of isotropic behaviour of theunderlying variogram (Wackernagel, 1998).

KED is a multivariate geostatistical technique, asit makes use of secondary information. However,this secondary data must be available at all primarydata locations as well as at all locations beingestimated. A more complex multivariate geostatis-tical technique is cokriging, which does not requirethis secondary information to be known at alllocations being estimated. Cokriging is much moredemanding than other kriging techniques becauseboth direct and cross variograms must be inferredand jointly modelled and because a large cokrigingsystem must be solved (Goovaerts, 1997).

2.3.3. Validation

To enable a thorough quality control of thegeostatistical analysis, the sedisurf@ database wasdivided into two subsets: a prediction and anindependent validation dataset. The proportion ofboth datasets is respectively, 70% and 30% of the

whole dataset. The validation dataset was selectedusing a random selection of data points.

Several indices are suitable to evaluate theinterpolation. These indices are all a measure ofthe estimation error that is the difference betweenthe estimated and the observed value: z�ðxaÞ � zðxaÞ.

1.

The mean estimation error (MEE), which has tobe about zero to have an unbiased estimator.

MEE ¼1

n

Xn

a¼1

z�ðxaÞ � zðxaÞð Þ:

2.

The mean-square estimation error (MSEE), whichhas to be as low as possible and which is useful tocompare different procedures. The root mean-

square estimation error (RMSEE) is used toobtain the same units as the variable. Thisparameter has to be compared to the varianceor the standard deviation of the dataset.

MSEE ¼1

n

Xn

a¼1

z�ðxaÞ � zðxaÞð Þ2.

3.

The mean absolute estimation error (MAEE),which is analogous to the MSEE, but lesssensitive to extreme deviations.

MAEE ¼1

n

Xn

a¼1

z�ðxaÞ � zðxaÞ�� ��.

4.

The Pearson correlation coefficient between z�ðxaÞ

and z(xa), which indicates the degree of linearcorrelation between observed and estimatedvalues. This value always has to be consideredin combination with the MEE. The correlationcoefficient is itself a measure of the proportion ofvariance explained, hence is related to MSEE.

3. Results

3.1. Linear regression

The relation between median grain-size and depthwas modelled as

d50 ¼ 179:84þ 5:94� depth;

resulting in the map of the median grain-size shownat Fig. 4. This map is a simple rescaling of theDEM, converted into grain-size values between 179and 508. Linear regression is not an exact inter-polator, meaning that the interpolated map does nothonour observations; the measurements are only

Page 10: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 4. Map of median grain-size, on the basis of linear regression.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–2468 2459

used to calculate a linear regression function. As themap is a transformation of the DEM, it shows veryclearly the anisotropy, but the typical, more patchypattern of the grain-size is completely lost (comparewith Fig. 10a and b).

3.2. Geostatistical approach

3.2.1. Exploratory data analysis

The histogram of the grain-size data (Fig. 5)shows a symmetric distribution.

At every location where the median grain-size d50is known the depth is also known from the DEM(Fig. 6). The Pearson correlation coefficient rij

between both variables is 0.46, indicating a moder-ately strong correlation. The Spearman rank corre-lation is slightly larger (0.52) indicating the presence

of some outliers (as can be seen at Fig. 6) reducingthe Pearson correlation coefficient.

The scatterplot suggests the existence of twopopulations (one parallel with and one perpendi-cular to the X axis). However after splitting the twopopulations, the correlations did not improve. Topreserve the added value of the secondary variablein the geostatistical analysis, the decision was madeto keep the dataset as a single entity.

3.2.2. Variogram analysis

The maximal diagonal distance at the BCS isabout 90 km. Following a rule of thumb, theproduct of the lag interval distance and the numberof lags should not exceed half of this largestdimension: i.e. between 30 and 45 km. Conse-quently, the variogram surface was calculated using

Page 11: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 5. Histogram of the prediction dataset of the d50 values.

Fig. 6. Scatter plot of d50 value compared to the depth.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–24682460

11 lags of 3000m. This variogram surface (Fig. 7)shows a clear anisotropy. The direction of thelargest continuity is about 501 (expressed as atrigonometric angle), corresponding to the directionof the sandbanks at the BCS and to the smallestvariogram values. This indicates that the sandbankshave a strong influence on the spatial variability ofthe data. This is the case for the median grain-size

(Fig. 7, left), but it is stronger with the depth values(Fig. 7, right). The direction of the largest dis-continuity is about 1301, corresponding to thedirection perpendicular to the sandbanks. Tocharacterize the spatial variability in differentdirections directional variograms were calculatedin the directions: 401, 851, 1301 and 1751.

For the directional variograms computed over alarge distance (i.e. over a distance of 33 km or 11lags of 3000m.), a sill was reached in the directionof the largest continuity (401). In the directionsperpendicular to this direction, no sill was reached.This indicates a spatial trend, i.e. an increasingvariability with increasing distance, which is causedby a non-stationary mean (i.e. a non-constant meanmedian grain-size over the BCS). Therefore, for OKthe variogram (Fig. 8) was restricted to a distance of10 km (with 20 lags of 500m), which is large enoughto cover the interpolation window.

For KED the experimental variogram (Fig. 9)was estimated using increasing lag spacings between500 and 1000m. In this way it was possible to modelaccurately both the short and long distance patternsduring the variogram analysis. The short distancevariability is important for the fitting of the nuggetand initial behaviour of the variogram, while thelong distance variability is important for the fittingof the range and eventually compound or ‘nested’models. Only the variogram in the direction of the

Page 12: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 7. Variogram surface of d50 value (left) and depth value

(right), with 11 lags of 3000m.

Fig. 8. Directional variograms in the direction of largest

discontinuity (1301) and in the direction of largest continuity

(401), corresponding to, respectively the direction perpendicular

to the sandbanks and parallel to the sandbanks.

Fig. 9. Variogram in the direction of largest continuity (501),

considered as an omnidirectional variogram.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–2468 2461

largest continuity (501) was calculated, because weconsider it as representative for stationary condi-tions without a trend. For KED a linear trend withthe depth (causing the anisotropy) was calculatedwithin each interpolation window. The variogram isalso shown at a distance of 10 km, to make itcomparable with the directional variograms of OK,although the sill would not change anymore over alarger distance (as 501 is the direction of the largestcontinuity).

To the variogram of OK an exponential modelwas fit with a nugget of 1240 mm2, a range of 2200min the direction of the largest continuity and a rangeof 880m in the direction of the lowest continuity.This represents a geometrical anisotropy, meaningthat there are different ranges in different directions.This anisotropy is modelled using an ellipse, withthe largest and the smallest range as, respectively themain axis and the side axis. The ratio between the

largest and the smallest ranges, is the anisotropyratio. The anisotropy ratio is 0.40. The sill of thestructure has a value of 7740 mm2.

The theoretical variogram of KED was bestmodelled as a nested structure. The nugget is1560 mm2, the first structure is an exponential modelwith a range of 2400m and the second structure is aspherical model with a range of 9000m, the sill is7410 mm2.

Page 13: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESSE. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–24682462

3.2.3. Interpolation with kriging

For the calculation of the final OK map, the fittedvariogram parameters (nugget, range, sill, anisotro-py ratio) were used. Minimum two and maximum16 observations were required for the interpolation.Quadrants (i.e. circles divided in four equal parts)were used, with a maximal amount of observationsof four per quadrant. The search radius was 5000m.So, points further than 2200m (i.e. maximal range)were also involved in the interpolation. This isadvantageous for locations with a low density ofdata (e.g. in the northern part of the BCS). Theseobservations obtain very low weights, because theyare located outside of the distance of the maximalrange, but still carry some information.

The result of the OK (Fig. 10, top) is an almostfull coverage map. A strip in the northeast of theBCS is not covered, because data for interpolationare lacking. This map appears quite continuous(excepted the three spots in the northern part of theBCS), without the ‘bull’s eyes’ or concentricpatterns around data points, typical for ‘inversedistance’ interpolations. However, the map showsgrain-sizes with continuous values across thesandbanks. As no secondary bathymetry informa-tion was used for this map, the topography of theseabed cannot be observed inside of the pattern ofthe median grain-size values.

For the calculation of the KED map, theparameters from the variogram analysis were alsoused. Besides minimum two and maximum 16observations and quadrants with maximum fourobservations are used. The maximal search radius is9000m, corresponding to the maximal range.

The result of KED (Fig. 10, bottom) looks muchmore realistic than the result of OK. Moreover themedian grain-size varies in proportion to the depth.This is very clear in the Hinderbanks region(northern part of BCS). Values between 400 and500 mm are mainly found in the swales, while thevalues between 350 and 400 mm are dominantlyfound at the sandbanks. This pattern is also clearcloser to the coast. Unlike OK, KED made use ofthe secondary information of the bathymetry. Thetopography pattern of the seabed can be clearly seeninside of the median grain-size map.

3.2.4. Validation

The scatter plots (Fig. 11) of the observed versusthe estimated values give a first indication in thevalidation of different techniques. For linear regres-sion the correlation coefficient is much lower than

the values for both OK and KED, which demon-strates the inefficiency of this technique comparedwith both kriging techniques. The correlationcoefficient between both values is slightly largerfor KED than for OK, indicating that KED givesbetter results.

However, scatter plots have to be considered incombination with validation indices (Table 1).Linear regression yields the largest error for eachvalidation index. KED provides a better resultcompared to OK, next to a visually more realisticmap.

The estimation variance of the kriging analysisgives an indication of the overall reliability of thekriging. This is not an absolute measure ofreliability of the kriging estimate (Journel, 1993;Armstrong, 1994; Goovaerts, 1997), but it givesmore an indication of the sampling density (a highsampling density means logically a high quality).This is valuable information as it can be used toguide future sampling campaigns. Where thevariance reaches high values, new samples arepreferably taken. This allows filling gaps andmonitoring on a purposive and efficient manner.Fig. 12 shows the estimation variance of KED. Asthe KED and the OK map are based on the samesamples, only the result of KED is given in Fig. 12.For the interpolation the extreme minimum of twoobservations was used, to obtain a map thatapproaches a full coverage map. Fig. 12 indicatesclearly where this minimum of two observations istoo low to give a reliable grain-size value.

4. Discussion

The result of KED is a high-quality and high-resolution map (250� 250m) of the median grain-size at the BCS, using the bathymetry as secondaryinformation (Fig. 10 bottom). Leecaster (2003) alsoused a multivariate kriging technique in combina-tion with bathymetry for the mapping of the grain-size in Santa Monica Bay, California. She showedthat the inclusion of depth in the model improvedthe prediction in the depth-defined areas likecanyons, canyon lips and shortbanks. Most applica-tions of multivariate geostatistics using a DEM assecondary information are, however, found in thesoil science (Bourennane et al., 2000; Bourennaneand King, 2003, Hengl et al., 2004) and climatology(Goovaerts, 1999; Hudson and Wackernagel, 1994and Martinez-Cob and Cuenca, 1992). As such, itwas a challenge to apply and test these techniques in

Page 14: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 10. Maps of median grain-size, on the basis of ordinary kriging (top) and kriging with an external drift (bottom). The topography of

the seabed can be recognized inside of the map below, because this methodology uses the bathymetry to assist with the interpolation.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–2468 2463

Page 15: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 11. Independent validation: scatter plot of true compared to estimated value using linear regression (top left), OK (top right) and

KED (bottom).

Table 1

Validation indices of independent validation for linear regression, OK and KED

LR OK KED

MEE: mean estimation error �9.17 �8.09 �5.71

MSEE: mean square estimation error 12469.29 7409.95 6745.60

RMSEE: root mean square estimation 111.67 86.09 82.13

MAEE: mean absolute estimation error 74.89 54.97 50.29

Pearson correlation coefficient r 0.42 0.72 0.75

The different parameters are explained in Section 2.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–24682464

a complex marine environment, dominated by ahigh spatial variability imposed by sandbanks.Moreover, the technique was used over a large area(3600 km2) comprising a nearshore, coastal andoffshore zone, each with different morphologies.Although the data availability drastically decreased

in an offshore direction, the results were verysatisfactory.

By comparing and validating linear regressionand the two-kriging techniques (OK and KED), it isobvious that kriging is a better interpolationmethod. Moreover, for data which are unevenly

Page 16: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESS

Fig. 12. KED estimation variance of the median grain size.

E. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–2468 2465

distributed (such as the samples of the mediangrain-size), kriging has a declustering effect becauseof its well-known ‘screening effect’ (see e.g. Goo-vaerts, 1997). This results from the fact that krigingconsiders both the distance to the interpolationpoint as the sampling configuration (i.e. distancebetween observations). Consequently, kriging ispreferable to non-declustering techniques (such aslinear regression or inverse distance interpolation)for situations with unevenly distributed data.

In cases of a general anisotropy or trend (drift),one solution is to use a small search neighbourhoodso that one can assume local stationary conditionswithin it. This is clearly not a solution in thissituation where quite abrupt local changes of thebathymetry occur which needs to be modelled as a

local trend. So even locally stationary conditionscannot be assumed over the entire study area.Therefore a non-stationary method like KEDshould be used. A discussion on this topic is givenin Meul and Van Meirvenne (2003).

For applications outside the Belgian shelf, someprecautions are needed as the correlation betweenthe bathymetry and the grain-size will depend on themorphology, topography and on the substratetype. However, it is likely that some level ofcorrelation will exist (e.g. Leecaster, 2003). Thestudy area of this paper has a very definite presenceof the sandbanks dominating the topography ofthe seabed. The grain-size is expected to varyfollowing the alternation of sandbanks and swales.However, it remains open to discussion whether the

Page 17: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESSE. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–24682466

environmental setting of the study area defines thebenefit of KED and whether this should beinvestigated first. Where the bathymetry is not adominating characteristic of the study area, othersecondary information (e.g. current and waveparameters) might control the pattern of grain-sizes. At a local scale, it can be expected that thegrain-size is more related with the geomorphologythan with the bathymetry as such. However, thiskind of data is more difficult to obtain and is morevulnerable to subjectivity than bathymetry data.However, future research will test the potentialcorrelation as, nowadays, several algorithms exist toestimate the morphological variance from bathy-metric-derived features (such as depressions, crests,flats and slopes, both on a large- and small-scale).The calculation of the bathymetric position index isthe most widely available technique and is ameasure of where a location, with a definedelevation, is relative to the overall landscape (Weiss,2001; Iampietro and Kvitek, 2002; Lundblad et al.,2006). Further research will focus on the relationbetween bathymetric-derived features and physicaldatasets such as the median grain-size. Furthermorethe geomorphology will be analysed in the contextof marine habitat mapping, as topographic featuresare assumed to be important possible habitats formarine organisms.

If a good correlation can be found between thegrain-size distribution and the bathymetry (or otherenvironmental variables such as bathymetric-de-rived features), detailed grain-size maps can beproduced. These have considerable advantagescompared to the traditional static sedimentologicalmapping. Most sedimentological maps are based onthe Folk classification (Folk, 1974) giving apercentage of gravel, sand and mud. These mapsremain highly valuable, but are rather difficult touse for detailed purposes. Nowadays, there ishowever a need for detailed maps that give a directreflection of the grain-size itself. Median grain-sizevalues become more widespread available as alsobathymetry data that can assist the interpolation.This combination of information is crucial to definethe most suitable areas for aggregate extraction andto reserve these areas in a spatial planning context.Moreover, numerical sediment maps are needed toserve as an input layer for various modellinginitiatives. These relate to sediment transportmodelling or to the predictive modelling of thedistribution of soft substrata habitats. In literature,it has been shown that macrobenthic communities

in sandy shelf environments have a clear relation-ship with well-defined ranges of median grain-sizeand silt-clay percentage (Van Hoey et al., 2004; Lu,2005; Willems et al., in press). As such a mapping ofthese variables, and their querying, enables directpredictions that are biologically relevant. This callshowever for detailed sedimentological maps andthese are rarely available. In an internationalcontext, there is also a growing interest in sedimen-tological maps and this related to the concept of‘Marine Landscapes’ (Roff and Taylor, 2000; Roffet al., 2003; Golding et al., 2004). Generally, marinelandscape modelling is an approach that usesgeophysical data as a surrogate for biologicalmapping. Biological data are only used for thevalidation of the marine landscapes in terms of theirbiological relevance. In most cases, the approachremains rather broad-scale, mostly because of thelimited detail of the sedimentological maps that areused in the analysis. Schelfaut (2005) used howeverthe detailed grain-size map, described in this paper,and was able to obtain very detailed marinelandscapes with high relevance towards the biologi-cal value. Future research will focus on the mappingof other target variables, such as the silt-claypercentage, because of its importance for themapping of the occurrence of the macrobenthos insoft substrata (Van Hoey et al., 2004; Lu, 2005;Willems et al., in press).

5. Conclusions

There is a growing need for a detailed mapping ofthe seafloor and this is required at a full coveragebasis. Apart from the bathymetry, the most crucialvariable is sedimentology, as it rules sedimenttransport processes and it is often the missing linkfor the prediction of the occurrence of soft substratahabitats or macrobenthic communities/species. Themedian grain-size was chosen as environmentalparameter, as this parameter is the most calculatedby a wide variety of scientists and the mostfrequently used in modelling studies. Hence, asound interpolation of these data is highly valuablefor a wide range of disciplines.

Kriging techniques proved to be the mostpromising tools to obtain a detailed and high-quality map of the median grain-size distribution.These techniques differ from other linear estimationtechniques in their aim to minimize the errorvariance. In addition, kriging with an external driftallowed using correlated secondary information

Page 18: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESSE. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–2468 2467

such as bathymetry to assist in the interpolation.Linear regression, ordinary kriging (OK) andkriging with an external drift were compared andvalidated using an independent dataset. Severalvalidation indices were involved. The independentvalidation showed that the KED map of the mediangrain-size is much better than the results obtainedusing linear regression and better than using OK.

KED enabled to obtain a high-quality and high-resolution map (250� 250m) of the median grain-size at the BCS, using the bathymetry as secondaryinformation. The resulting map is more realistic andseparates clearly the sediment distribution over acomplex of sandbanks and swales.

Acknowledgements

This paper is part of the Ph.D. research of thefirst author, financed by the Institute for thePromotion of Innovation through Science andTechnology in Flanders (IWT-Vlaanderen). Thestudy frames into the research objectives of theproject MESH (‘‘Development of a framework forMapping European Seabed Habitats’’), financed bythe INTERREG IIIB North West Europe Pro-gramme. It also frames into the research objectivesof the projects MAREBASSE (‘‘Management,Research and Budgeting of Aggregates in ShelfSeas related to End-users’’, EV/02/18A) and BWZee(‘‘A Biological Valuation Map for the BelgianContinental Shelf’’, EV/88/37B), financed by theBelgian Science Policy.

The bathymetric data were delivered by theMinistry of the Flemish Community, Departmentof Environment and Infrastructure, Waterways andMarine Affairs Administration, Division Coast,Hydrographic Office. The sedisurf@ database hasbeen compiled by the Renard Centre of MarineGeology and consists out of data from variousinstitutes.

Finally, the authors would like to thank thereviewers for their very useful comments andsuggestions.

References

Armstrong, M., 1994. Is research in mining geostats as dead as a

dodo? In: Dimitrakopoulos, R. (Ed.), Geostatistics for The

Next Century. Kluwer Academic Publishers, Dordrecht,

pp. 303–312.

Bourennane, H., King, D., 2003. Using multiple external drifts to

estimate a soil variable. Geoderma 114, 1–18.

Bourennane, H., King, D., Couturier, A., 2000. Comparison of

kriging with external drift and simple linear regression for

predicting soil horizon thickness with different sample

densities. Geoderma 97, 255–271.

Burrough, P.A., McDonnell, R.A., 1998. Principles of Geo-

graphic Information Systems. Oxford University Press,

Oxford, 333pp.

Deutsch, C.V., Journel, A.G., 1998. GSLIB: Geostatistical

Software Library and User’s Guide, second ed. Oxford Univ.

Press, New York, 369pp.

Folk, R.L., 1974. The Petrology of Sedimentary Rocks. Hemphill

Publishing Co, Austin, 182pp.

Golding, N., Vincent, M.A., Connor, D.W., 2004. Irish Sea

Pilot—Report on the development of a Marine Landscape

classification for the Irish Sea, Joint Nature Conservation

Committee, unpublished, WWW page, www.jncc.gov.uk/

irishseapilot.

Goovaerts, P., 1997. Geostatistics for Natural Resources Evala-

tion. Oxford University Press, New York, 483pp.

Goovaerts, P., 1999. Using elevation to aid the geostatistical

mapping of rainfall erosivity. Catena 34, 227–242.

Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic

framework for spatial prediction of soil variables based on

regression-kriging. Geoderma 120, 75–93.

Hudson, G., Wackernagel, H., 1994. Mapping temperature

using kriging with external drift: theory and an example

from Scotland. International Journal of Climatology 14,

77–91.

Iampietro, P., Kvitek, R., 2002. Quantitative seafloor habitat

classification using GIS terrain analysis: effects of data

density, resolution, and scale. In Proceedings of the 22nd

Annual ESRI User Conference. San Diego, CA, 8–12

July, unpublished, WWW page, http://gis.esri.com/library/

userconf/proc02.

ICES, 2005. Report of the Working Group on Habitat Mapping

(WGMHM), Bremerhaven, Germany, 5–8 April, ICES CM

2005/E:05, 85 pp., unpublished.

Journel, A.G., 1993. Geostatistics: roadblocks and challenges. In:

Soares, A. (Ed.), Geostatistics Troia ’92. Kluwer Academic

Publishers, Dordrecht, pp. 213–224.

Journel, A.G., Huijbregts, C.J., 1978. Mining Geostatistics.

Academic Press Inc, London, UK, 600pp.

Leecaster, M., 2003. Spatial analysis of grain-size in Santa

Monica Bay. Marine Environmental Research 56, 67–78.

Lu, L., 2005. The relationship between soft-bottom macrobenthic

communities and environmental variables in Singaporean

waters. Marine Pollution Bulletin 51, 1034–1040.

Lundblad, E., Wright, D.J., Miller, J., Larkin, E.M., Rinehart,

R., Battista, T., Anderson, S.M., Naar, D.F., Donahue, B.T.,

2006. A benthic terrain classification scheme for American

Samoa. Marine Geodesy 29 (2), 89–111.

Martinez-Cob, A., Cuenca, R.H., 1992. Influence of elevation on

regional evapotranspiration using multivariate geostatistics

for various climatic regimes in Oregon. Journal of Hydrology

136, 353–380.

Meul, M., Van Meirvenne, M., 2003. Kriging soil texture

under different types of non-stationarity. Geoderma 112,

217–233.

Pannatier, Y., 1996. Variowin-Software for Spatial Data Analysis

in 2D. Springer, New York.

Roff, J.C., Taylor, M.E., 2000. National frameworks for marine

conservation—a hierarchical geophysical approach. Aquatic

Page 19: Title: Multivariate geostatistics for the predictive modelling of

ARTICLE IN PRESSE. Verfaillie et al. / Continental Shelf Research 26 (2006) 2454–24682468

Conservation: Marine and Freshwater Ecosystems 10,

209–223.

Roff, J.C., Taylor, M.E., Laughren, J., 2003. Geophysical

approaches to the classification, delineation and monitoring

of marine habitats and their communities. Aquatic Conserva-

tion: Marine and Freshwater Ecosystems 13, 77–90.

Schelfaut, K., 2005. Defining Marine Landscapes on the Belgian

continental shelf as an approach to holistic habitat mapping.

M.Sc. Thesis, Universiteit Gent, Belgium, unpublished.

Van Hoey, G., Degraer, S., Vincx, M., 2004. Macrobenthic

community structure of soft-bottom sediments at the Belgian

Continental Shelf. Estuarine, Coastal and Shelf Science 59,

599–613.

Wackernagel, H., 1998. Multivariate Geostatistics. Springer,

Berlin, 291pp.

Weiss, A.D., 2001. Topographic positions and landforms analysis

(Conference Poster). ESRI International User Conference.

San Diego, CA, 9–13July, unpublished.

Willems, W., Goethals, P., Van den Eynde, D., Van Hoey, G.,

Van Lancker, V., Verfaillie, E., Vincx, M., Degraer, S.

(in press). Where is the worm? Predictive modelling of the

habitat preferences of the tube-building polychaete Lanice

conchilega (Pallas, 1766). Ecological Modelling.

Wu, R.S.S., Shin, P.K.S., 1997. Sediment characteristics and

colonization of soft-bottom benthos: a field manipulation

experiment. Marine Biology 128, 475–487.