Calvalus processing at NSDC: CalFIN
Mikko Kervinen, SYKE/TK
Motivation
● Standard products and real-time processing do not satisfy all users
● New algorithms are developed and data is reprocessed frequently
● Data volumes are constantly increasing
● Focus is shifting towards time series
In addition to distributing data to users, we want users to bring their processors to the data
Monimet Life+ / K. Böttcher
● Sentinel 2 in the Baltic Sea region
• ~ 7 products / day
• ~ 7 GB / product
• ~ 0.35 TB / week
• ~18 TB / year
• Sentinel 2B will add to these volumes
● Envisat MERIS 2002 – 2012
• Full-mission, full-resolution dataset ~15 TB
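The Sentinel 2 figures above are consistent with each other: about 7 products of about 7 GB each per day. A short back-of-the-envelope check, assuming decimal units (1 TB = 1000 GB) and a 365-day year; the constant and function names are illustrative only:

```python
# Back-of-the-envelope check of the Sentinel 2 volumes quoted above.
# Decimal units are assumed (1 TB = 1000 GB); inputs are the slide's approximations.
PRODUCTS_PER_DAY = 7
GB_PER_PRODUCT = 7


def weekly_tb(products_per_day: float, gb_per_product: float) -> float:
    """Data volume per week, in TB."""
    return products_per_day * gb_per_product * 7 / 1000


def yearly_tb(products_per_day: float, gb_per_product: float) -> float:
    """Data volume per 365-day year, in TB."""
    return products_per_day * gb_per_product * 365 / 1000


if __name__ == "__main__":
    print(f"{weekly_tb(PRODUCTS_PER_DAY, GB_PER_PRODUCT):.2f} TB/week")  # 0.34 TB/week
    print(f"{yearly_tb(PRODUCTS_PER_DAY, GB_PER_PRODUCT):.1f} TB/year")  # 17.9 TB/year
```

This reproduces the ~0.35 TB/week and ~18 TB/year on the slide, before any Sentinel 2B data is added.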
What is Calvalus
● Developed by Brockmann Consult GmbH during an ESA LET-SME project
● Combination of
• Off-the-shelf computer hardware cluster and
• Software based on the open-source big-data middleware Hadoop:
MapReduce programming model (MR) combined with a distributed file system (DFS)
● Operates on sets of files and runs concurrent, data-local processes
● Stores L1 data online for instant processing
● Implements common EO workflows
CalFIN as part of the data delivery chain
[Figure: CalFIN within the data delivery chain]
© ESA:SPA-COPE-ENG-RP-066-00-03
● Sentinel 2 Level-1C: Top-of-atmosphere reflectances in fixed cartographic geometry (combined UTM projection and WGS84 ellipsoid). Radiometric and geometric corrections have been applied to these products.
● Sentinel 2 Level-2A: Bottom-of-atmosphere reflectances in cartographic geometry. This product is currently generated on the user side with a processor running in ESA’s Sentinel-2 Toolbox.
CalFIN source datasets
Satellite  | Instrument   | Temporal coverage | Harvesting | AOI     | Current (TB) | Yearly (TB)
Sentinel 2 | MSI          | 2015 –            | NRT        | Baltic  | 8.6          | 15
Sentinel 3 | OLCI + SLSTR | 2016 –            | NRT        | Baltic  | n/a          | n/a
Envisat    | MERIS (FR)   | 2002 – 2012       | Manual     | Baltic  | n/a          | 1.5
Landsat 8  | OLI + TIRS   | 2015 –            | NRT        | Finland | 1.5          | 1
Terra      | MODIS        | 2011 –            | Manual     | Baltic  | 6            | 1
Aqua       | MODIS        | 2016 –            | NRT        | Global  | 4            | 10
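Summing the volume columns of the table gives a rough picture of the total holdings and their yearly growth. In the sketch below the figures are copied from the table and `None` stands for "n/a". Because those entries are skipped, both totals are lower bounds. The names `DATASETS` and `column_total` are illustrative only.

```python
from typing import Optional

# (current TB, yearly TB) per dataset, copied from the table; None marks "n/a".
DATASETS: dict[str, tuple[Optional[float], Optional[float]]] = {
    "Sentinel 2 MSI": (8.6, 15.0),
    "Sentinel 3 OLCI + SLSTR": (None, None),
    "Envisat MERIS (FR)": (None, 1.5),
    "Landsat 8 OLI + TIRS": (1.5, 1.0),
    "Terra MODIS": (6.0, 1.0),
    "Aqua MODIS": (4.0, 10.0),
}

CURRENT, YEARLY = 0, 1  # column indices into each tuple


def column_total(column: int) -> float:
    """Sum one column, skipping n/a entries (so the result is a lower bound)."""
    return sum(v[column] for v in DATASETS.values() if v[column] is not None)


if __name__ == "__main__":
    print(f"current: {column_total(CURRENT):.1f} TB")  # current: 20.1 TB
    print(f"yearly:  {column_total(YEARLY):.1f} TB")   # yearly:  28.5 TB
```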
Calvalus user profiles
● Calvalus is a system for expert EO users, not a service for the general public
● Typical users can access the system via a web portal to
• Upload L2 processors to the system
• Run moderate-size processing tasks
• Retrieve the results via HTTP for analysis
● Project users get internal access to the system to
• Use the system at the command-line level via SSH
• Set up processing instances and run multi-step bulk processing on large input datasets
• Implement new workflows and aggregators
• Manage data in HDFS and retrieve results via SCP
● NSDC can also process data according to user specifications and deliver the resulting dataset to the customer
User-provided L2 processor bundles
● A processor bundle consists of a processor and some wrapper scripts
● A processor takes an L1 data file as input and generates one or more outputs
● Supported processor types
• BEAM / SNAP Graph Processing Framework (GPF) operators and graphs
• Linux executables (Fortran, Python, C++ etc.)
● Bundles can be uploaded via the portal and are deployed automatically to all processing nodes
L2 workflow
● Generates biophysical products
• One input, one output (dataset)
● Parallelized by processing each input independently
● User-configurable parameters: processor, source dataset, AOI etc.
[Figure: L2 workflow. Each L1 file is processed by the L2 processor in its own mapper task, producing one L2 file per input.]
Images © Brockmann Consult GmbH
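The L2 workflow is a map-only job: each L1 file goes to the processor on its own, so there is no shuffle or reduce step. The sketch below is a local illustration, not Calvalus code. A thread pool stands in for Hadoop mapper tasks spread over cluster nodes. `run_l2_job`, `toy_processor` and the file names are hypothetical, and the toy processor only derives an output name rather than computing biophysical products.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def run_l2_job(
    l1_files: list[str],
    l2_processor: Callable[[str], str],
    max_workers: int = 4,
) -> dict[str, str]:
    """Run the L2 processor on every L1 file independently (map-only job).

    There is no shuffle or reduce step: each input maps to exactly one output,
    so the inputs can be processed in any order and in parallel.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so they pair up with l1_files.
        l2_files = list(pool.map(l2_processor, l1_files))
    return dict(zip(l1_files, l2_files))


def toy_processor(l1_file: str) -> str:
    """Hypothetical stand-in for a real L2 processor: only derives an output name."""
    return f"{Path(l1_file).stem}_L2.nc"


if __name__ == "__main__":
    inputs = ["S2_L1C_001.zip", "S2_L1C_002.zip", "S2_L1C_003.zip"]
    print(run_l2_job(inputs, toy_processor))
```

The parallelism comes from input independence alone. Adding nodes (or workers) increases throughput without changing the processor.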
L3 workflow
● Generates spatio-temporally gridded products
● Multiple inputs, one output
● L2 is generated on the fly
● MapReduce for the binning step
● User-configurable parameters and aggregation strategy
[Figure: L3 workflow. Mapper tasks run L2 processing and spatial binning on each L1 file and emit spatial bins; reducer tasks perform temporal binning into temporal bins; a final staging step formats the L3 file(s).]
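The binning step can be illustrated with a small MapReduce-style sketch. It makes several assumptions:

* A plain regular lat/lon grid with a hypothetical 1° cell stands in for the binning grid Calvalus actually uses.
* The pixel tuples represent L2 output already produced inside the mapper.
* The pixel-weighted mean is just one possible aggregation strategy.

The mapper reduces each input to (sum, count) partials per cell. The shuffle groups the partials by cell, and the reducer merges them across inputs.

```python
from collections import defaultdict
from typing import Iterable

Pixel = tuple[float, float, float]  # (lat, lon, value) of one valid L2 pixel
BinKey = tuple[int, int]            # (row, col) of a grid cell
Partial = tuple[float, int]         # (sum of values, number of pixels)

CELL_DEG = 1.0  # hypothetical cell size of a regular lat/lon grid


def bin_key(lat: float, lon: float, cell: float = CELL_DEG) -> BinKey:
    """Row/column of the grid cell containing (lat, lon)."""
    return int((lat + 90.0) // cell), int((lon + 180.0) // cell)


def spatial_binning(pixels: Iterable[Pixel]) -> dict[BinKey, Partial]:
    """Mapper: reduce one input's L2 pixels to a (sum, count) per grid cell."""
    sums: dict[BinKey, float] = defaultdict(float)
    counts: dict[BinKey, int] = defaultdict(int)
    for lat, lon, value in pixels:
        key = bin_key(lat, lon)
        sums[key] += value
        counts[key] += 1
    return {key: (sums[key], counts[key]) for key in sums}


def shuffle(mapper_outputs: Iterable[dict[BinKey, Partial]]) -> dict[BinKey, list[Partial]]:
    """Group partial results by bin key, as the MapReduce shuffle would."""
    grouped: dict[BinKey, list[Partial]] = defaultdict(list)
    for spatial_bins in mapper_outputs:
        for key, partial in spatial_bins.items():
            grouped[key].append(partial)
    return grouped


def temporal_binning(partials: list[Partial]) -> float:
    """Reducer: combine one cell's partials from all inputs into a mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def run_l3_job(inputs: list[list[Pixel]]) -> dict[BinKey, float]:
    mapped = [spatial_binning(pixels) for pixels in inputs]
    return {key: temporal_binning(parts) for key, parts in shuffle(mapped).items()}


if __name__ == "__main__":
    scene_a = [(60.2, 24.9, 1.0), (60.7, 24.1, 3.0)]
    scene_b = [(60.5, 24.5, 5.0), (59.5, 22.0, 2.0)]
    print(run_l3_job([scene_a, scene_b]))  # {(150, 204): 3.0, (149, 202): 2.0}
```

Mappers emit sums and counts rather than means so that the reducer's result is weighted by pixel count, no matter how many pixels each input contributed to a cell.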
Matchup analysis
● Generates matchups of L2 EO data and in-situ point observations
● L2 is generated on the fly
● Only the pixels corresponding to in-situ observations are processed
[Figure: Matchup analysis. Mapper tasks run L2 processing and a matcher on each L1 file and emit output records; a reducer task combines them with the in-situ point data into the MA output and the MA report.]
Images © Brockmann Consult GmbH
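A minimal sketch of the matchup idea, under the following assumptions:

* The L2 product is a hypothetical north-up regular lat/lon grid. Real Sentinel 2 products are in UTM.
* The ±3 h time window is made up.
* The report contains only the count and the mean bias.

The mapper looks up only the pixels that lie under in-situ points. The reducer pools the records from all products. `L2Product`, `InSitu`, `match` and `matchup_report` are illustrative names, not Calvalus classes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class L2Product:
    """Hypothetical L2 product on a north-up regular lat/lon grid."""
    time: datetime
    lat_top: float             # latitude of the top edge
    lon_left: float            # longitude of the left edge
    pixel_deg: float           # pixel size in degrees
    values: list[list[float]]  # values[row][col]


@dataclass
class InSitu:
    station: str
    time: datetime
    lat: float
    lon: float
    value: float


def match(product: L2Product, points: list[InSitu],
          max_dt: timedelta = timedelta(hours=3)) -> list[tuple[str, float, float]]:
    """Mapper: emit (station, in-situ value, EO value) for each covered point.

    Only the pixels under in-situ points are looked up; the rest are never touched.
    """
    rows, cols = len(product.values), len(product.values[0])
    records = []
    for pt in points:
        if abs(pt.time - product.time) > max_dt:
            continue  # outside the time window
        row = int((product.lat_top - pt.lat) // product.pixel_deg)
        col = int((pt.lon - product.lon_left) // product.pixel_deg)
        if 0 <= row < rows and 0 <= col < cols:
            records.append((pt.station, pt.value, product.values[row][col]))
    return records


def matchup_report(per_product: list[list[tuple[str, float, float]]]) -> dict[str, float]:
    """Reducer: pool all records and report the count and mean bias (EO - in situ)."""
    records = [r for recs in per_product for r in recs]
    n = len(records)
    bias = sum(eo - insitu for _, insitu, eo in records) / n if n else float("nan")
    return {"n": n, "bias": bias}


if __name__ == "__main__":
    t0 = datetime(2016, 6, 1, 10, 0)
    product = L2Product(t0, 61.0, 24.0, 0.5, [[1.0, 2.0], [3.0, 4.0]])
    points = [
        InSitu("A", t0 + timedelta(minutes=30), 60.8, 24.2, 1.5),
        InSitu("B", t0 + timedelta(hours=1), 60.3, 24.7, 3.0),
        InSitu("C", t0 + timedelta(hours=10), 60.8, 24.2, 2.0),  # too late
        InSitu("D", t0, 59.0, 24.2, 2.0),                        # outside grid
    ]
    print(matchup_report([match(product, points)]))  # {'n': 2, 'bias': 0.25}
```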
Roadmap
● Preliminary operations 2016–2017
● Service opens for collaborators in 2017
● More information
• http://www.ymparisto.fi/envibase
• http://nsdc.fmi.fi/
• http://www.brockmann-consult.de/calvalus/
● Contact: Mikko.Kervinen@ymparisto.fi
● Demonstration wrap-up
• http://localhost:8080/calfin/calvalus.jsp
CalFIN hardware
● 1 master node
• 2 × 2 TB SSD, 64 GB RAM
● 1 feeder node
• 4U, 1 × 2 TB, 24 empty 3.5" HDD slots, 10 Gbit SFP+
● Switch
• 30 × 1 Gbit, 2 × 10 Gbit
● 15 computing nodes
• 2U, 1 quad-core CPU, 32 GB RAM, 1 SSD, 4 × 4 TB, 1 Gbit
• Maximum of 120 concurrent processes
● ~240 TB HDFS storage
• Source data replicated with a factor of 2
● Hardware to be extended in late 2016
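The storage and concurrency figures above follow from the node specification. The sketch below rests on two assumptions:

* HDFS capacity comes only from the 4 × 4 TB disks of the 15 computing nodes, not from the SSDs or the master and feeder disks.
* All of that capacity holds source data replicated with a factor of 2.

Under these assumptions the result matches the quoted ~240 TB. The constant names are illustrative.

```python
# Figures from the hardware list above.
NODES = 15
DISKS_PER_NODE = 4
TB_PER_DISK = 4
REPLICATION = 2
MAX_CONCURRENT_PROCESSES = 120

raw_tb = NODES * DISKS_PER_NODE * TB_PER_DISK  # raw HDFS capacity on computing nodes
usable_tb = raw_tb / REPLICATION                # capacity left for replicated source data
processes_per_node = MAX_CONCURRENT_PROCESSES // NODES

if __name__ == "__main__":
    print(raw_tb, usable_tb, processes_per_node)  # 240 120.0 8
```

With replication by 2, roughly half of the raw capacity (~120 TB) is available for distinct source data. The process limit works out to 8 concurrent processes per computing node.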