Sarracenia is a small application developed iteratively by addressing one use case at a time, so development and deployment have been inextricably linked up to this point. That iterative process precipitated changes in the core of the application which made it something of a moving target until now. As of January 2018, the application has reached the point where all intended use cases are addressed by the application core. In the coming year, the emphasis will be on facilitating on-boarding, developing some derived services, and deploying the newly complete application more generally.
The November 2015 video (Sarracenia in 10 Minutes) outlined a vision. The first phase of development work occurred in 2015 and early 2016, followed by important deployments later in 2016. This update, written in early 2018, covers progress made mostly in 2017.
Use cases mentioned in the video which were implemented:
Use cases in the video, but not yet realized:
Unanticipated use cases implemented:
Details to follow.
The slide below shows the daily data flows deployed in support of Environment Canada, mostly for operational weather forecasting, in place since January 2018.
Sarracenia is being used operationally to acquire about four terabytes of observations from automated weather observing systems, from weather RADARs that deliver data directly to our hubs, and from international peer-operated public file services that provide satellite imagery and numerical products from other national weather centres.
Within the main high performance computing (HPC) data centre, there are two supercomputers, two site stores, and two pre- and post-processing clusters. Should a component in one chain fail, the other can take over. The input data is sent to a primary chain, and processing on that chain is then mirrored, using Sarracenia to copy the data to the other chain. That accounts for about 16 of the 25 terabytes of data centre traffic in this diagram.
A distillation of the data acquired and of the analyses and forecasts produced in HPC, the seven terabytes shown at the top right, is sent to the seven regional Storm Prediction Centres (SPCs).
The products of the SPCs and the central HPC are then shared with the public and with partners in industry, academia and other governments.
FIXME: picture?
There are a number of older applications (perhaps a dozen, the most prominent being BULLPREP and Scribe) that have been used for decades in the Storm Prediction Centres to create forecast and warning products. These applications are based on a file tree that they read and write. Formerly, each application had its own backup strategy with one of the six other offices, and bilateral arrangements were made to copy specific data among the trees.
In January 2017, complete 7-way replication of the applications' state file trees was implemented so that all offices have copies of the files in real time. This is accomplished using Sarracenia through the eastern hub. Any forecast office can now take over work on any product for any other, with no application-specific work needed at all.
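As a rough sketch (not the operational configuration), a subscription that mirrors the shared application file tree from the eastern hub onto local disk could look something like the following, where the broker host name, subtopic, and directories are illustrative placeholders::

    # hypothetical sr_subscribe configuration mirroring the SPC application file tree
    # broker host, subtopic and paths are placeholders, not the deployed values
    broker amqps://spc_feed@hub-east.example.ca/
    exchange xpublic
    subtopic spc.state_trees.#
    # mirror: reproduce the relative directory layout of the source tree
    mirror True
    directory /apps/spc/state
    accept .*

Each office would run such a configuration with the usual component commands (for example, sr_subscribe start <config>).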
Acquisition of simulated and real GOES-R products from NOAA's PDA, as well as via local downlinks at one location (eventually to become two), was entirely mediated by Sarracenia. The operational deployment of GOES-R happened in the first week of January 2018.
FIXME: picture?
The supercomputing environment was entirely replaced in 2017. As part of that, the client Environmental Data Acquisition suite (ADE, its French acronym) was refactored to perform much better than before and to accept Sarracenia feeds directly, rather than feeds from the previous-generation pump (Sundew). The volume and speed of data acquisition have been substantially improved as a result.
Taking RADAR data acquisition as an example, individual RADAR systems use FTP and/or SFTP to send files to the eastern and western communications hubs. Those hubs run the directory-watching component (sr_watch), which determines checksums for the volume scans as they arrive. The Unified RADAR Processing (URP) systems run sr_subscribe against a hub, listening for new volume scans and downloading new data as soon as it is posted. URP systems then derive new products and advertise them to the local hub using the sr_post component. In time, we hope to have a second URP fully deployed at the western hub.
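For illustration only, a URP-side subscription that pulls volume scans posted by a hub might look roughly like the following; the broker host, topic layout, accept pattern, and directories are assumptions rather than the operational settings::

    # hypothetical sr_subscribe configuration for a URP system pulling radar volume scans
    # host name, topic layout and accept pattern are illustrative assumptions
    broker amqps://urp_reader@hub-east.example.ca/
    exchange xpublic
    subtopic radar.volume_scans.#
    directory /data/radar/incoming
    # in practice a narrower accept pattern would select only volume scans
    accept .*
    # run several download processes in parallel
    instances 4

The derived URP products flowing back to the hub would be announced in a similar spirit with sr_post, pointing at the hub's broker and at the URL under which the products are served.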
In regional offices, the NinJo visualization servers download volume scans and processed data from URP using identical subscriptions, pulling the data from whichever national hub makes the data available first. The failure of a national hub is transparent for RADAR data in that the volume scans will be downloaded from the other hub, and the other URP processor will produce the products needed.
Each site has multiple NinJo servers. We use HTTP-based file servers, or web-accessible folders, to serve data. This allows easy integration of web proxy caches, which means that only the first NinJo server to request a given piece of data will download it from the national hub; the other NinJo servers will get their copies from the local proxy cache. The use of Sarracenia for notifications when new products are available is completely independent of the method used to serve and download data. Data servers can be implemented with a wide variety of tools, and very little integration is needed.
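To make this concrete, one plausible arrangement (not necessarily the deployed NinJo configuration) is to run two nearly identical subscriptions per server, one per national hub, writing into the same directory tree, so that whichever hub posts first supplies the file and the later announcement refers to the same data. A sketch, with illustrative host names and topics::

    # hypothetical NinJo-side configuration; a second copy would differ only in the
    # broker line, pointing at the western hub instead of the eastern one
    broker amqps://ninjo_reader@hub-east.example.ca/
    exchange xpublic
    subtopic radar.#
    directory /data/ninjo/import
    accept .*

Because the notification only tells the subscriber where to fetch the data, fronting the hub with a web proxy cache needs no change on the Sarracenia side.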
Throughout 2017, work proceeded on implementing high-speed mirroring between the supercomputer site stores to permit failover. That work is now in a final deployment phase and should be in operations by spring 2018. For more details see: HPC Mirroring Use Case
Development of Sarracenia had been exploratory over a number of years. The use cases initially attacked were those with a high degree of expert involvement. Work proceeded following the minimum viable product (MVP) model for each use case, acquiring the features needed for the next use case prior to deployment. In 2016, national deployment of NinJo and the Weather.
Expanded use cases explored:
Changes to support end user usage:
The only major operational feature introduced in 2017 was save/restore/retry: if a destination has a problem, there is a substantial risk of overloading AMQP brokers by letting queues of products awaiting transfer grow into millions of entries. Functionality to efficiently (in parallel) offload broker queues to local disk was implemented to address this. At first, recovery needed to be manually triggered (restore), but by the end of the year an automated recovery (retry) mechanism was working its way to deployment, which will reduce requirements for oversight and intervention in operations.
As of release 2.18.01a5, all of the targeted use cases have been explored and reasonable solutions are available, so there should be no further changes to the existing configuration language or options. Some minor additions may still occur, but not at the cost of breaking any existing configurations. The core application is now complete.
Expect the last alpha package release in early 2018, with subsequent work happening on a beta version, targeting a much more long-lived stable version some time in 2018.