Case Study: HPC Mirroring

Continuously Mirroring a 27 Million File Tree, Very Quickly

CAVEAT: This is a bit speculative at the time of this writing (2018/01). We expect to deploy over the winter and to be complete in 2018/03. Given the volumes being copied, the exact performance isn't easily measured. This article will be amended to reflect the advancing solution until it is complete, at which point this note will be removed.

Summary

This project has taken longer than expected, over a year, as the problem space was explored with the help of a very patient client before the tool to design and implement an efficient solution was eventually settled on. The client was actually more of a partner, who had very large test cases available and ended up shouldering the responsibility, for all of us, of understanding whether the solution was working or not. While there are many specifics to this implementation, the resulting tool relies on no special features beyond a normal Linux file system to achieve a 72:1 speedup compared to rsync for real-time continuous mirroring of 16 terabytes in 1.9 million files per day between two trees of 27 million files each. While this averages to 185 Mbytes/second sustained over a 24 hour period, it should be noted that the transfers are very peaky. On the same equipment at the same time, another 4 terabytes per day is being written to clusters on another network, so the aggregate read rate on the operational cluster is 20 terabytes per day (231 Mbytes/second) for this application, while the file systems serve all the normal user applications as well.

While this project had to work through a long development, with the lessons learned and the tools now available it should be straightforward to apply this solution to other cases. The end result is that one adds a shim library to the users' environment (transparent to user jobs), and then every time a file is written, an AMQP message with the file's metadata is posted. A pool of transfer daemons stands by to transfer the files posted to a shared queue. The number of subscribers is programmable and scalable, and the techniques and topology used to do the transfer are all easily controlled to optimize transfers for whatever criteria are deemed most important.
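For readers unfamiliar with the pattern, a minimal sketch of such a transfer worker is shown below, written against the rabbitmq-c client library. It is illustrative only, not the actual sr_subscribe code: the broker host, credentials, and the queue name mirror.files are placeholder assumptions.

/*
 * Minimal sketch of a transfer worker (competing consumer), assuming the
 * rabbitmq-c client library (compile with -lrabbitmq). This is NOT the
 * actual sr_subscribe implementation; the broker host, credentials, and
 * queue name "mirror.files" are placeholders.
 */
#include <stdio.h>
#include <amqp.h>
#include <amqp_tcp_socket.h>

int main(void)
{
    amqp_connection_state_t conn = amqp_new_connection();
    amqp_socket_t *sock = amqp_tcp_socket_new(conn);

    if (!sock || amqp_socket_open(sock, "broker.example.com", 5672))
        return 1;
    amqp_login(conn, "/", 0, 131072, 0, AMQP_SASL_METHOD_PLAIN,
               "guest", "guest");
    amqp_channel_open(conn, 1);

    /* every worker consumes from the same shared queue, so the broker
     * deals announcements out among them: adding workers adds throughput */
    amqp_basic_consume(conn, 1, amqp_cstring_bytes("mirror.files"),
                       amqp_empty_bytes, 0, 1, 0, amqp_empty_table);

    for (;;) {
        amqp_envelope_t env;
        amqp_maybe_release_buffers(conn);
        if (amqp_consume_message(conn, &env, NULL, 0).reply_type
            != AMQP_RESPONSE_NORMAL)
            break;
        /* the body carries the file metadata (path, size, checksum, ...);
         * a real worker parses it and performs the copy here */
        printf("would copy: %.*s\n", (int)env.message.body.len,
               (char *)env.message.body.bytes);
        amqp_destroy_envelope(&env);
    }
    amqp_destroy_connection(conn);
    return 0;
}

Because all workers consume from the same named queue, the broker distributes announcements among them, so throughput scales by simply starting more instances.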

Problem Statement

In November 2016, Environment and Climate Change Canada's (ECCC) Meteorological Service of Canada (MSC), as part of the High Performance Computing Replacement (HPCR) project, asked for very large directory trees to be mirrored in real time. Shared Services Canada (SSC) had primary responsibility for deployment of HPCR, with ECCC/MSC being the sole user community. It was known from the outset that these trees would be too large to deal with using ordinary tools. It has taken about 15 months to explore the issue and arrive at an effective operational deployment. It should be noted that SSC worked throughout this period in close partnership with ECCC, and that this deployment required the very active participation of sophisticated users to follow along with the twists and turns and the different avenues explored and implemented.

The computing environment is Canada's national weather centre, whose primary application is production numerical weather prediction. Models run 24 hours a day, 7 days a week, performing different simulations (of the atmosphere, and sometimes waterways, ocean, and the land under the atmosphere): ingesting current observations (assimilation), mapping them to a grid (analysis), and then walking the grid forward in time (prediction/prognosis). The forecasts follow a precise schedule throughout the 24-hour cycle, feeding into one another, so delays ripple, and considerable effort is expended to avoid interruptions and maintain the schedule.

HPCR Solution Overview

[Image: HPC-XC_High_Availability.png]

In the diagram above, if operations are in data hall 1 (left of centre) and that hall fails, then the goal is to resume operations promptly from data hall 2 (on the right.) For this to be realistic, production data must be available in the other hall, on the other site store, quickly. The mirroring problem is synchronizing a very large subset of data between site store 1 and site store 2. For monitoring purposes, a smaller subset must also be mirrored to data hall 0 at the same time.

Continuous Mirroring

There is a pair of clusters running these simulations: one normally working mostly on operations, and the other acting as a spare (running only research and development loads.) When the primary fails, the intent is to run operations on the other supercomputer, using a spare disk to which all the live data has been mirrored. As there are (nearly) always runs in progress, the directories never stop being modified, and there is no maintenance window during which one can catch up if one falls behind.

There are essentially three parts of the problem:

  • Detection: obtain the list of files which have been modified (recently.)
  • Transfer: copy them to the other cluster.
  • Performance: aspirational deadline to deliver a mirrored file: five minutes.

The actual trees to mirror are the following:

psilva@eccc1-ppp1:/home/sarr111/.config/sarra/poll$ grep directory *hall1*.conf
policy_hall1_admin.conf:directory /fs/site1/ops/eccc/cmod/prod/admin
policy_hall1_archive_dbase.conf:directory /fs/site1/ops/eccc/cmod/prod/archive.dbase
policy_hall1_cmop.conf:directory /fs/site1/ops/eccc/cmod/cmop/data/maestro/smco500
policy_hall1_daily_scores.conf:directory /fs/site1/ops/eccc/cmod/prod/daily_scores
policy_hall1_hubs.conf:directory /fs/site1/ops/eccc/cmod/prod/hubs
policy_hall1_products.conf:directory /fs/site1/ops/eccc/cmod/prod/products
policy_hall1_stats.conf:directory /fs/site1/ops/eccc/cmod/prod/stats
policy_hall1_version_control.conf:directory /fs/site1/ops/eccc/cmod/prod/version_control
policy_hall1_work_ops.conf:directory /fs/site1/ops/eccc/cmod/prod/work_ops
policy_hall1_work_par.conf:directory /fs/site1/ops/eccc/cmod/prod/work_par
psilva@eccc1-ppp1:/home/sarr111/.config/sarra/poll$

Initially, it was known that the number of files was large, but there was no knowledge of the actual amounts involved. Nor was that data even available until much later.

The most efficient way to copy these trees, as was stated at the outset, would be for all of the jobs writing files in the trees to explicitly announce the files to be copied. This would involve users modifying their jobs to include invocation of sr_post (a command which queues up file transfers for third parties to perform.) ECCC set the additional constraint that modification of user jobs was not feasible, so the method used to obtain the list of files to copy had to be implicit (done by the system without active user involvement.)
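To make the idea concrete, here is a rough sketch, in C, of what "announcing" a file amounts to: a small metadata message published to an AMQP broker. It is not the actual sr_post implementation; the exchange name, routing key, broker address, and JSON-ish body are placeholder assumptions (the real announcements have their own message format).

/*
 * Hedged sketch of announcing a newly written file, reduced to a bare AMQP
 * publish with the rabbitmq-c client. Not the actual sr_post code; the
 * exchange "xs_mirror", the routing key, and the broker are placeholders.
 */
#include <stdio.h>
#include <amqp.h>
#include <amqp_tcp_socket.h>

static int announce_file(const char *path, long size)
{
    char body[4096];

    amqp_connection_state_t conn = amqp_new_connection();
    amqp_socket_t *sock = amqp_tcp_socket_new(conn);
    if (!sock || amqp_socket_open(sock, "broker.example.com", 5672))
        return -1;
    amqp_login(conn, "/", 0, 131072, 0, AMQP_SASL_METHOD_PLAIN,
               "guest", "guest");
    amqp_channel_open(conn, 1);

    /* the message carries metadata only; the file itself is not moved here.
     * a downloader elsewhere performs the copy when it receives this. */
    snprintf(body, sizeof(body), "{ \"path\": \"%s\", \"size\": %ld }",
             path, size);

    amqp_basic_publish(conn, 1,
                       amqp_cstring_bytes("xs_mirror"),     /* exchange    */
                       amqp_cstring_bytes("v02.post.hubs"), /* routing key */
                       0, 0, NULL,
                       amqp_cstring_bytes(body));

    amqp_destroy_connection(conn);
    return 0;
}

int main(void)
{
    /* in the real system this would be triggered by a job writing a file */
    return announce_file("/fs/site1/ops/eccc/cmod/prod/hubs/example.file", 12345);
}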

Reading the Tree Takes Too Long

One could scan a single parent directory at a higher level instead, but the half-dozen sub-trees above were picked in order to have smaller trees that could be processed more quickly, regardless of the method used to obtain lists of new files. What do we mean when we say these trees are too large? The largest of these trees is hubs ( /fs/site1/ops/eccc/cmod/prod/hubs. ) rsync was run on the hubs directory as a dry run, just walking the tree once without any file copying going on. The walk of the tree, using rsync with checksumming disabled as an optimization, resulted in the log below:

psilva@eccc1-ppp1:~/test$ more tt_walk_hubs.log
nohup: ignoring input
rsync starting @ Sat Oct  7 14:56:52 GMT 2017
number of files examined is on the order of: rsync --dry-run --links -avi --size-only /fs/site1/ops/eccc/cmod/prod/hubs /fs/site2/ops/eccc/cmod/prod/hubs |& wc -l
27182247
rsync end @ Sat Oct  7 20:06:31 GMT 2017
psilva@eccc1-ppp1:~/test$

A single pass took over five hours (14:56 to 20:06 GMT) to examine 27 million files, or roughly 1500 files per second. The maximum rate of running rsyncs on this tree is thus on the order of once every six hours (to allow some time for copying.) Note that any usual method of copying a directory tree requires traversing it, and there is no reason to believe that any other tool such as find, dump, tar, tree, etc. would be significantly quicker than rsync. We need a faster method of knowing which files have been modified so that they can be copied.

Detection Methods: Inotify, Policy, SHIM

There is a Linux kernel feature known as inotify, which can trigger an event when a file is modified. By setting an inotify watch on every directory in the tree, we can be notified whenever any file in the tree is modified. This was the initial approach taken. It turned out (in January 2017) that inotify is very much a single-kernel Linux feature, in that inotify events only propagate within a single server. With a cluster file system like GPFS, one needs to run an inotify monitor on every node where files are written. So rather than running a single daemon, we were faced with running several hundred daemons (one per physical node), each monitoring the same set of tens of millions of files. Since the daemons were running on many nodes, the memory use rose into the terabyte range.
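A rough sketch of this approach, using the standard Linux inotify API, is shown below; the tree path and event handling are illustrative only. It makes the two pain points visible: a watch must be added for every directory (hence a startup traversal that takes as long as an rsync walk), and the events are seen only by the kernel on which the write happened.

/*
 * Hedged sketch of the inotify approach: one watch per directory, so the
 * whole tree must be traversed at startup, and the watches only see writes
 * performed through this node's kernel. Not the actual sr_watch code.
 */
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>
#include <ftw.h>

static int inotify_fd;

/* called by nftw() for every entry: add a watch on each directory */
static int add_watch(const char *path, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
{
    (void)sb; (void)ftwbuf;
    if (typeflag == FTW_D)
        inotify_add_watch(inotify_fd, path, IN_CLOSE_WRITE | IN_MOVED_TO);
    return 0;   /* keep walking */
}

int main(void)
{
    char buf[64 * 1024];

    inotify_fd = inotify_init();

    /* startup cost: this traversal is what takes as long as an rsync walk */
    nftw("/fs/site1/ops/eccc/cmod/prod/hubs", add_watch, 64, FTW_PHYS);

    for (;;) {
        ssize_t n = read(inotify_fd, buf, sizeof(buf));
        for (char *p = buf; p < buf + n; ) {
            struct inotify_event *ev = (struct inotify_event *) p;
            if (ev->len)
                printf("modified: %s\n", ev->name);  /* would be announced here */
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}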

An alternate approach is, instead of doing modification detection at the Linux level, to use the file system itself, which is database driven, to indicate which files have been modified. The HPCR solution's main storage system uses IBM's General Parallel File System, or GPFS. With the GPFS-policy method, a query is run against the file system database at as high a rhythm as can be sustained (around five to ten minutes per query), combined with sr_poll to announce the files modified (and thus eligible for copying.) This is completely non-portable, but was expected to be much faster than file tree traversal.

Over the winter of 2016/2017, both of these methods were implemented. The inotify-based sr_watch was the fastest method (essentially instantaneous), but the daemons had stability and memory-consumption problems, and they also took too long to start up (an initial tree traversal is required, which takes the same time as the rsync walk.) While slower (taking longer to notice that a file was modified), the GPFS policy had acceptable performance and was far more reliable than the parallel sr_watch method, and by the spring, with deployment expected for early July 2017, the GPFS-policy approach was selected.

As the migration progressed, the file systems grew (more and more files in the trees), and the GPFS-policy method progressively slowed. Already by July, this was not an acceptable solution. At this point, the idea of intercepting jobs' file i/o calls with a shim library was introduced. ECCC told SSC at the time that having correct feeds, and having everything ready for transition, was the priority, so the focus of efforts was in that direction until the migration was achieved in September. In spite of being a lower priority over the summer, a C implementation of the sending portion of the sarra library was written, along with a prototype shim library to call it.
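The interception idea is the standard LD_PRELOAD trick: a small library overrides libc calls such as close(), and when a file that was open for writing is closed, it posts the announcement. The prototype below is a hedged sketch of the concept, not the actual shim code shipped with Sarracenia; announce_file() stands in for the real C posting routine.

/*
 * Hedged sketch of the shim idea, not the actual Sarracenia shim library.
 * Build as a shared object (gcc -shared -fPIC -ldl) and activate with
 * LD_PRELOAD; announce_file() is a stand-in for the real posting routine.
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

extern void announce_file(const char *path);   /* assumed, provided elsewhere */

int close(int fd)
{
    static int (*real_close)(int) = NULL;
    char linkname[64], path[4096];
    ssize_t n;
    int was_writable;

    if (!real_close)
        real_close = (int (*)(int)) dlsym(RTLD_NEXT, "close");

    /* before closing, note whether it was writable and resolve its path */
    was_writable = (fcntl(fd, F_GETFL) & O_ACCMODE) != O_RDONLY;
    snprintf(linkname, sizeof(linkname), "/proc/self/fd/%d", fd);
    n = readlink(linkname, path, sizeof(path) - 1);

    int ret = real_close(fd);

    if (ret == 0 && was_writable && n > 0) {
        path[n] = '\0';
        if (path[0] == '/')          /* ignore pipes, sockets, etc. */
            announce_file(path);     /* the AMQP post happens in here */
    }
    return ret;
}

The library is activated by setting LD_PRELOAD in the users' environment, which is why no changes to the jobs themselves are needed.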

Copying Files

It needs to be noted that while all of this work was progressing on the 'obtain the list of files to be copied' part of the problem, we were also working on the 'copy the files to the other side' part. Over the summer, results of performance tests and other considerations prompted frequent changes in tactics. The site stores are clusters in their own right. They have protocol nodes for serving traffic outside of the GPFS cluster, and siteio nodes with Infiniband connections and the actual disks. The protocol nodes (called nfs or proto nodes) are participants in the GPFS cluster dedicated to i/o operations, used to offload i/o from the main compute clusters (PPP and supercomputer), which have connections to the site store comparable to those of the protocol nodes.

There are multiple networks (40GigE, Infiniband, as well as management networks), and the one to use needs to be chosen as well. Then there are the methods of communication (ssh over tcp/ip? bbcp over tcp/ip? GPFS over tcp/ip? IPoIB? native IB?)

[Image: site-store.jpg]

Many different sources and destinations (ppp, nfs, and protocol nodes), as well as many different methods (rcp, scp, bbcp, sscp, cp, dd), were all trialled to different degrees at different times. At this point several strengths of Sarracenia became evident:

  • The separation of publishing from subscribing means that one can subscribe on the source node and push to the destination, or subscribe on the destination and pull from the source. It is easy to adapt to either approach. (We ended up subscribing on the destination protocol nodes, pulling from the source.)
  • The separation of copying from the computational jobs means that the models' run times are unaffected, as the i/o jobs are completely separate.
  • The ability to scale the number of workers to the performance needed. (We eventually settled on 40 workers performing copies in parallel.)
  • The availability of the download_cp, download_rcp, and download_dd plugins allows many different copy programs (and hence protocols) to be easily applied to the transfer problem.

Many different criteria were considered (such as load on nfs/protocol nodes, load on other nodes, transfer speed, and load on PPP nodes.) The final configuration selected uses cp (via the download_cp plugin) initiated from the receiving site store's protocol nodes, so that the reads occur via GPFS over IPoIB, and the writes are done with native GPFS over IB. This was not the fastest transfer method tested (bbcp was faster), but it was selected because it spread the load out to the siteio nodes, resulted in more stable NFS and protocol nodes, and removed tcp/ip setup/teardown overhead. The 'copy the files to the other side' part of the problem was stable by the end of the summer of 2017, and the impact on system stability is minimal.
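Reduced to its essence, each worker on a receiving protocol node just runs cp on an announced path: the read comes across the source site store's mount and the write lands on the local GPFS, with no per-file tcp connection to set up or tear down. The sketch below is illustrative only; the paths are placeholders, and the real download_cp plugin runs inside Sarracenia rather than as a standalone program.

/*
 * Hedged sketch of a cp-based transfer as performed from a receiving
 * protocol node. The source path would arrive in an AMQP announcement;
 * here the paths are hard-coded placeholders.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* copy one announced file by running cp; both paths are GPFS mounts,
 * so no per-file network connection is set up or torn down here */
static int copy_one(const char *src, const char *dst)
{
    pid_t pid = fork();
    if (pid == 0) {
        execlp("cp", "cp", "-p", src, dst, (char *) NULL);
        _exit(127);                      /* exec failed */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int main(void)
{
    return copy_one("/fs/site1/ops/eccc/cmod/prod/hubs/some/announced/file",
                    "/fs/site2/ops/eccc/cmod/prod/hubs/some/announced/file");
}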

Shim Library Necessary

Unfortunately, the mirroring between sites was running with about a 10 minute lag behind the source file system (about 30 times faster than a naive rsync approach), and it was only working in principle: with many files missing in practice, it wasn't usable for its intended purpose. The operational commissioning of the HPCR solution as a whole (with mirroring deferred) occurred in September of 2017, and work on mirroring essentially stopped until October (because of activities related to the commissioning work.)

We continued work on two approaches, the libcshim and the GPFS-policy. The queries run by the GPFS-policy had to be tuned, eventually settling on an overlap of 75 seconds (each query asks for file modifications starting from a point 75 seconds before the previous one ended), because there were issues with files being missing from the copies. Even with this level of overlap, there were still missing files. At this point, in late November and early December, the libcshim was working well enough to be encouraging, so much so that folks lost interest in the GPFS policy. In contrast to an average delay of about 10 minutes before starting a file copy with GPFS-policy queries, the libcshim approach has the copy queued as soon as the file is closed on the source file system.
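The overlap itself is simple to express; the sketch below just states the window arithmetic (the 75-second value was arrived at empirically, and is a placeholder here), at the cost of re-announcing some files.

/*
 * Illustration of the 75-second overlap between successive GPFS-policy
 * query windows: each query starts 75 seconds before the previous one
 * ended, so files written while a query was finishing are seen again by
 * the next one, at the cost of re-announcing some files.
 */
#include <stdio.h>
#include <time.h>

#define OVERLAP_SECONDS 75

/* where the next query window should start, given when the last one ended */
static time_t next_window_start(time_t previous_window_end)
{
    return previous_window_end - OVERLAP_SECONDS;
}

int main(void)
{
    time_t end_of_last_query = time(NULL);
    printf("next query covers modifications since %ld\n",
           (long) next_window_start(end_of_last_query));
    return 0;
}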

It should be noted that when the work began, the Python implementation of Sarracenia was a data distribution tool with no support for mirroring. As the year progressed, features were added to the initial package: symbolic link support, file attribute transport, and file removal support. The idea of periodic processing (called heartbeats) was added, first to detect failures of clients (by noticing idle logs) but later to initiate garbage collection for the duplicates cache, memory use policing, and complex error recovery. The use case precipitated many improvements in the application, including a second implementation in C for environments where a python3 environment was difficult to establish, or where efficiency was paramount (the libcshim case.)

Does it Work?

In December 2017, the software for the libcshim approach looks ready, and it is deployed in some small parallel (non-operational) runs. It is expected that in January 2018 more parallel runs will be tried, and that it will proceed to operations this winter. It is expected that the delay before files appear on the second file system will be on the order of five minutes after they are written on the source tree, or 72 times faster than rsync (see the next section for performance information.)

The question naturally arose: if the directory tree cannot be traversed, how do we know that the source and destination trees are the same? A program that picks random files on the source tree is used to feed an sr_poll, which then adjusts the path to compare each file to its counterpart on the destination. Over a large number of samples, we get a quantification of how accurate the copy is. The plugin for this comparison is still in development.

  • FIXME: include links to plugins
  • FIXME: Another approach being considered is to compare file system snapshots.
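As a rough, hedged sketch of the spot-check idea: descend from the source root picking a random entry at each level until a regular file is reached (cheap, though not a uniform sample), then check the corresponding path on the destination. The real picker and sr_poll plugin may work differently; the tree prefixes below are placeholders, and only sizes are compared here rather than checksums.

/*
 * Hedged sketch of a random spot-check: random descent to one source file,
 * then verify the same relative path on the destination has the same size.
 * Not the actual random-picker / sr_poll plugin; prefixes are placeholders.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <dirent.h>
#include <sys/stat.h>

#define SRC "/fs/site1/ops/eccc/cmod/prod/hubs"
#define DST "/fs/site2/ops/eccc/cmod/prod/hubs"

/* pick one random entry name from a directory, skipping "." and ".." */
static int pick_entry(const char *dir, char *name, size_t len)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    long seen = 0;
    if (!d) return -1;
    while ((e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, "..")) continue;
        if ((rand() % ++seen) == 0)            /* reservoir of size one */
            snprintf(name, len, "%s", e->d_name);
    }
    closedir(d);
    return seen ? 0 : -1;
}

int main(void)
{
    char src[4096] = SRC, dst[4096], entry[256];
    struct stat s1, s2;

    srand(time(NULL));

    /* random descent until we land on something that is not a directory */
    while (stat(src, &s1) == 0 && S_ISDIR(s1.st_mode)) {
        if (pick_entry(src, entry, sizeof(entry)) != 0) return 1;
        strncat(src, "/", sizeof(src) - strlen(src) - 1);
        strncat(src, entry, sizeof(src) - strlen(src) - 1);
    }

    /* swap the source prefix for the destination prefix and compare sizes */
    snprintf(dst, sizeof(dst), "%s%s", DST, src + strlen(SRC));
    if (stat(src, &s1) || stat(dst, &s2) || s1.st_size != s2.st_size) {
        printf("MISMATCH: %s\n", src);
        return 1;
    }
    printf("ok: %s\n", src);
    return 0;
}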

Is it Fast?

The GPFS-policy runs are still the method in use operationally as this is written (2018/01.) The performance numbers given in the summary are taken from the logs of one day of GPFS-policy runs.

  • Hall1 to Hall2: bytes/day: 18615163646615 = 16T, files/day: 1901463
  • Hall2 to CMC: bytes/day: 4421909953006 = 4T, files/day: 475085

All indications are that the shim library copies more data more quickly than the policy-based runs, but so far (2018/01) only subsets of the main tree have been tested. On one tree of 142000 files, the GPFS-policy run had a mean transfer time of 1355 seconds (about 23 minutes), where the shim library approach had a mean transfer time of 239 seconds (about four minutes), a speedup for libcshim vs. GPFS-policy of nearly 6:1. On a second tree, where the shim library transferred 144 thousand files in a day, the mean transfer time was 264 seconds, where the same tree with the GPFS-policy approach took 1175 seconds (basically 20 minutes), or roughly 4:1. The stats are accumulated for particular hours, and at low-traffic times, the average transfer time with the shim library was 0.5 seconds, vs. 166 seconds with the policy. One could claim a 300:1 speedup, but this is just inherent to the fact that the GPFS-policy method must be limited to a certain polling interval (five minutes) to limit impact on the file system, and that provides a lower bound on transfer latency.

On comparable trees, the number of files being copied with the shim library is always higher than with the GPFS-policy. While correctness is still being evaluated, the shim method is apparently working better than the policy runs. If we return to the original rsync performance of six hours per pass over the tree, then the ratio we expect to deliver on is six hours vs. five minutes, or a 72:1 speedup.

The above is based on the following client report:

Jan 4th
Preload:
dracette@eccc1-ppp1:~$ ./mirror.audit_filtered -c ~opruns/.config/sarra/subscribe/ldpreload.conf  -t daily -d 2018-01-04
Mean transfer time: 238.622s
Max transfer time: 1176.83s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/wcps_20170501/wh/banco/cutoff/2018010406_078_prog_gls_rel.tb0
Min transfer time: 0.0244577s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/radprm/backup/ATX_radprm
Total files: 142426
Files over 300s: 44506
Files over 600s: 14666
Policy:
dracette@eccc1-ppp1:~$ ./mirror.audit_filtered -c ~opruns/.config/sarra/subscribe/mirror-ss1-from-hall2.conf  -t daily -d 2018-01-04
Mean transfer time: 1355.42s
Max transfer time: 2943.53s for file: /space/hall2/sitestore/eccc/cmod/prod/hubs/suites/par/capa25km_20170619/gridpt/qperad/surface/201801041500_tt.obs
Min transfer time: 1.93106s for file: /space/hall2/sitestore/eccc/cmod/prod/archive.dbase/dayfiles/par/2018010416_opruns_capa25km_rdpa_final
Total files: 98296
Files over 300s: 97504
Files over 600s: 96136

Jan 3rd
Preload:
dracette@eccc1-ppp1:~$ ./mirror.audit_filtered -c ~opruns/.config/sarra/subscribe/ldpreload.conf  -t daily -d 2018-01-03
Mean transfer time: 264.377s
Max transfer time: 1498.73s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/capa/bassin/6h/prelim/05/2018010312_05ME005_1.dbf
Min transfer time: 0.0178287s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/statqpe/backup/XSS_0p1_statqpe
Total files: 144419
Files over 300s: 60977
Files over 600s: 14185
Policy:
dracette@eccc1-ppp1:~$ ./mirror.audit_filtered -c ~opruns/.config/sarra/subscribe/mirror-ss1-from-hall2.conf  -t daily -d 2018-01-03
Mean transfer time: 1175.33s
Max transfer time: 2954.57s for file: /space/hall2/sitestore/eccc/cmod/prod/hubs/suites/par/capa25km_20170619/gridpt/qperad/surface/201801032200_tt.obs
Min transfer time: -0.359947s for file: /space/hall2/sitestore/eccc/cmod/prod/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/pa/1h/XTI/201801031300~~PA,PA_PRECIPET,EE,1H:URP:XTI:RADAR:META:COR1
Total files: 106892
Files over 300s: 106176
Files over 600s: 104755

To keep in mind:

We have 12 instances for the preload while we’re running 40 for the policy.

* I filtered out the set of files that skewed the results heavily.
* The preload audit in hourly slices shows that it’s heavily instance-bound.
* If we were to boost it up it should give out much better results in high count situations.

Here’s Jan 4th  again but by hourly slice:


dracette@eccc1-ppp1:~$ ./mirror.audit_filtered -c ~opruns/.config/sarra/subscribe/ldpreload.conf  -t hourly -d 2018-01-04
00 GMT
Mean transfer time: 0.505439s
Max transfer time: 5.54261s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/pa/6h/XME/201801040000~~PA,PA_PRECIPET,EE,6H:URP:XME:RADAR:META:NRML
Min transfer time: 0.0328007s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/statqpe/backup/IWX_0p5_statqpe
Total files: 847
Files over 300s: 0
Files over 600s: 0
01 GMT
Mean transfer time: 166.883s
Max transfer time: 1168.64s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/wcps_20170501/wh/banco/cutoff/2018010318_078_prog_gls_rel.tb0
Min transfer time: 0.025425s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/biais/6h/XPG/201801031800_XPG_statomr
Total files: 24102
Files over 300s: 3064
Files over 600s: 1
02 GMT
Mean transfer time: 0.531483s
Max transfer time: 4.73308s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/archive.dbase/dayfiles/par/2018010401_opruns_capa25km_rdpa_preli
Min transfer time: 0.0390887s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/radprm/XMB/201801031900_XMB_radprm
Total files: 774
Files over 300s: 0
Files over 600s: 0
03 GMT
Mean transfer time: 0.669443s
Max transfer time: 131.666s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/pa/1h/WKR/201801032000~~PA,PA_PRECIPET,EE,1H:URP:WKR:RADAR:META:COR2
Min transfer time: 0.0244577s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/radprm/backup/ATX_radprm
Total files: 590
Files over 300s: 0
Files over 600s: 0
04 GMT
Mean transfer time: 59.0324s
Max transfer time: 236.029s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/wcps_20170501/wf/depot/2018010400/nemo/LISTINGS/ocean.output.00016.672
Min transfer time: 0.033812s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/resps_20171107/forecast/products_dbase/images/2018010400_resps_ens-point-ETAs_239h-boxplot-NS_Pictou-001_240.png
Total files: 2297
Files over 300s: 0
Files over 600s: 0
05 GMT
Mean transfer time: 6.60841s
Max transfer time: 28.6136s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/rewps_20171018/forecast/products_dbase/images_prog/2018010400_rewps_ens-point-Hs_Tp_072h-45012-000_072.png
Min transfer time: 0.0278831s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/statqpe/XSM/201801032200_XSM_0p2_statqpe
Total files: 3540
Files over 300s: 0
Files over 600s: 0
06 GMT
Mean transfer time: 1.90411s
Max transfer time: 18.5288s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/radar/statqpe/backup/ARX_0p5_statqpe
Min transfer time: 0.0346384s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/biais/6h/WWW/201801040600_WWW_statomr
Total files: 757
Files over 300s: 0
Files over 600s: 0
07 GMT
Mean transfer time: 262.338s
Max transfer time: 558.845s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/capa/bassin/6h/final/11/2018010400_11AA028_1.shp
Min transfer time: 0.028173s for file: /space/hall2/sitestore/eccc/cmod/cmoi/opruns/ldpreload_test/hubs/suites/par/capa25km_20170619/gridpt/qperad/biais/6h/DLH/201801040000_DLH_statomr
Total files: 23849
Files over 300s: 11596
Files over 600s: 0

Contributions

Dominic Racette - ECCC CMC Operations Implementation

Client lead on the mirroring project. A lot of auditing and running of tests, integration/deployment of copying plugins, and a great deal of testing and extraction of log reports.

Doug Bender - ECCC CMC Operations Implementation

Another client analyst participating in the project. Awareness, engagement, etc...

Daluma Sen - SSC DCSB Supercomputing HPC Optimization

Building C libraries in HPC environment, contributing the random file picker, general consulting.

Alain St-Denis - Manager, SSC DCSB Supercomputing HPC Optimization

Inspiration, consultation, wise man. Initially proposed shim library.

Daniel Pelissier - SSC DCSB Supercomputing HPC Integration / then replacing Alain.

Inspiration/consultation on GPFS-policy work, and use of storage systems.

Tarak Patel - SSC DCSB Supercomputing HPC Integration.

Installation of Sarracenia on protocol nodes and other specific locations. Development of GPFS-policy scripts, called by Jun Hu's plugins.

Jun Hu - SSC DCSB Supercomputing Data Interchange

Deployment lead for SSC. Developed GPFS-policy Sarracenia integration plugins, implemented them within sr_poll, and worked with CMOI on deployments. Shouldered most of SSC's deployment load. Also deployed the inotify/sr_watch implementation.

Noureddine Habili - SSC DCSB Supercomputing Data Interchange

Debian packaging for C-implementation. Some deployment work as well.

Peter Silva - Manager, SSC DCSB Supercomputing Data Interchange

Project lead, made C implementation including shim library, hacked on the Python also from time to time. Initial versions of most plugins.

Michel Grenier - SSC DCSB Supercomputing Data Interchange

Python Sarracenia development lead. Some C fixes as well.

Deric Sullivan - Manager, SSC DCSB Supercomputing HPC Solutions

Consultation/work on deployments with inotify solution.

Walter Richards - SSC DCSB Supercomputing HPC Solutions

Consultation/work on deployments with inotify solution.

Jamal Ayach - SSC DCSB Supercomputing HPC Solutions

Consultation/work on deployments with inotify solution.

FIXME: who else should be here: ?

There was also support and oversight from management in both ECCC and SSC throughout the project.