Subscribe, Acquire and ReAnnounce Products

Manual section: 8
Date: October 2017
Version: 2.17.10a2
Manual group: MetPx Sarracenia Suite


sr_sarra foreground|start|stop|restart|reload|status configfile
sr_sarra cleanup|declare|setup configfile


sr_sarra is a program that Subscribes to file notifications, Acquires the files and ReAnnounces them at their new locations.

The notification protocol is defined in sr_post(7).

sr_sarra connects to a broker (often the same host as the remote file server itself) and subscribes to the notifications of interest. It uses the notification information to download the file onto the local server it is running on. Afterwards, it produces a new notification for the local file on a broker (usually on the local server).

sr_sarra can be used to acquire files announced by sr_post(1) or sr_watch(1), or to reproduce a web-accessible folder (WAF) that announces its products.

The sr_sarra command takes two arguments: an action (start|stop|restart|reload|status), followed by a configuration file, described below.

The foreground action is different. It is meant for building a configuration or debugging: the user runs the program and its configfile interactively. The foreground instance is not affected by the other actions, but if the configured instances are running, it shares the same (configured) message queue with them. The user stops the foreground instance by pressing <ctrl-c> on Linux, or by killing its process by other means.

The actions cleanup, declare and setup can be used to manage resources on the RabbitMQ server. The resources are either queues or exchanges. declare creates the resources. setup creates the resources and additionally binds the queues.


This document focuses on the program's options. Readers should read sr_subscribe(7) first. It fully explains the option syntax, the configuration file location, the credentials, and so on.

A standard Sarracenia configuration expects the config file to be found in:

  • Linux: ~/.config/sarra/sarra/configfile.conf
  • Windows: %AppDir%/, this might be: C:\Users\peter\AppData\Local\science.gc.ca\sarra\sarra\configfile.conf

When creating a new configuration file, the user can run the program with --debug foreground configfile, using a configfile.conf in the current working directory.

The options used in the configfile are described in the next sections.

Multiple Streams

When executed, the program uses the default queue name. If it is stopped, the posted messages continue to accumulate on the broker in the queue. When the program is restarted, the queue name is reused, and no messages are lost.

Message processing can be parallelized by running multiple instances of the program. All instances share the same queue, and the messages are distributed among them. Simply launch the program with the instances option set to an integer greater than 1.
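The sharing of one queue among several instances can be illustrated with a small analogy. This is only a sketch, assuming threads and an in-memory queue stand in for separate sr_sarra processes and a broker queue; the name run_instances is hypothetical and not part of Sarracenia.

```python
import queue
import threading


def run_instances(messages, instances):
    """Distribute messages among workers that all read one shared queue."""
    shared = queue.Queue()
    for message in messages:
        shared.put(message)

    # One list of handled messages per instance, numbered from 1.
    handled = {number: [] for number in range(1, instances + 1)}

    def worker(number):
        # Each worker takes the next available message until none remain.
        while True:
            try:
                message = shared.get_nowait()
            except queue.Empty:
                return
            handled[number].append(message)

    threads = [threading.Thread(target=worker, args=(n,)) for n in handled]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


handled = run_instances(["msg%d" % i for i in range(10)], instances=3)
```

Each message ends up with exactly one worker. Which worker receives which message is not predictable, and the same should be expected of real instances.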

Consuming Options

Setting the source broker :

broker amqp{s}://<user>:<pw>@<brokerhost>[:port]/<vhost>
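To make the parts of the broker URL concrete, the following sketch splits one with the Python standard library. The helper name parse_broker_url, the example host and the fallback to the conventional AMQP ports (5672, or 5671 with SSL) are assumptions for illustration, not a description of how sr_sarra parses the option.

```python
from urllib.parse import urlsplit


def parse_broker_url(url):
    """Split a broker URL into its components (illustrative helper)."""
    parts = urlsplit(url)
    if parts.scheme not in ("amqp", "amqps"):
        raise ValueError("expected an amqp:// or amqps:// URL, got %r" % url)
    return {
        "ssl": parts.scheme == "amqps",
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # Assumption: fall back to the conventional AMQP ports.
        "port": parts.port or (5671 if parts.scheme == "amqps" else 5672),
        "vhost": parts.path or "/",
    }


# Hypothetical broker, used only as an example.
broker = parse_broker_url("amqps://anonymous:secret@broker.example.org/")
```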

Setting the queue on broker :

  • queue_name <name> (default: q_<brokerUser>.<programName>.<configName>)
  • durable <boolean> (default: False)
  • expire <duration> (default: 5m == five minutes)
  • message-ttl <duration> (default: None)
  • prefetch <N> (default: 1)
  • reset <boolean> (default: False)

Setting the bindings on the queue :

  • exchange <name> (default: xpublic)
  • topic_prefix <amqp pattern> (default: varies -- developer option)
  • subtopic <amqp pattern> (must be set)

Filtering messages using regular expressions

  • accept <regexp pattern> (optional)
  • reject <regexp pattern> (optional)
  • accept_unmatch <boolean> (default: False)
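The way these three options interact can be sketched as follows. This assumes that the accept and reject patterns are tried in the order they appear in the configuration, that the first matching pattern decides, and that accept_unmatch decides when nothing matches; see sr_subscribe(7) for the authoritative rules. The function name select and the example URLs are hypothetical.

```python
import re


def select(url, rules, accept_unmatch=False):
    """Return True if the message url is selected.

    rules is an ordered list of ("accept" | "reject", regexp pattern).
    """
    for action, pattern in rules:
        # Assumption: the pattern is matched from the start of the url.
        if re.match(pattern, url):
            return action == "accept"
    # No pattern matched: accept_unmatch decides.
    return accept_unmatch


rules = [
    ("reject", r".*\.tmp$"),
    ("accept", r".*/observations/.*"),
]
wanted = select("http://host/data/observations/a.txt", rules)
```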

Running a plugin on selected messages

  • on_message <script_name> (default: msg_log)

Specific consuming requirements

To consume messages, the mandatory options are broker and exchange. The default binding matches all post messages from that exchange.

If the messages are posted directly from a source, the exchange used is 'xs_<brokerSourceUsername>'. Such messages may not contain a source or an origin cluster, or a malicious user may set the values incorrectly. To protect against malicious settings, administrators should set source_from_exchange to True.

  • source_from_exchange <boolean> (default: False)

Upon reception, the program will set these values in the parent class (here cluster is the value of option cluster taken from default.conf):

self.msg.headers['source'] = <brokerUser>
self.msg.headers['from_cluster'] = cluster

overriding any values present in the message. This setting should always be used when ingesting data from a user exchange.
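The override can be sketched as below. This is a sketch under assumptions: the user name is taken to be whatever follows 'xs_' in the exchange name, and the function name apply_source_from_exchange, the header values and the cluster name are hypothetical.

```python
def apply_source_from_exchange(headers, exchange, cluster):
    """Return a copy of headers with source and from_cluster overridden."""
    prefix = "xs_"
    if not exchange.startswith(prefix):
        # Not a user exchange: leave the headers untouched.
        return headers
    trusted = dict(headers)
    # Assumption: the broker user name is the part after 'xs_'.
    trusted["source"] = exchange[len(prefix):]
    trusted["from_cluster"] = cluster
    return trusted


# A message posted by alice that claims to come from someone else.
claimed = {"source": "mallory", "sum": "d,abc"}
fixed = apply_source_from_exchange(claimed, "xs_alice", "DDI")
```

Whatever the message claimed, the values derived from the exchange and the local configuration win.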

Important note 2:

The on_message plugins (if provided) are invoked after a product has been selected for download, as described in the next section.


These options set where the program downloads the file (or the part) described by the message.

  • attempts <integer> (default: 3)
  • document_root <path> (default: .)
  • mirror <boolean> (default: true)
  • strip <integer> (default: 0)
  • inplace <boolean> (default: true)

The attempts option sets the maximum number of times the program tries to download each file. The program starts by taking the relative path of the product straight from the message url:

relative_path = message's url path

If the message has a self.msg.headers['rename'] entry, then:

relative_path = message's rename path

When mirror is true, we are usually in a pump to pump configuration and we are satisfied with the message's path as is.

If mirror is false, the standard Sarracenia YYYYMMDD/source pair is added in front of the relative_path:

if not mirror: relative_path = YYYYMMDD/<brokerUser>/relative_path

Next, the strip option is applied, if set to N>0. The first N directories are removed from the relative_path. If N is too big, only the filename is kept.

The document_root option sets the directory that is the root of the download tree. This directory never appears in the newly created AMQP notifications, but it is used to build the absolute path of the local file (destination):

path = document_root + relative_path (after all options applied)
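The steps above can be put together in one function. This is a minimal sketch that follows the order described in the text (rename, mirror, strip, document_root). The function name destination_path, its parameter names and the example values are hypothetical, and the date is passed in by the caller rather than computed.

```python
import posixpath
from urllib.parse import urlsplit


def destination_path(msg_url, document_root=".", mirror=True, strip=0,
                     rename=None, broker_user=None, date=None):
    """Compute the local path of a product from its notification."""
    # Start from the url path, or from the rename header if present.
    relative_path = rename if rename else urlsplit(msg_url).path
    relative_path = relative_path.lstrip("/")

    if not mirror:
        # Prepend the YYYYMMDD/<brokerUser> pair.
        if date is None or broker_user is None:
            raise ValueError("date and broker_user are needed when mirror is false")
        relative_path = posixpath.join(date, broker_user, relative_path)

    if strip > 0:
        parts = relative_path.split("/")
        # Remove the first N directories; if N is too big, keep the filename.
        parts = parts[strip:] if strip < len(parts) else parts[-1:]
        relative_path = "/".join(parts)

    return posixpath.join(document_root, relative_path)


path = destination_path("http://host/a/b/c.txt", document_root="/data", strip=1)
```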

The inplace option defaults to True. When the program receives notifications of file parts, it puts these parts in place in the file, in order. Each part, once inserted in the file, is announced to subscribers.

Depending on inplace, and on whether the message describes a part, the path can change again (a part suffix is added if necessary). The resulting variables used for the local destination when downloading a file (or a part) are:

parent.new_file         : name of the file to write
parent.new_dir          : name of the directory in which to write the file
parent.msg.local_offset : offset position in the local file
parent.msg.offset       : offset position of the remote file
parent.msg.length       : length of file or part
parent.msg.in_partfile  : T/F, file temporarily in a part file
parent.msg.local_url    : url for reannouncement

These variables are important to know if one wants to use an on_message, on_part or on_file plugin.


There are a few options that affect the download of a product:

  • delete <boolean> (default: False)
  • do_download <script> (default: None)
  • on_file <script> (default: file_log)
  • on_part <script> (default: None)
  • overwrite <boolean> (default: False)
  • recompute_chksum <boolean> (default: False)
  • timeout <float> (default: None)
  • kbytes_ps <int> (default: 0)

Once the path is defined in the program, if the overwrite option is set to True, the program checks if the file is already there. If it is, it computes the checksum on it according to the notification's settings. If the local file checksum matches the one in the notification, the file is not downloaded, the incoming notification is acknowledged, and the file is not reannounced. If the file is not there, or the checksum differs, the file is overwritten and a new notification is sent to the destination broker.
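The check described above can be sketched as follows. This assumes, for illustration only, that the notification specifies an MD5 checksum given as a hexadecimal string; the function name needs_download and the chunk size are hypothetical.

```python
import hashlib
import os


def needs_download(path, expected_md5, chunk_size=65536):
    """Return True if the file is missing or its checksum differs."""
    if not os.path.isfile(path):
        return True
    digest = hashlib.md5()
    with open(path, "rb") as local_file:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: local_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() != expected_md5
```

When the function returns False, the local copy already matches the notification, so the download and the reannouncement can both be skipped.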

If delete is set to True, when the product is downloaded, it is removed from the remote server.

When the protocol supports it, the timeout option sets the number of seconds after which a TCP connect timeout is raised. (ftp/ftps/sftp support it.)

kbytes_ps can be used to set a target for the download speed in Kbytes per second. The default is 0, meaning no control over speed. (ftp/ftps/sftp support it.)
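One common way to hold a transfer to a target rate is to pause whenever the transfer is ahead of schedule. The sketch below shows that arithmetic only. It is not a description of how sr_sarra implements kbytes_ps; the function name throttle_delay is hypothetical, and 1 Kbyte is assumed to be 1024 bytes.

```python
def throttle_delay(bytes_so_far, elapsed_seconds, kbytes_ps):
    """Return how many seconds to pause to stay at or below the target rate."""
    if kbytes_ps <= 0:
        # 0 means no control over speed.
        return 0.0
    # Time the transfer should have taken so far at the target rate.
    expected_seconds = bytes_so_far / (kbytes_ps * 1024.0)
    return max(0.0, expected_seconds - elapsed_seconds)


# 100 Kbytes received in half a second, with a target of 100 Kbytes/s.
pause = throttle_delay(100 * 1024, 0.5, kbytes_ps=100)
```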

The do_download option defaults to None. If used it defines a script that will be called when an unsupported protocol is received in a message. The user can use all the sr_sarra class elements including the message in order to set the proper download of the product. It returns True if the download succeeded.

The on_part option defaults to None. If used, it defines a script that is called when a part is downloaded. The same ideas apply: the user can do whatever is needed with the downloaded part. The script should return True to tell the program to continue processing. If it returns False, processing of the current message is stopped, and the program skips to the next message.

The on_file option defaults to file_log, which writes a download status message. If used, it defines a script that is called once the file is downloaded (or all its parts are in place). The user can do whatever is needed with the downloaded file, such as performing checks. The script returns True to tell the program to continue processing. If it returns False, processing of the current message is stopped, and the program skips to the next message.
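A check of the kind an on_file script might perform is sketched below. It uses only parent.new_dir and parent.new_file from the list of variables given earlier. The class name FileSizeCheck, the perform(parent) method shape and the stand-in parent object are assumptions made so the example runs on its own; how a script is actually registered with the program is not shown here, see sr_subscribe(7).

```python
import os
from types import SimpleNamespace


class FileSizeCheck(object):
    """Hypothetical on_file check: refuse missing or empty downloads."""

    def __init__(self, parent):
        pass

    def perform(self, parent):
        path = os.path.join(parent.new_dir, parent.new_file)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            # False: stop processing this message, skip to the next one.
            return False
        # True: continue processing.
        return True


# Stand-in for the sr_sarra object, carrying only the two fields used.
fake_parent = SimpleNamespace(new_dir=os.getcwd(),
                              new_file="no-such-file-for-this-example.dat")
result = FileSizeCheck(fake_parent).perform(fake_parent)
```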

For each download, the checksum is computed during transfer. If recompute_chksum is set to True, and the recomputed checksum differs from the one in the message, the new value overwrites the one from the incoming AMQP message.


The program needs to set all the rabbitmq configurations for an output broker.

The post_broker option sets all the credential information to connect to the output RabbitMQ server.

post_broker amqp{s}://<user>:<pw>@<brokerhost>[:port]/<vhost>

The program looks for the feeder option (usually defined in default.conf) and, if found, uses it as the default for post_broker. The pump usually handles AMQP messages internally from that account.

Once connected to the source AMQP broker, the program builds notifications after the download of a file has occurred. To build the notification and send it to the next hop broker, the user sets these options:

  • url <url> (MANDATORY)
  • post_exchange <name> (default: xpublic)
  • on_post <script> (default: None)

The url option sets how to get the file: it defines the protocol, host, port and, optionally, the credentials. It is good practice not to include the credentials in notifications, and to inform the consumers about them separately.
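As an illustration of keeping credentials out of a notification, the sketch below removes the user and password from a URL with the Python standard library. The function name without_credentials and the example URL are hypothetical, and IPv6 literal hosts are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def without_credentials(url):
    """Return url with any user:password part removed."""
    parts = urlsplit(url)
    # Rebuild the network location from the host and optional port only.
    host = parts.hostname or ""
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    return urlunsplit((parts.scheme, host, parts.path,
                       parts.query, parts.fragment))


public_url = without_credentials("sftp://alice:secret@server.example.org:2222/data/")
```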

The post_exchange option sets the exchange under which the new notification will be posted. In most cases it is 'xpublic'.

The user can have a script triggered whenever a notification is published for a product. The on_post option is used for this.


sr_subscribe(7) - the format of configurations for MetPX-Sarracenia.

sr_report(7) - the format of report messages.

sr_report(1) - process report messages.

sr_post(1) - post announcements of specific files.

sr_post(7) - The format of announcements.

sr_subscribe(1) - the download client.

sr_watch(1) - the directory watching daemon.

dd_subscribe(1) - the http-only download client.