QDDS (Quake Data Distribution System) Notes

Stephen Jacob

These are notes that Stephen Jacob made in September 1998 as he finished his part of the QDDS project. The final part of the notes listed "work to be done." Those tasks have since been completed, so that part has been removed. We retain the first portion because it shows Stephen's objectives as he developed the program. (ALJ - 2000.01.25.)

How does it operate?

The software is based on an expanded client-server model. In a pure client-server model, the client makes requests to a server, and the server only responds to those requests. In QDDS, the server also initiates contact when it has data to send, in addition to responding to requests, so the system does not truly fit the client-server model.

Instead, I use the term "hub" for the server-like systems at the center of the model, and "leaf" for the client-like systems. Each seismic network wishing to participate in the data exchange would have a "leaf", and there would be two or more "hubs". Since all are "servers" in a sense, one can refer to them as "leaves" and "hubs" or "leaf servers" and "hub servers".

When a seismic network has data on an event to share, it must place that data in a file in a spool directory, which the QDDS software polls every few seconds.
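The spool-directory pickup can be sketched in Python as a simple polling loop. This is a minimal illustration, not the QDDS code: the function name `poll_spool`, the `handle` callback, the 5-second default interval, and the choice to delete each file after handing it off are all assumptions.

```python
import os
import time


def poll_spool(spool_dir, handle, interval=5.0, rounds=None):
    """Poll spool_dir every `interval` seconds (hypothetical default).

    Each regular file found is read, passed to handle(name, data), and then
    deleted so it is not submitted twice. `rounds` limits the number of
    polls, which is useful for testing; None means poll forever.
    """
    done = 0
    while rounds is None or done < rounds:
        for name in sorted(os.listdir(spool_dir)):
            path = os.path.join(spool_dir, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-files
            with open(path, "rb") as f:
                data = f.read()
            handle(name, data)
            os.remove(path)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

One caveat with any polled spool directory is that a file may be picked up while it is still being written. A common convention is for the producer to write under a temporary name and rename the finished file into the spool directory; whether QDDS required this is not stated in these notes.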

The leaf software reads the contents of the file in the spool directory and uploads them to every hub over a TCP connection. If it cannot connect to a hub immediately, it keeps retrying for a period of time.
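The upload-with-retry behavior might be sketched as follows. The retry window and delay are hypothetical values (the notes say only "for a period of time"), and `tcp_send` is a bare illustration of opening a TCP connection with Python's standard `socket` module; it does not reproduce the real QDDS wire format or its use of passwords. The clock and sleep functions are injectable so the retry logic can be exercised without waiting.

```python
import socket
import time


def tcp_send(hub, data, timeout=10.0):
    """Illustrative only: connect to hub = (host, port) and send raw bytes.
    Returns True on success, False on any connection or send error."""
    host, port = hub
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(data)
        return True
    except OSError:
        return False


def upload_to_hubs(data, hubs, send=tcp_send, retry_window=600.0,
                   retry_delay=30.0, clock=time.monotonic, sleep=time.sleep):
    """Upload `data` to every hub, retrying hubs that fail until
    `retry_window` seconds have passed. Returns the hubs that never
    accepted the upload."""
    pending = list(hubs)
    deadline = clock() + retry_window
    while pending:
        # Keep only the hubs whose upload failed on this pass.
        pending = [hub for hub in pending if not send(hub, data)]
        if not pending or clock() >= deadline:
            break
        sleep(retry_delay)
    return pending
```

Each pass retries only the hubs that have not yet accepted the data, so a hub that is down does not cause duplicate uploads to the hubs that are up.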

(Note: Leaves keep a list of hubs, and hubs keep a list of leaves, in memory. The list is loaded at startup from the file "comm.lst", the communications list, and tells each host both where to send messages and which hosts to accept messages from. For each host it contains the address, password, UDP port, and TCP port.)
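The notes list the fields in "comm.lst" but not its layout. The parser below assumes a hypothetical layout of one host per line, with whitespace-separated fields in the order given above (address, password, UDP port, TCP port), and shows how one list can serve both purposes the notes mention: supplying destinations and deciding which senders to accept. How QDDS actually uses the password is not described here, so the sketch only stores it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommEntry:
    address: str
    password: str
    udp_port: int
    tcp_port: int


def load_comm_list(lines):
    """Parse comm.lst lines under an ASSUMED format:
    `address password udp_port tcp_port`, one host per line.
    Blank lines and lines starting with '#' are skipped (also assumed)."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        address, password, udp_port, tcp_port = line.split()
        entries[address] = CommEntry(address, password,
                                     int(udp_port), int(tcp_port))
    return entries


def destinations(entries):
    """Where to send messages: (address, udp_port, tcp_port) per host."""
    return [(e.address, e.udp_port, e.tcp_port) for e in entries.values()]


def is_allowed(entries, address):
    """Where to allow messages from: only hosts on the list."""
    return address in entries
```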

When a hub receives such an (unnumbered) message, it stores it in its output directory. It then assigns the message a number, initiates distribution of the message to all the leaves in the network (network in the sense of a set of communicating QDDS leaves and hubs), and stores it in a storage directory in a file identified by the message number, so that it can recall the message if asked to. The hub distributes the message using UDP datagrams unless the data is too large to fit in a single UDP datagram (a limit of about 64 KB), in which case it uses TCP.
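The hub's handling of a new upload (number it, store it under that number, distribute it over UDP or, if too large, over TCP) might look like the sketch below. The class and callback names are hypothetical. The size check compares only the raw data against 65,507 bytes, the largest UDP payload over IPv4 (65,535 minus the 8-byte UDP header and the 20-byte IPv4 header); a real message header would reduce the usable space. The initial save to the output directory is omitted for brevity.

```python
import os

# Largest UDP payload over IPv4: 65,535 - 8 (UDP header) - 20 (IPv4 header).
MAX_UDP_PAYLOAD = 65507


class Hub:
    """Illustrative hub: number, store, and distribute uploaded messages.

    send_udp and send_tcp are hypothetical callables taking
    (leaf, msg_id, data); they stand in for the real network code.
    """

    def __init__(self, storage_dir, leaves, send_udp, send_tcp):
        self.storage_dir = storage_dir
        self.leaves = leaves
        self.send_udp = send_udp
        self.send_tcp = send_tcp
        self.next_id = 1

    def _path(self, msg_id):
        return os.path.join(self.storage_dir, str(msg_id))

    def receive_upload(self, data):
        msg_id = self.next_id
        self.next_id += 1
        # Store under the message number so the hub can recall it later.
        with open(self._path(msg_id), "wb") as f:
            f.write(data)
        # Size check ignores any message header (simplification).
        send = self.send_udp if len(data) <= MAX_UDP_PAYLOAD else self.send_tcp
        for leaf in self.leaves:
            send(leaf, msg_id, data)
        return msg_id

    def recall(self, msg_id):
        """Return the stored data for msg_id, or None if there is none."""
        try:
            with open(self._path(msg_id), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None
```

This sketch writes the stored copy before distributing, so a leaf that requests the message immediately after distribution will always find it on disk.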

Because UDP datagrams may not reach every leaf, and TCP connections are attempted only once during message distribution, a leaf may miss messages. We have several measures in place to deal with this. A leaf may send a hub a request for a message by its message number (ID). The hub responds either with a data message corresponding to that ID (just as in distribution) or with a message saying "I received your request, but I do not have a message with that ID". The requesting leaf registers either response as "I am no longer missing that ID from that hub".
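The request/response exchange for a missed message reduces to two small functions. In this sketch the two reply types are plain tuples tagged "DATA" and "NODATA", and the hub's stored messages are a dictionary; both are simplifications for illustration, not the QDDS message format.

```python
def answer_request(store, msg_id):
    """Hub side: reply with the stored message, or say it does not exist.
    `store` maps message number -> bytes (a stand-in for the storage directory)."""
    data = store.get(msg_id)
    if data is None:
        return ("NODATA", msg_id)
    return ("DATA", msg_id, data)


def register_reply(missing, hub, reply):
    """Leaf side: DATA and NODATA both mean "no longer missing that ID
    from that hub". `missing` maps hub -> set of missing IDs."""
    kind, msg_id = reply[0], reply[1]
    if kind in ("DATA", "NODATA") and hub in missing:
        missing[hub].discard(msg_id)
```

Treating NODATA the same as DATA is what keeps a leaf from asking forever for a message the hub never had.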

A leaf determines that it has missed messages in one of two ways. Usually, it receives, for example, message 3 followed by message 5, and therefore knows that it missed message 4. This happens often because messages may be dispatched in parallel and take varying amounts of time to arrive. The second way addresses a problem we anticipated: if a leaf misses message 8 and no message 9 arrives for a long time, it has no way of knowing that message 8 is missing. We solve this by having each hub send out an "alive" (heartbeat) message if no events have been dispatched in the last few minutes. The "alive" message contains the ID of the last message that was dispatched. A leaf is therefore likely to receive the alive message, or at least one of the first few alive messages sent during a quiet period, and can then register that it has missed one or more messages and request them.
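Both ways of detecting a gap follow one rule: every ID between the highest ID seen so far and the newly revealed ID is missing. For a DATA message the gap ends just below the message's own number; for an ALIVE message it ends at the ID the heartbeat carries. The function names, the treatment of first contact (no baseline, so nothing is flagged), and the 180-second quiet period are assumptions for illustration.

```python
def newly_missing(highest_seen, msg_id, is_alive):
    """IDs revealed as missing by a DATA or ALIVE message from one hub.

    For DATA, msg_id is the message's own number, so the gap ends just
    below it. For ALIVE, msg_id is the last ID the hub dispatched, so that
    ID itself may be missing too.
    """
    if highest_seen is None:
        return []  # first contact with this hub: no baseline (assumption)
    last_expected = msg_id if is_alive else msg_id - 1
    return list(range(highest_seen + 1, last_expected + 1))


def alive_due(last_dispatch_time, now, quiet_period=180.0):
    """Hub side: send an ALIVE heartbeat once nothing has been dispatched
    for `quiet_period` seconds (hypothetical value for "a few minutes")."""
    return now - last_dispatch_time >= quiet_period
```

With the notes' examples, `newly_missing(3, 5, is_alive=False)` flags message 4, and an ALIVE carrying ID 8 after message 7 flags message 8. A late DATA message with a lower number than the highest seen flags nothing.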

The leaves (and, at least for the moment, the hubs) have a MissedTracking object that keeps track of which messages have been missed and, every few minutes, sends out a request for each message that is still missing. Each hub assigns its own message numbers completely independently, with no synchronization with the other hubs, so the MissedTracking object keeps a separate record of missed messages for each host in the hub's or leaf's communications list. Every time a DATA, NODATA, or ALIVE message is received, the MissedTracking object is informed and uses that information to keep its lists current.
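A MissedTracking-style object can be sketched as per-host state updated by DATA, NODATA, and ALIVE events. This is an illustrative reconstruction, not the original class: the method names, the 300-second request interval, and the decision to take the first message from a host as a baseline are assumptions. The caller is expected to send a request for each (host, ID) pair that `due_requests` returns.

```python
class MissedTracking:
    """Illustrative per-host tracker of missed message IDs."""

    def __init__(self, hosts, request_interval=300.0):
        self.highest = {h: None for h in hosts}   # highest ID seen per host
        self.missing = {h: set() for h in hosts}  # IDs still missing per host
        self.request_interval = request_interval  # seconds between sweeps
        self.last_sweep = float("-inf")

    def _note_gap(self, host, top):
        """Mark every ID above the highest seen, up to `top`, as missing."""
        prev = self.highest[host]
        if prev is not None:
            self.missing[host].update(range(prev + 1, top + 1))

    def _raise_highest(self, host, msg_id):
        prev = self.highest[host]
        if prev is None or msg_id > prev:
            self.highest[host] = msg_id

    def on_data(self, host, msg_id):
        self._note_gap(host, msg_id - 1)
        self.missing[host].discard(msg_id)
        self._raise_highest(host, msg_id)

    def on_nodata(self, host, msg_id):
        # The hub has no such message: stop asking for it.
        self.missing[host].discard(msg_id)

    def on_alive(self, host, last_id):
        # last_id is the last message the hub dispatched.
        self._note_gap(host, last_id)
        self._raise_highest(host, last_id)

    def due_requests(self, now):
        """Return (host, ID) pairs to request, at most once per interval."""
        if now - self.last_sweep < self.request_interval:
            return []
        self.last_sweep = now
        return sorted((h, i) for h, ids in self.missing.items() for i in ids)
```

Because all state is keyed by host, a gap in one hub's numbering never touches another hub's records. Hosts are taken from the communications list, so a message from an unlisted host raises a KeyError here rather than creating new state.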