Communications

Carrier

The main mode of node-to-node communication in OctoMY™ is carried over UDP. While other carriers such as Bluetooth® are supported for authentication and pairing purposes, UDP is the day-to-day workhorse.

Before selecting UDP, we considered many alternatives. TCP was a strong contender, but we ultimately decided to drop it. The reason is that TCP pretends, on behalf of the user, that network traffic is an assured, linear, unbroken stream of bytes. From the user's perspective this has the benefit of being easy to understand and use. It has two major downsides, however:

  1. This is far from how a network actually works, so it is hard to make TCP work in practice. It has taken decades of evolution for TCP to become as good as it is today, but it is still limited by this fallacy.
  2. Some needs are actually better met by not thinking about the network as a linear stream of bytes, so going to the length of pretending that it is one just gets in the way.

CommsChannel

CommsChannel is the main node-to-node communications API in OctoMY™. It is a wrapper around the UDP code found in Qt5. It tries to exploit the benefits of communicating over UDP by modelling them closely while hiding their inherent complexities. Among other things, this means that the API embraces the UDP payload size of ~512 bytes as the largest contiguous array of data that may be transferred at one time. CommsChannel may be made to work over other carriers in the future, but this packet-centric view will not change.
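To illustrate the packet-centric view, here is a minimal sketch of splitting a larger buffer into payload-sized pieces. The constant `kMaxPayloadBytes` and the function `splitIntoPayloads` are hypothetical names chosen for this example and are not taken from the OctoMY™ source. The limit of 512 simply mirrors the approximate figure given above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed payload limit, taken from the "~512 bytes" figure in the text.
constexpr std::size_t kMaxPayloadBytes = 512;

// Split a buffer into consecutive chunks of at most maxBytes each.
// Hypothetical helper for illustration only.
std::vector<std::vector<std::uint8_t>> splitIntoPayloads(
    const std::vector<std::uint8_t> &data,
    std::size_t maxBytes = kMaxPayloadBytes)
{
    assert(maxBytes > 0); // a zero-sized payload would never make progress
    std::vector<std::vector<std::uint8_t>> chunks;
    for (std::size_t offset = 0; offset < data.size(); offset += maxBytes) {
        const std::size_t len = std::min(maxBytes, data.size() - offset);
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
        chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
    }
    return chunks;
}
```

Each chunk would then travel in its own packet. Reassembly and loss handling are left out of this sketch.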

  • Communications have two "layers": the intrinsic layer and the courier layer.
  • The courier part is reserved for the application layer that wishes to communicate using CommsChannel.
  • The intrinsic part is reserved for the internal affairs of CommsChannel.
  • The intrinsic and courier parts of CommsChannel should not depend directly on one another, and their respective implementations should not be mixed.

NOTE: We have some conflicting requirements for the protocol:
  • The protocol should be as stateless as possible for robustness
  • The protocol should be as low-bandwidth as possible
Where is the conflict? For every mode/state you add, you save bytes going over the wire (because the other side will "remember" the state), while for every mode/state you store, you add an opportunity for failure resulting from missing state-carrying packets. In other words, sending all state in every packet means that no state is ever lost, but it adds up to a lot of data. To alleviate this conflict, we do the following:
  • We add a mechanism to verify the state, so that a mismatch can be detected and the state can be re-sent.
  • For data that is small and/or important, we re-send more often.
  • For data that is larger and/or less important, we re-send less often.
We call this mechanism "sync", and every packet may request sync.
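One way to picture the verification step is a small digest of the sender's state travelling with each packet, which the receiver compares against a digest of the state it remembers. The sketch below is only an illustration under that assumption. The names `PacketHeader`, `RemoteStateMirror` and `digest` are hypothetical, and the FNV-1a hash is a stand-in chosen for brevity, not something the protocol is known to use.

```cpp
#include <cstdint>
#include <string>

// FNV-1a (32 bit), used here only as a cheap illustrative digest.
std::uint32_t digest(const std::string &state)
{
    std::uint32_t h = 2166136261u;
    for (unsigned char c : state) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical per-packet fields related to sync.
struct PacketHeader {
    std::uint32_t stateDigest; // digest of the sender's current state
    bool requestSync;          // ask the peer to re-send its full state
};

// Receiver-side copy of what the peer last told us in full.
class RemoteStateMirror {
public:
    // A full state transfer arrived: remember it.
    void onFullState(const std::string &state) { mRemembered = state; }

    // An ordinary packet arrived: true when our copy no longer matches,
    // meaning the next outgoing packet should request sync.
    bool needsSync(const PacketHeader &header) const
    {
        return digest(mRemembered) != header.stateDigest;
    }

private:
    std::string mRemembered;
};
```

With this shape, a dropped state-carrying packet shows up as a digest mismatch on the next packet that does arrive, at the cost of four bytes per packet.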

The CommsChannel API works like this:
  1. Users of the API register Couriers, each of which is responsible for keeping the latest data fresh and ready for sending should an opportunity present itself. Each courier tends to a certain type of packet with a certain priority and desired sending frequency. It is up to each courier to maintain its own state.
  2. CommsChannel is in charge: it decides the rate at which packets are sent and which couriers get their packets sent on each opportunity.
  3. CommsChannel may at any time send non-payload data in each packet, or even special-purpose network-only packets, to sustain its operation. If couriers have no data to send, CommsChannel may send no-op packets that facilitate the calculation of network characteristics such as round-trip time. This is done transparently to the couriers (so you could say couriers are on a higher layer in the OSI model).
  4. CommsChannel binds to a local address and port, but does not really discriminate by where inbound traffic arrives from. All packets are treated equally, as all UDP packets inherently contain an identification of the source.
  5. Communication between control and agent is initiated when the user presses the "connect" buttons in the respective user interfaces, at which point the agent will attempt to contact all trusted controls at their last known address until one or more answers are received. From then on, all connections that did not result in valid responses will be closed, and only retried periodically.
  6. All agent-initiated communication will be broadcast to all active controls in parallel. Remote-initiated transfers will remain private.

Courier

The CommsChannel holds a list of registered Couriers and will try to keep them happy. Couriers may be active or inactive; only active couriers will receive data or get sending opportunities from the CommsChannel. There are many types of Couriers, each looking after a different set of data. Here is a list of important Couriers:
  • AgentStateCourier - The main courier. It exposes the state of the agent and facilitates a way for multiple controls to manage the state together with the agent. Each parameter has metadata assigned to it so that each party knows what value to expect for that parameter should communications fail, and the API provides mechanisms for the clients to ask such questions as "Is this data verified?" and "How fresh is this value?" The AgentStateCourier maintains a stream of idle packets at all times as a means to assure each client that the latest data is correct.
  • BlobCourier - General-purpose assured transfer mechanism for moving blobs of data of arbitrary size. Takes care of re-transmission of bad packets. Has an asynchronous API that allows clients to subscribe to events such as "progress changed", "transmission aborted/failed" and "transmission complete". Used for sending big objects such as binary files.

CourierMandate

The Courier interface allows couriers to expose a CourierMandate object to CommsChannel. This object reveals the following:
  • How urgently does this courier feel the need to send data? Expressed as a number of milliseconds until the next sending time.
  • How high a priority does this courier feel that she has? Expressed as an integer 0-255, where a lower number means lower priority.
  • Do we accept reads? Couriers that do not wish to accept data will have any data sent to them discarded.
  • Do we want to send? Couriers that do not wish to send will not receive sending opportunities.
  • Payload size. When sending, we expect at most the number of bytes expressed by this integer.
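
The five items of the mandate map naturally onto a small struct, and a channel could use them to decide who sends next. The following is a sketch under assumptions, not the actual OctoMY™ code. The field names, the function `pickCourier`, and the selection rule are all hypothetical. The rule assumed here is that, among couriers that want to send, are due, and fit in the available space, the highest priority wins and ties go to the most overdue.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of the five items listed above; field names are assumptions.
struct CourierMandate {
    std::int64_t  nextSendMs;    // ms until the courier wants to send (<= 0: due now)
    std::uint8_t  priority;      // 0-255, lower number is lower priority
    bool          receiveActive; // false: data addressed to the courier is discarded
    bool          sendActive;    // false: courier gets no sending opportunities
    std::uint16_t payloadSize;   // upper bound on bytes written per send
};

// Return the index of the courier that should get the next sending
// opportunity, or nothing if no courier is both willing and due.
std::optional<std::size_t> pickCourier(const std::vector<CourierMandate> &mandates,
                                       std::uint16_t availableBytes)
{
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < mandates.size(); ++i) {
        const CourierMandate &m = mandates[i];
        if (!m.sendActive || m.nextSendMs > 0 || m.payloadSize > availableBytes) {
            continue; // not willing, not due yet, or would not fit
        }
        if (!best) {
            best = i;
            continue;
        }
        const CourierMandate &b = mandates[*best];
        if (m.priority > b.priority ||
            (m.priority == b.priority && m.nextSendMs < b.nextSendMs)) {
            best = i; // higher priority, or equally important but more overdue
        }
    }
    return best;
}
```

When `pickCourier` returns nothing, the channel is free to send an idle packet instead.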

Multiplexing in controls

The agent lives the easy life. The decision to be connected is simply an on-off switch in its user interface. A control, on the other hand, has to juggle connections to every agent that it will control. How is this managed?

We will disregard hub in this section and focus on remote. In remote, there exists one ClientWidget instance per agent that is communicating. That instance is responsible for adding and removing the appropriate couriers to CommsChannel dynamically as needed. The CommsChannel itself will simply look at the currently registered couriers and work with those:

  • Upon application start, all entries in NodeAssociateStore are added to CommsSessionDirectory with their stored state. This ensures that sessions persist across application terminations.
  • CommsChannel may be in normal or honeymoon mode. Honeymoon mode is typically enabled for a period just after communications are enabled, or at the user's discretion.
  • CommsChannel in honeymoon mode will continuously ping all nodes that are not active at a short (seconds) interval until the honeymoon mode wears off.
  • CommsChannel in normal mode will ping all nodes that are not active at an exponentially decaying frequency as a function of how long it has been since their last recorded ping/activity response.
  • Any node that replies to a ping is immediately upgraded to active.
  • Any node that is active but without a valid session has a handshake started, and communication is maintained with it through a newly created session until completion or until the connection times out.
  • If communication with an active node is explicitly completed, it will not receive pings, even during honeymoon, for a specified grace period. This is called celibacy.
  • If communication with an active node simply times out, it will instead continue to partake in the pinging as normal.
  • The node that was last active is always treated as if it is no older than 1 hour, and will thus be pinged correspondingly often during inactive periods.
  • A packet is defined as initial when it contains a session ID of 0.
  • A packet is defined as broken when it does not adhere to the protocol by displaying a lack/excess of data or data in the wrong format.
  • A packet is defined as whole when it is not broken.
  • A packet is defined as hacked when it is whole according to comms protocol but broken according to tamper protocol.
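
The pinging rules above can be sketched as a function that yields the interval between pings for an inactive node. Everything concrete in this sketch is an assumption made for illustration: the names, the 2 second honeymoon interval, the 5 second base interval, the doubling per hour of inactivity, and the cap on doublings. The text only states that the frequency decays exponentially and that the last active node is treated as at most 1 hour old.

```cpp
#include <algorithm>
#include <cstdint>

enum class Mode { Normal, Honeymoon };

constexpr std::int64_t kHoneymoonIntervalMs = 2000; // assumed "short (seconds)" interval
constexpr std::int64_t kBaseIntervalMs = 5000;      // assumed interval for a fresh node
constexpr std::int64_t kHourMs = 3600000;
constexpr int kMaxDoublings = 10;                   // assumed cap on the back-off

// Interval between pings for a node that is not active.
std::int64_t pingIntervalMs(Mode mode, std::int64_t msSinceLastActivity, bool wasLastActive)
{
    if (mode == Mode::Honeymoon) {
        return kHoneymoonIntervalMs;
    }
    std::int64_t age = std::max<std::int64_t>(0, msSinceLastActivity);
    if (wasLastActive) {
        age = std::min(age, kHourMs); // treated as no older than 1 hour
    }
    // Halve the ping frequency (double the interval) per full hour of inactivity.
    const int doublings =
        static_cast<int>(std::min<std::int64_t>(age / kHourMs, kMaxDoublings));
    return kBaseIntervalMs << doublings;
}

// Celibacy: a node whose communication was explicitly completed is left
// alone until its grace period has passed, even during honeymoon.
bool mayPing(std::int64_t nowMs, std::int64_t celibateUntilMs)
{
    return nowMs >= celibateUntilMs;
}
```

A caller would check `mayPing` first and only then consult `pingIntervalMs`.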

Protocols

The comms protocol is the language of OctoMY™ in network communication.

The tamper protocol is an extra layer of protection alongside the comms protocol to detect attempts to tamper with communications. It has validation checks that are not required by the comms protocol, but that add tells about the authenticity of the data.
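
Combined with the packet definitions given earlier (initial, broken, whole, hacked), the two protocols suggest a simple classification step. The sketch below assumes that the outcome of each protocol's validation is already available as a boolean. The types `PacketChecks` and `PacketClass` and the function `classify` are hypothetical names for this illustration.

```cpp
#include <cstdint>

// Hypothetical inputs: results of validating one received packet.
struct PacketChecks {
    bool commsValid;         // adheres to the comms protocol
    bool tamperValid;        // passes the extra tamper protocol checks
    std::uint32_t sessionId; // session ID as parsed from the packet
};

// The four properties defined in the text. They are not mutually exclusive.
struct PacketClass {
    bool initial;
    bool broken;
    bool whole;
    bool hacked;
};

PacketClass classify(const PacketChecks &c)
{
    PacketClass r{};
    r.broken = !c.commsValid;             // lack/excess of data or wrong format
    r.whole = !r.broken;                  // whole is defined as not broken
    r.hacked = r.whole && !c.tamperValid; // fine for comms, fails tamper checks
    r.initial = (c.sessionId == 0);       // callers would normally ignore this when broken
    return r;
}
```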

NOTE: The implementation of the comms protocol has the highest priority; implementation of the tamper protocol will start as soon as we have an MVP working.

Session

If A sends first, A becomes initiator and B becomes adherent. All is dandy.
If B sends first, B becomes initiator and A becomes adherent. All is dandy.
If A & B send at exactly the same time, both A & B become initiator and chaos ensues.

This is resolved in the following manner:

  1. Detect if both A & B are initiator:
    1. We receive a packet indicating that the other party thinks they hold the same role as us (both initiator or both adherent).
    2. Look up the ID-duel; the winner gets to keep its current role while the loser must change.
    3. Drop the packet with a log entry and let the handshake resume, now with both parties holding the correct role.
  2. Detect duplicate connection pairs in sessions, and remove the one that is inferior in the ID-duel.
Time since first ever successful connection
Time since first ever connection attempt, successful or not
Time since last successful connection
Time since last unsuccessful connection attempt
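
The role conflict resolution described above can be sketched as follows. The text does not define how an ID-duel is decided, so this example assumes, purely for illustration, that the lexicographically smaller full ID wins. The names `Role`, `winsIdDuel` and `resolveRole` are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class Role { Initiator, Adherent };

// Hypothetical ID-duel: the lexicographically smaller ID wins.
// The actual rule is not specified in this document.
bool winsIdDuel(const std::string &ourId, const std::string &theirId)
{
    assert(ourId != theirId); // two distinct nodes are expected to have distinct IDs
    return ourId < theirId;
}

// Decide which role we should hold after seeing the role the peer claims.
Role resolveRole(Role ours, Role theirs,
                 const std::string &ourId, const std::string &theirId)
{
    if (ours != theirs) {
        return ours; // no conflict, nothing to resolve
    }
    if (winsIdDuel(ourId, theirId)) {
        return ours; // the winner keeps its current role
    }
    // The loser changes to the opposite role.
    return ours == Role::Initiator ? Role::Adherent : Role::Initiator;
}
```

Because both parties evaluate the same duel with the arguments swapped, exactly one of them changes role, which is what lets the handshake resume.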



// Did handshake complete already?
//if(session->established()) {
// TODO: Handle this case:
/*
When an initial packet is received after the session was already established, the following logic applies:
+ If the packet timestamp is before the session-established-and-confirmed timestamp, it is ignored.
+ The packet is logged and the stream is examined for packets indicating that the original session is still in effect.
+ If there is no sign of the original session after a timeout of X seconds, the session is torn down, and the most recent initial packet is answered to start a new handshake.
+ If one or more packets indicate that the session is still going, the initial packet and its recipient are flagged as hacked and the packet is ignored. This will trigger a warning to the user in the UI, and rules may be set up in the plan to automatically shut down communication upon such an event.
+ Regardless of whether bandwidth management is in effect, valid initial packets should be processed at most once per 3 seconds.
*/
//} else {
//}
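
The last rule in the comment above, processing valid initial packets at most once per 3 seconds, could be enforced with a small limiter like the one sketched here. The class name `InitialPacketLimiter` and its interface are hypothetical, and timestamps are passed in as plain milliseconds to keep the example independent of any particular clock.

```cpp
#include <cstdint>

// Admits at most one initial packet per minimum interval.
class InitialPacketLimiter {
public:
    explicit InitialPacketLimiter(std::int64_t minIntervalMs = 3000)
        : mMinIntervalMs(minIntervalMs)
    {
    }

    // Returns true if an initial packet arriving at nowMs may be processed.
    bool admit(std::int64_t nowMs)
    {
        if (mHasLast && nowMs - mLastMs < mMinIntervalMs) {
            return false; // too soon after the previously processed one
        }
        mLastMs = nowMs;
        mHasLast = true;
        return true;
    }

private:
    std::int64_t mMinIntervalMs;
    std::int64_t mLastMs = 0;
    bool mHasLast = false;
};
```

Rejected packets do not reset the timer, so a flood of initial packets cannot starve a legitimate one indefinitely.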



Handshake

Sessions are established through a handshake that is similar in concept to the 3-way TCP handshake. Both parties in the handshake (A & B) need the other party's valid RSA pubkey and network address before the handshake may take place. These are exchanged through the pairing process that happens beforehand.

In the handshake, party A is the initiator:

  • A sends SYN to B:
    • Hi B. Here is my
      • DESIRED SESSION-ID
      • NONCE
    • ENCODED WITH YOUR ( B's ) PUBKEY.
  • B answers SYN-ACK to A:
    • Hi A. HERE IS MY
      • FULL-ID
      • DESIRED SESSION-ID
      • RETURN NONCE
      • NEW NONCE
    • ENCODED WITH YOUR ( A's ) PUBKEY back atch'a.
  • A answers ACK to B:
    • Hi again B. HERE IS MY
      • FULL-ID
      • RETURN NONCE
    • WELL RECEIVED.
At this point the session is established.

NOTE: For every step in this protocol, if either party is waiting for data from the other that does not arrive, it will re-send its last message at a regular interval until the data arrives (or until some error condition or other state change occurs).
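
The message flow of the handshake can be sketched with plain structs and two small classes. This is an illustration under stated assumptions and not the OctoMY™ implementation. RSA encryption with the peer's pubkey, re-sending on timeout, and the agreement on a final session ID are all omitted. The nonces are passed in as fixed values, whereas real code would draw them from a cryptographically secure random source. All type and member names are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

struct Syn    { std::uint32_t desiredSessionId; std::uint64_t nonce; };
struct SynAck { std::string fullId; std::uint32_t desiredSessionId;
                std::uint64_t returnNonce; std::uint64_t newNonce; };
struct Ack    { std::string fullId; std::uint64_t returnNonce; };

class Initiator {
public:
    Initiator(std::string fullId, std::uint32_t sessionId, std::uint64_t nonce)
        : mFullId(std::move(fullId)), mSessionId(sessionId), mNonce(nonce) {}

    Syn start() const { return {mSessionId, mNonce}; }

    // Accept the SYN-ACK only if it echoes our nonce, then answer with ACK.
    std::optional<Ack> onSynAck(const SynAck &m)
    {
        if (m.returnNonce != mNonce) {
            return std::nullopt;
        }
        mEstablished = true;
        return Ack{mFullId, m.newNonce};
    }

    bool established() const { return mEstablished; }

private:
    std::string mFullId;
    std::uint32_t mSessionId;
    std::uint64_t mNonce;
    bool mEstablished = false;
};

class Adherent {
public:
    Adherent(std::string fullId, std::uint32_t sessionId, std::uint64_t nonce)
        : mFullId(std::move(fullId)), mSessionId(sessionId), mNonce(nonce) {}

    // Answer a SYN by echoing its nonce and adding a nonce of our own.
    SynAck onSyn(const Syn &m)
    {
        mAwaitingAck = true;
        return {mFullId, mSessionId, m.nonce, mNonce};
    }

    // The session is established once the ACK echoes our nonce.
    bool onAck(const Ack &m)
    {
        if (!mAwaitingAck || m.returnNonce != mNonce) {
            return false;
        }
        mEstablished = true;
        return true;
    }

    bool established() const { return mEstablished; }

private:
    std::string mFullId;
    std::uint32_t mSessionId;
    std::uint64_t mNonce;
    bool mAwaitingAck = false;
    bool mEstablished = false;
};
```

Echoing the nonce in each direction lets each party confirm that the reply came from someone who could read the previous message, which in the real protocol means someone holding the matching private key.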

Node

  • Agent -> AgentControls -> AgentCourierSet
  • Remote -> ClientWidget -> RemoteCourierSet
  • CommsChannel - A mode of communication over UDP, utilizing Couriers to transfer different kinds of data
  • CommsSession - A session of using CommsChannel. Starts as rogue, proceeds with handshake to exchange full ID and 64bit nonceID, agree on security protocols and bandwidth limits.
  • CommsSessionDirectory - The list of sessions in use by the CommsChannel.
  • CommsSignature - [DEPRECATED, use full key->id() string instead] This used to be a special purpose identification mapping between full ID and short hand 64bit integer ID used only by CommsChannel and friends. It has sort of been replaced by session-ids.
  • NodeAssociate - Address book entry for one node. Stored in NodeAssociateStore. Meant to be persistent between invocations.
  • NodeAssociateStore - Place to keep NodeAssociates
  • ReliabilitySystem - Separate system to maintain reliability in the communications. Enabled when needed.
  • FlowControl - Separate system to maintain flow control (apply throttling and idling to avoid constipation and stalls) in the communications. Enabled when needed.
  • Courier - A class responsible for a certain part of communications over CommsChannel
  • CourierSet - A collection of couriers to be handled as one.
  • AgentCourierSet - Specialization of CourierSet as convenience for Agent
  • RemoteCourierSet - Specialization of CourierSet as convenience for Remote

Intrinsic parts of comms include the following:

  • Session management
    • Session initiation/handshake
    • Exchange of transmission control data
    • Session tear-down
  • Bandwidth management
    • Detection of available bandwidth
    • Throttling to avoid excessive bandwidth use
    • Continuous monitoring and adjustment of protocol parameters to optimize the flow of packets (including the priority and timing of couriers)
  • Encryption management
    • Signing and sign verification of un-encrypted protocol text based on client RSA keypairs
    • Generation and exchange of encryption keys based on client RSA keypairs
    • Generation of security primitives such as nonces
    • Encryption and decryption of protocol text
  • Reliability management
    • Maintaining a continued UDP connection over unreliable network components, such as consumer-grade routers and wireless radios with poor coverage, by dispatching the necessary communication with STUN services or by sending antler packets.
    • Detection and removal of duplicate and corrupt packets
    • Detection and re-sending of missing packets
    • Reordering of ordered packet sequences

Please note that the expensive and complex intrinsic features of CommsChannel, such as reliability and encryption, are invoked only when needed.

When the amount of data needed for intrinsic features is extensive, separate "intrinsic packets" will be sent, while smaller intrinsic data such as counters will instead accompany each packet. The protocol dictates when such dedicated packets are needed, and changes in this part of the protocol should not affect the higher-level courier interface.

