$Id: udp.html,v 1.15 2005/08/03 13:58:13 jrandom Exp $

Secure Semireliable UDP (SSU)

DRAFT

The goal of this protocol is to provide secure, authenticated, semireliable, and unordered message delivery, exposing only a minimal amount of data easily discernible to third parties. It should support high-degree communication as well as TCP-friendly congestion control, and may include PMTU detection. It should be capable of efficiently moving bulk data at rates sufficient for home users. In addition, it should support techniques for addressing network obstacles, such as most NATs and firewalls.

Addressing and introduction

To contact an SSU peer, one of two sets of information is necessary: a direct address, for when the peer is publicly reachable, or an indirect address, for using a third party to introduce the peer. There is no restriction on the number of addresses a peer may have.

    Direct: ssu://host:port/introKey[?opts=[A-Z]*]
  Indirect: ssu://tag@relayhost:port/relayIntroKey/targetIntroKey[?opts=[A-Z]*]

These introduction keys are delivered through an external channel and must be used when establishing a session key. For the indirect address, the peer must first contact the relayhost and ask them for an introduction to the peer known at that relayhost under the given tag. If possible, the relayhost sends a message to the addressed peer telling them to contact the requesting peer, and also gives the requesting peer the IP and port on which the addressed peer is located. In addition, the peer establishing the connection must already know the public keys of the peer they are connecting to (but not necessarily those of any intermediary relay peer).

Each of the addresses may also expose a series of options - special capabilities of that particular peer. For a list of available capabilities, see below.
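
As an illustration of the two address forms above, the following is a minimal parsing sketch. The function name parse_ssu_address and the returned dictionary layout are hypothetical, and the keys are treated as opaque path segments since their textual encoding is not defined in this section.

```python
from urllib.parse import urlsplit, parse_qs

def parse_ssu_address(uri):
    """Split an ssu:// address into its parts (keys stay opaque strings)."""
    parts = urlsplit(uri)
    if parts.scheme != "ssu":
        raise ValueError("not an ssu:// address")
    keys = [k for k in parts.path.split("/") if k]
    # opts is a run of capability letters, e.g. "AB"
    opts = parse_qs(parts.query).get("opts", [""])[0]
    addr = {"host": parts.hostname, "port": parts.port, "opts": set(opts)}
    if parts.username is None and len(keys) == 1:
        # Direct: ssu://host:port/introKey
        addr.update(kind="direct", intro_key=keys[0])
    elif parts.username is not None and len(keys) == 2:
        # Indirect: ssu://tag@relayhost:port/relayIntroKey/targetIntroKey
        addr.update(kind="indirect", tag=parts.username,
                    relay_intro_key=keys[0], target_intro_key=keys[1])
    else:
        raise ValueError("malformed ssu:// address")
    return addr
```

A caller would branch on the "kind" entry: a direct address can be contacted immediately, while an indirect one requires a RelayRequest to the relayhost first.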

Header

All UDP datagrams begin with a MAC and an IV, followed by a variable size payload encrypted with the appropriate key. The MAC used is HMAC-SHA256, truncated to 16 bytes, while the key is a full AES256 key. The specific construct of the MAC is the first 16 bytes from:

  HMAC-SHA256(payload || IV || payloadLength, macKey)

The payload itself is AES256/CBC encrypted with the IV and the sessionKey, with replay prevention addressed within its body, explained below. The payloadLength in the MAC is a 2 byte unsigned integer.
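
The MAC construct above can be sketched as follows. This is only an illustration under two assumptions not stated in this section: that the MAC covers the payload in its encrypted form, and that payloadLength is serialized big endian. The function names are hypothetical.

```python
import hashlib
import hmac
import struct

MAC_LEN = 16  # HMAC-SHA256 output truncated to 16 bytes

def ssu_mac(encrypted_payload, iv, mac_key):
    # Assumption: payloadLength is big endian and counts the encrypted payload.
    length = struct.pack(">H", len(encrypted_payload))
    digest = hmac.new(mac_key, encrypted_payload + iv + length,
                      hashlib.sha256).digest()
    return digest[:MAC_LEN]

def verify_datagram(datagram, mac_key):
    # Datagram layout: 16 byte MAC, 16 byte IV, then the encrypted payload.
    mac, iv, payload = datagram[:16], datagram[16:32], datagram[32:]
    return hmac.compare_digest(mac, ssu_mac(payload, iv, mac_key))
```

The comparison uses hmac.compare_digest so that verification time does not depend on how many leading MAC bytes match.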

Payload

Within the AES encrypted payload, there is a minimal common structure to the various messages - a one byte flag and a four byte sending timestamp (*seconds* since the unix epoch). The flag byte contains the following bitfields:

  bits 0-3: payload type
     bit 4: rekey?
     bit 5: extended options included
  bits 6-7: reserved

If the rekey flag is set, 64 bytes of keying material follow the timestamp. If the extended options flag is set, a one byte option size value is appended, followed by that many extended option bytes, which are currently uninterpreted.

When rekeying, the first 32 bytes of the keying material are fed into a SHA256 to produce the new MAC key, and the next 32 bytes are fed into a SHA256 to produce the new session key, though the keys are not immediately used. The other side should also reply with the rekey flag set and that same keying material. Once both sides have sent and received those values, the new keys should be used and the previous keys discarded. It may be useful to keep the old keys around briefly, to address packet loss and reordering.
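
The key derivation step of rekeying might look like the sketch below. The function name is hypothetical, and the handshake that decides when to switch keys is not shown.

```python
import hashlib

def derive_rekeyed_keys(keying_material):
    """Return (new MAC key, new session key) from 64 bytes of keying material."""
    if len(keying_material) != 64:
        raise ValueError("expected 64 bytes of keying material")
    new_mac_key = hashlib.sha256(keying_material[:32]).digest()
    new_session_key = hashlib.sha256(keying_material[32:]).digest()
    return new_mac_key, new_session_key
```

Both results are 32 bytes, matching the key size used for the HMAC and for AES256.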

 Header: 37+ bytes
 +----+----+----+----+----+----+----+----+
 |                  MAC                  |
 |                                       |
 +----+----+----+----+----+----+----+----+
 |                   IV                  |
 |                                       |
 +----+----+----+----+----+----+----+----+
 |flag|        time       | (optionally  |
 +----+----+----+----+----+              |
 | this may have 64 byte keying material |
 | and/or a one+N byte extended options) |
 +----+----+----+----+----+----+----+----+
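
A sketch of parsing the common structure at the start of the decrypted payload follows. It assumes that bit 0 denotes the most significant bit of the flag byte (the same numbering the ACK bitfield example in the Data message uses, where bit 0 is the high bit) and that the timestamp is big endian. The function name and result layout are hypothetical.

```python
import struct

def parse_payload_header(payload):
    """Return (header dict, offset of the message body) for a decrypted payload."""
    flag, timestamp = struct.unpack_from(">BI", payload, 0)
    offset = 5
    header = {
        "type": flag >> 4,              # bits 0-3, counted from the high bit
        "rekey": bool(flag & 0x08),     # bit 4
        "extended": bool(flag & 0x04),  # bit 5
        "timestamp": timestamp,         # seconds since the unix epoch
        "keying_material": None,
        "extended_options": b"",
    }
    if header["rekey"]:
        header["keying_material"] = payload[offset:offset + 64]
        offset += 64
    if header["extended"]:
        size = payload[offset]
        header["extended_options"] = payload[offset + 1:offset + 1 + size]
        offset += 1 + size
    return header, offset
```

The returned offset marks where the type specific message body begins.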

Messages

SessionRequest (type 0)

Peer: Alice to Bob
Data:
  • 256 byte X, to begin the DH agreement
  • 1 byte IP address size
  • that many byte representation of Bob's IP address
  • N bytes, currently uninterpreted (later, for challenges)
Key used: introKey
 +----+----+----+----+----+----+----+----+
 |         X, as calculated from DH      |
 |                                       |
                 .   .   .               
 |                                       |
 +----+----+----+----+----+----+----+----+
 |size| that many byte IP address (4-16) |
 +----+----+----+----+----+----+----+----+
 |           arbitrary amount            |
 |        of uninterpreted data          |
                 .   .   .               
 |                                       |
 +----+----+----+----+----+----+----+----+
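
For illustration, the SessionRequest body could be serialized and parsed as sketched here. Encoding X as a 256 byte big endian integer is an assumption, and the function names are hypothetical. The common payload header, encryption, and MAC are omitted.

```python
import ipaddress

def build_session_request(x, bob_ip, extra=b""):
    """Serialize X, the IP address size, and Bob's IP address."""
    ip_bytes = ipaddress.ip_address(bob_ip).packed  # 4 bytes (IPv4) or 16 (IPv6)
    return x.to_bytes(256, "big") + bytes([len(ip_bytes)]) + ip_bytes + extra

def parse_session_request(body):
    """Return (X, Bob's IP as text, trailing uninterpreted bytes)."""
    x = int.from_bytes(body[:256], "big")
    size = body[256]
    ip = ipaddress.ip_address(body[257:257 + size])
    return x, str(ip), body[257 + size:]
```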

SessionCreated (type 1)

Peer: Bob to Alice
Data:
  • 256 byte Y, to complete the DH agreement
  • 1 byte IP address size
  • that many byte representation of Alice's IP address
  • 2 byte port number (unsigned, big endian)
  • 4 byte relay tag which Alice can publish (else 0x0)
  • 4 byte timestamp (seconds from the epoch) for use in the DSA signature
  • 40 byte DSA signature of the critical exchanged data (X + Y + Alice's IP + Alice's port + Bob's IP + Bob's port + Alice's new relay tag + Bob's signed on time), encrypted with another layer of encryption using the negotiated sessionKey. The IV is reused here.
  • 8 bytes padding, encrypted with an additional layer of encryption using the negotiated session key as part of the DSA block
  • N bytes, currently uninterpreted (later, for challenges)
Key used: introKey, with an additional layer of encryption over the 40 byte signature and the following 8 bytes padding.
 +----+----+----+----+----+----+----+----+
 |         Y, as calculated from DH      |
 |                                       |
                 .   .   .               
 |                                       |
 +----+----+----+----+----+----+----+----+
 |size| that many byte IP address (4-16) |
 +----+----+----+----+----+----+----+----+
 | Port (A)| public relay tag  |  signed
 +----+----+----+----+----+----+----+----+
   on time |                             |
 +----+----+                             |
 |              DSA signature            |
 |                                       |
 |                                       |
 |                                       |
 |         +----+----+----+----+----+----+
 |         |     (8 bytes of padding) 
 +----+----+----+----+----+----+----+----+
           |                             |
 +----+----+                             |
 |           arbitrary amount            |
 |        of uninterpreted data          |
                 .   .   .               
 |                                       |
 +----+----+----+----+----+----+----+----+

SessionConfirmed (type 2)

Peer: Alice to Bob
Data:
  • 1 byte identity fragment info:
    bits 0-3: current identity fragment #
    bits 4-7: total identity fragments
  • 2 byte size of the current identity fragment
  • that many byte fragment of Alice's identity.
  • on the last identity fragment, the signed on time is included after the identity fragment, and the last 40 bytes contain the DSA signature of the critical exchanged data (X + Y + Alice's IP + Alice's port + Bob's IP + Bob's port + Alice's new relay tag + Alice's signed on time)
Key used: sessionKey
 Fragment 1 through N-1
 +----+----+----+----+----+----+----+----+
 |info| cursize |                        |
 +----+----+----+                        |
 |      fragment of Alice's full         |
 |            identity keys              |
                 .   .   .               
 |                                       |
 +----+----+----+----+----+----+----+----+
 
 Fragment N:
 +----+----+----+----+----+----+----+----+
 |info| cursize |                        |
 +----+----+----+                        |
 |      fragment of Alice's full         |
 |            identity keys              |
                 .   .   .               
 |                                       |
 +----+----+----+----+----+----+----+----+
 |  signed on time   |                   |
 +----+----+----+----+                   |
 |  arbitrary amount of uninterpreted    |
 |        data, up from the end of the   |
 |  identity key to 40 bytes prior to    |
 |       end of the current packet       |
 +----+----+----+----+----+----+----+----+
 | DSA signature                         |
 |                                       |
 |                                       |
 |                                       |
 |                                       |
 +----+----+----+----+----+----+----+----+
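
The identity fragmentation can be sketched as below. Several details are assumptions: fragment numbers start at zero, the "current fragment #" occupies the high four bits of the info byte (bit 0 taken as the most significant bit), and the 2 byte size is big endian. The signed on time and DSA signature on the last fragment are left out, and the function names are hypothetical.

```python
import struct

def fragment_identity(identity, max_fragment_size):
    """Split identity bytes into SessionConfirmed fragment bodies."""
    chunks = [identity[i:i + max_fragment_size]
              for i in range(0, len(identity), max_fragment_size)]
    if not 1 <= len(chunks) <= 15:
        raise ValueError("identity must fit in 1 to 15 fragments")
    # info byte: current fragment # in the high nibble, total in the low nibble
    return [struct.pack(">BH", (number << 4) | len(chunks), len(chunk)) + chunk
            for number, chunk in enumerate(chunks)]

def reassemble_identity(fragments):
    """Rebuild the identity from fragments received in any order."""
    parts, total = {}, None
    for frag in fragments:
        info, size = struct.unpack_from(">BH", frag, 0)
        total = info & 0x0F
        parts[info >> 4] = frag[3:3 + size]
    if total is None or len(parts) != total:
        raise ValueError("missing identity fragments")
    return b"".join(parts[i] for i in range(total))
```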

RelayRequest (type 3)

Peer: Alice to Bob
Data:
  • 4 byte relay tag
  • 1 byte IP address size
  • that many byte representation of Bob's IP address
  • 1 byte IP address size
  • that many byte representation of Alice's IP address
  • 2 byte port number (of Alice)
  • 1 byte challenge size
  • that many bytes to be relayed to Charlie in the intro
  • N bytes, currently uninterpreted
Key used: introKey (or sessionKey, if Alice/Bob is established)
 +----+----+----+----+----+----+----+----+
 |      relay tag    |size| that many    |
 +----+----+----+----+----+         +----|
 | bytes making up Bob's IP address |size|
 +----+----+----+----+----+----+----+----+
 | that many bytes making up Alice's IP  |
 +----+----+----+----+----+----+----+----+
 | Port (A)|size| that many challenge    |
 +----+----+----+                        |
 | bytes to be delivered to Charlie      |
 +----+----+----+----+----+----+----+----+
 | arbitrary amount of uninterpreted data|
 +----+----+----+----+----+----+----+----+
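
A possible serialization of the RelayRequest body is sketched here, assuming big endian for the relay tag and port. The function names and the returned dictionary are hypothetical, and the trailing uninterpreted bytes are not emitted.

```python
import ipaddress
import struct

def build_relay_request(tag, bob_ip, alice_ip, alice_port, challenge=b""):
    bob = ipaddress.ip_address(bob_ip).packed
    alice = ipaddress.ip_address(alice_ip).packed
    return (struct.pack(">I", tag)
            + bytes([len(bob)]) + bob
            + bytes([len(alice)]) + alice
            + struct.pack(">H", alice_port)
            + bytes([len(challenge)]) + challenge)

def parse_relay_request(body):
    pos = 0

    def take(n):
        # Consume exactly n bytes, refusing truncated input.
        nonlocal pos
        chunk = body[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated RelayRequest")
        pos += n
        return chunk

    (tag,) = struct.unpack(">I", take(4))
    bob = take(take(1)[0])
    alice = take(take(1)[0])
    (port,) = struct.unpack(">H", take(2))
    challenge = take(take(1)[0])
    return {"tag": tag,
            "bob_ip": str(ipaddress.ip_address(bob)),
            "alice_ip": str(ipaddress.ip_address(alice)),
            "alice_port": port,
            "challenge": challenge,
            "rest": body[pos:]}
```

The same size-prefixed pattern applies to the IP address fields of the RelayResponse, RelayIntro, and PeerTest messages.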

RelayResponse (type 4)

Peer: Bob to Alice
Data:
  • 1 byte IP address size
  • that many byte representation of Charlie's IP address
  • 2 byte port number
  • 1 byte IP address size
  • that many byte representation of Alice's IP address
  • 2 byte port number
  • N bytes, currently uninterpreted
Key used: introKey (or sessionKey, if Alice/Bob is established)
 +----+----+----+----+----+----+----+----+
 |size| that many bytes making up        |
 +----+                        +----+----+
 | Charlie's IP address        | Port (C)|
 +----+----+----+----+----+----+----+----+
 |size| that many bytes making up        |
 +----+                        +----+----+
 | Alice's IP address          | Port (A)|
 +----+----+----+----+----+----+----+----+
 | arbitrary amount of uninterpreted data|
 +----+----+----+----+----+----+----+----+

RelayIntro (type 5)

Peer: Bob to Charlie
Data:
  • 1 byte IP address size
  • that many byte representation of Alice's IP address
  • 2 byte port number (of Alice)
  • 1 byte challenge size
  • that many bytes relayed from Alice
  • N bytes, currently uninterpreted
Key used: sessionKey
 +----+----+----+----+----+----+----+----+
 |size| that many bytes making up        |
 +----+                        +----+----+
 | Alice's IP address          | Port (A)|
 +----+----+----+----+----+----+----+----+
 |size| that many bytes of challenge     |
 +----+                                  |
 | data relayed from Alice               |
 +----+----+----+----+----+----+----+----+
 | arbitrary amount of uninterpreted data|
 +----+----+----+----+----+----+----+----+

Data (type 6)

Peer: Any
Data:
  • 1 byte flags:
       bit 0: explicit ACKs included
       bit 1: ACK bitfields included
       bit 2: reserved
       bit 3: explicit congestion notification
       bit 4: request previous ACKs
       bit 5: want reply
       bit 6: extended data included
       bit 7: reserved
  • if explicit ACKs are included:
    • a 1 byte number of ACKs
    • that many 4 byte MessageIds being fully ACKed
  • if ACK bitfields are included:
    • a 1 byte number of ACK bitfields
    • that many 4 byte MessageIds + a 1 or more byte ACK bitfield. The bitfield uses the 7 low bits of each byte, with the high bit specifying whether an additional bitfield byte follows it (1 = true, 0 = the current bitfield byte is the last). This sequence of 7 bit arrays represents whether a fragment has been received - if a bit is 1, the fragment has been received. To clarify, assuming fragments 0, 2, 5, and 9 have been received, the bitfield bytes would be as follows:
      byte 0
         bit 0: 1 (further bitfield bytes follow)
         bit 1: 1 (fragment 0 received)
         bit 2: 0 (fragment 1 not received)
         bit 3: 1 (fragment 2 received)
         bit 4: 0 (fragment 3 not received)
         bit 5: 0 (fragment 4 not received)
         bit 6: 1 (fragment 5 received)
         bit 7: 0 (fragment 6 not received)
      byte 1
         bit 0: 0 (no further bitfield bytes)
         bit 1: 0 (fragment 7 not received)
         bit 2: 0 (fragment 8 not received)
         bit 3: 1 (fragment 9 received)
         bit 4: 0 (fragment 10 not received)
         bit 5: 0 (fragment 11 not received)
         bit 6: 0 (fragment 12 not received)
         bit 7: 0 (fragment 13 not received)
  • If extended data included:
    • 1 byte data size
    • that many bytes of extended data (currently uninterpreted)
  • 1 byte number of fragments
  • that many message fragments:
    • 4 byte messageId
    • 3 byte fragment info:
        bits 0-6: fragment #
           bit 7: isLast (1 = true)
        bits 8-9: unused
      bits 10-23: fragment size
    • that many bytes
  • N bytes padding, uninterpreted
Key used: sessionKey
 +----+----+----+----+----+----+----+----+
 |flag| (additional headers, determined  |
 +----+                                  |
 | by the flags, such as ACKs or         |
 | bitfields                             |
 +----+----+----+----+----+----+----+----+
 |#frg|     messageId     |   frag info  |
 +----+----+----+----+----+----+----+----+
 | that many bytes of fragment data      |
                  .  .  .                                       
 |                                       |
 +----+----+----+----+----+----+----+----+
 |     messageId     |   frag info  |    |
 +----+----+----+----+----+----+----+    |
 | that many bytes of fragment data      |
                  .  .  .                                       
 |                                       |
 +----+----+----+----+----+----+----+----+
 |     messageId     |   frag info  |    |
 +----+----+----+----+----+----+----+    |
 | that many bytes of fragment data      |
                  .  .  .                                       
 |                                       |
 +----+----+----+----+----+----+----+----+
 | arbitrary amount of uninterpreted data|
 +----+----+----+----+----+----+----+----+
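
The ACK bitfield and the 3 byte fragment info described above can be sketched as follows. The code follows the worked example literally, so bit 0 is taken to be the most significant bit of each byte, and the fragment info is assumed to be big endian. All function names are hypothetical.

```python
def encode_ack_bitfield(received, fragment_count):
    """Encode reception state for fragments 0..fragment_count-1."""
    if fragment_count < 1:
        raise ValueError("at least one fragment is required")
    out = bytearray()
    for base in range(0, fragment_count, 7):
        byte = 0
        for i in range(7):
            if base + i in received:
                byte |= 0x40 >> i   # fragment `base` sits just below the high bit
        if base + 7 < fragment_count:
            byte |= 0x80            # high bit: another bitfield byte follows
        out.append(byte)
    return bytes(out)

def decode_ack_bitfield(data):
    """Return (set of received fragment numbers, number of bytes consumed)."""
    received, base = set(), 0
    for consumed, byte in enumerate(data, start=1):
        for i in range(7):
            if byte & (0x40 >> i):
                received.add(base + i)
        base += 7
        if not byte & 0x80:
            return received, consumed
    raise ValueError("bitfield ended without a terminating byte")

def pack_fragment_info(number, is_last, size):
    """7 bit fragment #, 1 bit isLast, 2 unused bits, 14 bit fragment size."""
    if not (0 <= number < 128 and 0 <= size < (1 << 14)):
        raise ValueError("fragment number or size out of range")
    return ((number << 17) | (int(is_last) << 16) | size).to_bytes(3, "big")

def unpack_fragment_info(raw):
    value = int.from_bytes(raw[:3], "big")
    return value >> 17, bool(value & 0x10000), value & 0x3FFF
```

With fragments 0, 2, 5, and 9 received out of ten, the encoder yields the two bytes 0xD2 0x10, which matches the bit by bit example in the list above.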

PeerTest (type 7)

Peer: Any
Data:
  • 4 byte nonce
  • 1 byte IP address size
  • that many byte representation of Alice's IP address
  • 2 byte port number
  • Alice's introduction key
  • N bytes, currently uninterpreted
Key used: introKey (or sessionKey if the connection has already been established)
 +----+----+----+----+----+----+----+----+
 |    test nonce     |size| that many    |
 +----+----+----+----+----+              |
 |bytes making up Alice's IP address     |
 +----+----+----+----+----+----+----+----+
 | Port (A)| Alice or Charlie's          |
 +----+----+                             |
 | introduction key (Alice's is sent to  |
 | Bob and Charlie, while Charlie's is   |
 | sent to Alice)                        |
 |         +----+----+----+----+----+----+
 |         | arbitrary amount of         |
 +----+----+                             |
 | uninterpreted data                    |
 +----+----+----+----+----+----+----+----+

Congestion control

SSU's need for only semireliable delivery, TCP-friendly operation, and the capacity for high throughput allows a great deal of latitude in congestion control. The congestion control algorithm outlined below is meant to be both efficient in bandwidth and simple to implement.

Packets are scheduled according to the router's policy, taking care not to exceed the router's outbound capacity or to exceed the measured capacity of the remote peer. The measured capacity should operate along the lines of TCP's slow start and congestion avoidance, with additive increases to the sending capacity and multiplicative decreases in the face of congestion. Veering away from TCP, however, routers may give up on some messages after a given period or number of retransmissions while continuing to transmit other messages.
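
One way to model the measured capacity of a remote peer is an additive increase, multiplicative decrease window, sketched below. The class name and every constant (segment size, initial window, threshold, minimum) are hypothetical placeholders, not values taken from this document.

```python
class SendWindow:
    """Per-peer sending capacity in bytes, adjusted along TCP's lines."""

    def __init__(self, mss=1024, initial=1024, ssthresh=32 * 1024, minimum=1024):
        self.mss = mss            # assumed typical packet size
        self.window = initial
        self.ssthresh = ssthresh
        self.minimum = minimum

    def on_ack(self, acked_bytes):
        if self.window < self.ssthresh:
            # Slow start: grow by the amount acknowledged.
            self.window += acked_bytes
        else:
            # Congestion avoidance: roughly one mss per window's worth of ACKs.
            self.window += max(1, self.mss * acked_bytes // self.window)

    def on_congestion(self):
        # Multiplicative decrease: halve the capacity, but keep a floor.
        self.ssthresh = max(self.window // 2, self.minimum)
        self.window = self.ssthresh
```

A scheduler would call on_ack when a message is fully ACKed and on_congestion when a retransmission timer fires or the peer sets the explicit congestion notification flag.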

The congestion detection techniques vary from TCP as well, since each message has its own unique and nonsequential identifier, and each message has a limited size - at most, 32KB. To efficiently transmit this feedback to the sender, the receiver periodically includes a list of fully ACKed message identifiers and may also include bitfields for partially received messages, where each bit represents the reception of a fragment. If duplicate fragments arrive, the message should be ACKed again, or if the message has still not been fully received, the bitfield should be retransmitted with any new updates.

The simplest possible implementation does not need to pad the packets to any particular size, but instead just places a single message fragment into a packet and sends it off (taking care not to exceed the MTU). A more efficient strategy would be to bundle multiple message fragments into the same packet, so long as it doesn't exceed the MTU, but this is not necessary. Eventually, a set of fixed packet sizes may be appropriate to further hide the data fragmentation from external adversaries, but the tunnel, garlic, and end to end padding should be sufficient for most needs until then.
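
The bundling strategy can be illustrated with a greedy packer over fragment sizes. The overhead constants are derived from the field sizes given for the header and the Data message. The sketch ignores ACKs, extended data, and AES block padding, and the size limit passed in is assumed to be the MTU less IP and UDP headers. The function name is hypothetical.

```python
# MAC + IV + flag + time + data flags + fragment count
FIXED_OVERHEAD = 16 + 16 + 1 + 4 + 1 + 1
# messageId + fragment info
PER_FRAGMENT_OVERHEAD = 4 + 3

def bundle_fragments(fragment_sizes, max_datagram):
    """Group fragment sizes, in order, into datagrams of at most max_datagram bytes."""
    packets, current, used = [], [], FIXED_OVERHEAD
    for size in fragment_sizes:
        cost = PER_FRAGMENT_OVERHEAD + size
        if FIXED_OVERHEAD + cost > max_datagram:
            raise ValueError("fragment too large for a single datagram")
        if current and used + cost > max_datagram:
            # Start a new datagram once the next fragment no longer fits.
            packets.append(current)
            current, used = [], FIXED_OVERHEAD
        current.append(size)
        used += cost
    if current:
        packets.append(current)
    return packets
```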

Keys

All encryption used is AES256/CBC with 32 byte keys and 16 byte IVs. The MAC and session keys are negotiated as part of the DH exchange, used for the HMAC and encryption, respectively. Prior to the DH exchange, the publicly knowable introKey is used for the MAC and encryption.

When using the introKey, both the initial message and any subsequent reply use the introKey of the responder (Bob) - the responder does not need to know the introKey of the requestor (Alice). The DSA signing key used by Bob should already be known to Alice when she contacts him, though Alice's DSA key may not already be known by Bob.

Upon receiving a message, the receiver checks the source IP address against any established sessions - if there are one or more matches, those sessions' MAC keys are tested sequentially in the HMAC. If none of those verify or if there are no matching IP addresses, the receiver tries its own introKey in the MAC. If that does not verify, the packet is dropped. If it does verify, it is interpreted according to the message type, though if the receiver is overloaded, it may be dropped anyway.

If Alice and Bob have an established session, but Alice loses the keys for some reason and she wants to contact Bob, she may at any time simply establish a new session through the SessionRequest and related messages. If Bob has lost the key but Alice does not know that, she will first attempt to prod him to reply, by sending a DataMessage with the wantReply flag set, and if Bob continually fails to reply, she will assume the key is lost and reestablish a new one.

For the DH key agreement, RFC3526 2048bit MODP group (#14) is used:

  p = 2^2048 - 2^1984 - 1 + 2^64 * { [2^1918 pi] + 124476 }
  g = 2

The DSA p, q, and g are shared according to the scope of the identity which created them.

Replay prevention

Replay prevention at the SSU layer occurs by rejecting packets with exceedingly old timestamps or those which reuse an IV. To detect duplicate IVs, a sequence of Bloom filters is employed, which "decay" periodically so that only recently added IVs are detected.
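
A decaying filter of this kind can be approximated with two Bloom filter generations that are rotated on a timer, as in the sketch below. The class name, the filter size, the number of hash positions, and the use of SHA256 to derive them are all illustrative choices, not part of this specification.

```python
import hashlib

class DecayingBloomFilter:
    """Two generations of Bloom filter; rotating them forgets old IVs."""

    def __init__(self, bits=1 << 16, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.current = bytearray(bits // 8)
        self.previous = bytearray(bits // 8)

    def _positions(self, iv):
        digest = hashlib.sha256(iv).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.hashes)]

    def seen_before(self, iv):
        """Return True if iv was probably seen recently; record it either way."""
        positions = self._positions(iv)
        hit = any(all(f[p >> 3] & (1 << (p & 7)) for p in positions)
                  for f in (self.current, self.previous))
        for p in positions:
            self.current[p >> 3] |= 1 << (p & 7)
        return hit

    def decay(self):
        # Call periodically: the older generation is discarded.
        self.previous = self.current
        self.current = bytearray(self.bits // 8)
```

An IV survives at most two calls to decay, so the decay interval should be chosen together with the timestamp window: a packet old enough to have been forgotten by the filter should already be rejected for its timestamp.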

The messageIds used in DataMessages are defined at layers above the SSU transport and are passed through transparently. These IDs are not in any particular order - in fact, they are likely to be entirely random. The SSU layer makes no attempt at messageId replay prevention - higher layers should take that into account.

Peer testing

The automation of collaborative reachability testing for peers is enabled by a sequence of PeerTest messages. With its proper execution, a peer will be able to determine its own reachability and may update its behavior accordingly. The testing process is quite simple:

        Alice                  Bob                  Charlie
    PeerTest ------------------->
                             PeerTest-------------------->
                                <-------------------PeerTest
         <-------------------PeerTest
         <------------------------------------------PeerTest
    PeerTest------------------------------------------>
         <------------------------------------------PeerTest

Each of the PeerTest messages carries a nonce identifying the test series itself, as initialized by Alice. If Alice doesn't get a particular message that she expects, she will retransmit accordingly, and based upon the data received or the messages missing, she will know her reachability. The various end states that may be reached are as follows:

Alice should choose Bob arbitrarily from known peers who seem to be capable of participating in peer tests. Bob in turn should choose Charlie arbitrarily from peers that he knows who seem to be capable of participating in peer tests and who are on a different IP from both Bob and Alice. If the first error condition occurs (Alice doesn't get PeerTest messages from Bob), Alice may decide to designate a new peer as Bob and try again with a different nonce.

Alice's introduction key is included in all of the PeerTest messages so that she doesn't need to already have an established session with Bob and so that Charlie can contact her without knowing any additional information. Alice may go on to establish a session with either Bob or Charlie, but it is not required.

Message sequences

Connection establishment (direct)

        Alice                         Bob
    SessionRequest--------------------->
          <---------------------SessionCreated
    SessionConfirmed------------------->
    SessionConfirmed------------------->
    SessionConfirmed------------------->
    SessionConfirmed------------------->
          <--------------------------Data

Connection establishment (indirect)

        Alice                         Bob                  Charlie
    RelayRequest ---------------------->
         <--------------RelayResponse    RelayIntro----------->
         <--------------------------------------------Data (ignored)
    SessionRequest-------------------------------------------->
         <--------------------------------------------SessionCreated
    SessionConfirmed------------------------------------------>
    SessionConfirmed------------------------------------------>
    SessionConfirmed------------------------------------------>
    SessionConfirmed------------------------------------------>
         <---------------------------------------------------Data

Sample datagrams

Minimal data message (no fragments, no ACKs, no NACKs, etc)
(Size: 39 bytes)
 +----+----+----+----+----+----+----+----+
 |                  MAC                  |
 |                                       |
 +----+----+----+----+----+----+----+----+
 |                   IV                  |
 |                                       |
 +----+----+----+----+----+----+----+----+
 |flag|        time       |flag|#frg|    |
 +----+----+----+----+----+----+----+    |
 |  padding to fit a full AES256 block   |
 +----+----+----+----+----+----+----+----+

Minimal data message with payload
(Size: 46+fragmentSize bytes)
 +----+----+----+----+----+----+----+----+
 |                  MAC                  |
 |                                       |
 +----+----+----+----+----+----+----+----+
 |                   IV                  |
 |                                       |
 +----+----+----+----+----+----+----+----+
 |flag|        time       |flag|#frg| 
 +----+----+----+----+----+----+----+----+
   messageId    |   frag info  |         |
 +----+----+----+----+----+----+         |
 | that many bytes of fragment data      |
                  .  .  .                                       
 |                                       |
 +----+----+----+----+----+----+----+----+

Peer capabilities

A
If the peer address contains the 'A' capability, that means they are willing and able to participate in peer tests as a 'Bob' or 'Charlie'.
B
If the peer address contains the 'B' capability, that means they are willing and able to serve as an introducer - serving as a Bob for an otherwise unreachable Alice.