i2p.www/i2p2www/spec/proposals/111-ntcp-2.rst

======
NTCP 2
======
.. meta::
    :author: EinMByte, zzz
    :editor: manas, str4d
    :created: 2014-02-13
    :thread: http://zzz.i2p/topics/1577
    :lastupdated: 2017-05-21
    :status: Open
    :supercedes: 106

.. contents::


Overview
========

This proposal describes an authenticated key agreement protocol to improve the
resistance of [NTCP]_ to various forms of automated identification and attacks.

The proposal is organized as follows: the security goals are presented,
followed by a discussion of the basic protocol. Next, a complete specification
of all protocol messages is given. Finally, router addresses and version
identification are discussed. An appendix discussing a generic attack on common
padding schemes is also included, as well as an appendix containing a number of
candidates for the authenticated cipher.


Motivation
==========

[NTCP]_ data is encrypted after the first message (and the first message appears
to be random data), thus preventing protocol identification through "payload
analysis". It is still vulnerable to protocol identification through "flow
analysis". That's because the first 4 messages (i.e. the handshake) are fixed
length (288, 304, 448, and 48 bytes).

By adding random amounts of random data to each of the messages, we can make it
a lot harder.

The authors acknowledge that standard security practices would suggest to use an
existing protocol such as TLS, but this is [Prop104]_ and it has problems of its
own. Wherever appropriate, "future work" paragraphs have been added to indicate
missing features or subjects of discussion.


Design Goals
============

- Support NTCP 1 and 2 on a single port, auto-detect, and published as a single
  "transport" (i.e. [RouterAddress]_) in the [NetDB]_.

- Publish support for version 1 only, 2 only, or 1+2 in the NetDB in a separate
  field, and default to version 1 only (don't bind version support to a
  particular router version)

- Ensure that all implementations (java/i2pd/kovri/go) can add version 2
  support (or not) on their own schedules

- Add random padding to all NTCP messages including handshake and data messages
  (i.e. length obfuscation so all messages aren't a multiple of 16 bytes)
  Provide options mechanism for both sides to request min and max padding
  and/or padding distribution. Specifics of the padding distribution are
  implementation-dependent and may or may not be specified in the protocol
  itself.

- Obfuscate the contents of messages that aren't encrypted (1 and 2),
  sufficiently so that DPI boxes and AV signatures can't easily classify them.
  Also ensure that the messages going to a single peer or set of peers do not
  have a similar pattern of bits.

- Fix loss of bits in DH due to Java format [Ticket1112]_, possibly
  (probably?) by switching to X25519.

- Switch to a real key derivation function (KDF) rather than using the DH
  result as-is?

- Add "probing resistance" (as Tor calls it); this includes replay resistance.

- Maintain 2-way authenticated key exchange (2W-AKE). 1W-AKE is not sufficient
  for our application.

- Continue to use the variable-type, variable-length signatures (from the
  published [RouterIdentity]_ signing key) for authentication. Do not rely on a
  separate published key.

- Add options/version in handshake for future extensibility.

- Add resistance to malicious MitM TCP segmentation if possible.

- Don't add significantly to CPU required for connection setup; if possible,
  reduce it significantly.

- Add message authentication (MAC), possibly HMAC-SHA256 or Poly1305 (see
  alternatives below), and remove Adler checksum.

- If possible, reduce the 4-message, two-round-trip handshake to a 3-message,
  one-round-trip handshake, as in [SSU]_. This would require moving Bob's
  signature in message 4 to message 2. Research the reason for 4 messages in
  the ten-year-old email/status/meeting archives.

- Maintain timestamps for replay and skew detection.

- Avoid any year 2038 issues in timestamps, must work until at least 2106.

- Increase max message size from 16K to 32K or 64K.

- Any new crypto should be readily available in libraries for use in Java
  (1.7), C++, and Go router implementations.

- Include representatives of Java, C++, and Go router developers in the design.

- Minimize changes (but there will still be a lot).

Goals to be clarified further:

- Support both versions in a common set of code (this may not be possible and
  is implementation-dependent in any case).

Non-Goals
---------

- Bullet-proof DPI resistance... that would be pluggable transports, [Prop109]_.

- A TLS-based (or HTTPS-lookalike) transport... that would be [Prop104]_.

- Don't change the symmetric stream crypto, continue to use AES-CBC-256 (but do
  add flags in the handshake so we can change later). However, why is using
  the last 16 bytes of the previous message as the IV better than just using
  counter mode? To be researched. Salsa 20 also an option (see alternatives
  below).

- Timing-based DPI resistance (inter-message timing/delays can be
  implementation-dependent; intra-message delays can be introduced at any
  point, including before sending the random padding, for example). Artificial
  delays (what obfs4 calls IAT or inter-arrival time) are independent of the
  protocol itself.

- Deniability of participating in a session (there's signatures in there).

Non-goals that may be partially reconsidered or discussed:

- The degree of protection against Deep Packet Inspection (DPI)

- Post-Quantum (PQ) security

- Deniability

Related Goals
-------------

- Implement a NTCP 1/2 test setup


Security Goals
==============

We consider three parties:

- Alice, who wishes to establish a new session.
- Bob, with whom Alice wishes to establish a session.
- Mallory, the "man in the middle" between Alice and Bob.

At most two participants can engage in active attacks.

Alice and Bob are both in possession of a static key pair, which is contained
in their [RouterIdentity]_.

The proposed protocol attempts to allow Alice and Bob to agree on a shared
secret key (in the sequel denoted K) under the following requirements:

1) Private key security: neither Bob nor Mallory learns anything about Alice's
   static private key. Symmetrically, Alice does not learn anything about Bob's
   static private key.

2) The session key K is only known by Alice and Bob.

3) Perfect forward secrecy: the agreed upon session key remains secret in the
   future, even when the static private keys of Alice and/or Bob are revealed
   after the key has been agreed upon. 

4) Two-way authentication: Alice is certain that she has established a session
   with Bob, and vice versa.

5) Protection against straightforward DPI: it is not trivial to detect that
   Alice and Bob are engaged in the protocol using only straightforward deep
   packet inspection (DPI) techniques.

6) Limited deniability: neither Alice nor Bob can deny participation in the
   protocol, but if either leaks the shared key the other party can deny the
   authenticity of the contents of the transmitted data.

The present proposal attempts to provide all five requirements based on the
Station-To-Station (STS) protocol [STS]_. Note that this protocol is also the
basis for the [SSU]_ protocol.

The notion of "straightforward DPI" is here taken to include the following
adversary capabilities:

1) The ability to inspect all data sent or received by the target.

2) The ability to perform operations on the observed data, such as
   applying block ciphers or hash functions.

3) The ability to store and compare with previously sent messages.

4) The ability to modify, delay or fragment packets.

However, the attacker has the following restrictions:

5) The inability to map IP addresses to router hashes. While this is trivial,
   it would require a DPI system specifically designed to target I2P.

6) The inability to use timing information to detect the protocol. 

7) Generally speaking, the DPI toolbox shouldn't contain any built-in tools
   that are specifically designed for I2P detection. This includes creating
   "honeypots", which would for example include nonrandom padding in their
   messages. Note that this does not exclude machine learning systems or highly
   configurable DPI tools as long as they meet the other requirements.

To counter payload analysis, it is ensured that all messages are
indistinguishable from random. This also requires their length to be random,
which is more complicated than just adding random padding. In fact, in Appendix
A, the authors argue that a naive (i.e. uniform) padding scheme does not
resolve the problem. Appendix A therefore proposes to include either random
delays or to develop an alternate padding scheme that can provide reasonable
protection for the proposed attack.

To protect against the sixth entry above, implementations should include random
delays in the protocol. Such techniques are not covered by this proposal, but
they could also resolve the padding length issues. In summary, the proposal
provides good protection against payload analysis (when the considerations in
Appendix A are taken into account), but only limited protection against flow
analysis.

Future work:

- Consider the behaviour of the protocol when packets are dropped or reordered
  by an attacker. Recent interesting work in this area can be found in
  [IACR-1150]_.

- Provide a more accurate classification of DPI systems, taking into account
  the existing literature related to the subject.

- Discuss the formal security of the proposed protocol, ideally taking into
  account the DPI attacker model.


Basic Protocol
==============

The protocol consists of two phases:

1) Key exchange, based on Diffie-Hellman (in principle over an arbitrary group).

2) Signature exchange, to provide authentication.

For the group used by (1), we will use multiplicative notation despite the fact
that additive notation is more common for elliptic curve groups. Finally, note
that the protocol allows the integration of post-quantum key exchange
mechanisms such as supersingular isogeny key exchange [SIDH]_. However, full
post-quantum security would also require introducing new signature types in the
RouterInfo, and is outside of the scope of this proposal.

The signatures used for (2) can be any signature supported by the RI structure
[SigningPublicKey]_.

Let g be the (known) generator, x and y private keys, and public keys
``X = g^x``, ``Y = g^y``. X and Y are elements of an the group used by (1).

The STS protocol proceeds as follows:

.. raw:: html

  {% highlight %}
Alice                           Bob

  X -------------------------------->
  <-----------------Y, E_K(S_B(X, Y))
  E_K(S_A(X, Y))-------------------->
{% endhighlight %}

where ``S_J(M)`` is the signature of ``M`` by party ``J``, and the session key
is ``K = KDF(X^y) = KDF(Y^x)``. KDF is assumed to be a key derivation function,
the choice of which is in principle arbitrary.

Some notes on the above protocol, which are also discussed in [STS]_ are listed
below:

- It is prudent to let both parties sign both X and Y, but in principle this is
  not necessary.

- The signatures are encrypted to prevent Mallory from substituting the (then
  unencrypted) signature in the last message with his own signature. This is
  further discussed in Trac [Ticket1849]_.

- NTCP2 adds various options, as well as timestamps to this protocol.

Future work:

- The original NTCP protocol requires four messages in the establishment
  sequence, for unclear reasons. This proposal does not provide an explanation
  for this yet.


Messages
========

The establishment sequence is as follows:

.. raw:: html

  {% highlight %}
Alice                           Bob

  SessionRequest ------------------->
  <------------------- SessionCreated
  SessionConfirmed ----------------->
{% endhighlight %}

Once a session has been established, Alice and Bob can exchange Data messages.
Approximately every 15 minutes, TimeSync messages are transmitted. 

All message types (SessionRequest, SessionCreated, SessionConfirmed, Data and
TimeSync) are specified in this section.

Some notations::

  - RH_A = Router hash Alice
  - RH_B = Router hash Bob

1) SessionRequest
------------------

Raw contents:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                                       |
  +              options                  +
  |                                       |
  +----+----+----+----+----+----+----+----+
  |              ext_options              |
  +       (number implied by options)     +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                   X                   |
  +       (length implied by options)     +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |      Arbitrary amount of padding      |
  +      (length implied by options)      +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+

  options :: AES-256-CBC encrypted options block
             key: RH_B
             iv: 0x0000 0000 0000 0000

  ext_options :: AES-256-CBC encrypted blocks
                 length: multiple of 16 bytes

  X :: AES-256-CBC encrypted, padded to multiple of 16
{% endhighlight %}

Unencrypted options block:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |   ver   |    KE   |   auth  |  padLen |
  +----+----+----+----+----+----+----+----+
  |        tsA        | NO | Reserved (0) |
  +----+----+----+----+----+----+----+----+

  ver :: Protocol version (currently 2)

  KE  :: Key-exchange mechanism used
            0: Diffie-Hellman in Z/pZ [RFC-3526], 2048 bit p
               KDF = SHA256 (possibly truncated)
            1: Diffie-Hellman over curve 25519 (X25519)
               KDF = SHA256 (possibly truncated)

  auth :: Authenticated encryption mode
          Key = K, to be agreed upon using KE
          0: AES-CBC-256/HMAC-MD5 [RFC-2104]
             IV  = included before the encrypted data and MAC (for first
                   message)
                 = last encrypted block of (your own) previous message
          ... (Proposed alternatives are listed in Appendix B.)

  padLen :: Length of the padding
            (Distribution to be determined, see Appendix A.)

  tsA :: Unix timestamp
         Wraps around in 2106

  NO :: Number of following option blocks.
{% endhighlight %}

Notes
`````
- The timestamp and padding length ensure that the encrypted options block is
  different for every session.

- Reserved options must be set to zero if ver = 2. This increases the accuracy
  of version detection.

- Diffie-Hellman parameters may never be sent twice to avoid DPI attacks.

- The "KE" and "auth" options must be compatible, i.e. the shared secret K must
  be of the appropriate size. If more "auth" options are added, this could
  implicitly change the meaning of the "KE" flag to use a different KDF or a
  different truncation size.

- KE = 0 is not exactly the same as in NTCP 1, where X was represented in
  Java's BigInteger format. NTCP2 uses the regular representation of X.

- auth = 0 is not exactly the same as in NTCP 1, since it includes a MAC
  (HMAC-MD5). The author suggests that this should only be used as a
  transitional option, for reasons discussed below.

- The options block and X are encrypted to ensure payload indistinguishably,
  which is a necessary DPI countermeasure.

We use AES to achieve obfuscation, rather than more complicated and slower
alternatives such as elligator2 (which would apply to X25519).

In the following, B will be the block size (in bytes) of the cipher used for
authenticated encryption (as specified in the "auth" field).

Encrypted/authenticated data will be represented as 

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |   Encrypted and authenticated data    |
  +   (mode determined by auth option)    +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

For AES-CBC-256/HMAC-MD5 this has the following specific format

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                                       |
  +                 MAC                   +
  |                                       |
  +----+----+----+----+----+----+----+----+
  |       AES-CBC-256 encrypted data      |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

The first encrypted and authenticated data (separate for Alice and Bob) starts
with a random IV:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                                       |
  +                  IV                   +
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                                       |
  +                 MAC                   +
  |                                       |
  +----+----+----+----+----+----+----+----+
  |       AES-CBC-256 encrypted data      |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

Future work
```````````
[RFC-6151]_ states:

    The attacks on HMAC-MD5 do not seem to indicate a practical vulnerability
    when used as a message authentication code. Considering that the
    distinguishing-H attack is different from a distinguishing-R attack, which
    distinguishes an HMAC from a random function, the practical impact on HMAC
    usage as a pseudorandom function (PRF) such as in a key derivation function
    is not well understood.

    Therefore, it may not be urgent to remove HMAC-MD5 from the existing
    protocols. However, since MD5 must not be used for digital signatures, for a
    new protocol design, a ciphersuite with HMAC-MD5 should not be included.
    Options include HMAC-SHA256 [HMAC] [HMAC-SHA256] and [AES-CMAC] when AES is
    more readily available than a hash function.

Hence, alternative authenticated ciphers should be explored for the final NTCP2
proposal. Plenty of options (other than the ones listed here) are available
and should be researched.

Consider candidates from the currently ongoing competition [CAESAR]_.

2) SessionCreated
------------------

Raw contents:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                   Y                   |
  +       (length implied by options)     +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |   Encrypted and authenticated data    |
  +   (mode determined by auth option)    +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |      Arbitrary amount of padding      |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

Unencrypted data:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |     ts B     | padLen  |              |
  +----+----+----+----+----+              +
  |               Signature               |
  +   (length determined by RI sigtype)   +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |         Padding as necessary          |
  +         (fewer than B bytes)          +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+

  ts B :: Unix timestamp
          Wraps around in 2106

  padLen :: Length of the padding
            (Distribution to be determined, see Appendix A.)
{% endhighlight %}

The signature is computed over the following data:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                   X                   |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                   Y                   |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                                       |
  +            Options blocks             +
  |                                       |
  +----+----+----+----+----+----+----+----+
  |        tsB        |
  +----+----+----+----+
{% endhighlight %}

Notes
`````
- The main reason for signature encryption is to counter DPI. For unknown
  key-share attacks this does not seem to be necessary. (It is necessary in
  the SessionConfirmed message.)

- Timestamps are included to avoid replay attacks and to detect high clock
  skew.

- The entire options block is signed to avoid version downgrade attacks.

Future work
```````````
- Is it good practice to include the IP and port of both parties in the
  signature to avoid replay attacks within the bounds of what is undetectable
  with timestamps? This is what SSU does, but it doesn't seem to be necessary
  as X and Y also have to match.

- Unlike in NTCP, Bob is not able to sign Alice's RI. This should not be an
  issue, but further investigations would be appropriate.

- The arbitrary padding is neither encrypted nor authenticated. This appears
  to be unnecessary, but it should be investigated. The same applies to all
  other messages with random padding.

3) SessionConfirmed
--------------------

Raw contents:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |   Encrypted and authenticated data    |
  +   (mode determined by auth option)    +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |      Arbitrary amount of padding      |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

Unencrypted data:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |   size  |                             |
  +----+----+                             +
  |            Alice's RouterInfo         |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |  padLen |                             | 
  +----+----+                             +
  |               Signature               |
  +   (length determined by RI sigtype)   +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                                       |
  +           Padding as necessary        +
  |               (< B bytes)             |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+

  size :: Alice's `RouterInfo` size

  padLen :: Length of the padding
            (Maximum to be determined)
{% endhighlight %}

The signature is computed over the following data:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                   X                   |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                   Y                   |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                                       |
  +            Options blocks             +
  |                                       |
  +----+----+----+----+----+----+----+----+
  |        tsB        |
  +----+----+----+----+
{% endhighlight %}

Notes
`````
- As pointed out in "Basic Protocol", both X and Y are included for reasons of
  robustness.

- The reason for signature encryption is to avoid trivial DPI, and to counter
  unknown key-share attacks.

- Timestamps are included to avoid replay attacks.

Future work
```````````
- Similar note as for SessionCreated with respect to including the IP and port
  of both parties in the signature.

- Unlike in NTCP, Alice does not sign Bob's RI (see also SessionCreated). This
  should not be an issue, but it can be included if desired.

4) Data and TimeSync
---------------------

Raw contents:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |   Encrypted and authenticated data    |
  +   (mode determined by auth option)    +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |      Arbitrary amount of padding      |
  +                                       +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

Unencrypted data:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |  size   | padLen  |       Data        |
  +----+----+----+----+                   +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
  |          Padding as necessary         |
  +              (< B bytes)              +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

Special case for time synchronization:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |  size=0 | padLen  |     timestamp     |
  +----+----+----+----+----+----+----+----+
  |          Padding as necessary         |
  +              (< B bytes)              +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

Future work
```````````
- The padding length is either to be decided on a per-message basis and
  estimates of the length distribution, or random delays should be added.
  These countermeasures are to be included to resist DPI, as message sizes
  would otherwise reveal that I2P traffic is being carried by the transport
  protocol. The exact padding scheme is an area of future work, Appendix A
  provides more information on the topic.


Version detection
=================

NTCP and NTCP2 can run on the same port, but the supported protocol versions
should be advertised in the RouterAddress.

The RouterAddress transport identifier is "NTCP" for both protocol versions.
Routers would publish "ver=1,2" in the RouterAddress (not the RouterInfo) if
they support both NTCP 1 and NTCP 2 on the same port. "ver=1" is NTCP 1 only.
This is the default if no "ver" is present.

"ver=2" is NTCP 2 only. This can't be used for a long time, as it's not
backwards-compatible. But sometime in the future, implementers could support
version 2 only.

If new versions are added, this should also be indicated using the "ver" flag
in the RouterAddress.

To detect the version of an incoming NTCP connection, Bob proceeds as follows:

- Decrypt the first 16 bytes of the SessionRequest packet using AES-256 with
  key RH_B.
- Check whether the first 2 bytes match a meaningful version number. This
  fails with probability N / 2^16, where N is the number of protocol versions.
- If ver = 2, additionally check whether the last 4 bytes are all zero. This
  fails with probability 1 / 2^24, such that errors are very unlikely.  For ver
  > 2, the procedure will be similar unless the reserved bytes are used.


Appendix A: Padding Scheme
==========================

This section discusses an attack on typical padding schemes that allows to
reveal the probability distribution of the length of the unpadded messages, by
only observing the length of the padded messages. Let N be a random variable
describing the number of unpadded bytes, and P likewise for the number of
padding bytes. The total message size is then N + P.

Assume that for an unpadded size of n, at least ``P_min(n) >= 0`` and at most
``P_max(n) >= P_min(n)`` bytes of padding are added in a padding scheme. The
obvious scheme uses padding of length P uniformly chosen at random:

::

  Pr[P = p | N = n] = 1 / (P_max(n) - P_min(n)) if P_min(n) <= p <= P_max(n),
                      0                         otherwise.

A naive padding scheme would simply ensure that the size of the padded message
does not exceed N_max:

::

  P_max(n) = N_max - n, n <= N_max
  P_min(n) = 0.

However, this leaks information about the unpadded length.

An attacker can easily estimate ``Pr[x <= N + P <= y]``, for example by means
of a histogram.

- From this, he can also try to estimate ``Pr[n_1 <= N <= n_2]``, indeed:

::

  Pr[N + P = m] = Σ_n Pr[N = n] Pr[P = m - n | N = n].

In the naive scheme,

::

  Pr[N + P = m] = Σ_{n <= m} Pr[N = n] / (N_max - n).

It's pretty obvious, as it was before doing the above calculation, that this
leaks information about ``Pr[N = n]``: if the length of packets is almost
always more than m, then N + P <= m will almost never be observed. This is not
the largest issue though, although being able to observe the minimum message
length can be considered to be a problem by itself.

A bigger issue is that it is possible to determine ``Pr[N = n]`` exactly:

::

  Pr[N + P = m] - Pr[N + P = m-1] = Pr[N = m] / (N_max - m),

that is

::

  Pr[N = n] = (N_max - n)(Pr[N + P = n] - Pr[N + P = n - 1])

To distinguish NTCP2, then, the attacker can use any of the following:

- Estimate ``Pr[kB <= N <= (k + 1)B - 1]`` for positive integers k. It will
  always be zero for NTCP2.

- Estimate ``Pr[N = kB]`` and compare with a standard I2P profile.

This simple attack hence partially destroys the purpose of padding, which
attempts to obfuscate the size distribution of the unpadded messages. The
amount of messages that the attacker has to observe to distinguish the protocol
depends on the desired accuracy and on the minimum and maximum unpadded message
sizes that occur in practice. Note that it is easy to gather many messages for
the attacker, since he can use all traffic sent from and to the particular port
that the target is using.

In some forms (e.g. estimation of ``Pr[kB <= N <= (k + 1)B - 1]``) the attack
requires only a few bytes of memory (one integer is enough) and it could be
argued that such an attack might be included in many slightly more advanced but
nevertheless standard DPI frameworks.

This proposal suggests using one of the following countermeasures:

- Develop an alternate padding scheme that takes into account the (estimated)
  distribution of N by using a non-uniform padding length distribution. A good
  padding scheme would probably require maintaining a histogram of the number
  of blocks per message.

- Add random delays between (randomly sized) fragments of messages.

The second option is more generally preferred, because it can be simultaneously
used as a countermeasure against flow analysis. However, such delays may be out
of scope for the NTCP2 protocol, such that the first option, which is also
easier to implement, may be preferred instead. 


Appendix B: Authenticated Cipher Candidates
===========================================

1) ChaCha20 or Salsa20/Poly1305
--------------------------------

Encrypted and authenticated data format:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                                       |
  +    Random nonce   +----+----+----+----+
  |                   |                   |
  +----+----+----+----+                   +
  |                                       |
  +   Poly1305 Tag    +----+----+----+----+
  |                   |                   |
  +----+----+----+----+                   +
  |                                       |
  +    ChaCha20/Salsa20 encrypted data    +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

For ChaCha20, what is described here corresponds to [RFC-7539]_, which is also
used similarly in TLS [RFC-7905]_.

Notes
`````
- Since Salsa20 and ChaCha20 are stream ciphers, plaintexts need not be padded.
  Additional keystream bytes are discarded.

- The key for the cipher (256 bits) is agreed upon by means of the KDF defined
  by the KE field. The one-time key for Poly1305 is generated pseudorandomly
  as in [RFC-7539]_, i.e. using the Salsa20 or the ChaCha20 block function.

Future work
```````````
- Decide on using Salsa20 or ChaCha20

- Do not generate the full nonce at random every time.

2) AES-CBC-256/HMAC-SHA1
-------------------------

To be specified.

3) AES-GCM-256
-------------------------

Encrypted and authenticated data format:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                                       |
  +                GMAC Tag               +
  |                                       |
  +----+----+----+----+----+----+----+----+
  |                                       |
  +       AES-GCM-256 encrypted data      +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

The IV used in AES-GCM-256 equals the last 12 bytes of the last encrypted block
of the previously sent message.

The initial IV is contained in the first encrypted and authenticated message:

.. raw:: html

  {% highlight lang='dataspec' %}
+----+----+----+----+----+----+----+----+
  |                                       |
  +     Random IV     +----+----+----+----+
  |                   |                   |
  +----+----+----+----+                   +
  |                                       |
  +     GMAC Tag      +----+----+----+----+
  |                   |                   |
  +----+----+----+----+                   +
  |                                       |
  +       AES-GCM-256 encrypted data      +
  |                                       |
  ~               .   .   .               ~
  |                                       |
  +----+----+----+----+----+----+----+----+
{% endhighlight %}

Notes
`````
- GCM does not require the IV to be random, it only needs to be unique. This
  justifies the use of the last 12 bytes of the last encrypted block of the
  previous message as the IV.

- "Associated data" is not used, i.e. all data in the AES-GCM-256 block is both
  encrypted and authenticated.


Appendix C: AEAD Cipher Selection
=================================

Which one?

- ChaCha20/Poly1305
- IETF implementation [5]_
- AES-GCM

Performance [4]_

- ChaCha20/Poly1305 256 -> 90MBps on phone hardware
- AES-GCM 256 -> 20MBps on phone hardware

General comments:

- AES-GCM is potentially more vulnerable to cache timing attacks for software
  implementations due to using lookup tables [1]_
- AES seems to be universally considered unpleasant [2]_
- AES-GCM is vulnerable to nonce re-use attacks [2]_
- ChaCha20/Poly1305 is not vulnerable to nonce re-use attacks due to fully
  implicit nonce based on record number, if implemented as in TLS 1.3 [2]_
- Poly1305/ChaCha20 is considered secure if nonces are handled properly [3]_

So based on these facts, ChaCha20/Poly1305 seems like the option that is
considered better by the cryptographer community.


References
==========

.. [CAESAR]
    https://competitions.cr.yp.to/caesar.html

.. [IACR-1150]
    https://eprint.iacr.org/2015/1150 

.. [NetDB]
    {{ site_url('docs/how/network-database', True) }}

.. [NTCP]
    {{ site_url('docs/transport/ntcp', True) }}

.. [Prop104]
    {{ proposal_url('104') }}

.. [Prop109]
    {{ proposal_url('109') }}

.. [RFC-2104]
    https://tools.ietf.org/html/rfc2104

.. [RFC-3526]
    https://tools.ietf.org/html/rfc3526

.. [RFC-6151]
    https://tools.ietf.org/html/rfc6151

.. [RFC-7539]
    https://tools.ietf.org/html/rfc7539

.. [RFC-7905]
    https://tools.ietf.org/html/rfc7905

.. [RouterAddress]
    {{ ctags_url('RouterAddress') }}

.. [RouterIdentity]
    {{ ctags_url('RouterIdentity') }}

.. [SIDH]
    De Feo, Luca; Jao, Plut., Towards quantum-resistant cryptosystems from
    supersingular elliptic curve isogenies

.. [SigningPublicKey]
    {{ ctags_url('SigningPublicKey') }}

.. [SSU]
    {{ site_url('docs/transport/ssu', True) }}

.. [STS]
    Diffie, W.; van Oorschot P. C.; Wiener M. J., Authentication and
    Authenticated Key Exchanges

.. [Ticket1112]
    https://{{ i2pconv('trac.i2p2.i2p') }}/ticket/1112

.. [Ticket1849]
    https://{{ i2pconv('trac.i2p2.i2p') }}/ticket/1849

.. [1] http://www.chesworkshop.org/ches2009/presentations/01_Session_1/CHES2009_ekasper.pdf

.. [2] https://www.blackhat.com/docs/us-16/materials/us-16-Devlin-Nonce-Disrespecting-Adversaries-Practical-Forgery-Attacks-On-GCM-In-TLS.pdf

.. [3] https://eprint.iacr.org/2014/613.pdf

.. [4] https://www.imperialviolet.org/2013/10/07/chacha20.html

.. [5] https://tools.ietf.org/html/rfc7539