diff --git a/router/doc/tunnel.html b/router/doc/tunnel.html
new file mode 100644
index 000000000..cddeaf2c0
--- /dev/null
+++ b/router/doc/tunnel.html
@@ -0,0 +1,373 @@
1) Tunnel overview
2) Tunnel operation
2.1) Message preprocessing
2.2) Gateway processing
2.3) Participant processing
2.4) Endpoint processing
2.5) Padding
2.6) Tunnel fragmentation
2.7) Alternatives
2.7.1) Don't use a checksum block
2.7.2) Adjust tunnel processing midstream
2.7.3) Use bidirectional tunnels
2.7.4) Use smaller hashes
3) Tunnel building
3.1) Peer selection
3.2) Request delivery
3.3) Pooling
4) Tunnel throttling
5) Mixing/batching
Within I2P, messages are passed in one direction through a virtual +tunnel of peers, using whatever means are available to pass the +message on to the next hop. Messages arrive at the tunnel's +gateway, get bundled up for the path, and are forwarded on to the +next hop in the tunnel, which processes and verifies the validity +of the message and sends it on to the next hop, and so on, until +it reaches the tunnel endpoint. That endpoint takes the messages +bundled up by the gateway and forwards them as instructed - either +to another router, to another tunnel on another router, or locally.
Tunnels all work the same, but can be segmented into two different groups - inbound tunnels and outbound tunnels. The inbound tunnels have an untrusted gateway which passes messages down towards the tunnel creator, which serves as the tunnel endpoint. For outbound tunnels, the tunnel creator serves as the gateway, passing messages out to the remote endpoint.
The tunnel's creator selects exactly which peers will participate in the tunnel, and provides each with the necessary configuration data. They may vary in length from 0 hops (where the gateway is also the endpoint) to 9 hops (where there are 7 peers after the gateway and before the endpoint). It is the intent to make it hard for either participants or third parties to determine the length of a tunnel, or even for colluding participants to determine whether they are a part of the same tunnel at all (barring the situation where colluding peers are next to each other in the tunnel). Messages that have been corrupted are also dropped as soon as possible, reducing network load.
Beyond their length, there are additional configurable parameters for each tunnel, such as a throttle on the size or frequency of messages delivered, how padding should be used, how long a tunnel should be in operation, whether to inject chaff messages, whether to use fragmentation, and what, if any, batching strategies should be employed.
In practice, a series of tunnel pools are used for different purposes - each local client destination has its own set of inbound tunnels and outbound tunnels, configured to meet its anonymity and performance needs. In addition, the router itself maintains a series of pools for participating in the network database and for managing the tunnels themselves.
I2P is an inherently packet switched network, even with these tunnels, allowing it to take advantage of multiple tunnels running in parallel, increasing resilience and balancing load. Outside of the core I2P layer, there is an optional end to end streaming library available for client applications, exposing TCP-esque operation, including message reordering, retransmission, congestion control, etc.
Tunnel operation has four distinct processes, taken on by various peers in the tunnel. First, the tunnel gateway accumulates a number of tunnel messages and preprocesses them into something for tunnel delivery. Next, that gateway encrypts that preprocessed data, then forwards it to the first hop. That peer, and subsequent tunnel participants, unwrap a layer of the encryption, verifying the integrity of the message, then forward it on to the next peer. Eventually, the message arrives at the endpoint where the messages bundled by the gateway are split out again and forwarded on as requested.
When the gateway wants to deliver data through the tunnel, it first gathers zero or more I2NP messages (no more than 32KB worth), selects how much padding will be used, and decides how each I2NP message should be handled by the tunnel endpoint, encoding that data into the raw tunnel payload:
The instructions are encoded as follows:

  First byte (flags):
    bits 0-1: delivery type (0x0 = LOCAL, 0x01 = TUNNEL, 0x02 = ROUTER)
    bit 2: delay included? (1 = true, 0 = false)
    bit 3: fragmented? (1 = true, 0 = false)
    bit 4: extended options? (1 = true, 0 = false)
    bits 5-7: reserved

  Delay byte (present if the delay flag is set):
    bit 0: type (0 = strict, 1 = randomized)
    bits 1-7: delay exponent (2^value minutes)

  Fragment info byte (present if the fragmented flag is set):
    bits 0-6: fragment number
    bit 7: is last? (1 = true, 0 = false)

  Extended options (present if the extended options flag is set):
    size = a 1 byte option size (in bytes)
    data = that many bytes

The I2NP message is encoded in its standard form, and the +preprocessed payload must be padded to a multiple of 16 bytes.
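
To make the flag layout concrete, here is a minimal sketch of packing that first instruction byte, assuming bit 0 is the least significant bit; the class and method names are purely illustrative and not part of the router's actual code:

    // Hypothetical helper packing the first delivery-instruction byte per the
    // bit layout above (bits 0-1 delivery type, bit 2 delay, bit 3 fragmented,
    // bit 4 extended options, bits 5-7 reserved and left zero).
    public final class DeliveryFlags {
        public static final int TYPE_LOCAL  = 0x00;
        public static final int TYPE_TUNNEL = 0x01;
        public static final int TYPE_ROUTER = 0x02;

        public static byte buildFlagByte(int deliveryType, boolean delayIncluded,
                                         boolean fragmented, boolean extendedOptions) {
            int b = deliveryType & 0x03;        // bits 0-1: delivery type
            if (delayIncluded)   b |= 1 << 2;   // bit 2: delay included
            if (fragmented)      b |= 1 << 3;   // bit 3: fragmented
            if (extendedOptions) b |= 1 << 4;   // bit 4: extended options
            return (byte) b;                    // bits 5-7: reserved
        }
    }
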
After the preprocessing of messages into a padded payload, the gateway encrypts the payload with the eight keys, building a checksum block so that each peer can verify the integrity of the payload at any time, as well as an end to end verification block for the tunnel endpoint to verify the integrity of the checksum block. The specific details follow.
The encryption used is such that decryption merely requires running over the data with AES in CTR mode, calculating the SHA256 of a certain fixed portion of the message (bytes 16 through $size-288), and searching for that hash in the checksum block. There is a fixed number of hops defined (8 peers after the gateway) so that we can verify the message without either leaking the position in the tunnel or having the message continually "shrink" as layers are peeled off. For tunnels shorter than 9 hops, the tunnel creator will take the place of the excess hops, decrypting with their keys (for outbound tunnels, this is done at the beginning, and for inbound tunnels, the end).
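
Putting those offsets together, the layered message carried between hops can be pictured as follows; the exact boundaries are inferred from the byte ranges quoted in this document (a 16 byte IV, 8 checksum entries of 32 bytes each, and a 32 byte verification block), not from a formal specification:

    bytes 0..15                : IV (16 bytes)
    bytes 16..$size-289        : preprocessed payload
    bytes $size-288..$size-33  : checksum block eH[0..7] (8 x 32 byte SHA256 values)
    bytes $size-32..$size-1    : verification block V (32 bytes)
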
The hard part in the encryption is building that entangled checksum block, which requires essentially finding out what the hash of the payload will look like at each step, randomly ordering those hashes, then building a matrix of what each of those randomly ordered hashes will look like at each step. To visualize this a bit:
                  |      | IV    | Payload | eH[0] ... eH[7]                     | V
   peer0 key=K[0] | recv | IV[0] | P[0]    |                                     | V[0]
                  | send | IV[1] | P[1]    | H(P[1]) in one of the eight columns | V[1]
   peer1 key=K[1] | recv |       |         |                                     |
                  | send | IV[2] | P[2]    | H(P[2]) in one of the eight columns | V[2]
   peer2 key=K[2] | recv |       |         |                                     |
                  | send | IV[3] | P[3]    | H(P[3]) in one of the eight columns | V[3]
   peer3 key=K[3] | recv |       |         |                                     |
                  | send | IV[4] | P[4]    | H(P[4]) in one of the eight columns | V[4]
   peer4 key=K[4] | recv |       |         |                                     |
                  | send | IV[5] | P[5]    | H(P[5]) in one of the eight columns | V[5]
   peer5 key=K[5] | recv |       |         |                                     |
                  | send | IV[6] | P[6]    | H(P[6]) in one of the eight columns | V[6]
   peer6 key=K[6] | recv |       |         |                                     |
                  | send | IV[7] | P[7]    | H(P[7]) in one of the eight columns | V[7]
   peer7 key=K[7] | recv |       |         |                                     |
                  | send | IV[8] | P[8]    | H(P[8]) in one of the eight columns | V[8]

(Which of the eight eH columns holds each hash follows the random ordering described above; the cells left blank hold values still under one or more layers of encryption.)
In the above, P[8] is the same as the original data being passed through the +tunnel (the preprocessed messages), and V[8] is the SHA256 of eH[0-7] as seen on +peer7 after decryption. For +cells in the matrix "higher up" than the hash, their value is derived by encrypting +the cell below it with the key for the peer below it, using the end of the column +to the left of it as the IV. For cells in the matrix "lower down" than the hash, +they're equal to the cell above them, decrypted by the current peer's key, using +the end of the previous encrypted block on that row.
With this randomized matrix of checksum blocks, each peer will be able to find the hash of the payload, or if it is not there, know that the message is corrupt. The entanglement by using CTR mode increases the difficulty in tagging the checksum blocks themselves, but it is still possible for that tagging to go briefly undetected if the columns after the tagged data have already been used to check the payload at a peer. In any case, the tunnel endpoint (peer 7) knows for certain whether any of the checksum blocks have been tagged, as that would corrupt the verification block (V[8]).
The IV[0] is a random 16 byte value, and IV[i] is the first 16 bytes of H(D(IV[i-1], K[i-1])). We don't use the same IV along the path, as that would allow trivial collusion, and we use the hash of the decrypted value to propagate the IV so as to hamper key leakage.
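
A minimal sketch of that IV derivation as one hop would compute it, assuming the "decryption" of the 16 byte IV is a single AES block operation under the hop's tunnel key (the helper name and that assumption are illustrative, not taken from the router code):

    import java.security.MessageDigest;
    import javax.crypto.Cipher;
    import javax.crypto.spec.SecretKeySpec;

    final class IvUpdate {
        // IV[i] = first 16 bytes of SHA256( D(IV[i-1], K[i-1]) )
        static byte[] nextIV(byte[] prevIV, byte[] tunnelKey) throws Exception {
            Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");   // one 16 byte block
            aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(tunnelKey, "AES"));
            byte[] decrypted = aes.doFinal(prevIV);

            byte[] digest = MessageDigest.getInstance("SHA-256").digest(decrypted);
            byte[] next = new byte[16];
            System.arraycopy(digest, 0, next, 0, 16);                // keep the first 16 bytes
            return next;
        }
    }
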
When a participant in a tunnel receives a message, they decrypt a layer with their tunnel key using AES256 in CTR mode with the first 16 bytes as the IV. They then calculate the hash of what they see as the payload (bytes 16 through $size-288) and search for that hash within the decrypted checksum block. If no match is found, the message is discarded. Otherwise, the IV is updated by decrypting it and replacing it with the first 16 bytes of its hash. The resulting message is then forwarded on to the next peer for processing.
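
In code, that per-hop step might look roughly like the following; the offsets come from the byte ranges above, nextIV is the helper sketched earlier, and none of the names correspond to the actual router implementation:

    import java.security.MessageDigest;
    import java.util.Arrays;
    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    final class HopProcessor {
        private static final int IV_SIZE = 16;
        private static final int HASH_SIZE = 32;
        private static final int CHECKSUM_ENTRIES = 8;

        /** @return the message to forward to the next peer, or null to discard it. */
        static byte[] process(byte[] msg, byte[] tunnelKey) throws Exception {
            // Strip off one layer: AES-256/CTR over everything after the 16 byte IV.
            Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
            aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(tunnelKey, "AES"),
                     new IvParameterSpec(Arrays.copyOfRange(msg, 0, IV_SIZE)));
            byte[] body = aes.doFinal(msg, IV_SIZE, msg.length - IV_SIZE);

            // Hash the payload portion (bytes 16 through $size-288 of the message).
            int payloadLen = body.length - CHECKSUM_ENTRIES * HASH_SIZE - HASH_SIZE;
            byte[] payloadHash = MessageDigest.getInstance("SHA-256")
                    .digest(Arrays.copyOfRange(body, 0, payloadLen));

            // Look for that hash among the eight decrypted checksum block entries.
            boolean found = false;
            for (int i = 0; i < CHECKSUM_ENTRIES && !found; i++) {
                int off = payloadLen + i * HASH_SIZE;
                found = Arrays.equals(payloadHash, Arrays.copyOfRange(body, off, off + HASH_SIZE));
            }
            if (!found)
                return null;   // corrupt or tagged - drop it

            // Update the IV and forward the decrypted body to the next hop.
            byte[] out = new byte[msg.length];
            System.arraycopy(IvUpdate.nextIV(Arrays.copyOfRange(msg, 0, IV_SIZE), tunnelKey),
                             0, out, 0, IV_SIZE);
            System.arraycopy(body, 0, out, IV_SIZE, body.length);
            return out;
        }
    }
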
When a message reaches the tunnel endpoint, the endpoint decrypts and verifies it like a normal participant. If the checksum block has a valid match, the endpoint then computes the hash of the checksum block itself (as seen after decryption) and compares that to the decrypted verification hash (the last 32 bytes). If that verification hash does not match, the endpoint takes note of the tagging attempt by one of the tunnel participants and perhaps discards the message.
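
That additional endpoint check, sketched with the same assumed offsets (again illustrative only):

    import java.security.MessageDigest;
    import java.util.Arrays;

    final class EndpointCheck {
        // After finding its payload hash like any other hop, the endpoint also hashes
        // the decrypted checksum block (the 256 bytes preceding the verification hash)
        // and compares it to the last 32 bytes of the message.
        static boolean checksumBlockUntampered(byte[] decryptedBody) throws Exception {
            int verificationOff = decryptedBody.length - 32;
            int checksumOff = verificationOff - 8 * 32;
            byte[] expected = MessageDigest.getInstance("SHA-256")
                    .digest(Arrays.copyOfRange(decryptedBody, checksumOff, verificationOff));
            return MessageDigest.isEqual(expected,
                    Arrays.copyOfRange(decryptedBody, verificationOff, decryptedBody.length));
        }
    }
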
At this point, the tunnel endpoint has the preprocessed data sent by the gateway, which it may then parse back into the included I2NP messages and forward them as requested in their delivery instructions.
Several tunnel padding strategies are possible, each with their own merits:
 - no padding
 - random padding
 - padding to a fixed size
 - padding to the closest exponential size (a la freenet)
Which to use? No padding is the most efficient, random padding is what we have now, and fixed size padding would either be an extreme waste or force us to implement fragmentation. Padding to the closest exponential size (a la freenet) seems promising. Perhaps we should gather some stats on the net as to what size messages are, then see what costs and benefits would arise from different strategies?
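
For the exponential option, each message would be padded up to the next power of two so observers only ever see a small set of sizes; a trivial sketch of picking that target size (illustrative, not an implemented policy):

    final class ExponentialPadding {
        // Smallest power of two that can hold a payload of the given length,
        // e.g. 1500 bytes would be padded out to 2048.
        static int paddedSize(int length) {
            int size = 1;
            while (size < length)
                size <<= 1;
            return size;
        }
    }
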
For various padding and mixing schemes, it may be useful from an anonymity perspective to fragment a single I2NP message into multiple parts, each delivered separately through different tunnel messages. The endpoint may or may not support that fragmentation (discarding or hanging on to fragments as needed), and handling fragmentation will not immediately be implemented.
One alternative to the above process is to remove the checksum block completely and replace the verification hash with a plain hash of the payload. This would simplify processing at the tunnel gateway and save 256 bytes of bandwidth at each hop. On the other hand, attackers within the tunnel could trivially adjust the message size to one which is easily traceable by colluding external observers in addition to later tunnel participants. The corruption would also incur the waste of the entire bandwidth necessary to pass on the message. Without the per-hop validation, it would also be possible to consume excess network resources by building extremely long tunnels, or by building loops into the tunnel.
While the simple tunnel routing algorithm should be sufficient for most cases, there are three alternatives that can be explored:
 - adjust tunnel processing midstream
 - use bidirectional tunnels
 - use smaller hashes
The current strategy of using two separate tunnels for inbound and outbound communication is not the only technique available, and it does have anonymity implications. On the positive side, using separate tunnels lessens the traffic data exposed for analysis to participants in a tunnel - for instance, peers in an outbound tunnel from a web browser would only see the traffic of an HTTP GET, while the peers in an inbound tunnel would see the payload delivered along the tunnel. With bidirectional tunnels, all participants would have access to the fact that e.g. 1KB was sent in one direction, then 100KB in the other. On the negative side, using unidirectional tunnels means that there are two sets of peers which need to be profiled and accounted for, and additional care must be taken to address the increased speed of predecessor attacks. The tunnel pooling and building process outlined below should minimize the worries of the predecessor attack, though if it were desired, it wouldn't be much trouble to build both the inbound and outbound tunnels along the same peers.
At the moment, the plan is to reuse the existing SHA256 code and build all of the checksum and verification hashes as 32 byte SHA256 values. A 20 byte SHA1 would likely be more than sufficient, and perhaps even smaller hashes could be used.