- tunnel-alt-creation rework

- More how_crypto and i2np_spec fixups
- Quick NTCP fixup, move discussion to new page
zzz
2010-08-04 14:19:34 +00:00
parent 78f90bab94
commit 5e1cff3fdc
5 changed files with 831 additions and 682 deletions

View File

@ -35,8 +35,8 @@ block is formatted (in network byte order):
<p> <p>
The H(data) is the SHA256 of the data that is encrypted in the ElGamal block, The H(data) is the SHA256 of the data that is encrypted in the ElGamal block,
and is preceded by a random nonzero byte. The data encrypted in the block and is preceded by a random nonzero byte. The data encrypted in the block
can be up to 222 bytes long. Specifically, see can be up to 223 bytes long. See
<a href="http://docs.i2p2.de/core/net/i2p/crypto/ElGamalEngine.html">[the code]</a>. <a href="http://docs.i2p2.de/core/net/i2p/crypto/ElGamalEngine.html">the ElGamal Javadoc</a>.
<p> <p>
ElGamal is never used on its own in I2P, but instead always as part of ElGamal is never used on its own in I2P, but instead always as part of
<a href="how_elgamalaes">ElGamal/AES+SessionTag</a>. <a href="how_elgamalaes">ElGamal/AES+SessionTag</a>.

View File

@ -174,7 +174,7 @@ iv_key :: SessionKey
reply_key :: SessionKey reply_key :: SessionKey
length -> 32 bytes length -> 32 bytes
reply_iv :: Integer reply_iv :: data
length -> 16 bytes length -> 16 bytes
flag :: Integer flag :: Integer
@ -182,6 +182,7 @@ flag :: Integer
request_time :: Integer request_time :: Integer
length -> 4 bytes length -> 4 bytes
Hours since the epoch, i.e. current time / 3600
send_message_id :: Integer send_message_id :: Integer
length -> 4 bytes length -> 4 bytes
@ -191,17 +192,27 @@ padding :: Data
source -> random source -> random
total length: 223
encrypted: encrypted:
toPeer :: Hash toPeer :: Hash
length -> 16 bytes length -> 16 bytes
encrypted_data :: ElGamal-2048 encrypted data encrypted_data :: ElGamal-2048 encrypted data
length -> 514 length -> 512
total length: 528
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
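<p>
As a small illustration of the request_time convention noted above (hours since the epoch, stored as a 4-byte integer), here is a hedged Java sketch; the names are hypothetical and this is not the I2P implementation.
</p>
<pre>
import java.nio.ByteBuffer;

// Sketch only: encode request_time as a 4-byte count of hours since the epoch.
public class RequestTimeSketch {
    public static byte[] encodeRequestTime(long nowMillis) {
        int hoursSinceEpoch = (int) (nowMillis / 1000L / 3600L);
        return ByteBuffer.allocate(4).putInt(hoursSinceEpoch).array();
    }
}
</pre>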
<h3 id="struct_BuildResponseRecord">BuildResponseRecord</h3> <h3 id="struct_BuildResponseRecord">BuildResponseRecord</h3>
<pre> <pre>
{% filter escape %} {% filter escape %}
@ -224,9 +235,17 @@ byte 527 : reply
encrypted: encrypted:
bytes 0-527: AES-encrypted record (note: same size as BuildRequestRecord!) bytes 0-527: AES-encrypted record (note: same size as BuildRequestRecord!)
total length: 528
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
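<p>
For illustration only, a minimal Java sketch of reading the reply status from a decrypted 528-byte BuildResponseRecord, where byte 527 carries the reply as shown above; the names are hypothetical and this is not the I2P implementation.
</p>
<pre>
// Sketch only: the reply status is the last byte of the 528-byte record.
public class BuildResponseSketch {
    public static int replyStatus(byte[] decryptedRecord) {
        if (decryptedRecord.length != 528)
            throw new IllegalArgumentException("expected a 528-byte record");
        return decryptedRecord[527] & 0xff;   // read as an unsigned value
    }
}
</pre>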
<h2 id="messages">Messages</h2> <h2 id="messages">Messages</h2>
<table border=1> <table border=1>
@ -667,6 +686,11 @@ Total size: 8*528 = 4224 bytes
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
<h3 id="msg_TunnelBuildReply">TunnelBuildReply</h3> <h3 id="msg_TunnelBuildReply">TunnelBuildReply</h3>
<pre> <pre>
@ -675,6 +699,11 @@ same format as TunnelBuild message
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
<h3 id="msg_VariableTunnelBuild">VariableTunnelBuild</h3> <h3 id="msg_VariableTunnelBuild">VariableTunnelBuild</h3>
<pre> <pre>
{% filter escape %} {% filter escape %}
@ -697,9 +726,19 @@ Total size: 1 + $num*528
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
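<p>
The size formulas above (8*528 for TunnelBuild, 1 + $num*528 for VariableTunnelBuild) can be made concrete with a small Java sketch; the class and method names are illustrative only.
</p>
<pre>
// Sketch only: size arithmetic for the two build message types.
public class BuildMessageSizes {
    static final int RECORD_LEN = 528;

    static int tunnelBuildSize() {
        return 8 * RECORD_LEN;                 // always 4224 bytes
    }

    static int variableTunnelBuildSize(int numRecords) {
        return 1 + numRecords * RECORD_LEN;    // 1-byte record count + records
    }

    public static void main(String[] args) {
        System.out.println(tunnelBuildSize());            // 4224
        System.out.println(variableTunnelBuildSize(5));   // 2641
    }
}
</pre>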
<h3 id="msg_VariableTunnelBuildReply">VariableTunnelBuildReply</h3> <h3 id="msg_VariableTunnelBuildReply">VariableTunnelBuildReply</h3>
<pre> <pre>
{% filter escape %} {% filter escape %}
same format as VariableTunnelBuild message same format as VariableTunnelBuild message
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>

View File

@ -2,20 +2,25 @@
{% block title %}NTCP{% endblock %} {% block title %}NTCP{% endblock %}
{% block content %} {% block content %}
<h1>NTCP (NIO-based TCP)</h1> Updated August 2010 for release 0.8
<h2>NTCP (NIO-based TCP)</h2>
<p> <p>
NTCP was introduced in I2P 0.6.1.22. NTCP
It is a Java NIO-based transport, enabled by default for outbound is one of two <a href="transport.html">transports</a> currently implemented in I2P.
connections only. Those who configure their NAT/firewall to allow The other is <a href="udp.html">SSU</a>.
inbound connections and specify the external host and port NTCP
(dyndns/etc is okay) on /config.jsp can receive inbound connections. is a Java NIO-based transport
NTCP is NIO based, so it doesn't suffer from the 1 thread per connection issues of the old TCP transport. introduced in I2P release 0.6.1.22.
Java NIO (new I/O) does not suffer from the 1 thread per connection issues of the old TCP transport.
</p><p> </p><p>
As of 0.6.1.29, NTCP uses the IP/Port By default,
NTCP uses the IP/Port
auto-detected by SSU. When enabled on config.jsp, auto-detected by SSU. When enabled on config.jsp,
SSU will notify/restart NTCP when the external address changes. SSU will notify/restart NTCP when the external address changes
or when the firewall status changes.
Now you can enable inbound TCP without a static IP or dyndns service. Now you can enable inbound TCP without a static IP or dyndns service.
</p><p> </p><p>
@ -23,71 +28,47 @@ The NTCP code within I2P is relatively lightweight (1/4 the size of the SSU code
because it uses the underlying Java TCP transport. because it uses the underlying Java TCP transport.
</p> </p>
<h2>Transport Bids and Transport Comparison</h2>
<h2>NTCP Protocol Specification</h2>
<h3>Standard Message Format</h3>
<p> <p>
I2P supports multiple transports simultaneously. The NTCP transport sends individual I2NP messages AES/256/CBC encrypted with
A particular transport for an outbound connection is selected with "bids". a simple checksum. The unencrypted message is encoded as follows:
Each transport bids for the connection and the relative value of these bids
assigns the priority.
Transports may reply with different bids, depending on whether there is
already an established connection to the peer.
</p><p>
To compare the performance of UDP and NTCP,
you can adjust the value of i2np.udp.preferred in configadvanced.jsp
(introduced in I2P 0.6.1.29).
Possible settings are
"false" (default), "true", and "always".
Default setting results in same behavior as before
(NTCP is preferred unless it isn't established and UDP is established).
</p><p>
The table below shows the new bid values. A lower bid is a higher priority.
<p>
<table border=1>
<tr>
<td><td colspan=3>i2np.udp.preferred setting
<tr>
<td>Transport<td>false<td>true<td>always
<tr>
<td>NTCP Established<td>25<td>25<td>25
<tr>
<td>UDP Established<td>50<td>15<td>15
<tr>
<td>NTCP Not established<td>70<td>70<td>70
<tr>
<td>UDP Not established<td>1000<td>65<td>20
</table>
<h2>NTCP Transport Protocol</h2>
<pre> <pre>
* Coordinate the connection to a single peer.
*
* The NTCP transport sends individual I2NP messages AES/256/CBC encrypted with
* a simple checksum. The unencrypted message is encoded as follows:
* +-------+-------+--//--+---//----+-------+-------+-------+-------+ * +-------+-------+--//--+---//----+-------+-------+-------+-------+
* | sizeof(data) | data | padding | adler checksum of sz+data+pad | * | sizeof(data) | data | padding | Adler checksum of sz+data+pad |
* +-------+-------+--//--+---//----+-------+-------+-------+-------+ * +-------+-------+--//--+---//----+-------+-------+-------+-------+
* That message is then encrypted with the DH/2048 negotiated session key </pre>
* (station to station authenticated per the EstablishState class) using the That message is then encrypted with the DH/2048 negotiated session key
* last 16 bytes of the previous encrypted message as the IV. (station to station authenticated per the EstablishState class) using the
* last 16 bytes of the previous encrypted message as the IV.
* One special case is a metadata message where the sizeof(data) is 0. In </p>
* that case, the unencrypted message is encoded as:
<p>
0-15 bytes of padding are required to bring the total message length
(including the six size and checksum bytes) to a multiple of 16.
The maximum message size is currently 16 KB.
Therefore the maximum data size is currently 16 KB - 6, or 16378 bytes.
The minimum data size is 1.
</p>
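<p>
To make the framing above concrete, here is a hedged Java sketch (not the I2P implementation; names are hypothetical) that builds the unencrypted message: a 2-byte size, the data, padding to a multiple of 16 bytes, and a 4-byte Adler-32 checksum over size + data + padding. The padding contents are not specified in the text above, so zero bytes are used here.
</p>
<pre>
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

// Sketch only: frame an I2NP message for NTCP as described above.
public class NtcpFrameSketch {
    public static byte[] frame(byte[] data) {
        if (data.length > 16378)
            throw new IllegalArgumentException("data exceeds the current 16 KB - 6 limit");
        int unpadded = 2 + data.length + 4;              // size + data + checksum
        int padLen = (16 - (unpadded % 16)) % 16;        // pad to a multiple of 16
        ByteBuffer buf = ByteBuffer.allocate(unpadded + padLen);
        buf.putShort((short) data.length);
        buf.put(data);
        buf.put(new byte[padLen]);                       // padding (contents unspecified here)
        Adler32 sum = new Adler32();
        sum.update(buf.array(), 0, 2 + data.length + padLen);
        buf.putInt((int) sum.getValue());                // Adler checksum of sz+data+pad
        return buf.array();
    }
}
</pre>
<p>
The framed bytes are then AES/256/CBC encrypted with the negotiated session key, using the last 16 bytes of the previous encrypted message as the IV, as described above.
</p>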
<h3>Time Sync Message Format</h3>
<p>
One special case is a metadata message where the sizeof(data) is 0. In
that case, the unencrypted message is encoded as:
<pre>
* +-------+-------+-------+-------+-------+-------+-------+-------+ * +-------+-------+-------+-------+-------+-------+-------+-------+
* | 0 | timestamp in seconds | uninterpreted * | 0 | timestamp in seconds | uninterpreted
* +-------+-------+-------+-------+-------+-------+-------+-------+ * +-------+-------+-------+-------+-------+-------+-------+-------+
* uninterpreted | adler checksum of sz+data+pad | * uninterpreted | Adler checksum of bytes 0-11 |
* +-------+-------+-------+-------+-------+-------+-------+-------+ * +-------+-------+-------+-------+-------+-------+-------+-------+
*
*
</pre> </pre>
Total length: 16 bytes. The time sync message is sent at approximately 15 minute intervals.
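<p>
As an illustration only, the following Java sketch builds such a 16-byte time sync message, assuming the layout is a 2-byte size field of zero, a 4-byte timestamp in seconds, 6 uninterpreted bytes, and the 4-byte Adler checksum of bytes 0-11. The exact field offsets are inferred from the diagram above and may differ from the implementation.
</p>
<pre>
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

// Sketch only: assumed layout of the 16-byte time sync (metadata) message.
public class NtcpTimeSyncSketch {
    public static byte[] build(long nowMillis) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putShort((short) 0);                  // size 0 marks a metadata message
        buf.putInt((int) (nowMillis / 1000L));    // timestamp, seconds since the epoch
        buf.put(new byte[6]);                     // uninterpreted bytes
        Adler32 sum = new Adler32();
        sum.update(buf.array(), 0, 12);           // Adler checksum of bytes 0-11
        buf.putInt((int) sum.getValue());
        return buf.array();
    }
}
</pre>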
<h3>Establishment Sequence</h3>
In the establish state, the following communication happens. In the establish state, the following communication happens.
There is a 2048-bit Diffie Hellman exchange. There is a 2048-bit Diffie Hellman exchange.
For more information see the <a href="how_cryptography.html#tcp">cryptography page</a>. For more information see the <a href="how_cryptography.html#tcp">cryptography page</a>.
@ -99,571 +80,33 @@ For more information see the <a href="how_cryptography.html#tcp">cryptography pa
* E(#+Alice.identity+tsA+padding+S(X+Y+Bob.identHash+tsA+tsB+padding), sk, hX_xor_Bob.identHash[16:31])---> * E(#+Alice.identity+tsA+padding+S(X+Y+Bob.identHash+tsA+tsB+padding), sk, hX_xor_Bob.identHash[16:31])--->
* <----------------------E(S(X+Y+Alice.identHash+tsA+tsB)+padding, sk, prev) * <----------------------E(S(X+Y+Alice.identHash+tsA+tsB)+padding, sk, prev)
</pre> </pre>
Todo: Explain this in words.
<h3>Check Connection Message</h3>
Alternately, when Bob receives a connection, it could be a Alternately, when Bob receives a connection, it could be a
check connection (perhaps prompted by Bob asking for someone check connection (perhaps prompted by Bob asking for someone
to verify his listener). to verify his listener).
It does not appear that 'check connection' is used. Check Connection is not currently used.
However, for the record, check connections are formatted as follows: However, for the record, check connections are formatted as follows.
<pre> A check info connection will receive 256 bytes containing:
* a check info connection will receive 256 bytes containing: <ul>
* - 32 bytes of uninterpreted, ignored data <li> 32 bytes of uninterpreted, ignored data
* - 1 byte size <li> 1 byte size
* - that many bytes making up the local router's IP address (as reached by the remote side) <li> that many bytes making up the local router's IP address (as reached by the remote side)
* - 2 byte port number that the local router was reached on <li> 2 byte port number that the local router was reached on
* - 4 byte i2p network time as known by the remote side (seconds since the epoch) <li> 4 byte i2p network time as known by the remote side (seconds since the epoch)
* - uninterpreted padding data, up to byte 223 <li> uninterpreted padding data, up to byte 223
* - xor of the local router's identity hash and the SHA256 of bytes 32 through bytes 223 <li> xor of the local router's identity hash and the SHA256 of bytes 32 through bytes 223
</ul>
</pre> </pre>
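<p>
For illustration only, a Java sketch that parses the (unused) 256-byte check connection packet following the field list above; the names are hypothetical and this is not the I2P implementation.
</p>
<pre>
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.Arrays;

// Sketch only: parse a 256-byte check connection packet as listed above.
public class CheckConnectionSketch {
    public static void parse(byte[] pkt, byte[] localIdentHash) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(pkt, 32, pkt.length - 32);  // skip 32 ignored bytes
        int ipLen = buf.get() & 0xff;                 // 1-byte IP length
        byte[] ip = new byte[ipLen];
        buf.get(ip);                                  // our IP as seen by the remote side
        int port = buf.getShort() & 0xffff;           // 2-byte port
        long time = buf.getInt() & 0xffffffffL;       // 4-byte seconds since the epoch

        // Trailing 32 bytes: local router's identity hash XOR SHA256(bytes 32-223)
        byte[] h = MessageDigest.getInstance("SHA-256").digest(Arrays.copyOfRange(pkt, 32, 224));
        boolean valid = true;
        for (int i = 0; i < 32; i++)
            valid &= pkt[224 + i] == (byte) (localIdentHash[i] ^ h[i]);
        System.out.println("ip bytes=" + ipLen + " port=" + port +
                           " time=" + time + " valid=" + valid);
    }
}
</pre>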
<h2>Discussion</h2>
Now on the <a href="ntcp_discussion.html">NTCP Discussion Page</a>.
<h2>NTCP vs. SSU Discussion, March 2007</h2> <h2><a name="future">Future Work</a></h2>
<h3>NTCP questions</h3> <p>The maximum message size should be increased to approximately 32 KB.
(adapted from an IRC discussion between zzz and cervantes)
<br />
Why is NTCP preferred over SSU, doesn't NTCP have higher overhead and latency?
It has better reliability.
<br />
Doesn't streaming lib over NTCP suffer from classic TCP-over-TCP issues?
What if we had a really simple UDP transport for streaming-lib-originated traffic?
I think SSU was meant to be the so-called really simple UDP transport - but it just proved too unreliable.
<h3>"NTCP Considered Harmful" Analysis by zzz</h3>
Posted to new Syndie, 2007-03-25.
This was posted to stimulate discussion, don't take it too seriously.
<p>
Summary: NTCP has higher latency and overhead than SSU, and is more likely to
collapse when used with the streaming lib. However, traffic is routed with a
preference for NTCP over SSU and this is currently hardcoded.
</p> </p>
<h4>Discussion</h4>
<p>
We currently have two transports, NTCP and SSU. As currently implemented, NTCP
has lower "bids" than SSU so it is preferred, except for the case where there
is an established SSU connection but no established NTCP connection for a peer.
</p><p>
SSU is similar to NTCP in that it implements acknowledgments, timeouts, and
retransmissions. However SSU is I2P code with tight constraints on the
timeouts and available statistics on round trip times, retransmissions, etc.
NTCP is based on Java NIO TCP, which is a black box and presumably implements
RFC standards, including very long maximum timeouts.
</p><p>
The majority of traffic within I2P is streaming-lib originated (HTTP, IRC,
Bittorrent) which is our implementation of TCP. As the lower-level transport is
generally NTCP due to the lower bids, the system is subject to the well-known
and dreaded problem of TCP-over-TCP
http://sites.inka.de/~W1011/devel/tcp-tcp.html , where both the higher and
lower layers of TCP are doing retransmissions at once, leading to collapse.
</p><p>
Unlike in the PPP over SSH scenario described in the link above, we have
several hops for the lower layer, each covered by a NTCP link. So each NTCP
latency is generally much less than the higher-layer streaming lib latency.
This lessens the chance of collapse.
</p><p>
Also, the probabilities of collapse are lessened when the lower-layer TCP is
tightly constrained with low timeouts and number of retransmissions compared to
the higher layer.
</p><p>
The .28 release increased the maximum streaming lib timeout from 10 sec to 45
sec which greatly improved things. The SSU max timeout is 3 sec. The NTCP max
timeout is presumably at least 60 sec, which is the RFC recommendation. There
is no way to change NTCP parameters or monitor performance. Collapse of the
NTCP layer is [editor: text lost]. Perhaps an external tool like tcpdump would help.
</p><p>
However, running .28, the i2psnark reported upstream does not generally stay at
a high level. It often goes down to 3-4 KBps before climbing back up. This is a
signal that there are still collapses.
</p><p>
SSU is also more efficient. NTCP has higher overhead and probably higher round
trip times. When using NTCP, the ratio of (tunnel output) / (i2psnark data
output) is at least 3.5 : 1. Running an experiment where the code was modified
to prefer SSU (the config option i2np.udp.alwaysPreferred has no effect in the
current code), the ratio reduced to about 3 : 1, indicating better efficiency.
</p><p>
As reported by streaming lib stats, things were much improved - lifetime window
size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per ack down from
1.11 to 1.07.
</p><p>
That this was quite effective was surprising, given that we were only changing
the transport for the first of 3 to 5 total hops the outbound messages would
take.
</p><p>
The effect on outbound i2psnark speeds wasn't clear due to normal variations.
Also for the experiment, inbound NTCP was disabled. The effect on inbound
speeds on i2psnark was not clear.
</p>
<h4>Proposals</h4>
<ul>
<li>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
<li>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic and have SSU generate a low bid
for that traffic. This tag will have to go with the message through each hop
so that the forwarding routers also honor the SSU preference.
<li>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
<li>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ack-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
<li>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
<li>
5)
The new streaming lib max timeout of 45s is probably still too low.
The TCP RFC says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout (presumably 60s).
</ul>
<h3>Response by jrandom</h3>
Posted to new Syndie, 2007-03-27
<p>
On the whole, I'm open to experimenting with this, though remember why NTCP is
there in the first place - SSU failed in a congestion collapse. NTCP "just
works", and while 2-10% retransmission rates can be handled in normal
single-hop networks, that gives us a 40% retransmission rate with 2 hop
tunnels. If you loop in some of the measured SSU retransmission rates we saw
back before NTCP was implemented (10-30+%), that gives us an 83% retransmission
rate. Perhaps those rates were caused by the low 10 second timeout, but
increasing that much would bite us (remember, multiply by 5 and you've got half
the journey).
</p><p>
Unlike TCP, we have no feedback from the tunnel to know whether the message
made it - there are no tunnel level acks. We do have end to end ACKs, but only
on a small number of messages (whenever we distribute new session tags) - out
of the 1,553,591 client messages my router sent, we only attempted to ACK
145,207 of them. The others may have failed silently or succeeded perfectly.
</p><p>
I'm not convinced by the TCP-over-TCP argument for us, especially split across
the various paths we transfer down. Measurements on I2P can convince me
otherwise, of course.
</p><p>
<i>
The NTCP max timeout is presumably at least 60 sec, which is the RFC
recommendation. There is no way to change NTCP parameters or monitor
performance.
</i>
</p><p>
True, but net connections only get up to that level when something really bad
is going on - the retransmission timeout on TCP is often on the order of tens
or hundreds of milliseconds. As foofighter points out, they've got 20+ years
experience and bugfixing in their TCP stacks, plus a billion dollar industry
optimizing hardware and software to perform well according to whatever it is
they do.
</p><p>
<i>
NTCP has higher overhead and probably higher round trip times. When using NTCP
the ratio of (tunnel output) / (i2psnark data output) is at least 3.5 : 1.
Running an experiment where the code was modified to prefer SSU (the config
option i2np.udp.alwaysPreferred has no effect in the current code), the ratio
reduced to about 3 : 1, indicating better efficiency.
</i>
</p><p>
This is very interesting data, though more as a matter of router congestion
than bandwidth efficiency - you'd have to compare 3.5*$n*$NTCPRetransmissionPct
./. 3.0*$n*$SSURetransmissionPct. This data point suggests there's something in
the router that leads to excess local queuing of messages already being
transferred.
</p><p>
<i>
lifetime window size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per
ACK down from 1.11 to 1.07.
</i>
</p><p>
Remember that the sends-per-ACK is only a sample, not a full count (as we don't
try to ACK every send). It's not a random sample either, but instead samples
more heavily periods of inactivity or the initiation of a burst of activity -
sustained load won't require many ACKs.
</p><p>
Window sizes in that range are still woefully low to get the real benefit of
AIMD, and still too low to transmit a single 32KB BT chunk (increasing the
floor to 10 or 12 would cover that).
</p><p>
Still, the wsize stat looks promising - over how long was that maintained?
</p><p>
Actually, for testing purposes, you may want to look at
StreamSinkClient/StreamSinkServer or even TestSwarm in
apps/ministreaming/java/src/net/i2p/client/streaming/ - StreamSinkClient is a
CLI app that sends a selected file to a selected destination and
StreamSinkServer creates a destination and writes out any data sent to it
(displaying size and transfer time). TestSwarm combines the two - flooding
random data to whomever it connects to. That should give you the tools to
measure sustained throughput capacity over the streaming lib, as opposed to BT
choke/send.
</p><p>
<i>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
</i>
</p><p>
Honoring i2np.udp.alwaysPreferred is a good idea in any case - please feel free
to commit that change. Lets gather a bit more data though before switching the
preferences, as NTCP was added to deal with an SSU-created congestion collapse.
</p><p>
<i>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic
and have SSU generate a low bid for that traffic. This tag will have to go with
the message through each hop
so that the forwarding routers also honor the SSU preference.
</i>
</p><p>
In practice, there are three types of traffic - tunnel building/testing, netDb
query/response, and streaming lib traffic. The network has been designed to
make differentiating those three very hard.
</p><p>
<i>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
</i>
</p><p>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
</p><p>
In my view, to really address the core issue we need to address why the router
gets so congested to ACK in time (which, from what I've found, is due to CPU
contention). Maybe we can juggle some things in the router's processing to make
the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</p><p>
<i>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ACK-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
</i>
</p><p>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</p><p>
<i>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
</i>
</p><p>
There's UDPTransport.PRIORITY_LIMITS and UDPTransport.PRIORITY_WEIGHT (honored
by TimedWeightedPriorityMessageQueue), but currently the weights are almost all
equal, so there's no effect. That could be adjusted, of course (but as you
mention, if there's no queuing, it doesn't matter).
</p><p>
<i>
5)
The new streaming lib max timeout of 45s is probably still too low. The TCP RFC
says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout
(presumably 60s).
</i>
</p><p>
That 45s is the max retransmission timeout of the streaming lib though, not the
stream timeout. TCP in practice has retransmission timeouts orders of magnitude
less, though yes, can get to 60s on links running through exposed wires or
satellite transmissions ;) If we increase the streaming lib retransmission
timeout to e.g. 75 seconds, we could go get a beer before a web page loads
(especially assuming less than a 98% reliable transport). That's one reason we
prefer NTCP.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-31
<p>
<i>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
<br>
In my view, to really address the core issue we need to address why the
router gets so congested to ACK in time (which, from what I've found, is due to
CPU contention). Maybe we can juggle some things in the router's processing to
make the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</i>
</p><p>
One of my main stats-gathering techniques is turning on
net.i2p.client.streaming.ConnectionPacketHandler=DEBUG and watching the RTT
times and window sizes as they go by. To overgeneralize for a moment, it's
common to see 3 types of connections: ~4s RTT, ~10s RTT, and ~30s RTT. Trying
to knock down the 30s RTT connections is the goal. If CPU contention is the
cause then maybe some juggling will do it.
</p><p>
Reducing the SSU max retrans from 10 is really just a stab in the dark as we
don't have good data on whether we are collapsing, having TCP-over-TCP issues,
or what, so more data is needed.
</p><p>
<i>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</i>
</p><p>
What I don't understand, if you could elaborate, are the benefits of SSU
retransmissions for non-streaming-lib traffic. Do we need tunnel messages (for
example) to use a semi-reliable transport or can they use an unreliable or
kinda-sorta-reliable transport (1 or 2 retransmissions max, for example)? In
other words, why semi-reliability?
</p><p>
<i>
(but as you mention, if there's no queuing, it doesn't matter).
</i>
</p><p>
I implemented priority sending for UDP but it kicked in about 100,000 times
less often than the code on the NTCP side. Maybe that's a clue for further
investigation or a hint - I don't understand why it would back up that much
more often on NTCP, but maybe that's a hint on why NTCP performs worse.
</p>
<h3>Question answered by jrandom</h3>
Posted to new Syndie, 2007-03-31
<p>
measured SSU retransmission rates we saw back before NTCP was implemented
(10-30+%)
</p><p>
Can the router itself measure this? If so, could a transport be selected based
on measured performance? (i.e. if an SSU connection to a peer is dropping an
unreasonable number of messages, prefer NTCP when sending to that peer)
</p><p>
Yeah, it currently uses that stat as a poor-man's MTU detection (if
the retransmission rate is high, it uses the small packet size, but if it's low,
it uses the large packet size). We tried a few things when first introducing
NTCP (and when first moving away from the original TCP transport) that would
prefer SSU but fail that transport for a peer easily, causing it to fall back
on NTCP. However, there's certainly more that could be done in that regard,
though it gets complicated quickly (how/when to adjust/reset the bids, whether
to share these preferences across multiple peers or not, whether to share it
across multiple sessions with the same peer (and for how long), etc).
<h3>Response by foofighter</h3>
Posted to new Syndie, 2007-03-26
<p>
If I've understood things right, the primary reason in favor of TCP (in
general, both the old and new variety) was that you needn't worry about coding
a good TCP stack. Which ain't impossibly hard to get right... just that
existing TCP stacks have a 20 year lead.
</p><p>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
<ul>
<li>
A TCP-only network is very dependent on reachable peers (those who can forward
incoming connections through their NAT)
<li>
Still even if reachable peers are rare, having them be high capacity somewhat
alleviates the topological scarcity issues
<li>
UDP allows for "NAT hole punching" which lets people be "kind of
pseudo-reachable" (with the help of introducers) who could otherwise only
connect out
<li>
The "old" TCP transport implementation required lots of threads, which was a
performance killer, while the "new" TCP transport does well with few threads
<li>
Routers of set A crap out when saturated with UDP. Routers of set B crap out
when saturated with TCP.
<li>
It "feels" (as in, there are some indications but no scientific data or
quality statistics) that A is more widely deployed than B
<li>
Some networks carry non-DNS UDP datagrams with an outright shitty quality,
while still somewhat bothering to carry TCP streams.
</ul>
</p><p>
On that background, a small diversity of transports (as many as needed, but not
more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</p><p>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-27
<p>
<i>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
</i>
</p><p>
These are all valid issues. However you are considering the two protocols in
isolation, rather than thinking about what transport protocol is best for a
particular higher-level protocol (i.e. streaming lib or not).
</p><p>
What I'm saying is you have to take the streaming lib into consideration.
So either shift the preferences for everybody or treat streaming lib traffic
differently.
That's what my proposal 1B) is talking about - have a different preference for
streaming-lib traffic than for non streaming-lib traffic (for example tunnel
build messages).
</p><p>
<i>
On that background, a small diversity of transports (as many as needed, but
not more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</i>
</p><p>
Agreed. The new .28 may have made things better for packet loss over UDP, or
maybe not.
One important point - the transport code does remember failures of a transport.
So if UDP is the preferred transport, it will try it first, but if it fails for
a particular destination, the next attempt for that destination it will try
NTCP rather than trying UDP again.
</p><p>
<i>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</i>
</p><p>
We have four tuning knobs - the four bid values (SSU and NTCP, for
already-connected and not-already-connected).
We could make SSU be preferred over NTCP only if both are connected, for
example, but try NTCP first if neither transport is connected.
</p><p>
The other way to do it gradually is only shifting the streaming lib traffic
(the 1B proposal) however that could be hard and may have anonymity
implications, I don't know. Or maybe shift the traffic only for the first
outbound hop (i.e. don't propagate the flag to the next router), which gives
you only partial benefit but might be more anonymous and easier.
</p>
<h3>Results of the Discussion</h3>
... and other related changes in the same timeframe (2007):
<ul>
<li>
Significant tuning of the streaming lib parameters,
greatly increasing outbound performance, was implemented in 0.6.1.28
<li>
Priority sending for NTCP was implemented in 0.6.1.28
<li>
Priority sending for SSU was implemented by zzz but was never checked in
<li>
The advanced transport bid control
i2np.udp.preferred was implemented in 0.6.1.29.
<li>
Pushback for NTCP was implemented in 0.6.1.30, disabled in 0.6.1.31 due to anonymity concerns,
and re-enabled with improvements to address those concerns in 0.6.1.32.
<li>
None of zzz's proposals 1-5 have been implemented.
</ul>
{% endblock %} {% endblock %}

View File

@ -0,0 +1,559 @@
{% extends "_layout.html" %}
{% block title %}NTCP Discussion{% endblock %}
{% block content %}
Following is a discussion about NTCP that took place in March 2007.
It has not been updated to reflect the current implementation.
For the current NTCP specification see <a href="ntcp.html">the main NTCP page</a>.
<h2>NTCP vs. SSU Discussion, March 2007</h2>
<h3>NTCP questions</h3>
(adapted from an IRC discussion between zzz and cervantes)
<br />
Why is NTCP preferred over SSU, doesn't NTCP have higher overhead and latency?
It has better reliability.
<br />
Doesn't streaming lib over NTCP suffer from classic TCP-over-TCP issues?
What if we had a really simple UDP transport for streaming-lib-originated traffic?
I think SSU was meant to be the so-called really simple UDP transport - but it just proved too unreliable.
<h3>"NTCP Considered Harmful" Analysis by zzz</h3>
Posted to new Syndie, 2007-03-25.
This was posted to stimulate discussion, don't take it too seriously.
<p>
Summary: NTCP has higher latency and overhead than SSU, and is more likely to
collapse when used with the streaming lib. However, traffic is routed with a
preference for NTCP over SSU and this is currently hardcoded.
</p>
<h4>Discussion</h4>
<p>
We currently have two transports, NTCP and SSU. As currently implemented, NTCP
has lower "bids" than SSU so it is preferred, except for the case where there
is an established SSU connection but no established NTCP connection for a peer.
</p><p>
SSU is similar to NTCP in that it implements acknowledgments, timeouts, and
retransmissions. However SSU is I2P code with tight constraints on the
timeouts and available statistics on round trip times, retransmissions, etc.
NTCP is based on Java NIO TCP, which is a black box and presumably implements
RFC standards, including very long maximum timeouts.
</p><p>
The majority of traffic within I2P is streaming-lib originated (HTTP, IRC,
Bittorrent) which is our implementation of TCP. As the lower-level transport is
generally NTCP due to the lower bids, the system is subject to the well-known
and dreaded problem of TCP-over-TCP
http://sites.inka.de/~W1011/devel/tcp-tcp.html , where both the higher and
lower layers of TCP are doing retransmissions at once, leading to collapse.
</p><p>
Unlike in the PPP over SSH scenario described in the link above, we have
several hops for the lower layer, each covered by a NTCP link. So each NTCP
latency is generally much less than the higher-layer streaming lib latency.
This lessens the chance of collapse.
</p><p>
Also, the probabilities of collapse are lessened when the lower-layer TCP is
tightly constrained with low timeouts and number of retransmissions compared to
the higher layer.
</p><p>
The .28 release increased the maximum streaming lib timeout from 10 sec to 45
sec which greatly improved things. The SSU max timeout is 3 sec. The NTCP max
timeout is presumably at least 60 sec, which is the RFC recommendation. There
is no way to change NTCP parameters or monitor performance. Collapse of the
NTCP layer is [editor: text lost]. Perhaps an external tool like tcpdump would help.
</p><p>
However, running .28, the i2psnark reported upstream does not generally stay at
a high level. It often goes down to 3-4 KBps before climbing back up. This is a
signal that there are still collapses.
</p><p>
SSU is also more efficient. NTCP has higher overhead and probably higher round
trip times. When using NTCP, the ratio of (tunnel output) / (i2psnark data
output) is at least 3.5 : 1. Running an experiment where the code was modified
to prefer SSU (the config option i2np.udp.alwaysPreferred has no effect in the
current code), the ratio reduced to about 3 : 1, indicating better efficiency.
</p><p>
As reported by streaming lib stats, things were much improved - lifetime window
size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per ack down from
1.11 to 1.07.
</p><p>
That this was quite effective was surprising, given that we were only changing
the transport for the first of 3 to 5 total hops the outbound messages would
take.
</p><p>
The effect on outbound i2psnark speeds wasn't clear due to normal variations.
Also for the experiment, inbound NTCP was disabled. The effect on inbound
speeds on i2psnark was not clear.
</p>
<h4>Proposals</h4>
<ul>
<li>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
<li>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic and have SSU generate a low bid
for that traffic. This tag will have to go with the message through each hop
so that the forwarding routers also honor the SSU preference.
<li>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
<li>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ack-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
<li>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
<li>
5)
The new streaming lib max timeout of 45s is probably still too low.
The TCP RFC says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout (presumably 60s).
</ul>
<h3>Response by jrandom</h3>
Posted to new Syndie, 2007-03-27
<p>
On the whole, I'm open to experimenting with this, though remember why NTCP is
there in the first place - SSU failed in a congestion collapse. NTCP "just
works", and while 2-10% retransmission rates can be handled in normal
single-hop networks, that gives us a 40% retransmission rate with 2 hop
tunnels. If you loop in some of the measured SSU retransmission rates we saw
back before NTCP was implemented (10-30+%), that gives us an 83% retransmission
rate. Perhaps those rates were caused by the low 10 second timeout, but
increasing that much would bite us (remember, multiply by 5 and you've got half
the journey).
</p><p>
Unlike TCP, we have no feedback from the tunnel to know whether the message
made it - there are no tunnel level acks. We do have end to end ACKs, but only
on a small number of messages (whenever we distribute new session tags) - out
of the 1,553,591 client messages my router sent, we only attempted to ACK
145,207 of them. The others may have failed silently or succeeded perfectly.
</p><p>
I'm not convinced by the TCP-over-TCP argument for us, especially split across
the various paths we transfer down. Measurements on I2P can convince me
otherwise, of course.
</p><p>
<i>
The NTCP max timeout is presumably at least 60 sec, which is the RFC
recommendation. There is no way to change NTCP parameters or monitor
performance.
</i>
</p><p>
True, but net connections only get up to that level when something really bad
is going on - the retransmission timeout on TCP is often on the order of tens
or hundreds of milliseconds. As foofighter points out, they've got 20+ years
experience and bugfixing in their TCP stacks, plus a billion dollar industry
optimizing hardware and software to perform well according to whatever it is
they do.
</p><p>
<i>
NTCP has higher overhead and probably higher round trip times. When using NTCP
the ratio of (tunnel output) / (i2psnark data output) is at least 3.5 : 1.
Running an experiment where the code was modified to prefer SSU (the config
option i2np.udp.alwaysPreferred has no effect in the current code), the ratio
reduced to about 3 : 1, indicating better efficiency.
</i>
</p><p>
This is very interesting data, though more as a matter of router congestion
than bandwidth efficiency - you'd have to compare 3.5*$n*$NTCPRetransmissionPct
./. 3.0*$n*$SSURetransmissionPct. This data point suggests there's something in
the router that leads to excess local queuing of messages already being
transferred.
</p><p>
<i>
lifetime window size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per
ACK down from 1.11 to 1.07.
</i>
</p><p>
Remember that the sends-per-ACK is only a sample, not a full count (as we don't
try to ACK every send). It's not a random sample either, but instead samples
more heavily periods of inactivity or the initiation of a burst of activity -
sustained load won't require many ACKs.
</p><p>
Window sizes in that range are still woefully low to get the real benefit of
AIMD, and still too low to transmit a single 32KB BT chunk (increasing the
floor to 10 or 12 would cover that).
</p><p>
Still, the wsize stat looks promising - over how long was that maintained?
</p><p>
Actually, for testing purposes, you may want to look at
StreamSinkClient/StreamSinkServer or even TestSwarm in
apps/ministreaming/java/src/net/i2p/client/streaming/ - StreamSinkClient is a
CLI app that sends a selected file to a selected destination and
StreamSinkServer creates a destination and writes out any data sent to it
(displaying size and transfer time). TestSwarm combines the two - flooding
random data to whomever it connects to. That should give you the tools to
measure sustained throughput capacity over the streaming lib, as opposed to BT
choke/send.
</p><p>
<i>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
</i>
</p><p>
Honoring i2np.udp.alwaysPreferred is a good idea in any case - please feel free
to commit that change. Let's gather a bit more data though before switching the
preferences, as NTCP was added to deal with an SSU-created congestion collapse.
</p><p>
<i>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic
and have SSU generate a low bid for that traffic. This tag will have to go with
the message through each hop
so that the forwarding routers also honor the SSU preference.
</i>
</p><p>
In practice, there are three types of traffic - tunnel building/testing, netDb
query/response, and streaming lib traffic. The network has been designed to
make differentiating those three very hard.
</p><p>
<i>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
</i>
</p><p>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
</p><p>
In my view, to really address the core issue we need to address why the router
gets so congested to ACK in time (which, from what I've found, is due to CPU
contention). Maybe we can juggle some things in the router's processing to make
the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</p><p>
<i>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ACK-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
</i>
</p><p>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</p><p>
<i>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
</i>
</p><p>
There's UDPTransport.PRIORITY_LIMITS and UDPTransport.PRIORITY_WEIGHT (honored
by TimedWeightedPriorityMessageQueue), but currently the weights are almost all
equal, so there's no effect. That could be adjusted, of course (but as you
mention, if there's no queuing, it doesn't matter).
</p><p>
<i>
5)
The new streaming lib max timeout of 45s is probably still too low. The TCP RFC
says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout
(presumably 60s).
</i>
</p><p>
That 45s is the max retransmission timeout of the streaming lib though, not the
stream timeout. TCP in practice has retransmission timeouts orders of magnitude
less, though yes, can get to 60s on links running through exposed wires or
satellite transmissions ;) If we increase the streaming lib retransmission
timeout to e.g. 75 seconds, we could go get a beer before a web page loads
(especially assuming less than a 98% reliable transport). That's one reason we
prefer NTCP.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-31
<p>
<i>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
<br>
In my view, to really address the core issue we need to address why the
router gets so congested to ACK in time (which, from what I've found, is due to
CPU contention). Maybe we can juggle some things in the router's processing to
make the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</i>
</p><p>
One of my main stats-gathering techniques is turning on
net.i2p.client.streaming.ConnectionPacketHandler=DEBUG and watching the RTT
times and window sizes as they go by. To overgeneralize for a moment, it's
common to see 3 types of connections: ~4s RTT, ~10s RTT, and ~30s RTT. Trying
to knock down the 30s RTT connections is the goal. If CPU contention is the
cause then maybe some juggling will do it.
</p><p>
Reducing the SSU max retrans from 10 is really just a stab in the dark as we
don't have good data on whether we are collapsing, having TCP-over-TCP issues,
or what, so more data is needed.
</p><p>
<i>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</i>
</p><p>
What I don't understand, if you could elaborate, are the benefits of SSU
retransmissions for non-streaming-lib traffic. Do we need tunnel messages (for
example) to use a semi-reliable transport or can they use an unreliable or
kinda-sorta-reliable transport (1 or 2 retransmissions max, for example)? In
other words, why semi-reliability?
</p><p>
<i>
(but as you mention, if there's no queuing, it doesn't matter).
</i>
</p><p>
I implemented priority sending for UDP but it kicked in about 100,000 times
less often than the code on the NTCP side. Maybe that's a clue for further
investigation or a hint - I don't understand why it would back up that much
more often on NTCP, but maybe that's a hint on why NTCP performs worse.
</p>
<h3>Question answered by jrandom</h3>
Posted to new Syndie, 2007-03-31
<p>
measured SSU retransmission rates we saw back before NTCP was implemented
(10-30+%)
</p><p>
Can the router itself measure this? If so, could a transport be selected based
on measured performance? (i.e. if an SSU connection to a peer is dropping an
unreasonable number of messages, prefer NTCP when sending to that peer)
</p><p>
Yeah, it currently uses that stat as a poor-man's MTU detection (if
the retransmission rate is high, it uses the small packet size, but if it's low,
it uses the large packet size). We tried a few things when first introducing
NTCP (and when first moving away from the original TCP transport) that would
prefer SSU but fail that transport for a peer easily, causing it to fall back
on NTCP. However, there's certainly more that could be done in that regard,
though it gets complicated quickly (how/when to adjust/reset the bids, whether
to share these preferences across multiple peers or not, whether to share it
across multiple sessions with the same peer (and for how long), etc).
<h3>Response by foofighter</h3>
Posted to new Syndie, 2007-03-26
<p>
If I've understood things right, the primary reason in favor of TCP (in
general, both the old and new variety) was that you needn't worry about coding
a good TCP stack. Which ain't impossibly hard to get right... just that
existing TCP stacks have a 20 year lead.
</p><p>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
<ul>
<li>
A TCP-only network is very dependent on reachable peers (those who can forward
incoming connections through their NAT)
<li>
Still even if reachable peers are rare, having them be high capacity somewhat
alleviates the topological scarcity issues
<li>
UDP allows for "NAT hole punching" which lets people be "kind of
pseudo-reachable" (with the help of introducers) who could otherwise only
connect out
<li>
The "old" TCP transport implementation required lots of threads, which was a
performance killer, while the "new" TCP transport does well with few threads
<li>
Routers of set A crap out when saturated with UDP. Routers of set B crap out
when saturated with TCP.
<li>
It "feels" (as in, there are some indications but no scientific data or
quality statistics) that A is more widely deployed than B
<li>
Some networks carry non-DNS UDP datagrams with an outright shitty quality,
while still somewhat bothering to carry TCP streams.
</ul>
</p><p>
On that background, a small diversity of transports (as many as needed, but not
more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</p><p>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-27
<p>
<i>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
</i>
</p><p>
These are all valid issues. However you are considering the two protocols in
isolation, rather than thinking about what transport protocol is best for a
particular higher-level protocol (i.e. streaming lib or not).
</p><p>
What I'm saying is you have to take the streaming lib into consideration.
So either shift the preferences for everybody or treat streaming lib traffic
differently.
That's what my proposal 1B) is talking about - have a different preference for
streaming-lib traffic than for non streaming-lib traffic (for example tunnel
build messages).
</p><p>
<i>
On that background, a small diversity of transports (as many as needed, but
not more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</i>
</p><p>
Agreed. The new .28 may have made things better for packet loss over UDP, or
maybe not.
One important point - the transport code does remember failures of a transport.
So if UDP is the preferred transport, it will try it first, but if it fails for
a particular destination, the next attempt for that destination it will try
NTCP rather than trying UDP again.
</p><p>
<i>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</i>
</p><p>
We have four tuning knobs - the four bid values (SSU and NTCP, for
already-connected and not-already-connected).
We could make SSU be preferred over NTCP only if both are connected, for
example, but try NTCP first if neither transport is connected.
</p><p>
The other way to do it gradually is only shifting the streaming lib traffic
(the 1B proposal) however that could be hard and may have anonymity
implications, I don't know. Or maybe shift the traffic only for the first
outbound hop (i.e. don't propagate the flag to the next router), which gives
you only partial benefit but might be more anonymous and easier.
</p>
<h3>Results of the Discussion</h3>
... and other related changes in the same timeframe (2007):
<ul>
<li>
Significant tuning of the streaming lib parameters,
greatly increasing outbound performance, was implemented in 0.6.1.28
<li>
Priority sending for NTCP was implemented in 0.6.1.28
<li>
Priority sending for SSU was implemented by zzz but was never checked in
<li>
The advanced transport bid control
i2np.udp.preferred was implemented in 0.6.1.29.
<li>
Pushback for NTCP was implemented in 0.6.1.30, disabled in 0.6.1.31 due to anonymity concerns,
and re-enabled with improvements to address those concerns in 0.6.1.32.
<li>
None of zzz's proposals 1-5 have been implemented.
</ul>
{% endblock %}

View File

@ -2,32 +2,51 @@
{% block title %}Tunnel Creation{% endblock %} {% block title %}Tunnel Creation{% endblock %}
{% block content %} {% block content %}
<b>Note: This documents the current tunnel build implementation as of release 0.6.1.10.</b> This page documents the current tunnel build implementation.
<br> Updated August 2010 for release 0.8
<pre>
1) <a href="#tunnelCreate.overview">Tunnel creation</a>
1.1) <a href="#tunnelCreate.requestRecord">Tunnel creation request record</a>
1.2) <a href="#tunnelCreate.hopProcessing">Hop processing</a>
1.3) <a href="#tunnelCreate.replyRecord">Tunnel creation reply record</a>
1.4) <a href="#tunnelCreate.requestPreparation">Request preparation</a>
1.5) <a href="#tunnelCreate.requestDelivery">Request delivery</a>
1.6) <a href="#tunnelCreate.endpointHandling">Endpoint handling</a>
1.7) <a href="#tunnelCreate.replyProcessing">Reply processing</a>
2) <a href="#tunnelCreate.notes">Notes</a>
</pre>
<h2 id="tunnelCreate.overview">1) Tunnel creation encryption:</h2> <h2 id="tunnelCreate.overview">Tunnel Creation Specification</h2>
<p>
This document specifies the details of the encrypted tunnel build messages
used to create tunnels using a "non-interactive telescoping" method.
See <a href="tunnel-alt.html">the tunnel build document</a>
for an overview of the process, including peer selection and ordering methods.
<p>The tunnel creation is accomplished by a single message passed along
the path of peers in the tunnel, rewritten in place, and transmitted
back to the tunnel creator. This single tunnel message is made up
of a variable number of records (up to 8) - one for each potential peer in
the tunnel. Individual records are asymmetrically encrypted to be
read only by a specific peer along the path, while an additional
symmetric layer of encryption is added at each hop so as to expose
the asymmetrically encrypted record only at the appropriate time.</p>
<h3 id="tunnelCreate.requestRecord">1.1) Tunnel creation request record</h3> <h3 id="number">Number of Records</h3>
Not all records must contain valid data.
The build message for a 3-hop tunnel, for example, may contain more records
to hide the actual length of the tunnel from the participants.
There are two build message types. The original
<a href="i2np_spec.html#msg_TunnelBuild">Tunnel Build Message</a> (TBM)
contains 8 records, which is more than enough for any practical tunnel length.
The recently-implemented
<a href="i2np_spec.html#msg_VariableTunnelBuild">Variable Tunnel Build Message</a> (VTBM)
contains 1 to 8 records. The originator may trade off the size of the message
with the desired amount of tunnel length obfuscation.
<p>
In the current network, most tunnels are 2 or 3 hops long.
The current implementation uses a 5-record VTBM to build tunnels of 4 hops or less,
and the 8-record TBM for longer tunnels.
The 5-record VTBM (which fits in 3 1KB tunnel messaages) reduces network traffic
and increases build sucess rate, because larger messages are less likely to be dropped.
<p>
The reply message must be the same type and length as the build message.
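<p>
The size tradeoff follows directly from the 528-byte record size. Here is a
sketch of the arithmetic; the usable payload of roughly 1000 bytes per 1KB
tunnel message is an approximation for illustration, not a spec value:
</p>
<pre>
{% filter escape %}
// Illustrative size arithmetic for build messages; not router code.
public class BuildMessageSizes {
    static final int RECORD = 528;    // fixed record size, per this spec
    static final int PAYLOAD = 1000;  // approx. usable bytes per 1KB tunnel message (assumption)

    static int ceilDiv(int a, int b) { return (a + b - 1) / b; }

    public static void main(String[] args) {
        int tbm   = 8 * RECORD;       // TunnelBuild: 4224 bytes, no count byte
        int vtbm5 = 1 + 5 * RECORD;   // 5-record VariableTunnelBuild: 2641 bytes
        System.out.println("TBM:   " + tbm   + " bytes, ~" + ceilDiv(tbm, PAYLOAD)   + " tunnel messages");
        System.out.println("VTBM5: " + vtbm5 + " bytes, ~" + ceilDiv(vtbm5, PAYLOAD) + " tunnel messages");
    }
}
{% endfilter %}
</pre>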
<h3 id="tunnelCreate.requestRecord">Request Record Specification</h3>
The request record is also specified in the
<a href="i2np_spec.html#struct_BuildRequestRecord">I2NP Specification</a>.
<p>Cleartext of the record, visible only to the hop being asked:</p><pre>
bytes 0-3: tunnel ID to receive messages as
@ -49,49 +68,79 @@ endpoint, they specify where the rewritten tunnel creation reply
message should be sent. In addition, the next message ID specifies the
message ID that the message (or reply) should use.</p>
<p>The flags field contains the following:
<pre>
Bit order: 76543210 (bit 7 is MSB)
bit 7: if set, allow messages from anyone
bit 6: if set, allow messages to anyone, and send the reply to the
       specified next hop in a tunnel message
bits 5-0: Undefined
</pre>
Bit 7 indicates that the hop will be an inbound gateway (IBGW).
Bit 6 indicates that the hop will be an outbound endpoint (OBEP).
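<p>
As a small illustration of the bit ordering, the two flags could be tested
like this (hypothetical helper names, not the actual router classes):
</p>
<pre>
{% filter escape %}
// Hypothetical helpers for the flag byte; bit 7 is the MSB.
public class BuildFlags {
    static boolean isInboundGateway(byte flag)   { return (flag & 0x80) != 0; } // bit 7: IBGW
    static boolean isOutboundEndpoint(byte flag) { return (flag & 0x40) != 0; } // bit 6: OBEP
}
{% endfilter %}
</pre>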
<h4 id="encryption">Request Encryption</h4>
<p>That cleartext record is <a href="how_cryptography.html#elgamal">ElGamal 2048 encrypted</a> with the hop's
public encryption key and formatted into a 528 byte record:</p><pre>
bytes 0-15: First 16 bytes of the SHA-256 of the current hop's router identity
bytes 16-527: ElGamal-2048 encrypted request record</pre>
<p>Since the cleartext uses the full field, there is no need for
additional padding beyond <code>SHA256(cleartext) + cleartext</code>.</p>
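<p>
A minimal sketch of assembling one encrypted record follows. Here
<code>elGamalEncrypt()</code> is a placeholder for I2P's ElGamal-2048 routine
(which yields the 512-byte block used here); it is not a JDK API, and the
cleartext record is assumed already serialized per the I2NP spec:
</p>
<pre>
{% filter escape %}
import java.security.MessageDigest;

// Sketch only: assembling one 528-byte encrypted request record.
public class RequestRecordBuilder {
    static byte[] buildEncryptedRecord(byte[] hopIdentityBytes,
                                       byte[] cleartextRecord,
                                       byte[] hopPublicKey) throws Exception {
        byte[] rec = new byte[528];
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(hopIdentityBytes);
        System.arraycopy(hash, 0, rec, 0, 16);           // bytes 0-15: hash prefix
        byte[] ct = elGamalEncrypt(cleartextRecord, hopPublicKey);
        System.arraycopy(ct, 0, rec, 16, 512);           // bytes 16-527: ElGamal block
        return rec;
    }

    static byte[] elGamalEncrypt(byte[] data, byte[] pubKey) {
        throw new UnsupportedOperationException("placeholder for I2P's ElGamal-2048");
    }
}
{% endfilter %}
</pre>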
<h3 id="tunnelCreate.hopProcessing">1.2) Hop processing</h3> <h3 id="tunnelCreate.hopProcessing">Hop Processing and Encryption</h3>
<p>When a hop receives a TunnelBuildMessage, it looks through the 8 <p>When a hop receives a TunnelBuildMessage, it looks through the
records contained within it for one starting with their own identity records contained within it for one starting with their own identity
hash (trimmed to 8 bytes). It then decrypts the ElGamal block from hash (trimmed to 8 bytes). It then decrypts the ElGamal block from
that record and retrieves the protected cleartext. At that point, that record and retrieves the protected cleartext. At that point,
they make sure the tunnel request is not a duplicate by feeding the they make sure the tunnel request is not a duplicate by feeding the
AES-256 reply key into a bloom filter and making sure the request AES-256 reply key into a bloom filter.
time is within an hour of current. Duplicates or invalid requests Duplicates or invalid requests
are dropped.</p> are dropped.</p>
<p>After deciding whether they will agree to participate in the tunnel
or not, they replace the record that had contained the request with
an encrypted reply block. All other records are <a href="how_cryptography.html#AES">AES-256/CBC
encrypted</a> with the included reply key and IV (though each is
encrypted separately, rather than chained across records).</p>
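<p>
A sketch of a hop's handling of a received build message is below. Names are
hypothetical, <code>elGamalDecrypt()</code> stands in for I2P's ElGamal
routine, and parsing of the cleartext record is elided:
</p>
<pre>
{% filter escape %}
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch only, not the actual router code.
public class HopProcessor {
    static void process(byte[][] records, byte[] myHashPrefix16,
                        byte[] myElGamalPrivateKey) throws Exception {
        int mine = -1;
        for (int i = 0; i < records.length; i++) {
            if (Arrays.equals(Arrays.copyOf(records[i], 16), myHashPrefix16)) {
                mine = i;
                break;
            }
        }
        if (mine < 0)
            return;                                    // no record for us: drop
        byte[] clear = elGamalDecrypt(
                Arrays.copyOfRange(records[mine], 16, 528), myElGamalPrivateKey);
        byte[] replyKey = new byte[32];                // parsed from 'clear' (elided)
        byte[] replyIv  = new byte[16];                // parsed from 'clear' (elided)
        // ... bloom-filter duplicate check, participation decision,
        //     and replacement of records[mine] with the reply record ...
        Cipher aes = Cipher.getInstance("AES/CBC/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(replyKey, "AES"),
                 new IvParameterSpec(replyIv));
        for (int i = 0; i < records.length; i++) {
            if (i != mine)
                records[i] = aes.doFinal(records[i]);  // each record encrypted separately
        }
    }

    static byte[] elGamalDecrypt(byte[] data, byte[] privKey) {
        throw new UnsupportedOperationException("placeholder for I2P's ElGamal-2048");
    }
}
{% endfilter %}
</pre>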
<h3 id="tunnelCreate.replyRecord">1.3) Tunnel creation reply record</h3> <h4 id="tunnelCreate.replyRecord">Reply Record Specification</h4>
<p>After the current hop reads their record, they replace it with a <p>After the current hop reads their record, they replace it with a
reply record stating whether or not they agree to participate in the reply record stating whether or not they agree to participate in the
tunnel, and if they do not, they classify their reason for tunnel, and if they do not, they classify their reason for
rejection. This is simply a 1 byte value, with 0x0 meaning they rejection. This is simply a 1 byte value, with 0x0 meaning they
agree to participate in the tunnel, and higher values meaning higher agree to participate in the tunnel, and higher values meaning higher
levels of rejection.
<p>
The following rejection codes are defined:
<ul>
<li>
TUNNEL_REJECT_PROBABALISTIC_REJECT = 10
<li>
TUNNEL_REJECT_TRANSIENT_OVERLOAD = 20
<li>
TUNNEL_REJECT_BANDWIDTH = 30
<li>
TUNNEL_REJECT_CRIT = 50
</ul>
To hide other causes, such as router shutdown, from peers, the current implementation
uses TUNNEL_REJECT_BANDWIDTH for almost all rejections.
<h3 id="tunnelCreate.requestPreparation">1.4) Request preparation</h3> <p>
The reply is encrypted with the AES session
key delivered to it in the encrypted block, padded with 495 bytes of random data
to reach the full record size.
The padding is placed before the status byte:
</p><pre>
AES-256-CBC(SHA-256(padding+status) + padding + status, key, IV)</pre>
This is also described in the
<a href="i2np_spec.html#msg_TunnelBuildReply">I2NP spec</a>.
<h3 id="tunnelCreate.requestPreparation">Request Preparation</h3>
<p>When building a new request, all of the records must first be
built and asymmetrically encrypted. Each record should then be
@ -103,31 +152,49 @@ right hop after their predecessor encrypts it.</p>
<p>The excess records not needed for individual requests are simply
filled with random data by the creator.</p>
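<p>
The creator "pre-decrypts" each asymmetrically encrypted record with the
reply keys and IVs of the hops that precede it in the path, so that the
symmetric layers added in transit cancel out and each record is exposed to
the right hop after its predecessor encrypts it. A minimal sketch, assuming
parsed keys in hop order (hypothetical names, not the actual router code):
</p>
<pre>
{% filter escape %}
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: record i will be AES-encrypted in transit by every hop
// before hop i, so the creator applies the matching decryptions in advance.
public class RequestPreparer {
    static void preDecrypt(byte[][] records, byte[][] replyKeys, byte[][] replyIvs)
            throws Exception {
        for (int i = 0; i < records.length; i++) {
            for (int j = i - 1; j >= 0; j--) {   // every earlier hop encrypts record i
                Cipher aes = Cipher.getInstance("AES/CBC/NoPadding");
                aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(replyKeys[j], "AES"),
                         new IvParameterSpec(replyIvs[j]));
                records[i] = aes.doFinal(records[i]);
            }
        }
    }
}
{% endfilter %}
</pre>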
<h3 id="tunnelCreate.requestDelivery">1.5) Request delivery</h3> <h3 id="tunnelCreate.requestDelivery">Request Delivery</h3>
<p>For outbound tunnels, the delivery is done directly from the tunnel <p>For outbound tunnels, the delivery is done directly from the tunnel
creator to the first hop, packaging up the TunnelBuildMessage as if creator to the first hop, packaging up the TunnelBuildMessage as if
the creator was just another hop in the tunnel. For inbound the creator was just another hop in the tunnel. For inbound
tunnels, the delivery is done through an existing outbound tunnel tunnels, the delivery is done through an existing outbound tunnel.
(and during startup, when no outbound tunnel exists yet, a fake 0 The outbound tunnel is generally from the same pool as the new tunnel being built.
hop outbound tunnel is used).</p> If no outbound tunnel is available in that pool, an outbound exploratory tunnel is used.
At startup, when no outbound exploratory tunnel exists yet, a fake 0-hop
outbound tunnel is used.</p>
<h3 id="tunnelCreate.endpointHandling">1.6) Endpoint handling</h3> <h3 id="tunnelCreate.endpointHandling">Endpoint Handling</h3>
<p>When the request reaches an outbound endpoint (as determined by the <p>
For creation of an outbound tunnel,
when the request reaches an outbound endpoint (as determined by the
'allow messages to anyone' flag), the hop is processed as usual, 'allow messages to anyone' flag), the hop is processed as usual,
encrypting a reply in place of the record and encrypting all of the encrypting a reply in place of the record and encrypting all of the
other records, but since there is no 'next hop' to forward the other records, but since there is no 'next hop' to forward the
TunnelBuildMessage on to, it instead places the encrypted reply TunnelBuildMessage on to, it instead places the encrypted reply
records into a TunnelBuildReplyMessage and delivers it to the records into a
<a href="i2np_spec.html#msg_TunnelBuildReply">TunnelBuildReplyMessage</a>
or
<a href="i2np_spec.html#msg_VariableTunnelBuildReply">VariableTunnelBuildReplyMessage</a>
(the type of message and number of records must match that of the request)
and delivers it to the
reply tunnel specified within the request record. That reply tunnel reply tunnel specified within the request record. That reply tunnel
forwards the reply records down to the tunnel creator for forwards the reply records down to the tunnel creator for
processing, as below.</p> processing, as below.</p>
<p>The reply tunnel was specified by the creator as follows:
Generally it is an inbound tunnel from the same pool as the new outbound tunnel being built.
If no inbound tunnel is available in that pool, an inbound exploratory tunnel is used.
At startup, when no inbound exploratory tunnel exists yet, a fake 0-hop
inbound tunnel is used.</p>
<h3 id="tunnelCreate.replyProcessing">1.7) Reply processing</h3> <p>
For creation of an inbound tunnel,
when the request reaches the inbound endpoint (also known as the
tunnel creator), there is no need to generate an explicit Reply Message, and
the router processes each of the replies, as below.</p>
<h3 id="tunnelCreate.replyProcessing">Reply Processing by the Request Creator</h3>
<p>To process the reply records, the creator simply has to AES decrypt
each record individually, using the reply key and IV of each hop in
@ -137,18 +204,37 @@ why they refuse. If they all agree, the tunnel is considered
created and may be used immediately, but if anyone refuses, the
tunnel is discarded.</p>
<h2 id="tunnelCreate.notes">2) Notes</h2> <p>
The agreements and rejections are noted in each peer's
<a href="how_peerselection.html">profile</a>, to be used in future assessments
of peer tunnel capacity.
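<p>
A sketch of the creator's reply processing follows (hypothetical names, not
the actual router code). Reply record i was AES-encrypted by hop i and then
by every later hop, so the creator peels those layers in reverse order:
</p>
<pre>
{% filter escape %}
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch only, not the actual router code.
public class ReplyReader {
    static int[] readReplies(byte[][] records, byte[][] replyKeys, byte[][] replyIvs)
            throws Exception {
        int n = records.length;
        int[] status = new int[n];
        for (int i = 0; i < n; i++) {
            byte[] rec = records[i];
            for (int j = n - 1; j >= i; j--) {   // hops i..n-1 each encrypted record i
                Cipher aes = Cipher.getInstance("AES/CBC/NoPadding");
                aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(replyKeys[j], "AES"),
                         new IvParameterSpec(replyIvs[j]));
                rec = aes.doFinal(rec);
            }
            // verify SHA-256(padding + status) before trusting the status byte
            status[i] = rec[527] & 0xff;         // 0x00 = agree, else rejection code
        }
        return status;
    }
}
{% endfilter %}
</pre>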
<h2 id="tunnelCreate.notes">History and Notes</h2>
<p>
This strategy came about during a discussion on the I2P mailing list
between Michael Rogers, Matthew Toseland (toad), and jrandom regarding
the predecessor attack. See: <ul>
<li><a href="http://osdir.com/ml/network.i2p/2005-10/msg00138.html">Summary</a></li>
<li><a href="http://osdir.com/ml/network.i2p/2005-10/msg00129.html">Reasoning</a></li>
</ul>
It was introduced in release 0.6.1.10 on 2006-02-16, which was the last time
a non-backward-compatible change was made in I2P.
</p>
<p>
Notes:
<ul>
<li>This design does not prevent two hostile peers within a tunnel from
tagging one or more request or reply records to detect that they are
within the same tunnel, but doing so can be detected by the tunnel
creator when reading the reply, causing the tunnel to be marked as
invalid.</li>
<li>This design does not include a proof of work on the asymmetrically
encrypted section, though the 16 byte identity hash could be cut in
half with the latter replaced by a hashcash function of up to 2^64
cost.</li>
<li>This design alone does not prevent two hostile peers within a tunnel from
using timing information to determine whether they are in the same
tunnel. The use of batched and synchronized request delivery
could help (batching up requests and sending them off on the
@ -159,12 +245,34 @@ window would work (though doing that would require a high degree of
clock synchronization). Alternately, perhaps individual hops could
inject a random delay before forwarding on the request?</li>
<li>Are there any nonfatal methods of tagging the request?</li>
</ul>
<h2 id="ref">References</h2>
<ul>
<li>
<a href="http://prisms.cs.umass.edu/brian/pubs/wright-tissec.pdf">Predecessor
attack</a>
<li>
<a href="http://prisms.cs.umass.edu/brian/pubs/wright.tissec.2008.pdf">2008
update</a>
</ul>
<h2 id="future">Future Work</h2>
<ul>
<li>
It appears that, in the current implementation, the originator leaves one record empty
for itself, which is not necessary. Thus a message of n records can only build a
tunnel of n-1 hops. This is to be researched and verified.
If it is possible to use the remaining record without compromising anonymity,
we should do so.
<li>
The usefulness of a timestamp with an hour resolution is questionable,
and the constraint is not currently enforced.
Therefore the request time field is unused.
This should be researched and possibly changed.
<li>
Further analysis of possible tagging and timing attacks described in the above notes.
</ul>
{% endblock %}