- tunnel-alt-creation rework

- More how_crypto and i2np_spec fixups
- Quick NTCP fixup, move discussion to new page
zzz
2010-08-04 14:19:34 +00:00
parent 78f90bab94
commit 5e1cff3fdc
5 changed files with 831 additions and 682 deletions

View File

@ -35,8 +35,8 @@ block is formatted (in network byte order):
<p> <p>
The H(data) is the SHA256 of the data that is encrypted in the ElGamal block, The H(data) is the SHA256 of the data that is encrypted in the ElGamal block,
and is preceded by a random nonzero byte. The data encrypted in the block and is preceded by a random nonzero byte. The data encrypted in the block
can be up to 222 bytes long. Specifically, see can be up to 223 bytes long. See
<a href="http://docs.i2p2.de/core/net/i2p/crypto/ElGamalEngine.html">[the code]</a>. <a href="http://docs.i2p2.de/core/net/i2p/crypto/ElGamalEngine.html">the ElGamal Javadoc</a>.
<p> <p>
ElGamal is never used on its own in I2P, but instead always as part of ElGamal is never used on its own in I2P, but instead always as part of
<a href="how_elgamalaes">ElGamal/AES+SessionTag</a>. <a href="how_elgamalaes">ElGamal/AES+SessionTag</a>.

View File

@ -174,7 +174,7 @@ iv_key :: SessionKey
reply_key :: SessionKey reply_key :: SessionKey
length -> 32 bytes length -> 32 bytes
reply_iv :: Integer reply_iv :: data
length -> 16 bytes length -> 16 bytes
flag :: Integer flag :: Integer
@ -182,6 +182,7 @@ flag :: Integer
request_time :: Integer request_time :: Integer
length -> 4 bytes length -> 4 bytes
Hours since the epoch, i.e. current time / 3600
send_message_id :: Integer send_message_id :: Integer
length -> 4 bytes length -> 4 bytes
@ -191,17 +192,27 @@ padding :: Data
source -> random source -> random
total length: 223
encrypted: encrypted:
toPeer :: Hash toPeer :: Hash
length -> 16 bytes length -> 16 bytes
encrypted_data :: ElGamal-2048 encrypted data encrypted_data :: ElGamal-2048 encrypted data
length -> 514 length -> 512
total length: 528
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
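<p>
As a small illustration of the request_time convention noted above (hours since the epoch, stored as a 4-byte integer), here is a hedged Java sketch; the names are hypothetical and this is not the I2P implementation.
</p>
<pre>
import java.nio.ByteBuffer;

// Sketch only: encode request_time as a 4-byte count of hours since the epoch.
public class RequestTimeSketch {
    public static byte[] encodeRequestTime(long nowMillis) {
        int hoursSinceEpoch = (int) (nowMillis / 1000L / 3600L);
        return ByteBuffer.allocate(4).putInt(hoursSinceEpoch).array();
    }
}
</pre>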
<h3 id="struct_BuildResponseRecord">BuildResponseRecord</h3> <h3 id="struct_BuildResponseRecord">BuildResponseRecord</h3>
<pre> <pre>
{% filter escape %} {% filter escape %}
@ -224,9 +235,17 @@ byte 527 : reply
encrypted: encrypted:
bytes 0-527: AES-encrypted record (note: same size as BuildRequestRecord!) bytes 0-527: AES-encrypted record (note: same size as BuildRequestRecord!)
total length: 528
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
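<p>
For illustration only, a minimal Java sketch of reading the reply status from a decrypted 528-byte BuildResponseRecord, where byte 527 carries the reply as shown above; the names are hypothetical and this is not the I2P implementation.
</p>
<pre>
// Sketch only: the reply status is the last byte of the 528-byte record.
public class BuildResponseSketch {
    public static int replyStatus(byte[] decryptedRecord) {
        if (decryptedRecord.length != 528)
            throw new IllegalArgumentException("expected a 528-byte record");
        return decryptedRecord[527] & 0xff;   // read as an unsigned value
    }
}
</pre>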
<h2 id="messages">Messages</h2> <h2 id="messages">Messages</h2>
<table border=1> <table border=1>
@ -667,6 +686,11 @@ Total size: 8*528 = 4224 bytes
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
<h3 id="msg_TunnelBuildReply">TunnelBuildReply</h3> <h3 id="msg_TunnelBuildReply">TunnelBuildReply</h3>
<pre> <pre>
@ -675,6 +699,11 @@ same format as TunnelBuild message
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
<h3 id="msg_VariableTunnelBuild">VariableTunnelBuild</h3> <h3 id="msg_VariableTunnelBuild">VariableTunnelBuild</h3>
<pre> <pre>
{% filter escape %} {% filter escape %}
@ -697,9 +726,19 @@ Total size: 1 + $num*528
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>
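<p>
The size formulas above (8*528 for TunnelBuild, 1 + $num*528 for VariableTunnelBuild) can be made concrete with a small Java sketch; the class and method names are illustrative only.
</p>
<pre>
// Sketch only: size arithmetic for the two build message types.
public class BuildMessageSizes {
    static final int RECORD_LEN = 528;

    static int tunnelBuildSize() {
        return 8 * RECORD_LEN;                 // always 4224 bytes
    }

    static int variableTunnelBuildSize(int numRecords) {
        return 1 + numRecords * RECORD_LEN;    // 1-byte record count + records
    }

    public static void main(String[] args) {
        System.out.println(tunnelBuildSize());            // 4224
        System.out.println(variableTunnelBuildSize(5));   // 2641
    }
}
</pre>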
<h3 id="msg_VariableTunnelBuildReply">VariableTunnelBuildReply</h3> <h3 id="msg_VariableTunnelBuildReply">VariableTunnelBuildReply</h3>
<pre> <pre>
{% filter escape %} {% filter escape %}
same format as VariableTunnelBuild message same format as VariableTunnelBuild message
{% endfilter %} {% endfilter %}
</pre> </pre>
<h4>Notes</h4>
<p>
See also the <a href="tunnel-alt-creation.html">tunnel creation specification</a>.
</p>

View File

@ -2,20 +2,25 @@
{% block title %}NTCP{% endblock %} {% block title %}NTCP{% endblock %}
{% block content %} {% block content %}
<h1>NTCP (NIO-based TCP)</h1> Updated August 2010 for release 0.8
<h2>NTCP (NIO-based TCP)</h2>
<p> <p>
NTCP was introduced in I2P 0.6.1.22. NTCP
It is a Java NIO-based transport, enabled by default for outbound is one of two <a href="transport.html">transports</a> currently implemented in I2P.
connections only. Those who configure their NAT/firewall to allow The other is <a href="udp.html">SSU</a>.
inbound connections and specify the external host and port NTCP
(dyndns/etc is okay) on /config.jsp can receive inbound connections. is a Java NIO-based transport
NTCP is NIO based, so it doesn't suffer from the 1 thread per connection issues of the old TCP transport. introduced in I2P release 0.6.1.22.
Java NIO (new I/O) does not suffer from the 1 thread per connection issues of the old TCP transport.
</p><p> </p><p>
As of 0.6.1.29, NTCP uses the IP/Port By default,
NTCP uses the IP/Port
auto-detected by SSU. When enabled on config.jsp, auto-detected by SSU. When enabled on config.jsp,
SSU will notify/restart NTCP when the external address changes. SSU will notify/restart NTCP when the external address changes
or when the firewall status changes.
Now you can enable inbound TCP without a static IP or dyndns service. Now you can enable inbound TCP without a static IP or dyndns service.
</p><p> </p><p>
@ -23,71 +28,47 @@ The NTCP code within I2P is relatively lightweight (1/4 the size of the SSU code
because it uses the underlying Java TCP transport. because it uses the underlying Java TCP transport.
</p> </p>
<h2>Transport Bids and Transport Comparison</h2>
<h2>NTCP Protocol Specification</h2>
<h3>Standard Message Format</h3>
<p> <p>
I2P supports multiple transports simultaneously. The NTCP transport sends individual I2NP messages AES/256/CBC encrypted with
A particular transport for an outbound connection is selected with "bids". a simple checksum. The unencrypted message is encoded as follows:
Each transport bids for the connection and the relative value of these bids
assigns the priority.
Transports may reply with different bids, depending on whether there is
already an established connection to the peer.
</p><p>
To compare the performance of UDP and NTCP,
you can adjust the value of i2np.udp.preferred in configadvanced.jsp
(introduced in I2P 0.6.1.29).
Possible settings are
"false" (default), "true", and "always".
Default setting results in same behavior as before
(NTCP is preferred unless it isn't established and UDP is established).
</p><p>
The table below shows the new bid values. A lower bid is a higher priority.
<p>
<table border=1>
<tr>
<td><td colspan=3>i2np.udp.preferred setting
<tr>
<td>Transport<td>false<td>true<td>always
<tr>
<td>NTCP Established<td>25<td>25<td>25
<tr>
<td>UDP Established<td>50<td>15<td>15
<tr>
<td>NTCP Not established<td>70<td>70<td>70
<tr>
<td>UDP Not established<td>1000<td>65<td>20
</table>
<h2>NTCP Transport Protocol</h2>
<pre> <pre>
* Coordinate the connection to a single peer.
*
* The NTCP transport sends individual I2NP messages AES/256/CBC encrypted with
* a simple checksum. The unencrypted message is encoded as follows:
* +-------+-------+--//--+---//----+-------+-------+-------+-------+ * +-------+-------+--//--+---//----+-------+-------+-------+-------+
* | sizeof(data) | data | padding | adler checksum of sz+data+pad | * | sizeof(data) | data | padding | Adler checksum of sz+data+pad |
* +-------+-------+--//--+---//----+-------+-------+-------+-------+ * +-------+-------+--//--+---//----+-------+-------+-------+-------+
* That message is then encrypted with the DH/2048 negotiated session key </pre>
* (station to station authenticated per the EstablishState class) using the That message is then encrypted with the DH/2048 negotiated session key
* last 16 bytes of the previous encrypted message as the IV. (station to station authenticated per the EstablishState class) using the
* last 16 bytes of the previous encrypted message as the IV.
* One special case is a metadata message where the sizeof(data) is 0. In </p>
* that case, the unencrypted message is encoded as:
<p>
0-15 bytes of padding are required to bring the total message length
(including the six size and checksum bytes) to a multiple of 16.
The maximum message size is currently 16 KB.
Therefore the maximum data size is currently 16 KB - 6, or 16378 bytes.
The minimum data size is 1.
</p>
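<p>
To make the framing above concrete, here is a hedged Java sketch (not the I2P implementation; names are hypothetical) that builds the unencrypted message: a 2-byte size, the data, padding to a multiple of 16 bytes, and a 4-byte Adler-32 checksum over size + data + padding. The padding contents are not specified in the text above, so zero bytes are used here.
</p>
<pre>
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

// Sketch only: frame an I2NP message for NTCP as described above.
public class NtcpFrameSketch {
    public static byte[] frame(byte[] data) {
        if (data.length > 16378)
            throw new IllegalArgumentException("data exceeds the current 16 KB - 6 limit");
        int unpadded = 2 + data.length + 4;              // size + data + checksum
        int padLen = (16 - (unpadded % 16)) % 16;        // pad to a multiple of 16
        ByteBuffer buf = ByteBuffer.allocate(unpadded + padLen);
        buf.putShort((short) data.length);
        buf.put(data);
        buf.put(new byte[padLen]);                       // padding (contents unspecified here)
        Adler32 sum = new Adler32();
        sum.update(buf.array(), 0, 2 + data.length + padLen);
        buf.putInt((int) sum.getValue());                // Adler checksum of sz+data+pad
        return buf.array();
    }
}
</pre>
<p>
The framed bytes are then AES/256/CBC encrypted with the negotiated session key, using the last 16 bytes of the previous encrypted message as the IV, as described above.
</p>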
<h3>Time Sync Message Format</h3>
<p>
One special case is a metadata message where the sizeof(data) is 0. In
that case, the unencrypted message is encoded as:
<pre>
* +-------+-------+-------+-------+-------+-------+-------+-------+ * +-------+-------+-------+-------+-------+-------+-------+-------+
* | 0 | timestamp in seconds | uninterpreted * | 0 | timestamp in seconds | uninterpreted
* +-------+-------+-------+-------+-------+-------+-------+-------+ * +-------+-------+-------+-------+-------+-------+-------+-------+
* uninterpreted | adler checksum of sz+data+pad | * uninterpreted | Adler checksum of bytes 0-11 |
* +-------+-------+-------+-------+-------+-------+-------+-------+ * +-------+-------+-------+-------+-------+-------+-------+-------+
*
*
</pre> </pre>
Total length: 16 bytes. The time sync message is sent at approximately 15 minute intervals.
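<p>
As an illustration only, the following Java sketch builds such a 16-byte time sync message, assuming the layout is a 2-byte size field of zero, a 4-byte timestamp in seconds, 6 uninterpreted bytes, and the 4-byte Adler checksum of bytes 0-11. The exact field offsets are inferred from the diagram above and may differ from the implementation.
</p>
<pre>
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

// Sketch only: assumed layout of the 16-byte time sync (metadata) message.
public class NtcpTimeSyncSketch {
    public static byte[] build(long nowMillis) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putShort((short) 0);                  // size 0 marks a metadata message
        buf.putInt((int) (nowMillis / 1000L));    // timestamp, seconds since the epoch
        buf.put(new byte[6]);                     // uninterpreted bytes
        Adler32 sum = new Adler32();
        sum.update(buf.array(), 0, 12);           // Adler checksum of bytes 0-11
        buf.putInt((int) sum.getValue());
        return buf.array();
    }
}
</pre>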
<h3>Establishment Sequence</h3>
In the establish state, the following communication happens. In the establish state, the following communication happens.
There is a 2048-bit Diffie Hellman exchange. There is a 2048-bit Diffie Hellman exchange.
For more information see the <a href="how_cryptography.html#tcp">cryptography page</a>. For more information see the <a href="how_cryptography.html#tcp">cryptography page</a>.
@ -99,571 +80,33 @@ For more information see the <a href="how_cryptography.html#tcp">cryptography pa
* E(#+Alice.identity+tsA+padding+S(X+Y+Bob.identHash+tsA+tsB+padding), sk, hX_xor_Bob.identHash[16:31])---> * E(#+Alice.identity+tsA+padding+S(X+Y+Bob.identHash+tsA+tsB+padding), sk, hX_xor_Bob.identHash[16:31])--->
* <----------------------E(S(X+Y+Alice.identHash+tsA+tsB)+padding, sk, prev) * <----------------------E(S(X+Y+Alice.identHash+tsA+tsB)+padding, sk, prev)
</pre> </pre>
Todo: Explain this in words.
<h3>Check Connection Message</h3>
Alternately, when Bob receives a connection, it could be a Alternately, when Bob receives a connection, it could be a
check connection (perhaps prompted by Bob asking for someone check connection (perhaps prompted by Bob asking for someone
to verify his listener). to verify his listener).
It does not appear that 'check connection' is used. Check Connection is not currently used.
However, for the record, check connections are formatted as follows: However, for the record, check connections are formatted as follows.
<pre> A check info connection will receive 256 bytes containing:
* a check info connection will receive 256 bytes containing: <ul>
* - 32 bytes of uninterpreted, ignored data <li> 32 bytes of uninterpreted, ignored data
* - 1 byte size <li> 1 byte size
* - that many bytes making up the local router's IP address (as reached by the remote side) <li> that many bytes making up the local router's IP address (as reached by the remote side)
* - 2 byte port number that the local router was reached on <li> 2 byte port number that the local router was reached on
* - 4 byte i2p network time as known by the remote side (seconds since the epoch) <li> 4 byte i2p network time as known by the remote side (seconds since the epoch)
* - uninterpreted padding data, up to byte 223 <li> uninterpreted padding data, up to byte 223
* - xor of the local router's identity hash and the SHA256 of bytes 32 through bytes 223 <li> xor of the local router's identity hash and the SHA256 of bytes 32 through bytes 223
</ul>
</pre> </pre>
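<p>
For illustration only, a Java sketch that parses the (unused) 256-byte check connection packet following the field list above; the names are hypothetical and this is not the I2P implementation.
</p>
<pre>
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.Arrays;

// Sketch only: parse a 256-byte check connection packet as listed above.
public class CheckConnectionSketch {
    public static void parse(byte[] pkt, byte[] localIdentHash) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(pkt, 32, pkt.length - 32);  // skip 32 ignored bytes
        int ipLen = buf.get() & 0xff;                 // 1-byte IP length
        byte[] ip = new byte[ipLen];
        buf.get(ip);                                  // our IP as seen by the remote side
        int port = buf.getShort() & 0xffff;           // 2-byte port
        long time = buf.getInt() & 0xffffffffL;       // 4-byte seconds since the epoch

        // Trailing 32 bytes: local router's identity hash XOR SHA256(bytes 32-223)
        byte[] h = MessageDigest.getInstance("SHA-256").digest(Arrays.copyOfRange(pkt, 32, 224));
        boolean valid = true;
        for (int i = 0; i < 32; i++)
            valid &= pkt[224 + i] == (byte) (localIdentHash[i] ^ h[i]);
        System.out.println("ip bytes=" + ipLen + " port=" + port +
                           " time=" + time + " valid=" + valid);
    }
}
</pre>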
<h2>Discussion</h2>
Now on the <a href="ntcp_discussion.html">NTCP Discussion Page</a>.
<h2>NTCP vs. SSU Discussion, March 2007</h2> <h2><a name="future">Future Work</a></h2>
<h3>NTCP questions</h3> <p>The maximum message size should be increased to approximately 32 KB.
(adapted from an IRC discussion between zzz and cervantes)
<br />
Why is NTCP preferred over SSU, doesn't NTCP have higher overhead and latency?
It has better reliability.
<br />
Doesn't streaming lib over NTCP suffer from classic TCP-over-TCP issues?
What if we had a really simple UDP transport for streaming-lib-originated traffic?
I think SSU was meant to be the so-called really simple UDP transport - but it just proved too unreliable.
<h3>"NTCP Considered Harmful" Analysis by zzz</h3>
Posted to new Syndie, 2007-03-25.
This was posted to stimulate discussion, don't take it too seriously.
<p>
Summary: NTCP has higher latency and overhead than SSU, and is more likely to
collapse when used with the streaming lib. However, traffic is routed with a
preference for NTCP over SSU and this is currently hardcoded.
</p> </p>
<h4>Discussion</h4>
<p>
We currently have two transports, NTCP and SSU. As currently implemented, NTCP
has lower "bids" than SSU so it is preferred, except for the case where there
is an established SSU connection but no established NTCP connection for a peer.
</p><p>
SSU is similar to NTCP in that it implements acknowledgments, timeouts, and
retransmissions. However SSU is I2P code with tight constraints on the
timeouts and available statistics on round trip times, retransmissions, etc.
NTCP is based on Java NIO TCP, which is a black box and presumably implements
RFC standards, including very long maximum timeouts.
</p><p>
The majority of traffic within I2P is streaming-lib originated (HTTP, IRC,
Bittorrent) which is our implementation of TCP. As the lower-level transport is
generally NTCP due to the lower bids, the system is subject to the well-known
and dreaded problem of TCP-over-TCP
http://sites.inka.de/~W1011/devel/tcp-tcp.html , where both the higher and
lower layers of TCP are doing retransmissions at once, leading to collapse.
</p><p>
Unlike in the PPP over SSH scenario described in the link above, we have
several hops for the lower layer, each covered by a NTCP link. So each NTCP
latency is generally much less than the higher-layer streaming lib latency.
This lessens the chance of collapse.
</p><p>
Also, the probabilities of collapse are lessened when the lower-layer TCP is
tightly constrained with low timeouts and number of retransmissions compared to
the higher layer.
</p><p>
The .28 release increased the maximum streaming lib timeout from 10 sec to 45
sec which greatly improved things. The SSU max timeout is 3 sec. The NTCP max
timeout is presumably at least 60 sec, which is the RFC recommendation. There
is no way to change NTCP parameters or monitor performance. Collapse of the
NTCP layer is [editor: text lost]. Perhaps an external tool like tcpdump would help.
</p><p>
However, running .28, the i2psnark reported upstream does not generally stay at
a high level. It often goes down to 3-4 KBps before climbing back up. This is a
signal that there are still collapses.
</p><p>
SSU is also more efficient. NTCP has higher overhead and probably higher round
trip times. When using NTCP, the ratio of (tunnel output) / (i2psnark data
output) is at least 3.5 : 1. Running an experiment where the code was modified
to prefer SSU (the config option i2np.udp.alwaysPreferred has no effect in the
current code), the ratio reduced to about 3 : 1, indicating better efficiency.
</p><p>
As reported by streaming lib stats, things were much improved - lifetime window
size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per ack down from
1.11 to 1.07.
</p><p>
That this was quite effective was surprising, given that we were only changing
the transport for the first of 3 to 5 total hops the outbound messages would
take.
</p><p>
The effect on outbound i2psnark speeds wasn't clear due to normal variations.
Also for the experiment, inbound NTCP was disabled. The effect on inbound
speeds on i2psnark was not clear.
</p>
<h4>Proposals</h4>
<ul>
<li>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
<li>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic and have SSU generate a low bid
for that traffic. This tag will have to go with the message through each hop
so that the forwarding routers also honor the SSU preference.
<li>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
<li>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ack-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
<li>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
<li>
5)
The new streaming lib max timeout of 45s is probably still too low.
The TCP RFC says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout (presumably 60s).
</ul>
<h3>Response by jrandom</h3>
Posted to new Syndie, 2007-03-27
<p>
On the whole, I'm open to experimenting with this, though remember why NTCP is
there in the first place - SSU failed in a congestion collapse. NTCP "just
works", and while 2-10% retransmission rates can be handled in normal
single-hop networks, that gives us a 40% retransmission rate with 2 hop
tunnels. If you loop in some of the measured SSU retransmission rates we saw
back before NTCP was implemented (10-30+%), that gives us an 83% retransmission
rate. Perhaps those rates were caused by the low 10 second timeout, but
increasing that much would bite us (remember, multiply by 5 and you've got half
the journey).
</p><p>
Unlike TCP, we have no feedback from the tunnel to know whether the message
made it - there are no tunnel level acks. We do have end to end ACKs, but only
on a small number of messages (whenever we distribute new session tags) - out
of the 1,553,591 client messages my router sent, we only attempted to ACK
145,207 of them. The others may have failed silently or succeeded perfectly.
</p><p>
I'm not convinced by the TCP-over-TCP argument for us, especially split across
the various paths we transfer down. Measurements on I2P can convince me
otherwise, of course.
</p><p>
<i>
The NTCP max timeout is presumably at least 60 sec, which is the RFC
recommendation. There is no way to change NTCP parameters or monitor
performance.
</i>
</p><p>
True, but net connections only get up to that level when something really bad
is going on - the retransmission timeout on TCP is often on the order of tens
or hundreds of milliseconds. As foofighter points out, they've got 20+ years
experience and bugfixing in their TCP stacks, plus a billion dollar industry
optimizing hardware and software to perform well according to whatever it is
they do.
</p><p>
<i>
NTCP has higher overhead and probably higher round trip times. When using NTCP
the ratio of (tunnel output) / (i2psnark data output) is at least 3.5 : 1.
Running an experiment where the code was modified to prefer SSU (the config
option i2np.udp.alwaysPreferred has no effect in the current code), the ratio
reduced to about 3 : 1, indicating better efficiency.
</i>
</p><p>
This is very interesting data, though more as a matter of router congestion
than bandwidth efficiency - you'd have to compare 3.5*$n*$NTCPRetransmissionPct
./. 3.0*$n*$SSURetransmissionPct. This data point suggests there's something in
the router that leads to excess local queuing of messages already being
transferred.
</p><p>
<i>
lifetime window size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per
ACK down from 1.11 to 1.07.
</i>
</p><p>
Remember that the sends-per-ACK is only a sample, not a full count (as we don't
try to ACK every send). It's not a random sample either, but instead samples
more heavily periods of inactivity or the initiation of a burst of activity -
sustained load won't require many ACKs.
</p><p>
Window sizes in that range are still woefully low to get the real benefit of
AIMD, and still too low to transmit a single 32KB BT chunk (increasing the
floor to 10 or 12 would cover that).
</p><p>
Still, the wsize stat looks promising - over how long was that maintained?
</p><p>
Actually, for testing purposes, you may want to look at
StreamSinkClient/StreamSinkServer or even TestSwarm in
apps/ministreaming/java/src/net/i2p/client/streaming/ - StreamSinkClient is a
CLI app that sends a selected file to a selected destination and
StreamSinkServer creates a destination and writes out any data sent to it
(displaying size and transfer time). TestSwarm combines the two - flooding
random data to whomever it connects to. That should give you the tools to
measure sustained throughput capacity over the streaming lib, as opposed to BT
choke/send.
</p><p>
<i>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
</i>
</p><p>
Honoring i2np.udp.alwaysPreferred is a good idea in any case - please feel free
to commit that change. Lets gather a bit more data though before switching the
preferences, as NTCP was added to deal with an SSU-created congestion collapse.
</p><p>
<i>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic
and have SSU generate a low bid for that traffic. This tag will have to go with
the message through each hop
so that the forwarding routers also honor the SSU preference.
</i>
</p><p>
In practice, there are three types of traffic - tunnel building/testing, netDb
query/response, and streaming lib traffic. The network has been designed to
make differentiating those three very hard.
</p><p>
<i>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
</i>
</p><p>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
</p><p>
In my view, to really address the core issue we need to address why the router
gets so congested to ACK in time (which, from what I've found, is due to CPU
contention). Maybe we can juggle some things in the router's processing to make
the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</p><p>
<i>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ACK-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
</i>
</p><p>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</p><p>
<i>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
</i>
</p><p>
There's UDPTransport.PRIORITY_LIMITS and UDPTransport.PRIORITY_WEIGHT (honored
by TimedWeightedPriorityMessageQueue), but currently the weights are almost all
equal, so there's no effect. That could be adjusted, of course (but as you
mention, if there's no queuing, it doesn't matter).
</p><p>
<i>
5)
The new streaming lib max timeout of 45s is probably still too low. The TCP RFC
says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout
(presumably 60s).
</i>
</p><p>
That 45s is the max retransmission timeout of the streaming lib though, not the
stream timeout. TCP in practice has retransmission timeouts orders of magnitude
less, though yes, can get to 60s on links running through exposed wires or
satellite transmissions ;) If we increase the streaming lib retransmission
timeout to e.g. 75 seconds, we could go get a beer before a web page loads
(especially assuming less than a 98% reliable transport). That's one reason we
prefer NTCP.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-31
<p>
<i>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
<br>
In my view, to really address the core issue we need to address why the
router gets so congested to ACK in time (which, from what I've found, is due to
CPU contention). Maybe we can juggle some things in the router's processing to
make the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</i>
</p><p>
One of my main stats-gathering techniques is turning on
net.i2p.client.streaming.ConnectionPacketHandler=DEBUG and watching the RTT
times and window sizes as they go by. To overgeneralize for a moment, it's
common to see 3 types of connections: ~4s RTT, ~10s RTT, and ~30s RTT. Trying
to knock down the 30s RTT connections is the goal. If CPU contention is the
cause then maybe some juggling will do it.
</p><p>
Reducing the SSU max retrans from 10 is really just a stab in the dark as we
don't have good data on whether we are collapsing, having TCP-over-TCP issues,
or what, so more data is needed.
</p><p>
<i>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</i>
</p><p>
What I don't understand, if you could elaborate, are the benefits of SSU
retransmissions for non-streaming-lib traffic. Do we need tunnel messages (for
example) to use a semi-reliable transport or can they use an unreliable or
kinda-sorta-reliable transport (1 or 2 retransmissions max, for example)? In
other words, why semi-reliability?
</p><p>
<i>
(but as you mention, if there's no queuing, it doesn't matter).
</i>
</p><p>
I implemented priority sending for UDP but it kicked in about 100,000 times
less often than the code on the NTCP side. Maybe that's a clue for further
investigation or a hint - I don't understand why it would back up that much
more often on NTCP, but maybe that's a hint on why NTCP performs worse.
</p>
<h3>Question answered by jrandom</h3>
Posted to new Syndie, 2007-03-31
<p>
measured SSU retransmission rates we saw back before NTCP was implemented
(10-30+%)
</p><p>
Can the router itself measure this? If so, could a transport be selected based
on measured performance? (i.e. if an SSU connection to a peer is dropping an
unreasonable number of messages, prefer NTCP when sending to that peer)
</p><p>
Yeah, it currently uses that stat as a poor-man's MTU detection (if
the retransmission rate is high, it uses the small packet size, but if it's low,
it uses the large packet size). We tried a few things when first introducing
NTCP (and when first moving away from the original TCP transport) that would
prefer SSU but fail that transport for a peer easily, causing it to fall back
on NTCP. However, there's certainly more that could be done in that regard,
though it gets complicated quickly (how/when to adjust/reset the bids, whether
to share these preferences across multiple peers or not, whether to share it
across multiple sessions with the same peer (and for how long), etc).
<h3>Response by foofighter</h3>
Posted to new Syndie, 2007-03-26
<p>
If I've understood things right, the primary reason in favor of TCP (in
general, both the old and new variety) was that you needn't worry about coding
a good TCP stack. Which ain't impossibly hard to get right... just that
existing TCP stacks have a 20 year lead.
</p><p>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
<ul>
<li>
A TCP-only network is very dependent on reachable peers (those who can forward
incoming connections through their NAT)
<li>
Still even if reachable peers are rare, having them be high capacity somewhat
alleviates the topological scarcity issues
<li>
UDP allows for "NAT hole punching" which lets people be "kind of
pseudo-reachable" (with the help of introducers) who could otherwise only
connect out
<li>
The "old" TCP transport implementation required lots of threads, which was a
performance killer, while the "new" TCP transport does well with few threads
<li>
Routers of set A crap out when saturated with UDP. Routers of set B crap out
when saturated with TCP.
<li>
It "feels" (as in, there are some indications but no scientific data or
quality statistics) that A is more widely deployed than B
<li>
Some networks carry non-DNS UDP datagrams with an outright shitty quality,
while still somewhat bothering to carry TCP streams.
</ul>
</p><p>
On that background, a small diversity of transports (as many as needed, but not
more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</p><p>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-27
<p>
<i>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
</i>
</p><p>
These are all valid issues. However you are considering the two protocols in
isolation, rather than thinking about what transport protocol is best for a
particular higher-level protocol (i.e. streaming lib or not).
</p><p>
What I'm saying is you have to take the streaming lib into consideration.
So either shift the preferences for everybody or treat streaming lib traffic
differently.
That's what my proposal 1B) is talking about - have a different preference for
streaming-lib traffic than for non streaming-lib traffic (for example tunnel
build messages).
</p><p>
<i>
On that background, a small diversity of transports (as many as needed, but
not more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</i>
</p><p>
Agreed. The new .28 may have made things better for packet loss over UDP, or
maybe not.
One important point - the transport code does remember failures of a transport.
So if UDP is the preferred transport, it will try it first, but if it fails for
a particular destination, the next attempt for that destination it will try
NTCP rather than trying UDP again.
</p><p>
<i>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</i>
</p><p>
We have four tuning knobs - the four bid values (SSU and NTCP, for
already-connected and not-already-connected).
We could make SSU be preferred over NTCP only if both are connected, for
example, but try NTCP first if neither transport is connected.
</p><p>
The other way to do it gradually is only shifting the streaming lib traffic
(the 1B proposal) however that could be hard and may have anonymity
implications, I don't know. Or maybe shift the traffic only for the first
outbound hop (i.e. don't propagate the flag to the next router), which gives
you only partial benefit but might be more anonymous and easier.
</p>
<h3>Results of the Discussion</h3>
... and other related changes in the same timeframe (2007):
<ul>
<li>
Significant tuning of the streaming lib parameters,
greatly increasing outbound performance, was implemented in 0.6.1.28
<li>
Priority sending for NTCP was implemented in 0.6.1.28
<li>
Priority sending for SSU was implemented by zzz but was never checked in
<li>
The advanced transport bid control
i2np.udp.preferred was implemented in 0.6.1.29.
<li>
Pushback for NTCP was implemented in 0.6.1.30, disabled in 0.6.1.31 due to anonymity concerns,
and re-enabled with improvements to address those concerns in 0.6.1.32.
<li>
None of zzz's proposals 1-5 have been implemented.
</ul>
{% endblock %} {% endblock %}

View File

@ -0,0 +1,559 @@
{% extends "_layout.html" %}
{% block title %}NTCP Discussion{% endblock %}
{% block content %}
Following is a discussion about NTCP that took place in March 2007.
It has not been updated to reflect the current implementation.
For the current NTCP specification see <a href="ntcp.html">the main NTCP page</a>.
<h2>NTCP vs. SSU Discussion, March 2007</h2>
<h3>NTCP questions</h3>
(adapted from an IRC discussion between zzz and cervantes)
<br />
Why is NTCP preferred over SSU, doesn't NTCP have higher overhead and latency?
It has better reliability.
<br />
Doesn't streaming lib over NTCP suffer from classic TCP-over-TCP issues?
What if we had a really simple UDP transport for streaming-lib-originated traffic?
I think SSU was meant to be the so-called really simple UDP transport - but it just proved too unreliable.
<h3>"NTCP Considered Harmful" Analysis by zzz</h3>
Posted to new Syndie, 2007-03-25.
This was posted to stimulate discussion, don't take it too seriously.
<p>
Summary: NTCP has higher latency and overhead than SSU, and is more likely to
collapse when used with the streaming lib. However, traffic is routed with a
preference for NTCP over SSU and this is currently hardcoded.
</p>
<h4>Discussion</h4>
<p>
We currently have two transports, NTCP and SSU. As currently implemented, NTCP
has lower "bids" than SSU so it is preferred, except for the case where there
is an established SSU connection but no established NTCP connection for a peer.
</p><p>
SSU is similar to NTCP in that it implements acknowledgments, timeouts, and
retransmissions. However SSU is I2P code with tight constraints on the
timeouts and available statistics on round trip times, retransmissions, etc.
NTCP is based on Java NIO TCP, which is a black box and presumably implements
RFC standards, including very long maximum timeouts.
</p><p>
The majority of traffic within I2P is streaming-lib originated (HTTP, IRC,
Bittorrent) which is our implementation of TCP. As the lower-level transport is
generally NTCP due to the lower bids, the system is subject to the well-known
and dreaded problem of TCP-over-TCP
http://sites.inka.de/~W1011/devel/tcp-tcp.html , where both the higher and
lower layers of TCP are doing retransmissions at once, leading to collapse.
</p><p>
Unlike in the PPP over SSH scenario described in the link above, we have
several hops for the lower layer, each covered by a NTCP link. So each NTCP
latency is generally much less than the higher-layer streaming lib latency.
This lessens the chance of collapse.
</p><p>
Also, the probabilities of collapse are lessened when the lower-layer TCP is
tightly constrained with low timeouts and number of retransmissions compared to
the higher layer.
</p><p>
The .28 release increased the maximum streaming lib timeout from 10 sec to 45
sec which greatly improved things. The SSU max timeout is 3 sec. The NTCP max
timeout is presumably at least 60 sec, which is the RFC recommendation. There
is no way to change NTCP parameters or monitor performance. Collapse of the
NTCP layer is [editor: text lost]. Perhaps an external tool like tcpdump would help.
</p><p>
However, running .28, the i2psnark reported upstream does not generally stay at
a high level. It often goes down to 3-4 KBps before climbing back up. This is a
signal that there are still collapses.
</p><p>
SSU is also more efficient. NTCP has higher overhead and probably higher round
trip times. When using NTCP, the ratio of (tunnel output) / (i2psnark data
output) is at least 3.5 : 1. Running an experiment where the code was modified
to prefer SSU (the config option i2np.udp.alwaysPreferred has no effect in the
current code), the ratio reduced to about 3 : 1, indicating better efficiency.
</p><p>
As reported by streaming lib stats, things were much improved - lifetime window
size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per ack down from
1.11 to 1.07.
</p><p>
That this was quite effective was surprising, given that we were only changing
the transport for the first of 3 to 5 total hops the outbound messages would
take.
</p><p>
The effect on outbound i2psnark speeds wasn't clear due to normal variations.
Also for the experiment, inbound NTCP was disabled. The effect on inbound
speeds on i2psnark was not clear.
</p>
<h4>Proposals</h4>
<ul>
<li>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
<li>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic and have SSU generate a low bid
for that traffic. This tag will have to go with the message through each hop
so that the forwarding routers also honor the SSU preference.
<li>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
<li>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ack-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
<li>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
<li>
5)
The new streaming lib max timeout of 45s is probably still too low.
The TCP RFC says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout (presumably 60s).
</ul>
<h3>Response by jrandom</h3>
Posted to new Syndie, 2007-03-27
<p>
On the whole, I'm open to experimenting with this, though remember why NTCP is
there in the first place - SSU failed in a congestion collapse. NTCP "just
works", and while 2-10% retransmission rates can be handled in normal
single-hop networks, that gives us a 40% retransmission rate with 2 hop
tunnels. If you loop in some of the measured SSU retransmission rates we saw
back before NTCP was implemented (10-30+%), that gives us an 83% retransmission
rate. Perhaps those rates were caused by the low 10 second timeout, but
increasing that much would bite us (remember, multiply by 5 and you've got half
the journey).
</p><p>
Unlike TCP, we have no feedback from the tunnel to know whether the message
made it - there are no tunnel level acks. We do have end to end ACKs, but only
on a small number of messages (whenever we distribute new session tags) - out
of the 1,553,591 client messages my router sent, we only attempted to ACK
145,207 of them. The others may have failed silently or succeeded perfectly.
</p><p>
I'm not convinced by the TCP-over-TCP argument for us, especially split across
the various paths we transfer down. Measurements on I2P can convince me
otherwise, of course.
</p><p>
<i>
The NTCP max timeout is presumably at least 60 sec, which is the RFC
recommendation. There is no way to change NTCP parameters or monitor
performance.
</i>
</p><p>
True, but net connections only get up to that level when something really bad
is going on - the retransmission timeout on TCP is often on the order of tens
or hundreds of milliseconds. As foofighter points out, they've got 20+ years
experience and bugfixing in their TCP stacks, plus a billion dollar industry
optimizing hardware and software to perform well according to whatever it is
they do.
</p><p>
<i>
NTCP has higher overhead and probably higher round trip times. When using NTCP
the ratio of (tunnel output) / (i2psnark data output) is at least 3.5 : 1.
Running an experiment where the code was modified to prefer SSU (the config
option i2np.udp.alwaysPreferred has no effect in the current code), the ratio
reduced to about 3 : 1, indicating better efficiency.
</i>
</p><p>
This is very interesting data, though more as a matter of router congestion
than bandwidth efficiency - you'd have to compare 3.5*$n*$NTCPRetransmissionPct
./. 3.0*$n*$SSURetransmissionPct. This data point suggests there's something in
the router that leads to excess local queuing of messages already being
transferred.
</p><p>
<i>
lifetime window size up from 6.3 to 7.5, RTT down from 11.5s to 10s, sends per
ACK down from 1.11 to 1.07.
</i>
</p><p>
Remember that the sends-per-ACK is only a sample, not a full count (as we don't
try to ACK every send). It's not a random sample either, but instead samples
more heavily periods of inactivity or the initiation of a burst of activity -
sustained load won't require many ACKs.
</p><p>
Window sizes in that range are still woefully low to get the real benefit of
AIMD, and still too low to transmit a single 32KB BT chunk (increasing the
floor to 10 or 12 would cover that).
</p><p>
Still, the wsize stat looks promising - over how long was that maintained?
</p><p>
Actually, for testing purposes, you may want to look at
StreamSinkClient/StreamSinkServer or even TestSwarm in
apps/ministreaming/java/src/net/i2p/client/streaming/ - StreamSinkClient is a
CLI app that sends a selected file to a selected destination and
StreamSinkServer creates a destination and writes out any data sent to it
(displaying size and transfer time). TestSwarm combines the two - flooding
random data to whomever it connects to. That should give you the tools to
measure sustained throughput capacity over the streaming lib, as opposed to BT
choke/send.
</p><p>
<i>
1A)
This is easy -
We should flip the bid priorities so that SSU is preferred for all traffic, if
we can do this without causing all sorts of other trouble. This will fix the
i2np.udp.alwaysPreferred configuration option so that it works (either as true
or false).
</i>
</p><p>
Honoring i2np.udp.alwaysPreferred is a good idea in any case - please feel free
to commit that change. Let's gather a bit more data though before switching the
preferences, as NTCP was added to deal with an SSU-created congestion collapse.
</p><p>
<i>
1B)
Alternative to 1A), not so easy -
If we can mark traffic without adversely affecting our anonymity goals, we
should identify streaming-lib generated traffic
and have SSU generate a low bid for that traffic. This tag will have to go with
the message through each hop
so that the forwarding routers also honor the SSU preference.
</i>
</p><p>
In practice, there are three types of traffic - tunnel building/testing, netDb
query/response, and streaming lib traffic. The network has been designed to
make differentiating those three very hard.
</p><p>
<i>
2)
Bounding SSU even further (reducing maximum retransmissions from the current
10) is probably wise to reduce the chance of collapse.
</i>
</p><p>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
</p><p>
In my view, to really address the core issue we need to address why the router
gets so congested to ACK in time (which, from what I've found, is due to CPU
contention). Maybe we can juggle some things in the router's processing to make
the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</p><p>
<i>
3)
We need further study on the benefits vs. harm of a semi-reliable protocol
underneath the streaming lib. Are retransmissions over a single hop beneficial
and a big win or are they worse than useless?
We could do a new SUU (secure unreliable UDP) but probably not worth it. We
could perhaps add a no-ACK-required message type in SSU if we don't want any
retransmissions at all of streaming-lib traffic. Are tightly bounded
retransmissions desirable?
</i>
</p><p>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</p><p>
<i>
4)
The priority sending code in .28 is only for NTCP. So far my testing hasn't
shown much use for SSU priority as the messages don't queue up long enough for
priorities to do any good. But more testing needed.
</i>
</p><p>
There's UDPTransport.PRIORITY_LIMITS and UDPTransport.PRIORITY_WEIGHT (honored
by TimedWeightedPriorityMessageQueue), but currently the weights are almost all
equal, so there's no effect. That could be adjusted, of course (but as you
mention, if there's no queuing, it doesn't matter).
</p><p>
<i>
5)
The new streaming lib max timeout of 45s is probably still too low. The TCP RFC
says 60s. It probably shouldn't be shorter than the underlying NTCP max timeout
(presumably 60s).
</i>
</p><p>
That 45s is the max retransmission timeout of the streaming lib though, not the
stream timeout. TCP in practice has retransmission timeouts orders of magnitude
less, though yes, can get to 60s on links running through exposed wires or
satellite transmissions ;) If we increase the streaming lib retransmission
timeout to e.g. 75 seconds, we could go get a beer before a web page loads
(especially assuming less than a 98% reliable transport). That's one reason we
prefer NTCP.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-31
<p>
<i>
At 10 retransmissions, we're up shit creek already, I agree. One, maybe two
retransmissions is reasonable, from a transport layer, but if the other side is
too congested to ACK in time (even with the implemented SACK/NACK capability),
there's not much we can do.
<br>
In my view, to really address the core issue we need to address why the
router gets so congested to ACK in time (which, from what I've found, is due to
CPU contention). Maybe we can juggle some things in the router's processing to
make the transmission of an already existing tunnel higher CPU priority than
decrypting a new tunnel request? Though we've got to be careful to avoid
starvation.
</i>
</p><p>
One of my main stats-gathering techniques is turning on
net.i2p.client.streaming.ConnectionPacketHandler=DEBUG and watching the RTT
times and window sizes as they go by. To overgeneralize for a moment, it's
common to see 3 types of connections: ~4s RTT, ~10s RTT, and ~30s RTT. Trying
to knock down the 30s RTT connections is the goal. If CPU contention is the
cause then maybe some juggling will do it.
</p><p>
Reducing the SSU max retrans from 10 is really just a stab in the dark as we
don't have good data on whether we are collapsing, having TCP-over-TCP issues,
or what, so more data is needed.
</p><p>
<i>
Worth looking into - what if we just disabled SSU's retransmissions? It'd
probably lead to much higher streaming lib resend rates, but maybe not.
</i>
</p><p>
What I don't understand, if you could elaborate, are the benefits of SSU
retransmissions for non-streaming-lib traffic. Do we need tunnel messages (for
example) to use a semi-reliable transport or can they use an unreliable or
kinda-sorta-reliable transport (1 or 2 retransmissions max, for example)? In
other words, why semi-reliability?
</p><p>
<i>
(but as you mention, if there's no queuing, it doesn't matter).
</i>
</p><p>
I implemented priority sending for UDP but it kicked in about 100,000 times
less often than the code on the NTCP side. Maybe that's a clue for further
investigation or a hint - I don't understand why it would back up that much
more often on NTCP, but maybe that's a hint on why NTCP performs worse.
</p>
<h3>Question answered by jrandom</h3>
Posted to new Syndie, 2007-03-31
<p>
measured SSU retransmission rates we saw back before NTCP was implemented
(10-30+%)
</p><p>
Can the router itself measure this? If so, could a transport be selected based
on measured performance? (i.e. if an SSU connection to a peer is dropping an
unreasonable number of messages, prefer NTCP when sending to that peer)
</p><p>
Yeah, it currently uses that stat as a poor-man's MTU detection (if
the retransmission rate is high, it uses the small packet size, but if it's low,
it uses the large packet size). We tried a few things when first introducing
NTCP (and when first moving away from the original TCP transport) that would
prefer SSU but fail that transport for a peer easily, causing it to fall back
on NTCP. However, there's certainly more that could be done in that regard,
though it gets complicated quickly (how/when to adjust/reset the bids, whether
to share these preferences across multiple peers or not, whether to share it
across multiple sessions with the same peer (and for how long), etc).
<h3>Response by foofighter</h3>
Posted to new Syndie, 2007-03-26
<p>
If I've understood things right, the primary reason in favor of TCP (in
general, both the old and new variety) was that you needn't worry about coding
a good TCP stack. Which ain't impossibly hard to get right... just that
existing TCP stacks have a 20 year lead.
</p><p>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
<ul>
<li>
A TCP-only network is very dependent on reachable peers (those who can forward
incoming connections through their NAT)
<li>
Still even if reachable peers are rare, having them be high capacity somewhat
alleviates the topological scarcity issues
<li>
UDP allows for "NAT hole punching" which lets people be "kind of
pseudo-reachable" (with the help of introducers) who could otherwise only
connect out
<li>
The "old" TCP transport implementation required lots of threads, which was a
performance killer, while the "new" TCP transport does well with few threads
<li>
Routers of set A crap out when saturated with UDP. Routers of set B crap out
when saturated with TCP.
<li>
It "feels" (as in, there are some indications but no scientific data or
quality statistics) that A is more widely deployed than B
<li>
Some networks carry non-DNS UDP datagrams with an outright shitty quality,
while still somewhat bothering to carry TCP streams.
</ul>
</p><p>
On that background, a small diversity of transports (as many as needed, but not
more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</p><p>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</p>
<h3>Response by zzz</h3>
Posted to new Syndie, 2007-03-27
<p>
<i>
AFAIK, there hasn't been much deep theory behind the preference of TCP versus
UDP, except the following considerations:
</i>
</p><p>
These are all valid issues. However you are considering the two protocols in
isolation, rather than thinking about what transport protocol is best for a
particular higher-level protocol (i.e. streaming lib or not).
</p><p>
What I'm saying is you have to take the streaming lib into consideration.
So either shift the preferences for everybody or treat streaming lib traffic
differently.
That's what my proposal 1B) is talking about - have a different preference for
streaming-lib traffic than for non streaming-lib traffic (for example tunnel
build messages).
</p><p>
<i>
On that background, a small diversity of transports (as many as needed, but
not more) appears sensible in either case. Which should be the main transport
depends on their performance. I've seen nasty stuff on my line when I
tried to use its full capacity with UDP. Packet losses on the level of 35%.
</i>
</p><p>
Agreed. The new .28 may have made things better for packet loss over UDP, or
maybe not.
One important point - the transport code does remember failures of a transport.
So if UDP is the preferred transport, it will try it first, but if it fails for
a particular destination, the next attempt for that destination it will try
NTCP rather than trying UDP again.
</p><p>
<i>
We could definitely try playing with UDP versus TCP priorities, but I'd urge
caution in that. I would urge that they not be changed too radically all at
once, or it might break things.
</i>
</p><p>
We have four tuning knobs - the four bid values (SSU and NTCP, for
already-connected and not-already-connected).
We could make SSU be preferred over NTCP only if both are connected, for
example, but try NTCP first if neither transport is connected.
</p><p>
The other way to do it gradually is only shifting the streaming lib traffic
(the 1B proposal) however that could be hard and may have anonymity
implications, I don't know. Or maybe shift the traffic only for the first
outbound hop (i.e. don't propagate the flag to the next router), which gives
you only partial benefit but might be more anonymous and easier.
</p>
<h3>Results of the Discussion</h3>
... and other related changes in the same timeframe (2007):
<ul>
<li>
Significant tuning of the streaming lib parameters,
greatly increasing outbound performance, was implemented in 0.6.1.28
<li>
Priority sending for NTCP was implemented in 0.6.1.28
<li>
Priority sending for SSU was implemented by zzz but was never checked in
<li>
The advanced transport bid control
i2np.udp.preferred was implemented in 0.6.1.29.
<li>
Pushback for NTCP was implemented in 0.6.1.30, disabled in 0.6.1.31 due to anonymity concerns,
and re-enabled with improvements to address those concerns in 0.6.1.32.
<li>
None of zzz's proposals 1-5 have been implemented.
</ul>
{% endblock %}

View File

@ -2,32 +2,51 @@
{% block title %}Tunnel Creation{% endblock %} {% block title %}Tunnel Creation{% endblock %}
{% block content %} {% block content %}
<b>Note: This documents the current tunnel build implementation as of release 0.6.1.10.</b> This page documents the current tunnel build implementation.
<br> Updated August 2010 for release 0.8
<pre>
1) <a href="#tunnelCreate.overview">Tunnel creation</a>
1.1) <a href="#tunnelCreate.requestRecord">Tunnel creation request record</a>
1.2) <a href="#tunnelCreate.hopProcessing">Hop processing</a>
1.3) <a href="#tunnelCreate.replyRecord">Tunnel creation reply record</a>
1.4) <a href="#tunnelCreate.requestPreparation">Request preparation</a>
1.5) <a href="#tunnelCreate.requestDelivery">Request delivery</a>
1.6) <a href="#tunnelCreate.endpointHandling">Endpoint handling</a>
1.7) <a href="#tunnelCreate.replyProcessing">Reply processing</a>
2) <a href="#tunnelCreate.notes">Notes</a>
</pre>
<h2 id="tunnelCreate.overview">1) Tunnel creation encryption:</h2> <h2 id="tunnelCreate.overview">Tunnel Creation Specification</h2>
<p>
This document specifies the details of the encrypted tunnel build messages
used to create tunnels using a "non-interactive telescoping" method.
See <a href="tunnel-alt.html">the tunnel build document</a>
for an overview of the process, including peer selection and ordering methods.
<p>The tunnel creation is accomplished by a single message passed along
the path of peers in the tunnel, rewritten in place, and transmitted
back to the tunnel creator. This single tunnel message is made up
of a variable number of records (up to 8) - one for each potential peer in
the tunnel. Individual records are asymmetrically encrypted to be
read only by a specific peer along the path, while an additional
symmetric layer of encryption is added at each hop so as to expose
the asymmetrically encrypted record only at the appropriate time.</p>
<h3 id="tunnelCreate.requestRecord">1.1) Tunnel creation request record</h3> <h3 id="number">Number of Records</h3>
Not all records must contain valid data.
The build message for a 3-hop tunnel, for example, may contain more records
to hide the actual length of the tunnel from the participants.
There are two build message types. The original
<a href="i2np_spec.html#msg_TunnelBuild">Tunnel Build Message</a> (TBM)
contains 8 records, which is more than enough for any practical tunnel length.
The recently-implemented
<a href="i2np_spec.html#msg_VariableTunnelBuild">Variable Tunnel Build Message</a> (VTBM)
contains 1 to 8 records. The originator may trade off the size of the message
with the desired amount of tunnel length obfuscation.
<p>
In the current network, most tunnels are 2 or 3 hops long.
The current implementation uses a 5-record VTBM to build tunnels of 4 hops or less,
and the 8-record TBM for longer tunnels.
The 5-record VTBM (which fits in 3 1KB tunnel messaages) reduces network traffic
and increases build sucess rate, because larger messages are less likely to be dropped.
<p>
The reply message must be the same type and length as the build message.
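<p>
The size tradeoff follows directly from the 528-byte record size. Here is a
sketch of the arithmetic; the usable payload of roughly 1000 bytes per 1KB
tunnel message is an approximation for illustration, not a spec value:
</p>
<pre>
{% filter escape %}
// Illustrative size arithmetic for build messages; not router code.
public class BuildMessageSizes {
    static final int RECORD = 528;    // fixed record size, per this spec
    static final int PAYLOAD = 1000;  // approx. usable bytes per 1KB tunnel message (assumption)

    static int ceilDiv(int a, int b) { return (a + b - 1) / b; }

    public static void main(String[] args) {
        int tbm   = 8 * RECORD;       // TunnelBuild: 4224 bytes, no count byte
        int vtbm5 = 1 + 5 * RECORD;   // 5-record VariableTunnelBuild: 2641 bytes
        System.out.println("TBM:   " + tbm   + " bytes, ~" + ceilDiv(tbm, PAYLOAD)   + " tunnel messages");
        System.out.println("VTBM5: " + vtbm5 + " bytes, ~" + ceilDiv(vtbm5, PAYLOAD) + " tunnel messages");
    }
}
{% endfilter %}
</pre>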
<h3 id="tunnelCreate.requestRecord">Request Record Specification</h3>
The request record is also specified in the
<a href="i2np_spec.html#struct_BuildRequestRecord">I2NP Specification</a>.
<p>Cleartext of the record, visible only to the hop being asked:</p><pre>
bytes 0-3: tunnel ID to receive messages as
@ -49,49 +68,79 @@ endpoint, they specify where the rewritten tunnel creation reply
message should be sent. In addition, the next message ID specifies the
message ID that the message (or reply) should use.</p>
<p>The flags field contains the following:
<pre>
Bit order: 76543210 (bit 7 is MSB)
bit 7: if set, allow messages from anyone
bit 6: if set, allow messages to anyone, and send the reply to the
       specified next hop in a tunnel message
bits 5-0: Undefined
</pre>
Bit 7 indicates that the hop will be an inbound gateway (IBGW).
Bit 6 indicates that the hop will be an outbound endpoint (OBEP).
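<p>
As a small illustration of the bit ordering, the two flags could be tested
like this (hypothetical helper names, not the actual router classes):
</p>
<pre>
{% filter escape %}
// Hypothetical helpers for the flag byte; bit 7 is the MSB.
public class BuildFlags {
    static boolean isInboundGateway(byte flag)   { return (flag & 0x80) != 0; } // bit 7: IBGW
    static boolean isOutboundEndpoint(byte flag) { return (flag & 0x40) != 0; } // bit 6: OBEP
}
{% endfilter %}
</pre>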
<h4 id="encryption">Request Encryption</h4>
<p>That cleartext record is <a href="how_cryptography.html#elgamal">ElGamal 2048 encrypted</a> with the hop's
public encryption key and formatted into a 528 byte record:</p><pre>
bytes 0-15: First 16 bytes of the SHA-256 of the current hop's router identity
bytes 16-527: ElGamal-2048 encrypted request record</pre>
<p>Since the cleartext uses the full field, there is no need for
additional padding beyond <code>SHA256(cleartext) + cleartext</code>.</p>
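<p>
A minimal sketch of assembling one encrypted record follows. Here
<code>elGamalEncrypt()</code> is a placeholder for I2P's ElGamal-2048 routine
(which yields the 512-byte block used here); it is not a JDK API, and the
cleartext record is assumed already serialized per the I2NP spec:
</p>
<pre>
{% filter escape %}
import java.security.MessageDigest;

// Sketch only: assembling one 528-byte encrypted request record.
public class RequestRecordBuilder {
    static byte[] buildEncryptedRecord(byte[] hopIdentityBytes,
                                       byte[] cleartextRecord,
                                       byte[] hopPublicKey) throws Exception {
        byte[] rec = new byte[528];
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(hopIdentityBytes);
        System.arraycopy(hash, 0, rec, 0, 16);           // bytes 0-15: hash prefix
        byte[] ct = elGamalEncrypt(cleartextRecord, hopPublicKey);
        System.arraycopy(ct, 0, rec, 16, 512);           // bytes 16-527: ElGamal block
        return rec;
    }

    static byte[] elGamalEncrypt(byte[] data, byte[] pubKey) {
        throw new UnsupportedOperationException("placeholder for I2P's ElGamal-2048");
    }
}
{% endfilter %}
</pre>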
<h3 id="tunnelCreate.hopProcessing">1.2) Hop processing</h3> <h3 id="tunnelCreate.hopProcessing">Hop Processing and Encryption</h3>
<p>When a hop receives a TunnelBuildMessage, it looks through the 8 <p>When a hop receives a TunnelBuildMessage, it looks through the
records contained within it for one starting with their own identity records contained within it for one starting with their own identity
hash (trimmed to 8 bytes). It then decrypts the ElGamal block from hash (trimmed to 8 bytes). It then decrypts the ElGamal block from
that record and retrieves the protected cleartext. At that point, that record and retrieves the protected cleartext. At that point,
they make sure the tunnel request is not a duplicate by feeding the they make sure the tunnel request is not a duplicate by feeding the
AES-256 reply key into a bloom filter and making sure the request AES-256 reply key into a bloom filter.
time is within an hour of current. Duplicates or invalid requests Duplicates or invalid requests
are dropped.</p> are dropped.</p>
<p>After deciding whether they will agree to participate in the tunnel
or not, they replace the record that had contained the request with
an encrypted reply block. All other records are <a href="how_cryptography.html#AES">AES-256/CBC
encrypted</a> with the included reply key and IV (though each is
encrypted separately, rather than chained across records).</p>
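<p>
A sketch of a hop's handling of a received build message is below. Names are
hypothetical, <code>elGamalDecrypt()</code> stands in for I2P's ElGamal
routine, and parsing of the cleartext record is elided:
</p>
<pre>
{% filter escape %}
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch only, not the actual router code.
public class HopProcessor {
    static void process(byte[][] records, byte[] myHashPrefix16,
                        byte[] myElGamalPrivateKey) throws Exception {
        int mine = -1;
        for (int i = 0; i < records.length; i++) {
            if (Arrays.equals(Arrays.copyOf(records[i], 16), myHashPrefix16)) {
                mine = i;
                break;
            }
        }
        if (mine < 0)
            return;                                    // no record for us: drop
        byte[] clear = elGamalDecrypt(
                Arrays.copyOfRange(records[mine], 16, 528), myElGamalPrivateKey);
        byte[] replyKey = new byte[32];                // parsed from 'clear' (elided)
        byte[] replyIv  = new byte[16];                // parsed from 'clear' (elided)
        // ... bloom-filter duplicate check, participation decision,
        //     and replacement of records[mine] with the reply record ...
        Cipher aes = Cipher.getInstance("AES/CBC/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(replyKey, "AES"),
                 new IvParameterSpec(replyIv));
        for (int i = 0; i < records.length; i++) {
            if (i != mine)
                records[i] = aes.doFinal(records[i]);  // each record encrypted separately
        }
    }

    static byte[] elGamalDecrypt(byte[] data, byte[] privKey) {
        throw new UnsupportedOperationException("placeholder for I2P's ElGamal-2048");
    }
}
{% endfilter %}
</pre>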
<h3 id="tunnelCreate.replyRecord">1.3) Tunnel creation reply record</h3> <h4 id="tunnelCreate.replyRecord">Reply Record Specification</h4>
<p>After the current hop reads their record, they replace it with a <p>After the current hop reads their record, they replace it with a
reply record stating whether or not they agree to participate in the reply record stating whether or not they agree to participate in the
tunnel, and if they do not, they classify their reason for tunnel, and if they do not, they classify their reason for
rejection. This is simply a 1 byte value, with 0x0 meaning they rejection. This is simply a 1 byte value, with 0x0 meaning they
agree to participate in the tunnel, and higher values meaning higher agree to participate in the tunnel, and higher values meaning higher
levels of rejection.
<p>
The following rejection codes are defined:
<ul>
<li>
TUNNEL_REJECT_PROBABALISTIC_REJECT = 10
<li>
TUNNEL_REJECT_TRANSIENT_OVERLOAD = 20
<li>
TUNNEL_REJECT_BANDWIDTH = 30
<li>
TUNNEL_REJECT_CRIT = 50
</ul>
To hide other causes, such as router shutdown, from peers, the current implementation
uses TUNNEL_REJECT_BANDWIDTH for almost all rejections.
<h3 id="tunnelCreate.requestPreparation">1.4) Request preparation</h3> <p>
The reply is encrypted with the AES session
key delivered to it in the encrypted block, padded with 495 bytes of random data
to reach the full record size.
The padding is placed before the status byte:
</p><pre>
AES-256-CBC(SHA-256(padding+status) + padding + status, key, IV)</pre>
This is also described in the
<a href="i2np_spec.html#msg_TunnelBuildReply">I2NP spec</a>.
<h3 id="tunnelCreate.requestPreparation">Request Preparation</h3>
<p>When building a new request, all of the records must first be
built and asymmetrically encrypted. Each record should then be
@ -103,31 +152,49 @@ right hop after their predecessor encrypts it.</p>
<p>The excess records not needed for individual requests are simply
filled with random data by the creator.</p>
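<p>
The creator "pre-decrypts" each asymmetrically encrypted record with the
reply keys and IVs of the hops that precede it in the path, so that the
symmetric layers added in transit cancel out and each record is exposed to
the right hop after its predecessor encrypts it. A minimal sketch, assuming
parsed keys in hop order (hypothetical names, not the actual router code):
</p>
<pre>
{% filter escape %}
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: record i will be AES-encrypted in transit by every hop
// before hop i, so the creator applies the matching decryptions in advance.
public class RequestPreparer {
    static void preDecrypt(byte[][] records, byte[][] replyKeys, byte[][] replyIvs)
            throws Exception {
        for (int i = 0; i < records.length; i++) {
            for (int j = i - 1; j >= 0; j--) {   // every earlier hop encrypts record i
                Cipher aes = Cipher.getInstance("AES/CBC/NoPadding");
                aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(replyKeys[j], "AES"),
                         new IvParameterSpec(replyIvs[j]));
                records[i] = aes.doFinal(records[i]);
            }
        }
    }
}
{% endfilter %}
</pre>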
<h3 id="tunnelCreate.requestDelivery">1.5) Request delivery</h3> <h3 id="tunnelCreate.requestDelivery">Request Delivery</h3>
<p>For outbound tunnels, the delivery is done directly from the tunnel <p>For outbound tunnels, the delivery is done directly from the tunnel
creator to the first hop, packaging up the TunnelBuildMessage as if creator to the first hop, packaging up the TunnelBuildMessage as if
the creator was just another hop in the tunnel. For inbound the creator was just another hop in the tunnel. For inbound
tunnels, the delivery is done through an existing outbound tunnel tunnels, the delivery is done through an existing outbound tunnel.
(and during startup, when no outbound tunnel exists yet, a fake 0 The outbound tunnel is generally from the same pool as the new tunnel being built.
hop outbound tunnel is used).</p> If no outbound tunnel is available in that pool, an outbound exploratory tunnel is used.
At startup, when no outbound exploratory tunnel exists yet, a fake 0-hop
outbound tunnel is used.</p>
<h3 id="tunnelCreate.endpointHandling">1.6) Endpoint handling</h3> <h3 id="tunnelCreate.endpointHandling">Endpoint Handling</h3>
<p>When the request reaches an outbound endpoint (as determined by the <p>
For creation of an outbound tunnel,
when the request reaches an outbound endpoint (as determined by the
'allow messages to anyone' flag), the hop is processed as usual, 'allow messages to anyone' flag), the hop is processed as usual,
encrypting a reply in place of the record and encrypting all of the encrypting a reply in place of the record and encrypting all of the
other records, but since there is no 'next hop' to forward the other records, but since there is no 'next hop' to forward the
TunnelBuildMessage on to, it instead places the encrypted reply TunnelBuildMessage on to, it instead places the encrypted reply
records into a TunnelBuildReplyMessage and delivers it to the records into a
<a href="i2np_spec.html#msg_TunnelBuildReply">TunnelBuildReplyMessage</a>
or
<a href="i2np_spec.html#msg_VariableTunnelBuildReply">VariableTunnelBuildReplyMessage</a>
(the type of message and number of records must match that of the request)
and delivers it to the
reply tunnel specified within the request record. That reply tunnel reply tunnel specified within the request record. That reply tunnel
forwards the reply records down to the tunnel creator for forwards the reply records down to the tunnel creator for
processing, as below.</p> processing, as below.</p>
<p>The reply tunnel was specified by the creator as follows:
Generally it is an inbound tunnel from the same pool as the new outbound tunnel being built.
If no inbound tunnel is available in that pool, an inbound exploratory tunnel is used.
At startup, when no inbound exploratory tunnel exists yet, a fake 0-hop
inbound tunnel is used.</p>
<h3 id="tunnelCreate.replyProcessing">1.7) Reply processing</h3> <p>
For creation of an inbound tunnel,
when the request reaches the inbound endpoint (also known as the
tunnel creator), there is no need to generate an explicit Reply Message, and
the router processes each of the replies, as below.</p>
<h3 id="tunnelCreate.replyProcessing">Reply Processing by the Request Creator</h3>
<p>To process the reply records, the creator simply has to AES decrypt
each record individually, using the reply key and IV of each hop in
@ -137,18 +204,37 @@ why they refuse. If they all agree, the tunnel is considered
created and may be used immediately, but if anyone refuses, the
tunnel is discarded.</p>
<h2 id="tunnelCreate.notes">2) Notes</h2> <p>
The agreements and rejections are noted in each peer's
<a href="how_peerselection.html">profile</a>, to be used in future assessments
of peer tunnel capacity.
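<p>
A sketch of the creator's reply processing follows (hypothetical names, not
the actual router code). Reply record i was AES-encrypted by hop i and then
by every later hop, so the creator peels those layers in reverse order:
</p>
<pre>
{% filter escape %}
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch only, not the actual router code.
public class ReplyReader {
    static int[] readReplies(byte[][] records, byte[][] replyKeys, byte[][] replyIvs)
            throws Exception {
        int n = records.length;
        int[] status = new int[n];
        for (int i = 0; i < n; i++) {
            byte[] rec = records[i];
            for (int j = n - 1; j >= i; j--) {   // hops i..n-1 each encrypted record i
                Cipher aes = Cipher.getInstance("AES/CBC/NoPadding");
                aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(replyKeys[j], "AES"),
                         new IvParameterSpec(replyIvs[j]));
                rec = aes.doFinal(rec);
            }
            // verify SHA-256(padding + status) before trusting the status byte
            status[i] = rec[527] & 0xff;         // 0x00 = agree, else rejection code
        }
        return status;
    }
}
{% endfilter %}
</pre>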
<h2 id="tunnelCreate.notes">History and Notes</h2>
<p>
This strategy came about during a discussion on the I2P mailing list
between Michael Rogers, Matthew Toseland (toad), and jrandom regarding
the predecessor attack. See: <ul>
<li><a href="http://osdir.com/ml/network.i2p/2005-10/msg00138.html">Summary</a></li>
<li><a href="http://osdir.com/ml/network.i2p/2005-10/msg00129.html">Reasoning</a></li>
</ul>
It was introduced in release 0.6.1.10 on 2006-02-16, which was the last time
a non-backward-compatible change was made in I2P.
</p>
<p>
Notes:
<ul>
<li>This design does not prevent two hostile peers within a tunnel from
tagging one or more request or reply records to detect that they are
within the same tunnel, but doing so can be detected by the tunnel
creator when reading the reply, causing the tunnel to be marked as
invalid.</li>
<li>This design does not include a proof of work on the asymmetrically
encrypted section, though the 16 byte identity hash could be cut in
half with the latter replaced by a hashcash function of up to 2^64
cost.</li>
<li>This design alone does not prevent two hostile peers within a tunnel from
using timing information to determine whether they are in the same
tunnel. The use of batched and synchronized request delivery
could help (batching up requests and sending them off on the
@ -159,12 +245,34 @@ window would work (though doing that would require a high degree of
clock synchronization). Alternately, perhaps individual hops could
inject a random delay before forwarding on the request?</li>
<li>Are there any nonfatal methods of tagging the request?</li>
</ul>
<h2 id="ref">References</h2>
<ul>
<li>
<a href="http://prisms.cs.umass.edu/brian/pubs/wright-tissec.pdf">Predecessor
attack</a>
<li>
<a href="http://prisms.cs.umass.edu/brian/pubs/wright.tissec.2008.pdf">2008
update</a>
</ul>
<h2 id="future">Future Work</h2>
<ul>
<li>
It appears that, in the current implementation, the originator leaves one record empty
for itself, which is not necessary. Thus a message of n records can only build a
tunnel of n-1 hops. This is to be researched and verified.
If it is possible to use the remaining record without compromising anonymity,
we should do so.
<li>
The usefulness of a timestamp with an hour resolution is questionable,
and the constraint is not currently enforced.
Therefore the request time field is unused.
This should be researched and possibly changed.
<li>
Further analysis of possible tagging and timing attacks described in the above notes.
</ul>
{% endblock %}