{% extends "_layout.html" %}
{% block title %}To Do List{% endblock %}
{% block content %}<p>Below is a more detailed (yet still incomplete) discussion of the major areas
of future development on the core I2P network, spanning the plausibly planned
releases. This does not include stego transports, porting to wireless devices,
or tools to secure the local machine, nor does it include client applications that
will be essential in I2P's success. There are probably other things that will come
up, especially as I2P gets more peer review, but these are the main 'big things'.
See also <a href="roadmap.html">the roadmap</a>.
Want to help? <a href="getinvolved.html">Get involved</a>!
</p>
<ul>
<li><a href="#core">Core functionality</a><ul>
<li><a href="#nat">NAT/Firewall bridging via 1-hop restricted routes</a></li>
<li><a href="#transport">High degree transport layer with UDP, NBIO, or NIO</a></li>
<li><a href="#netdb">NetworkDB and profile tuning and ejection policy for large nets</a></li>
</ul></li>
<li><a href="#security">Security / anonymity</a><ul>
<li><a href="#tunnelId">Per-hop tunnel id &amp; new permuted TunnelVerificationStructure encryption</a></li>
<li><a href="#ordering">Strict ordering of participants within tunnels</a></li>
<li><a href="#tunnelLength">Randomly permuted tunnel lengths</a></li>
<li><a href="#fullRestrictedRoutes">Full blown n-hop restricted routes with optional trusted links</a></li>
<li><a href="#hashcash">Hashcash for routerIdentity, destination, and tunnel request</a></li>
<li><a href="#batching">Advanced tunnel operation (batching/mixing/throttling/padding)</a></li>
<li><a href="#stop">Stop &amp; go mix w/ garlics &amp; tunnels</a></li>
</ul></li>
<li><a href="#performance">Performance</a><ul>
<li><a href="#sessionTag">Migrate sessionTag to synchronized PRNG</a></li>
<li><a href="#streaming">Full streaming protocol improvements</a></li>
</ul></li>
</ul>
<ul>
<li><h2 id="core">Core functionality</h2><ul>
<li><h3 id="nat">NAT/Firewall bridging via 1-hop restricted routes</h3>
<b><i>Implemented in I2P 0.6.0.6</i></b>
<p>Allowing routers to fully participate within
the network while behind firewalls and NATs that they do not control
requires some basic restricted route operation (since those peers will
not be able to receive inbound connections). To do this successfully,
peers are considered in one of two ways:</p><ul>
<li><b>Peers who have reachable interfaces</b> - these peers do not need
to do anything special</li>
<li><b>Peers who do not have reachable interfaces</b> - these peers must
build a tunnel pointing at them where the gateway is one of the
peers they have established a connection with who has both a publicly
reachable interface and who has agreed to serve as their 'introducer'.</li>
</ul>
<p>To do this, peers who have no IP address simply connect to a few peers,
build a tunnel through them, and publish a reference to those tunnels within
their RouterInfo structure in the network database.</p>
<p>When someone wants to contact any particular router, they first must get
its RouterInfo from the network database, which will tell them whether they can
connect directly (e.g. the peer has a publicly reachable interface) or whether
they need to contact them indirectly. Direct connections occur as normal, while
indirect connections are done through one of the published tunnels.</p>
<p>When a router just wants to get a message or two to a specific hidden peer,
they can just use the indirect tunnel for sending the payload. However, if the
router wants to talk to the hidden peer often (for instance, as part of a
tunnel), they will send a garlic routed message through the indirect tunnel to
that hidden peer, which unwraps to contain a message that should be sent to the
originating router. That hidden peer then establishes an outbound connection to
the originating router and from then on, those two routers can talk to each other
directly over that newly established direct connection.</p>
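<p>As a rough sketch of that decision (the types and method names below are invented for illustration and are not the actual router API), the contacting router would do something like:</p>
<pre>
// Illustrative sketch only: the interface and helper here are made up for this example.
interface Transport { void send(String routerId, byte[] data); }

class HiddenPeerContact {
    private final Transport transport;
    private final String ourRouterId;

    HiddenPeerContact(Transport transport, String ourRouterId) {
        this.transport = transport;
        this.ourRouterId = ourRouterId;
    }

    /** Contact a peer directly if it is reachable, otherwise through its published tunnel gateway. */
    void contact(String peerId, boolean reachable, String tunnelGatewayId, byte[] payload) {
        if (reachable) {
            transport.send(peerId, payload);              // normal direct connection
        } else {
            // garlic-wrap the payload with a request that the hidden peer
            // open an outbound connection back to us, the originator
            transport.send(tunnelGatewayId, garlicWrap(payload, ourRouterId));
        }
    }

    private byte[] garlicWrap(byte[] payload, String replyTo) {
        // stand-in for real garlic encryption: just prepend the reply-to router id
        byte[] to = replyTo.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] wrapped = new byte[to.length + payload.length];
        System.arraycopy(to, 0, wrapped, 0, to.length);
        System.arraycopy(payload, 0, wrapped, to.length, payload.length);
        return wrapped;
    }
}
</pre>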
<p>Of course, that only works if the originating peer can receive connections
(they aren't also hidden). However, if the originating peer is hidden, they can
simply direct the garlic routed message to come back to the originating peer's
inbound tunnel.</p>
<p>This is not meant to provide a way for a peer's IP address to be concealed,
merely as a way to let people behind firewalls and NATs fully operate within the
network. Concealing the peer's IP address adds a little more work, as described
<a href="#fullRestrictedRoutes">below</a></p>
<p>With this technique, any router can participate as any part of a tunnel. For
efficiency purposes, a hidden peer would be a bad choice for an inbound gateway,
and within any given tunnel, two neighboring peers wouldn't want to be hidden.
But that is not technically necessary.</p>
</li>
<li><h3 id="transport">High degree transport layer with UDP, NBIO, or NIO</h3>
<b><i>Both UDP and NIO have been implemented in I2P</i></b>
<p>Standard TCP communication in Java generally requires blocking socket calls,
and to keep a blocked socket from hanging the entire system, those blocking calls
are done on their own threads. Our current TCP transport is implemented in a naive
fashion - for each peer we are talking to, we have one thread reading and one thread
writing. The reader thread simply loops over a bunch of read() calls, building I2NP messages
and adding them to our internal inbound message queue, and the writer thread pulls messages
off a per-connection outbound message queue and shoves the data through write() calls.</p>
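<p>A simplified sketch of that thread-per-peer pattern (illustrative only, not the actual transport code) looks like this:</p>
<pre>
// Simplified sketch of the thread-per-peer model described above (not the real TCP transport).
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PeerConnection {
    private final Socket socket;
    private final BlockingQueue&lt;byte[]&gt; outbound = new LinkedBlockingQueue&lt;&gt;();
    private final BlockingQueue&lt;byte[]&gt; inbound;   // router-wide inbound message queue, shared

    PeerConnection(Socket socket, BlockingQueue&lt;byte[]&gt; routerInbound) {
        this.socket = socket;
        this.inbound = routerInbound;
        new Thread(this::readLoop, "peer reader").start();   // one reader thread per peer
        new Thread(this::writeLoop, "peer writer").start();  // one writer thread per peer
    }

    private void readLoop() {
        try (DataInputStream in = new DataInputStream(socket.getInputStream())) {
            while (true) {
                byte[] msg = new byte[in.readInt()];          // length-prefixed message framing
                in.readFully(msg);
                inbound.put(msg);                             // hand off to the router core
            }
        } catch (Exception e) { /* connection closed */ }
    }

    private void writeLoop() {
        try (DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            while (true) {
                byte[] msg = outbound.take();
                out.writeInt(msg.length);
                out.write(msg);
                out.flush();
            }
        } catch (Exception e) { /* connection closed */ }
    }

    void enqueue(byte[] msg) { outbound.add(msg); }
}
</pre>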
<p>We do this fairly efficiently, from a CPU perspective - at any time, almost all of
these threads are sitting idle, blocked waiting for something to do. However, each
thread consumes real resources (on older Linux kernels, for instance, each thread would
often be implemented as a fork()'ed process). As the network grows, the number of peers
each router will want to talk with will increase (remember, I2P is fully connected,
meaning that any given peer should know how to get a message to any other peer, and
restricted route support will probably not significantly reduce the number of
connections necessary). This means that with a 100,000 router network, each router will
have up to 199,998 threads just to deal with the TCP connections!</p>
<p>Obviously, that just won't work. We need to use a transport layer that
can scale. In Java, we have two main camps:</p>
<h4>UDP</h4>
<b><i>Implemented in I2P 0.6 ("SSU") as documented <a href="udp.html">elsewhere</a></i></b>
<p>Sending and receiving UDP datagrams is a connectionless operation - if we are
communicating with 100,000 peers, we simply stick the UDP packets in a queue
and have a single thread pulling them off the queue and shoving them out the pipe
(and to receive, have a single thread pulling in any UDP packets received and adding
them to an inbound queue).</p>
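<p>A minimal sketch of that queue-based model (illustrative only; the port, buffer size, and names are made up, and this is not the SSU implementation):</p>
<pre>
// Hypothetical sketch of the single-sender / single-receiver UDP model described above.
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class UdpTransport {
    private final DatagramSocket socket;
    private final BlockingQueue&lt;DatagramPacket&gt; outbound = new LinkedBlockingQueue&lt;&gt;();
    private final BlockingQueue&lt;DatagramPacket&gt; inbound  = new LinkedBlockingQueue&lt;&gt;();

    UdpTransport(int port) throws SocketException {
        socket = new DatagramSocket(port);
        new Thread(this::sendLoop, "UDP sender").start();      // one thread sends to every peer
        new Thread(this::receiveLoop, "UDP receiver").start(); // one thread receives from every peer
    }

    private void sendLoop() {
        try {
            while (true) socket.send(outbound.take());
        } catch (Exception e) { /* shutting down */ }
    }

    private void receiveLoop() {
        try {
            while (true) {
                DatagramPacket p = new DatagramPacket(new byte[2048], 2048);
                socket.receive(p);
                inbound.put(p);                                 // parsed into I2NP messages elsewhere
            }
        } catch (Exception e) { /* shutting down */ }
    }

    void queue(byte[] data, InetSocketAddress peer) {
        outbound.add(new DatagramPacket(data, data.length, peer));
    }
}
</pre>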
<p>However, moving to UDP means losing the benefits of TCP's ordering, congestion
control, MTU discovery, etc. Implementing that code will take significant work;
however, I2P doesn't need it to be as strong as TCP. Specifically, a while ago I was
taking some measurements in the simulator and on the live net, and the vast majority
of messages transferred would fit easily within a single unfragmented UDP packet, and
the largest of the messages would fit within 20-30 packets. As mule pointed out, TCP
adds significant overhead when dealing with so many small packets, as the ACKs alone are
within an order of magnitude of the size of the messages themselves. With UDP, we can optimize the transport for
both efficiency and resilience by taking into account I2P's particular needs.</p>
<p>It will be a lot of work though.</p>
<h4>NIO or NBIO</h4>
<b><i>NIO Implemented in I2P 0.6.1.22 ("NTCP")</i></b>
<p>In Java 1.4, a set of "New I/O" packages was introduced, allowing Java developers
to take advantage of the operating system's nonblocking IO capabilities - allowing
you to maintain a large number of concurrent IO operations without requiring a separate
thread for each. There is much promise with this approach, as we can scalably handle
a large number of concurrent connections and we don't have to write a mini-TCP stack
with UDP. However, the NIO packages have not proven themselves to be battle-ready, as
the Freenet developers found. In addition, requiring NIO support would mean we can't
run on any of the open source JVMs like <a href="http://www.kaffe.org/">Kaffe</a>, as
<a href="http://www.classpath.org/">GNU/Classpath</a> has only limited support for NIO.
<i>(note: this may not be the case anymore, as there has been some progress on Classpath's
NIO, but it is an unknown quantity)</i></p>
<p>Another alternative along the same lines is the
<a href="http://www.eecs.harvard.edu/~mdw/proj/java-nbio/">Non Blocking I/O</a> package -
essentially a cleanroom NIO implementation (written before NIO was around). It works
by using some native OS code to do the nonblocking IO, passing off events through Java.
It seems to be working with Kaffe, though there doesn't seem to be much development
activity on it lately (likely due to 1.4's NIO deployment).</p>
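<p>For reference, a minimal selector-based NIO loop - the pattern described above, not any actual I2P transport code - looks roughly like this:</p>
<pre>
// Illustrative NIO sketch: a single thread servicing accepts and reads for all connections.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

class NioLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(8887));       // port chosen for illustration
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                                   // block until any channel is ready
            Iterator&lt;SelectionKey&gt; it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    buf.clear();
                    int n = ((SocketChannel) key.channel()).read(buf);
                    if (n &lt; 0) key.channel().close();            // peer closed the connection
                    // otherwise: hand buf's contents off to the message parser
                }
            }
        }
    }
}
</pre>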
</li>
<li><h3 id="netdb">NetworkDB and profile tuning and ejection policy for large nets</h3>
<p>Within the current network database and profile management implementation, we have taken
the liberty of some practical shortcuts. For instance, we don't have the code to
drop peer references from the K-buckets, as we don't have enough peers to even plausibly
fill any of them, so instead, we just keep the peers in whatever bucket is appropriate.
Another example deals with the peer profiles - the memory required to maintain each peer's
profile is small enough that we can keep thousands of full blown profiles in memory without
problems. While we have the capacity to use trimmed down profiles (of which we could maintain
hundreds of thousands in memory), we don't have any code to deal with moving a profile from
a "minimal profile" to a "full profile", a "full profile" to a "minimal profile", or to
simply eject a profile altogether. It just wouldn't be practical to write that code yet,
since we aren't going to need it for a while.</p>
<p>That said, as the network grows we are going to want to keep these considerations in
mind. We will have some work to do, but we can put it off for later.</p>
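<p>When we do get there, the promote/demote/eject policy could be as simple as a pair of LRU tiers; the sketch below is purely illustrative (the limits and profile types are made up):</p>
<pre>
// Hypothetical sketch of a profile tiering policy; nothing like this exists in the router yet.
import java.util.LinkedHashMap;

class ProfileManager {
    private static final int MAX_FULL = 5000, MAX_MINIMAL = 200000;   // illustrative limits
    // access-ordered LinkedHashMaps give us a cheap LRU for each tier
    private final LinkedHashMap&lt;String, Object&gt; full    = new LinkedHashMap&lt;&gt;(16, 0.75f, true);
    private final LinkedHashMap&lt;String, Object&gt; minimal = new LinkedHashMap&lt;&gt;(16, 0.75f, true);

    synchronized void recordUsage(String peer, Object fullProfile) {
        minimal.remove(peer);
        full.put(peer, fullProfile);                   // promote (or refresh) to a full profile
        if (full.size() &gt; MAX_FULL) {                  // demote the least recently used profile
            String lru = full.keySet().iterator().next();
            minimal.put(lru, trim(full.remove(lru)));
        }
        if (minimal.size() &gt; MAX_MINIMAL)              // eject the oldest minimal profile entirely
            minimal.remove(minimal.keySet().iterator().next());
    }

    private Object trim(Object fullProfile) {
        return new Object();                           // placeholder: keep only summary statistics
    }
}
</pre>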
</li>
</ul></li>
<li><h2 id="security">Security / anonymity</h2><ul>
<li><h3 id="tunnelId">Per-hop tunnel id &amp; new permuted TunnelVerificationStructure encryption</h3>
<b><i>Addressed in I2P 0.5 as documented <a href="tunnel-alt.html">elsewhere</a></i></b>
<p>Right now, if Alice builds a four hop inbound tunnel starting at Elvis, going to Dave,
then to Charlie, then Bob, and finally Alice (A&lt;--B&lt;--C&lt;--D&lt;--E), all five of
them will know they are participating in tunnel "123", as the messages are tagged as such.
What we want to do is give each hop their own unique tunnel hop ID - Charlie will receive
messages on tunnel 234 and forward them to tunnel 876 on Bob. The intent is to prevent
Bob or Charlie from knowing that they are in Alice's tunnel: if each hop in the tunnel
had the same tunnel ID, collusion attacks wouldn't take much work. </p>
<p>Adding a unique tunnel ID per hop isn't hard, but by itself, insufficient. If Dave
and Bob are under the control of the same attacker, they wouldn't be able to tell they
are in the same tunnel due to the tunnel ID, but would be able to tell by the message
bodies and verification structures by simply comparing them. To prevent that, the tunnel
must use layered encryption along the path, both on the payload of the tunneled message
and on the verification structure (used to prevent simple tagging attacks). This requires
some simple modifications to the TunnelMessage, as well as the inclusion of per-hop secret
keys delivered during tunnel creation and given to the tunnel's gateway. We must fix a
maximum tunnel length (e.g. 16 hops) and instruct the gateway to encrypt the message to
each of the 16 delivered secret keys, in reverse order, and to encrypt the signature of
the hash of the (encrypted) payload at each step. The gateway then sends that 16-step
encrypted message, along with a 16-step and 16-wide encrypted mapping to the first hop,
which then decrypts the mapping and the payload with their secret key, looking in the
16-wide mapping for the entry associated with their own hop (keyed by the per-hop tunnel ID)
and verifying the payload by checking it against the associated signed hash.</p>
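<p>The core of that layered encryption is straightforward. The sketch below shows only the reverse-order layering at the gateway and the per-hop "peeling"; the per-hop verification mapping, key delivery, and proper IV handling are omitted, and the AES/CTR mode with a fixed IV is purely for illustration:</p>
<pre>
// Conceptual sketch of onion-style layered tunnel encryption; not the actual tunnel spec.
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class TunnelLayering {
    /** Gateway: encrypt with each hop's key in reverse order, so hop 0 peels the outermost layer. */
    static byte[] layerEncrypt(byte[] payload, byte[][] hopKeys) throws Exception {
        byte[] data = payload.clone();
        for (int i = hopKeys.length - 1; i &gt;= 0; i--)
            data = aes(Cipher.ENCRYPT_MODE, hopKeys[i], data);
        return data;
    }

    /** Hop i: removes its own layer, leaving the next hop's layer on the outside. */
    static byte[] peelLayer(byte[] data, byte[] hopKey) throws Exception {
        return aes(Cipher.DECRYPT_MODE, hopKey, data);
    }

    private static byte[] aes(int mode, byte[] key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(new byte[16])); // fixed IV: sketch only
        return c.doFinal(data);
    }
}
</pre>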
<p>The tunnel gateway does still have more information than the other peers in the tunnel,
and compromising both the gateway and a tunnel participant would allow those peers to
collude, exposing the fact that they are both in the same tunnel. In addition, neighboring
peers know that they are in the same tunnel anyway, as they know who they send the message
to (and with IP-based transports without restricted routes, they know who they got it from).
However, the above two techniques significantly increase the cost of gaining meaningful
samples when dealing with longer tunnels.</p>
</li>
<li><h3 id="ordering">Strict ordering of participants within tunnels</h3>
<b><i>Implemented in release 0.6.2</i></b>
<p>As Connelly <a href="http://dev.i2p/pipermail/i2p/2004-July/000335.html">proposed</a>
to deal with the <a href="http://prisms.cs.umass.edu/brian/pubs/wright-tissec.pdf">predecessor attack</a>
<a href="http://prisms.cs.umass.edu/brian/pubs/wright.tissec.2008.pdf">(2008 update)</a>,
keeping the order of peers within our tunnels consistent (i.e. whenever Alice creates
a tunnel with both Bob and Charlie in it, Bob's next hop is always Charlie) addresses
the issue, since Bob doesn't get to substantially sample Alice's peer selection group. We may even want
to explicitly allow Bob to participate in Alice's tunnels in only one way - receiving a message
from Dave and sending it to Charlie - and if any of those peers are not available to participate
in the tunnel (due to overload, network disconnection, etc), avoid asking Bob to participate
in any tunnels until they are back online.</p>
<p>More analysis is necessary for revising the tunnel creation - at the moment, we simply
select and order randomly within the peer's top tier of peers (ones with fast + high
capacity).</p>
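<p>One possible way to get a consistent ordering (purely illustrative - this is not a decided algorithm) is to rank candidate peers by a hash keyed with a per-destination secret, so the same peers always land in the same relative order for a given destination:</p>
<pre>
// Illustrative sketch: deterministic peer ordering keyed by a per-destination secret.
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConsistentOrdering {
    static List&lt;String&gt; order(List&lt;String&gt; peers, byte[] destinationSecret) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        Map&lt;String, BigInteger&gt; rank = new HashMap&lt;&gt;();
        for (String peer : peers) {
            sha.reset();
            sha.update(destinationSecret);                     // same secret for this destination
            rank.put(peer, new BigInteger(1, sha.digest(peer.getBytes(StandardCharsets.UTF_8))));
        }
        List&lt;String&gt; sorted = new ArrayList&lt;&gt;(peers);
        sorted.sort(Comparator.comparing(rank::get));          // same peers, same relative order
        return sorted;
    }
}
</pre>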
<p>Adding a strict ordering to peers in a tunnel also improves the anonymity of peers with
0-hop tunnels, as otherwise the fact that a peer's gateway is always the same would be
particularly damning. However, peers with 0-hop tunnels may want to periodically use a
1-hop tunnel to simulate the failure of a normally reliable gateway peer (so every
MTBF*(tunnel duration) minutes, use a 1-hop tunnel).</p>
</li>
<li><h3 id="tunnelLength">Randomly permuted tunnel lengths</h3>
<b><i>Addressed in I2P 0.5 as documented <a href="tunnel-alt.html">elsewhere</a></i></b>
<p>Without tunnel length permutation, if someone were to somehow detect that a destination had
a particular number of hops, they might be able to use that information to identify the router the
destination is located on, per the predecessor attack. For instance, if everyone has 2-hop
tunnels, if Bob receives a tunnel message from Charlie and forwards it to Alice, Bob knows Alice
is the final router in the tunnel. If Bob were to identify what destination that tunnel served
(by means of colluding with the gateway and harvesting the network database for all of the
LeaseSets), he would know the router on which that destination is located (and without restricted
routes, that would mean what IP address the destination is on).</p>
<p>It is to counter this sort of exposure that tunnel lengths should be permuted, using algorithms based
on the length requested (for example, the 1/MTBF length change for 0-hop tunnels outlined
above).</p>
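<p>The permutation rule itself could be as simple as the following sketch (the +/- 1 variance is illustrative, not a decided policy):</p>
<pre>
// Illustrative sketch: vary the requested hop count so observed lengths don't reveal the setting.
import java.security.SecureRandom;

class TunnelLength {
    private static final SecureRandom rnd = new SecureRandom();

    static int permute(int requestedHops) {
        int delta = rnd.nextInt(3) - 1;             // -1, 0, or +1 with equal probability
        return Math.max(0, requestedHops + delta);  // never below a 0-hop tunnel
    }
}
</pre>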
</li>
<li><h3 id="fullRestrictedRoutes">Full blown n-hop restricted routes with optional trusted links</h3>
<p>The restricted route functionality described before was simply a functional issue - how to let
peers who would not otherwise be able to communicate do so. However, the concept of allowing
restricted routes includes additional capabilities. For instance, if a router absolutely cannot
risk communicating directly with any untrusted peers, they can set up trusted links through those
peers, using them to both send and receive all of its messages. Those hidden peers who want to be
completely isolated would also refuse to connect to peers who attempt to get them to (as demonstrated
by the garlic routing technique outlined before) - they can simply take the garlic clove that has a
request for delivery to a particular peer and tunnel route that message out one of the hidden peer's
trusted links with instructions to forward it as requested.</p>
</li>
<li><h3 id="hashcash">Hashcash for routerIdentity, destination, and tunnel request</h3>
<p>Within the network, we will want some way to deter people from consuming too many resources or
from creating enough peers to mount a <a href="http://citeseer.ist.psu.edu/douceur02sybil.html">Sybil</a>
attack. Traditional techniques such as having a peer see who is requesting a resource or running a
peer aren't appropriate for use within I2P, as doing so would compromise the anonymity of the system.
Instead, we want to make certain requests "expensive".</p>
<p><a href="http://www.hashcash.org/">Hashcash</a> is one technique that we can use to anonymously
increase the "cost" of doing certain activities, such as creating a new router identity (done only
once on installation), creating a new destination (done only once when creating a service), or
requesting that a peer participate in a tunnel (done often, perhaps 2-300 times per hour). We don't
know the "correct" cost of each type of certificate yet, but with some research and experimentation, we
could set a base level that is sufficiently expensive while not an excessive burden for people with few
resources.</p>
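<p>For illustration, a minimal hashcash-style proof of work (using SHA-256 here rather than the hash a final scheme would settle on) - minting is expensive, while verification is a single hash:</p>
<pre>
// Illustrative hashcash sketch: find a nonce so that SHA-256(data || nonce) has `bits' leading zero bits.
import java.security.MessageDigest;

class HashcashSketch {
    static long mint(byte[] data, int bits) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (long nonce = 0; ; nonce++) {
            sha.reset();
            sha.update(data);
            sha.update(Long.toString(nonce).getBytes());
            if (leadingZeroBits(sha.digest()) &gt;= bits) return nonce;   // higher bits = higher cost
        }
    }

    private static int leadingZeroBits(byte[] hash) {
        int count = 0;
        for (byte b : hash) {
            if (b == 0) { count += 8; continue; }
            count += Integer.numberOfLeadingZeros(b &amp; 0xff) - 24;
            break;
        }
        return count;
    }
}
</pre>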
<p>There are a few other algorithms that we can explore for making those requests for resources
"nonfree", and further research on that front is appropriate.</p>
</li>
<li><h3 id="batching">Advanced tunnel operation (batching/mixing/throttling/padding)</h3>
<p>To powerful passive external observers as well as large colluding internal observers, standard tunnel
routing is vulnerable to traffic analysis attacks - simply watching the size and frequency of messages
being passed between routers. To defend against these, we will want to essentially turn some of the
tunnels into its own mix cascade - delaying messages received at the gateway and passing them in
batches, reordering them as necessary, and injecting dummy messages (indistinguishable from other "real"
tunnel messages by peers in the path). There has been a significant amount of
<a href="http://freehaven.net/doc/sync-batching/sync-batching.pdf">research</a> on these algorithms that
we can lean on prior to implementing the various tunnel mixing strategies.</p>
<p>In addition to the anonymity aspects of more varied tunnel operation, there is a functional
dimension as well. Each peer only has a certain amount of data they can route for the network,
and to keep any particular tunnel from consuming an unreasonable portion of that bandwidth, they
will want to include some throttles on the tunnel. For instance, a tunnel may be configured to
throttle itself after passing 600 messages (1 per second), 2.4MB (4KBps), or exceeding some moving
average (8KBps for the last minute). Excess messages may be delayed or summarily dropped. With
this sort of throttling, peers can provide ATM-like QoS support for their tunnels, refusing to
agree to allocate more bandwidth than the peer has available.</p>
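<p>A per-tunnel throttle along those lines could be as simple as the sketch below (the structure and numbers are illustrative, not the router's actual policy):</p>
<pre>
// Illustrative per-tunnel throttle: agree to a message and byte budget when the tunnel is created.
class TunnelThrottle {
    private final int maxMessages;    // e.g. 600 messages over a 10 minute tunnel (1 per second)
    private final long maxBytes;      // e.g. 2.4MB over a 10 minute tunnel (4KBps)
    private int messages;
    private long bytes;

    TunnelThrottle(int maxMessages, long maxBytes) {
        this.maxMessages = maxMessages;
        this.maxBytes = maxBytes;
    }

    /** Returns true if the message may be forwarded now, false if it should be delayed or dropped. */
    synchronized boolean allow(int messageSize) {
        if (messages + 1 &gt; maxMessages || bytes + messageSize &gt; maxBytes) return false;
        messages++;
        bytes += messageSize;
        return true;
    }
}
</pre>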
<p>In addition, we may want to implement code to dynamically reroute tunnels to avoid failed peers
or to inject additional hops into the path. This can be done by garlic routing a message to any
particular peer in a tunnel with instructions to redefine the next-hop in the tunnel.</p>
</li>
<li><h3 id="stop">Stop &amp; go mix w/ garlics &amp; tunnels</h3>
<p>Beyond the per-tunnel batching and mixing strategy, there are further capabilities for protecting
against powerful attackers, such as allowing each step in a garlic routed path to define a delay or
window in which it should be forwarded on. This would enable protections against the long term
intersection attack, as a peer could send a message that looks perfectly standard to most peers that
pass it along, except at any peers where the clove exposed includes delay instructions.</p>
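<p>Honoring such a delay instruction is cheap for the forwarding peer - conceptually something like this sketch (illustrative only):</p>
<pre>
// Illustrative sketch: hold a clove until its earliest-forward time before passing it on.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class StopAndGoForwarder {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    /** A clove with no delay instruction is forwarded immediately and looks like any other message. */
    void forwardAfter(long delayMs, Runnable deliver) {
        timer.schedule(deliver, Math.max(0, delayMs), TimeUnit.MILLISECONDS);
    }
}
</pre>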
</li>
</ul></li>
<li><h2 id="performance">Performance</h2><ul>
<li><h3 id="reply">Persistent Tunnel / Lease Selection</h3>
<b><i>Outbound tunnel selection implemented in 0.6.1.30, inbound lease selection implemented in release 0.6.2</i></b>
<p>Selecting tunnels and leases at random for every message creates
a large incidence of out-of-order delivery, which prevents the streaming lib from
increasing its window size as much as it could.
By persisting with the same selections for a given connection,
the transfer rate is much faster.
</p></li>
<li><h3 id="reply">Reduction of Reply LeaseSet Bundling</h3>
<b><i>Implemented in release 0.6.2</i></b>
<p>I2P bundled a reply leaseset (typically 1056 bytes) with every outbound
client message, which was a massive overhead. Fixed in 0.6.2.
</p></li>
<li><h3 id="sessionTag">Migrate sessionTag to synchronized PRNG</h3>
<p>Right now, our <a href="how_elgamalaes">ElGamal/AES+SessinTag</a> algorithm works by tagging each
encrypted message with a unique random 32 byte nonce (a "session tag"), identifying that message as
being encrypted with the associated AES session's key. This prevents peers from distinguishing
messages that are part of the same session, since each message has a completely new random tag. To
accomplish this, every few messages we bundle a whole new set of session tags within the encrypted
message itself, transparently delivering a way to identify future messages. We then have to keep
track of what messages are successfully delivered so that we know what tags we may use.</p>
<p>This works fine and is fairly robust, however it is inefficient in terms of bandwidth usage, as
it requires the delivery of these tags ahead of time (and not all tags may be necessary, or some
may be wasted, due to their expiration). On average though, predelivering the session tag costs
32 bytes per message (the size of a tag). As Taral suggested though, that size can be avoided by
replacing the delivery of the tags with a synchronized PRNG - when a new session is established
(through an ElGamal encrypted block), both sides seed a PRNG for use and generate the session tags
on demand (with the recipient precalculating the next few possible values to handle out of order
delivery).</p>
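<p>One way the synchronized PRNG could look (a sketch of the proposal, not an implemented design) is to derive tag N by hashing the shared seed with a counter, so tags never need to be delivered in advance:</p>
<pre>
// Illustrative sketch of synchronized session tag generation; the real scheme would pin down
// the seed derivation and the out-of-order window more carefully.
import java.security.MessageDigest;

class SessionTagPrng {
    private final byte[] seed;    // agreed by both sides during the ElGamal-encrypted session setup
    private long counter;

    SessionTagPrng(byte[] seed) { this.seed = seed.clone(); }

    /** Next 32-byte tag; the recipient precomputes a small window of these for out-of-order messages. */
    byte[] nextTag() throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        sha.update(seed);
        sha.update(Long.toString(counter++).getBytes());
        return sha.digest();      // SHA-256 output is already the 32 byte tag size
    }
}
</pre>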
</li>
<li><h3 id="streaming">Full streaming protocol improvements</h3>
<b><i>Several improvements implemented in I2P 0.6.1.28,
and significant additional fixes in 0.6.1.33,
but still lots here to investigate</i></b>
<p>Since I2P <a href="http://dev.i2p.net/pipermail/i2p/2004-November/000491.html">0.4.2</a>,
we have had a full sliding window streaming library, improving upon the older
fixed window size and resend delay implementation greatly. However, there are
still a few avenues for further optimization:</p>
<ul>
<li>some algorithms to share congestion and RTT information across
streams (per target destination? per source destination? for
all of the local destinations?) - one possible shape of this is sketched after this list</li>
<li>further optimizations for interactive streams (most of the focus
in the current implementation is on bulk streams)</li>
<li>more explicit use of the new streaming lib's features in
I2PTunnel and the SAM bridge, reducing the per-tunnel overhead.</li>
<li>client level bandwidth limiting (in either or both directions
on a stream, or possibly shared across multiple streams). This
would be in addition to the router's overall bandwidth limiting,
of course.</li>
<li>various controls for destinations to throttle how many streams
they accept or create (we have some basic code, but largely
disabled)</li>
<li>access control lists (only allowing streams to or from certain
other known destinations)</li>
<li>web controls and monitoring the health of the various streams,
as well as the ability to explicitly close or throttle them</li>
</ul>
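<p>As an example of the first item, sharing RTT information across streams to the same destination could be as simple as a per-destination moving average (illustrative sketch only):</p>
<pre>
// Illustrative sketch: new streams to a destination start from the shared RTT estimate.
import java.util.concurrent.ConcurrentHashMap;

class SharedRttEstimator {
    private final ConcurrentHashMap&lt;String, Double&gt; srtt = new ConcurrentHashMap&lt;&gt;();

    /** Classic EWMA over all streams to the given destination. */
    void recordSample(String destination, double rttMs) {
        srtt.merge(destination, rttMs, (old, sample) -&gt; 0.875 * old + 0.125 * sample);
    }

    double estimate(String destination, double defaultMs) {
        return srtt.getOrDefault(destination, defaultMs);
    }
}
</pre>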
</li>
</ul></li>
</ul>
{% endblock %}