{% extends "_layout.html" %}

{% block title %}How the Network Database (netDb) Works{% endblock %}

{% block content %}
<p>I2P's netDb is a specialized distributed database, containing
just two types of data - router contact information and destination contact
information. Each piece of data is signed by the appropriate party and verified
by anyone who uses or stores it. In addition, the data has liveliness information
within it, allowing irrelevant entries to be dropped, newer entries to replace
older ones, and, for the paranoid, protection against certain classes of attack.
(Note that this is also why I2P bundles in the necessary code to determine the
correct time.)</p>

<p>
The netDb is distributed with a simple technique called "floodfill".
Previously, the netDb also used kademlia as a fallback algorithm. However,
it did not work well in our application, and it was completely disabled
in release 0.6.1.20. More information is <a href="#status">below</a>.
</p>

<p>
Note that this document has been updated to include floodfill details,
but there are still some incorrect statements about the use of kademlia that
need to be fixed up.
</p>
<h2><a name="routerInfo">RouterInfo</a></h2>
|
|
|
|
<p>When an I2P router wants to contact another router, they need to know some
|
|
key pieces of data - all of which are bundled up and signed by the router into
|
|
a structure called the "RouterInfo", which is distributed under the key derived
|
|
from the SHA256 of the router's identity. The structure itself contains:</p><ul>
|
|
<li>The router's identity (a 2048bit ElGamal encryption key, a 1024bit DSA signing key, and a certificate)</li>
|
|
<li>The contact addresses at which it can be reached (e.g. TCP: dev.i2p.net port 4108)</li>
|
|
<li>When this was published</li>
|
|
<li>A set of arbitrary (uninterpreted) text options</li>
|
|
<li>The signature of the above, generated by the identity's DSA signingkey</li>
|
|
</ul>
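
<p>For illustration only, a rough Java sketch of the fields above and of the
lookup key derivation might look like the following. The class and field names
are simplified assumptions for this page, not the actual I2P implementation.</p>
<pre>
import java.security.MessageDigest;
import java.util.Date;
import java.util.Properties;

// Simplified sketch of the data bundled into a RouterInfo entry.
public class RouterInfoSketch {
    byte[] identity;        // 2048-bit ElGamal key + 1024-bit DSA key + certificate, serialized
    String[] addresses;     // contact addresses, e.g. "TCP: dev.i2p.net port 4108"
    Date published;         // when this entry was published
    Properties options;     // arbitrary, uninterpreted text options
    byte[] signature;       // DSA signature over all of the above

    // The netDb key the entry is stored under: the SHA256 of the router's identity.
    public byte[] netDbKey() throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(identity);
    }
}
</pre>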

<p>The arbitrary text options are currently used to help debug the network,
publishing various stats about the router's health. These stats will be disabled
by default once I2P 1.0 is out, and in the meantime can be disabled by adding
"router.publishPeerRankings=false" to the router
<a href="http://localhost:7657/configadvanced.jsp">configuration</a>. The data
published can be seen on the router's <a href="http://localhost:7657/netdb.jsp">netDb</a>
page, but should not be trusted.</p>
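
<p>For example, assuming the usual properties-style router configuration, the
line is simply:</p>
<pre>
# stop publishing peer ranking stats in the published RouterInfo
router.publishPeerRankings=false
</pre>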

<h2><a name="leaseSet">LeaseSet</a></h2>

<p>The second piece of data distributed in the netDb is a "LeaseSet" - documenting
a group of tunnel entry points (leases) for a particular client destination.
Each of these leases specifies the tunnel's gateway router (by the hash of its
identity), the tunnel ID on that router to which messages should be sent (a 4 byte
number), and when that tunnel will expire. The LeaseSet itself is stored in the
netDb under the key derived from the SHA256 of the destination.</p>

<p>In addition to these leases, the LeaseSet also includes the destination
itself (namely, the destination's 2048-bit ElGamal encryption key, 1024-bit DSA
signing key, and certificate) as well as an additional pair of signing and
encryption keys. These additional keys can be used for garlic routing messages
to the router on which the destination is located (though these keys are <b>not</b>
the router's keys - they are generated by the client and given to the router to
use). End-to-end client messages are still, of course, encrypted with the
destination's public keys.</p>
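
<p>Again for illustration only, a simplified Java sketch of a lease and a
LeaseSet, using the same key derivation pattern (the names and types here are
assumptions, not the real I2P classes):</p>
<pre>
import java.security.MessageDigest;
import java.util.Date;

// One tunnel entry point (lease) advertised by a destination.
class LeaseSketch {
    byte[] gatewayRouterHash;   // SHA256 hash of the tunnel gateway's router identity
    int tunnelId;               // the 4-byte tunnel ID on that gateway
    Date expiration;            // when the referenced tunnel expires
}

// The LeaseSet published for a client destination.
class LeaseSetSketch {
    byte[] destination;         // 2048-bit ElGamal key + 1024-bit DSA key + certificate, serialized
    byte[] extraEncryptionKey;  // client-generated key handed to the router for garlic routing
    byte[] extraSigningKey;     // client-generated signing key, likewise
    LeaseSketch[] leases;       // the current tunnel entry points
    byte[] signature;           // signed with the destination's DSA signing key

    // Stored in the netDb under the SHA256 of the destination.
    byte[] netDbKey() throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(destination);
    }
}
</pre>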

<h2><a name="bootstrap">Bootstrapping</a></h2>

<p>The netDb, being a DHT, is completely decentralized; however, you do need at
least one reference to a peer so that the DHT's integration process can
tie you in. This is accomplished by "reseeding" your router with the RouterInfo
of an active peer - specifically, by retrieving their <code>routerInfo-$hash.dat</code>
file and storing it in your <code>netDb/</code> directory. Anyone can provide
you with those files - you can even provide them to others by exposing your own
netDb directory. To simplify the process of finding someone with a RouterInfo,
an alias has been made to the <a href="http://dev.i2p.net/i2pdb/">netDb</a> dir
of one of the routers on dev.i2p.net.</p>
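
<p>A minimal reseed sketch in Java, assuming some host that exposes its netDb
directory over HTTP; the URL and the hash in the file name below are
placeholders, not real values:</p>
<pre>
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Fetch one routerInfo-$hash.dat file and drop it into the local netDb/ directory.
public class ReseedSketch {
    public static void main(String[] args) throws Exception {
        String fileName = "routerInfo-EXAMPLEHASH.dat";                       // placeholder hash
        URL source = new URL("http://reseed.example.net/netDb/" + fileName);  // placeholder host
        Path target = Paths.get("netDb", fileName);
        Files.createDirectories(target.getParent());
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
</pre>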

<h2><a name="floodfill">Floodfill</a></h2>

<p>
(Adapted from a post by jrandom in the old Syndie, Nov. 26, 2005)
<br />
The floodfill netDb is really just a simple and perhaps temporary measure,
using the simplest possible algorithm - send the data to a peer in the
floodfill netDb, wait 10 seconds, then pick a random peer in the netDb and ask
them for the entry, verifying its proper insertion / distribution. If the
verification peer doesn't reply, or they don't have the entry, the sender
repeats the process. When a peer in the floodfill netDb receives a netDb
store from a peer not in the floodfill netDb, they send it to all of the peers
in the floodfill netDb.
</p>
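
<p>A sketch of that store-and-verify loop in Java. The peer and entry types
here are hypothetical stand-ins for this page, not the real I2P classes:</p>
<pre>
import java.util.Random;

public class FloodfillStoreSketch {
    // Hypothetical view of a floodfill peer: it accepts stores and answers lookups.
    interface FloodfillPeer {
        void store(NetDbEntry entry);   // send a netDb store message
        NetDbEntry lookup(byte[] key);  // ask for an entry; null if unknown or no reply
    }
    static class NetDbEntry { byte[] key; byte[] data; }

    // Send the entry to a floodfill peer, wait 10 seconds, then ask a random
    // floodfill peer for it; if the verification fails, repeat the whole process.
    static void storeAndVerify(NetDbEntry entry, FloodfillPeer[] floodfill) throws InterruptedException {
        Random rnd = new Random();
        while (true) {
            FloodfillPeer target = floodfill[rnd.nextInt(floodfill.length)];
            target.store(entry);
            Thread.sleep(10_000);
            FloodfillPeer verifier = floodfill[rnd.nextInt(floodfill.length)];
            if (verifier.lookup(entry.key) != null)
                return;                 // the entry was flooded successfully
        }
    }
}
</pre>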

<p>
Peers still do netDb exploration and bootstrapping as before.
</p><p>
At one point, the kademlia search/store functionality was still in place. The peers
considered the floodfill peers as always being 'closer' to every key than any
peer not participating in the netDb. We fell back on the kademlia
netDb if the floodfill peers failed for some reason or another.
However, kademlia has since been disabled completely (see below).
</p><p>
Determining who is part of the floodfill netDb is trivial - it's exposed in each
router's published routerInfo. If too many peers publish that flag, we may
have to have peers publish the list of peers they consider as being in the
netDb, but perhaps not, since these peers are not anonymous - if router X
publishes the flag and they suck, we know router X's IP and can handle them
accordingly.
</p><p>
As for efficiency, this algorithm is optimal when the netDb peers are known and
their quantity is appropriate for the uptime demands. Regarding
scaling, we've got two peers who participate in the netDb right now, and
they'll be able to handle the load by themselves until we've got 10k+ eepsites
- at that point, we can toss on a few more or work out some more aggressive
load balancing among the netDb peers, but worrying about it when we have dozens
of eepsites may be a bit premature. It's not as sexy as the old
kademlia netDb, but there are subtle anonymity attacks against non-flooded
netDbs.
</p>
<h2><a name="healing">Healing</a></h2>

<i>Needs update since kademlia is disabled.</i>

<p>While the kademlia algorithm is fairly efficient at maintaining the necessary
links, we keep additional statistics regarding the netDb's activity so that we
can detect potential segmentation and actively avoid it. This is done as part of
the peer profiling - with data points such as how many new and verifiable
RouterInfo references a peer gives us, we can determine which peers know about
groups of peers that we have never seen references to. When this occurs, we can
take advantage of kademlia's flexibility in exploration and send requests to that
peer so as to integrate ourselves further with the part of the network seen by
that well-integrated router.</p>
<h2><a name="migration">Migration</a></h2>

<p>Unlike traditional DHTs, the very act of conducting a search distributes the
data as well, since rather than passing IP+port # pairs, references are given to
the routers on which to query (namely, the SHA256 of those routers' identities).
As such, iteratively searching for a particular destination's LeaseSet or
router's RouterInfo will also provide you with the RouterInfo of the peers along
the way.</p>

<p>In addition, due to the time sensitivity of the data, the information doesn't
often need to be migrated - since a LeaseSet is only valid for the 10 minutes
that the referenced tunnels are around, those entries can simply be dropped at
expiration, since they will be replaced at the new location when the router
publishes a new LeaseSet.</p>

<p>To address the concerns of <a href="http://citeseer.ist.psu.edu/douceur02sybil.html">Sybil attacks</a>,
the location used to store entries varies over time. Rather than storing the
RouterInfo on the peers closest to SHA256(router identity), they are stored on
the peers closest to SHA256(router identity + YYYYMMdd), requiring an adversary
to remount the attack again daily so as to maintain closeness to the "current"
keyspace. In addition, entries are probabilistically distributed to an additional
peer outside of the target keyspace, so that a successful compromise of the K
routers closest to the key will only degrade the search time.</p>
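
<p>A sketch of that daily key rotation in Java. The exact serialization and
date handling used by the router may differ; this just illustrates folding the
current date into the storage key:</p>
<pre>
import java.security.MessageDigest;
import java.text.SimpleDateFormat;
import java.util.Date;

public class RoutingKeySketch {
    // Entries are stored near SHA256(identity + "YYYYMMdd") rather than SHA256(identity),
    // so an adversary must re-position itself in the keyspace every day.
    static byte[] dailyRoutingKey(byte[] routerIdentity) throws Exception {
        String today = new SimpleDateFormat("yyyyMMdd").format(new Date());
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        sha.update(routerIdentity);
        sha.update(today.getBytes("UTF-8"));
        return sha.digest();
    }
}
</pre>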

<h2><a name="delivery">Delivery</a></h2>

<p>As with DNS lookups, the fact that someone is trying to retrieve the LeaseSet
for a particular destination is sensitive (the fact that someone is <i>publishing</i>
a LeaseSet even more so!). To address this, netDb searches and netDb store
messages are simply sent through the router's exploratory tunnels.</p>
<h2><a name="status">History and Status</a></h2>

<h3>The Introduction of the Floodfill Algorithm</h3>

<p>
Floodfill was introduced in release 0.6.0.4, keeping Kademlia as a backup algorithm.
</p>

<p>
(Adapted from a post by jrandom in the old Syndie, Nov. 26, 2005)
<br />
As I've often said, I'm not particularly bound to any specific technology -
what matters to me is what will get results. While I've been working through
various netdb ideas over the last few years, the issues we've faced in the last
few weeks have brought some of them to a head. On the live net,
with the netdb redundancy factor set to 4 peers (meaning we keep sending an
entry to new peers until 4 of them confirm that they've got it) and the
per-peer timeout set to 4 times that peer's average reply time, we're
<b>still</b> getting an average of 40-60 peers sent to before 4 ack the store.
That means sending 36-56 more messages than should go out, each using
tunnels and thereby crossing 2-4 links. Even further, that value is heavily
skewed, as the average number of peers sent to in a 'failed' store (meaning
fewer than 4 people acked the message after 60 seconds of sending messages out)
was in the 130-160 peer range.
</p><p>
This is insane, especially for a network with only perhaps 250 peers on it.
</p><p>
The simplest answer is to say "well, duh jrandom, it's broken. fix it", but
that doesn't quite get to the core of the issue. In line with another current
effort, it's likely that we have a substantial number of network issues due to
restricted routes - peers who cannot talk with some other peers, often due to
NAT or firewall issues. If, say, the K peers closest to a particular netdb
entry are behind a 'restricted route' such that the netdb store message could
reach them but some other peer's netdb lookup message could not, that entry
would be essentially unreachable. Following down those lines a bit further and
taking into consideration the fact that some restricted routes will be created
with hostile intent, it's clear that we're going to have to look closer into a
long term netdb solution.
</p><p>
There are a few alternatives, but two are worth mentioning in particular. The
first is to simply run the netdb as a kademlia DHT using a subset of the full
network, where all of those peers are externally reachable. Peers who are not
participating in the netdb still query those peers but they don't receive
unsolicited netdb store or lookup messages. Participation in the netdb would
be both self-selecting and user-eliminating - routers would choose whether to
publish a flag in their routerInfo stating whether they want to participate,
while each router chooses which peers it wants to treat as part of the netdb
(peers who publish that flag but who never give any useful data would be
ignored, essentially eliminating them from the netdb).
</p><p>
Another alternative is a blast from the past, going back to the DTSTTCPW
mentality - a floodfill netdb, but like the alternative above, using only a
subset of the full network. When a user wants to publish an entry into the
floodfill netdb, they simply send it to one of the participating routers, wait
for an ACK, and then 30 seconds later, query another random participant in the
floodfill netdb to verify that it was properly distributed. If it was, great,
and if it wasn't, just repeat the process. When a floodfill router receives a
netdb store, it ACKs immediately and queues off the netdb store to all of its
known netdb peers. When a floodfill router receives a netdb lookup, if it
has the data, it replies with it, but if it doesn't, it replies with the
hashes for, say, 20 other peers in the floodfill netdb.
</p><p>
Looking at it from a network economics perspective, the floodfill netdb is
quite similar to the original broadcast netdb, except the cost for publishing
an entry is borne mostly by peers in the netdb, rather than by the publisher.
Fleshing this out a bit further and treating the netdb like a blackbox, we can
see the total bandwidth required by the netdb to be:<pre>
  recvKBps = N * (L + 1) * (1 + F) * (1 + R) * S / T
</pre>where<pre>
  N = number of routers in the entire network
  L = average number of client destinations on each router
      (+1 for the routerInfo)
  F = tunnel failure percentage
  R = tunnel rebuild period, as a fraction of the tunnel lifetime
  S = average netdb entry size
  T = tunnel lifetime
</pre>Plugging in a few values:<pre>
  recvKBps = 1000 * (5 + 1) * (1 + 0.05) * (1 + 0.2) * 2KB / 10m
           = 25.2KBps
</pre>That, in turn, scales linearly with N (at 100,000 peers, the netdb must
be able to handle netdb store messages totalling 2.5MBps, or, at 300 peers,
7.6KBps).
</p>
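
<p>A quick arithmetic check of the example above in Java, using the same values
(1000 routers, 5 destinations per router, 5% tunnel failure, 20% rebuild period,
2KB entries, 10 minute tunnels):</p>
<pre>
public class NetDbLoadSketch {
    public static void main(String[] args) {
        double N = 1000;     // routers in the entire network
        double L = 5;        // client destinations per router (+1 added below for the routerInfo)
        double F = 0.05;     // tunnel failure percentage
        double R = 0.2;      // tunnel rebuild period, as a fraction of the tunnel lifetime
        double S = 2;        // average netdb entry size, in KB
        double T = 10 * 60;  // tunnel lifetime, in seconds

        double recvKBps = N * (L + 1) * (1 + F) * (1 + R) * S / T;
        System.out.printf("recvKBps = %.1f%n", recvKBps);  // prints 25.2, matching the text
    }
}
</pre>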

<p>
While the floodfill netdb would have each netdb participant receiving only a
small fraction of the client generated netdb stores directly, they would all
receive all entries eventually, so all of their links should be capable of
handling the full recvKBps. In turn, they'll all need to send
<tt>(recvKBps/sizeof(netdb)) * (sizeof(netdb)-1)</tt> to keep the other
peers in sync.
</p><p>
A floodfill netdb would not require either tunnel routing for netdb operation
or any special selection as to which entries it can answer 'safely', as the
basic assumption is that they are all storing everything. Oh, and with regards
to the netdb disk usage required, it's still fairly trivial for any modern
machine, requiring around 11MB for every 1000 peers <tt>(N * (L + 1) *
S)</tt>.
</p><p>
The kademlia netdb would cut down on these numbers, ideally bringing them to K
over M times their value, with K = the redundancy factor and M being the number
of routers in the netdb (e.g. 5/100, giving a recvKBps of 126KBps and 536MB at
100,000 routers). The downside of the kademlia netdb though is the increased
complexity of safe operation in a hostile environment.
</p><p>
What I'm thinking about now is to simply implement and deploy a floodfill netdb
in our existing live network, letting peers who want to use it pick out other
peers who are flagged as members and query them instead of querying the
traditional kademlia netdb peers. The bandwidth and disk requirements at this
stage are trivial enough (7.6KBps and 3MB disk space) and it will remove the
netdb entirely from the debugging plan - issues that remain to be addressed
will be caused by something unrelated to the netdb.
</p><p>
How would peers be chosen to publish that flag saying they are a part of the
floodfill netdb? At the beginning, it could be done manually as an advanced
config option (ignored if the router is not able to verify its external
reachability). If too many peers set that flag, how do the netdb participants
pick which ones to eject? Again, at the beginning it could be done manually as
an advanced config option (after dropping peers which are unreachable). How do
we avoid netdb partitioning? By having the routers verify that the netdb is
doing the flood fill properly by querying K random netdb peers. How do routers
not participating in the netdb discover new routers to tunnel through? Perhaps
this could be done by sending a particular netdb lookup so that the netdb
router would respond not with peers in the netdb, but with random peers outside
the netdb.
</p><p>
I2P's netdb is very different from traditional load bearing DHTs - it only
carries network metadata, not any actual payload, which is why even a netdb
using a floodfill algorithm will be able to sustain an arbitrary amount of
eepsite/irc/bt/mail/syndie/etc data. We can even do some optimizations as I2P
grows to distribute that load a bit further (perhaps passing bloom filters
between the netdb participants to see what they need to share), but it seems we
can get by with a much simpler solution for now.
</p>
<h3>The Disabling of the Kademlia Algorithm</h3>

<p>
Kademlia was completely disabled in release 0.6.1.20.
</p><p>
(this is adapted from an IRC conversation with jrandom 11/07)
<br />
Kademlia requires a minimum level of service that the baseline could not offer (bw, cpu),
even after adding in tiers (pure kad is absurd on that point).
Kademlia just wouldn't work. It was a nice idea, but not for a hostile and fluid environment.
</p>
<h3>Current Status</h3>

<p>The netDb plays a very specific role in the I2P network, and the algorithms
have been tuned towards our needs. This also means that it hasn't been tuned
to address the needs we have yet to run into. I2P is currently
fairly small (a few hundred routers).
There were some calculations that 3-5 floodfill routers should be able to handle
10,000 nodes in the network.
The netDb implementation more than adequately meets our
needs at the moment, but there will likely be further tuning and bugfixing as
the network grows.</p>
<h3>The Return of the Kademlia Algorithm?</h3>

<p>
(this is adapted from <a href="meeting195.html">the I2P meeting Jan. 2, 2007</a>)
<br />
The Kademlia netdb just wasn't working properly.
Is it dead forever or will it be coming back?
If it comes back, the peers in the kademlia netdb would be a very limited subset
of the routers in the network (basically an expanded number of floodfill peers, if/when the floodfill peers
cannot handle the load).
But until the floodfill peers cannot handle the load (and other peers cannot be added that can), it's unnecessary.
</p>
<h3>The Future of Floodfill</h3>

<p>
(this is adapted from an IRC conversation with jrandom 11/07)
<br />
Here's a proposal: Capacity class O is automatically floodfill.
Hmm.
Unless we're sure, we might end up with a fancy way of DDoS'ing all O class routers.
This is quite the case: we want to make sure the number of floodfill peers is as small as possible while providing sufficient reachability.
If/when netdb requests fail, then we need to increase the number of floodfill peers, but atm, I'm not aware of a netdb fetch problem.
There are 33 "O" class peers according to my records.
33 is a /lot/ to floodfill to.
</p><p>
So floodfill works best when the number of peers in that pool is firmly limited?
And the size of the floodfill pool shouldn't grow much, even if the network itself gradually would?
3-5 floodfill peers can handle 10K routers iirc (I posted a bunch of numbers on that explaining the details in the old syndie).
Sounds like a difficult requirement to fill with automatic opt-in,
especially if nodes opting in cannot trust data from others,
e.g. "let's see if I'm among the top 5",
and can only trust data about themselves (e.g. "I am definitely O class, and moving 150 KB/s, and up for 123 days").
And top 5 is hostile as well. Basically, it's the same as the tor directory servers - chosen by trusted people (aka devs).
Yeah, right now it could be exploited by opt-in, but that'd be trivial to detect and deal with.
Seems like in the end, we might need something more useful than kademlia, and have only reasonably capable peers join that scheme.
N class and above should be a big enough quantity to suppress risk of an adversary causing denial of service, I'd hope.
But it would have to be different from floodfill then, in the sense that it wouldn't cause humongous traffic.
Large quantity? For a DHT-based netdb?
Not necessarily DHT-based.
</p>
{% endblock %}