<p>I2P's network database is a specialized Kademlia-derived DHT containing
just two types of data: router contact information and destination contact
information. Each piece of data is signed by the appropriate party and verified
by anyone who uses or stores it. In addition, the data carries liveness information,
allowing irrelevant entries to be dropped, newer entries to replace
older ones, and, for the paranoid, protection against certain classes of attack.
(This is also why I2P bundles the code necessary to determine the
correct time.)</p>
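<p>The acceptance rules described above (verify the signature, reject stale
entries, let newer entries replace older ones) can be sketched roughly as
follows. This is an illustration only, not I2P's implementation: the
<code>NetDbStore</code> class, the <code>verify</code> callback, and the
<code>max_age</code> threshold are all hypothetical names introduced here.</p>

```python
class NetDbStore:
    """Toy in-memory store; `verify` and `max_age` are assumptions of this sketch."""

    def __init__(self, verify, max_age):
        self.verify = verify      # callable(data, signature) -> bool (hypothetical)
        self.max_age = max_age    # seconds after which an entry counts as stale
        self.entries = {}         # key -> (published, data)

    def store(self, key, data, signature, published, now):
        """Return True if the entry was accepted, False if it was rejected."""
        if not self.verify(data, signature):
            return False          # unsigned or forged entries are rejected
        if now - published > self.max_age:
            return False          # too old to be relevant
        current = self.entries.get(key)
        if current is not None and current[0] >= published:
            return False          # we already hold this entry or a newer one
        self.entries[key] = (published, data)
        return True
```

<p>Because every decision compares timestamps, a sketch like this only works
if the participating routers roughly agree on the current time.</p>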
<h2><a name="routerInfo">RouterInfo</a></h2>
<p>When an I2P router wants to contact another router, it needs to know some
key pieces of data, all of which are bundled up and signed by the router into
a structure called the "RouterInfo", which is distributed under the key derived
from the SHA256 of the router's identity. The structure itself contains:</p><ul>
<li>The router's identity (a 2048-bit ElGamal encryption key, a 1024-bit DSA signing key, and a certificate)</li>
<li>The contact addresses at which it can be reached (e.g. TCP: dev.i2p.net port 4108)</li>
<li>When this was published</li>
<li>A set of arbitrary (uninterpreted) text options</li>
<li>The signature of the above, generated by the identity's DSA signing key</li>
</ul>
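<p>As a rough illustration of the fields listed above, the structure and its
netDb key could be modelled like this. The class and field names, the time
unit, and the serialization in <code>to_bytes</code> (plain concatenation) are
assumptions made for this sketch, not the actual wire format.</p>

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RouterIdentity:
    """Hypothetical container; the byte layout is an assumption of this sketch."""
    encryption_key: bytes   # 2048-bit ElGamal public key
    signing_key: bytes      # 1024-bit DSA public key
    certificate: bytes

    def to_bytes(self) -> bytes:
        # Assumed serialization: plain concatenation of the three parts.
        return self.encryption_key + self.signing_key + self.certificate


@dataclass
class RouterInfo:
    identity: RouterIdentity
    addresses: list          # e.g. ["TCP: dev.i2p.net port 4108"]
    published: int           # publication time (unit is an assumption)
    options: dict = field(default_factory=dict)   # uninterpreted text options
    signature: bytes = b""   # DSA signature over the fields above

    def netdb_key(self) -> bytes:
        # The entry is distributed under the SHA256 of the router's identity.
        return hashlib.sha256(self.identity.to_bytes()).digest()
```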
<p>The arbitrary text options are currently used to help debug the network,
publishing various stats about the router's health. These stats will be disabled
by default once I2P 1.0 is out, and can be disabled by adding
"router.publishPeerRankings=false" to the router
<a href="http://localhost:7657/configadvanced.jsp">configuration</a>. The data
published can be seen on the router's <a href="http://localhost:7657/netdb.jsp">netDb</a>
page, but should not be trusted.</p>
<h2><a name="leaseSet">LeaseSet</a></h2>
<p>The second piece of data distributed in the netDb is a "LeaseSet", which documents
a group of tunnel entry points (leases) for a particular client destination.
Each of these leases specifies the tunnel's gateway router (by the hash of its
identity), the tunnel ID on that router to send messages to (a 4 byte number), and
when that tunnel will expire. The LeaseSet itself is stored in the netDb under
the key derived from the SHA256 of the destination.</p>
<p>In addition to these leases, the LeaseSet also includes the destination
itself (namely, the destination's 2048-bit ElGamal encryption key, 1024-bit DSA
signing key, and certificate) as well as an additional pair of signing and
encryption keys. These additional keys can be used for garlic routing messages
to the router on which the destination is located (though these keys are <b>not</b>
the router's keys - they are generated by the client and given to the router to
use). End to end client messages are still, of course, encrypted with the
destination's public keys.</p>
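<p>A minimal sketch of the LeaseSet layout described in this section might look
like the following. All class and field names are hypothetical, the destination
is treated as an opaque byte string, and the expiry unit is an assumption.</p>

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Lease:
    gateway_hash: bytes   # hash of the gateway router's identity
    tunnel_id: int        # 4 byte tunnel ID on that gateway
    expires_at: int       # expiry time; seconds since the epoch is an assumed unit

    def __post_init__(self):
        if self.tunnel_id not in range(2**32):
            raise ValueError("tunnel ID must fit in 4 bytes")


@dataclass
class LeaseSet:
    destination: bytes      # serialized destination (keys and certificate), layout assumed
    encryption_key: bytes   # additional client-generated encryption key
    signing_key: bytes      # additional client-generated signing key
    leases: list

    def netdb_key(self) -> bytes:
        # Stored under the SHA256 of the destination.
        return hashlib.sha256(self.destination).digest()
```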
<h2><a name="bootstrap">Bootstrapping</a></h2>
<p>The netDb, being a DHT, is completely decentralized. However, you do need at
least one reference to a peer so that the DHT's integration process can
tie you in. This is accomplished by "reseeding" your router with the RouterInfo
of an active peer - specifically, by retrieving their <code>routerInfo-$hash.dat</code>
file and storing it in your <code>netDb/</code> directory. Anyone can provide
you with those files - you can even provide them to others by exposing your own
netDb directory. To simplify the process of finding someone with a RouterInfo,
an alias has been made to the <a href="http://dev.i2p.net/i2pdb/">netDb</a> directory
of one of the routers on dev.i2p.net.</p>
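<p>The storing half of reseeding amounts to dropping files into the right
directory. The helper below is a hypothetical sketch: it assumes you already
hold the raw RouterInfo bytes and the hash string exactly as it appears in the
file name, and it leaves open how those bytes were obtained.</p>

```python
from pathlib import Path


def reseed(router_infos, netdb_dir="netDb"):
    """Write peers' RouterInfo files into the local netDb directory.

    `router_infos` maps a peer's hash string (as used in the file name) to
    the raw bytes of its RouterInfo. Returns the paths that were written.
    """
    target = Path(netdb_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for peer_hash, data in router_infos.items():
        path = target / f"routerInfo-{peer_hash}.dat"
        path.write_bytes(data)
        written.append(path)
    return written
```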
<h2><a name="healing">Healing</a></h2>
<p>While the Kademlia algorithm is fairly efficient at maintaining the necessary
links, we keep additional statistics regarding the netDb's activity so that we
can detect potential segmentation and actively avoid it. This is done as part of
the peer profiling - with data points such as how many new and verifiable
RouterInfo references a peer gives us, we can determine which peers know about
groups of peers that we have never seen references to. When this occurs, we can
take advantage of Kademlia's flexibility in exploration and send requests to that
peer so as to integrate ourselves further with the part of the network seen by
that well-integrated router.</p>
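<p>One way to picture this data point is a per-peer counter of references that
were both new to us and verifiable. The class below is a hypothetical sketch,
not the router's actual peer profile; in particular the <code>verify</code>
callback stands in for real signature checking.</p>

```python
from collections import Counter


class ExplorationProfile:
    """Tracks how many previously unknown, verified RouterInfos each peer gave us."""

    def __init__(self):
        self.known = set()          # hashes of routers we already know about
        self.new_refs = Counter()   # peer hash -> count of new verified references

    def record_reply(self, peer, referenced, verify=lambda router_hash: True):
        # `verify` is a placeholder for checking the RouterInfo's signature.
        for router_hash in referenced:
            if router_hash not in self.known and verify(router_hash):
                self.known.add(router_hash)
                self.new_refs[peer] += 1

    def best_peers_to_explore(self, n=1):
        # Peers that showed us the most unseen routers are explored first.
        return [peer for peer, _ in self.new_refs.most_common(n)]
```

<p>A peer that keeps returning routers we have never seen is a good candidate
for further exploratory requests, since it likely sees a part of the network
that we do not.</p>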
<h2><a name="migration">Migration</a></h2>
<p>Unlike traditional DHTs, the very act of conducting a search distributes the
data as well, since rather than passing IP and port pairs, references are given to
the routers on which to query (namely, the SHA256 of those routers' identities).
As such, iteratively searching for a particular destination's LeaseSet or
router's RouterInfo will also provide you with the RouterInfo of the peers along
the way.</p>
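<p>The side effect described above can be sketched with a simplified iterative
lookup. The <code>query</code> callback and its return shape are hypothetical,
and a real search would query several peers in parallel and handle timeouts;
this version asks one peer at a time, always picking the closest one by XOR
distance.</p>

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance between two equal-length hashes."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def iterative_search(target, start_peers, query, max_steps=20):
    """Look up `target`, keeping every RouterInfo learned along the way.

    `query(peer_hash, target)` is a hypothetical callback returning
    (value_or_None, {referred_peer_hash: router_info}).
    """
    learned = {}                    # peer hash -> RouterInfo seen during the search
    candidates = set(start_peers)
    asked = set()
    for _ in range(max_steps):
        pending = candidates - asked
        if not pending:
            break
        # Ask the not-yet-queried peer closest to the target key.
        peer = min(pending, key=lambda p: xor_distance(p, target))
        asked.add(peer)
        value, referrals = query(peer, target)
        learned.update(referrals)   # the search itself spreads RouterInfos
        if value is not None:
            return value, learned
        candidates.update(referrals)
    return None, learned
```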
<p>In addition, due to the time sensitivity of the data, the information doesn't
often need to be migrated - since a LeaseSet is only valid for the 10 minutes
that the referenced tunnels are around, those entries can simply be dropped at
expiration, since they will be replaced at the new location when the router
publishes a new LeaseSet.</p>
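<p>Dropping entries at expiration needs very little machinery. The sketch below
assumes a hypothetical store that maps each key to its expiry time in seconds;
the function name and the store layout are illustrative only.</p>

```python
import time

LEASESET_LIFETIME = 10 * 60  # seconds; the 10 minute validity mentioned above


def drop_expired(store, now=None):
    """Remove entries whose expiry has passed and return the dropped keys.

    `store` maps a netDb key to its expiry time (seconds since the epoch).
    """
    now = time.time() if now is None else now
    expired = [key for key, expires_at in store.items() if now >= expires_at]
    for key in expired:
        del store[key]
    return expired
```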
<p>To address the concerns of <a href="http://citeseer.ist.psu.edu/douceur02sybil.html">Sybil attacks</a>,
the location used to store entries varies over time. Rather than storing the
RouterInfo on the peers closest to SHA256(router identity), it is stored on
the peers closest to SHA256(router identity + YYYYMMdd), requiring an adversary
to remount the attack daily so as to maintain closeness to the "current"
keyspace. In addition, entries are probabilistically distributed to an additional
peer outside of the target keyspace, so that a successful compromise of the K
routers closest to the key will only degrade the search time.</p>
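<p>The daily rotation can be sketched as below. The formula
SHA256(router identity + YYYYMMdd) comes from the text above; the function name,
the use of the UTC date, and the ASCII encoding of the date suffix are
assumptions of this sketch.</p>

```python
import hashlib
from datetime import date, datetime, timezone


def routing_key(identity: bytes, day: date = None) -> bytes:
    """Return SHA256(identity + YYYYMMdd) for the given day.

    Assumptions: the date is taken in UTC and appended as ASCII digits.
    """
    if day is None:
        day = datetime.now(timezone.utc).date()
    suffix = day.strftime("%Y%m%d").encode("ascii")
    return hashlib.sha256(identity + suffix).digest()
```

<p>Because the key changes every day, the set of peers closest to it changes as
well, so an attacker who generated identities near yesterday's key has to start
over for today's.</p>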
<h2><a name="delivery">Delivery</a></h2>
<p>As with DNS lookups, the fact that someone is trying to retrieve the LeaseSet
for a particular destination is sensitive (the fact that someone is <i>publishing</i>
a LeaseSet even more so!). To address this, netDb searches and netDb store
messages are simply sent through the router's exploratory tunnels.</p>
<h2><a name="status">Status</a></h2>
<p>The netDb plays a very specific role in the I2P network, and the algorithms
have been tuned towards our needs. This also means that it hasn't been tuned
to address the needs we have yet to run into. I2P is currently (2005/02/18)
fairly small (only 200 nodes), and we have not yet had to deal with the situations
in which Kademlia really shines - times when there are thousands or even millions
of peers in the network. The netDb implementation more than adequately meets our
needs at the moment, but there will likely be further tuning and bugfixing as
the network grows.</p>