
<p>I2P's network database (netDb) is a specialized Kademlia-derived DHT containing 
just two types of data: router contact information and destination contact
information.  Each piece of data is signed by the appropriate party and verified
by anyone who uses or stores it.  In addition, the data carries liveness information,
allowing irrelevant entries to be dropped, newer entries to replace
older ones, and, for the paranoid, protection against certain classes of attack.
(Note that this is also why I2P bundles in the necessary code to determine the
correct time).</p>

<h2><a name="routerInfo">RouterInfo</a></h2>

<p>When an I2P router wants to contact another router, it needs to know some 
key pieces of data - all of which are bundled up and signed by the router into
a structure called the "RouterInfo", which is distributed under the key derived 
from the SHA256 of the router's identity.  The structure itself contains:</p><ul>
<li>The router's identity (a 2048bit ElGamal encryption key, a 1024bit DSA signing key, and a certificate)</li>
<li>The contact addresses at which it can be reached (e.g. TCP: dev.i2p.net port 4108)</li>
<li>When this was published</li>
<li>A set of arbitrary (uninterpreted) text options</li>
<li>The signature of the above, generated by the identity's DSA signing key</li>
</ul>

<p>The arbitrary text options are currently used to help debug the network,
publishing various stats about the router's health.  These stats will be disabled
by default once I2P 1.0 is out, and can be disabled by adding 
"router.publishPeerRankings=false" to the router 
<a href="http://localhost:7657/configadvanced.jsp">configuration</a>.  The data 
published can be seen on the router's <a href="http://localhost:7657/netdb.jsp">netDb</a>
page, but should not be trusted.</p>
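<p>As a rough illustration of the fields listed above, the following Java sketch models a RouterInfo and derives the key it would be stored under.  The class, record, and field names are hypothetical, and concatenating the identity's parts before hashing is an assumption; the real serialization may differ.</p>

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

/** Illustrative only: names and layout are assumptions, not I2P's actual classes. */
final class RouterInfoSketch {
    /** Stand-in for the router identity: two public keys plus a certificate, as raw bytes. */
    record Identity(byte[] encryptionKey, byte[] signingKey, byte[] certificate) {
        /** Concatenates the three parts (assumed layout). */
        byte[] toBytes() {
            byte[] out = new byte[encryptionKey.length + signingKey.length + certificate.length];
            System.arraycopy(encryptionKey, 0, out, 0, encryptionKey.length);
            System.arraycopy(signingKey, 0, out, encryptionKey.length, signingKey.length);
            System.arraycopy(certificate, 0, out,
                    encryptionKey.length + signingKey.length, certificate.length);
            return out;
        }
    }

    /** The five parts of a RouterInfo described in the list above. */
    record RouterInfo(Identity identity, List<String> addresses, long publishedMillis,
                      Map<String, String> options, byte[] signature) {}

    /** The netDb key: SHA256 over the serialized identity. */
    static byte[] netDbKey(Identity identity) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(identity.toBytes());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

<p>Because the key depends only on the identity, republishing a RouterInfo with new addresses or options does not change the key it is stored under.</p>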

<h2><a name="leaseSet">LeaseSet</a></h2>

<p>The second piece of data distributed in the netDb is a "LeaseSet" - documenting
a group of tunnel entry points (leases) for a particular client destination.  
Each of these leases specifies the tunnel's gateway router (by the hash of its 
identity), the tunnel ID on that router to which messages should be sent (a 4 byte
number), and when that tunnel will expire.  The LeaseSet itself is stored in the netDb under
the key derived from the SHA256 of the destination.</p>
  
<p>In addition to these leases, the LeaseSet also includes the destination 
itself (namely, the destination's 2048bit ElGamal encryption key, 1024bit DSA 
signing key, and certificate) as well as an additional pair of signing and 
encryption keys.  These additional keys can be used for garlic routing messages
to the router on which the destination is located (though these keys are <b>not</b>
the router's keys - they are generated by the client and given to the router to
use).  End to end client messages are still, of course, encrypted with the 
destination's public keys.</p>
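<p>The LeaseSet described above could be modeled roughly as follows.  This is only a sketch: the type and field names are hypothetical, keys are kept as opaque byte arrays, and treating the 4 byte tunnel ID as an unsigned value is an assumption.</p>

```java
import java.util.List;

/** Illustrative only: names and validation rules are assumptions, not I2P's actual classes. */
final class LeaseSetSketch {
    /** One tunnel entry point: gateway router hash, 4 byte tunnel ID, expiration time. */
    record Lease(byte[] gatewayHash, long tunnelId, long expiresAtMillis) {
        Lease {
            if (gatewayHash.length != 32) {
                throw new IllegalArgumentException("gateway hash must be a SHA256 digest (32 bytes)");
            }
            if (tunnelId < 0 || tunnelId > 0xFFFFFFFFL) {
                throw new IllegalArgumentException("tunnel ID must fit in 4 bytes");
            }
        }
    }

    /** The destination, the additional key pair, and the leases. */
    record LeaseSet(byte[] destination, byte[] extraEncryptionKey, byte[] extraSigningKey,
                    List<Lease> leases) {
        /** Returns the leases whose tunnels have not yet expired at the given time. */
        List<Lease> liveLeases(long nowMillis) {
            return leases.stream().filter(l -> l.expiresAtMillis() > nowMillis).toList();
        }
    }
}
```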

<h2><a name="bootstrap">Bootstrapping</a></h2>

<p>The netDb, being a DHT, is completely decentralized.  However, you do need at
least one reference to a peer so that the DHT's integration process can
tie you in.  This is accomplished by "reseeding" your router with the RouterInfo
of an active peer - specifically, by retrieving their <code>routerInfo-$hash.dat</code>
file and storing it in your <code>netDb/</code> directory.  Anyone can provide
you with those files - you can even provide them to others by exposing your own
netDb directory.  To simplify the process of finding someone with a RouterInfo,
an alias has been made to the <a href="http://dev.i2p.net/i2pdb/">netDb</a> directory
of one of the routers on dev.i2p.net.</p>
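<p>For the case where the RouterInfo files have already been obtained as local files, the reseeding step amounts to a file copy.  The sketch below assumes exactly that; the class and method names are hypothetical, and it does not perform the signature verification that a router applies to the data it stores.</p>

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative only: copies already-downloaded RouterInfo files into a netDb directory. */
final class ReseedSketch {
    /** Copies every routerInfo-*.dat file from source into netDbDir and returns how many were copied. */
    static int reseed(Path source, Path netDbDir) throws IOException {
        Files.createDirectories(netDbDir);
        int copied = 0;
        // The glob keeps unrelated files in the source directory out of the netDb.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source, "routerInfo-*.dat")) {
            for (Path file : files) {
                Files.copy(file, netDbDir.resolve(file.getFileName().toString()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }
}
```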

<h2><a name="healing">Healing</a></h2>

<p>While the Kademlia algorithm is fairly efficient at maintaining the necessary
links, we keep additional statistics regarding the netDb's activity so that we 
can detect potential segmentation and actively avoid it.  This is done as part of
the peer profiling - with data points such as how many new and verifiable 
RouterInfo references a peer gives us, we can determine which peers know about
groups of peers that we have never seen references to.  When this occurs, we can
take advantage of Kademlia's flexibility in exploration and send requests to that
peer so as to integrate ourselves further with the part of the network seen by
that well-integrated router.</p>

<h2><a name="migration">Migration</a></h2>

<p>Unlike traditional DHTs, the very act of conducting a search distributes the
data as well, since rather than passing IP+port # pairs, references are given to
the routers to query (namely, the SHA256 of those routers' identities).
As such, iteratively searching for a particular destination's LeaseSet or 
router's RouterInfo will also provide you with the RouterInfo of the peers along
the way.</p>
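<p>An iterative search repeatedly picks the known peers closest to the target key and queries them.  The sketch below shows that selection step, assuming the standard Kademlia XOR metric over equal-length hashes; the class and method names are hypothetical and this is not taken from the router's code.</p>

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: ranks peers by XOR distance to a target key. */
final class ClosestPeersSketch {
    /** Kademlia's XOR metric, read as an unsigned big-endian integer. */
    static BigInteger distance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("keys must have equal length");
        }
        byte[] x = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            x[i] = (byte) (a[i] ^ b[i]);
        }
        return new BigInteger(1, x);
    }

    /** Returns up to k peer hashes, nearest to the target first. */
    static List<byte[]> closest(byte[] target, List<byte[]> peers, int k) {
        List<byte[]> sorted = new ArrayList<>(peers);
        sorted.sort(Comparator.comparing((byte[] p) -> distance(target, p)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```

<p>Each peer queried this way replies with further references, which are merged into the candidate list before the next round.</p>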

<p>In addition, due to the time sensitivity of the data, the information doesn't
often need to be migrated.  A LeaseSet is only valid for the 10 minutes 
that the referenced tunnels are around, so those entries can simply be dropped at
expiration; they will be replaced at the new location when the router
publishes a new LeaseSet.</p>
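<p>The two liveness rules used here (newer entries replace older ones, and expired entries are dropped) can be sketched as a small in-memory store.  The class and method names are hypothetical, keys are simplified to strings, and real entries would also be signature-checked before being accepted.</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: an in-memory store applying the two liveness rules. */
final class ExpiringStoreSketch {
    private record Entry(long publishedMillis, long expiresAtMillis, byte[] data) {}

    private final Map<String, Entry> entries = new HashMap<>();

    /** Stores the entry unless one published at the same time or later is already held. */
    boolean put(String key, long publishedMillis, long expiresAtMillis, byte[] data) {
        Entry old = entries.get(key);
        if (old != null && old.publishedMillis() >= publishedMillis) {
            return false; // keep the entry we already have
        }
        entries.put(key, new Entry(publishedMillis, expiresAtMillis, data));
        return true;
    }

    /** Drops every entry whose expiration has passed and returns how many were removed. */
    int dropExpired(long nowMillis) {
        int before = entries.size();
        entries.values().removeIf(e -> e.expiresAtMillis() <= nowMillis);
        return before - entries.size();
    }

    boolean contains(String key) {
        return entries.containsKey(key);
    }
}
```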

<p>To address the concerns of <a href="http://citeseer.ist.psu.edu/douceur02sybil.html">Sybil attacks</a>,
the location used to store entries varies over time.  Rather than storing the
RouterInfo on the peers closest to SHA256(router identity), it is stored on
the peers closest to SHA256(router identity + YYYYMMdd), requiring an adversary
to remount the attack daily so as to maintain closeness to the "current"
keyspace.  In addition, entries are probabilistically distributed to an additional
peer outside of the target keyspace, so that a successful compromise of the K
routers closest to the key will only degrade the search time.</p>
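<p>The daily-changing storage key, SHA256(router identity + YYYYMMdd), can be sketched as follows.  Two details are assumptions made for illustration: that "+" means appending the date as ASCII digits to the identity bytes, and that the caller supplies the day (for example, the current date in a time zone agreed on by all routers).  The class and method names are hypothetical.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Illustrative only: derives a storage key that changes every day. */
final class RoutingKeySketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** SHA256 over the identity bytes followed by the date digits (assumed encoding). */
    static byte[] routingKey(byte[] identity, LocalDate day) {
        byte[] date = DAY.format(day).getBytes(StandardCharsets.US_ASCII);
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(identity);
            sha.update(date);
            return sha.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

<p>Since the hash output for the next day is effectively unpredictable relative to today's, peers that were positioned close to an entry's key today have no particular advantage tomorrow.</p>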

<h2><a name="delivery">Delivery</a></h2>

<p>As with DNS lookups, the fact that someone is trying to retrieve the LeaseSet
for a particular destination is sensitive (the fact that someone is <i>publishing</i>
a LeaseSet even more so!).  To address this, netDb searches and netDb store 
messages are simply sent through the router's exploratory tunnels.</p>

<h2><a name="status">Status</a></h2>

<p>The netDb plays a very specific role in the I2P network, and the algorithms
have been tuned towards our needs.  This also means that it hasn't been tuned 
to address the needs we have yet to run into.  I2P is currently (2005/02/18)
fairly small (only 200 nodes), and we have not yet had to deal with the situations
that Kademlia really shines in - times when there are thousands or even millions
of peers in the network.  The netDb implementation more than adequately meets our
needs at the moment, but there will likely be further tuning and bugfixing as 
the network grows.</p>