{% extends "_layout.html" %} {% block title %}How the Network Database (netDb) Works{% endblock %} {% block content %}

Updated July 2010, current as of router version 0.8

Overview

I2P's netDb is a specialized distributed database, containing just two types of data - router contact information (RouterInfos) and destination contact information (LeaseSets). Each piece of data is signed by the appropriate party and verified by anyone who uses or stores it. In addition, the data has liveliness information within it, allowing irrelevant entries to be dropped, newer entries to replace older ones, and protection against certain classes of attack.

The netDb is distributed with a simple technique called "floodfill". Previously, the netDb also used the Kademlia DHT as a fallback algorithm. However, it did not work well in our application, and it was completely disabled in release 0.6.1.20. More information is below.

Note that this document has been updated to include floodfill details, but there are still some incorrect statements about the use of Kademlia that need to be fixed up.

RouterInfo

When an I2P router wants to contact another router, it needs to know some key pieces of data - all of which are bundled up and signed by the router into a structure called the "RouterInfo", which is distributed under the key derived from the SHA256 of the router's identity. The structure itself contains:

The following text options, while not strictly required, are expected to be present:

These values are used by other routers for basic decisions. Should we connect to this router? Should we attempt to route a tunnel through this router? The bandwidth capability flag, in particular, is used only to determine whether the router meets a minimum threshold for routing tunnels. Above the minimum threshold, the advertised bandwidth is not used or trusted anywhere in the router, except for display in the user interface and for debugging and network analysis.

Additional text options include a small number of statistics about the router's health, which are aggregated by sites such as stats.i2p for network performance analysis and debugging. These statistics were chosen to provide data crucial to the developers, such as tunnel build success rates, while balancing the need for such data with the side-effects that could result from revealing this data. Current statistics are limited to:

The data published can be seen in the router's user interface, but is not used or trusted within the router. As the network has matured, we have gradually removed most of the published statistics to improve anonymity, and we plan to remove more in future releases.
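As described above, a RouterInfo is stored under the key derived from the SHA256 of the router's identity. A minimal sketch of that derivation (the class and method names are ours, and the 387-byte identity size is only illustrative):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: a RouterInfo's netDb key is the SHA256 of the router's identity. */
public class NetDbKey {
    /** Derive the 32-byte storage key from the raw router identity bytes. */
    public static byte[] storeKey(byte[] routerIdentity) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(routerIdentity);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is guaranteed by the JDK
        }
    }
}
```

Because the key is a plain hash of signed data, any router can recompute it and detect a store filed under the wrong key.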

RouterInfo specification

RouterInfo Javadoc

LeaseSet

The second piece of data distributed in the netDb is a "LeaseSet" - documenting a group of tunnel entry points (leases) for a particular client destination. Each of these leases specifies the tunnel's gateway router (by the SHA256 hash of its identity), the tunnel ID on that gateway to which messages should be sent (a 4-byte number), and when the tunnel will expire. The LeaseSet itself is stored in the netDb under the key derived from the SHA256 of the destination.
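The three fields of a lease described above can be sketched as a simple structure (the class and field names are ours, not the router's):

```java
/** Sketch of the three fields in a Lease; names are illustrative, not I2P's. */
public class LeaseSketch {
    static class Lease {
        final byte[] gatewayHash; // SHA256 hash of the gateway router's identity
        final int tunnelId;       // 4-byte tunnel ID on that gateway
        final long expiration;    // expiry time, milliseconds since the epoch

        Lease(byte[] gatewayHash, int tunnelId, long expiration) {
            this.gatewayHash = gatewayHash;
            this.tunnelId = tunnelId;
            this.expiration = expiration;
        }

        /** A lease past its expiration can simply be dropped. */
        boolean isExpired(long now) { return now >= expiration; }
    }
}
```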

In addition to these leases, the LeaseSet also includes the destination itself (namely, the destination's 2048-bit ElGamal encryption key, 1024-bit DSA signing key, and certificate) as well as an additional pair of signing and encryption keys. These additional keys can be used for garlic routing messages to the router on which the destination is located (though these keys are not the router's keys - they are generated by the client and given to the router to use). FIXME End to end client messages are still, of course, encrypted with the destination's public keys. [UPDATE - This is no longer true, we don't do end-to-end client encryption any more, as explained in the introduction. So is there any use for the first encryption key, signing key, and certificate? Can they be removed?]

Lease specification
LeaseSet specification

Lease Javadoc
LeaseSet Javadoc

Revoked LeaseSets

A LeaseSet may be revoked by publishing a new LeaseSet with zero leases.

Encrypted LeaseSets

In an encrypted LeaseSet, all Leases are encrypted with a separate DSA key. The leases may only be decoded, and thus the destination may only be contacted, by those with the key. There is no flag or other direct indication that the LeaseSet is encrypted.

Bootstrapping

The netDb is decentralized; however, you do need at least one reference to a peer so that the integration process can tie you in. This is accomplished by "reseeding" your router with the RouterInfo of an active peer - specifically, by retrieving their routerInfo-$hash.dat file and storing it in your netDb/ directory. Anyone can provide you with these files - you can even provide them to others by exposing your own netDb directory. To simplify the process, volunteers publish their netDb directories (or a subset) on the regular (non-I2P) network, and the URLs of these directories are hardcoded in I2P. When the router starts up for the first time, it automatically fetches from one of these URLs, selected at random.
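The local side of reseeding amounts to routerInfo files sitting in the netDb/ directory. A sketch of how a router might enumerate them (the class and method names are ours; only the routerInfo-*.dat naming pattern comes from the text above):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the local side of reseeding: routerInfo files in netDb/. */
public class Reseed {
    /** Enumerate the routerInfo-*.dat files already present in a netDb directory. */
    public static List<Path> listRouterInfos(Path netDbDir) {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> ds =
                 Files.newDirectoryStream(netDbDir, "routerInfo-*.dat")) {
            for (Path p : ds) found.add(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```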

Floodfill

(Adapted from a post by jrandom in the old Syndie, Nov. 26, 2005)
The floodfill netDb is really just a simple and perhaps temporary measure, using the simplest possible algorithm: send the data to a peer in the floodfill netDb, wait 10 seconds, then pick another floodfill peer at random and ask it for the entry, verifying its proper insertion and distribution. If the verification peer doesn't reply, or doesn't have the entry, the sender repeats the process. When a floodfill peer receives a netDb store from a peer not in the floodfill netDb, it sends that entry to all of the other floodfill peers.
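The store-then-verify loop above can be sketched as follows (all class, interface, and method names are ours, and the retry cap is invented for illustration):

```java
import java.util.List;
import java.util.Random;

/** Sketch of the floodfill store-then-verify loop; all names are illustrative. */
public class FloodfillStore {
    /** A floodfill peer: accepts stores, answers lookups (null = no reply / not found). */
    public interface Floodfill {
        void store(String key, byte[] entry);
        byte[] lookup(String key);
    }

    /**
     * Send the entry to a random floodfill peer, then ask another randomly
     * chosen floodfill for it; if the verifier does not return it, repeat.
     */
    public static boolean storeAndVerify(List<Floodfill> peers, String key,
                                         byte[] entry, int maxAttempts) {
        Random rnd = new Random();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            peers.get(rnd.nextInt(peers.size())).store(key, entry);
            // (the real algorithm waits ~10 seconds here before verifying)
            byte[] seen = peers.get(rnd.nextInt(peers.size())).lookup(key);
            if (seen != null) return true; // verified: the entry was distributed
        }
        return false; // could not verify after maxAttempts tries
    }
}
```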

Peers still do netDb exploration and bootstrapping as before.

At one point, the Kademlia search/store functionality was still in place. Peers considered the floodfill routers as always being 'closer' to every key than any peer not participating in the floodfill netDb, and fell back on the Kademlia netDb if the floodfill peers failed for any reason. However, Kademlia has since been disabled completely (see below).

Determining who is part of the floodfill netDb is trivial - it is exposed in each router's published routerInfo.

As for efficiency, this algorithm is optimal when the netDb peers are known and their quantity is sufficient for the demands placed on them. Regarding scaling, we've got two peers who participate in the floodfill netDb right now, and they'll be able to handle the load by themselves until we've got 10k+ eepsites. It's not as sexy as the old Kademlia netDb, but there are subtle anonymity attacks against non-flooded netDbs.

Floodfill Router Opt-in

Unlike Tor, where the directory servers are hardcoded and trusted, and operated by known entities, the members of the I2P floodfill peer set need not be trusted and change over time. While some peers are manually configured to be floodfill, others are simply high-bandwidth routers who automatically volunteer when the number of floodfill peers drops below a threshold. This prevents any long-term network damage from losing most or all floodfills to an attack. In turn, these peers will un-floodfill themselves when there are too many floodfills outstanding.
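The automatic opt-in/opt-out behavior described above might be sketched like this. Note that the threshold constants and method names here are invented purely for illustration; the router's actual criteria and values differ:

```java
/** Sketch of the automatic floodfill opt-in rule. The threshold constants
 *  are invented for illustration; the router's actual criteria differ. */
public class FloodfillOptIn {
    static final int MIN_FLOODFILLS = 300; // hypothetical: volunteer below this
    static final int MAX_FLOODFILLS = 500; // hypothetical: step down above this

    /** Decide whether a router should be in floodfill mode right now. */
    public static boolean shouldBeFloodfill(boolean currentlyFloodfill,
                                            int knownFloodfills,
                                            boolean highBandwidth) {
        if (!highBandwidth) return false;                   // only fast routers volunteer
        if (knownFloodfills < MIN_FLOODFILLS) return true;  // too few floodfills: opt in
        if (knownFloodfills > MAX_FLOODFILLS) return false; // too many: un-floodfill
        return currentlyFloodfill;                          // otherwise keep current role
    }
}
```

The hysteresis band between the two thresholds keeps routers from flapping in and out of floodfill mode as the count fluctuates.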

All netDb data are signed by their publisher, so a floodfill peer cannot spoof netDb responses. All peers monitor the performance of the floodfill routers they talk to, so that fake, malicious, or unresponsive floodfills can be avoided. While these defenses may be insufficient to prevent any network disruption, we continue to refine the automated detection of and responses to bad floodfills. The available statistics should make the router ID of a troublemaker readily apparent. We also have methods for users to manually block peers by router hash or IP, and several channels to get the word out to users.

Healing

Needs update since Kademlia is disabled.

While the Kademlia algorithm is fairly efficient at maintaining the necessary links, we keep additional statistics regarding the netDb's activity so that we can detect potential segmentation and actively avoid it. This is done as part of the peer profiling - with data points such as how many new and verifiable RouterInfo references a peer gives us, we can determine what peers know about groups of peers that we have never seen references to. When this occurs, we can take advantage of Kademlia's flexibility in exploration and send requests to that peer so as to integrate ourselves further with the part of the network seen by that well integrated router.

Migration

Needs update since Kademlia is disabled.

Unlike traditional DHTs, the very act of conducting a search distributes the data as well: rather than passing IP and port pairs, references are given to the routers to query (namely, the SHA256 hashes of those routers' identities). As such, iteratively searching for a particular destination's LeaseSet or router's RouterInfo will also provide you with the RouterInfos of the peers along the way.

In addition, due to the time sensitivity of the data, the information doesn't often need to be migrated - since a LeaseSet is only valid for the 10 minutes that the referenced tunnels are around, those entries can simply be dropped at expiration, since they will be replaced at the new location when the router publishes a new LeaseSet.

To address the concerns of Sybil attacks, the location used to store entries varies over time. Rather than storing a RouterInfo on the peers closest to SHA256(router identity), it is stored on the peers closest to SHA256(router identity + YYYYMMdd), requiring an adversary to remount the attack daily in order to maintain closeness to the "current" keyspace. In addition, entries are probabilistically distributed to an additional peer outside of the target keyspace, so that a successful compromise of the K routers closest to the key will only degrade the search time.
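The daily key rotation described above can be sketched directly from the formula SHA256(router identity + YYYYMMdd); the class and method names are ours, and the 387-byte identity size is only illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of daily routing-key rotation: SHA256(router identity + "YYYYMMdd"). */
public class RoutingKey {
    /** Derive the storage key for a given UTC date string such as "20100715". */
    public static byte[] dailyKey(byte[] routerIdentity, String yyyymmdd) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(routerIdentity);
            md.update(yyyymmdd.getBytes(StandardCharsets.US_ASCII));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is guaranteed by the JDK
        }
    }
}
```

Appending the date moves every entry to a fresh, unpredictable spot in the keyspace each day, so an attacker's carefully positioned Sybil identities lose their closeness at midnight.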

Delivery

As with DNS lookups, the fact that someone is trying to retrieve the LeaseSet for a particular destination is sensitive (the fact that someone is publishing a LeaseSet even more so!). To address this, netDb searches and netDb store messages are simply sent through the router's exploratory tunnels.

MultiHoming

Destinations may be hosted on multiple routers simultaneously by using the same private and public keys (traditionally stored in eepPriv.dat files). As each instance periodically publishes its signed LeaseSet to the floodfill peers, the most recently published LeaseSet will be returned to a peer requesting a database lookup. As LeaseSets have (at most) a 10-minute lifetime, should a particular instance go down, the outage will be at most 10 minutes, and generally much less. The multihoming behavior has been verified with the test eepsite http://multihome.i2p/.
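The "most recently published wins" rule can be sketched as a trivial comparison (class and field names are ours, not the router's):

```java
/** Sketch: for multihomed destinations, a lookup returns whichever LeaseSet
 *  was published most recently (class and field names are illustrative). */
public class NewestWins {
    public static class PublishedLeaseSet {
        public final String instance;  // which hosting router published it
        public final long published;   // publish time, ms since the epoch
        public PublishedLeaseSet(String instance, long published) {
            this.instance = instance;
            this.published = published;
        }
    }

    /** Pick the most recently published of two LeaseSets for the same destination. */
    public static PublishedLeaseSet newest(PublishedLeaseSet a, PublishedLeaseSet b) {
        return a.published >= b.published ? a : b;
    }
}
```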

History

Moved to the netdb discussion page.

Future Work

{% endblock %}