{% extends "_layout.html" %} {% block title %}How the Network Database (netDb) Works{% endblock %} {% block content %}

Updated July 2010, current as of router version 0.8

Overview

I2P's netDb is a specialized distributed database, containing just two types of data - router contact information (RouterInfos) and destination contact information (LeaseSets). Each piece of data is signed by the appropriate party and verified by anyone who uses or stores it. In addition, the data has liveliness information within it, allowing irrelevant entries to be dropped, newer entries to replace older ones, and protection against certain classes of attack.

The netDb is distributed with a simple technique called "floodfill". Previously, the netDb also used the Kademlia DHT as a fallback algorithm. However, it did not work well in our application, and it was completely disabled in release 0.6.1.20. More information is below.

Note that this document has been updated to include floodfill details, but there are still some incorrect statements about the use of Kademlia that need to be fixed up.

RouterInfo

When an I2P router wants to contact another router, it needs to know some key pieces of data - all of which are bundled up and signed by the router into a structure called the "RouterInfo", which is distributed under the key derived from the SHA256 of the router's identity. The structure itself contains:

The following text options, while not strictly required, are expected to be present:

These values are used by other routers for basic decisions. Should we connect to this router? Should we attempt to route a tunnel through this router? The bandwidth capability flag, in particular, is used only to determine whether the router meets a minimum threshold for routing tunnels. Above the minimum threshold, the advertised bandwidth is not used or trusted anywhere in the router, except for display in the user interface and for debugging and network analysis.

Additional text options include a small number of statistics about the router's health, which are aggregated by sites such as stats.i2p for network performance analysis and debugging. These statistics were chosen to provide data crucial to the developers, such as tunnel build success rates, while balancing the need for such data with the side-effects that could result from revealing this data. Current statistics are limited to:

The data published can be seen in the router's user interface, but is not used or trusted within the router. As the network has matured, we have gradually removed most of the published statistics to improve anonymity, and we plan to remove more in future releases.
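As described above, a RouterInfo is stored under the key derived from the SHA256 of the router's identity. A minimal sketch of that derivation (the class and method names are ours, and the 387-byte identity size is only illustrative):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: a RouterInfo's netDb key is the SHA256 of the router's identity. */
public class NetDbKey {
    /** Derive the 32-byte storage key from the raw router identity bytes. */
    public static byte[] storeKey(byte[] routerIdentity) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(routerIdentity);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is guaranteed by the JDK
        }
    }
}
```

Because the key is a plain hash of signed data, any router can recompute it and detect a store filed under the wrong key.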

RouterInfo specification

RouterInfo Javadoc

LeaseSet

The second piece of data distributed in the netDb is a "LeaseSet" - documenting a group of tunnel entry points (leases) for a particular client destination. Each of these leases specifies the tunnel's gateway router (by the SHA256 hash of its identity), the tunnel ID on that gateway to which messages should be sent (a 4-byte number), and when the tunnel will expire. The LeaseSet itself is stored in the netDb under the key derived from the SHA256 of the destination.
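The three fields of a lease described above can be sketched as a simple structure (the class and field names are ours, not the router's):

```java
/** Sketch of the three fields in a Lease; names are illustrative, not I2P's. */
public class LeaseSketch {
    static class Lease {
        final byte[] gatewayHash; // SHA256 hash of the gateway router's identity
        final int tunnelId;       // 4-byte tunnel ID on that gateway
        final long expiration;    // expiry time, milliseconds since the epoch

        Lease(byte[] gatewayHash, int tunnelId, long expiration) {
            this.gatewayHash = gatewayHash;
            this.tunnelId = tunnelId;
            this.expiration = expiration;
        }

        /** A lease past its expiration can simply be dropped. */
        boolean isExpired(long now) { return now >= expiration; }
    }
}
```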

In addition to these leases, the LeaseSet also includes the destination itself (namely, the destination's 2048-bit ElGamal encryption key, 1024-bit DSA signing key, and certificate) as well as an additional pair of signing and encryption keys. These additional keys can be used for garlic routing messages to the router on which the destination is located (though these keys are not the router's keys - they are generated by the client and given to the router to use). FIXME End to end client messages are still, of course, encrypted with the destination's public keys. [UPDATE - This is no longer true, we don't do end-to-end client encryption any more, as explained in the introduction. So is there any use for the first encryption key, signing key, and certificate? Can they be removed?]

Lease specification
LeaseSet specification

Lease Javadoc
LeaseSet Javadoc

Revoked LeaseSets

A LeaseSet may be revoked by publishing a new LeaseSet with zero leases.

Encrypted LeaseSets

In an encrypted LeaseSet, all Leases are encrypted with a separate DSA key. The leases may only be decoded, and thus the destination may only be contacted, by those with the key. There is no flag or other direct indication that the LeaseSet is encrypted.

Bootstrapping

The netDb is decentralized; however, you do need at least one reference to a peer so that the integration process can tie you in. This is accomplished by "reseeding" your router with the RouterInfo of an active peer - specifically, by retrieving their routerInfo-$hash.dat file and storing it in your netDb/ directory. Anyone can provide you with these files - you can even provide them to others by exposing your own netDb directory. To simplify the process, volunteers publish their netDb directories (or a subset) on the regular (non-I2P) network, and the URLs of these directories are hardcoded in I2P. When the router starts up for the first time, it automatically fetches from one of these URLs, selected at random.
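The local side of reseeding amounts to routerInfo files sitting in the netDb/ directory. A sketch of how a router might enumerate them (the class and method names are ours; only the routerInfo-*.dat naming pattern comes from the text above):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the local side of reseeding: routerInfo files in netDb/. */
public class Reseed {
    /** Enumerate the routerInfo-*.dat files already present in a netDb directory. */
    public static List<Path> listRouterInfos(Path netDbDir) {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> ds =
                 Files.newDirectoryStream(netDbDir, "routerInfo-*.dat")) {
            for (Path p : ds) found.add(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```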

Floodfill

(Adapted from a post by jrandom in the old Syndie, Nov. 26, 2005)
The floodfill netDb is really just a simple and perhaps temporary measure, using the simplest possible algorithm: send the data to a peer in the floodfill netDb, wait 10 seconds, then pick another floodfill peer at random and ask it for the entry, verifying its proper insertion and distribution. If the verification peer doesn't reply, or doesn't have the entry, the sender repeats the process. When a floodfill peer receives a netDb store from a peer not in the floodfill netDb, it sends that entry to all of the other floodfill peers.
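The store-then-verify loop above can be sketched as follows (all class, interface, and method names are ours, and the retry cap is invented for illustration):

```java
import java.util.List;
import java.util.Random;

/** Sketch of the floodfill store-then-verify loop; all names are illustrative. */
public class FloodfillStore {
    /** A floodfill peer: accepts stores, answers lookups (null = no reply / not found). */
    public interface Floodfill {
        void store(String key, byte[] entry);
        byte[] lookup(String key);
    }

    /**
     * Send the entry to a random floodfill peer, then ask another randomly
     * chosen floodfill for it; if the verifier does not return it, repeat.
     */
    public static boolean storeAndVerify(List<Floodfill> peers, String key,
                                         byte[] entry, int maxAttempts) {
        Random rnd = new Random();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            peers.get(rnd.nextInt(peers.size())).store(key, entry);
            // (the real algorithm waits ~10 seconds here before verifying)
            byte[] seen = peers.get(rnd.nextInt(peers.size())).lookup(key);
            if (seen != null) return true; // verified: the entry was distributed
        }
        return false; // could not verify after maxAttempts tries
    }
}
```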

Peers still do netDb exploration and bootstrapping as before.

At one point, the Kademlia search/store functionality was still in place. Peers considered the floodfill routers as always being 'closer' to every key than any peer not participating in the floodfill netDb, and fell back on the Kademlia netDb if the floodfill peers failed for any reason. However, Kademlia has since been disabled completely (see below).

Determining who is part of the floodfill netDb is trivial - it is exposed in each router's published routerInfo.

As for efficiency, this algorithm is optimal when the netDb peers are known and their quantity is sufficient for the demands placed on them. Regarding scaling, we've got two peers who participate in the floodfill netDb right now, and they'll be able to handle the load by themselves until we've got 10k+ eepsites. It's not as sexy as the old Kademlia netDb, but there are subtle anonymity attacks against non-flooded netDbs.

Floodfill Router Opt-in

Unlike Tor, where the directory servers are hardcoded and trusted, and operated by known entities, the members of the I2P floodfill peer set need not be trusted and change over time. While some peers are manually configured to be floodfill, others are simply high-bandwidth routers who automatically volunteer when the number of floodfill peers drops below a threshold. This prevents any long-term network damage from losing most or all floodfills to an attack. In turn, these peers will un-floodfill themselves when there are too many floodfills outstanding.
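The automatic opt-in/opt-out behavior described above might be sketched like this. Note that the threshold constants and method names here are invented purely for illustration; the router's actual criteria and values differ:

```java
/** Sketch of the automatic floodfill opt-in rule. The threshold constants
 *  are invented for illustration; the router's actual criteria differ. */
public class FloodfillOptIn {
    static final int MIN_FLOODFILLS = 300; // hypothetical: volunteer below this
    static final int MAX_FLOODFILLS = 500; // hypothetical: step down above this

    /** Decide whether a router should be in floodfill mode right now. */
    public static boolean shouldBeFloodfill(boolean currentlyFloodfill,
                                            int knownFloodfills,
                                            boolean highBandwidth) {
        if (!highBandwidth) return false;                   // only fast routers volunteer
        if (knownFloodfills < MIN_FLOODFILLS) return true;  // too few floodfills: opt in
        if (knownFloodfills > MAX_FLOODFILLS) return false; // too many: un-floodfill
        return currentlyFloodfill;                          // otherwise keep current role
    }
}
```

The hysteresis band between the two thresholds keeps routers from flapping in and out of floodfill mode as the count fluctuates.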

All netDb data are signed by their publisher, so a floodfill peer cannot spoof netDb responses. All peers monitor the performance of the floodfill routers they talk to, so that fake, malicious, or unresponsive floodfills can be avoided. While these defenses may be insufficient to prevent any network disruption, we continue to refine the automated detection of and responses to bad floodfills. The available statistics should make the router ID of a troublemaker readily apparent. We also have methods for users to manually block peers by router hash or IP, and several channels to get the word out to users.

Healing

Needs update since Kademlia is disabled.

While the Kademlia algorithm is fairly efficient at maintaining the necessary links, we keep additional statistics regarding the netDb's activity so that we can detect potential segmentation and actively avoid it. This is done as part of the peer profiling - with data points such as how many new and verifiable RouterInfo references a peer gives us, we can determine what peers know about groups of peers that we have never seen references to. When this occurs, we can take advantage of Kademlia's flexibility in exploration and send requests to that peer so as to integrate ourselves further with the part of the network seen by that well integrated router.

Migration

Needs update since Kademlia is disabled.

Unlike traditional DHTs, the very act of conducting a search distributes the data as well: rather than passing IP and port pairs, references are given to the routers to query (namely, the SHA256 hashes of those routers' identities). As such, iteratively searching for a particular destination's LeaseSet or router's RouterInfo will also provide you with the RouterInfos of the peers along the way.

In addition, due to the time sensitivity of the data, the information doesn't often need to be migrated - since a LeaseSet is only valid for the 10 minutes that the referenced tunnels are around, those entries can simply be dropped at expiration, since they will be replaced at the new location when the router publishes a new LeaseSet.

To address the concerns of Sybil attacks, the location used to store entries varies over time. Rather than storing a RouterInfo on the peers closest to SHA256(router identity), it is stored on the peers closest to SHA256(router identity + YYYYMMdd), requiring an adversary to remount the attack daily in order to maintain closeness to the "current" keyspace. In addition, entries are probabilistically distributed to an additional peer outside of the target keyspace, so that a successful compromise of the K routers closest to the key will only degrade the search time.
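The daily key rotation described above can be sketched directly from the formula SHA256(router identity + YYYYMMdd); the class and method names are ours, and the 387-byte identity size is only illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of daily routing-key rotation: SHA256(router identity + "YYYYMMdd"). */
public class RoutingKey {
    /** Derive the storage key for a given UTC date string such as "20100715". */
    public static byte[] dailyKey(byte[] routerIdentity, String yyyymmdd) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(routerIdentity);
            md.update(yyyymmdd.getBytes(StandardCharsets.US_ASCII));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is guaranteed by the JDK
        }
    }
}
```

Appending the date moves every entry to a fresh, unpredictable spot in the keyspace each day, so an attacker's carefully positioned Sybil identities lose their closeness at midnight.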

Delivery

As with DNS lookups, the fact that someone is trying to retrieve the LeaseSet for a particular destination is sensitive (the fact that someone is publishing a LeaseSet even more so!). To address this, netDb searches and netDb store messages are simply sent through the router's exploratory tunnels.

MultiHoming

Destinations may be hosted on multiple routers simultaneously by using the same private and public keys (traditionally stored in eepPriv.dat files). As each instance periodically publishes its signed LeaseSet to the floodfill peers, the most recently published LeaseSet will be returned to a peer requesting a database lookup. As LeaseSets have (at most) a 10-minute lifetime, should a particular instance go down, the outage will be at most 10 minutes, and generally much less. The multihoming behavior has been verified with the test eepsite http://multihome.i2p/.
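The "most recently published wins" rule can be sketched as a trivial comparison (class and field names are ours, not the router's):

```java
/** Sketch: for multihomed destinations, a lookup returns whichever LeaseSet
 *  was published most recently (class and field names are illustrative). */
public class NewestWins {
    public static class PublishedLeaseSet {
        public final String instance;  // which hosting router published it
        public final long published;   // publish time, ms since the epoch
        public PublishedLeaseSet(String instance, long published) {
            this.instance = instance;
            this.published = published;
        }
    }

    /** Pick the most recently published of two LeaseSets for the same destination. */
    public static PublishedLeaseSet newest(PublishedLeaseSet a, PublishedLeaseSet b) {
        return a.published >= b.published ? a : b;
    }
}
```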

History

Moved to the netdb discussion page.

Future Work

{% endblock %}