{% extends "_layout.html" %} {% block title %}How the Network Database (netDb) Works{% endblock %} {% block content %}
Updated July 2010, current as of router version 0.8
I2P's netDb is a specialized distributed database, containing just two types of data - router contact information (RouterInfos) and destination contact information (LeaseSets). Each piece of data is signed by the appropriate party and verified by anyone who uses or stores it. In addition, the data has liveliness information within it, allowing irrelevant entries to be dropped, newer entries to replace older ones, and protection against certain classes of attack.
The netDb is distributed with a simple technique called "floodfill", where a subset of all routers, called "floodfill routers", maintains the distributed database.
When an I2P router wants to contact another router, it needs to know some key pieces of data, all of which are bundled up and signed by the router into a structure called the "RouterInfo", which is distributed under the key derived from the SHA256 of the router's identity. The structure itself contains:
The following text options, while not strictly required, are expected to be present:
Additional text options include a small number of statistics about the router's health, which are aggregated by sites such as stats.i2p for network performance analysis and debugging. These statistics were chosen to provide data crucial to the developers, such as tunnel build success rates, while balancing the need for such data with the side-effects that could result from revealing this data. Current statistics are limited to:
The second piece of data distributed in the netDb is a "LeaseSet", which documents a group of tunnel entry points (leases) for a particular client destination. Each of these leases specifies the tunnel's gateway router (with the hash of its identity), the tunnel ID on that router to send messages to (a 4 byte number), and when that tunnel will expire. The LeaseSet itself is stored in the netDb under the key derived from the SHA256 of the destination.
In addition to these leases, the LeaseSet includes the destination itself (namely, the destination's 2048-bit ElGamal encryption key, 1024-bit DSA signing key, and certificate) and additional signing and encryption public keys. The additional encryption public key is used for end-to-end encryption of garlic messages. The additional signing public key was intended for LeaseSet revocation but is currently unused.
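As an illustration only, the fields described above can be modelled roughly as follows. This is a minimal sketch, not the wire format: the class and field names are hypothetical, the destination is treated as an opaque byte string, the expiration is assumed to be a millisecond timestamp, and the additional keys and signature are left out.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Lease:
    gateway_hash: bytes  # SHA256 hash of the gateway router's identity (32 bytes)
    tunnel_id: int       # tunnel ID on that gateway, a 4 byte number
    expires_ms: int      # when the tunnel expires (assumed unit: ms since epoch)

    def __post_init__(self):
        if len(self.gateway_hash) != 32:
            raise ValueError("gateway hash must be 32 bytes")
        if not 0 <= self.tunnel_id < 2**32:
            raise ValueError("tunnel ID must fit in 4 bytes")


@dataclass
class LeaseSet:
    destination: bytes   # serialized destination, treated as opaque bytes here
    leases: List[Lease]

    def netdb_key(self) -> bytes:
        # The LeaseSet is stored under the SHA256 of the destination.
        return hashlib.sha256(self.destination).digest()

    def live_leases(self, now_ms: int) -> List[Lease]:
        # Leases whose tunnels have not yet expired.
        return [lease for lease in self.leases if lease.expires_ms > now_ms]
```

A peer that wants to reach the destination would pick one of the live leases and send to the given tunnel ID on the given gateway.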
Lease specification
LeaseSet specification
Lease Javadoc
LeaseSet Javadoc
The netDb is decentralized, but you do need at least one reference to a peer so that the integration process can tie you in. This is accomplished by "reseeding" your router with the RouterInfo of an active peer: specifically, by retrieving their routerInfo-$hash.dat file and storing it in your netDb/ directory. Anyone can provide you with those files, and you can even provide them to others by exposing your own netDb directory. To simplify the process, volunteers publish their netDb directories (or a subset) on the regular (non-i2p) network, and the URLs of these directories are hardcoded in I2P. When the router starts up for the first time, it automatically fetches from one of these URLs, selected at random.
Determining who is part of the floodfill netDb is trivial - it is exposed in each router's published routerInfo as a capability.
Floodfills have no central authority and do not form a "consensus" - they only implement a simple DHT overlay.
Unlike Tor, where the directory servers are hardcoded and trusted, and operated by known entities, the members of the I2P floodfill peer set need not be trusted, and change over time.
To increase reliability of the netDb, and minimize the impact of netDb traffic on a router, floodfill is automatically enabled only on routers that are configured with high bandwidth limits. Routers with high bandwidth limits (which must be manually configured, as the default is much lower) are presumed to be on lower-latency connections, and are more likely to be available 24/7. The current minimum share bandwidth for a floodfill router is 128 KBytes/sec.
In addition, a router must pass several additional tests for health (outbound message queue time, job lag, etc.) before floodfill operation is automatically enabled.
With the current rules for automatic opt-in, approximately 6% of the routers in the network are floodfill routers. While some peers are manually configured to be floodfill, others are simply high-bandwidth routers that automatically volunteer when the number of floodfill peers drops below a threshold. This prevents any long-term network damage from losing most or all floodfills to an attack. In turn, these peers will un-floodfill themselves when there are too many floodfills outstanding.
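The automatic opt-in and opt-out rules above can be sketched as a small decision function. Only the 128 KBytes/sec minimum comes from the text; the health limits and the floodfill-count thresholds below are hypothetical placeholders, since the text names the tests but not their values. Manually configured floodfills are not modelled.

```python
from dataclasses import dataclass

MIN_SHARE_KBYTES_PER_SEC = 128   # from the text

# Hypothetical values, for illustration only.
MAX_QUEUE_TIME_MS = 500.0
MAX_JOB_LAG_MS = 500.0
MIN_FLOODFILLS = 50    # volunteer when fewer floodfills than this are known
MAX_FLOODFILLS = 100   # step down when more floodfills than this are known


@dataclass
class RouterHealth:
    share_kbytes_per_sec: int
    queue_time_ms: float   # outbound message queue time
    job_lag_ms: float


def is_eligible(health: RouterHealth) -> bool:
    """Bandwidth and health tests that must pass before opting in."""
    return (health.share_kbytes_per_sec >= MIN_SHARE_KBYTES_PER_SEC
            and health.queue_time_ms <= MAX_QUEUE_TIME_MS
            and health.job_lag_ms <= MAX_JOB_LAG_MS)


def should_be_floodfill(health: RouterHealth,
                        known_floodfills: int,
                        currently_floodfill: bool) -> bool:
    """Decide whether an automatically managed router acts as floodfill."""
    if not is_eligible(health):
        return False
    if known_floodfills < MIN_FLOODFILLS:
        return True            # too few floodfills: volunteer
    if known_floodfills > MAX_FLOODFILLS:
        return False           # too many floodfills: un-floodfill
    return currently_floodfill  # in between: keep the current role
```

Using two separate thresholds rather than one keeps routers from flapping in and out of the floodfill role when the count hovers near a single limit; whether the router does exactly this is an assumption of the sketch.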
A router publishes its own RouterInfo by directly connecting to a floodfill router and sending it an I2NP DatabaseStoreMessage with a nonzero Reply Token. The message is not end-to-end garlic encrypted, as this is a direct connection, so there are no intervening routers (and no need to hide this data anyway). The floodfill router replies with an I2NP DeliveryStatusMessage, with the Message ID set to the value of the Reply Token.
The floodfill router then directly connects to each of the 7 peers it floods to and sends each one an I2NP DatabaseStoreMessage with a zero Reply Token. The message is not end-to-end garlic encrypted, as this is a direct connection, so there are no intervening routers (and no need to hide this data anyway). The other routers do not reply or re-flood, as the Reply Token is zero.
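The store-and-flood exchange described above can be sketched as a toy, in-memory model. The class names mirror the I2NP message names but are hypothetical simplifications, there is no networking or signature checking, and the way the 7 flood targets are chosen is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FLOOD_PEERS = 7  # number of peers a floodfill forwards a store to


@dataclass
class DatabaseStore:
    key: bytes
    data: bytes
    reply_token: int  # nonzero: sender wants an acknowledgement


@dataclass
class DeliveryStatus:
    message_id: int   # echoes the Reply Token of the store


@dataclass
class Floodfill:
    name: str
    peers: List["Floodfill"] = field(default_factory=list)
    db: Dict[bytes, bytes] = field(default_factory=dict)

    def receive(self, msg: DatabaseStore) -> Optional[DeliveryStatus]:
        self.db[msg.key] = msg.data
        if msg.reply_token == 0:
            # Flooded copy: store it, but neither reply nor re-flood.
            return None
        # Original store: flood to the peers with a zero Reply Token ...
        for peer in self.peers[:FLOOD_PEERS]:
            peer.receive(DatabaseStore(msg.key, msg.data, reply_token=0))
        # ... and acknowledge to the publisher.
        return DeliveryStatus(message_id=msg.reply_token)
```

Because flooded copies carry a zero Reply Token, the flood stops after one hop even when floodfills list each other as peers.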
Lookups are generally sent to the two "good" floodfill routers closest to the requested key, in parallel.
If the key is found locally by the floodfill router, it responds with an I2NP DatabaseStoreMessage. If the key is not found locally by the floodfill router, it responds with an I2NP DatabaseSearchReplyMessage containing a list of other floodfill routers close to the key.
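The lookup behaviour above can be sketched in a few lines. This assumes the standard Kademlia XOR metric for "closeness", skips the "good" filtering, and uses hypothetical function names; the number of references returned in a search reply (3 here) is a placeholder.

```python
from typing import Dict, List, Tuple, Union


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance between two equal-length hashes."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest(key: bytes, floodfill_hashes: List[bytes], n: int = 2) -> List[bytes]:
    """The n floodfill hashes closest to the key (n=2 for a parallel lookup)."""
    return sorted(floodfill_hashes, key=lambda h: xor_distance(h, key))[:n]


def answer_lookup(local_db: Dict[bytes, bytes],
                  known_floodfills: List[bytes],
                  key: bytes,
                  n_refs: int = 3) -> Tuple[str, Union[bytes, List[bytes]]]:
    """What a floodfill sends back for a lookup of `key`."""
    if key in local_db:
        # Found locally: return the entry itself.
        return ("DatabaseStoreMessage", local_db[key])
    # Not found: return references to other floodfills close to the key.
    return ("DatabaseSearchReplyMessage", closest(key, known_floodfills, n_refs))
```

The requester would call `closest(key, floodfills)` to pick its two targets, then send the same query to both.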
Lookups are not encrypted and thus are vulnerable to snooping by the outbound endpoint (OBEP) of the client tunnel.
As the requesting router does not reveal itself, there is no recipient public key for the floodfill router to encrypt the reply with. Therefore, the reply is exposed to the inbound gateway (IBGW) of the inbound exploratory tunnel. An appropriate method of encrypting the reply is a topic for future work.
(Reference: Hashing it out in Public Section 2.3 for terms below in italics)
Due to the relatively small size of the network, the flooding redundancy of 8x, and a lookup redundancy of 2x, lookups are currently O(1) rather than O(log n) -- a router is highly likely to know a floodfill router close enough to the key to get the answer on the first try. Neither recursive nor iterative routing for lookups is implemented.
Node IDs are verifiable in that we use the router hash directly as both the node ID and the Kademlia key. Given the current size of the network, a router has detailed knowledge of the neighborhood of the destination ID space.
Queries are sent through multiple routes simultaneously to reduce the chance of query failure.
After network growth of 5x - 10x, there will be a significant chance of lookup failure due to the O(1) lookup strategy, and implementation of an iterative lookup strategy will be required. See below for more information.
As for regular lookups, the reply is unencrypted, thus exposing the reply to the inbound gateway (IBGW) of the reply tunnel, and an appropriate method of encrypting the reply is a topic for future work. As the IBGW for the reply is one of the gateways published in the LeaseSet, the exposure is minimal.
Destinations may be hosted on multiple routers simultaneously, by using the same private and public keys (traditionally named eepPriv.dat files). As both instances will periodically publish their signed LeaseSets to the floodfill peers, the most recently published LeaseSet will be returned to a peer requesting a database lookup. As LeaseSets have (at most) a 10 minute lifetime, should a particular instance go down, the outage will be 10 minutes at most, and generally much less than that. The multihoming function has been verified and is in use by several services on the network.
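To make the multihoming behaviour concrete, here is a minimal sketch of how a floodfill could keep only the newest LeaseSet per destination. It is a simplification under stated assumptions: the class name is hypothetical, the lifetime is measured from publication time rather than from the individual lease expirations, and signatures are not checked.

```python
from typing import Dict, Optional, Tuple

LEASESET_LIFETIME_S = 10 * 60  # LeaseSets live at most 10 minutes


class LeaseSetStore:
    """Keeps the most recently published LeaseSet for each key."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, Tuple[float, bytes]] = {}

    def publish(self, key: bytes, leaseset: bytes, published_at: float) -> None:
        current = self._entries.get(key)
        # Newer entries replace older ones; stale publishes are ignored.
        if current is None or published_at > current[0]:
            self._entries[key] = (published_at, leaseset)

    def lookup(self, key: bytes, now: float) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None or now - entry[0] > LEASESET_LIFETIME_S:
            return None  # unknown or expired
        return entry[1]
```

With two instances publishing under the same key, a lookup returns whichever published last; if that instance goes down, its entry ages out and the other instance's next publish takes over.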
A hostile user may attempt to harm the network by creating one or more floodfill routers and crafting them to offer bad, slow, or no responses. Some scenarios are discussed below.
Each time a router needs to make a determination on which floodfill router is closest to a key, it uses these metrics to determine which floodfill routers are "good". The methods, and thresholds, used to determine "goodness" are relatively new, and are subject to further analysis and improvement. While a completely unresponsive router will quickly be identified and avoided, routers that are only sometimes malicious may be much harder to deal with.
If the floodfills are not sufficiently misbehaving to be marked as "bad" using the peer profile metrics described above, this is a difficult scenario to handle. Tor's response can be much more nimble in the relay case, as the suspicious relays can be manually removed from the consensus. Some possible responses in the I2P case, none of them satisfactory:
This attack becomes more difficult as the network size grows.
As the keyspace is indexed by the cryptographic (SHA256) Hash of the key, an attacker must use a brute-force method to repeatedly generate router hashes until he has enough that are sufficiently close to the key. The amount of computational power required for this, which is dependent on network size, is unknown.
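A back-of-the-envelope model shows how the brute-force effort scales, without claiming a measured figure. It assumes SHA256 outputs behave as uniformly random values, that the network has N honest floodfills spread uniformly over the keyspace, and that the attacker wants m router hashes sharing the target key's top b bits; the symbols N, m, and b are introduced here for illustration only.

```latex
\Pr[\text{candidate shares the target's top } b \text{ bits}] = 2^{-b}
\qquad\Rightarrow\qquad
\mathbb{E}[\text{candidates per usable hash}] = 2^{b}

\mathbb{E}[\text{honest floodfills sharing that prefix}] = \frac{N}{2^{b}}
\qquad\Rightarrow\qquad
b \approx \log_2 N, \quad
\mathbb{E}[\text{total candidates}] \approx m \cdot 2^{b} \approx m \cdot N
```

Under these assumptions the number of candidate identities grows roughly linearly with the number of honest floodfills. The cost of generating each candidate identity is not covered by this estimate.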
As a partial defense against this attack, the algorithm used to determine Kademlia "closeness" varies over time. Rather than using the Hash of the key (i.e. H(k)) to determine closeness, we use the Hash of the key appended with the current date string, i.e. H(k + YYYYMMDD). A function called the "routing key generator" performs this transformation, turning the original key into a "routing key". In other words, the entire netdb keyspace "rotates" every day at UTC midnight. Any partial-keyspace attack would have to be regenerated every day, as after the rotation, the attacking routers would no longer be close to the target key, or to each other.
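The routing key computation H(k + YYYYMMDD) can be sketched as follows. The function name is hypothetical and the date string is assumed to be ASCII-encoded before it is appended to the key bytes.

```python
import hashlib
from datetime import datetime, timezone


def routing_key(key: bytes, when: datetime) -> bytes:
    """SHA256 of the key with the UTC date string (YYYYMMDD) appended.

    `when` should be timezone-aware; it is converted to UTC so that the
    rotation happens at UTC midnight regardless of the local zone.
    """
    date_str = when.astimezone(timezone.utc).strftime("%Y%m%d")
    return hashlib.sha256(key + date_str.encode("ascii")).digest()


key = hashlib.sha256(b"example router identity").digest()
before = datetime(2010, 7, 1, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2010, 7, 2, 0, 0, 0, tzinfo=timezone.utc)

# The same key maps to unrelated routing keys on either side of midnight.
print(routing_key(key, before).hex())
print(routing_key(key, after).hex())
```

Closeness is then measured against the routing key rather than the original key, so a set of attacker hashes generated for one day is scattered across the keyspace on the next.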
This attack becomes more difficult as the network size grows.
One consequence of daily keyspace rotation is that the distributed network database may become unreliable for a few minutes after the rotation -- lookups will fail because the new "closest" router has not received a store yet. The extent of the issue, and methods for mitigation (for example netdb "handoffs" at midnight), are topics for further study.
Several defenses are possible, and most of these are planned:
Similar to a bootstrap attack, an attacker using a floodfill router could attempt to "steer" peers to a subset of routers under the attacker's control by returning references to those routers.
This is unlikely to work via exploration, because exploration is a low-frequency task. Routers acquire a majority of their peer references through normal tunnel building activity. Exploration results are generally limited to a few router hashes, and each exploration query is directed to a random floodfill router.
For floodfill router references returned in an I2NP DatabaseSearchReplyMessage response to a lookup, these references are not immediately followed. The requesting router does not trust that the references are closer to the key (i.e. whether they are closer is verifiable), and the references are not immediately queried. In other words, the Kademlia lookup is not iterative. This means the query capture attack described in Hashing it out in Public is much less likely, until iterative lookups are implemented.
This doesn't have much to do with floodfill, but see the peer selection page for a discussion of the vulnerabilities of peer selection for tunnels.
End-to-end encryption of additional netDb lookups and responses. {% endblock %}