{% extends "global/layout.html" %}
{% block title %}{% trans %}The Network Database{% endtrans %}{% endblock %}
{% block lastupdated %}{% trans %}June 2013{% endtrans %}{% endblock %}
{% block accuratefor %}0.9.6{% endblock %}
{% block content %}
<h2>{% trans %}Overview{% endtrans %}</h2>
<p>{% trans -%}
I2P's netDb is a specialized distributed database, containing
just two types of data - router contact information (<b>RouterInfos</b>) and destination contact
information (<b>LeaseSets</b>). Each piece of data is signed by the appropriate party and verified
by anyone who uses or stores it. In addition, the data has liveliness information
within it, allowing irrelevant entries to be dropped, newer entries to replace
older ones, and protection against certain classes of attack.
{%- endtrans %}</p>
<p>{% trans -%}
The netDb is distributed with a simple technique called "floodfill",
where a subset of all routers, called "floodfill routers", maintains the distributed database.
{%- endtrans %}</p>
<h2 id="routerInfo">RouterInfo</h2>
<p>{% trans -%}
When an I2P router wants to contact another router, it needs to know some
key pieces of data - all of which are bundled up and signed by the router into
a structure called the "RouterInfo", which is distributed with the SHA256 of the router's identity
as the key. The structure itself contains:
{%- endtrans %}</p>
<ul>
<li>{% trans %}The router's identity (a 2048bit ElGamal encryption key, a 1024bit DSA signing key, and a certificate){% endtrans %}</li>
<li>{% trans %}The contact addresses at which it can be reached (e.g. TCP: example.org port 4108){% endtrans %}</li>
<li>{% trans %}When this was published{% endtrans %}</li>
<li>{% trans %}A set of arbitrary text options{% endtrans %}</li>
<li>{% trans %}The signature of the above, generated by the identity's DSA signing key{% endtrans %}</li>
</ul>
<p>{% trans -%}
The following text options, while not strictly required, are expected
to be present (see the sketch after the list):
{%- endtrans %}</p>
<ul>
<li><b>caps</b>
({% trans %}Capabilities flags - used to indicate floodfill participation, approximate bandwidth, and perceived reachability{% endtrans %})
</li>
<li><b>coreVersion</b>
({% trans %}The core library version, always the same as the router version{% endtrans %})
</li>
<li><b>netId</b> = 2
({% trans %}Basic network compatibility - A router will refuse to communicate with a peer having a different netId{% endtrans %})
</li>
<li><b>router.version</b>
({% trans %}Used to determine compatibility with newer features and messages{% endtrans %})
</li>
<li><b>stat_uptime</b> = 90m
({% trans %}Always sent as 90m, for compatibility with an older scheme where routers published their actual uptime,
and only sent tunnel requests to peers whose uptime was more than 60m{% endtrans %})
</li>
</ul>
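<p>{% trans -%}
As an illustrative sketch only (the router builds these options internally;
see the RouterInfo Javadoc linked below), the expected options can be modeled
as a simple key/value map:
{%- endtrans %}</p>
<pre>
import java.util.Properties;

// Illustrative sketch only: the expected RouterInfo options described above,
// with example values. Not the router's actual construction code.
Properties opts = new Properties();
opts.setProperty("caps", "OfR");             // e.g. bandwidth class 'O', floodfill 'f', reachable 'R'
opts.setProperty("coreVersion", "0.9.6");    // always the same as the router version
opts.setProperty("netId", "2");              // peers with a different netId are refused
opts.setProperty("router.version", "0.9.6");
opts.setProperty("stat_uptime", "90m");      // always 90m, for compatibility
</pre>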
<p>{% trans -%}
These values are used by other routers for basic decisions.
Should we connect to this router? Should we attempt to route a tunnel through this router?
The bandwidth capability flag, in particular, is used only to determine whether
the router meets a minimum threshold for routing tunnels.
Above the minimum threshold, the advertised bandwidth is not used or trusted anywhere
in the router, except for display in the user interface and for debugging and network analysis.
{%- endtrans %}</p>
<p>{% trans stats=i2pconv('stats.i2p') -%}
Additional text options include
a small number of statistics about the router's health, which are aggregated by
sites such as <a href="http://{{ stats }}/">{{ stats }}</a>
for network performance analysis and debugging.
These statistics were chosen to provide data crucial to the developers,
such as tunnel build success rates, while balancing the need for such data
with the side-effects that could result from revealing this data.
Current statistics are limited to:
{%- endtrans %}</p>
<ul>
<li>{% trans %}Client and exploratory tunnel build success, reject, and timeout rates{% endtrans %}</li>
<li>{% trans %}1 hour average number of participating tunnels{% endtrans %}</li>
</ul>
<p>{% trans -%}
The data published can be seen in the router's user interface,
but is not used or trusted within the router.
As the network has matured, we have gradually removed most of the published
statistics to improve anonymity, and we plan to remove more in future releases.
{%- endtrans %}</p>
<p>
<a href="{{ site_url('docs/spec/common-structures') }}#struct_RouterInfo">{% trans %}RouterInfo specification{% endtrans %}</a>
</p>
<p>
<a href="http://docs.i2p-projekt.de/javadoc/net/i2p/data/RouterInfo.html">{% trans %}RouterInfo Javadoc{% endtrans %}</a>
</p>
<h3>{% trans %}RouterInfo Expiration{% endtrans %}</h3>
<p>{% trans -%}
RouterInfos have no set expiration time.
Each router is free to maintain its own local policy to trade off the frequency of RouterInfo lookups
with memory or disk usage.
In the current implementation, there are the following general policies (sketched in code after the list):
{%- endtrans %}</p>
<ul>
<li>{% trans -%}
There is no expiration during the first hour of uptime, as the persistent stored data may be old.
{%- endtrans %}</li>
<li>{% trans -%}
There is no expiration if there are 25 or fewer RouterInfos.
{%- endtrans %}</li>
<li>{% trans -%}
As the number of local RouterInfos grows, the expiration time shrinks, in an attempt to maintain
a reasonable number of RouterInfos. The expiration time with fewer than 120 routers is 72 hours,
while the expiration time with 300 routers is around 30 hours.
{%- endtrans %}</li>
<li>{% trans ssu=site_url('docs/transport/ssu') -%}
RouterInfos containing <a href="{{ ssu }}">SSU</a> introducers expire in about an hour, as
the introducer list expires in about that time.
{%- endtrans %}</li>
<li>{% trans -%}
Floodfills use a short expiration time (1 hour) for all local RouterInfos, as valid RouterInfos will
be frequently republished to them.
{%- endtrans %}</li>
</ul>
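<p>{% trans -%}
The policy above can be summarized in a short sketch. This is illustrative
pseudologic, not the actual implementation; in particular, the interpolation
between the data points given above is an assumption.
{%- endtrans %}</p>
<pre>
// Illustrative sketch of the RouterInfo expiration policy described above.
// The interpolation between the stated data points is an assumption.
long expiration(int knownRouters, long uptimeMs, boolean isFloodfill,
                boolean hasSSUIntroducers) {
    final long HOUR = 60 * 60 * 1000L;
    if (uptimeMs &lt; HOUR) return Long.MAX_VALUE;      // no expiration in the first hour
    if (knownRouters &lt;= 25) return Long.MAX_VALUE;   // no expiration with few RouterInfos
    if (isFloodfill) return HOUR;                       // valid entries are republished often
    if (hasSSUIntroducers) return HOUR;                 // introducer lists go stale quickly
    if (knownRouters &lt; 120) return 72 * HOUR;
    return Math.max(30 * HOUR, 72 * HOUR * 120 / knownRouters);  // ~30 hours at 300 routers
}
</pre>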
<h3>{% trans %}RouterInfo Persistent Storage{% endtrans %}</h3>
<p>{% trans -%}
RouterInfos are periodically written to disk so that they are available after a restart.
{%- endtrans %}</p>
<h2 id="leaseSet">LeaseSet</h2>
<p>{% trans -%}
The second piece of data distributed in the netDb is a "LeaseSet" - documenting
a group of <b>tunnel entry points (leases)</b> for a particular client destination.
Each of these leases specifies the following information:
{%- endtrans %}</p>
<ul>
<li>{% trans %}The tunnel gateway router (by specifying its identity){% endtrans %}</li>
<li>{% trans %}The tunnel ID on that router to send messages with (a 4 byte number){% endtrans %}</li>
<li>{% trans %}When that tunnel will expire.{% endtrans %}</li>
</ul>
<p>{% trans -%}
The LeaseSet itself is stored in the netDb under
the key derived from the SHA256 of the destination.
{%- endtrans %}</p>
<p>{% trans -%}
In addition to these leases, the LeaseSet includes:
{%- endtrans %}</p>
<ul>
<li>{% trans %}The destination itself (a 2048bit ElGamal encryption key, a 1024bit DSA signing key, and a certificate){% endtrans %}</li>
<li>{% trans %}Additional encryption public key: used for end-to-end encryption of garlic messages{% endtrans %}</li>
<li>{% trans %}Additional signing public key: intended for LeaseSet revocation, but is currently unused.{% endtrans %}</li>
<li>{% trans %}Signature of all the LeaseSet data, to make sure the Destination published the LeaseSet.{% endtrans %}</li>
</ul>
<p>
<a href="{{ site_url('docs/spec/common-structures') }}#struct_Lease">{% trans %}Lease specification{% endtrans %}</a>
<br />
<a href="{{ site_url('docs/spec/common-structures') }}#struct_LeaseSet">{% trans %}LeaseSet specification{% endtrans %}</a>
</p>
<p>
<a href="http://docs.i2p-projekt.de/javadoc/net/i2p/data/Lease.html">{% trans %}Lease Javadoc{% endtrans %}</a>
<br />
<a href="http://docs.i2p-projekt.de/javadoc/net/i2p/data/LeaseSet.html">{% trans %}LeaseSet Javadoc{% endtrans %}</a>
</p>
<h3 id="unpublished">{% trans %}Unpublished LeaseSets{% endtrans %}</h3>
<p>{% trans -%}
A LeaseSet for a destination used only for outgoing connections is <i>unpublished</i>.
It is never sent for publication to a floodfill router.
"Client" tunnels, such as those for web browsing and IRC clients, are unpublished.
Servers will still be able to send messages back to those unpublished destinations,
because of <a href="#leaseset_storage_peers">I2NP storage messages</a>.
{%- endtrans %}</p>
<h3 id="revoked">{% trans %}Revoked LeaseSets{% endtrans %}</h3>
<p>{% trans -%}
A LeaseSet may be <i>revoked</i> by publishing a new LeaseSet with zero leases.
Revocations must be signed by the additional signing key in the LeaseSet.
Revocations are not fully implemented, and it is unclear if they have any practical use.
This is the only planned use for that signing key, so it is currently unused.
{%- endtrans %}</p>
<h3 id="encrypted">{% trans %}Encrypted LeaseSets{% endtrans %}</h3>
<p>{% trans -%}
In an <i>encrypted</i> LeaseSet, all Leases are encrypted with a separate DSA key.
The leases may only be decoded, and thus the destination may only be contacted,
by those with the key.
There is no flag or other direct indication that the LeaseSet is encrypted.
Encrypted LeaseSets are not widely used, and it is a topic for future work to
research whether the user interface and implementation of encrypted LeaseSets could be improved.
{%- endtrans %}</p>
<h3>{% trans %}LeaseSet Expiration{% endtrans %}</h3>
<p>{% trans -%}
All Leases (tunnels) are valid for 10 minutes; therefore, a LeaseSet expires
10 minutes after the earliest creation time of all its Leases.
{%- endtrans %}</p>
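<p>{% trans -%}
In code form, a minimal sketch (the accessors shown here are hypothetical):
{%- endtrans %}</p>
<pre>
// Minimal sketch: a LeaseSet expires 10 minutes after the earliest
// creation time among its Leases. Accessors here are hypothetical.
long earliest = Long.MAX_VALUE;
for (Lease lease : leaseSet.getLeases())
    earliest = Math.min(earliest, lease.getCreated());
long expiry = earliest + 10 * 60 * 1000L;
</pre>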
<h3>{% trans %}LeaseSet Persistent Storage{% endtrans %}</h3>
<p>{% trans -%}
There is no persistent storage of LeaseSet data since they expire so quickly.
{%- endtrans %}</p>
<h2 id="bootstrap">{% trans %}Bootstrapping{% endtrans %}</h2>
<p>{% trans -%}
The netDb is decentralized; however, you do need at
least one reference to a peer so that the integration process
can tie you in. This is accomplished by "reseeding" your router with the RouterInfo
of an active peer - specifically, by retrieving their <code>routerInfo-$hash.dat</code>
file and storing it in your <code>netDb/</code> directory. Anyone can provide
you with those files - you can even provide them to others by exposing your own
netDb directory. To simplify the process,
volunteers publish their netDb directories (or a subset) on the regular (non-i2p) network,
and the URLs of these directories are hardcoded in I2P.
When the router starts up for the first time, it automatically fetches from
one of these URLs, selected at random.
{%- endtrans %}</p>
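<p>{% trans -%}
A minimal sketch of a reseed fetch follows. The host below is a placeholder,
not an actual reseed URL, and real reseeding also handles HTTPS certificates,
retries, and multiple hosts.
{%- endtrans %}</p>
<pre>
import java.io.*;
import java.net.URL;

// Illustrative sketch only: fetch one routerInfo-$hash.dat file from a
// reseed host and store it in the local netDb/ directory.
String host = "https://reseed.example.com/";   // placeholder, not a real reseed host
String hash = "...";                           // Base64 hash of the peer's identity
String name = "routerInfo-" + hash + ".dat";
try (InputStream in = new URL(host + name).openStream();
     OutputStream out = new FileOutputStream(new File("netDb", name))) {
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1)
        out.write(buf, 0, n);
}
</pre>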
<h2 id="floodfill">{% trans %}Floodfill{% endtrans %}</h2>
<p>{% trans -%}
The floodfill netDb is a simple distributed storage mechanism.
The storage algorithm is simple: send the data to the closest peer that has advertised itself
as a floodfill router. Then wait 10 seconds, pick another floodfill router, and ask it
for the entry, to verify its proper insertion and distribution. If the
verification peer doesn't reply, or doesn't have the entry, the sender
repeats the process. When a floodfill router receives a netDb
store from a non-floodfill peer, it sends the entry to a subset of the floodfill routers.
The peers selected are the ones closest (according to the <a href="#kademlia_closeness">XOR-metric</a>) to a specific key.
{%- endtrans %}</p>
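<p>{% trans -%}
In outline, the store-and-verify loop looks roughly like this (an illustrative
sketch; all helper methods are hypothetical):
{%- endtrans %}</p>
<pre>
// Illustrative outline of the store-and-verify loop described above.
// All helper methods here are hypothetical.
void store(DatabaseEntry entry) {
    while (true) {
        Hash target = closestKnownFloodfill(entry.getRoutingKey());
        sendStore(target, entry);                 // DatabaseStoreMessage
        sleep(10 * 1000);                         // wait 10 seconds
        Hash verifier = anotherFloodfill(entry.getRoutingKey(), target);
        if (lookup(verifier, entry.getKey()) != null)
            return;                               // verified: stored and distributed
        // no reply, or the verifier doesn't have the entry: repeat
    }
}
</pre>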
<p>{% trans -%}
Determining who is part of the floodfill netDb is trivial - it is exposed in each
router's published routerInfo as a capability.
{%- endtrans %}</p>
<p>{% trans -%}
Floodfills have no central authority and do not form a "consensus" -
they only implement a simple DHT overlay.
{%- endtrans %}</p>
<h3 id="opt-in">{% trans %}Floodfill Router Opt-in{% endtrans %}</h3>
<p>{% trans -%}
Unlike Tor, where the directory servers are hardcoded and trusted,
and operated by known entities,
the members of the I2P floodfill peer set need not be trusted, and
change over time.
{%- endtrans %}</p>
<p>{% trans -%}
To increase reliability of the netDb, and minimize the impact
of netDb traffic on a router, floodfill is automatically enabled
only on routers that are configured with high bandwidth limits.
Routers with high bandwidth limits (which must be manually configured,
as the default is much lower) are presumed to be on lower-latency
connections, and are more likely to be available 24/7.
The current minimum share bandwidth for a floodfill router is 128 KBytes/sec.
{%- endtrans %}</p>
<p>{% trans -%}
In addition, a router must pass several additional tests for health
(outbound message queue time, job lag, etc.) before floodfill operation is
automatically enabled.
{%- endtrans %}</p>
<p>{% trans -%}
With the current rules for automatic opt-in, approximately 6&#37; of
the routers in the network are floodfill routers.
{%- endtrans %}</p>
<p>{% trans -%}
While some peers are manually configured to be floodfill,
others are simply high-bandwidth routers who automatically volunteer
when the number of floodfill peers drops below a threshold.
This prevents any long-term network damage from losing most or all
floodfills to an attack.
In turn, these peers will un-floodfill themselves when there are
too many floodfills outstanding.
{%- endtrans %}</p>
<h3>{% trans %}Floodfill Router Roles{% endtrans %}</h3>
<p>{% trans -%}
The only services a floodfill router provides beyond those of a non-floodfill router
are accepting netDb stores and responding to netDb queries.
Since they are generally high-bandwidth, they are more likely to participate in a high number of tunnels
(i.e. be a "relay" for others), but this is not directly related to their distributed database services.
{%- endtrans %}</p>
<a name="kademlia_closeness"><h2 id="kad">{% trans %}Kademlia Closeness Metric{% endtrans %}</h2></a>
<p>{% trans -%}
The netDb uses a simple Kademlia-style XOR metric to determine closeness.
The SHA256 hash of the key being looked up or stored is XOR-ed with
the hash of the router in question to determine closeness.
A modification to this algorithm is done to increase the cost of <a href="#sybil-partial">Sybil attacks</a>.
Instead of the SHA256 hash of the key being looked up or stored, the SHA256 hash is taken
of the 32-byte binary key appended with the UTC date represented as an 8-byte ASCII string yyyyMMdd, i.e. SHA256(key + yyyyMMdd).
This is called the "routing key", and it changes every day at midnight UTC.
The daily transformation of the DHT is sometimes called "keyspace rotation",
although it isn't strictly a rotation.
{%- endtrans %}</p>
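<p>{% trans -%}
For concreteness, a minimal sketch of the routing key computation and the XOR
comparison (illustrative only, not the router's actual classes):
{%- endtrans %}</p>
<pre>
import java.security.MessageDigest;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative sketch: routing key = SHA256(key || yyyyMMdd), as described above.
static byte[] routingKey(byte[] key) throws Exception {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    md.update(key);                                          // 32-byte binary key
    md.update(fmt.format(new Date()).getBytes("US-ASCII")); // 8-byte ASCII date
    return md.digest();
}

// XOR closeness: which of a or b is closer to the target routing key?
// Bytes are compared as unsigned values, most significant byte first.
static int compareDistance(byte[] target, byte[] a, byte[] b) {
    for (int i = 0; i &lt; target.length; i++) {
        int da = (a[i] ^ target[i]) &amp; 0xff;
        int db = (b[i] ^ target[i]) &amp; 0xff;
        if (da != db)
            return da &lt; db ? -1 : 1;
    }
    return 0;
}
</pre>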
<p>{% trans -%}
Routing keys are never sent on-the-wire in any I2NP message; they are only used locally for
determination of distance.
{%- endtrans %}</p>
<h2 id="delivery">{% trans %}Storage, Verification, and Lookup Mechanics{% endtrans %}</h2>
<h3>{% trans %}RouterInfo Storage to Peers{% endtrans %}</h3>
<p>{% trans i2np=site_url('docs/protocol/i2np'), ntcp=site_url('docs/transport/ntcp'), ssu=site_url('docs/transport/ssu') -%}
<a href="{{ i2np }}">I2NP</a> DatabaseStoreMessages containing the local RouterInfo are exchanged with peers
as a part of the initialization of a <a href="{{ ntcp }}">NTCP</a>
or <a href="{{ ssu }}">SSU</a> transport connection.
{%- endtrans %}</p>
<a name="leaseset_storage_peers"><h3>{% trans %}LeaseSet Storage to Peers{% endtrans %}</h3></a>
<p>{% trans i2np=site_url('docs/protocol/i2np') -%}
<a href="{{ i2np }}">I2NP</a> DatabaseStoreMessages containing the local LeaseSet are periodically exchanged with peers
by bundling them in a garlic message along with normal traffic from the related Destination.
This allows an initial response, and later responses, to be sent to an appropriate Lease,
without requiring any LeaseSet lookups, or requiring the communicating Destinations to have published LeaseSets at all.
{%- endtrans %}</p>
<h3>{% trans %}Floodfill Selection{% endtrans %}</h3>
<p>{% trans -%}
The DatabaseStoreMessage should be sent to the floodfill that is closest
to the current routing key for the RouterInfo or LeaseSet being stored.
Currently, the closest floodfill is found by a search in the local database.
Even if that floodfill is not actually the closest, it will flood the entry "closer" by
sending it to multiple other floodfills.
This provides a high degree of fault-tolerance.
{%- endtrans %}</p>
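<p>{% trans -%}
A sketch of the local closest-floodfill selection, reusing the compareDistance()
helper from the sketch above (illustrative only):
{%- endtrans %}</p>
<pre>
import java.util.List;

// Illustrative: pick the known floodfill closest to the routing key.
static byte[] closestFloodfill(byte[] routingKey, List&lt;byte[]&gt; floodfillHashes) {
    byte[] best = null;
    for (byte[] candidate : floodfillHashes)
        if (best == null || compareDistance(routingKey, candidate, best) &lt; 0)
            best = candidate;
    return best;
}
</pre>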
<p>{% trans -%}
In traditional Kademlia, a peer would do a "find-closest" search before inserting
an item in the DHT at the closest target. As the verify operation will tend to
discover closer floodfills if they are present, a router will quickly improve
its knowledge of the DHT "neighborhood" for the RouterInfo and LeaseSets it regularly publishes.
While I2NP does not define a "find-closest" message, if it becomes necessary,
a router may simply do an iterative search for a key with the least significant bit flipped
(i.e. key ^ 0x01) until no closer peers are received in the DatabaseSearchReplyMessages.
This ensures that the true closest peer will be found even if a more-distant peer had
the netdb item.
{%- endtrans %}</p>
<h3>{% trans %}RouterInfo Storage to Floodfills{% endtrans %}</h3>
<p>{% trans i2np=site_url('docs/protocol/i2np') -%}
A router publishes its own RouterInfo by directly connecting to a floodfill router
and sending it an <a href="{{ i2np }}">I2NP</a> DatabaseStoreMessage
with a nonzero Reply Token. The message is not end-to-end garlic encrypted,
as this is a direct connection, so there are no intervening routers
(and no need to hide this data anyway).
The floodfill router replies with an
<a href="{{ i2np }}">I2NP</a> DeliveryStatusMessage,
with the Message ID set to the value of the Reply Token.
{%- endtrans %}</p>
<h3>{% trans %}LeaseSet Storage to Floodfills{% endtrans %}</h3>
<p>{% trans -%}
Storage of LeaseSets is much more sensitive than that of RouterInfos, as a router
must take care that the LeaseSet cannot be associated with the router.
{%- endtrans %}</p>
<p>{% trans i2np=site_url('docs/protocol/i2np') -%}
A router publishes a local LeaseSet by
sending an <a href="{{ i2np }}">I2NP</a> DatabaseStoreMessage
with a nonzero Reply Token over an outbound client tunnel for that Destination.
The message is end-to-end garlic encrypted using the Destination's Session Key Manager,
to hide the message from the tunnel's outbound endpoint.
The floodfill router replies with an
<a href="{{ i2np }}">I2NP</a> DeliveryStatusMessage,
with the Message ID set to the value of the Reply Token.
This message is sent back to one of the client's inbound tunnels.
{%- endtrans %}</p>
<h3>{% trans %}Flooding{% endtrans %}</h3>
<p>{% trans -%}
After a floodfill router receives a DatabaseStoreMessage containing a
valid RouterInfo or LeaseSet which is newer than that previously stored in its
local NetDb, it "floods" it.
To flood a NetDb entry, it looks up several (currently 4) floodfill routers closest to the routing key
of the NetDb entry. (The routing key is the SHA256 Hash of the RouterIdentity or Destination with the date (yyyyMMdd) appended.)
By flooding to those closest to the key, not closest to itself, the floodfill ensures that the storage
gets to the right place, even if the storing router did not have good knowledge of the
DHT "neighborhood" for the routing key.
{%- endtrans %}</p>
<p>{% trans i2np=site_url('docs/protocol/i2np') -%}
The floodfill then directly connects to each of those peers
and sends it an <a href="{{ i2np }}">I2NP</a> DatabaseStoreMessage
with a zero Reply Token. The message is not end-to-end garlic encrypted,
as this is a direct connection, so there are no intervening routers
(and no need to hide this data anyway).
The other routers do not reply or re-flood, as the Reply Token is zero.
{%- endtrans %}</p>
<h3 id="lookup">{% trans %}RouterInfo and LeaseSet Lookup{% endtrans %}</h3>
<p>{% trans i2np=site_url('docs/protocol/i2np') -%}
The <a href="{{ i2np }}">I2NP</a> DatabaseLookupMessage is used to request a netdb entry from a floodfill router.
Lookups are sent out one of the router's outbound exploratory tunnels.
The replies are specified to return via one of the router's inbound exploratory tunnels.
{%- endtrans %}</p>
<p>{% trans -%}
Lookups are generally sent in parallel to the two "good" floodfill routers closest to the requested key ("good" meaning the connection doesn't fail).
{%- endtrans %}</p>
<p>{% trans i2np=site_url('docs/protocol/i2np') -%}
If the key is found locally by the floodfill router, it responds with an
<a href="{{ i2np }}">I2NP</a> DatabaseStoreMessage.
If the key is not found locally by the floodfill router, it responds with an
<a href="{{ i2np }}">I2NP</a> DatabaseSearchReplyMessage
containing a list of other floodfill routers close to the key.
{%- endtrans %}</p>
<p>{% trans -%}
LeaseSet lookups are garlic encrypted end-to-end as of release 0.9.5.
RouterInfo lookups are not encrypted and thus are vulnerable to snooping by the outbound endpoint
(OBEP) of the client tunnel. This is due to the expense of the ElGamal encryption.
RouterInfo lookup encryption may be enabled in a future release.
{%- endtrans %}</p>
<p>{% trans -%}
As of release 0.9.7, replies to a LeaseSet lookup (a DatabaseStoreMessage or a DatabaseSearchReplyMessage)
will be encrypted by including the session key and tag in the lookup.
This hides the reply from the inbound gateway (IBGW) of the reply tunnel.
Responses to RouterInfo lookups will be encrypted if we enable the lookup encryption.
{%- endtrans %}</p>
<p>{% trans pdf='http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf' -%}
(Reference: <a href="{{ pdf }}">Hashing it out in Public</a> Sections 2.2-2.3 for terms below in italics)
{%- endtrans %}</p>
<p>{% trans -%}
Due to the relatively small size of the network and the flooding redundancy of 8x,
lookups are usually O(1) rather than O(log n) --
a router is highly likely to know a floodfill router close enough to the key to get the answer on the first try.
In releases prior to 0.8.9, routers used a lookup redundancy of two
(that is, two lookups were performed in parallel to different peers), and
neither <i>recursive</i> nor <i>iterative</i> routing for lookups was implemented.
Queries were sent through <i>multiple routes simultaneously</i>
to <i>reduce the chance of query failure</i>.
{%- endtrans %}</p>
<p>{% trans -%}
As of release 0.8.9, <i>iterative lookups</i> are implemented with no lookup redundancy.
This is a more efficient and reliable lookup that will work much better
when not all floodfill peers are known, and it removes a serious
limitation to network growth. As the network grows and each router knows only a small
subset of the floodfill peers, lookups will become O(log n).
Even if the peer does not return references closer to the key, the lookup continues with
the next-closest peer, for added robustness, and to prevent a malicious floodfill from
black-holing a part of the key space. Lookups continue until a total lookup timeout is reached,
or the maximum number of peers is queried.
{%- endtrans %}</p>
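<p>{% trans -%}
In outline, an iterative lookup looks roughly like this (an illustrative sketch;
the message and helper types are hypothetical stand-ins for the I2NP messages
described above):
{%- endtrans %}</p>
<pre>
// Illustrative outline of an iterative lookup; helpers are hypothetical.
DatabaseEntry iterativeLookup(byte[] key, long timeoutMs, int maxPeers) {
    long deadline = now() + timeoutMs;
    // queue of candidate floodfills, ordered by XOR distance to the key
    PriorityQueue&lt;Hash&gt; toQuery = closestKnownFloodfills(key);
    Set&lt;Hash&gt; queried = new HashSet&lt;&gt;();
    while (now() &lt; deadline &amp;&amp; queried.size() &lt; maxPeers &amp;&amp; !toQuery.isEmpty()) {
        Hash peer = toQuery.poll();
        queried.add(peer);
        Reply reply = query(peer, key);           // DatabaseLookupMessage
        if (reply instanceof Store)
            return ((Store) reply).getEntry();    // DatabaseStoreMessage: found
        if (reply instanceof SearchReply)         // DatabaseSearchReplyMessage
            for (Hash h : ((SearchReply) reply).getHashes())
                if (!queried.contains(h))
                    toQuery.add(h);               // continue even if not closer
    }
    return null;                                  // timeout or max peers reached
}
</pre>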
<p>{% trans -%}
<i>Node IDs</i> are <i>verifiable</i> in that we use the router hash directly as both the node ID and the Kademlia key.
Incorrect responses that are not closer to the search key are generally ignored.
Given the current size of the network, a router has
<i>detailed knowledge of the neighborhood of the destination ID space</i>.
{%- endtrans %}</p>
<h3>{% trans %}RouterInfo Storage Verification{% endtrans %}</h3>
<p>{% trans -%}
To verify that a store was successful, a router simply waits about 10 seconds,
then sends a lookup to another floodfill router close to the key
(but not the one the store was sent to).
Lookups are sent out one of the router's outbound exploratory tunnels.
Lookups are end-to-end garlic encrypted to prevent snooping by the outbound endpoint (OBEP).
{%- endtrans %}</p>
<h3>{% trans %}LeaseSet Storage Verification{% endtrans %}</h3>
<p>{% trans -%}
To verify that a store was successful, a router simply waits about 10 seconds,
then sends a lookup to another floodfill router close to the key
(but not the one the store was sent to).
Lookups are sent out one of the outbound client tunnels for the destination of the LeaseSet being verified.
To prevent snooping by the OBEP of the outbound tunnel,
lookups are end-to-end garlic encrypted.
The replies are specified to return via one of the client's inbound tunnels.
{%- endtrans %}</p>
<p>{% trans -%}
As of release 0.9.7, replies for both RouterInfo and LeaseSet lookups (a DatabaseStoreMessage or a DatabaseSearchReplyMessage)
will be encrypted,
to hide the reply from the inbound gateway (IBGW) of the reply tunnel.
{%- endtrans %}</p>
<h3>{% trans %}Exploration{% endtrans %}</h3>
<p>{% trans i2np=site_url('docs/protocol/i2np') -%}
<i>Exploration</i> is a special form of netdb lookup, where a router attempts to learn about
new routers.
It does this by sending a floodfill router an <a href="{{ i2np }}">I2NP</a> DatabaseLookupMessage, looking for a random key.
As this lookup will fail, the floodfill would normally respond with an
<a href="{{ i2np }}">I2NP</a> DatabaseSearchReplyMessage containing hashes of floodfill routers close to the key.
This would not be helpful, as the requesting router probably already knows those floodfills,
and it would be impractical to add ALL floodfill routers to the "don't include" field of the lookup.
For an exploration query, the requesting router adds a router hash of all zeros to the
"don't include" field of the DatabaseLookupMessage.
The floodfill will then respond only with non-floodfill routers close to the requested key.
{%- endtrans %}</p>
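<p>{% trans -%}
Sketched in code (the message builder and send helpers are hypothetical):
{%- endtrans %}</p>
<pre>
// Illustrative: an exploration lookup as described above.
byte[] randomKey = new byte[32];
new java.security.SecureRandom().nextBytes(randomKey);

DatabaseLookup msg = new DatabaseLookup(randomKey);  // hypothetical builder
msg.addDontInclude(new byte[32]);   // all-zeros hash: "reply with non-floodfills"
send(randomFloodfill(), msg);
</pre>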
<h3>{% trans %}Notes on Lookup Responses{% endtrans %}</h3>
<p>{% trans -%}
The response to a lookup request is either a Database Store Message (on success) or a
Database Search Reply Message (on failure). The DSRM contains a 'from' router hash field
to indicate the source of the reply; the DSM does not.
The DSRM 'from' field is unauthenticated and may be spoofed or invalid.
There are no other response tags. Therefore, when making multiple requests in parallel, it is
difficult to monitor the performance of the various floodfill routers.
{%- endtrans %}</p>
<h2 id="multihome">{% trans %}MultiHoming{% endtrans %}</h2>
<p>{% trans -%}
Destinations may be hosted on multiple routers simultaneously, by using the same
private and public keys (traditionally stored in eepPriv.dat files).
As both instances will periodically publish their signed LeaseSets to the floodfill peers,
the most recently published LeaseSet will be returned to a peer requesting a database lookup.
As LeaseSets have (at most) a 10 minute lifetime, should a particular instance go down,
the outage will be 10 minutes at most, and generally much less than that.
The multihoming function has been verified and is in use by several services on the network.
{%- endtrans %}</p>
<h2 id="threat">{% trans %}Threat Analysis{% endtrans %}</h2>
<p>{% trans threatmodel=site_url('docs/how/threat-model') -%}
Also discussed on <a href="{{ threatmodel }}#floodfill">the threat model page</a>.
{%- endtrans %}</p>
<p>{% trans -%}
A hostile user may attempt to harm the network by
creating one or more floodfill routers and crafting them to offer
bad, slow, or no responses.
Some scenarios are discussed below.
{%- endtrans %}</p>
<h3>{% trans %}General Mitigation Through Growth{% endtrans %}</h3>
<p>{% trans -%}
There are currently hundreds of floodfill routers in the network.
Most of the following attacks will become more difficult, or have less impact,
as the network size and number of floodfill routers increase.
{%- endtrans %}</p>
<h3>{% trans %}General Mitigation Through Redundancy{% endtrans %}</h3>
<p>{% trans -%}
Via flooding, all netdb entries are stored on the 8 floodfill routers closest to the key.
{%- endtrans %}</p>
<h3>{% trans %}Forgeries{% endtrans %}</h3>
<p>{% trans -%}
All netdb entries are signed by their creators, so no router may forge a
RouterInfo or LeaseSet.
{%- endtrans %}</p>
<h3>{% trans %}Slow or Unresponsive{% endtrans %}</h3>
<p>{% trans peerselection=site_url('docs/how/peer-selection') -%}
Each router maintains an expanded set of statistics in the
<a href="{{ peerselection }}">peer profile</a> for each floodfill router,
covering various quality metrics for that peer.
The set includes:
{%- endtrans %}</p>
<ul>
<li>{% trans %}Average response time{% endtrans %}</li>
<li>{% trans %}Percentage of queries answered with the data requested{% endtrans %}</li>
<li>{% trans %}Percentage of stores that were successfully verified{% endtrans %}</li>
<li>{% trans %}Last successful store{% endtrans %}</li>
<li>{% trans %}Last successful lookup{% endtrans %}</li>
<li>{% trans %}Last response{% endtrans %}</li>
</ul>
<p>{% trans -%}
Each time a router needs to make a determination on which floodfill router is closest to a key,
it uses these metrics to determine which floodfill routers are "good".
The methods and thresholds used to determine "goodness" are relatively new, and
are subject to further analysis and improvement.
While a completely unresponsive router will quickly be identified and avoided,
routers that are only sometimes malicious may be much harder to deal with.
{%- endtrans %}</p>
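<p>{% trans -%}
A sketch of a "goodness" test over those metrics follows. The thresholds here
are invented for illustration; the actual values are tuned in the router.
{%- endtrans %}</p>
<pre>
// Illustrative only: the thresholds are invented, not the router's values.
boolean isGood(FloodfillProfile p, long now) {
    final long HOUR = 60 * 60 * 1000L;
    if (now - p.lastResponse() &gt; 8 * HOUR) return false;  // unresponsive
    if (p.avgResponseTimeMs() &gt; 3000) return false;       // too slow
    if (p.lookupSuccessRate() &lt; 0.3) return false;        // rarely returns data
    if (p.storeVerifyRate() &lt; 0.3) return false;          // stores rarely verify
    return true;
}
</pre>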
<h3 id="sybil">{% trans %}Sybil Attack (Full Keyspace){% endtrans %}</h3>
<p>{% trans url='http://citeseer.ist.psu.edu/douceur02sybil.html' -%}
An attacker may mount a <a href="{{ url }}">Sybil attack</a>
by creating a large number of floodfill routers spread throughout the keyspace.
{%- endtrans %}</p>
<p>{% trans url='http://blog.torproject.org/blog/june-2010-progress-report' -%}
(In a related example, a researcher recently created a
<a href="{{ url }}">large number of Tor relays</a>.)
If successful, this could be an effective DOS attack on the entire network.
{%- endtrans %}</p>
<p>{% trans -%}
If the floodfills are not sufficiently misbehaving to be marked as "bad" using the peer profile
metrics described above, this is a difficult scenario to handle.
Tor's response can be much more nimble in the relay case, as the suspicious relays
can be manually removed from the consensus.
Some possible responses for the I2P network are listed below, however none of them is completely satisfactory:
{%- endtrans %}</p>
<ul>
<li>{% trans -%}
Compile a list of bad router hashes or IPs, and announce the list through various means
(console news, website, forum, etc.); users would have to manually download the list and
add it to their local "blacklist".
{%- endtrans %}</li>
<li>{% trans %}Ask everyone in the network to enable floodfill manually (fight Sybil with more Sybil){% endtrans %}</li>
<li>{% trans %}Release a new software version that includes the hardcoded "bad" list{% endtrans %}</li>
<li>{% trans -%}
Release a new software version that improves the peer profile metrics and thresholds,
in an attempt to automatically identify the "bad" peers.
{%- endtrans %}</li>
<li>{% trans %}Add software that disqualifies floodfills if too many of them are in a single IP block{% endtrans %}</li>
<li>{% trans -%}
Implement an automatic subscription-based blacklist controlled by a single individual or group.
This would essentially implement a portion of the Tor "consensus" model.
Unfortunately it would also give a single individual or group the power to
block participation of any particular router or IP in the network,
or even to completely shut down or destroy the entire network.
{%- endtrans %}</li>
</ul>
<p>{% trans -%}
This attack becomes more difficult as the network size grows.
{%- endtrans %}</p>
<h3 id="sybil-partial">{% trans %}Sybil Attack (Partial Keyspace){% endtrans %}</h3>
<p>{% trans url='http://citeseer.ist.psu.edu/douceur02sybil.html' -%}
An attacker may mount a <a href="{{ url }}">Sybil attack</a>
by creating a small number (8-15) of floodfill routers clustered closely in the keyspace,
and distributing the RouterInfos for these routers widely.
Then, all lookups and stores for a key in that keyspace would be directed
to one of the attacker's routers.
If successful, this could be an effective DOS attack on a particular eepsite, for example.
{%- endtrans %}</p>
<p>{% trans -%}
As the keyspace is indexed by the cryptographic (SHA256) Hash of the key,
an attacker must use a brute-force method to repeatedly generate router hashes
until he has enough that are sufficiently close to the key.
The amount of computational power required for this, which is dependent on network
size, is unknown.
{%- endtrans %}</p>
<p>{% trans -%}
As a partial defense against this attack,
the algorithm used to determine Kademlia "closeness" varies over time.
Rather than using the Hash of the key (i.e. H(k)) to determine closeness,
we use the Hash of the key appended with the current date string, i.e. H(k + YYYYMMDD).
This is done by a function called the "routing key generator", which transforms the original key into a "routing key".
In other words, the entire netdb keyspace "rotates" every day at UTC midnight.
Any partial-keyspace attack would have to be regenerated every day, since after
the rotation, the attacking routers would no longer be close
to the target key, or to each other.
{%- endtrans %}</p>
<p>{% trans -%}
This attack becomes more difficult as the network size grows.
However, recent research demonstrates that the keyspace rotation is not particularly effective.
An attacker can precompute numerous router hashes in advance,
and only a few routers are sufficient to "eclipse" a portion
of the keyspace within a half hour after rotation.
{%- endtrans %}</p>
<p>{% trans -%}
One consequence of daily keyspace rotation is that the distributed network database
may become unreliable for a few minutes after the rotation --
lookups will fail because the new "closest" router has not received a store yet.
The extent of the issue, and methods for mitigation
(for example netdb "handoffs" at midnight)
are a topic for further study.
{%- endtrans %}</p>
<h3>{% trans %}Bootstrap Attacks{% endtrans %}</h3>
<p>{% trans -%}
An attacker could attempt to boot new routers into an isolated
or majority-controlled network by taking over a reseed website,
or tricking the developers into adding his reseed website
to the hardcoded list in the router.
{%- endtrans %}</p>
<p>{% trans -%}
Several defenses are possible, and most of these are planned:
{%- endtrans %}</p>
<ul>
<li>{% trans -%}
Disallowing fallback from HTTPS to HTTP for reseeding, since a MITM attacker
could simply block HTTPS and then respond to the HTTP request.
{%- endtrans %}</li>
<li>{% trans -%}
Changing the reseed task to fetch a subset of RouterInfos from
each of several reseed sites rather than using only a single site
{%- endtrans %}</li>
<li>{% trans -%}
Creating an out-of-network reseed monitoring service that
periodically polls reseed websites and verifies that the
data are not stale or inconsistent with other views of the network
{%- endtrans %}</li>
<li>{% trans %}Bundling reseed data in the installer{% endtrans %}</li>
</ul>
<h3>{% trans %}Query Capture{% endtrans %}</h3>
<p>{% trans pdf='http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf' -%}
See also <a href="#lookup">lookup</a>
(Reference: <a href="{{ pdf }}">Hashing it out in Public</a> Sections 2.2-2.3 for terms below in italics)
{%- endtrans %}</p>
<p>{% trans -%}
Similar to a bootstrap attack, an attacker using a floodfill router could attempt to "steer"
peers to a subset of routers controlled by him by returning their references.
{%- endtrans %}</p>
<p>{% trans -%}
This is unlikely to work via exploration, because exploration is a low-frequency task.
Routers acquire a majority of their peer references through normal tunnel building activity.
Exploration results are generally limited to a few router hashes,
and each exploration query is directed to a random floodfill router.
{%- endtrans %}</p>
<p>{% trans i2np=site_url('docs/protocol/i2np'),
pdf='http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf' -%}
As of release 0.8.9, <i>iterative lookups</i> are implemented.
Floodfill router references returned in an
<a href="{{ i2np }}">I2NP</a> DatabaseSearchReplyMessage
response to a lookup
are followed if they are closer (or the next closest) to the lookup key.
The requesting router does not have to trust that the references are
closer to the key; since the router hash is the node ID, closeness is <i>verifiably correct</i>.
The lookup also does not stop when no closer key is found, but continues by querying the
next-closest node, until the timeout or maximum number of queries is reached.
This prevents a malicious floodfill from black-holing a part of the key space.
Also, the daily keyspace rotation requires an attacker to regenerate a router info
within the desired key space region.
This design ensures that the query capture attack described in
<a href="{{ pdf }}">Hashing it out in Public</a>
is much more difficult.
{%- endtrans %}</p>
<h3>{% trans %}DHT-Based Relay Selection{% endtrans %}</h3>
<p>{% trans pdf='http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf' -%}
(Reference: <a href="{{ pdf }}">Hashing it out in Public</a> Section 3)
{%- endtrans %}</p>
<p>{% trans peerselection=site_url('docs/how/peer-selection') -%}
This doesn't have much to do with floodfill, but see
the <a href="{{ peerselection }}">peer selection page</a>
for a discussion of the vulnerabilities of peer selection for tunnels.
{%- endtrans %}</p>
<h3>{% trans %}Information Leaks{% endtrans %}</h3>
<p>{% trans pdf='http://www.eecs.berkeley.edu/~pmittal/publications/nisan-torsk-ccs10.pdf' -%}
(Reference: <a href="{{ pdf }}">In Search of an Anonymous and Secure Lookup</a> Section 3)
{%- endtrans %}</p>
<p>{% trans peerselection=site_url('docs/how/peer-selection'),
tunnelrouting=site_url('docs/how/tunnel-routing') -%}
This paper addresses weaknesses in the "Finger Table" DHT lookups used by Torsk and NISAN.
At first glance, these do not appear to apply to I2P. First, the use of DHT by Torsk and NISAN
is significantly different from that in I2P. Second, I2P's network database lookups are only
loosely correlated to the <a href="{{ peerselection }}">peer selection</a> and
<a href="{{ tunnelrouting }}">tunnel building</a> processes; only previously-known peers
are used for tunnels.
Also, peer selection is unrelated to any notion of DHT key-closeness.
{%- endtrans %}</p>
<p>{% trans -%}
Some of this may actually be more interesting when the I2P network gets much larger.
Right now, each router knows a large proportion of the network, so looking up a particular
Router Info in the network database is not strongly indicative of a future intent to use
that router in a tunnel. Perhaps when the network is 100 times larger, the lookup may be
more strongly correlated. Of course, a larger network makes a Sybil attack that much harder.
{%- endtrans %}</p>
<p>{% trans threatmodel=site_url('docs/how/threat-model') -%}
However, the general issue of DHT information leakage in I2P needs further investigation.
The floodfill routers are in a position to observe queries and gather information.
Certainly, at a level of <i>f</i> = 0.2 (20&#37; malicious nodes, as specified in the paper)
we expect that many of the Sybil threats we describe
(<a href="{{ threatmodel }}#sybil">here</a>,
<a href="#sybil">here</a> and
<a href="#sybil-partial">here</a>)
become problematic for several reasons.
{%- endtrans %}</p>
<h2 id="history">{% trans %}History{% endtrans %}</h2>
<p>
<a href="{{ site_url('docs/discussions/netdb') }}">{% trans %}Moved to the netdb discussion page{% endtrans %}</a>.
</p>
<h2 id="future">{% trans %}Future Work{% endtrans %}</h2>
<p>{% trans -%}
End-to-end encryption of additional netDb lookups and responses.
{%- endtrans %}</p>
<p>{% trans -%}
Better methods for tracking lookup responses.
{%- endtrans %}</p>
{% endblock %}