{% extends "global/layout.html" %}
{% block title %}The Network Database{% endblock %}
{% block lastupdated %}June 2012{% endblock %}
{% block accuratefor %}0.9{% endblock %}
{% block content %}
<h2>Overview</h2>

<p>
I2P's netDb is a specialized distributed database, containing
just two types of data - router contact information (<b>RouterInfos</b>) and destination contact
information (<b>LeaseSets</b>). Each piece of data is signed by the appropriate party and verified
by anyone who uses or stores it. In addition, the data has liveness information
within it, allowing irrelevant entries to be dropped, newer entries to replace
older ones, and protection against certain classes of attack.
</p>

<p>
The netDb is distributed with a simple technique called "floodfill",
where a subset of all routers, called "floodfill routers", maintains the distributed database.
</p>
<h2 id="routerInfo">RouterInfo</h2>

<p>
When an I2P router wants to contact another router, it needs to know some
key pieces of data - all of which are bundled up and signed by the router into
a structure called the "RouterInfo", which is distributed with the SHA256 of the router's identity
as the key. The structure itself contains:
</p>
<ul>
<li>The router's identity (a 2048-bit ElGamal encryption key, a 1024-bit DSA signing key, and a certificate)</li>
<li>The contact addresses at which it can be reached (e.g. TCP: example.org port 4108)</li>
<li>When this was published</li>
<li>A set of arbitrary text options</li>
<li>The signature of the above, generated by the identity's DSA signing key</li>
</ul>

<p>
The following text options, while not strictly required, are expected
to be present:
</p>
<ul>
<li><b>caps</b>
(Capabilities flags - used to indicate floodfill participation, approximate bandwidth, and perceived reachability)</li>
<li><b>coreVersion</b>
(The core library version, always the same as the router version)</li>
<li><b>netId</b> = 2
(Basic network compatibility - a router will refuse to communicate with a peer having a different netId)</li>
<li><b>router.version</b>
(Used to determine compatibility with newer features and messages)</li>
<li><b>stat_uptime</b> = 90m
(Always sent as 90m, for compatibility with an older scheme in which routers published their actual uptime
and only sent tunnel requests to peers whose uptime was more than 60m)</li>
</ul>
<p>
These values are used by other routers for basic decisions.
Should we connect to this router? Should we attempt to route a tunnel through this router?
The bandwidth capability flag, in particular, is used only to determine whether
the router meets a minimum threshold for routing tunnels.
Above the minimum threshold, the advertised bandwidth is not used or trusted anywhere
in the router, except for display in the user interface and for debugging and network analysis.
</p>
<p>
Additional text options include
a small number of statistics about the router's health, which are aggregated by
sites such as <a href="http://stats.i2p/">stats.i2p</a>
for network performance analysis and debugging.
These statistics were chosen to provide data crucial to the developers,
such as tunnel build success rates, while balancing the need for such data
with the side-effects that could result from revealing this data.
Current statistics are limited to:
</p>
<ul>
<li>Client and exploratory tunnel build success, reject, and timeout rates</li>
<li>One-hour average number of participating tunnels</li>
</ul>
<p>
The data
published can be seen in the router's user interface,
but is not used or trusted within the router.
As the network has matured, we have gradually removed most of the published
statistics to improve anonymity, and we plan to remove more in future releases.
</p>

<p>
<a href="common_structures_spec.html#struct_RouterInfo">RouterInfo specification</a>
</p>
<p>
<a href="http://docs.i2p-projekt.de/javadoc/net/i2p/data/RouterInfo.html">RouterInfo Javadoc</a>
</p>
<h3>RouterInfo Expiration</h3>
<p>
RouterInfos have no set expiration time.
Each router is free to maintain its own local policy to trade off the frequency of RouterInfo lookups
with memory or disk usage.
In the current implementation, there are the following general policies:
</p>
<ul>
<li>There is no expiration during the first hour of uptime, as the persistent stored data may be old.</li>
<li>There is no expiration if there are 25 or fewer RouterInfos.</li>
<li>As the number of local RouterInfos grows, the expiration time shrinks, in an attempt to maintain
a reasonable number of RouterInfos. The expiration time with fewer than 120 routers is 72 hours,
while the expiration time with 300 routers is around 30 hours.</li>
<li>RouterInfos containing <a href="udp.html">SSU</a> introducers expire in about an hour, as
the introducer list expires in about that time.</li>
<li>Floodfills use a short expiration time (1 hour) for all local RouterInfos, as valid RouterInfos will
be frequently republished to them.</li>
</ul>
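The policies above can be sketched as a single function. This is an illustrative sketch only, not I2P's actual implementation: the function name and the linear interpolation between the two data points given in the text (72 hours at fewer than 120 routers, roughly 30 hours at 300) are assumptions.

```python
# Illustrative sketch of an adaptive RouterInfo expiration policy, using
# the thresholds described above. NOT the actual I2P implementation; the
# name and the linear interpolation between the stated data points are
# hypothetical.
def routerinfo_expiration_hours(num_routerinfos, uptime_minutes,
                                is_floodfill=False, has_ssu_introducers=False):
    """Return the local expiration time in hours, or None for no expiration."""
    if uptime_minutes < 60:
        return None          # persistent stored data may be old; don't expire yet
    if num_routerinfos <= 25:
        return None          # too few entries to discard any
    if is_floodfill or has_ssu_introducers:
        return 1             # short expiration: about one hour
    if num_routerinfos < 120:
        return 72            # plenty of headroom: keep for 72 hours
    # shrink expiration as the table grows (interpolating between the
    # two points given in the text: 72h @ 120 routers, ~30h @ 300)
    hours = 72 - (num_routerinfos - 120) * (72 - 30) / (300 - 120)
    return max(hours, 1)

print(routerinfo_expiration_hours(300, 600))  # → 30.0
```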
<h3>RouterInfo Persistent Storage</h3>

<p>
RouterInfos are periodically written to disk so that they are available after a restart.
</p>
<h2 id="leaseSet">LeaseSet</h2>

<p>
The second piece of data distributed in the netDb is a "LeaseSet" - documenting
a group of <b>tunnel entry points (leases)</b> for a particular client destination.
Each of these leases specifies the following information:
<ul>
<li>The tunnel gateway router (by specifying its identity)</li>
<li>The tunnel ID on that router to send messages with (a 4 byte number)</li>
<li>When that tunnel will expire</li>
</ul>
The LeaseSet itself is stored in the netDb under
the key derived from the SHA256 of the destination.
</p>
<p>
In addition to these leases, the LeaseSet includes:
<ul>
<li>The destination itself (a 2048-bit ElGamal encryption key, a 1024-bit DSA signing key, and a certificate)</li>
<li>An additional encryption public key, used for end-to-end encryption of garlic messages</li>
<li>An additional signing public key, intended for LeaseSet revocation but currently unused</li>
<li>A signature of all the LeaseSet data, proving that the Destination published the LeaseSet</li>
</ul>
</p>
<p>
<a href="common_structures_spec.html#struct_Lease">Lease specification</a>
<br />
<a href="common_structures_spec.html#struct_LeaseSet">LeaseSet specification</a>
</p>
<p>
<a href="http://docs.i2p-projekt.de/javadoc/net/i2p/data/Lease.html">Lease Javadoc</a>
<br />
<a href="http://docs.i2p-projekt.de/javadoc/net/i2p/data/LeaseSet.html">LeaseSet Javadoc</a>
</p>
<h3 id="unpublished">Unpublished LeaseSets</h3>
<p>
A LeaseSet for a destination used only for outgoing connections is <i>unpublished</i>.
It is never sent for publication to a floodfill router.
"Client" tunnels, such as those for web browsing and IRC clients, are unpublished.
Servers will still be able to send messages back to those unpublished destinations,
because of <a href="#leaseset_storage_peers">I2NP storage messages</a>.
</p>
<h3 id="revoked">Revoked LeaseSets</h3>
<p>
A LeaseSet may be <i>revoked</i> by publishing a new LeaseSet with zero leases.
Revocations must be signed by the additional signing key in the LeaseSet.
Revocations are not fully implemented, and it is unclear if they have any practical use.
This is the only planned use for that signing key, so it is currently unused.
</p>
<h3 id="encrypted">Encrypted LeaseSets</h3>
<p>
In an <i>encrypted</i> LeaseSet, all Leases are encrypted with a separate key.
The leases may only be decoded, and thus the destination may only be contacted,
by those with the key.
There is no flag or other direct indication that the LeaseSet is encrypted.
Encrypted LeaseSets are not widely used, and it is a topic for future work to
research whether the user interface and implementation of encrypted LeaseSets could be improved.
</p>
<h3>LeaseSet Expiration</h3>
<p>
All Leases (tunnels) are valid for 10 minutes; therefore, a LeaseSet expires
10 minutes after the earliest creation time of all its Leases.
</p>
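The expiration rule above reduces to one line of arithmetic; a minimal sketch (the function name and timestamp representation are illustrative, not I2P's):

```python
# Minimal sketch: a LeaseSet expires 10 minutes after the earliest
# creation time among its Leases, since every tunnel is valid for
# 10 minutes from creation.
LEASE_LIFETIME_SECONDS = 10 * 60

def leaseset_expiration(lease_creation_times):
    """lease_creation_times: iterable of Unix timestamps (seconds)."""
    return min(lease_creation_times) + LEASE_LIFETIME_SECONDS

print(leaseset_expiration([1000, 1100, 1200]))  # → 1600
```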
<h3>LeaseSet Persistent Storage</h3>
<p>
There is no persistent storage of LeaseSet data since they expire so quickly.
</p>
<h2 id="bootstrap">Bootstrapping</h2>
<p>
The netDb is decentralized; however, you do need at
least one reference to a peer so that the integration process
can tie you in. This is accomplished by "reseeding" your router with the RouterInfo
of an active peer - specifically, by retrieving their <code>routerInfo-$hash.dat</code>
file and storing it in your <code>netDb/</code> directory. Anyone can provide
you with those files - you can even provide them to others by exposing your own
netDb directory. To simplify the process,
volunteers publish their netDb directories (or a subset) on the regular (non-I2P) network,
and the URLs of these directories are hardcoded in I2P.
When the router starts up for the first time, it automatically fetches from
one of these URLs, selected at random.
</p>
<h2 id="floodfill">Floodfill</h2>
<p>
The floodfill netDb is a simple distributed storage mechanism.
The storage algorithm is simple: send the data to the closest peer that has advertised itself
as a floodfill router. Then wait 10 seconds, pick another floodfill router, and ask it
for the entry, verifying its proper insertion / distribution. If the
verification peer doesn't reply, or doesn't have the entry, the sender
repeats the process. When a floodfill router receives a netDb
store from a peer that is not a floodfill router, it sends the entry to a subset of the floodfill peers.
The peers selected are the ones closest (according to the <a href="#kademlia_closeness">XOR metric</a>) to a specific key.
</p>
<p>
Determining who is part of the floodfill netDb is trivial - it is exposed in each
router's published RouterInfo as a capability.
</p>
<p>
Floodfills have no central authority and do not form a "consensus" -
they only implement a simple DHT overlay.
</p>
<h3 id="opt-in">Floodfill Router Opt-in</h3>

<p>
Unlike Tor, where the directory servers are hardcoded and trusted,
and operated by known entities,
the members of the I2P floodfill peer set need not be trusted, and
change over time.
</p>
<p>
To increase the reliability of the netDb, and to minimize the impact
of netDb traffic on a router, floodfill is automatically enabled
only on routers that are configured with high bandwidth limits.
Routers with high bandwidth limits (which must be manually configured,
as the default is much lower) are presumed to be on lower-latency
connections, and are more likely to be available 24/7.
The current minimum share bandwidth for a floodfill router is 128 KBytes/sec.
</p>
<p>
In addition, a router must pass several additional tests for health
(outbound message queue time, job lag, etc.) before floodfill operation is
automatically enabled.
</p>
<p>
With the current rules for automatic opt-in, approximately 6% of
the routers in the network are floodfill routers.
</p>

<p>
While some peers are manually configured to be floodfill,
others are simply high-bandwidth routers who automatically volunteer
when the number of floodfill peers drops below a threshold.
This prevents any long-term network damage from losing most or all
floodfills to an attack.
In turn, these peers will un-floodfill themselves when there are
too many floodfills outstanding.
</p>
<h3>Floodfill Router Roles</h3>
<p>
The only services a floodfill router provides beyond those of a non-floodfill router
are accepting netDb stores and responding to netDb queries.
Since floodfills are generally high-bandwidth, they are more likely to participate in a high number of tunnels
(i.e. be a "relay" for others), but this is not directly related to their distributed database services.
</p>
<a name="kademlia_closeness"><h2 id="kad">Kademlia Closeness Metric</h2></a>
<p>
The netDb uses a simple Kademlia-style XOR metric to determine closeness.
The SHA256 hash of the key being looked up or stored is XOR-ed with
the hash of the router in question to determine closeness.
The algorithm is modified to increase the cost of <a href="#sybil-partial">Sybil attacks</a>:
instead of the SHA256 hash of the key being looked up or stored, the SHA256 hash is taken
of the key appended with the date yyyyMMdd (i.e. SHA256(key + yyyyMMdd)).
</p>
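The metric above can be sketched in a few lines. This is an illustrative Python sketch, not I2P's Java implementation; the function names are assumptions, but the computation (SHA256 of key plus date, bytewise XOR compared as a big-endian integer) follows the description directly.

```python
# Sketch of the rotating routing key and XOR closeness metric:
# routing_key = SHA256(key || yyyyMMdd), and the distance between a
# routing key and a router hash is their bytewise XOR, compared as a
# big-endian integer. Names here are illustrative, not I2P's.
import hashlib
from datetime import datetime, timezone

def routing_key(key_hash: bytes, when=None) -> bytes:
    when = when or datetime.now(timezone.utc)
    return hashlib.sha256(key_hash + when.strftime("%Y%m%d").encode()).digest()

def xor_distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")

def closest(router_hashes, key_hash, when=None):
    # the closest router to a key minimizes the XOR distance
    rk = routing_key(key_hash, when)
    return min(router_hashes, key=lambda h: xor_distance(h, rk))
```

Because the date is part of the hash input, the same key maps to a different point in the keyspace each day, which is what makes clustering attacks expensive to sustain.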
<h2 id="delivery">Storage, Verification, and Lookup Mechanics</h2>

<h3>RouterInfo Storage to Peers</h3>
<p>
<a href="i2np.html">I2NP</a> DatabaseStoreMessages containing the local RouterInfo are exchanged with peers
as part of the initialization of an
<a href="ntcp.html">NTCP</a>
or
<a href="udp.html">SSU</a>
transport connection.
</p>
<a name="leaseset_storage_peers"><h3>LeaseSet Storage to Peers</h3></a>
<p>
<a href="i2np.html">I2NP</a> DatabaseStoreMessages containing the local LeaseSet are periodically exchanged with peers
by bundling them in a garlic message along with normal traffic from the related Destination.
This allows an initial response, and later responses, to be sent to an appropriate Lease,
without requiring any LeaseSet lookups, or requiring the communicating Destinations to have published LeaseSets at all.
</p>
<h3>RouterInfo Storage to Floodfills</h3>
<p>
A router publishes its own RouterInfo by directly connecting to a floodfill router
and sending it an <a href="i2np.html">I2NP</a> DatabaseStoreMessage
with a nonzero Reply Token. The message is not end-to-end garlic encrypted,
as this is a direct connection, so there are no intervening routers
(and no need to hide this data anyway).
The floodfill router replies with an
<a href="i2np.html">I2NP</a> DeliveryStatusMessage,
with the Message ID set to the value of the Reply Token.
</p>
<h3>LeaseSet Storage to Floodfills</h3>
<p>
Storage of LeaseSets is much more sensitive than that of RouterInfos, as a router
must take care that the LeaseSet cannot be associated with the router.
</p>

<p>
A router publishes a local LeaseSet by
sending an <a href="i2np.html">I2NP</a> DatabaseStoreMessage
with a nonzero Reply Token over an outbound client tunnel for that Destination.
The message is end-to-end garlic encrypted using the Destination's Session Key Manager,
to hide the message from the tunnel's outbound endpoint.
The floodfill router replies with an
<a href="i2np.html">I2NP</a> DeliveryStatusMessage,
with the Message ID set to the value of the Reply Token.
This message is sent back to one of the client's inbound tunnels.
</p>
<h3>Flooding</h3>
<p>
After a floodfill router receives a DatabaseStoreMessage containing a
valid RouterInfo or LeaseSet which is newer than that previously stored in its
local netDb, it "floods" it.
To flood a netDb entry, it looks up the 7 floodfill routers closest to the key
of the netDb entry. (The key is the SHA256 hash of the RouterIdentity or Destination with the date (yyyyMMdd) appended.)
</p>
<p>
It then directly connects to each of the 7 peers
and sends it an <a href="i2np.html">I2NP</a> DatabaseStoreMessage
with a zero Reply Token. The message is not end-to-end garlic encrypted,
as this is a direct connection, so there are no intervening routers
(and no need to hide this data anyway).
The other routers do not reply or re-flood, as the Reply Token is zero.
</p>
<h3 id="lookup">RouterInfo and LeaseSet Lookup</h3>
<p>
The <a href="i2np.html">I2NP</a> DatabaseLookupMessage is used to request a netDb entry from a floodfill router.
Lookups are sent out through one of the router's outbound exploratory tunnels.
The replies are specified to return via one of the router's inbound exploratory tunnels.
</p>
<p>
Lookups are generally sent in parallel to the two "good" (i.e. the connection doesn't fail) floodfill routers closest to the requested key.
</p>
<p>
If the key is found locally by the floodfill router, it responds with an
<a href="i2np.html">I2NP</a> DatabaseStoreMessage.
If the key is not found locally by the floodfill router, it responds with an
<a href="i2np.html">I2NP</a> DatabaseSearchReplyMessage
containing a list of other floodfill routers close to the key.
</p>

<p>
Lookups are not encrypted and thus are vulnerable to snooping by the outbound endpoint
(OBEP) of the client tunnel.
</p>
<p>
As the requesting router does not reveal itself, there is no recipient public key for the floodfill router to
encrypt the reply with. Therefore, the reply is exposed to the inbound gateway (IBGW)
of the inbound exploratory tunnel.
An appropriate method of encrypting the reply is a topic for future work.
</p>
<p>
(Reference:
<a href="http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf">Hashing it out in Public</a> Sections 2.2-2.3 for terms below in italics)
</p>
<p>
Due to the relatively small size of the network and the flooding redundancy of 8x,
lookups are usually O(1) rather than O(log n) --
a router is highly likely to know a floodfill router close enough to the key to get the answer on the first try.
In releases prior to 0.8.9, routers used a lookup redundancy of two
(that is, two lookups were performed in parallel to different peers), and
neither <i>recursive</i> nor <i>iterative</i> routing for lookups was implemented.
Queries were sent through <i>multiple routes simultaneously</i>
to <i>reduce the chance of query failure</i>.
</p>
<p>
As of release 0.8.9, <i>iterative lookups</i> are implemented with no lookup redundancy.
This is a more efficient and reliable lookup that will work much better
when not all floodfill peers are known, and it removes a serious
limitation to network growth. As the network grows and each router knows only a small
subset of the floodfill peers, lookups will become O(log n).
Even if the peer does not return references closer to the key, the lookup continues with
the next-closest peer, for added robustness, and to prevent a malicious floodfill from
black-holing a part of the key space. Lookups continue until a total lookup timeout is reached,
or the maximum number of peers is queried.
</p>
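The iterative lookup loop can be sketched as below. This is an illustrative sketch under stated assumptions: <code>query()</code>, its dictionary-style reply, and the parameter names are hypothetical, not I2P's actual API; the control flow (query the closest unqueried peer, follow references even when not closer, stop on timeout or a peer limit) follows the description above.

```python
# Illustrative sketch of the iterative lookup described above: query the
# closest known floodfill; on a search-reply, add the returned references
# and continue with the next-closest unqueried peer, even if no closer
# references were returned, until the value is found, a total timeout
# expires, or a maximum number of peers has been queried.
import time

def iterative_lookup(key_rk, known_floodfills, query, distance,
                     max_peers=8, timeout=30.0):
    deadline = time.monotonic() + timeout
    candidates = set(known_floodfills)
    queried = set()
    while len(queried) < max_peers and time.monotonic() < deadline:
        remaining = candidates - queried
        if not remaining:
            break
        peer = min(remaining, key=lambda p: distance(p, key_rk))
        queried.add(peer)
        reply = query(peer, key_rk)          # store message (value) or refs
        if reply.get("value") is not None:
            return reply["value"]
        # follow references even when they are not closer, so a malicious
        # floodfill cannot black-hole part of the keyspace
        candidates |= set(reply.get("refs", ()))
    return None
```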
<p>
<i>Node IDs</i> are <i>verifiable</i> in that we use the router hash directly as both the node ID and the Kademlia key.
Incorrect responses that are not closer to the search key are generally ignored.
Given the current size of the network, a router has
<i>detailed knowledge of the neighborhood of the destination ID space</i>.
</p>
<h3>RouterInfo Storage Verification</h3>
<p>
To verify that a store was successful, a router simply waits about 10 seconds,
then sends a lookup to another floodfill router close to the key
(but not the one the store was sent to).
Lookups are sent out through one of the router's outbound exploratory tunnels.
Lookups are end-to-end garlic encrypted to prevent snooping by the outbound endpoint (OBEP).
</p>
<h3>LeaseSet Storage Verification</h3>
<p>
To verify that a store was successful, a router simply waits about 10 seconds,
then sends a lookup to another floodfill router close to the key
(but not the one the store was sent to).
Lookups are sent out through one of the outbound client tunnels for the destination of the LeaseSet being verified.
To prevent snooping by the OBEP of the outbound tunnel,
lookups are end-to-end garlic encrypted.
The replies are specified to return via one of the client's inbound tunnels.
</p>
<p>
As with regular lookups, the reply is unencrypted,
thus exposing the reply to the inbound gateway (IBGW) of the reply tunnel, and
an appropriate method of encrypting the reply is a topic for future work.
As the IBGW for the reply is one of the gateways published in the LeaseSet,
the exposure is minimal.
</p>
<h3>Exploration</h3>
<p>
<i>Exploration</i> is a special form of netDb lookup, where a router attempts to learn about
new routers.
It does this by sending a floodfill router an <a href="i2np.html">I2NP</a> DatabaseLookupMessage, looking for a random key.
As this lookup will fail, the floodfill would normally respond with an
<a href="i2np.html">I2NP</a> DatabaseSearchReplyMessage containing hashes of floodfill routers close to the key.
This would not be helpful, as the requesting router probably already knows those floodfills,
and it would be impractical to add ALL floodfill routers to the "don't include" field of the lookup.
For an exploration query, the requesting router instead adds a router hash of all zeros to the
"don't include" field of the DatabaseLookupMessage.
The floodfill will then respond only with non-floodfill routers close to the requested key.
</p>
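The all-zeros convention above can be illustrated with a minimal sketch. The dictionary layout here is a simplified stand-in, not the I2NP wire format; only the random key and the all-zeros "don't include" hash come from the text.

```python
# Minimal sketch of building an exploration query: a DatabaseLookupMessage
# for a random key whose "don't include" list holds a router hash of all
# zeros, signalling the floodfill to return only non-floodfill routers.
# The message layout is a simplified illustration, not the wire format.
import os

def exploration_lookup(reply_tunnel):
    return {
        "type": "DatabaseLookup",
        "key": os.urandom(32),            # random key; the lookup will fail
        "dont_include": [bytes(32)],      # all-zeros hash = exploration flag
        "reply_to": reply_tunnel,
    }
```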
<h3>Notes on Lookup Responses</h3>
<p>
The response to a lookup request is either a Database Store Message (on success) or a
Database Search Reply Message (on failure). The DSRM contains a 'from' router hash field
to indicate the source of the reply; the DSM does not.
The DSRM 'from' field is unauthenticated and may be spoofed or invalid.
There are no other response tags. Therefore, when making multiple requests in parallel, it is
difficult to monitor the performance of the various floodfill routers.
</p>
<h2 id="multihome">MultiHoming</h2>

<p>
Destinations may be hosted on multiple routers simultaneously, by using the same
private and public keys (traditionally stored in eepPriv.dat files).
As both instances will periodically publish their signed LeaseSets to the floodfill peers,
the most recently published LeaseSet will be returned to a peer requesting a database lookup.
As LeaseSets have (at most) a 10 minute lifetime, should a particular instance go down,
the outage will be 10 minutes at most, and generally much less than that.
The multihoming function has been verified and is in use by several services on the network.
</p>
<h2 id="threat">Threat Analysis</h2>
<p>
Also discussed on <a href="{{ site_url('docs/how/threatmodel') }}#floodfill">the threat model page</a>.
</p>
<p>
A hostile user may attempt to harm the network by
creating one or more floodfill routers and crafting them to offer
bad, slow, or no responses.
Some scenarios are discussed below.
</p>
<h3>General Mitigation Through Growth</h3>
<p>
There are currently almost 100 floodfill routers in the network.
Most of the following attacks will become more difficult, or have less impact,
as the network size and the number of floodfill routers increase.
</p>

<h3>General Mitigation Through Redundancy</h3>
<p>
Via flooding, all netDb entries are stored on the 8 floodfill routers closest to the key.
</p>
<h3>Forgeries</h3>
<p>
All netDb entries are signed by their creators, so no router may forge a
RouterInfo or LeaseSet.
</p>
<h3>Slow or Unresponsive</h3>
<p>
Each router maintains an expanded set of statistics in the
<a href="{{ site_url('docs/how/peerselection') }}">peer profile</a> for each floodfill router,
covering various quality metrics for that peer.
The set includes:
</p>
<ul>
<li>Average response time</li>
<li>Percentage of queries answered with the data requested</li>
<li>Percentage of stores that were successfully verified</li>
<li>Last successful store</li>
<li>Last successful lookup</li>
<li>Last response</li>
</ul>

<p>
Each time a router needs to determine which floodfill router is closest to a key,
it uses these metrics to decide which floodfill routers are "good".
The methods, and thresholds, used to determine "goodness" are relatively new, and
are subject to further analysis and improvement.
While a completely unresponsive router will quickly be identified and avoided,
routers that are only sometimes malicious may be much harder to deal with.
</p>
<h3 id="sybil">Sybil Attack (Full Keyspace)</h3>
<p>
An attacker may mount a
<a href="http://citeseer.ist.psu.edu/douceur02sybil.html">Sybil attack</a>
by creating a large number of floodfill routers spread throughout the keyspace.
</p>

<p>
(In a related example, a researcher recently created a
<a href="http://blog.torproject.org/blog/june-2010-progress-report">large number of Tor relays</a>.)
If successful, this could be an effective DoS attack on the entire network.
</p>

<p>
If the floodfills are not sufficiently misbehaving to be marked as "bad" using the peer profile
metrics described above, this is a difficult scenario to handle.
Tor's response can be much more nimble in the relay case, as the suspicious relays
can be manually removed from the consensus.
Some possible responses for the I2P network are listed below, though none of them is completely satisfactory:
</p>
<ul>
<li>Compile a list of bad router hashes or IPs, and announce the list through various means
(console news, website, forum, etc.); users would have to manually download the list and
add it to their local "blacklist"</li>
<li>Ask everyone in the network to enable floodfill manually (fight Sybil with more Sybil)</li>
<li>Release a new software version that includes the hardcoded "bad" list</li>
<li>Release a new software version that improves the peer profile metrics and thresholds,
in an attempt to automatically identify the "bad" peers</li>
<li>Add software that disqualifies floodfills if too many of them are in a single IP block</li>
<li>Implement an automatic subscription-based blacklist controlled by a single individual or group.
This would essentially implement a portion of the Tor "consensus" model.
Unfortunately it would also give a single individual or group the power to
block participation of any particular router or IP in the network,
or even to completely shut down or destroy the entire network.</li>
</ul>

<p>
This attack becomes more difficult as the network size grows.
</p>
<h3 id="sybil-partial">Sybil Attack (Partial Keyspace)</h3>
<p>
An attacker may mount a
<a href="http://citeseer.ist.psu.edu/douceur02sybil.html">Sybil attack</a>
by creating a small number (8-15) of floodfill routers clustered closely in the keyspace,
and distributing the RouterInfos for these routers widely.
Then, all lookups and stores for a key in that part of the keyspace would be directed
to one of the attacker's routers.
If successful, this could be an effective DoS attack on a particular eepsite, for example.
</p>
<p>
As the keyspace is indexed by the cryptographic (SHA256) hash of the key,
an attacker must use a brute-force method to repeatedly generate router hashes
until he has enough that are sufficiently close to the key.
The amount of computational power required for this, which is dependent on network
size, is unknown.
</p>
<p>
As a partial defense against this attack,
the algorithm used to determine Kademlia "closeness" varies over time.
Rather than using the hash of the key (i.e. H(k)) to determine closeness,
we use the hash of the key appended with the current date string, i.e. H(k + YYYYMMDD).
A function called the "routing key generator" does this, transforming the original key into a "routing key".
In other words, the entire netDb keyspace "rotates" every day at UTC midnight.
Any partial-keyspace attack would have to be regenerated every day, for
after the rotation, the attacking routers would no longer be close
to the target key, or to each other.
</p>
<p>
This attack becomes more difficult as the network size grows.
</p>

<p>
One consequence of daily keyspace rotation is that the distributed network database
may become unreliable for a few minutes after the rotation --
lookups will fail because the new "closest" router has not received a store yet.
The extent of the issue, and methods for mitigation
(for example netDb "handoffs" at midnight)
are a topic for further study.
</p>
<h3>Bootstrap Attacks</h3>
<p>
An attacker could attempt to boot new routers into an isolated
or majority-controlled network by taking over a reseed website,
or by tricking the developers into adding his reseed website
to the hardcoded list in the router.
</p>
<p>
Several defenses are possible, and most of these are planned:
</p>
<ul>
<li>Disallowing fallback from HTTPS to HTTP for reseeding, since
a MITM attacker could simply block HTTPS and then respond to the HTTP request</li>
<li>Changing the reseed task to fetch a subset of RouterInfos from
each of several reseed sites rather than using only a single site</li>
<li>Creating an out-of-network reseed monitoring service that
periodically polls reseed websites and verifies that the
data are not stale or inconsistent with other views of the network</li>
<li>Bundling reseed data in the installer</li>
</ul>
<h3>Query Capture</h3>
<p>
See also <a href="#lookup">lookup</a>.
(Reference:
<a href="http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf">Hashing it out in Public</a> Sections 2.2-2.3 for terms below in italics)
</p>
<p>
Similar to a bootstrap attack, an attacker using a floodfill router could attempt to "steer"
peers to a subset of routers controlled by him by returning their references.
</p>
<p>
This is unlikely to work via exploration, because exploration is a low-frequency task.
Routers acquire a majority of their peer references through normal tunnel building activity.
Exploration results are generally limited to a few router hashes,
and each exploration query is directed to a random floodfill router.
</p>
<p>
As of release 0.8.9, <i>iterative lookups</i> are implemented.
Floodfill router references returned in an
<a href="i2np.html">I2NP</a> DatabaseSearchReplyMessage
response to a lookup
are followed if they are closer (or the next closest) to the lookup key.
The requesting router does not need to trust that the references are
closer to the key, as closeness can be computed directly from the router hash (i.e. it is <i>verifiably correct</i>).
The lookup also does not stop when no closer key is found, but continues by querying the
next-closest node, until the timeout or maximum number of queries is reached.
This prevents a malicious floodfill from black-holing a part of the key space.
Also, the daily keyspace rotation requires an attacker to regenerate a RouterInfo
within the desired keyspace region.
This design ensures that the query capture attack described in
<a href="http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf">Hashing it out in Public</a>
is much more difficult.
</p>
<h3>DHT-Based Relay Selection</h3>
<p>
(Reference:
<a href="http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf">Hashing it out in Public</a> Section 3)
</p>
<p>
This doesn't have much to do with floodfill, but see
the <a href="{{ site_url('docs/how/peerselection') }}">peer selection page</a>
for a discussion of the vulnerabilities of peer selection for tunnels.
</p>
<h3>Information Leaks</h3>
<p>
(Reference:
<a href="https://netfiles.uiuc.edu/mittal2/www/nisan-torsk-ccs10.pdf">In Search of an Anonymous and Secure Lookup</a> Section 3)
</p>
<p>
This paper addresses weaknesses in the "Finger Table" DHT lookups used by Torsk and NISAN.
At first glance, these do not appear to apply to I2P. First, the use of DHTs by Torsk and NISAN
is significantly different from that in I2P. Second, I2P's network database lookups are only
loosely correlated to the <a href="{{ site_url('docs/how/peerselection') }}">peer selection</a> and
<a href="{{ site_url('docs/how/tunnelrouting') }}">tunnel building</a> processes; only previously-known peers
are used for tunnels.
Also, peer selection is unrelated to any notion of DHT key-closeness.
</p>
<p>
Some of this may actually be more interesting when the I2P network gets much larger.
Right now, each router knows a large proportion of the network, so looking up a particular
RouterInfo in the network database is not strongly indicative of a future intent to use
that router in a tunnel. Perhaps when the network is 100 times larger, the lookup may be
more correlative. Of course, a larger network makes a Sybil attack that much harder.
</p>
<p>
However, the general issue of DHT information leakage in I2P needs further investigation.
The floodfill routers are in a position to observe queries and gather information.
Certainly, at a level of <i>f</i> = 0.2 (20% malicious nodes, as specified in the paper)
we expect that many of the Sybil threats we describe
(<a href="{{ site_url('docs/how/threatmodel') }}#sybil">here</a>,
<a href="#sybil">here</a> and
<a href="#sybil-partial">here</a>)
become problematic for several reasons.
</p>
<h2 id="history">History</h2>
<p>
<a href="netdb_discussion.html">Moved to the netdb discussion page</a>.
</p>

<h2 id="future">Future Work</h2>
<p>
End-to-end encryption of additional netDb lookups and responses.
</p>
<p>
Better methods for tracking lookup responses.
</p>

{% endblock %}