i2p.www/i2p2www/pages/site/docs/how/peer-selection.html

{% extends "global/layout.html" %}
{% block title %}{% trans %}Peer Profiling and Selection{% endtrans %}{% endblock %}
{% block lastupdated %}{% trans %}July 2010{% endtrans %}{% endblock %}
{% block accuratefor %}0.8{% endblock %}
{% block content %}
<h2>{% trans %}Overview{% endtrans %}</h2>

<h3>{% trans %}Peer Profiling{% endtrans %}</h3>

<p>{% trans netdb=site_url('docs/how/network-database') -%}
<b>Peer profiling</b> is the process of collecting data based on the <b>observed</b> performance
of other routers or peers, and classifying those peers into groups.
Profiling does <b>not</b> use any claimed performance data published by the peer itself
in the <a href="{{ netdb }}">network database</a>.
{%- endtrans %}</p>

<p>{% trans %}Profiles are used for two purposes:{% endtrans %}</p>
<ol>
<li>{% trans %}Selecting peers to relay our traffic through, which is discussed below{% endtrans %}</li>
<li>{% trans netdb=site_url('docs/how/network-database') -%}
Choosing peers from the set of floodfill routers to use for network database storage and queries,
which is discussed on the <a href="{{ netdb }}">network database</a> page
{%- endtrans %}</li>
</ol>


<h3>{% trans %}Peer Selection{% endtrans %}</h3>
<p>{% trans -%}
<b>Peer selection</b> is the process of choosing which routers
on the network we want to relay our messages to go through (which peers will we 
ask to join our tunnels).  To accomplish this, we keep track of how each
peer performs (the peer's "profile") and use that data to estimate how 
fast they are, how often they will be able to accept our requests, and 
whether they seem to be overloaded or otherwise unable to perform what
they agree to reliably.
{%- endtrans %}</p>

<p>{% trans threatmodel=site_url('docs/how/threat-model') -%}
Unlike some other anonymous networks, in I2P,
claimed bandwidth is untrusted and is <b>only</b> used to avoid those peers
advertising very low bandwidth insufficient for routing tunnels.
All peer selection is done through profiling.
This prevents simple attacks based on peers claiming high bandwidth
in order to capture large numbers of tunnels.
It also makes
<a href="{{ threatmodel }}#timing">timing attacks</a>
more difficult.
{%- endtrans %}</p>

<p>{% trans -%}
Peer selection is done quite frequently, as a router may maintain a large number
of client and exploratory tunnels, and a tunnel lifetime is only 10 minutes.
{%- endtrans %}</p>


<h3>{% trans %}Further Information{% endtrans %}</h3>
<p>{% trans pdf=url_for('static', filename='pdf/I2P-PET-CON-2009.1.pdf'),
url='http://www.pet-con.org/index.php/PET_Convention_2009.1' -%}
For more information see the paper
<a href="{{ pdf }}">Peer Profiling and Selection in the I2P Anonymous Network</a>
presented at <a href="{{ url }}">PET-CON 2009.1</a>.
See <a href="#notes">below</a> for notes on minor changes since the paper was published.
{%- endtrans %}</p>

<h2>{% trans %}Profiles{% endtrans %}</h2>
<p>{% trans url='http://docs.i2p-projekt.de/javadoc/net/i2p/router/peermanager/PeerProfile.html' -%}
Each peer has a set of data points collected about them, including statistics 
about how long it takes for them to reply to a network database query, how 
often their tunnels fail, and how many new peers they are able to introduce 
us to, as well as simple data points such as when we last heard from them or
when the last communication error occurred.  The specific data points gathered
can be found in the <a href="{{ url }}">code</a>.
{%- endtrans %}</p>

<p>{% trans -%}
Profiles are fairly small, a few KB. To control memory usage, the profile expiration time
lessens as the number of profiles grows.
Profiles are kept in memory until router shutdown, when they are written to disk.
At startup, the profiles are read so the router need not reinitialize all profiles,
thus allowing a router to quickly re-integrate into the network after startup.
{%- endtrans %}</p>


<h2>{% trans %}Peer Summaries{% endtrans %}</h2>
<p>{% trans -%}
While the profiles themselves can be considered a summary of a peer's 
performance, to allow for effective peer selection we break each summary down 
into four simple values, representing the peer's speed, its capacity, how well 
integrated into the network it is, and whether it is failing.
{%- endtrans %}</p>

<h3>{% trans %}Speed{% endtrans %}</h3>
<p>{% trans -%}
The speed calculation
simply goes through the profile and estimates how much data we can
send or receive on a single tunnel through the peer in a minute.  For this estimate it just looks at
performance in the previous minute.
{%- endtrans %}</p>

<h3 id="capacity">{% trans %}Capacity{% endtrans %}</h3>
<p>{% trans -%}
The capacity calculation
simply goes through the profile and estimates how many tunnels the peer
would agree to participate in over a given time period.  For this estimate it looks at 
how many tunnel build requests
the peer has accepted, rejected, and dropped, and how many
of the agreed-to tunnels later failed.
While the calculation is time-weighted so that recent activity counts more than later activity,
statistics up to 48 hours old may be included.
{%- endtrans %}</p>

<p>{% trans -%}
Recognizing and avoiding unreliable and unreachable
peers is critically important.
Unfortunately, as the tunnel building and testing require the participation of several peers,
it is difficult to positively identify the cause of a dropped build request or test failure.
The router assigns a probability of failure to each of the
peers, and uses that probability in the capacity calculation.
Drops and test failures are weighted much higher than rejections.
{%- endtrans %}</p>

<h2>{% trans %}Peer organization{% endtrans %}</h2>
<p>{% trans -%}
As mentioned above, we drill through each peer's profile to come up with a 
few key calculations, and based upon those, we organize each peer into three
groups - fast, high capacity, and standard.
{%- endtrans %}</p>

<p>{% trans -%}
The groupings are not mutually exclusive, nor are they unrelated:
{%- endtrans %}</p>
<ul> 
<li>{% trans -%}
A peer is considered "high capacity" if its capacity calculation meets or 
exceeds the median of all peers.
{%- endtrans %}</li>
<li>{% trans -%}
A peer is considered "fast" if they are already "high capacity" and their 
speed calculation meets or exceeds the median of all peers.
{%- endtrans %}</li>
<li>{% trans %}A peer is considered "standard" if it is not "high capacity"{% endtrans %}</li>
</ul>

<p>{% trans url='http://docs.i2p-projekt.de/javadoc/net/i2p/router/peermanager/ProfileOrganizer.html' -%}
These groupings are implemented in the router's
<a href="{{ url }}">ProfileOrganizer</a>.
{%- endtrans %}</p>

<h3>{% trans %}Group size limits{% endtrans %}</h3>
<p>{% trans -%}
The size of the groups may be limited.
{%- endtrans %}</p>

<ul>
<li>{% trans -%}
The fast group is limited to 30 peers.
If there would be more, only the ones with the highest speed rating are placed in the group.
{%- endtrans %}</li>
<li>{% trans -%}
The high capacity group is limited to 75 peers (including the fast group)
If there would be more, only the ones with the highest capacity rating are placed in the group.
{%- endtrans %}</li>
<li>{% trans -%}
The standard group has no fixed limit, but is somewhat smaller than the number of RouterInfos
stored in the local network database.
On an active router in today's network, there may be about 1000 RouterInfos and 500 peer profiles
(including those in the fast and high capacity groups)
{%- endtrans %}</li>
</ul>

<h2>{% trans %}Recalculation and Stability{% endtrans %}</h2>
<p>{% trans -%}
Summaries are recalculated, and peers are resorted into groups, every 45 seconds.
{%- endtrans %}</p>

<p>{% trans -%}
The groups tend to be fairly stable, that is, there is not much "churn" in the rankings
at each recalculation.
Peers in the fast and high capacity groups get more tunnels build through them, which increases their speed and capacity ratings,
which reinforces their presence in the group.
{%- endtrans %}</p>


<h2>{% trans %}Peer Selection{% endtrans %}</h2>
<p>{% trans -%}
The router selects peers from the above groups to build tunnels through.
{%- endtrans %}</p>


<h3>{% trans %}Peer Selection for Client Tunnels{% endtrans %}</h3>
<p>{% trans -%}
Client tunnels are used for application traffic, such as for HTTP proxies and web servers.
{%- endtrans %}</p>

<p>{% trans -%}
To reduce the susceptibility to <a href="http://blog.torproject.org/blog/one-cell-enough">some attacks</a>,
and increase performance,
peers for building client tunnels are chosen randomly from the smallest group, which is the "fast" group.
There is no bias toward selecting peers that were previously participants in a tunnel for the same client.
{%- endtrans %}</p>


<h3>{% trans %}Peer Selection for Exploratory Tunnels{% endtrans %}</h3>
<p>{% trans -%}
Exploratory tunnels are used for router administrative purposes, such as network database traffic
and testing client tunnels.
Exploratory tunnels are also used to contact previously unconnected routers, which is why
they are called "exploratory".
These tunnels are usually low-bandwidth.
{%- endtrans %}</p>

<p>{% trans -%}
Peers for building exploratory tunnels are generally chosen randomly from the standard group.
If the success rate of these build attempts is low compared to the client tunnel build success rate,
the router will select a weighted average of peers randomly from the high capacity group instead.
This helps maintain a satisfactory build success rate even when network performance is poor.
There is no bias toward selecting peers that were previously participants in an exploratory tunnel.
{%- endtrans %}</p>

<p>{% trans -%}
As the standard group includes a very large subset of all peers the router knows about,
exploratory tunnels are essentially built through a random selection of all peers,
until the build success rate becomes too low.
{%- endtrans %}</p>


<h3>{% trans %}Restrictions{% endtrans %}</h3>
<p>{% trans -%}
To prevent some simple attacks, and for performance, there are the following restrictions:
{%- endtrans %}</p>
<ul>
<li>{% trans -%}
Two peers from the same /16 IP space may not be in the same tunnel.
{%- endtrans %}</li>
<li>{% trans -%}
A peer may participate in a maximum of 33% of all tunnels created by the router.
{%- endtrans %}</li>
<li>{% trans -%}
Peers with extremely low bandwidth are not used.
{%- endtrans %}</li>
<li>{% trans -%}
Peers for which a recent connection attempt failed are not used.
{%- endtrans %}</li>
</ul>


<h3>{% trans %}Peer Ordering in Tunnels{% endtrans %}</h3>
<p>{% trans pdf='http://forensics.umass.edu/pubs/wright-tissec.pdf',
pdf2008='http://forensics.umass.edu/pubs/wright.tissec.2008.pdf',
tunnelimpl=site_url('docs/tunnels/implementation') -%}
Peers are ordered within tunnels to
to deal with the <a href="{{ pdf }}">predecessor attack</a>
<a href="{{ pdf2008 }}">(2008 update)</a>.
More information is on the <a href="{{ tunnelimpl }}#ordering">tunnel page</a>.
{%- endtrans %}</p>


<h2>{% trans %}Future Work{% endtrans %}</h2>

<ul>
<li>{% trans -%}
Continue to analyze an tune speed and capacity calculations as necessary
{%- endtrans %}</li>
<li>{% trans -%}
Implement a more aggressive ejection strategy if necessary to control memory usage as the network grows
{%- endtrans %}</li>
<li>{% trans -%}
Evaluate group size limits
{%- endtrans %}</li>
<li>{% trans -%}
Use GeoIP data to include or exclude certain peers, if configured
{%- endtrans %}</li>
</ul>

<h2 id="notes">{% trans %}Notes{% endtrans %}</h2>
<p>{% trans pdf=url_for('static', filename='pdf/I2P-PET-CON-2009.1.pdf') -%}
For those reading the paper
<a href="{{ pdf }}">Peer Profiling and Selection in the I2P Anonymous Network</a>,
please keep in mind the following minor changes in I2P since the paper's publication:
{%- endtrans %}</p>
<ul>
<li>{% trans %}The Integration calculation is still not used{% endtrans %}</li>
<li>{% trans %}In the paper, "groups" are called "tiers"{% endtrans %}</li>
<li>{% trans %}The "Failing" tier is no longer used{% endtrans %}</li>
<li>{% trans %}The "Not Failing" tier is now named "Standard"{% endtrans %}</li>
</ul>


<h2>{% trans %}References{% endtrans %}</h2>
<ul>
<li>
<a href="{{ url_for('static', filename='pdf/I2P-PET-CON-2009.1.pdf') }}">{% trans %}Peer Profiling and Selection in the I2P Anonymous Network{% endtrans %}</a>
<li>
<a href="http://blog.torproject.org/blog/one-cell-enough">{% trans %}One Cell Enough{% endtrans %}</a>
<li>
<a href="https://wiki.torproject.org/noreply/TheOnionRouter/TorFAQ#EntryGuards">{% trans %}Tor Entry Guards{% endtrans %}</a>
<li>
<a href="http://freehaven.net/anonbib/#murdoch-pet2007">{% trans %}Murdoch 2007 Paper{% endtrans %}</a>
<li>
<a href="http://www.crhc.uiuc.edu/~nikita/papers/tuneup-cr.pdf">{% trans %}Tune-up for Tor{% endtrans %}</a>
<li>
<a href="http://cs.gmu.edu/~mccoy/papers/wpes25-bauer.pdf">{% trans %}Low-resource Routing Attacks Against Tor{% endtrans %}</a>
</ul>

{% endblock %}