i2p.www/www.i2p2/pages/naming_discussion.html

{% extends "_layout.html" %}
{% block title %}Naming discussion{% endblock %}
{% block content %}
<p>
NOTE: The following is a discussion of the reasons behind the I2P naming system,
common arguments and possible alternatives.
See <a href="naming.html">the naming page</a> for current documentation.
</p>

<h2>Discarded alternatives</h2>

<p>
Naming within I2P has been an oft-debated topic since the very beginning with
advocates across the spectrum of possibilities.  However, given I2P's inherent
demand for secure communication and decentralized operation, the traditional
DNS-style naming system is clearly out, as are "majority rules" voting systems.
</p>

<p>
I2P does not promote the use of DNS-like services though, as the damage done
by hijacking a site can be tremendous - and insecure destinations have no
value.  DNSsec itself still falls back on registrars and certificate authorities,
while with I2P, requests sent to a destination cannot be intercepted or the reply
spoofed, as they are encrypted to the destination's public keys, and a destination
itself is just a pair of public keys and a certificate.  DNS-style systems on the
other hand allow any of the name servers on the lookup path to mount simple denial
of service and spoofing attacks.  Adding on a certificate authenticating the
responses as signed by some centralized certificate authority would address many of
the hostile nameserver issues but would leave open replay attacks as well as
hostile certificate authority attacks.
</p>

<p>
Voting style naming is dangerous as well, especially given the effectiveness of
Sybil attacks in anonymous systems - the attacker can simply create an arbitrarily
high number of peers and "vote" with each to take over a given name.  Proof-of-work
methods can be used to make identity non-free, but as the network grows the load
required to contact everyone to conduct online voting is implausible, or if the
full network is not queried, different sets of answers may be reachable.
</p>

<p>
As with the Internet however, I2P is keeping the design and operation of a
naming system out of the (IP-like) communication layer.  The bundled naming library
includes a simple service provider interface which <a href="#alternatives">alternate naming systems</a> can
plug into, allowing end users to drive what sort of naming tradeoffs they prefer.
</p>

<h2>Discussion</h2>
<p>
See also <a href="https://zooko.com/distnames.html">Names: Decentralized, Secure, Human-Meaningful: Choose Two</a>.
</p>

<h3>Comments by jrandom</h3>
<p>(adapted from a post in the old Syndie, November 26, 2005)</p>
<p>
Q:
What to do if some hosts
do not agree on one address and if some addresses are working, others are not?
Who is the right source of a name?
</p><p>
A:
You don't. This is actually a critical difference between names on I2P and how
DNS works - names in I2P are human readable, secure, but <b>not globally
unique</b>.  This is by design, and an inherent part of our need for security.
</p><p>
If I could somehow convince you to change the destination associated with some
name, I'd successfully "take over" the site, and under no circumstances is that
acceptable.  Instead, what we do is make names <b>locally unique</b>: they are
what <i>you</i> use to call a site, just as how you can call things whatever
you want when you add them to your browser's bookmarks, or your IM client's
buddy list.  Who you call "Boss" may be who someone else calls "Sally".
</p><p>
Names will not, ever, be securely human readable and globally unique.
</p>

<h3>Comments by zzz</h3>
<p>The following from zzz is a review of several common
complaints about I2P's naming system.
<ul>
<li>Inefficiency<br/>
The whole hosts.txt is downloaded (if it has changed, since eepget uses the etag and last-modified headers).
It's about 400K right now for almost 800 hosts.
<p>
True, but this isn't a lot of traffic in the context of i2p, which is itself wildly inefficient
(floodfill databases, huge encryption overhead and padding, garlic routing, etc.).
If you downloaded a hosts.txt file from someone every 12 hours it averages out to about 10 bytes/sec.
<p>
As is usually the case in i2p, there is a fundamental tradeoff here between anonymity and efficiency.
Some would say that using the etag and last-modified headers is hazardous because it exposes when you
last requested the data.
Others have suggested asking for specific keys only (similar to what jump services do, but
in a more automated fashion), possibly at a further cost in anonymity.
<p>
Possible improvements would be a replacement or supplement to addressbook (see <a href="http://i2host.i2p/">i2host.i2p</a>),
or something simple like subscribing to http://example.i2p/cgi-bin/recenthosts.cgi rather than http://example.i2p/hosts.txt.
If a hypothetical recenthosts.cgi distributed all hosts from the last 24 hours, for example,
that could be both more efficient and more anonymous than the current hosts.txt with last-modified and etag.
<p>
A sample implementation is on stats.i2p at
<a href="http://stats.i2p/cgi-bin/newhosts.txt">http://stats.i2p/cgi-bin/newhosts.txt</a>.
This script returns an Etag with a timestamp.
When a request comes in with the If-None-Match etag,
the script ONLY returns new hosts since that timestamp, or 304 Not Modified if there are none.
In this way, the script efficiently returns only the hosts the subscriber
does not know about, in an addressbook-compatible manner.


<p>
So the inefficiency is not a big issue and there are several ways to improve things without
radical change.

<li>Not Scalable<br/>
The 400K hosts.txt (with linear search) isn't that big at the moment and
we can probably grow by 10x or 100x before it's a problem.
<p>
As far as network traffic see above.
But unless you're going to do a slow real-time query over the network for
a key, you need to have the whole set of keys stored locally, at a cost of about 500 bytes per key.
<li>Requires configuration and "trust"<br/>
Out-of-the-box addressbook is only subscribed to http://www.i2p2.i2p/hosts.txt, which is rarely updated,
leading to poor new-user experience.
<p>
This is very much intentional. jrandom wants a user to "trust" a hosts.txt
provider, and as he likes to say, "trust is not a boolean".
The configuration step attempts to force users to think about issues of trust in an anonymous network.
<p>
As another example, the "Eepsite Unknown" error page in the HTTP Proxy
lists some jump services, but doesn't "recommend" any one in particular,
and it's up to the user to pick one (or not).
jrandom would say we trust the listed providers enough to list them but not enough to
automatically go fetch the key from them.
<p>
How successful this is, I'm not sure.
But there must be some sort of hierarchy of trust for the naming system.
To treat everyone equally may increase the risk of hijacking.

<li>It isn't DNS<br/>
Unfortunately real-time lookups over i2p would significantly slow down web browsing.
<p>
Also, DNS is based on lookups with limited caching and time-to-live, while i2p
keys are permanent.
<p>
Sure, we could make it work, but why? It's a bad fit.
<li>Not reliable<br/>
It depends on specific servers for addressbook subscriptions.
<p>
Yes it depends on a few servers that you have configured.
Within i2p, servers and services come and go.
Any other centralized system (for example DNS root servers) would
have the same problem. A completely decentralized system (everybody is authoritative)
is possible by implementing an "everybody is a root DNS server" solution, or by
something even simpler, like a script that adds everybody in your hosts.txt to your addressbook.
<p>
People advocating all-authoritative solutions generally haven't thought through
the issues of conflicts and hijacking, however.

<li>Awkward, not real-time<br/>
It's a patchwork of hosts.txt providers, key-add web form providers, jump service providers,
eepsite status reporters.
Jump servers and subscriptions are a pain, it should just work like DNS.
<p>
See the reliability and trust sections.
</p>
</ul>
<p>So, in summary, the current system is not horribly broken, inefficient, or un-scalable,
and proposals to "just use DNS" aren't well thought-through.
</p>

<h2 id="alternatives">Alternatives</h2>
<p>The I2P source contains several pluggable naming systems and supports configuration options
to enable experimentation with naming systems.
<ul>
<li><b>Meta</b> - calls two or more other naming systems in order.
By default, calls PetName then HostsTxt.
<li><b>PetName</b> - Looks up in a petnames.txt file.
The format for this file is NOT the same as hosts.txt.
<li><b>HostsTxt</b> - Looks up in the following files, in order:
<ol>
<li>privatehosts.txt
<li>userhosts.txt
<li>hosts.txt
</ol>
<li><b>AddressDB</b> - Each host is listed in a separate file in a addressDb/ directory.
<li>
<b>Eepget</b> - does an HTTP lookup request from an external
server - must be stacked after the HostsTxt lookup with Meta.
This could augment or replace the jump system.
Includes in-memory caching.
<li>
<b>Exec</b> - calls an external program for lookup, allows
additional experimentation in lookup schemes, independent of java.
Can be used after HostsTxt or as the sole naming system.
Includes in-memory caching.
<li><b>Dummy</b> - used as a fallback for Base64 names, otherwise fails.
</ul>
<p>
The current naming system can be changed with the advanced config option 'i2p.naming.impl'
(restart required).
See core/java/src/net/i2p/client/naming for details.
<p>
Any new system should be stacked with HostsTxt, or should
implement local storage and/or the addressbook subscription functions, since addressbook
only knows about the hosts.txt files and format.

<h2 id="certificates">Certificates</h2>
<p>
I2P destinations contain a certificate, however at the moment that certificate
is always null.
With a null certificate, base64 destinations are always 516 bytes ending in "AAAA",
and this is checked in the addressbook merge mechanism, and possibly other places.
Also, there is no method available to generate a certificate or add it to a
destination. So these will have to be updated to implement certificates.
</p><p>
One possible use of certificates is for <a href="todo.html#hashcash">proof of work</a>.
</p><p>
Another is for "subdomains" (in quotes because there is really no such thing,
i2p uses a flat naming system) to be signed by the 2nd level domain's keys.
</p><p>
With any certificate implementation must come the method for verifying the
certificates.
Presumably this would happen in the addressbook merge code.
Is there a method for multiple types of certificates, or multiple certificates?
</p><p>
Adding on a certificate authenticating the
responses as signed by some centralized certificate authority would address many of
the hostile nameserver issues but would leave open replay attacks as well as
hostile certificate authority attacks.
</p>
{% endblock %}