====================
Garlic Farm Protocol
====================
.. meta::
    :author: zzz
    :created: 2019-05-02
    :thread: http://zzz.i2p/topics/2234
    :lastupdated: 2019-05-04
    :status: Open

.. contents::


Overview
========
This is the spec for the Garlic Farm wire protocol,
based on JRaft, its "exts" code for implementation over TCP,
and its "dmprinter" sample application [JRAFT]_.
JRaft is an implementation of the Raft protocol [RAFT]_.

We were unable to find any implementation with a documented wire protocol.
However, the JRaft implementation is simple enough that we could
inspect the code and then document its protocol.
This proposal is the result of that effort.

This will be the backend for coordination of routers publishing
entries in a Meta LeaseSet. See proposal 123.


Goals
=====

- Small code size
- Based on an existing implementation
- No serialized Java objects or any Java-specific features or encoding
- Any bootstrapping is out of scope. At least one other server is assumed
  to be hardcoded, or configured out-of-band of this protocol.


Design
======

The Raft protocol is not a concrete protocol; it defines only a state machine.
Therefore we document the concrete protocol of JRaft and base our protocol on it.
There are no changes to the JRaft protocol other than the addition of
an authentication handshake.


Specification
=============
The wire protocol is over SSL sockets or non-SSL I2P sockets.
There is no support for clearnet non-SSL sockets.

Handshake and authentication
----------------------------
Not defined by JRaft.

TODO authentication. Perhaps new messages, or perhaps something before
the request/response phase happens.

Requirements:

- user/password or other authentication method
- version
- cluster identifier
- extensible

One possibility: HTTP-like headers for version and other info;
use HTTP digest authentication (RFC 2617).

Message Types
-------------

There are two types of messages, requests and responses.
Requests may contain Log Entries, and are variable-sized;
responses do not contain Log Entries, and are fixed-size.

========================  ======  ===========  =================  =====================================
Message                   Number  Sent By      Sent To            Notes
========================  ======  ===========  =================  =====================================
RequestVoteRequest        1       Candidate    Follower           Standard Raft RPC; must not contain log entries
RequestVoteResponse       2       Follower     Candidate          Standard Raft RPC
AppendEntriesRequest      3       Leader       Follower           Standard Raft RPC
AppendEntriesResponse     4       Follower     Leader / Client    Standard Raft RPC
ClientRequest             5       Client       Leader / Follower  Response is AppendEntriesResponse; must contain Application log entries only
AddServerRequest          6       Client       Leader             Must contain a single ClusterServer log entry only
AddServerResponse         7       Leader       Client             Leader will also send a JoinClusterRequest
RemoveServerRequest       8       Follower     Leader             Must contain a single ClusterServer log entry only
RemoveServerResponse      9       Leader       Follower
SyncLogRequest            10      Leader       Follower           Must contain a single LogPack log entry only
SyncLogResponse           11      Follower     Leader
JoinClusterRequest        12      Leader       New Server         Invitation to join; must contain a single Configuration log entry only
JoinClusterResponse       13      New Server   Leader
LeaveClusterRequest       14      Leader       Follower           Command to leave
LeaveClusterResponse      15      Follower     Leader
InstallSnapshotRequest    16      Leader       Follower           Raft Section 7; must contain a single SnapshotSyncRequest log entry only
InstallSnapshotResponse   17      Follower     Leader             Raft Section 7
========================  ======  ===========  =================  =====================================
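
As an illustration only, the message numbers in the table can be captured as an enumeration, together with the response each request expects. The names ``MessageType``, ``EXPECTED_RESPONSE`` and ``is_request`` are hypothetical and are not taken from JRaft; this is a sketch, not part of the specification.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Message numbers as listed in the Message Types table."""
    REQUEST_VOTE_REQUEST = 1
    REQUEST_VOTE_RESPONSE = 2
    APPEND_ENTRIES_REQUEST = 3
    APPEND_ENTRIES_RESPONSE = 4
    CLIENT_REQUEST = 5
    ADD_SERVER_REQUEST = 6
    ADD_SERVER_RESPONSE = 7
    REMOVE_SERVER_REQUEST = 8
    REMOVE_SERVER_RESPONSE = 9
    SYNC_LOG_REQUEST = 10
    SYNC_LOG_RESPONSE = 11
    JOIN_CLUSTER_REQUEST = 12
    JOIN_CLUSTER_RESPONSE = 13
    LEAVE_CLUSTER_REQUEST = 14
    LEAVE_CLUSTER_RESPONSE = 15
    INSTALL_SNAPSHOT_REQUEST = 16
    INSTALL_SNAPSHOT_RESPONSE = 17


# Request -> expected response. Note that ClientRequest is answered
# with an AppendEntriesResponse, as stated in the table.
EXPECTED_RESPONSE = {
    MessageType.REQUEST_VOTE_REQUEST: MessageType.REQUEST_VOTE_RESPONSE,
    MessageType.APPEND_ENTRIES_REQUEST: MessageType.APPEND_ENTRIES_RESPONSE,
    MessageType.CLIENT_REQUEST: MessageType.APPEND_ENTRIES_RESPONSE,
    MessageType.ADD_SERVER_REQUEST: MessageType.ADD_SERVER_RESPONSE,
    MessageType.REMOVE_SERVER_REQUEST: MessageType.REMOVE_SERVER_RESPONSE,
    MessageType.SYNC_LOG_REQUEST: MessageType.SYNC_LOG_RESPONSE,
    MessageType.JOIN_CLUSTER_REQUEST: MessageType.JOIN_CLUSTER_RESPONSE,
    MessageType.LEAVE_CLUSTER_REQUEST: MessageType.LEAVE_CLUSTER_RESPONSE,
    MessageType.INSTALL_SNAPSHOT_REQUEST: MessageType.INSTALL_SNAPSHOT_RESPONSE,
}


def is_request(msg_type: int) -> bool:
    """True for request messages; raises ValueError for unknown numbers."""
    return MessageType(msg_type) in EXPECTED_RESPONSE
```

A receiver could use ``is_request`` on the first byte of a message to decide whether to parse a variable-size request or a fixed-size response.
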

Establishment
-------------

The establishment sequence is as follows:

.. raw:: html

  {% highlight %}
  New Server Alice              Random Follower Bob

    ClientRequest   ------->
            <---------   AppendEntriesResponse

    If Bob says he is the leader, continue as below.
    Else, Alice must disconnect from Bob and connect to the leader.


  New Server Alice              Leader Charlie

    ClientRequest   ------->
            <---------   AppendEntriesResponse
    AddServerRequest   ------->
            <---------   AddServerResponse
            <---------   JoinClusterRequest
    JoinClusterResponse  ------->
            <---------   SyncLogRequest
                         OR InstallSnapshotRequest
    SyncLogResponse  ------->
    OR InstallSnapshotResponse
  {% endhighlight %}

Disconnect Sequence:

.. raw:: html

  {% highlight %}
  Follower Alice                Leader Charlie

    RemoveServerRequest   ------->
            <---------   RemoveServerResponse
            <---------   LeaveClusterRequest
    LeaveClusterResponse  ------->
  {% endhighlight %}

Election Sequence:

.. raw:: html

  {% highlight %}
  Candidate Alice               Candidate/Follower Bob

    RequestVoteRequest   ------->
            <---------   RequestVoteResponse

    if Alice wins election:

  Leader Alice                  Follower Bob

    AppendEntriesRequest   ------->
    (heartbeat)
            <---------   AppendEntriesResponse
  {% endhighlight %}

Definitions
-----------

- Source: Identifies the originator of the message
- Destination: Identifies the recipient of the message
- Terms: See Raft. Initialized to 0, increases monotonically
- Indexes: See Raft. Initialized to 0, increases monotonically

Requests
--------

Requests contain a fixed-size header followed by zero or more Log Entries of variable size.

Request Header
``````````````

The request header is 45 bytes, as follows.
All values are unsigned big-endian.

.. raw:: html

  {% highlight lang='dataspec' %}
  Message type:      1 byte
  Source:            ID, 4 byte integer
  Destination:       ID, 4 byte integer
  Term:              Current term (see notes), 8 byte integer
  Last Log Term:     8 byte integer
  Last Log Index:    8 byte integer
  Commit Index:      8 byte integer
  Log entries size:  Total size in bytes, 4 byte integer
  Log entries:       see below, total length as specified (follows the 45-byte header)
  {% endhighlight %}

Notes
~~~~~

In the RequestVoteRequest, Term is the candidate's term.
Otherwise, it is the leader's current term.

In the AppendEntriesRequest, when the log entries size is zero,
this message is a heartbeat (keepalive) message.
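
The header layout above can be sketched with Python's ``struct`` module. The format string ``>BIIQQQQI`` (big-endian, no padding) adds up to 1 + 4 + 4 + 8 + 8 + 8 + 8 + 4 = 45 bytes. The names ``RequestHeader``, ``pack_request``, ``unpack_request`` and ``is_heartbeat`` are hypothetical helpers for illustration, not part of JRaft or of this specification.

```python
import struct
from typing import NamedTuple, Tuple

# type(1) source(4) destination(4) term(8) last_log_term(8)
# last_log_index(8) commit_index(8) log_entries_size(4) = 45 bytes
REQUEST_HEADER = struct.Struct(">BIIQQQQI")

APPEND_ENTRIES_REQUEST = 3  # message number from the Message Types table


class RequestHeader(NamedTuple):
    msg_type: int
    source: int
    destination: int
    term: int
    last_log_term: int
    last_log_index: int
    commit_index: int
    log_entries_size: int


def pack_request(msg_type: int, source: int, destination: int, term: int,
                 last_log_term: int, last_log_index: int, commit_index: int,
                 log_entries: bytes = b"") -> bytes:
    """Serialize a request: 45-byte header followed by the encoded log entries."""
    header = REQUEST_HEADER.pack(msg_type, source, destination, term,
                                 last_log_term, last_log_index, commit_index,
                                 len(log_entries))
    return header + log_entries


def unpack_request(data: bytes) -> Tuple[RequestHeader, bytes]:
    """Split a request into its header and the raw log entries bytes."""
    if len(data) < REQUEST_HEADER.size:
        raise ValueError("request shorter than the 45-byte header")
    header = RequestHeader(*REQUEST_HEADER.unpack_from(data, 0))
    end = REQUEST_HEADER.size + header.log_entries_size
    body = data[REQUEST_HEADER.size:end]
    if len(body) != header.log_entries_size:
        raise ValueError("truncated log entries")
    return header, body


def is_heartbeat(header: RequestHeader) -> bool:
    """An AppendEntriesRequest with no log entries is a heartbeat."""
    return (header.msg_type == APPEND_ENTRIES_REQUEST
            and header.log_entries_size == 0)
```

For example, ``pack_request(3, 1, 2, 7, 6, 41, 40)`` produces exactly 45 bytes, and ``is_heartbeat`` returns true for the header parsed back from it.
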

Log Entries
```````````

The log contains zero or more log entries.
Each log entry is as follows.
All values are unsigned big-endian.

.. raw:: html

  {% highlight lang='dataspec' %}
  Term:        8 byte integer
  Value type:  1 byte
  Entry size:  In bytes, 4 byte integer
  Entry:       length as specified
  {% endhighlight %}
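
A minimal sketch of encoding and decoding a sequence of log entries follows. Each entry has a 13-byte prefix (8 + 1 + 4) followed by the entry bytes. The function names and the representation of an entry as a ``(term, value_type, payload)`` tuple are assumptions made for this example.

```python
import struct
from typing import List, Tuple

# term(8) value_type(1) entry_size(4) = 13 bytes, big-endian
ENTRY_PREFIX = struct.Struct(">QBI")

LogEntry = Tuple[int, int, bytes]  # (term, value type, entry bytes)


def encode_entries(entries: List[LogEntry]) -> bytes:
    """Concatenate entries in wire order."""
    out = bytearray()
    for term, value_type, payload in entries:
        out += ENTRY_PREFIX.pack(term, value_type, len(payload))
        out += payload
    return bytes(out)


def decode_entries(data: bytes) -> List[LogEntry]:
    """Parse entries until the buffer is exhausted."""
    entries = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < ENTRY_PREFIX.size:
            raise ValueError("truncated log entry prefix")
        term, value_type, size = ENTRY_PREFIX.unpack_from(data, offset)
        offset += ENTRY_PREFIX.size
        payload = data[offset:offset + size]
        if len(payload) != size:
            raise ValueError("truncated log entry")
        offset += size
        entries.append((term, value_type, payload))
    return entries
```

The output of ``encode_entries`` is what the "Log entries" field of a request would carry, with its length placed in "Log entries size".
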

Log Contents
````````````

All values are unsigned big-endian.

========================  ======
Log Value Type            Number
========================  ======
Application               1
Configuration             2
ClusterServer             3
LogPack                   4
SnapshotSyncRequest       5
========================  ======

Application
~~~~~~~~~~~

TBD, probably JSON.

Configuration
~~~~~~~~~~~~~

The leader uses this to serialize a new cluster configuration and replicate it to peers.
It contains zero or more ClusterServer configurations.

.. raw:: html

  {% highlight lang='dataspec' %}
  Log Index:       8 byte integer
  Last Log Index:  8 byte integer
  ClusterServer Data for each server:
    ID:                 4 byte integer
    Endpoint data len:  In bytes, 4 byte integer
    Endpoint data:      ASCII string of the form "tcp://localhost:9001", length as specified
  {% endhighlight %}
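
The Configuration layout can be sketched as below. Since the layout has no server count field, this example assumes that the ClusterServer records simply repeat until the end of the entry; that is an inference from the layout, not something the text states. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

CONFIG_PREFIX = struct.Struct(">QQ")   # log index(8), last log index(8)
SERVER_PREFIX = struct.Struct(">II")   # server ID(4), endpoint data len(4)

Server = Tuple[int, str]  # (ID, endpoint such as "tcp://localhost:9001")


def encode_configuration(log_index: int, last_log_index: int,
                         servers: List[Server]) -> bytes:
    out = bytearray(CONFIG_PREFIX.pack(log_index, last_log_index))
    for server_id, endpoint in servers:
        endpoint_bytes = endpoint.encode("ascii")
        out += SERVER_PREFIX.pack(server_id, len(endpoint_bytes))
        out += endpoint_bytes
    return bytes(out)


def decode_configuration(data: bytes) -> Tuple[int, int, List[Server]]:
    log_index, last_log_index = CONFIG_PREFIX.unpack_from(data, 0)
    offset = CONFIG_PREFIX.size
    servers = []
    # Assumption: server records repeat until the entry is exhausted.
    while offset < len(data):
        server_id, length = SERVER_PREFIX.unpack_from(data, offset)
        offset += SERVER_PREFIX.size
        endpoint = data[offset:offset + length]
        if len(endpoint) != length:
            raise ValueError("truncated endpoint data")
        offset += length
        servers.append((server_id, endpoint.decode("ascii")))
    return log_index, last_log_index, servers
```
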

ClusterServer
~~~~~~~~~~~~~

The configuration information for a server in a cluster.
This is included only in an AddServerRequest or RemoveServerRequest message.

When used in an AddServerRequest message:

.. raw:: html

  {% highlight lang='dataspec' %}
  ID:                 4 byte integer
  Endpoint data len:  In bytes, 4 byte integer
  Endpoint data:      ASCII string of the form "tcp://localhost:9001", length as specified
  {% endhighlight %}

When used in a RemoveServerRequest message:

.. raw:: html

  {% highlight lang='dataspec' %}
  ID:                 4 byte integer
  {% endhighlight %}

LogPack
~~~~~~~

This is included only in a SyncLogRequest message.

The following is gzipped before transmission:

.. raw:: html

  {% highlight lang='dataspec' %}
  Index data len:  In bytes, 4 byte integer
  Log data len:    In bytes, 4 byte integer
  Index data:      8 bytes for each index, length as specified
  Log data:        length as specified
  {% endhighlight %}
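
A rough sketch of building and reading a LogPack follows. It assumes that the four fields are concatenated and compressed as a single gzip stream, that each index is an unsigned big-endian 8-byte integer, and that the log data is opaque to this layer. None of these details are confirmed beyond the layout above, and the function names are hypothetical.

```python
import gzip
import struct
from typing import List, Tuple

LOGPACK_PREFIX = struct.Struct(">II")  # index data len(4), log data len(4)


def encode_log_pack(indexes: List[int], log_data: bytes) -> bytes:
    """Build the uncompressed structure, then gzip it as one stream."""
    index_data = struct.pack(">%dQ" % len(indexes), *indexes)
    raw = LOGPACK_PREFIX.pack(len(index_data), len(log_data)) + index_data + log_data
    return gzip.compress(raw)


def decode_log_pack(blob: bytes) -> Tuple[List[int], bytes]:
    """Gunzip and split into the list of indexes and the raw log data."""
    raw = gzip.decompress(blob)
    index_len, log_len = LOGPACK_PREFIX.unpack_from(raw, 0)
    if index_len % 8 != 0:
        raise ValueError("index data length is not a multiple of 8")
    start = LOGPACK_PREFIX.size
    if len(raw) != start + index_len + log_len:
        raise ValueError("LogPack length mismatch")
    indexes = list(struct.unpack_from(">%dQ" % (index_len // 8), raw, start))
    log_data = raw[start + index_len:]
    return indexes, log_data
```
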

SnapshotSyncRequest
~~~~~~~~~~~~~~~~~~~

This is included only in an InstallSnapshotRequest message.

.. raw:: html

  {% highlight lang='dataspec' %}
  Last Log Index:   8 byte integer
  Last Log Term:    8 byte integer
  Config data len:  In bytes, 4 byte integer
  Config data:      length as specified
  Offset:           The offset of the data in the database, in bytes, 8 byte integer
  Data len:         In bytes, 4 byte integer
  Data:             length as specified
  Is Done:          1 if done, 0 if not done (1 byte)
  {% endhighlight %}
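
The SnapshotSyncRequest layout interleaves fixed-size fields with two length-prefixed byte strings, which the following sketch handles by walking an offset through the buffer. The config data and snapshot data are treated as opaque bytes here, and the type and function names are hypothetical.

```python
import struct
from typing import NamedTuple

HEAD = struct.Struct(">QQI")  # last log index(8), last log term(8), config data len(4)
MID = struct.Struct(">QI")    # offset(8), data len(4)


class SnapshotSyncRequest(NamedTuple):
    last_log_index: int
    last_log_term: int
    config: bytes
    offset: int
    data: bytes
    is_done: bool


def encode_snapshot_sync(req: SnapshotSyncRequest) -> bytes:
    return b"".join([
        HEAD.pack(req.last_log_index, req.last_log_term, len(req.config)),
        req.config,
        MID.pack(req.offset, len(req.data)),
        req.data,
        bytes([1 if req.is_done else 0]),
    ])


def decode_snapshot_sync(buf: bytes) -> SnapshotSyncRequest:
    last_log_index, last_log_term, config_len = HEAD.unpack_from(buf, 0)
    pos = HEAD.size
    config = buf[pos:pos + config_len]
    pos += config_len
    offset, data_len = MID.unpack_from(buf, pos)
    pos += MID.size
    data = buf[pos:pos + data_len]
    pos += data_len
    # Exactly one byte (Is Done) must remain.
    if pos + 1 != len(buf):
        raise ValueError("SnapshotSyncRequest length mismatch")
    return SnapshotSyncRequest(last_log_index, last_log_term, config,
                               offset, data, buf[pos] == 1)
```

A leader sending a large snapshot in several chunks would presumably advance ``offset`` for each chunk and set ``is_done`` only on the last one.
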

Responses
---------

All responses are 26 bytes, as follows.
All values are unsigned big-endian.

.. raw:: html

  {% highlight lang='dataspec' %}
  Message type:  1 byte
  Source:        ID, 4 byte integer
  Destination:   Usually the actual destination ID (see notes), 4 byte integer
  Term:          Current term, 8 byte integer
  Next Index:    Initialized to leader last log index + 1, 8 byte integer
  Is Accepted:   1 if accepted, 0 if not accepted (see notes), 1 byte
  {% endhighlight %}

Notes
`````

The Destination ID is usually the actual destination for this message.
However, for AppendEntriesResponse, AddServerResponse, and RemoveServerResponse,
it is the ID of the current leader.

In the RequestVoteResponse, Is Accepted is 1 for a vote for the candidate (requestor),
and 0 for no vote.
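
A sketch of parsing the fixed 26-byte response (1 + 4 + 4 + 8 + 8 + 1) is shown below, including the special meaning of Destination described in the notes. The helper ``responder_is_leader`` rests on an assumption: that a server "says he is the leader" when the leader ID it reports equals its own Source ID. The text does not spell this out, and all names here are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

RESPONSE = struct.Struct(">BIIQQB")  # 26 bytes, big-endian

# AppendEntriesResponse, AddServerResponse, RemoveServerResponse:
# Destination carries the ID of the current leader.
LEADER_ID_TYPES = frozenset({4, 7, 9})


class Response(NamedTuple):
    msg_type: int
    source: int
    destination: int
    term: int
    next_index: int
    is_accepted: bool


def unpack_response(data: bytes) -> Response:
    if len(data) != RESPONSE.size:
        raise ValueError("responses are exactly 26 bytes")
    msg_type, source, destination, term, next_index, accepted = RESPONSE.unpack(data)
    return Response(msg_type, source, destination, term, next_index, accepted == 1)


def leader_id(resp: Response) -> Optional[int]:
    """Leader ID carried in Destination, or None for other response types."""
    return resp.destination if resp.msg_type in LEADER_ID_TYPES else None


def responder_is_leader(resp: Response) -> bool:
    """Assumption: the responder is the leader if it names itself as leader."""
    return leader_id(resp) == resp.source
```

A new server could call ``responder_is_leader`` on the AppendEntriesResponse to its first ClientRequest and, if it returns false, look up the server with ID ``leader_id(resp)`` in its out-of-band configuration and reconnect there.
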


Administration Interface
========================

TBD, possibly a separate proposal.

Requirements of an admin interface
(not all necessarily for 1.0):

- Support for multiple master destinations, i.e. multiple virtual clusters (farms)
- Provide comprehensive view of shared cluster state - all stats published by members, who is the current leader, etc.
- Ability to anoint a leader
- Ability to force publish metaLS (if current node is leader)
- Ability to exclude hashes from metaLS (if current node is leader)
- Configuration import/export functionality for bulk deployments


Router Interface
================

TBD, possibly a separate proposal.

Requirements for the Garlic Farm to router API (in-JVM Java or I2PControl):

- getLocalRouterStatus()
- getLocalLeafHash(Hash masterHash)
- getLocalLeafStatus(Hash leaf)
- getRemoteMeasuredStatus(Hash masterOrLeaf) // probably not in MVP
- publishMetaLS(Hash masterHash, List<MetaLease> contents) // or signed MetaLeaseSet? Who signs?
- stopPublishingMetaLS(Hash masterHash)
- authentication TBD?


Justification
=============

Atomix is too large and won't allow customization for us to route
the protocol over I2P. Also, its wire format is undocumented, and depends
on Java serialization.


Notes
=====


Issues
======

- There's no way for a client to find out about and connect to an unknown leader.
  It would be a minor change for a Follower to send the Configuration as a Log Entry in the AppendEntriesResponse.


Migration
=========

No backward compatibility issues.


References
==========

.. [JRAFT]
    https://github.com/datatechnology/jraft

.. [RAFT]
    https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf