
=head1 Introduction

This document defines a protocol to link two nodes together with the intent
that they become part of the same network. The protocol runs on top of a
reliable, bidirectional, stream-oriented connection, such as a UNIX socket or
TCP. The protocol makes no distinction between the side that connects and the
side that is connected to: both sides initiate communication simultaneously
once the connection has been established.



=head1 Handshake

After the connection has been established, both sides immediately send a
handshake message. This message is a space-separated list of parameters,
followed by a newline character (0x0A) or a CRLF sequence (0x0D 0x0A). Each
parameter starts with its name and may carry multiple items, separated from
the name and from each other by commas. Because parameters are
self-describing, the order in which they appear does not matter.

The following is an example handshake.

  A: ver,1.0 seri,gob,json sero,gob,json
  B: ver,1.0 seri,storable,json sero,storable,json
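
A minimal Python sketch of reading and writing handshake lines, assuming (as
in the examples) that the first comma-separated item of each parameter is its
name. The function names C<parse_handshake> and C<format_handshake> are
hypothetical, and the treatment of a repeated parameter name (here the last
one wins) is an assumption that this document does not specify.

```python
def parse_handshake(line: str) -> dict[str, list[str]]:
    """Split a handshake line into {parameter name: [items]}."""
    # Accept both the LF and the CRLF terminator.
    line = line.rstrip("\r\n")
    params: dict[str, list[str]] = {}
    for param in line.split(" "):
        if not param:
            continue  # tolerate repeated spaces (an assumption, not in the spec)
        name, *items = param.split(",")
        params[name] = items  # assumption: a repeated name overrides earlier ones
    return params


def format_handshake(params: dict[str, list[str]]) -> str:
    """Build a handshake line terminated with a single LF."""
    return " ".join(",".join([name, *items]) for name, items in params.items()) + "\n"
```

Unknown parameters are kept in the returned dictionary, so the caller can
simply ignore names it does not recognise.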

Currently defined parameters are:

=over

=item ver

Protocol version, formatted as a decimal number, followed by a dot and another
decimal number. The first number is the major version and the second the minor.
Minor versions are used for backward-compatible additions to the protocol,
whereas major version changes indicate incompatibilities. If a side supports
multiple major versions, it can list these in the same ver parameter by
separating them with a comma. Since minor releases are always
backward-compatible, only the highest supported minor version has to be listed.

After both sides have received each other's handshake, the highest version
supported by both sides is used for further communication: the highest major
version that appears in both lists, combined with the lower of the two minor
versions listed for it. If there is no common major version, the connection
should be closed. Examples:

  A: ver,1.0,2.4
  B: ver,1.3,2.2
  # Used protocol: 2.2

  A: ver,1.0
  B: ver,1.2,2.0
  # Used protocol: 1.0

  A: ver,1.5
  B: ver,2.1
  # No common major version, disconnect

This document describes protocol version C<1.0>, which is the only version
available as of writing this document.

=item seri and sero

Supported serialization formats for incoming messages (I<seri>) and outgoing
messages (I<sero>). Multiple serialization formats may be delimited with a
comma. The I<seri> list should be ordered so that the most preferable format is
listed first and the least preferable format last. What is "preferable" is
defined by the implementation or even by each application individually. The
order of the I<sero> list does not matter.  The serialization formats are
defined later in this document.

Each side chooses the format for encoding outgoing messages by comparing its
own I<sero> list with the I<seri> list of the other side. The first format in
the other side's I<seri> list that also appears in its own I<sero> list is
used.

Similarly, each side chooses the format for decoding incoming messages by
comparing its own I<seri> list with the I<sero> list of the other side. The
first format in its own I<seri> list that also appears in the other side's
I<sero> list is used.

This ensures that the format used for encoding outgoing messages on one side is
the same as the format used to decode incoming messages on the other side and
vice versa. It also allows implementations to support different lists of
input and output serialization formats. If no common format exists for either
direction, the handshake fails and the connection is closed. Examples:

  A: seri,gob,json sero,gob,json
  B: seri,storable,json sero,storable,json
  # B will write json messages to A
  # A will write json messages to B

  A: seri,json sero,storable,json
  B: seri,storable sero,storable,json
  # A will write storable messages to B
  # B will write json messages to A

  A: seri,gob sero,gob
  B: seri,json sero,gob
  # No common format for messages from A to B, disconnect

=back
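
The two negotiation rules above can be sketched in Python as follows. This is
an illustration under stated assumptions, not a reference implementation: the
helper names are hypothetical, version strings are assumed to be exactly
C<major.minor>, and the parameter dictionaries are assumed to map a parameter
name to its list of items. Each side calls C<negotiate_formats> with its own
parameters first and the peer's second.

```python
from typing import Optional


def parse_version(v: str) -> tuple[int, int]:
    """Turn "2.4" into (2, 4)."""
    major, minor = v.split(".")
    return int(major), int(minor)


def negotiate_version(ours: list[str], theirs: list[str]) -> Optional[tuple[int, int]]:
    """Pick the highest common major version and the lower of the two minors.

    Returns None when no major version is shared; the connection should then be closed.
    """
    a = dict(parse_version(v) for v in ours)
    b = dict(parse_version(v) for v in theirs)
    common = a.keys() & b.keys()
    if not common:
        return None
    major = max(common)
    return major, min(a[major], b[major])


def choose_format(preferred: list[str], offered: list[str]) -> Optional[str]:
    """Return the first entry of `preferred` that also appears in `offered`."""
    return next((fmt for fmt in preferred if fmt in offered), None)


def negotiate_formats(ours: dict[str, list[str]],
                      theirs: dict[str, list[str]]) -> Optional[tuple[str, str]]:
    """Return (format we encode with, format we decode with), or None on failure."""
    # Outgoing: walk the peer's seri list (its preference order) against our sero.
    encode = choose_format(theirs.get("seri", []), ours.get("sero", []))
    # Incoming: walk our own seri list against the peer's sero.
    decode = choose_format(ours.get("seri", []), theirs.get("sero", []))
    if encode is None or decode is None:
        return None
    return encode, decode
```

Because both sides apply the same two rules to the same four lists, the format
one side picks for encoding is always the one the other side picks for
decoding.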





=head1 Messages

After the handshake, messages are exchanged asynchronously. The message format
described here is purely conceptual; see the serialization formats below for
the actual encoding of these messages.


=head2 Pattern synchronisation

Each side of the link operates in one of two modes: one in which the sender of
tuples is not aware of the patterns that the receiver is interested in, and one
in which the pattern list of the receiver is synchronised with the sender. The
initial mode is the one without synchronisation of patterns. The side that is
responsible for sending the tuples dictates which mode to use.

I<TODO:> This explanation is awfully vague, and so are the descriptions below.

=over

=item PATTERNSYNC [on|off]

Request the other side of the connection to either enable or disable
synchronisation of its pattern list. If synchronisation is disabled, no
REGISTER or UNREGISTER messages will be sent.

I<TODO:> Note about sending the initial list.

=item REGISTER $pid $pattern

Indicates that the sender of the message is interested in receiving tuples
matching $pattern. $pid is a positive number between 1 and C<2^31-1> and
uniquely identifies this pattern among other patterns registered by the sender.
Id numbers may be re-used after an UNREGISTER has been sent.

When a REGISTER message is received with a $pid that is already registered,
it should be treated as a C<UNREGISTER $pid> followed by the REGISTER
command.

=item REGDONE

I<TODO>

=item UNREGISTER $pid

Indicates that the sender of the message is no longer interested in receiving
tuples matching the pattern it previously registered under $pid with REGISTER.

An UNREGISTER message with a $pid not known to the receiver (e.g. it has
already been UNREGISTER'ed or has never been REGISTER'ed previously) should be
ignored.

=back
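
The receiving side's bookkeeping for REGISTER and UNREGISTER can be sketched
as a dictionary keyed by $pid. C<RemotePatterns> is a hypothetical Python
helper, not part of any existing implementation. Patterns are stored as opaque
values because this document does not define how they are matched against
tuples, and REGDONE and PATTERNSYNC are left out because their semantics are
still marked TODO.

```python
from typing import Any

MAX_PID = 2**31 - 1


class RemotePatterns:
    """Patterns the peer has registered with us, keyed by $pid."""

    def __init__(self) -> None:
        self.patterns: dict[int, Any] = {}

    def register(self, pid: int, pattern: Any) -> None:
        # How to react to an out-of-range $pid is not specified; raising is an assumption.
        if not 1 <= pid <= MAX_PID:
            raise ValueError(f"$pid out of range: {pid}")
        # A REGISTER for a $pid that is still registered counts as
        # UNREGISTER followed by REGISTER, i.e. the old pattern is replaced.
        self.unregister(pid)
        self.patterns[pid] = pattern

    def unregister(self, pid: int) -> None:
        # An UNREGISTER for an unknown $pid is silently ignored.
        self.patterns.pop(pid, None)

    def clear(self) -> None:
        # Called when the connection closes: registrations do not survive it.
        self.patterns.clear()
```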


=head2 Tuple communication

=over

=item TUPLE $tid $tuple

Passes a tuple from the sending side to the receiving side. Every tuple is
sent only once, even if it matches multiple patterns registered by the other
side.

$tid is a non-negative number between 0 and C<2^31-1>. If the sending side is
not interested in replies to this tuple (at least, not through a return-path),
then $tid is 0. Any other value identifies the return-path for this tuple
among other active return-paths, which allows replies to this tuple to be sent
back with the REPLY message. Every TUPLE message with $tid>0 B<must>
eventually be answered either with a corresponding CLOSE or by a disconnect.
Note, however, that there is no strict time bound on when this CLOSE has to
occur.

=item REPLY $tid $tuple

Send back a reply tuple over the return-path identified with $tid.

=item CLOSE $tid

Close the return-path identified with $tid.

=back

Either side may close the connection at any point. It is not necessary to
send CLOSE for open return-paths or UNREGISTER for registered patterns when
this happens; both sides should clean up this state automatically.
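
On the side that sends TUPLE messages, the return-paths can be tracked in a
table from $tid to whatever should receive the replies. The Python sketch
below is hypothetical (C<ReturnPaths> is not an existing API): it assumes ids
are handed out sequentially with wrap-around, and that a REPLY or CLOSE for an
unknown $tid is ignored, which this document does not specify. A tuple that
needs no replies is simply sent with $tid 0 and never enters the table.

```python
from typing import Any, Callable

MAX_TID = 2**31 - 1


class ReturnPaths:
    """Sender-side bookkeeping for TUPLE messages with $tid > 0."""

    def __init__(self) -> None:
        self.open: dict[int, Callable[[Any], None]] = {}
        self._next = 1

    def allocate(self, on_reply: Callable[[Any], None]) -> int:
        """Reserve an unused $tid in 1..2^31-1 and remember its reply handler."""
        for _ in range(MAX_TID):
            tid = self._next
            self._next = tid + 1 if tid < MAX_TID else 1  # wrap around, skip 0
            if tid not in self.open:
                self.open[tid] = on_reply
                return tid
        raise RuntimeError("no free $tid")

    def reply(self, tid: int, tup: Any) -> None:
        """Handle REPLY $tid $tuple; unknown ids are ignored (an assumption)."""
        handler = self.open.get(tid)
        if handler is not None:
            handler(tup)

    def close(self, tid: int) -> None:
        """Handle CLOSE $tid; after this the id may be handed out again."""
        self.open.pop(tid, None)

    def disconnect(self) -> None:
        """A disconnect implicitly closes every open return-path."""
        self.open.clear()
```

The receiving side needs the mirror image: a set of the $tid values it has
seen but not yet sent CLOSE for, so that it can close each of them eventually.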





=head1 Serialization formats

=head2 JSON

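The JSON serialization format uses the JSON format specified in RFC 4627 and
described on L<http://json.org/>. One exception is made to the specification:
the newline character is not allowed within a message, as it is used as the
message delimiter. Since JSON already requires newlines inside strings to be
escaped (as C<\n>), this only means that newlines may not be used as
whitespace between tokens.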

=head3 Tuples & patterns

A tuple is represented as an array in JSON. The elements of the tuple map quite
naturally to JSON: C<null> represents a wildcard, a number represents an
integer or a float, a string represents a string, an array represents an array
and an object represents a map. The value C<true> should be taken as an alias
for the integer C<1> and C<false> as an alias for C<0>.
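
In Python, a tuple decoded with C<json.loads> already maps C<null> to C<None>,
arrays to lists and objects to dicts; only the boolean aliasing needs an extra
pass. A minimal sketch, with the hypothetical name C<normalize> and the
assumption that the aliasing applies at every nesting level rather than only
to top-level tuple elements:

```python
from typing import Any


def normalize(value: Any) -> Any:
    """Apply the true -> 1 and false -> 0 aliasing to a decoded JSON tuple."""
    # bool must be checked before anything else: in Python, bool subclasses int.
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, list):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        # Assumption: values nested inside maps are aliased too.
        return {k: normalize(v) for k, v in value.items()}
    return value  # None (wildcard), int, float or str are left unchanged
```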

=head3 Messages

Each message is encoded in a JSON array. The first element in the array
indicates the type of message as an integer, and is followed by any arguments
specific to the message type.  Messages are delimited with a newline.

=over

=item PATTERNSYNC

Message type 1. Second element is either C<true> or C<false> to enable or
disable pattern synchronisation.

  [1,true]

=item REGISTER

Message type 2. The second element is the $pid, encoded as a number. The third
element is the $pattern.

  [2,14,["object",null,1]]

=item REGDONE

Message type 3. No extra elements.

  [3]

=item UNREGISTER

Message type 4. The second element is the same as the second element of
REGISTER.

  [4,14]

=item TUPLE

Message type 5. Second element is the $tid, encoded as a number. The third
element is the tuple.

  [5,0,["variable","set","listen",false]]

=item REPLY

Message type 6. Exact same encoding as TUPLE.

  [6,29382,[1]]

=item CLOSE

Message type 7. Second element is the $tid, encoded as a number.

  [7,29382]

=back
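
A minimal Python sketch of this encoding, including the newline framing. The
helper names are hypothetical. It passes C<allow_nan=False> to C<json.dumps>
because Python would otherwise emit C<NaN> and C<Infinity>, which are not
valid JSON, and it relies on the fact that C<json.dumps> never emits a raw
newline when no C<indent> is given. Skipping empty lines is an assumption.

```python
import json
from typing import Any

# Message type numbers from the list above.
PATTERNSYNC, REGISTER, REGDONE, UNREGISTER, TUPLE, REPLY, CLOSE = range(1, 8)


def encode_message(msg_type: int, *args: Any) -> bytes:
    """Encode one message as a compact, newline-terminated JSON array."""
    text = json.dumps([msg_type, *args], separators=(",", ":"), allow_nan=False)
    return text.encode("utf-8") + b"\n"


def decode_message(line: bytes) -> list[Any]:
    """Decode one line into [type, args...]; raises ValueError if malformed."""
    msg = json.loads(line.decode("utf-8"))
    # type() rather than isinstance(): a boolean is not a valid message type.
    if not isinstance(msg, list) or not msg or type(msg[0]) is not int:
        raise ValueError("malformed message")
    return msg


def split_messages(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split a receive buffer into complete lines plus the unfinished remainder."""
    *lines, rest = buffer.split(b"\n")
    return [ln for ln in lines if ln.strip()], rest
```

For example, C<encode_message(CLOSE, 29382)> produces the bytes C<[7,29382]>
followed by a newline. A reader would append incoming data to the remainder
returned by C<split_messages> and decode each complete line.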




=head1 Stream protocols and security

This section is purely informative and only lists some recommendations; it is
not a specification that implementations must follow.

The above protocol does not specify any security; it assumes that security is
handled by the lower-level protocol that sets up the stream. Here I present a
few recommendations and techniques for adding this security. These assume that
there is at least one process in the Tanja network that runs on a UNIX-like
machine (ideally this process also runs the sessions that you wish to
communicate with, but it could just as well play the role of a tuple router or
broker).

=head2 UNIX sockets

Any process that accepts incoming connections for linking with Tanja nodes
(let's call this a I<server>) should do so on a UNIX listen socket. This way,
only processes on the local machine with the correct user permissions will be
able to connect. There is no need for additional security within the
application layer in this setup: as soon as a client connects to the UNIX
socket, the link protocol is initiated and the handshake starts.
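
A minimal Python sketch of such a server socket, assuming that the socket file
should be usable by the owning user only. The permissions are tightened before
C<listen()> is called, so there is no window in which another user could
connect. On Linux, connecting requires write permission on the socket file,
but the unix(7) manual page warns that portable programs should not rely on
this; placing the socket in a directory that only the user can enter is the
more portable safeguard. The function names are hypothetical, and
C<serve_one> handles a single client only to show that the server sends its
handshake immediately instead of waiting for the client's.

```python
import os
import socket


def listen_unix(path: str, backlog: int = 16) -> socket.socket:
    """Create a UNIX listen socket at `path` with mode 0600."""
    # Remove a stale socket file left behind by a previous run.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    # Tighten permissions before listen(): until then, connect() is refused anyway.
    os.chmod(path, 0o600)
    srv.listen(backlog)
    return srv


def serve_one(srv: socket.socket, handshake: bytes) -> bytes:
    """Accept one client, send our handshake and return the peer's handshake line."""
    conn, _ = srv.accept()
    with conn, conn.makefile("rb") as f:
        conn.sendall(handshake)  # both sides send immediately, without waiting
        return f.readline()
```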

=head2 SSH

The servers described above only accept connections from the local machine. To
also allow connections from remote machines, the server (not the server
process, but the system it runs on) should be configured to accept SSH
connections for a certain user that has the right permissions to connect to the
UNIX socket. This does not necessarily have to be the same user as the one that
runs the server process, as long as it can connect to the local socket.

Processes that wish to connect to the remote server can then use SSH to log in
remotely, connect to the UNIX socket and then somehow forward this connection
over SSH. Unfortunately, the ubiquitously available OpenSSH implementation does
not seem to offer a method for forwarding UNIX sockets, but it is still
possible to do this by remotely executing a command that connects to the UNIX
socket and forwards the connection to standard I/O.

If the server has the I<socat> utility installed, then the remote command is as
follows:

  socat - UNIX-CONNECT:/path/to/local/server.socket

Unfortunately, socat is not installed by default on most current UNIX systems.
If the server has Perl installed (which is very likely), then the following
perl (wannabe) one-liner can be used instead. It is broken over multiple lines
for formatting purposes, but it really is a single command.

  perl -MIO::Socket -MIO::Select -e '$|=1;$u=IO::Socket::UNIX->new(shift)
  ||exit;$u->autoflush(1);$s=IO::Select->new(STDIN);$s->add($u);
  while(@r=$s->can_read){for(@r){$b="";sysread($_,$b,10240)||exit;
  print{$_==$u?STDOUT:$u}$b;}}' /path/to/local/server.socket

Suggestions for a more elegant solution are highly appreciated.
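
Where Python 3 is available on the server, the same relay can be sketched as
a small script. This is a hypothetical alternative to the Perl one-liner, not
an established tool: like the Perl version, it copies data in both directions
and exits as soon as either side reaches end-of-file.

```python
import os
import select
import socket
import sys


def relay(in_fd: int, out_fd: int, sock: socket.socket) -> None:
    """Copy in_fd -> sock and sock -> out_fd until either side reaches EOF."""
    while True:
        readable, _, _ = select.select([in_fd, sock], [], [])
        for src in readable:
            if src is sock:
                data = sock.recv(10240)
                if not data:
                    return  # the server closed the connection
                # os.write may write only part of the buffer; loop until done.
                while data:
                    n = os.write(out_fd, data)
                    data = data[n:]
            else:
                data = os.read(in_fd, 10240)
                if not data:
                    return  # the SSH client closed its side
                sock.sendall(data)


# Run as a script with the socket path as its only argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sys.argv[1])
    relay(sys.stdin.fileno(), sys.stdout.fileno(), s)
```

Saved on the server (the script name F<tanja-relay.py> is hypothetical), it
would be used as the remote command, for example:

  ssh user@server python3 tanja-relay.py /path/to/local/server.socket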