RFC: Command server protocol

Idan Kamara idankk86 at gmail.com
Mon Jun 13 20:05:44 UTC 2011


On Mon, Jun 13, 2011 at 7:03 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Mon, 2011-06-13 at 18:15 +0300, Idan Kamara wrote:
>> On Mon, Jun 13, 2011 at 1:50 AM, Matt Mackall <mpm at selenic.com> wrote:
>> > On Sun, 2011-06-12 at 19:15 +0300, Idan Kamara wrote:
>> >> Here's an overview of the current protocol used by the command server (also
>> >> available here http://mercurial.selenic.com/wiki/CommandServer). Feedback is
>> >> appreciated.
>> >>
>> >> All communication with the server is done on stdin/stdout. The byte order
>> >> used by the server is big-endian.
>> >
>> > When is big-endian used?
>>
>> In the channel header. And all other length fields which are unsigned ints.
>
>> >
>> >> Data sent from the server is channel based, meaning a (channel [character],
>> >> length [unsigned int]) pair is sent before the actual data. For example:
>> >>
>> >> o
>> >> 1234
>> >
>> > Is this '1234' in text or in binary? If it's binary, how many bytes is
>> > it?
>>
>> It's binary, 4 bytes according to
>> http://docs.python.org/library/struct.html#format-characters
>
> Did you clarify this on the wiki?

Sort of: Data sent from the server is channel based, meaning a
(channel [character], length [unsigned int]) pair is sent before the
actual data.

I'll make it clearer by linking to the Python struct docs.
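
To make that concrete, here is a rough sketch of how a client could decode the pair with Python's struct module. It assumes the header is one channel byte followed by a 4-byte big-endian unsigned length (struct format ">cI"). The names read_exact and read_message are only illustrative, and io.BytesIO stands in for the server's stdout. It covers output-style channels only, where the data follows the header.

```python
import io
import struct

# Assumed layout: 1-byte channel + 4-byte big-endian unsigned length.
HEADER = struct.Struct(">cI")

def read_exact(stream, n):
    """Read exactly n bytes, looping because pipes may return short reads."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-message")
        buf += chunk
    return buf

def read_message(stream):
    """Return (channel, data) for one message on an output-style channel."""
    channel, length = HEADER.unpack(read_exact(stream, HEADER.size))
    return channel, read_exact(stream, length)

# Stand-in for the server's stdout: 5 bytes of data on channel 'o'.
fake_stdout = io.BytesIO(b"o" + struct.pack(">I", 5) + b"hello")
channel, data = read_message(fake_stdout)
```

The ">" prefix matters here: it selects big-endian order and also turns off native padding, so the header is 5 bytes rather than 8.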

>
>> >
>> >> <data: 1234 bytes>
>> >>
>> >> that is 1234 bytes sent on channel 'o', with the data following.
>> >>
>> >> When starting the server, it will send a new-line separated list of
>> >> capabilities (on the 'o' channel), in this format:
>> >>
>> >> capabilities:\n
>> >> capability1\n
>> >> capability2\n
>> >> ...
>> >
>> > There should probably be a blank line or something indicating that
>> > there's no more data arriving?
>>
>> It's one string with all the capabilities being sent on the output channel.
>> So the client sees this as one chunk.
>
> Ok.
>
>> >> Channels
>> >> --------------
>> >> There are currently 5 channels:
>> >>
>> >> * o - Output channel. Most of the communication happens on this channel.
>> >> When running commands, output Mercurial writes to stdout is written to this
>> >> channel.
>> >> * e - Error channel. When running commands, this correlates to stderr.
>> >> * i - Input channel. The length field here can either be 0, telling the
>> >> client to send all input, or some positive number telling the client to send
>> >> at most <length> bytes.
>> >> * l - Line based input channel. The client should send a single line of
>> >> input (trimmed if length is not 0). This channel is used when Mercurial
>> >> interacts with the user or when iterating over stdin.
>> >
>> > What should a client do with unexpected channel responses?
>> >
>> > For instance, what happens when a progress channel is added? What
>> > happens if a client gets an unexpected prompt?
>>
>> Since progress is considered output, it needs to consume it and ignore it
>> if it's of no interest to him.
>
> If a client written today encounters a progress channel tomorrow, how
> does it know not to abort? It wasn't written to expect that.

The client can choose what to do when it gets data on an unexpected channel.
Unless we mess up the initial design, I don't see why ignoring
unexpected data shouldn't be fine.
(By ignoring I mean just reading the data and doing nothing with it.)
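
Roughly what I have in mind, as a sketch only: the client dispatches the channels it knows and reads and drops everything else. The drain and frame helpers, the handlers dict and the 'p' channel are made up for the example. It also assumes every unknown channel is output-style (header followed by <length> bytes of data), which would not hold for a new input-request channel.

```python
import io
import struct

def frame(channel, data):
    """Build one message: channel byte, big-endian length, then data."""
    return channel + struct.pack(">I", len(data)) + data

def drain(stream, handlers):
    """Dispatch known channels; read and discard data on unknown ones."""
    ignored = []
    while True:
        header = stream.read(5)
        if len(header) < 5:
            break  # end of stream (a real pipe would need a read-exact loop)
        channel, length = struct.unpack(">cI", header)
        data = stream.read(length)  # always consume, even if we ignore it
        handler = handlers.get(channel)
        if handler is None:
            ignored.append(channel)
            continue
        handler(data)
    return ignored

out = []
# 'p' is a hypothetical future channel this client was not written for.
stream = io.BytesIO(frame(b"o", b"abc") + frame(b"p", b"50%") + frame(b"o", b"def"))
ignored = drain(stream, {b"o": out.append})
```

The important part is that the data is consumed before it is dropped, so the next header is read from the right offset.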

(Side note: I'm not sure progress deserves its own channel. Since it's
written to stderr, it will end up in the 'e'rror channel.)

>
>> >> Input should be sent on stdin in the following format:
>> >>
>> >> length
>> >> data
>> >
>> > The input model is interesting: it basically has the server prompting
>> > the client for input. That probably makes sense, but we should probably
>> > be explicit about what's required to avoid deadlock.
>> >
>> > For instance, if the server is both consuming input and producing
>> > output, and the client is simply spooling input (ie a big patch), it
>> > will eventually write enough data to the client that its write blocks.
>> >
>>
>> Right. But technically if the server writes output while asking for input,
>> for the client to know it needs to send more input, it will have to
>> read the output first.
>
> You wrote:
>
>> >> * i - Input channel. The length field here can either be 0, telling the
>> >> client to send all input
>
> So how does this happen? Does the client simply start writing and write
> until it's finished? What happens if the client wants to send a 50MB
> chunk?

At the moment it will have to send it in one chunk, which is probably bad.
I think in this case ('i'nput channel, length=0) we might want the server
to read from the client in a loop until the client says it's finished.
That way the client can feed it data in pieces without filling up the pipe.
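
A sketch of what such a loop could look like. None of this is implemented: the zero-length chunk as the "finished" marker, the 4096-byte chunk size and the names send_input and receive_input are all assumptions for illustration. Each chunk uses the existing length-then-data input format with a big-endian unsigned length.

```python
import io
import struct

def send_input(dest, payload, chunk_size=4096):
    """Client side: write payload as length-prefixed chunks."""
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        dest.write(struct.pack(">I", len(chunk)) + chunk)
    # Assumed end-of-input marker: a chunk of length 0.
    dest.write(struct.pack(">I", 0))

def receive_input(src):
    """Server side: read chunks until the zero-length one arrives."""
    parts = []
    while True:
        (length,) = struct.unpack(">I", src.read(4))
        if length == 0:
            return b"".join(parts)
        parts.append(src.read(length))

# io.BytesIO stands in for the pipe between client and server.
pipe = io.BytesIO()
payload = b"x" * 10000
send_input(pipe, payload)
pipe.seek(0)
received = receive_input(pipe)
```

With real pipes the client would still have to read server output between chunks to avoid the deadlock Matt described. The sketch only shows the framing.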

>> > The wiki page has a piece about error codes but it's not quite clear how
>> > a client distinguishes those from the output stream.
>>
>> Yeah. This is a problem if the server sends a \0 as part of its 'regular'
>> output. The client will be misled thinking it's the end.
>
> And it definitely can.
>
>> Maybe we could use another channel here ('a'dmin?) for the server
>> to tell the client that a command finished and to send its return code.
>
> How about 'r'esult. We can use this generically for client command
> results.

Sounds good. I've updated the wiki:
- result channel: The server uses this channel to tell the client that
a command finished by writing its return code (a signed integer).
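
For example, a client might decode it like this. This is a sketch that assumes the return code is sent as a 4-byte big-endian signed int (struct format ">i") on a channel named 'r'. The wiki only says "a signed integer", so the width is my assumption, and parse_result is an illustrative name.

```python
import struct

def parse_result(data):
    """Decode a return code, assumed to be a 4-byte big-endian signed int."""
    (code,) = struct.unpack(">i", data)
    return code

# Hand-built message: channel 'r', length 4, return code -1.
message = b"r" + struct.pack(">I", 4) + struct.pack(">i", -1)
channel, length = struct.unpack(">cI", message[:5])
code = parse_result(message[5:5 + length])
```

Because the code travels on its own channel, a \0 in regular output can no longer be mistaken for the end of a command.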


