Saturday, September 22, 2012

Extend a Couchbase Server with your own commands!

I was having a beer with Sergey after yesterday's CouchConf in San Francisco and we started discussing command extensions for memcached. I guess I haven't been very good at advertising what I've been doing, because I added this to memcached a couple of years ago as part of the plugin extension framework.

I thought that it might be interesting to write up a small blog post that shows you how you may write your own commands to access a Couchbase cluster. Please note that what I'm going to show you in this blog post is abusing Couchbase Server, and nothing you should be doing in production. If you have special needs you should connect with us to let us work with you to make it happen (and ensure that you don't break stuff).

So let's imagine that you're storing relatively big objects in a cluster, and you only need fragments of each object at a time. The current SDK only allows you to read and write a complete item at a time, so we're going to extend our Couchbase cluster so that it can hand us just a fragment of the data.

Luckily, I wrote this extension for memcached a couple of years ago, and it is actually bundled with the Couchbase Server. This does not, however, imply that it is a supported extension to use with your Couchbase Server. So let's go ahead and enable the extension on the server:

# cd /opt/couchbase/bin
# mv memcached memcached.bin
# cat > memcached << 'EOF'
#! /bin/sh
exec "$(dirname "$0")/memcached.bin" \
            -X /opt/couchbase/lib/memcached/fragment_rw.so "$@"
EOF
# chmod +x memcached

If you are running a cluster you need to do this on all of your nodes (and you need to restart all of your nodes for them to load the module).

You won't learn much unless we walk through how the extension actually works. You'll find the module at https://github.com/trondn/memcached/tree/engine-pu/extensions/protocol in the files fragment_rw.c and fragment_rw.h.

So let's look into the details there. During startup memcached will load all shared objects specified with the -X parameter, and try to look up a function named memcached_extensions_initialize in each of them. This is where you should initialize your extension and add it to the system. If you look at the signature for the function, you might wonder where you may specify the configuration. The "per extension" configuration is specified as part of the -X parameter. The fragment module allows you to specify the command ids for read and write through "r" and "w", so we could write -X fragment_rw.so,r=5,w=6 to tell it to use command codes 5 and 6 (please note that this would conflict with existing commands, since 5 and 6 are already used for increment and decrement in the binary protocol ;))
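To make the r=5,w=6 syntax a bit more concrete, here is a minimal standalone sketch of parsing such a string. This is not the code from fragment_rw.c; parse_opcodes is a hypothetical helper that only uses the C standard library, and the error handling choices are assumptions made for the sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of memcached): parse "r=<n>,w=<n>" into
 * two opcodes. Returns 0 on success, -1 on malformed or out-of-range input.
 * A key that is absent leaves the corresponding output untouched. */
static int parse_opcodes(const char *config, uint8_t *r, uint8_t *w)
{
    char buf[64];
    char *tok;

    if (config == NULL) {
        return 0; /* nothing configured, keep the defaults */
    }
    if (strlen(config) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, config); /* strtok modifies its input, so work on a copy */

    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char *end;
        unsigned long val;
        char *eq = strchr(tok, '=');

        if (eq == NULL || eq == tok || eq[1] == '\0') {
            return -1; /* not of the form key=value */
        }
        *eq = '\0';
        val = strtoul(eq + 1, &end, 0); /* base 0: accepts 5 as well as 0x05 */
        if (*end != '\0' || val > UINT8_MAX) {
            return -1; /* an opcode is a single byte */
        }
        if (strcmp(tok, "r") == 0) {
            *r = (uint8_t)val;
        } else if (strcmp(tok, "w") == 0) {
            *w = (uint8_t)val;
        } else {
            return -1; /* unknown key */
        }
    }
    return 0;
}
```

Calling parse_opcodes("r=5,w=6", &r, &w) would fill in r and w with 5 and 6, while something like "r=300" is rejected because it does not fit in the one-byte opcode field.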

Setting up a command extension happens in two different phases. The first thing you need to do is to register your command descriptor:


server->extension->register_extension(EXTENSION_BINARY_PROTOCOL, &descriptor);

The descriptor for protocol extensions is pretty simple:

static EXTENSION_BINARY_PROTOCOL_DESCRIPTOR descriptor = {
    .get_name = get_name,
    .setup = setup
};

get_name is a function that just returns the name of the extension, whereas setup is the interesting function:


static void setup(void (*add)(EXTENSION_BINARY_PROTOCOL_DESCRIPTOR *descriptor,
                              uint8_t cmd,
                              BINARY_COMMAND_CALLBACK new_handler))
{
    add(&descriptor, read_command, handle_fragment_rw);
    add(&descriptor, write_command, handle_fragment_rw);
}

So what is happening here? The function is called with a single parameter, which is itself a function you may call in order to register handlers for various command codes. The first parameter you should pass to this callback is the descriptor you registered. The second parameter is the command opcode you want to install a handler for, and the third parameter is the function that implements the logic.
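One way to picture what the add callback does on the server side is a table with one slot per opcode. The following is a self-contained sketch of that idea with made-up names (handler_fn, add_handler, dispatch) and a heavily simplified handler signature; it is not memcached's internal code, and refusing to overwrite an occupied slot is a choice made for this sketch only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified handler type. The real callback also receives
 * the descriptor, engine handle, cookie, request and response callback. */
typedef int (*handler_fn)(uint8_t opcode);

/* One slot per possible opcode (the opcode is a single byte). */
static handler_fn handlers[256];

/* Stand-in for the `add` callback: install a handler for an opcode.
 * Returns -1 if the slot is already taken, so conflicts become visible. */
static int add_handler(uint8_t cmd, handler_fn fn)
{
    if (fn == NULL || handlers[cmd] != NULL) {
        return -1;
    }
    handlers[cmd] = fn;
    return 0;
}

/* Look up and invoke the handler for an incoming opcode.
 * Returns -1 for opcodes nobody registered. */
static int dispatch(uint8_t opcode)
{
    return handlers[opcode] != NULL ? handlers[opcode](opcode) : -1;
}

/* Example handler shared by two opcodes, the way handle_fragment_rw
 * serves both the read and the write command. */
static int echo_opcode(uint8_t opcode)
{
    return opcode;
}
```

Registering echo_opcode for two arbitrary opcodes (say 0xc4 and 0xc5) and then calling dispatch with either of them ends up in the same function, which can tell the two commands apart by looking at the opcode.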

So let's go ahead and look at how the function looks:

static ENGINE_ERROR_CODE handle_fragment_rw(EXTENSION_BINARY_PROTOCOL_DESCRIPTOR *descriptor,
                                            ENGINE_HANDLE* handle,
                                            const void* cookie,
                                            protocol_binary_request_header *request,
                                            ADD_RESPONSE response)

The descriptor is the descriptor you registered, and the handle is the handle to the engine keeping all of the data (so that you may interact with the backing store). The cookie is a reference to the client who connected to you. The request is the complete packet the client sent, and response is a callback function you may call to send data back to the client. If you look at the implementation of handle_fragment_rw you'll see that it's not that hard to interact with the engine.
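Stripped of the engine interaction, the heart of a fragment read is a bounds-checked copy out of the stored value. Here is a rough sketch of just that step; read_fragment is a hypothetical helper that takes a plain buffer, whereas a real handler would fetch the item through the engine handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `length` bytes starting at `offset` out of a
 * stored value into `out`. Returns 0 on success, or -1 if the requested
 * range does not lie completely inside the value. */
static int read_fragment(const uint8_t *value, size_t value_len,
                         uint32_t offset, uint32_t length,
                         uint8_t *out)
{
    /* Phrased this way so that offset + length can never overflow. */
    if (offset > value_len || length > value_len - offset) {
        return -1;
    }
    memcpy(out, value + offset, length);
    return 0;
}
```

The important part is that a client-supplied offset and length are never trusted: a range that runs past the end of the item has to be turned into an error response rather than a read outside the buffer.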


The only thing left for us now is to create client functionality to connect to a server and send such a packet :) I leave that up to you folks for the moment (or if people would like me to post it here I'd be happy to do so. Let me know).
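As a starting point, here is a rough sketch of how such a request packet could be assembled. The 24-byte header follows the memcached binary protocol, but everything specific to the fragment command is an assumption: the sketch sends the offset and length as 8 bytes of extras in network byte order, and build_fragment_read is a hypothetical name. Check fragment_rw.h for the layout the module really expects. The vbucket field is left at 0 here, which a real Couchbase client would have to fill in properly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_LEN 24 /* size of a binary protocol request header */

/* Store 16 and 32 bit values in network byte order (big endian). */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: build a "read fragment" request into buf.
 * ASSUMPTION: offset and length travel as 8 bytes of extras.
 * Returns the packet size, or 0 if the key or buffer does not fit. */
static size_t build_fragment_read(uint8_t *buf, size_t buf_len, uint8_t opcode,
                                  const char *key, uint32_t offset,
                                  uint32_t length)
{
    size_t key_len = strlen(key);
    size_t total = HEADER_LEN + 8 + key_len;

    if (key_len > UINT16_MAX || total > buf_len) {
        return 0;
    }

    memset(buf, 0, HEADER_LEN);           /* datatype, vbucket, opaque, cas = 0 */
    buf[0] = 0x80;                        /* magic: request */
    buf[1] = opcode;                      /* the opcode configured with r= */
    put_u16(buf + 2, (uint16_t)key_len);  /* key length */
    buf[4] = 8;                           /* extras length */
    put_u32(buf + 8, (uint32_t)(8 + key_len)); /* total body length */

    put_u32(buf + HEADER_LEN, offset);    /* extras: offset ... */
    put_u32(buf + HEADER_LEN + 4, length);/* ... and length (assumed layout) */
    memcpy(buf + HEADER_LEN + 8, key, key_len);
    return total;
}
```

The resulting buffer can be written to a socket connected to the memcached port, after which the client reads back a 24-byte response header followed by the body.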

Happy hacking!

