Wednesday, October 5, 2011

libcouchbase - Explore the full features of your Couchbase server from C

I started the implementation of libcouchbase almost a year ago out of my personal need to test the inner workings of Couchbase. Since I work in the "core" of Couchbase I needed an easy way to test the changes I made there. At that time there were only clients for Java, Python and C# with support for the REST interface to Couchbase. I'm no fan of Python, and my Solaris machine doesn't support C#, so Java was my only option. I do like Java a lot, so that was my language of choice initially when I had to test something. Unfortunately I find it a bit of an extra hassle having to deal with multiple languages (and I can't link the test code into the core to test it "from the inside" ;-)). Given that, I set off trying to come up with an alternative.

Over the past few years I have put some effort into libmemcached, so the first thing I did was look at what it would take to add support for "smart client" behavior to libmemcached. If that were possible I would have a full-featured library that people maintained. Unfortunately it turned out that it wasn't straightforward to add what I wanted due to the way the library works. libmemcached initializes its list of server structures at initialization time, and maps each key to the server it belongs to. For Couchbase I would have to be able to replace the list of servers at any time. In addition to that, Couchbase uses a two-level mapping of the key: first we map the key to a vbucket, then we look up the server where that vbucket resides. At the time I felt that I would have to change too much of the inner workings of libmemcached to get it to work for Couchbase (and given the fact that libmemcached isn't a vendor-specific library I felt it was the wrong thing to do). Since I didn't want to sit on a "ticking bomb" when it comes to merging my private "patch" to libmemcached with the upstream changes, I decided to just write something that would fulfill my needs. I think it's worth mentioning that libcouchbase isn't meant to be a replacement for libmemcached, but an alternative for those of you who want or need to explore the full features of Couchbase.

The "quick'n'dirty" solution would have been to just hardcode the stuff I needed into various test programs and use copy'n'paste until I met the grim reaper, but that wouldn't have been particularly innovative (or fun). Instead I sat down and defined some criteria for the library:

  • It must be asynchronous
    I believe that all libraries should provide an asynchronous interface, and if the library provides a synchronous interface it should be built on top of the asynchronous interface and wait for the completion.
  • No internal locking
    Using locks internally may lead to lock contention. If I can avoid locking inside the library it will potentially scale better. It is left entirely up to the client of the library to protect the variables from access by multiple threads.
  • It has to be cross platform!
    I do all of my development on Solaris, but I do respect that others may have different needs than myself. Some people prefer Mac OS, some prefer Windows and I've even heard of people using Linux. Why shouldn't all of them be able to use libcouchbase?
  • It shall not depend on a "shit load" of other modules
    It is not that I suffer from the not-invented-here syndrome, but I do find it a PITA to have to compile tons of other libraries (which may have additional dependencies etc.) just to get the library working (and to keep all of them in sync with each other as new releases come out). All library use should be justified.
  • Binary compatibility
    The binary interface should be stable and not change all the time. Coming up with a good API isn't easy, so during development the API is expected to evolve. If a client only uses a committed interface, he shouldn't have to do _anything_ except replace the shared object when a new version is released.
  • No GPL
    I just can't stand the GPL license.

Although I started working on libcouchbase a year ago, it doesn't mean that I have worked full time on it. My primary job is working on the core, so I've only extended the library when I've needed the functionality for my own testing. More recently I'm happy to say I've received some contributions from people who've picked it up and used it, and from some other Couchbase developers (thanks Sergey, Paul, Sebastian, Jan, and Bill!). No doubt there's something we can do better, so please drop an email to the Couchbase development mailing list (couchbase@googlegroups.com) if there's something you think needs some fixin'. I still think that the library has some rough edges that need to be sorted out before you can start using it in production ;-)

Building the software

If you're trying to build libcouchbase on Solaris / *BSD / Mac OS or Linux you can just use the "configure && make" way you're so used to. For Windows users I've written an NMakefile you may use to compile and install the various bits. Since I'm doing almost all of my development on Solaris I might have done stuff "wrong" there, but I'll be happy if you drop me an email telling me what I need to change.

I've been using Windows 7 with Microsoft Developer Studio 2010 to build and test the library. Execute the following command to build and install the bits:


nmake -f NMakefile install

By default it will use c:\local as the root directory (to make it easy for you to create an installer or move it wherever you want). You can always override this by specifying INSTALL like:

nmake -f NMakefile INSTALL=c:\couchbase install

Prerequisites

We do have some prerequisites for libcouchbase. You might find binary packages available for your platform, but it shouldn't be very hard to build from source.

1) The header files from the engine branch of memcached.

Simply copy the memcached directory from https://github.com/memcached/memcached/tree/engine-pu/include to c:\local\include (or whatever you choose as your directory)

2) A sasl implementation.

libcouchbase needs to run SASL authentication against the different buckets. If you don't want to install a full-featured SASL library you could always install "libisasl" from: https://github.com/membase/libisasl

3) libvbucket

The mapping between a key and the vbucket (and to locate which server the vbucket is located on) is provided by this library. https://github.com/membase/libvbucket
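To make the two-level lookup concrete, here is a minimal self-contained sketch. It is not the libvbucket API: the struct, the function names and the FNV-1a hash are all assumptions made for the illustration. The point is only that the first level depends on nothing but the key and the number of vbuckets, while the second level is a table that can be swapped when the cluster changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cluster map; NOT the libvbucket data structures. */
struct cluster_map {
    size_t num_vbuckets;        /* stays fixed for the bucket */
    size_t num_servers;
    const char *const *servers; /* "host:port" strings */
    const int *vbucket_owner;   /* vbucket id -> index into servers */
};

/* FNV-1a, used here only as a stand-in hash function. */
static uint32_t hash_key(const void *key, size_t nkey)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < nkey; ++i) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Level 1: key -> vbucket. Uses only the key and the vbucket count. */
static size_t key_to_vbucket(const struct cluster_map *map,
                             const void *key, size_t nkey)
{
    assert(map->num_vbuckets > 0);
    return hash_key(key, nkey) % map->num_vbuckets;
}

/* Level 2: vbucket -> server. This is the table that gets replaced. */
static const char *vbucket_to_server(const struct cluster_map *map,
                                     size_t vb)
{
    assert(vb < map->num_vbuckets);
    return map->servers[map->vbucket_owner[vb]];
}
```

With this split, replacing the list of servers means installing a new vbucket_owner table; a key keeps its vbucket id across the change, and only the server that owns that vbucket may differ.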

4) libevent (optional)

libcouchbase allows plugins for different event notification frameworks. The default framework for UNIX-like systems is libevent, so unless you're going to create your own plugin you might want to install this. Please note that the default for Windows is something else, so you don't need this at all on Windows.

So how does the library work?


The primary idea with the library is that everything should be event driven, and that a callback should be triggered when something happens. That means that you must set up callbacks to handle everything you want. There is no simple:

std::string myvalue = libcouchbase->get("hello");

but you can easily implement that if you want.
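To illustrate how such a blocking call can be layered on top of callbacks, here is a minimal self-contained sketch. Everything in it is a toy model: toy_handle, toy_get_async, toy_wait and toy_get_sync are hypothetical names and not part of libcouchbase, and the "server" is faked in memory. Only the pattern is the point: queue the request, let the callback park the result, and wait for completion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Everything here is a toy model; none of it is libcouchbase API. */
typedef void (*toy_get_callback)(void *cookie, const char *key,
                                 const char *value);

struct toy_handle {
    /* At most one queued request, to keep the sketch short. */
    const char *pending_key;
    toy_get_callback callback;
    void *cookie;
};

/* Asynchronous get: only queues the request and returns immediately. */
static void toy_get_async(struct toy_handle *h, const char *key,
                          toy_get_callback cb, void *cookie)
{
    assert(h->pending_key == NULL);
    h->pending_key = key;
    h->callback = cb;
    h->cookie = cookie;
}

/* Stand-in for the event loop: completes the queued request, if any. */
static void toy_wait(struct toy_handle *h)
{
    if (h->pending_key != NULL) {
        const char *key = h->pending_key;
        /* A fake "server" that only knows one key. */
        const char *value = strcmp(key, "hello") == 0 ? "world" : NULL;
        h->pending_key = NULL;
        h->callback(h->cookie, key, value);
    }
}

/* The callback just parks the result where the caller can find it. */
static void store_result(void *cookie, const char *key, const char *value)
{
    (void)key;
    *(const char **)cookie = value;
}

/* Synchronous get, layered on top of the asynchronous one. */
static const char *toy_get_sync(struct toy_handle *h, const char *key)
{
    const char *result = NULL;
    toy_get_async(h, key, store_result, &result);
    toy_wait(h); /* returns after the callback has run */
    return result;
}
```

The cookie is what ties the two halves together: the caller passes the address of a local variable, and the callback writes the value there before the wait returns.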

Given the fact that there is no locking within libcouchbase you may think that it's not suited for use in a multithreaded process, but that's not true. As long as you don't use the same handle to libcouchbase from multiple threads you can use as many threads as you want (if you want to use the same libcouchbase instance from multiple threads you need to provide the locking yourself).

Enough talk, show me the code!

All you need to do in your program to start using libcouchbase is to include libcouchbase/couchbase.h and link with libcouchbase. The first thing you need to do is to create a libcouchbase instance:

#include <libcouchbase/couchbase.h>
...

const char *host = NULL; /* Use localhost:8091 */
const char *username = NULL; /* No user specified */
const char *password = NULL; /* No password specified */
const char *bucket = NULL; /* use default bucket */
struct libcouchbase_io_opt_st *io = NULL; /* Use default io options */

libcouchbase_t handle = libcouchbase_create(host, username, password,
                                            bucket, io);

if (handle == NULL) {
   /* Failed to create the handle */
}

The code fragment above does nothing more than allocate the libcouchbase handle; it does not try to connect to the server to receive the list of servers etc. The username/password combination here will be used to authenticate to the REST server listening on the host port. With the handle in place we should set up the first callback: the error handler. Let's create a simple error callback that prints out the error and terminates the application:

static void error_callback(libcouchbase_t instance,
                           libcouchbase_error_t error,
                           const char *errinfo)
{
    fprintf(stderr, "%s", libcouchbase_strerror(instance, error));
    if (errinfo) {
        fprintf(stderr, ": %s", errinfo);
    }
    fprintf(stderr, "\n");
    exit(EXIT_FAILURE);
}

The callback is installed with:

libcouchbase_set_error_callback(handle, error_callback);

Now that we've got our error callback installed, we can start connecting to the server and receive the list of servers. Since everything is asynchronous we need to wait for the connect to complete (I'm not going to show you how to use the library in a shared event loop in this example).

libcouchbase_connect(handle);
// Wait for the connect to complete
libcouchbase_wait(handle);

At this point we've got a "working" libcouchbase instance we may use. So let's go ahead and store some items in the cache. If we don't care about the response message from the server we don't need to set up a callback, but to make the example more complete let's create a callback that terminates the program if we fail to store one of the items:

static void storage_callback(libcouchbase_t instance,
                             const void *cookie,
                             libcouchbase_storage_t operation,
                             libcouchbase_error_t error,
                             const void *key, size_t nkey,
                             uint64_t cas)
{
    if (error != LIBCOUCHBASE_SUCCESS) {
        fprintf(stderr, "Failed to store \"");
        fwrite(key, nkey, 1, stderr);
        fprintf(stderr, "\"\n");
        exit(EXIT_FAILURE);
    }
}

I don't have a good example of what we want to store, so let's just loop and store some numbers:

libcouchbase_set_storage_callback(handle, storage_callback);

for (int ii = 0; ii < 10; ++ii) {
    char key[80];
    size_t nkey = sprintf(key, "%d", ii);
    libcouchbase_store(handle, NULL, LIBCOUCHBASE_SET,
                       key, nkey, &ii, sizeof(ii), 0, 0, 0);
}
/* Wait for all of them to complete */
libcouchbase_wait(handle);
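One thing to be aware of in the loop above: passing &ii with sizeof(ii) stores the in-memory bytes of the int, so the stored value depends on the byte order and int size of the machine that wrote it. If the items are going to be read from another platform, one option is to encode the number explicitly first. The helpers below are a hypothetical sketch (the names are made up for this example and are not part of libcouchbase); the resulting four bytes could be passed to libcouchbase_store in place of &ii.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v as four big-endian bytes; out must have room for 4 bytes. */
static size_t encode_u32_be(uint32_t v, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
    return 4;
}

/* Decode four big-endian bytes back into a number. */
static uint32_t decode_u32_be(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```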

Timings

One of the things I find cool about libcouchbase is the ability to get timing statistics about the current traffic. Everyone familiar with DTrace loves the ability to dump a histogram representing whatever you decided to measure. A lot of the time when you're running your stuff in production you might want to look at the response times you get from your Couchbase cluster. To help you do that I've added some relatively lightweight timings you may use. Due to the asynchronous nature of libcouchbase (and the fact that you're responsible for driving the event loop) you may have a large effect on the timings, so that they no longer represent the truth. Anyway, let's add an example that uses them to create a histogram of the store section above (but instead of running all of them in a single batch, we use a synchronous set):

libcouchbase_enable_timings(handle);
for (int ii = 0; ii < 10; ++ii) {
    char key[80];
    size_t nkey = sprintf(key, "%d", ii);
    libcouchbase_store(handle, NULL, LIBCOUCHBASE_SET,
                       key, nkey, &ii, sizeof(ii), 0, 0, 0);
    libcouchbase_wait(handle);
}

/* Get the current timings */
libcouchbase_get_timings(handle, stdout, timings_callback);

/* Stop collecting timing information */
libcouchbase_disable_timings(handle);

So what does this "timings_callback" look like? That's completely up to you, but we could create a simple histogram with the following code:

static void timings_callback(libcouchbase_t instance, const void *cookie,
                            libcouchbase_timeunit_t timeunit,
                            uint32_t min, uint32_t max,
                            uint32_t total, uint32_t maxtotal)
{
    char buffer[1024];
    int offset = sprintf(buffer, "[%3u - %3u]", min, max);
    switch (timeunit) {
    case LIBCOUCHBASE_TIMEUNIT_NSEC:
        offset += sprintf(buffer + offset, "ns");
        break;
    case LIBCOUCHBASE_TIMEUNIT_USEC:
        offset += sprintf(buffer + offset, "us");
        break;
    case LIBCOUCHBASE_TIMEUNIT_MSEC:
        offset += sprintf(buffer + offset, "ms");
        break;
    case LIBCOUCHBASE_TIMEUNIT_SEC:
        offset += sprintf(buffer + offset, "s");
        break;
    default:
        ;
    }

    int num = (float)40.0 * (float)total / (float)maxtotal;
    offset += sprintf(buffer + offset, " |");
    for (int ii = 0; ii < num; ++ii) {
        offset += sprintf(buffer + offset, "#");
    }

    offset += sprintf(buffer + offset, " - %u\n", total);
    fputs(buffer, (FILE*)cookie);
}

This would generate something like:

[140 - 149]us |# - 2
[150 - 159]us |## - 3
[160 - 169]us |######################################## - 47
[170 - 179]us |###################### - 26
[180 - 189]us |########### - 14
[190 - 199]us |# - 2
[210 - 219]us | - 1
[220 - 229]us | - 1
[230 - 239]us | - 1
[250 - 259]us | - 1
[280 - 289]us | - 1
[400 - 409]us | - 1
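The length of each bar comes from scaling the bucket's count against the largest bucket, using 40 columns for the fullest one. The helper below isolates that step as a small sketch; bar_width and BAR_COLUMNS are names invented for this example, and it uses integer arithmetic instead of the float casts in the callback, which gives the same truncated result for the counts shown here.

```c
#include <assert.h>
#include <stdint.h>

enum { BAR_COLUMNS = 40 }; /* width of the fullest bar */

/* Number of '#' characters for a bucket with `total` samples. */
static unsigned int bar_width(uint32_t total, uint32_t maxtotal)
{
    /* maxtotal is the count of the fullest bucket, so it is never 0 here */
    assert(maxtotal > 0 && total <= maxtotal);
    return (unsigned int)((uint64_t)BAR_COLUMNS * total / maxtotal);
}
```

In the sample output the fullest bucket holds 47 samples, so a bucket with 3 samples gets 40 * 3 / 47 = 2 columns, and a bucket with a single sample rounds down to no columns at all.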

Source code

I guess you want to try the program yourself :)



/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
#include <stdio.h>
#include <stdlib.h>
#include <libcouchbase/couchbase.h>

static void error_callback(libcouchbase_t instance,
                           libcouchbase_error_t error,
                           const char *errinfo)
{
   fprintf(stderr, "%s", libcouchbase_strerror(instance, error));
   if (errinfo) {
      fprintf(stderr, ": %s", errinfo);
   }
   fprintf(stderr, "\n");
   exit(EXIT_FAILURE);

}

static void storage_callback(libcouchbase_t instance,
                             const void *cookie,
                             libcouchbase_storage_t operation,
                             libcouchbase_error_t error,
                             const void *key, size_t nkey,
                             uint64_t cas)
{
   if (error != LIBCOUCHBASE_SUCCESS) {
      fprintf(stderr, "Failed to store \"");
      fwrite(key, nkey, 1, stderr);
      fprintf(stderr, "\"\n");
      exit(EXIT_FAILURE);
   }
}

static void timings_callback(libcouchbase_t instance, const void *cookie,
                             libcouchbase_timeunit_t timeunit,
                             uint32_t min, uint32_t max,
                             uint32_t total, uint32_t maxtotal)
{
   char buffer[1024];
   int offset = sprintf(buffer, "[%3u - %3u]", min, max);
   switch (timeunit) {
   case LIBCOUCHBASE_TIMEUNIT_NSEC:
      offset += sprintf(buffer + offset, "ns");
      break;
   case LIBCOUCHBASE_TIMEUNIT_USEC:
      offset += sprintf(buffer + offset, "us");
      break;
   case LIBCOUCHBASE_TIMEUNIT_MSEC:
      offset += sprintf(buffer + offset, "ms");
      break;
   case LIBCOUCHBASE_TIMEUNIT_SEC:
      offset += sprintf(buffer + offset, "s");
      break;
   default:
      ;
   }

   int num = (float)40.0 * (float)total / (float)maxtotal;
   offset += sprintf(buffer + offset, " |");
   for (int ii = 0; ii < num; ++ii) {
      offset += sprintf(buffer + offset, "#");
   }

   offset += sprintf(buffer + offset, " - %u\n", total);
   fputs(buffer, (FILE*)cookie);
}

int main(int argc, char **argv)
{
   const char *host = NULL; /* Use localhost:8091 */
   const char *username = NULL; /* No user specified */
   const char *password = NULL; /* No password specified */
   const char *bucket = NULL; /* use default bucket */
   struct libcouchbase_io_opt_st *io = NULL; /* Use default io options */

   libcouchbase_t handle = libcouchbase_create(host, username, password,
                                               bucket, io);

   if (handle == NULL) {
      /* Failed to create the handle */
      fprintf(stderr, "Failed to create instance\n");
      exit(EXIT_FAILURE);
   }
   libcouchbase_set_error_callback(handle, error_callback);
   libcouchbase_connect(handle);
   // Wait for the connect to complete
   libcouchbase_wait(handle);
   libcouchbase_set_storage_callback(handle, storage_callback);
   libcouchbase_enable_timings(handle);
   for (int ii = 0; ii < 100; ++ii) {
      char key[80];
      size_t nkey = sprintf(key, "%d", ii);
      libcouchbase_store(handle, NULL, LIBCOUCHBASE_SET,
                         key, nkey, &ii, sizeof(ii), 0, 0, 0);
      libcouchbase_wait(handle);
   }

   /* Get the current timings */
   libcouchbase_get_timings(handle, stdout, timings_callback);

   /* Stop collecting timing information */
   libcouchbase_disable_timings(handle);
   libcouchbase_destroy(handle);

   return 0;
}
 

Friday, January 14, 2011

SASL client library...

There is a lot of software out there that allows you to plug in SASL authentication if you have a SASL client library on your system. Some operating systems allow an easy download of a SASL library for their platform, but I have not seen any for Windows yet.


Membase supports SASL authentication, so when I started to implement libmembase I decided that I wanted to treat libsasl as a required dependency. One of my goals with libmembase is that it should be easy to compile as a dll for Windows (I'm not there yet), so I needed a libsasl dll for Windows.


Earlier today I fixed and pushed a (client side) SASL library to https://github.com/membase/libisasl/. If you're running on a Unix-like system you should build the library by using autotools, but I've added a Makefile you may use to build the dll on Windows with the following command:

nmake -f NMakefile


There is no install-target in the Makefile, so you need to copy the header-files from "include", libsasl.dll and libsasl.lib to the desired location on your machine when you're done.

Tuesday, January 11, 2011

Developing with Membase

As a developer I need to be able to start my processes in a certain way. One way to do that may be to modify the startup code we've got in our management system, but I found it way more flexible and easy to just replace our binaries with wrapper scripts that start up our binaries.

Please note that this is something I do when I try to track down a certain bug, and not something I recommend in your production environment.

I've created my own little script that installs the wrapper script:

#! /bin/ksh

cat > /opt/membase/bin/launcher.sh <<EOF
#! /bin/ksh
logfile=/tmp/membase.log
binary=\`basename \$0\`
echo pid \$\$ : \$0 \$* >> \${logfile}
exec \${0}.bin "\$@" 2>&1 | awk "{printf(\"%d: %s\n\", \$\$, \\\$0); }" >> \${logfile}
EOF

chmod a+x /opt/membase/bin/launcher.sh

for f in memcached vbucketmigrator moxi
do
   mv /opt/membase/bin/${f}/${f} /opt/membase/bin/${f}/${f}.bin
   ln -s ../launcher.sh /opt/membase/bin/${f}/${f}
done

As an extra bonus this redirects all of the output from the processes to /tmp/membase.log, so that I can just check there for the error text instead of running browse_logs and start decoding the output there.

The above script uses the same wrapper script for all processes, but sometimes I want to add extra options to one of the processes (like enabling verbosity for vbucketmigrator). All I need to do is to replace the link with a copy of the file:

root@ubuntu:/opt/membase/bin# rm vbucketmigrator/vbucketmigrator
root@ubuntu:/opt/membase/bin# cp -p launcher.sh vbucketmigrator/vbucketmigrator

and edit the file. Since I'm going to add extra command line options, I'm most likely expecting more output so normally I store the output in its own file as well:

#! /bin/ksh
logfile=/tmp/vbucketmigrator.$$
binary=`basename $0`
echo pid $$ : $0 $* >> ${logfile}
exec ${0}.bin "$@" -vv >> ${logfile} 2>&1

The next time vbucketmigrator starts it will dump the message traffic to /tmp/vbucketmigrator.pid (where pid is the process id of the wrapper).

Monday, January 10, 2011

Dumping stats from a memcached server...

I normally dump the stats from my memcached servers with the following command:

trond@opensolaris> echo stats | nc localhost 11211

But if you start memcached with the ASCII protocol disabled it becomes hard to dump the stats. I just created a small tool named mcstat that dumps the stats in the same format as you would get from the above command.

Example:
trond@opensolaris> ./mcstat
STAT evictions 0
STAT curr_items 0
STAT total_items 0
STAT bytes 0
STAT reclaimed 0
STAT engine_maxbytes 67108864
STAT pid 15436
STAT uptime 682
STAT time 1294693339
STAT version 1.3.3_499_g580ae55
STAT libevent 1.4.13-stable
STAT pointer_size 32
STAT rusage_user 0.012261
STAT rusage_system 0.014169
STAT daemon_connections 10
STAT curr_connections 11
STAT total_connections 16
STAT connection_structures 14
STAT cmd_get 0
STAT cmd_set 0
STAT cmd_flush 0
STAT auth_cmds 0
STAT auth_errors 0
STAT get_hits 0
STAT get_misses 0
STAT delete_misses 0
STAT delete_hits 0
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT bytes_read 106
STAT bytes_written 5656
STAT limit_maxbytes 67108864
STAT rejected_conns 0
STAT threads 4
STAT conn_yields 0

By default mcstat will connect to "localhost:11211", but you may tell it to go somewhere else by using -h host.
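For the curious, this is roughly what a stats request looks like when you can't just send the word "stats". The sketch below is not taken from the mcstat source; it only builds the 24-byte request header defined by the memcached binary protocol (request magic 0x80, stat opcode 0x10, everything else zero when no key is given). The function and constant names are made up for this example. The server answers with one response packet per statistic and finishes with a packet whose key is empty, which is what the second helper checks for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    MC_HEADER_SIZE = 24,      /* fixed size of a binary protocol header */
    MC_MAGIC_REQUEST = 0x80,
    MC_MAGIC_RESPONSE = 0x81,
    MC_OPCODE_STAT = 0x10
};

/*
 * Fill buf with a STAT request that has no key, which asks for the
 * default set of statistics. Returns the length of the packet.
 */
static size_t build_stat_request(unsigned char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize >= MC_HEADER_SIZE);
    /* key length, extras length, data type, body length, opaque and
     * cas are all zero for this request */
    memset(buf, 0, MC_HEADER_SIZE);
    buf[0] = MC_MAGIC_REQUEST;
    buf[1] = MC_OPCODE_STAT;
    return MC_HEADER_SIZE;
}

/*
 * Returns nonzero if the response header marks the end of the stats,
 * i.e. a STAT response whose key length (bytes 2-3) is zero.
 */
static int is_stat_terminator(const unsigned char *hdr)
{
    assert(hdr != NULL);
    return hdr[0] == MC_MAGIC_RESPONSE && hdr[1] == MC_OPCODE_STAT &&
           hdr[2] == 0 && hdr[3] == 0;
}
```

A real tool would write the request to a connected socket and then keep reading header plus body until is_stat_terminator() says it is done, printing "STAT key value" for each packet along the way.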

You'll find the tool in my git branch of the memcached source repository.


Happy new year btw.