Tuesday, December 21, 2010

Running Membase on OpenSolaris

I've been wanting to run membase on my OpenSolaris server for a long time, but I haven't had the time to look into all of the details of getting it there. All of the command line tools already compile and work perfectly well on OpenSolaris, but I haven't been able to get the web UI and cluster management up and running. One of the bugs preventing an easy install of membase on OpenSolaris (and other unsupported platforms) is that the makefile for the admin UI doesn't have an install target (this is tracked in MB-2700).

Earlier today I cloned the on-disk layout of the web application from my Ubuntu installation to my OpenSolaris box, and with a little work I was able to get the system up and running there. Just check it out:

trond@storm> svcs membase
STATE          STIME    FMRI
online         13:37:26 svc:/application/database/membase:membase

So let's assume you've got all of the dependencies you need to build membase installed on your OpenSolaris box (I'll put together that list later on and create an IPS package you can install from my IPS server).

Due to MB-3227 we can't use the build method outlined in "Building on OSX from source" to build membase, but I've updated the build environment I blogged about a while ago so that it will build and install everything for you! So let's go ahead and build the stuff!

trond@storm> git clone git://github.com/trondn/tools.git
Initialized empty Git repository in /tmp/tools/.git/
remote: Counting objects: 168, done.
remote: Compressing objects: 100% (163/163), done.
remote: Total 168 (delta 89), reused 0 (delta 0)
Receiving objects: 100% (168/168), 157.56 KiB | 231 KiB/s, done.
Resolving deltas: 100% (89/89), done.
trond@storm> cd tools/membase/smf/membase
trond@storm> ./setup -u -s -z scratch               (scratch is the name of my zfs pool)
trond@storm> cd ../..
trond@storm> chown -R trond:staff /opt/membase
trond@storm> ./setup.sh -d /opt/membase/dev membase
Download commit hook - Ok.
Checking out libmemcached (Bazaar) - Ok.
Checking out bucket_engine (git) - Ok.
Checking out ep-engine (git) - Ok.
Checking out libconflate (git) - Ok.
Checking out libvbucket (git) - Ok.
Checking out memcached (git) - Ok.
Checking out moxi (git) - Ok.
Checking out vbucketmigrator (git) - Ok.
Checking out membase-cli (git) - Ok.
Checking out ns_server (git) - Ok.
Checking out memcachetest (git) - Ok.
Checking out mc-hammer (git) - Ok.
Checking out libmembase (git) - Ok.
Configure build for SunOS
trond@storm> cd membase
trond@storm> gmake install

Due to MB-2926 we need to change the file layout and move the files so that the management process in membase can locate the tools. In theory we shouldn't need to create links for the libraries as well (because the runtime path of the binary should pick up the correct one), but unfortunately ns_server uses the absolute path for the library.

trond@storm> cd /opt/membase/bin/dev
trond@storm> rm memcached vbucketmigrator moxi
trond@storm> mkdir memcached vbucketmigrator moxi bucket_engine ep_engine
trond@storm> cd memcached
trond@storm> ln -s ../amd64/memcached
trond@storm> ln -s ../../lib/amd64/default_engine.so
trond@storm> ln -s ../../lib/amd64/stdin_term_handler.so
trond@storm> cd ../vbucketmigrator
trond@storm> ln -s ../amd64/vbucketmigrator
trond@storm> cd ../moxi
trond@storm> ln -s ../amd64/moxi
trond@storm> cd ../bucket_engine
trond@storm> ln -s ../../lib/amd64/bucket_engine.so
trond@storm> cd ../ep_engine
trond@storm> ln -s ../../lib/amd64/ep.so
trond@storm> chown -R membase:membase /opt/membase

So let's go ahead and start the service by running:

trond@storm> svcadm enable membase

Then navigate your browser to http://localhost:8091/ and start configuring your server. I was able to have my OpenSolaris server join my Ubuntu cluster.

Happy holidays!

Friday, December 17, 2010

Building libmembase

I guess I was a bit too excited yesterday when I announced libmembase, because I completely forgot to mention where to get it and how to build it.

You will find the code at https://github.com/trondn/libmembase, and building it should be pretty easy once you've installed all of the required dependencies. I've tried to keep the list of dependencies as short as possible (so you might think I'm suffering from the "not invented here" syndrome since I implemented the REST support myself instead of using cURL, but the main reason for that was to keep the list short ;-)):
  • git
  • Auto tools (automake, autoheader, autoconf, libtoolize)
  • A C99 compiler
  • libvbucket
  • libevent
  • SASL library

I have integrated the building of libmembase into the build system I'm using to build membase, making it easy for you to get the code and run configure with the correct set of options (see this blog entry for a description).

If you want to build it yourself all you need is:

$ git clone git://github.com/trondn/libmembase.git
$ cd libmembase
$ ./config/autorun.sh
$ ./configure --prefix=/opt/membase
$ make all install

Thursday, December 16, 2010

libmembase - a C interface to Membase

Membase is "on the wire" compatible with any memcached server if you connect to the standard memcached port (registered by myself back in 2009), so you should be able to access membase with any "memcapable" client. Backing this port is our membase proxy named moxi, and behind the scenes it will do SASL authentication and proxy your requests to the correct membase server containing the item you want. One of the things that distinguishes Membase from Memcached is that we store each item in a given vbucket that is mapped to a server. When you grow or shrink the cluster, membase will move the vbuckets to new servers.
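Just to make the vbucket idea concrete: the client hashes the key, reduces the hash modulo the number of vbuckets, and looks the vbucket up in a table that maps vbuckets to servers. Below is a minimal sketch of that lookup; the hash function and table layout are made up for illustration (libvbucket is the real implementation):

#include <stddef.h>
#include <stdint.h>

/* Toy hash function, standing in for whatever the client library uses. */
static uint32_t toy_hash(const void *key, size_t nkey)
{
   const uint8_t *p = key;
   uint32_t h = 5381;
   for (size_t i = 0; i < nkey; ++i) {
      h = (h << 5) + h + p[i];
   }
   return h;
}

/* Map a key to its vbucket, and the vbucket to the server currently owning it. */
static int server_for_key(const void *key, size_t nkey,
                          const int *vbucket_to_server, uint32_t num_vbuckets)
{
   uint32_t vbucket = toy_hash(key, nkey) % num_vbuckets;
   return vbucket_to_server[vbucket];
}

When the cluster grows or shrinks, only the vbucket-to-server table changes; the key-to-vbucket mapping stays the same, which is what makes moving vbuckets around relatively painless for the clients.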

There is no such thing as a free lunch, so accessing membase through moxi "costs" more than talking directly to the individual nodes yourself. We like to refer to clients that talk directly to the nodes as "smart clients". As a developer on Memcached I need to test various stuff, so I went ahead and hacked together a quick prototype of such a library to ease my testing. Initially I wanted to extend libmemcached with this functionality, but that seemed to be a big (and risky) change I didn't have the guts to do at the time.

The current state of the library is far from production quality, with a minimal list of supported features. So why announce it now? Well, I don't think I'll find the time to implement everything myself, so I'm hoping that people will join me in adding features to the library when they need something that isn't there...

I've designed the library to be 100% callback based and integrated with libevent, making it easy for you to plug it into your application.

So let's say you want to create a TAP stream and listen to all of the modifications that happens in your cluster. All you need to do would be:

struct event_base *evbase = event_init();

libmembase_t instance = libmembase_create(host, username, passwd, bucket, evbase);
libmembase_connect(instance);

libmembase_tap_filter_t filter;
libmembase_callback_t callbacks = {
   .tap_mutation = tap_mutation
};
libmembase_set_callbacks(instance, &callbacks);
libmembase_tap_cluster(instance, filter, true);

Then you would implement the tap callback function as:

static void tap_mutation(libmembase_t instance, const void *key, size_t nkey, const void *data, size_t nbytes, uint32_t flags, uint32_t exp, const void *es, size_t nes)
{
   // Do whatever you want with the object
}


And that's all you need to do to tap your entire cluster :-) Let's extend the example to tap multiple buckets from the same code.

struct event_base *evbase = event_init();

libmembase_t instance1 = libmembase_create(host, username, passwd, bucket1, evbase);
libmembase_t instance2 = libmembase_create(host, username, passwd, bucket2, evbase);
libmembase_connect(instance1);
libmembase_connect(instance2);

libmembase_tap_filter_t filter;
libmembase_callback_t callbacks = {
   .tap_mutation = tap_mutation
};
libmembase_set_callbacks(instance1, &callbacks);
libmembase_set_callbacks(instance2, &callbacks);
libmembase_tap_cluster(instance1, filter, false);
libmembase_tap_cluster(instance2, filter, false);

event_base_loop(evbase, 0);

The instance handle is passed to the callback function so you should be able to tell which bucket each mutation event belongs to.
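For example, if the two handles above are kept somewhere the callback can see them, it can simply compare the handle it is given. A sketch only; the globals and the printout are just for illustration:

#include <stdio.h>

/* Set from main() before the event loop starts (illustration only). */
static libmembase_t instance1;
static libmembase_t instance2;

static void tap_mutation(libmembase_t instance, const void *key, size_t nkey,
                         const void *data, size_t nbytes, uint32_t flags,
                         uint32_t exp, const void *es, size_t nes)
{
   const char *bucket = (instance == instance1) ? "bucket1" : "bucket2";
   fprintf(stderr, "mutation in %s: %.*s\n", bucket, (int)nkey, (const char *)key);
}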

As I said, all of the functions in the API are callback based, so if you want to retrieve an object you have to register a callback for get before calling libmembase_mget. Ex:
libmembase_callback_t callbacks = {
   .get = get_callback
};
libmembase_set_callbacks(instance, &callbacks);
libmembase_mget(instance, num_keys, (const void * const *)keys, nkey);

// If you don't want to run your own event loop, you can call the following method,
// which will run all spooled commands and wait for their replies before breaking out
// of the event loop
libmembase_execute(instance);

The signature for the get callback looks like:
void get_callback(libmembase_t instance, libmembase_error_t error, const void *key, size_t nkey, const void *bytes, size_t nbytes, uint32_t flags, uint64_t cas)
{
   // do whatever you want...
}
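Putting the two snippets together, a complete request could look like the following. This is a sketch; the key names are made up, and I'm assuming the instance was created and connected the same way as in the TAP example above:

const char *keys[] = { "key1", "key2", "key3" };
size_t nkey[] = { 4, 4, 4 };
size_t num_keys = sizeof(keys) / sizeof(keys[0]);

libmembase_callback_t callbacks = {
   .get = get_callback
};
libmembase_set_callbacks(instance, &callbacks);
libmembase_mget(instance, num_keys, (const void * const *)keys, nkey);

/* Run the spooled commands and wait for the replies */
libmembase_execute(instance);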

So what is missing from the library right now?
  • Proper error handling. Right now I'm using asserts and abort() to handle error situations, causing your application to crash... you don't want that in production ;-)
  • Timeouts. Right now it will only time out on TCP timeouts.
  • A lot of operations! I'm only supporting get/add/replace/set...
  • Fetch replicas..
  • Gracefully handle changes in the vbucket list
  • +++

Do you feel like hacking on some of them?

Saturday, October 30, 2010

Running Moxi on Solaris

I have been working on getting membase up'n'running on OpenSolaris as a side project. Most of it is already in place, but there are still some Makefile issues to sort out. I thought that while we're waiting for that task to be completed, I could show you how to easily run moxi as a service controlled by SMF.

I've created some scripts to make it easier for you to build and install everything, so the first thing we need to do is to check out (or update your clone of) my tools repository:

trond@opensolaris> git clone git://github.com/trondn/tools.git
trond@opensolaris> cd tools/membase

Next up we need to create some new ZFS datasets for our moxi installation. I've created a script that creates the zfs datasets and sets up the mountpoints:

trond@opensolaris> ./smf/moxi/setup.sh -u -z rpool

The -u option tells the script to create authorizations, profiles, users and groups we need, and the -z option tells the script to create the zfs filesystems in the zfs pool named rpool.

Next up we need to compile (and install) the source code. The directory /opt/membase is not writable for us, so let's change the ownership so I can install files there...:

trond@opensolaris> pfexec chown trond:staff /opt/membase
trond@opensolaris> ./setup.sh -d /opt/membase moxi
Download commit hook - Ok.
Checking out libmemcached (Bazaar) - Ok.
Checking out bucket_engine (git) - Ok.
Checking out ep-engine (git) - Ok.
Checking out libconflate (git) - Ok.
Checking out libvbucket (git) - Ok.
Checking out memcached (git) - Ok.
Checking out moxi (git) - Ok.
Checking out vbucketmigrator (git) - Ok.
Checking out membase-cli (git) - Ok.
Checking out ns_server (git) - Ok.
Checking out memcachetest (git) - Ok.
Configure build for SunOS
trond@opensolaris> cd moxi/SunOS
trond@opensolaris> make all install

Now we've got everything installed to /opt/membase, so let's change the ownership to membase:membase and install the SMF script to manage moxi:

trond@opensolaris> chown -R membase:membase /opt/membase
trond@opensolaris> cd ../../smf/moxi
trond@opensolaris> ./setup.sh -s
moxi installed as /lib/svc/method/moxi
moxi.xml installed as /var/svc/manifest/application/moxi.xml

So let's check out the configuration options we got for our new SMF service:

trond@opensolaris> svccfg
svc:> select moxi
svc:/application/database/moxi> listprop
manifestfiles                                        framework
manifestfiles/var_svc_manifest_application_moxi_xml  astring  /var/svc/manifest/application/moxi.xml
general                                              framework
general/action_authorization                         astring  solaris.smf.manage.moxi
general/entity_stability                             astring  Unstable
general/single_instance                              boolean  true
general/value_authorization                          astring  solaris.smf.value.moxi
multi-user-server                                    dependency
multi-user-server/entities                           fmri     svc:/milestone/multi-user-server
multi-user-server/grouping                           astring  require_all
multi-user-server/restart_on                         astring  none
multi-user-server/type                               astring  service
moxi                                                 application
moxi/corepattern                                     astring  /var/opt/membase/cores/core.%f.%p
moxi/downstream_max                                  astring  8
moxi/port                                            astring  11211
moxi/threads                                         astring  4
moxi/url                                             astring  http://membase:8091/pools/default/bucketStreaming/default
moxi/version                                         astring  1.6.0
tm_common_name                                       template
tm_common_name/C                                     ustring  Membase
tm_man_moxi                                          template
tm_man_moxi/manpath                                  astring  /opt/membase/share/man
tm_man_moxi/section                                  astring  1
tm_man_moxi/title                                    astring  moxi

You will most likely want to set the URL parameter to point to the bucket you want to use:

svc:/application/database/moxi> setprop moxi/url=http://myserver:8091/pools/default/bucketStreaming/default

Let's refresh the configuration and start the service:

trond@opensolaris> svccfg refresh moxi
trond@opensolaris> svcadm enable moxi
trond@opensolaris> svcs moxi
STATE          STIME    FMRI
online          9:45:41 svc:/application/database/moxi:moxi

Installing Python scripts from automake, fixup :)

In my previous blog post I added a wrapper script to start the Python script, but it turns out that this script doesn't work unless you pass --libdir=something to configure. I didn't catch that originally because I always specify the library directory, due to the fact that I'm building both 32-bit and 64-bit binaries on my Solaris machine.

The following script should address the problem:

#! /bin/sh
prefix=@prefix@
exec_prefix=@exec_prefix@
root=@libdir@/python

if test -z "${PYTHONPATH}"; then
   PYTHONPATH=$root
else
   PYTHONPATH=$root:${PYTHONPATH}
fi
export PYTHONPATH
exec $root/`basename $0`.py "$@"

Thursday, October 28, 2010

Installing Python scripts from automake...

I've been working on making it easier for developers to compile and install membase, and today I learned some more automake magic. I'm one of those developers who doesn't want to spend a lot of time working on the build system; I want to spend my time working on the code. At the same time I don't want to do unnecessary, boring manual work that the build system should do for me.

Parts of membase are implemented in Python, and I've been trying to figure out how to install those pieces. I don't like to mess up the /bin directory with "library" files, so I needed a way to package the Python bits better. I've used a wrapper script that sets the PYTHONPATH variable before, but I've never tried to integrate that into an automake-generated makefile.

As always I started out asking Google for help, but I didn't end up with a good and easy example, so I ended up reading through the automake manual. It turns out that it's fairly easy to do exactly what I want, so I decided to share the knowledge in a blog post :-)

We don't want to hardcode the path to our binary anywhere, so the first thing we need to do is to update configure.ac to also generate our wrapper script:

AC_CONFIG_FILES(Makefile python_wrapper)

I've got multiple programs implemented with Python, and I don't want to create a ton of wrappers, so my python_wrapper.in looks like:

#! /bin/sh
if test -z "${PYTHONPATH}"; then
   PYTHONPATH=@libdir@/python
else
   PYTHONPATH=@libdir@/python:${PYTHONPATH}
fi
export PYTHONPATH
exec @libdir@/python/`basename $0`.py "$@"

This means that if I install this script as /opt/membase/bin/stats, it will try to execute /opt/membase/lib/python/stats.py with the same arguments. So let's go ahead and add a rule to Makefile.am to generate the scripts with the correct names:

PYTHON_TOOLS=stats
${PYTHON_TOOLS}: python_wrapper
    cp $< $@

BUILT_SOURCES += ${PYTHON_TOOLS}
CLEANFILES+= ${PYTHON_TOOLS}
bin_SCRIPTS+= ${PYTHON_TOOLS}

Now we've got the wrapper script in place, and we've generated all of the scripts to start our programs. The next thing is to create the destination directory for the Python bits, and install all of them there. To do so we need to create a variable ending with "dir" that contains the name of the directory. Let's name ours "pythonlibdir" and put it in a subdirectory named python of the specified libdir:

pythonlibdir=$(libdir)/python


Finally we need to list all of the files we want to go there:

pythonlib_DATA= \
                mc_bin_client.py \
                mc_bin_server.py \
                memcacheConstants.py
  
pythonlib_SCRIPTS= \
                stats.py

The reason I use pythonlib_SCRIPTS for the last one is that I want the execute bit set on the file.

Sunday, October 17, 2010

Writing your own storage engine for Memcached, part 3

Right now we've got an engine capable of handling get and set load, but it is doing synchronous filesystem IO. We can't serve a client faster than we can read the item from disk, but we might serve other connections while we're reading the item off disk.

In this entry we're going to fix our get and store methods so that they don't block in the engine API. As I've said earlier, the intention of this tutorial is to focus on the engine API. That means I'm not going to try to make an efficient design, because that could distract from what I'm trying to explain. If people are interested in how we could optimize this, I could add a second part to the tutorial in the future... Just let me know.

In order to implement asynchronous operations in our engine, we need to make use of the API the server makes available to us in create_instance. Let's extend our engine structure to keep track of the server API:

struct fs_engine {
   ENGINE_HANDLE_V1 engine;
   SERVER_HANDLE_V1 sapi;
};

...

MEMCACHED_PUBLIC_API
ENGINE_ERROR_CODE create_instance(uint64_t interface,
                                  GET_SERVER_API get_server_api,
                                  ENGINE_HANDLE **handle)
{

...

   h->engine.item_set_cas = fs_item_set_cas;
   h->engine.get_item_info = fs_get_item_info;
   h->sapi = *get_server_api();

...

      

To implement an asynchronous function in the engine interface, the engine needs to dispatch the request to another thread before it returns ENGINE_EWOULDBLOCK from the engine function. When the backend is done processing the request, it notifies the memcached core by using the notify_io_complete function in the server interface. If an error occurred while processing the request, the memcached core will return the error message to the client. If your engine called notify_io_complete with ENGINE_SUCCESS, the memcached core will call the engine interface function once more with the same arguments as the first time.

If you look at the server API, you'll see that it provides an interface for storing an engine-specific pointer per cookie. This will make our life easier when we want to implement async IO. So let's go ahead and update our fs_get method:

static ENGINE_ERROR_CODE fs_get(ENGINE_HANDLE* handle,
                                const void* cookie,
                                item** item,
                                const void* key,
                                const int nkey,
                                uint16_t vbucket)
{
   struct fs_engine *engine = (struct fs_engine *)handle;
   void *res = engine->sapi.cookie->get_engine_specific(cookie);
   if (res != NULL) {
      *item = res;
      engine->sapi.cookie->store_engine_specific(cookie, NULL);
      return ENGINE_SUCCESS;

   }

...
      

The next thing we need to do is to create a function that runs asynchronously and stores the result in the engine_specific setting for the cookie. Since we're going to use async tasks for all of the engine methods, let's go ahead and create a function to run tasks in another thread:

static ENGINE_ERROR_CODE execute(struct task *task)
{
   pthread_attr_t attr;
   pthread_t tid;

   if (pthread_attr_init(&attr) != 0 ||
       pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0 ||
       pthread_create(&tid, &attr, task->callback, task) != 0) {
      return ENGINE_FAILED;
   }

   return ENGINE_EWOULDBLOCK;
}
      

As you can see from the code, I'm creating a new thread to execute each operation. This isn't very efficient, because creating a new thread has a substantial overhead. In your design you would probably want a pool of threads to run your tasks.
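If you want to play with that, a very small fixed-size pool could look something like the sketch below. It reuses the struct task from this post, but the queue, its size and the error handling are all made up for illustration (and a real implementation needs a way to shut the workers down):

#include <pthread.h>

#define NUM_WORKERS 4
#define QUEUE_SIZE 128

static struct task *queue[QUEUE_SIZE];
static int queue_head, queue_tail;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_nonempty = PTHREAD_COND_INITIALIZER;

/* Each worker loops forever, picking tasks off the queue and running them. */
static void *worker_main(void *arg)
{
   (void)arg;
   for (;;) {
      pthread_mutex_lock(&queue_lock);
      while (queue_head == queue_tail) {
         pthread_cond_wait(&queue_nonempty, &queue_lock);
      }
      struct task *task = queue[queue_head];
      queue_head = (queue_head + 1) % QUEUE_SIZE;
      pthread_mutex_unlock(&queue_lock);
      task->callback(task);
   }
   return NULL;
}

/* Call something like this from fs_initialize() to start the workers. */
static void start_workers(void)
{
   for (int i = 0; i < NUM_WORKERS; ++i) {
      pthread_t tid;
      pthread_create(&tid, NULL, worker_main, NULL);
   }
}

/* Drop-in replacement for execute(): enqueue instead of spawning a thread. */
static ENGINE_ERROR_CODE execute(struct task *task)
{
   pthread_mutex_lock(&queue_lock);
   int next = (queue_tail + 1) % QUEUE_SIZE;
   if (next == queue_head) {
      /* Queue full; a real engine would block or grow the queue. */
      pthread_mutex_unlock(&queue_lock);
      return ENGINE_FAILED;
   }
   queue[queue_tail] = task;
   queue_tail = next;
   pthread_cond_signal(&queue_nonempty);
   pthread_mutex_unlock(&queue_lock);
   return ENGINE_EWOULDBLOCK;
}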

Either way, the callback runs with a pointer to the task as its only argument. So what does this task structure look like?

struct task {
   struct fs_engine *engine; /* Pointer to the engine */
   const void *cookie; /* The cookie requesting the operation */
   void *(*callback)(void *arg);
   union {
      struct {
         char key[PATH_MAX];
         size_t nkey;
      } get; /* Data used by the get operation */
      struct {
         item *item;
         ENGINE_STORE_OPERATION operation;
      } store; /* Data used by the store operation */
   } data;
};
      
So let's finish up fs_get:

static ENGINE_ERROR_CODE fs_get(ENGINE_HANDLE* handle,
                                const void* cookie,
                                item** item,
                                const void* key,
                                const int nkey,
                                uint16_t vbucket)
{
   struct fs_engine *engine = (struct fs_engine *)handle;
   /* Check to see if this is the callback from an earlier ewouldblock */
   void *res = engine->sapi.cookie->get_engine_specific(cookie);
   if (res != NULL) {
      *item = res;
      engine->sapi.cookie->store_engine_specific(cookie, NULL);
      return ENGINE_SUCCESS;
   }

   /* We don't support keys longer than PATH_MAX */
   if (nkey >= PATH_MAX) {
      return ENGINE_FAILED;
   }

   /* Set up the callback struct */
   struct task *task = calloc(1, sizeof(*task));
   if (task == NULL) {
      return ENGINE_ENOMEM;
   }

   task->engine = (struct fs_engine *)handle;
   task->cookie = cookie;
   task->callback = fs_get_callback;
   memcpy(task->data.get.key, key, nkey);
   task->data.get.nkey = nkey;

   ENGINE_ERROR_CODE ret = execute(task);
   if (ret != ENGINE_EWOULDBLOCK) {
      free(task);
   }
   return ret;
}
      

If you look at the code above, you'll see that we specify fs_get_callback as the function to execute. So let's go ahead and implement the callback:

static void *fs_get_callback(void *arg)
{
   struct task *task = arg;
   char *fname = task->data.get.key;
   task->data.get.key[task->data.get.nkey] = '\0';

   struct stat st;
   if (stat(fname, &st) == -1) {
      task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                    ENGINE_KEY_ENOENT);
      free(task);
      return NULL;
   }

   struct fs_item* it = NULL;
   ENGINE_ERROR_CODE ret = fs_allocate((ENGINE_HANDLE*)task->engine,
                                       task->cookie, (void**)&it,
                                       task->data.get.key,
                                       task->data.get.nkey,
                                       st.st_size, 0, 0);
   if (ret != ENGINE_SUCCESS) {
      task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                    ENGINE_ENOMEM);
      free(task);
      return NULL;
   }

   FILE *fp = fopen(fname, "r");
   if (fp == NULL) {
      fs_release((ENGINE_HANDLE*)task->engine, task->cookie, it);
      task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                    ENGINE_FAILED);
      free(task);
      return NULL;
   }

   size_t nr = fread(it->data, 1, it->ndata, fp);
   fclose(fp);
   if (nr != it->ndata) {
      fs_release((ENGINE_HANDLE*)task->engine, task->cookie, it);
      task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                    ENGINE_FAILED);
      free(task);
      return NULL;
   }

   task->engine->sapi.cookie->store_engine_specific(task->cookie, it);
   task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                 ENGINE_SUCCESS);
   return NULL;
}
      

As you see it's quite easy to add asynchronous support for the engine functions. Let's go ahead and do the same for fs_store:

static ENGINE_ERROR_CODE fs_store(ENGINE_HANDLE* handle,
                                  const void *cookie,
                                  item* item,
                                  uint64_t *cas,
                                  ENGINE_STORE_OPERATION operation,
                                  uint16_t vbucket)
{
   struct fs_engine *engine = (struct fs_engine *)handle;
   /* Check to see if this is the callback from an earlier ewouldblock */
   void *res = engine->sapi.cookie->get_engine_specific(cookie);
   if (res != NULL) {
      *cas = 0;
      engine->sapi.cookie->store_engine_specific(cookie, NULL);
      return ENGINE_SUCCESS;
   }


   /* Set up the callback struct */
   struct task *task = calloc(1, sizeof(*task));
   if (task == NULL) {
      return ENGINE_ENOMEM;
   }

   task->engine = (struct fs_engine *)handle;
   task->cookie = cookie;
   task->callback = fs_store_callback;
   task->data.store.item = item;
   task->data.store.operation = operation;

   ENGINE_ERROR_CODE ret = execute(task);
   if (ret != ENGINE_EWOULDBLOCK) {
      free(task);
   }
   return ret;
}

And fs_store_callback looks like the following:

static void *fs_store_callback(void *arg)
{
   struct task *task = arg;
   struct fs_item* it = task->data.store.item;
   char fname[it->nkey + 1];
   memcpy(fname, it->key, it->nkey);
   fname[it->nkey] = '\0';

   FILE *fp = fopen(fname, "w");
   if (fp == NULL) {
      task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                    ENGINE_NOT_STORED);
      free(task);
      return NULL;
   }

   size_t nw = fwrite(it->data, 1, it->ndata, fp);
   fclose(fp);
   if (nw != it->ndata) {
      remove(fname);
      task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                    ENGINE_NOT_STORED);
      free(task);
      return NULL;
   }

   task->engine->sapi.cookie->store_engine_specific(task->cookie, it);
   task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                 ENGINE_SUCCESS);
   return NULL;
}

If you look closely at the code above you'll see that we still don't differentiate between add/set/replace, but we'll fix that in the next installment.
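If you want a head start, one simple way to get most of the semantics right is to check whether the file already exists at the top of fs_store_callback before writing. This is only a sketch: it assumes the OPERATION_ADD / OPERATION_REPLACE values and the ENGINE_KEY_EEXISTS error code from the engine header, and it ignores the race between the check and the write:

#include <unistd.h>

/* At the top of fs_store_callback(), after building fname: */
int exists = (access(fname, F_OK) == 0);
if (task->data.store.operation == OPERATION_ADD && exists) {
   task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                 ENGINE_KEY_EEXISTS);
   free(task);
   return NULL;
}
if (task->data.store.operation == OPERATION_REPLACE && !exists) {
   task->engine->sapi.cookie->notify_io_complete(task->cookie,
                                                 ENGINE_KEY_ENOENT);
   free(task);
   return NULL;
}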

Tuesday, October 12, 2010

Building membase from the sources...

I thought I should share some information about my personal development model for membase.

I've set up a "sandbox" where I do all of my development, using the following commands:

trond@opensolaris> pfexec zfs create -o mountpoint=/source rpool/source
trond@opensolaris> pfexec chown trond:staff /source
trond@opensolaris> mkdir /source/membase
trond@opensolaris> cd /source/membase
trond@opensolaris> git clone git://github.com/trondn/tools.git
trond@opensolaris> cd tools/membase

I like to keep my changes as separated as possible, to reduce the dependencies between them. Whenever I am fixing a bug report I would do something like:

trond@opensolaris> mkdir bugnnn
trond@opensolaris> cd bugnnn
trond@opensolaris> ln -s ../Makefile
trond@opensolaris> make

That would build the entire Membase stack and put the files in /tmp/membase-build. I would then change my working directory to the module where I'm going to fix a bug and (hopefully) fix it.

After fixing the bug (and writing a test case!) I would commit the change and push it for review with the following commands:

trond@opensolaris> git add -p   (and select the changes to include)
trond@opensolaris> git commit -m "bugnnn: blah blah blah"
trond@opensolaris> git for-review

The last command there would push the change to our review system, so that Dustin or someone else can read through the diffs and accept the patch if they like it.

If you look at the workflow above it looks pretty easy, but there is one little thing that is really annoying: Membase is a cross-platform project, so I need to ensure that the code compiles and works on all of our platforms. With the method above I would have to log into another system, set everything up, and copy my change over to see that it works. For simple changes that only touch one module I could always use buildbot or Hudson to test it on all platforms, but that doesn't work if I make an interface change that affects all of our modules.

I'm kind of lazy and don't want to do such boring work all of the time, so instead I wrote a script to set up the sources and create makefiles so that I can easily build the same source tree on all of my platforms.

In order for it to work you need to set up sharing on your filesystem:

trond@opensolaris> pfexec zfs set sharenfs=on rpool/source
trond@opensolaris> pfexec zfs set sharesmb=name=source rpool/source

To set up a tree for, let's say, bug 9999 I would run something like:

trond@opensolaris> ./setup.sh bug_9999
Download commit hook - Ok.
Checking out libmemcached (Bazaar) - Ok.
  Generate configure script - Ok.
Checking out bucket_engine (git) - Ok.
Checking out ep-engine (git) - Ok.
  Generate configure script - Ok.
Checking out libconflate (git) - Ok.
  Generate configure script - Ok.
Checking out libvbucket (git) - Ok.
  Generate configure script - Ok.
Checking out memcached (git) - Ok.
  Generate configure script - Ok.
Checking out moxi (git) - Ok.
  Generate configure script - Ok.
Checking out vbucketmigrator (git) - Ok.
  Generate configure script - Ok.
Checking out membase-cli (git) - Ok.
Checking out ns_server (git) - Ok.
Checking out memcachetest (git) - Ok.
  Generate configure script - Ok.
Configure build for SunOS

This will set up a build environment for Solaris that builds membase as a "multi isa" (both 32 and 64 bit) stack in /tmp/membase. But let's add support for my MacOSX, Ubuntu and Debian boxes as well. Since all of the code is located on my OpenSolaris box, I need to use the -s option to let the script know where the source is located:

trond@opensolaris> ./setup.sh -s /net/opensolaris/source/membase/tools/membase -p Ubuntu bug_9999
Configure build for Ubuntu
trond@opensolaris> ./setup.sh -s /net/opensolaris/source/membase/tools/membase -p Darwin bug_9999
Configure build for Darwin
trond@opensolaris> ./setup.sh -s /net/opensolaris/source/membase/tools/membase -p Debian bug_9999
Configure build for Debian

So let's look inside the bug_9999 directory:

trond@opensolaris> ls -l bug_9999
total 15
drwxr-xr-x  13 trond    staff         14 Oct 12 13:35 Darwin
drwxr-xr-x  13 trond    staff         14 Oct 12 13:35 Debian
drwxr-xr-x  13 trond    staff         14 Oct 12 13:33 src
drwxr-xr-x   4 trond    staff          5 Oct 12 13:33 SunOS
drwxr-xr-x  13 trond    staff         14 Oct 12 13:35 Ubuntu

All of the sources are located in the src directory, and all of the makefiles for the various platforms will reference that code.

To build on all of my platforms I'm just executing:

trond@opensolaris> ssh ubuntu "cd /net/opensolaris/source/membase/tools/membase/bug_9999/Ubuntu && make" > ubuntu.log 2>&1 &
trond@opensolaris> ssh debian "cd /net/opensolaris/source/membase/tools/membase/bug_9999/Debian && make" > debian.log 2>&1 &
trond@opensolaris> ssh darwin "cd /net/opensolaris/source/membase/tools/membase/bug_9999/Darwin && make" > darwin.log 2>&1 &

but I've got that in a script of course:

trond@opensolaris> cat bug_9999/build.sh
#! /bin/ksh
cd SunOS && gmake > sunos.log 2>&1 &
ssh ubuntu "cd /net/opensolaris/source/membase/tools/membase/bug_9999/Ubuntu && make" > ubuntu.log 2>&1 &
ssh debian "cd /net/opensolaris/source/membase/tools/membase/bug_9999/Debian && make" > debian.log 2>&1 &
ssh darwin "cd /net/opensolaris/source/membase/tools/membase/bug_9999/Darwin && make" > darwin.log 2>&1 &
xterm -T SunOS -e tail -f sunos.log &
xterm -T Ubuntu -e tail -f ubuntu.log &
xterm -T Debian -e tail -f debian.log &
xterm -T MacOS -e tail -f darwin.log &

Unfortunately you can't start the membase we just installed in /tmp/membase, but I'm working on it!

Friday, October 8, 2010

Writing your own storage engine for Memcached, part 2

In the previous blog post I described the engine initialization and destruction. This blog post will cover the memory allocation model in the engine interface.

The memcached core is responsible for allocating all of the memory it needs for its connections (send / receive buffers etc.), and the engine is responsible for allocating (and freeing) all of the memory it needs to keep track of the items. The engine shouldn't have to care about the memory the core allocates (and uses), but the core will access the memory managed by the engine.

When the memcached core is about to store a new item it needs to get a (as of today, contiguous) buffer to store the data for the item. The core will try to allocate this buffer by calling the allocate function in the API. So let's start extending our example code by adding our own implementation of the allocate function. The first thing we need to do is to add it to the engine descriptor we return from create_instance. We're going to add a number of functions in today's entry, so let's just map all of them while we're at it:

MEMCACHED_PUBLIC_API
ENGINE_ERROR_CODE create_instance(uint64_t interface,
                                  GET_SERVER_API get_server_api,
                                  ENGINE_HANDLE **handle)
{
   [ ... cut ... ]
  /*
   * Map the API entry points to our functions that implement them.
   */
   h->engine.initialize = fs_initialize;
   h->engine.destroy = fs_destroy;
   h->engine.get_info = fs_get_info;
   h->engine.allocate = fs_allocate;
   h->engine.remove = fs_item_delete;
   h->engine.release = fs_item_release;
   h->engine.get = fs_get;
   h->engine.get_stats = fs_get_stats;
   h->engine.reset_stats = fs_reset_stats;
   h->engine.store = fs_store;
   h->engine.flush = fs_flush;
   h->engine.unknown_command = fs_unknown_command;
   h->engine.item_set_cas = fs_item_set_cas;
   h->engine.get_item_info = fs_get_item_info;
      
The next thing we need to do is to create a data structure to keep the information we need. The purpose of this tutorial isn't to create a memory efficient implementation, but to exercise the API. So let's just create the following struct:

struct fs_item {
   void *key;
   size_t nkey;
   void *data;
   size_t ndata;
   int flags;
   rel_time_t exptime;
};
      
Our implementation of allocate would then look like:

static ENGINE_ERROR_CODE fs_allocate(ENGINE_HANDLE* handle,
                                     const void* cookie,
                                     item **item,
                                     const void* key,
                                     const size_t nkey,
                                     const size_t nbytes,
                                     const int flags,
                                     const rel_time_t exptime)
{
   struct fs_item *it = malloc(sizeof(struct fs_item));
   if (it == NULL) {
      return ENGINE_ENOMEM;
   }
   it->flags = flags;
   it->exptime = exptime;
   it->nkey = nkey;
   it->ndata = nbytes;
   it->key = malloc(nkey);
   it->data = malloc(nbytes);
   if (it->key == NULL || it->data == NULL) {
      free(it->key);
      free(it->data);
      free(it);
      return ENGINE_ENOMEM;
   }
   memcpy(it->key, key, nkey);
   *item = it;
   return ENGINE_SUCCESS;
}
      
If you look at the implementation above you'll see that we didn't return the pointer to the actual memory for the data storage to the memcached core. To get that address memcached will call get_item_info in the API. So let's implement that:

static bool fs_get_item_info(ENGINE_HANDLE *handle, const void *cookie,
                             const item* item, item_info *item_info)
{
   struct fs_item* it = (struct fs_item*)item;
   if (item_info->nvalue < 1) {
      return false;
   }

   item_info->cas = 0; /* Not supported */
   item_info->clsid = 0; /* Not supported */
   item_info->exptime = it->exptime;
   item_info->flags = it->flags;
   item_info->key = it->key;
   item_info->nkey = it->nkey;
   item_info->nbytes = it->ndata; /* Total length of the items data */
   item_info->nvalue = 1; /* Number of fragments used */
   item_info->value[0].iov_base = it->data; /* pointer to fragment 1 */
   item_info->value[0].iov_len = it->ndata; /* Length of fragment 1 */

   return true;
}
      
The get_item_info function is important and deserves more attention. If you look at the engine API, the "item" is defined as a void pointer, and we defined our own item structure to keep track of the information we need on a per-item basis. The memcached core will however need to know where to read / write the memory for the key and the data going to / coming from a client. To do so it will invoke get_item_info. If you look closely at our implementation of fs_get_item_info you will see that the first thing I'm doing is to check that item_info->nvalue contains at least 1 element. Right now it will always be one, but the intention is that we're going to support scattered IO.
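To make the fragment bookkeeping a bit more concrete, this is roughly what a consumer of item_info has to do to assemble the value once nvalue can be larger than one. It's a sketch, not the actual memcached core code:

#include <string.h>

/* Copy the item's value out of the fragments described in item_info. */
static size_t copy_value(const item_info *info, char *dst, size_t dstlen)
{
   size_t copied = 0;
   for (size_t i = 0; i < info->nvalue && copied < dstlen; ++i) {
      size_t chunk = info->value[i].iov_len;
      if (chunk > dstlen - copied) {
         chunk = dstlen - copied;
      }
      memcpy(dst + copied, info->value[i].iov_base, chunk);
      copied += chunk;
   }
   return copied;
}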

When the core is done moving the data it received over the wire into the item, it will try to store the item in our engine by calling store. So let's go ahead and create a simple implementation (we'll extend it later on in the tutorial):

static ENGINE_ERROR_CODE fs_store(ENGINE_HANDLE* handle,
                                  const void *cookie,
                                  item* item,
                                  uint64_t *cas,
                                  ENGINE_STORE_OPERATION operation,
                                  uint16_t vbucket)
{
   struct fs_item* it = item;
   char fname[it->nkey + 1];
   memcpy(fname, it->key, it->nkey);
   fname[it->nkey] = '\0';
   FILE *fp = fopen(fname, "w");
   if (fp == NULL) {
      return ENGINE_NOT_STORED;
   }
   size_t nw = fwrite(it->data, 1, it->ndata, fp);
   fclose(fp);
   if (nw != it->ndata) {
      remove(fname);
      return ENGINE_NOT_STORED;
   }

   *cas = 0;
   return ENGINE_SUCCESS;
}
      
If you look at the implementation above you will see that it doesn't implement the correct semantics for add/replace/set etc., and that it will block memcached while we're doing file IO. Don't worry about that right now, because we'll get back to it.

When the core is done using an item it allocated, it will release the item by calling the release function in the API. The engine may reuse the item's storage for something else at this point. So let's hook up our release implementation:

static void fs_item_release(ENGINE_HANDLE* handle,
                            const void *cookie,
                            item* item)
{
   struct fs_item *it = item;
   free(it->key);
   free(it->data);
   free(it);
}
      
Now we've created all of the code to successfully store items in our engine, but we can't read any of them back. So let's implement get:

static ENGINE_ERROR_CODE fs_get(ENGINE_HANDLE* handle,
                                const void* cookie,
                                item** item,
                                const void* key,
                                const int nkey,
                                uint16_t vbucket)
{

   char fname[nkey + 1];
   memcpy(fname, key, nkey);
   fname[nkey] = '\0';

   struct stat st;
   if (stat(fname, &st) == -1) {
      return ENGINE_KEY_ENOENT;
   }

   struct fs_item* it = NULL;
   ENGINE_ERROR_CODE ret = fs_allocate(handle, cookie, (void**)&it, key, nkey,
                                       st.st_size, 0, 0);
   if (ret != ENGINE_SUCCESS) {
      return ENGINE_ENOMEM;
   }

   FILE *fp = fopen(fname, "r");
   if (fp == NULL) {
      fs_item_release(handle, cookie, it);
      return ENGINE_FAILED;
   }

   size_t nr = fread(it->data, 1, it->ndata, fp);
   fclose(fp);
   if (nr != it->ndata) {
      fs_item_release(handle, cookie, it);
      return ENGINE_FAILED;
   }

   *item = it;
   return ENGINE_SUCCESS;
}
      
Let's add a dummy implementation for the rest of the API and try to load and test the engine:

static const engine_info* fs_get_info(ENGINE_HANDLE* handle)
{
   static engine_info info = {
      .description = "Filesystem engine v0.1",
      .num_features = 0
   };

   return &info;
}

static ENGINE_ERROR_CODE fs_item_delete(ENGINE_HANDLE* handle,
                                        const void* cookie,
                                        const void* key,
                                        const size_t nkey,
                                        uint64_t cas,
                                        uint16_t vbucket)
{
   return ENGINE_KEY_ENOENT;
}

static ENGINE_ERROR_CODE fs_get_stats(ENGINE_HANDLE* handle,
                                      const void* cookie,
                                      const char* stat_key,
                                      int nkey,
                                      ADD_STAT add_stat)
{
   return ENGINE_SUCCESS;
}

static ENGINE_ERROR_CODE fs_flush(ENGINE_HANDLE* handle,
                                  const void* cookie, time_t when)
{

   return ENGINE_SUCCESS;
}

static void fs_reset_stats(ENGINE_HANDLE* handle, const void *cookie)
{

}

static ENGINE_ERROR_CODE fs_unknown_command(ENGINE_HANDLE* handle,
                                            const void* cookie,
                                            protocol_binary_request_header *request,
                                            ADD_RESPONSE response)
{
   return ENGINE_ENOTSUP;
}

static void fs_item_set_cas(ENGINE_HANDLE *handle, const void *cookie,
                            item* item, uint64_t val)
{
}

      
So let's go ahead and try our engine:

trond@opensolaris> /opt/memcached/bin/memcached -E .libs/fs_engine.so
      
From another terminal I'm typing in:

trond@opensolaris> telnet localhost 11211
Trying ::1...
Connected to opensolaris.
Escape character is '^]'.
add test 0 0 4
test
STORED
get test
VALUE test 0 4
test
END
quit
Connection to storm closed by foreign host.
      
Terminate memcached by pressing ctrl-c, and look in the current directory:

trond@opensolaris> ls -l test
-rw-r--r--   1 trond    users          6 Oct  8 12:56 test
trond@opensolaris> cat test
test
      
That's all for this time.

Monday, October 4, 2010

Writing your own storage engine for Memcached

I am working full time on membase, which utilizes the "engine interface" we're adding to Memcached. Being the one who designed the API and wrote the documentation, I can say that we need more (and better) documentation without insulting anyone. This blog entry will be the first in a mini-tutorial on how to write your own storage engine. I will try to cover all aspects of the engine interface while we're building an engine that stores all of the keys as files on the server.

This entry will cover the basic steps of setting up your development environment and cover the lifecycle of the engine.

Set up the development environment

The easiest way to get "up'n'running" is to install my development branch of the engine interface. Just execute the following commands:

$ git clone git://github.com/trondn/memcached.git
$ cd memcached
$ git checkout -b engine origin/engine
$ ./config/autorun.sh
$ ./configure --prefix=/opt/memcached
$ make all install
     

Lets verify that the server works by executing the following commands:

$ /opt/memcached/bin/memcached -E default_engine.so &
$ echo version | nc localhost 11211
VERSION 1.3.3_433_g82fb476     <-- you may get another output string....
$ fg
$ ctrl-C
     

Creating the filesystem engine

You might want to use autoconf to build your engine, but setting up autoconf is way beyond the scope of this tutorial. Let's just use the following Makefile instead.

ROOT=/opt/memcached
INCLUDE=-I${ROOT}/include

#CC = gcc
#CFLAGS=-std=gnu99 -g -DNDEBUG -fno-strict-aliasing -Wall \
# -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations \
# -Wredundant-decls \
# ${INCLUDE} -DHAVE_CONFIG_H
#LDFLAGS=-shared

CC=cc
CFLAGS=-I${ROOT}/include -m64 -xldscope=hidden -mt -g \
      -errfmt=error -errwarn -errshort=tags  -KPIC
LDFLAGS=-G -z defs -m64 -mt

all: .libs/fs_engine.so

install: all
 cp .libs/fs_engine.so ${ROOT}/lib

SRC = fs_engine.c
OBJS = ${SRC:%.c=.libs/%.o}

.libs/fs_engine.so: .libs $(OBJS)
 ${LINK.c} -o $@ ${OBJS}

.libs:; -@mkdir $@

.libs/%.o: %.c
 ${COMPILE.c} $< -o $@

clean:
 $(RM) .libs/fs_engine.so $(OBJS)

I do most of my development on Solaris using the Sun Studio compilers, but I have added a section with settings for gcc in case you're using gcc. Just comment out the lines for CC, CFLAGS and LDFLAGS and remove the # from the gcc alternatives.

In order for memcached to utilize your storage engine it needs to first load your module and then create an instance of the engine. You use the -E option to memcached to specify the name of the module memcached should load. With the module loaded, memcached will look for a symbol named create_instance in the module to create a handle memcached can use to communicate with the engine. This is the first function we need to create, and it should have the following signature:

MEMCACHED_PUBLIC_API
ENGINE_ERROR_CODE create_instance(uint64_t interface, GET_SERVER_API get_server_api, ENGINE_HANDLE **handle);
     

The purpose of this function is to provide the server with a handle to our module, but we should not perform any kind of initialization of our engine yet. The reason for that is that the memcached server may not support the version of the API we provide. The intention is that the server notifies the engine of the "highest" interface version it supports through interface, and the engine must return a descriptor for one of those interfaces through handle. If the engine doesn't support any of those interfaces it should return ENGINE_ENOTSUP.

So let's go ahead and define an engine descriptor for our example engine and create an implementation of create_instance:

struct fs_engine {
  ENGINE_HANDLE_V1 engine;
  /* We're going to extend this structure later on */
};

MEMCACHED_PUBLIC_API
ENGINE_ERROR_CODE create_instance(uint64_t interface,
                                 GET_SERVER_API get_server_api,
                                 ENGINE_HANDLE **handle) {
  /*
   * Verify that the interface from the server is one we support. Right now
   * there is only one interface, so we would accept all of them (and it would
   * be up to the server to refuse us... I'm adding the test here so you
   * get the picture..
   */
  if (interface == 0) {
     return ENGINE_ENOTSUP;
  }

  /*
   * Allocate memory for the engine descriptor. I'm no big fan of using
   * global variables, because that might create problems later on if
   * we later on decide to create multiple instances of the same engine.
   * Better to be on the safe side from day one...
   */
  struct fs_engine *h = calloc(1, sizeof(*h));
  if (h == NULL) {
     return ENGINE_ENOMEM;
  }

  /*
   * We're going to implement the first version of the engine API, so
   * we need to inform the memcached core what kind of structure it should
   * expect
   */
  h->engine.interface.interface = 1;

  /*
   * Map the API entry points to our functions that implement them.
   */
  h->engine.initialize = fs_initialize;
  h->engine.destroy = fs_destroy;

  /* Pass the handle back to the core */
  *handle = (ENGINE_HANDLE*)h;

  return ENGINE_SUCCESS;
}
     

If the interface we provide in create_instance is dropped from the supported interfaces in memcached, the core will call destroy() immediately. The memcached core guarantees that it will never use any pointers returned from the engine after destroy() has been called.

So let's go ahead and implement our destroy() function. If you look at our implementation of create_instance you will see that we mapped destroy() to a function named fs_destroy():

static void fs_destroy(ENGINE_HANDLE* handle) {
  /* Release the memory allocated for the engine descriptor */
  free(handle);
}
     

If the core supports the interface we specify, it will call the initialize() method. This is the time to do all sorts of initialization in your engine (like connecting to a database, initializing mutexes etc.). The initialize function is called only once per instance returned from create_instance (even if the memcached core uses multiple threads). The core will not call any other functions in the API before the initialization method returns.

We don't need any kind of initialization at this moment, so we can use the following initialization code:

static ENGINE_ERROR_CODE fs_initialize(ENGINE_HANDLE* handle,
                                      const char* config_str) {
  return ENGINE_SUCCESS;
}
     

If the engine returns anything other than ENGINE_SUCCESS, the memcached core will refuse to use the engine and call destroy().

In the next blog entry we will start adding functionality so that we can load our engine and handle commands from the client.

Tuesday, September 7, 2010

Birkebeinerrittet

I got inspired to attend the world's largest cross-country bicycle race, Birkebeinerrittet, when I saw a TV show about the race a couple of years back. Everyone who knows me knows that I'm probably as far as you can get from a top athlete (I'm a hacker ;-), so I borrowed a bike from a good friend of mine earlier this summer and started to prepare for the 96 km ride over the mountain from Rena to Lillehammer.

I had a nice trip over the mountain, but unfortunately it rained most of the time up there so I didn't get to see much of the scenery. On the bright side, I was one of the lucky ones who chose to ride the race on Friday instead of in the main event on Saturday, since it kept on raining that night and the next day, making the track even muddier (and the temperature kept dropping).

Because this was the first time I attended the race I didn't know what to expect, and I was a bit afraid I would run out of energy. After the race I felt I had more to give, so I'm really motivated to attend next year as well. There is a separate quota for foreigners who want to attend the race, so please join me next year!

Wednesday, July 28, 2010

libmemcached on win32

It's a fact that people love their development platform and want to stick with it. I'm a die-hard Solaris fan, and would never dream of switching to something else. I've heard that there is a crowd out there that likes to work on other systems like Windows, MacOSX, BSD and Linux. The fact that a developer uses a particular platform during development doesn't necessarily mean that the target product will run on that platform, just that the developer is more productive there.

People can argue as much as they want, but there is a large crowd of developers using Windows. There is also a large number of systems running some version of Windows, so enabling them to use the projects I'm working on is a good thing. Earlier today I pushed a branch that adds support for building libmemcached into a DLL on Windows.

90% of the source code in libmemcached is just "logic" that applies to all platforms, but there is a small part of the code that interacts tightly with the operating system. "Everything" is a file descriptor on Unix systems, but Windows has its own subsystem for sockets called WinSock. In order to avoid getting tons of #ifdefs all over the code, I defined memcached_socket_t to represent a socket object, and used "the WinSock way" to implement the code. It is pretty easy to map the WinSock code to work on Unix systems with a couple of macros.
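The mapping is along these lines (a sketch of the idea, not the exact definitions in the branch; get_socket_errno() is a made-up helper name):

#ifdef WIN32
#include <winsock2.h>
typedef SOCKET memcached_socket_t;
#define get_socket_errno() WSAGetLastError()
#else
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int memcached_socket_t;
/* Make the WinSock-style names work on Unix-like systems. */
#define INVALID_SOCKET (-1)
#define SOCKET_ERROR (-1)
#define closesocket(s) close(s)
#define get_socket_errno() errno
#endif

With something like this in place, the bulk of the code can call socket(), closesocket() and compare against INVALID_SOCKET on every platform.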

Well, enough talk. If you're interested in the details you can check out the branch on Launchpad.

The easiest way for you to test out the code is to install the "fullinstall" package of msysgit. Unfortunately it doesn't come with all of the tools needed to build libmemcached (you can't generate a configure script or generate the documentation). This means that you cannot build the development branch unless you have another machine where you can generate the configure script. I am exporting parts of my ZFS filesystem via CIFS, so I generated the configure script on Solaris.

Building libmemcached with mingw is just as easy as on your favorite platform:

$ ./configure --with-memcached= --without-docs
$ make all install

I haven't fixed the test suite yet, so you have to wait a bit longer before you can run make test ;)

Tuesday, March 30, 2010

Building Memcached on Windows

I like to be able to compile my software on multiple platforms with different compilers, because it forces me to write code that complies with the standards (if not, you'll most likely end up with a spaghetti of #ifdefs all over your code). One of the bonuses you get from using multiple compilers is that you get "more eyes" looking at your code, and they may warn you about different things. Supporting multiple platforms will also make it easier to ensure that you don't have 'hidden bombs' in your code (byte order, alignment, assumptions that you can dereference a NULL pointer etc.)...

Building software that runs on different flavors of Unix/Linux isn't that hard, but adding support for Microsoft Windows offers some challenges. Among the challenges are:

  1. Microsoft Windows doesn't provide the header files found on most Unix-like systems (unistd.h etc.).
  2. Sockets are implemented in a separate "subsystem" on Windows.
  3. Win32 provides a different threads/mutex implementation from what most software written for Unix-like systems uses (pthreads).
  4. Microsoft doesn't supply a C99 compiler. One would think it would be safe to assume that one could use a C standard that is more than 10 years old....

I am really glad to announce that we just pushed a number of changesets to memcached that allow you to build memcached on Windows "just as easily" as you would on your favorite Unix/Linux system. This means that as of today, users of Microsoft Windows are no longer stuck with an ancient version!

The first thing you need to do in order to build memcached on Windows is to install the "fullversion" of msysgit. In addition to installing git, it also installs a compiler capable of building a 32-bit version of memcached.

So let's go ahead and build the dependencies we need to get our 32bit memcached version up and running!

The first thing up is libevent. You should download the latest 2.x release available (there is a bug in all versions up to and including 2.0.4, so unless there is a newer one out there you can grab the development version I pushed as libevent-2.0.4-alpha-dev.tar.gz), and install it with the following commands (from within your msysgit shell). Please note that I'm using /tmp because I've had problems using my "home directory", probably because of the spaces in the directory name (or at least I guess that's the reason ;-) ):

$ cd /tmp
$ tar xfz libevent-2.0.4-alpha-dev.tar.gz
$ cd libevent-2.0.4-alpha-dev
$ ./configure --prefix=/usr/local
$ make all                    (this will fail with an error, but don't worry about that... the error is in the example code)
$ make install             (this will fail with an error, but don't worry about that... it's just an example)

It's time to start building memcached!!! Let's check out the source code and build it!

$ git clone git://github.com/trondn/memcached.git
$ git checkout -t origin/engine
$ make -f win32/Makefile.mingw

You should now be able to start memcached with the following command:

$ ./memcached.exe -E ./libs/default_engine.so

Go ahead and telnet to port 11211 and issue the "stats" command to verify that it works!

You don't need to install msysgit on the systems where you want to run memcached, but you do need to include pthreadGC2.dll from the msysgit distribution. That means that if you want to run memcached on another machine you need to copy the following files: memcached.exe .libs/default_engine.so and /mingw/bin/pthreadGC2.dll (Just place them in the same directory on the machine you want to run memcached on :-)

But wait, the world is 64-bit by now. Let's build a 64-bit binary instead.

The msysgit we installed previously isn't capable of building a 64-bit binary, so we need to install a new compiler. Download a bundle from http://sourceforge.net/projects/mingw-w64/files/ (I used mingw-w64-bin_i686-mingw_20100129.zip). You can "install" it by running the following commands in our msysgit shell:

$ mkdir /mingw64
$ cd /mingw64
$ unzip /mingw-w64-bin_i686-mingw_20100129.zip
$ export PATH=/mingw64/bin:/usr/local/bin:$PATH

So let's start compiling libevent:

$ cd /tmp
$ tar xfz libevent-2.0.4-alpha-dev.tar.gz
$ cd libevent-2.0.4-alpha-dev
$ ./configure --prefix=/usr/local --host=x86_64-w64-mingw32 --build=i686-pc-mingw32
$ make all
$ make install

The compiler we just downloaded didn't come with a 64-bit version of pthreads, so we have to download and build that ourselves. I've pushed a package to my machine at: pthreads-w64-2-8-0-release.tar.gz

$ cd /tmp
$ tar xfz pthreads-w64-2-8-0-release.tar.gz
$ cd pthreads-w64-2-8-0-release
$ make clean GC CC=x86_64-w64-mingw32-gcc
$ cp pthread.h semaphore.h sched.h /usr/local/include
$ cp libpthreadGC2.a /usr/local/lib/libpthread.a
$ cp pthreadGC2.dll /usr/local/lib
$ cp pthreadGC2.dll /usr/local/bin

And finally memcached with:

$ git clone git://github.com/trondn/memcached.git
$ git checkout -t origin/engine
$ make -f win32/Makefile.mingw CC=x86_64-w64-mingw32-gcc

So how does it perform?

I'm pretty sure a lot of the "Linux fanboys" are ready to jump in and tell me how much faster the Linux version is. I have to admit that I was a bit skeptical in the beginning about whether the layers we added to get "Unix compatibility" had any performance impact. The only way to know for sure is to run a benchmark and measure the performance.

I ran a small benchmark where I had my client create 512 connections to the server, then randomly choose a connection and perform a get or a set operation on one of the 100 000 items I had stored in the server. I ran 500 000 operations (33% of them set operations) and calculated the average time. I can dual-boot one of my machines into Windows 7 and Red Hat Enterprise Linux, so my results show the "out of the box" numbers for Windows 7 and Red Hat Enterprise Linux running on the same hardware. I used my OpenSolaris box to drive the test (connected to the same switch). Posting numbers is always "dangerous", because people put too much into the absolute numbers. The intention with my graphs is to show that the version running on Microsoft Windows is pretty much on par with the Linux version.

[Graphs: average operation time for 256, 512 and 1024 byte userdata]

Wednesday, March 3, 2010

Memcached with SASL on OpenSolaris - part 2

It turns out that some systems don't support shadow.h and fgetspent, so I just updated the patch to no longer require them. To create the password file, simply do:

echo "myuser:mypass" >> my_sasl_pwdb

Happy hacking

Thursday, February 25, 2010

Memcached with SASL on OpenSolaris

You may have tried to build memcached with SASL support on OpenSolaris with the following result:

trond@opensolaris> ./configure --enable-sasl
[ ... cut ... ]
checking sasl/sasl.h usability... yes
checking sasl/sasl.h presence... yes
checking for sasl/sasl.h... yes
checking for library containing sasl_server_init... no
configure: error: Failed to locate the library containing sasl_server_init

This is because configure only looks for sasl_server_init in libsasl2, and OpenSolaris uses libsasl instead. Yesterday I pushed a fix to my github repository that searches for the symbol in libsasl as well.

Configuring SASL may also be a challenge (I want to spend my time writing code, not being a system administrator), so I decided to add support for plaintext passwords as well. You probably don't want to use this on your production servers, but it comes in really handy if you just want to test your favorite client.

You enable support for plaintext passwords by passing --enable-sasl-pwdb to configure. I didn't want to spend any time writing a new parser or coming up with a new file format, so I decided to use fgetspent_r to read the password file. This means that as long as you follow the format of a shadow file, you're good to go :-) You have to set the name of the file to use as the password file in the environment variable MEMCACHED_SASL_PWDB:

trond@opensolaris> echo "myname:mypass:::::::" > /tmp/memcached-sasl-db
trond@opensolaris> export MEMCACHED_SASL_PWDB=/tmp/memcached-sasl-db

With a password file in place, you have to create a config file for SASL to instruct it to use plain text password authentication:

trond@opensolaris> echo "mech_list: plain" > memcached.conf

If you don't want to install this as the global configuration for memcached, you should specify the location of the file in SASL_CONF_PATH:

trond@opensolaris> export SASL_CONF_PATH=`pwd`/memcached.conf

You then start the memcached daemon with "-S" to enable SASL authentication:

trond@opensolaris> ./memcached -S -d

So let's run some commands against the server and see how this works. I'm using the SASL support I implemented in libmemcached (not integrated yet, but you may download it from https://code.launchpad.net/~trond-norbye/libmemcached/sasl):

trond@opensolaris> ./memcp --servers=localhost:11211 --binary \
                                                    --username=myname --password=inncorrect \
                                                    memcp.c
memcp: memcp.c: memcache error AUTHENTICATION FAILURE
trond@opensolaris> ./memcp --servers=localhost:11211 --binary \
                                                    --username=myname --password=mypass \
                                                    memcp.c
trond@opensolaris> ./memcat --servers=localhost:11211 --binary \
                                                     --username=myname --password=mypass \
                                                     memcp.c
[ ... output of memcp.c ... ]
That's all for now. Happy hacking :-)

Thursday, February 11, 2010

New opportunities

I've been really quiet lately, so I guess I should give a short update on what I'm up to. As of 2010 I am no longer a Sun Microsystems employee, so I spent January in Mountain View at the offices of my new employer: NorthScale. I'm currently working on getting up to speed on my new tasks, so I'll be back with more information later.