Monday, September 16, 2013

Using libcouchbase in a multithreaded environment

How to use libcouchbase in a multithreaded environment seems to be a recurring question, so I figured I should do a quick blog post about it. One of the design goals of libcouchbase is that it should work in a multithreaded environment, but there are two ways to implement that:
  • The library does all of the locking so that the client user doesn't have to care about it. This makes life easy for the client users, but might introduce unnecessary locking and limitations.
  • The client uses the library in a certain way, and introduces locking according to the way the library works.

For libcouchbase I chose the second approach. There are absolutely no locks in libcouchbase, so you as a client user have to ensure that you use libcouchbase in a safe way. Luckily for you, doing so isn't hard at all. As long as you don't access the same lcb_t from multiple threads at the same time, you should be safe:

void *my_thread_routine(void *arg) {
    lcb_create_st *options = arg;
    lcb_t instance;
    lcb_create(&instance, options);
    ....
}

....

lcb_create_st opt;
memset(&opt, 0, sizeof(opt));
opt.version = 0;
opt.v.v0.host = "mucluster";

for (int ii = 0; ii < 100; ++ii) {
   pthread_t tid;
   pthread_create(&tid, NULL, my_thread_routine, &opt);
}

The above would create 100 threads, each of which would create its own libcouchbase instance that would be safe to use within that thread. The "problem" with the above code is that it would use a lot of resources, not only on the client but also on the server. Each lcb instance occupies one socket connection to the cluster management console (which is pretty resource heavy on the cluster), and one data connection to each of the nodes in the cluster. The code snippet would therefore use 300 sockets for a two-node cluster (100 instances, each with one management connection and two data connections). The connection to the management node may be eliminated if we use the configuration cache (I'll blog about that at a later time), but if your application barely uses the libcouchbase instances, it still requires too many resources.
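To make the arithmetic concrete, here is a tiny sketch of the socket count described above. The function name and parameters are made up for illustration and are not part of libcouchbase; it simply assumes one data connection per node, plus one management connection per instance unless the configuration cache removes it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libcouchbase API): estimate how many sockets
 * a number of instances would hold open against a cluster.
 * Assumes one data connection per node and, optionally, one management
 * connection per instance.
 */
size_t estimate_sockets(size_t instances, size_t nodes, int has_management_connection)
{
    size_t per_instance;

    assert(nodes > 0);
    per_instance = nodes + (has_management_connection ? 1 : 0);
    return instances * per_instance;
}
```

With these assumptions, estimate_sockets(100, 2, 1) gives the 300 sockets mentioned above, and estimate_sockets(100, 2, 0) gives 200 once the management connection is gone.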

One easy way to work around this is to create a pool of libcouchbase instances, grab an instance from the pool whenever you need to perform an operation against Couchbase, and release it back to the pool when you're done:

void *my_thread_routine(void *arg) {
    my_instance_pool *pool = arg;

    lcb_t instance = pool_pop(pool);
    ....

    pool_push(pool, instance);
}

....

lcb_create_st opt;
memset(&opt, 0, sizeof(opt));
opt.version = 0;
opt.v.v0.host = "mucluster";
my_instance_pool *pool = pool_create(&opt, 10);

for (int ii = 0; ii < 100; ++ii) {
   pthread_t tid;
   pthread_create(&tid, NULL, my_thread_routine, pool);
}

By using such a pool you can control the resources used (like the number of sockets). All you need to do is tune the size of the pool to match the concurrency you're aiming for. You could even set the pool size to 1, and end up with a "singleton".

One thing that is really important to note here is that you can't share the same IO instance between the threads. None of the default IO providers are thread safe, so bad things will happen if you try to do so. That being said, there is nothing stopping you from making an MT-safe IO provider and using a dedicated IO thread that the clients utilize (but I'll leave it up to you to figure out if it is worth the extra work ;-))

So what does the code for such a resource pool look like? I added an extremely simple implementation to the example section of libcouchbase. Feel free to look at the example and cherry-pick some ideas :-)
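To give a rough idea of the shape such a pool can take, here is a minimal sketch. It is not the example shipped with libcouchbase: the handle_pool type and its functions are hypothetical names, and the pool stores opaque void * handles so that it compiles without the library. The only locking is inside the pool itself; a handle that has been popped belongs to exactly one thread until it is pushed back.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical pool of opaque handles (stand-ins for library instances). */
typedef struct {
    void **items;             /* stack of idle handles */
    size_t count;             /* number of idle handles right now */
    size_t capacity;          /* maximum number of handles in the pool */
    pthread_mutex_t lock;     /* protects items and count */
    pthread_cond_t available; /* signalled when a handle is returned */
} handle_pool;

/* Create an empty pool with room for `capacity` handles (capacity > 0). */
handle_pool *handle_pool_create(size_t capacity)
{
    handle_pool *pool = calloc(1, sizeof(*pool));
    if (pool == NULL) {
        return NULL;
    }
    pool->items = calloc(capacity, sizeof(*pool->items));
    if (pool->items == NULL) {
        free(pool);
        return NULL;
    }
    pool->capacity = capacity;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->available, NULL);
    return pool;
}

/* Return a handle to the pool and wake up one waiting thread, if any. */
void handle_pool_push(handle_pool *pool, void *handle)
{
    pthread_mutex_lock(&pool->lock);
    assert(pool->count < pool->capacity);
    pool->items[pool->count++] = handle;
    pthread_cond_signal(&pool->available);
    pthread_mutex_unlock(&pool->lock);
}

/* Take a handle out of the pool, blocking while the pool is empty. */
void *handle_pool_pop(handle_pool *pool)
{
    void *handle;

    pthread_mutex_lock(&pool->lock);
    while (pool->count == 0) {
        pthread_cond_wait(&pool->available, &pool->lock);
    }
    handle = pool->items[--pool->count];
    pthread_mutex_unlock(&pool->lock);
    return handle;
}

/* Destroy the pool; the caller is responsible for the handles themselves. */
void handle_pool_destroy(handle_pool *pool)
{
    pthread_cond_destroy(&pool->available);
    pthread_mutex_destroy(&pool->lock);
    free(pool->items);
    free(pool);
}
```

In the setting of this post, the handles would be lcb_t instances created up front with lcb_create() and pushed into the pool, assuming lcb_t can be stored as a pointer-sized handle. Because handle_pool_pop() blocks when the pool is empty, the pool size caps how many threads talk to the cluster at once.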

Happy hacking!

2 comments:

  1. Hi Trond,

    I wanted to know if libcouchbase supports a TAP client API for C? I'm not able to find any API that can use a TAP stream to get notifications.

    Thanks
    Sanjay
