Channel: advait – Persistent Storage Solutions

Apache Cassandra – NoSQL storage solution

These days I am exploring another storage solution – Cassandra.

The Apache Cassandra datastore was originally developed at Facebook and later released as an open source NoSQL data storage system. Its design draws on Amazon’s Dynamo paper (for partitioning and replication) and Google’s Bigtable (for the data model). Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients.

DataStax has created an enterprise edition of Cassandra (DataStax Enterprise) that is built on Apache Cassandra. Today multiple flavors of Cassandra are available from both Apache and DataStax.

Cassandra is a NoSQL database. It stores data as key-value pairs organized into rows: each row is identified by a partition key and holds a set of column name/value pairs (often described as a wide-column store). Along with its enterprise software, DataStax also provides extensive documentation for learning Cassandra, as well as self-paced and instructor-led training.

I have started learning Cassandra using the self-paced training available at the following location – https://academy.datastax.com/courses

Apart from that, DataStax also has a very active blog where they discuss different issues and features in Cassandra – http://www.datastax.com/dev/blog/

Installation:

You can either do a full installation of Cassandra on multiple physical nodes and create a cluster, or you can simulate a cluster on a single machine using CCM (Cassandra Cluster Manager).

Installing the official Cassandra software on multiple physical nodes might not be feasible for everyone. That’s why CCM is such a useful utility for learning Cassandra: it runs a whole multi-node cluster on one machine.

You can find instructions for installing CCM at the following location – http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating-local-cassandra-clusters
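Before creating a cluster, it can help to confirm that the tools CCM depends on are installed. The sketch below assumes a typical setup: python to run ccm itself, java to run Cassandra, and ant, which is only needed when ccm builds Cassandra from source. The helper name missing_tools is hypothetical, not part of CCM.

```shell
# Sketch: print each named command that is not found on PATH.
# The tool list used below is an assumption about a typical CCM setup;
# ant matters only when ccm compiles Cassandra from source.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

missing=$(missing_tools python java ant)
if [ -n "$missing" ]; then
  echo "Missing:" $missing
else
  echo "python, java and ant all found"
fi
```

Adjust the tool list to whatever the CCM instructions for your platform call for.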

Valid Versions:

At the time of this writing, the most stable version of Apache Cassandra is 2.1.14, and the latest released version is 2.1.15. Older versions such as 2.0.9 were also stable.

You can find the complete list of Apache Cassandra releases at http://archive.apache.org/dist/cassandra/

You can check the DataStax Community versions of Cassandra at http://planetcassandra.org/cassandra/

The Community version is meant for learning and is free to download, install and play around with.

CCM Installation Issue:

I faced the following issue when I tried to create a cluster with CCM on my Ubuntu 12.04 machine:

ccm create --version=2.0.9 --nodes=6 deo
Downloading http://archive.apache.org/dist/cassandra/2.0.9/apache-cassandra-2.0.9-src.tar.gz to /tmp/ccm-2oKzAH.tar.gz (10.810MB)
  11335077  [100.00%]
Extracting /tmp/ccm-2oKzAH.tar.gz as version 2.0.9 ...
Compiling Cassandra 2.0.9 ...
Deleted /home/local/advaitd/.ccm/repository/2.0.9 due to error
Traceback (most recent call last):
  File "/usr/local/bin/ccm", line 5, in <module>
    pkg_resources.run_script('ccm==2.0.3.1', 'ccm')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/EGG-INFO/scripts/ccm", line 72, in <module>
    cmd.run()
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/cmds/cluster_cmds.py", line 127, in run
    cluster = Cluster(self.path, self.name, install_dir=self.options.install_dir, version=self.options.version, verbose=True)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/cluster.py", line 51, in __init__
    dir, v = self.load_from_repository(version, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/cluster.py", line 64, in load_from_repository
    return repository.setup(version, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/repository.py", line 40, in setup
    download_version(version, verbose=verbose, binary=binary)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/repository.py", line 221, in download_version
    raise e
ccmlib.common.CCMError: Error compiling Cassandra. See /home/local/advaitd/.ccm/repository/last.log for details

I posted the error on the ccm GitHub issue tracker and immediately got a solution – https://github.com/pcmanus/ccm/issues/268

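The suggestion was to have ccm download the prebuilt binary version of Cassandra instead of compiling it from source, by passing -v binary:2.0.9 (for example, ccm create --version=binary:2.0.9 --nodes=6 deo). Cluster creation succeeded once I used the binary version.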
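The difference between the two options is which tarball ccm fetches. In the log above, ccm downloaded apache-cassandra-2.0.9-src.tar.gz and then had to compile it; the binary option uses a prebuilt package instead. A minimal sketch that prints both archive URLs, assuming the prebuilt tarball follows the -bin.tar.gz naming used on archive.apache.org (the function name cassandra_tarball_url is hypothetical):

```shell
# Sketch: print the Apache archive URL of a Cassandra release tarball.
# "src" is the source package ccm compiled in the failed run; "bin" is
# assumed to be the prebuilt package, and is the default here.
cassandra_tarball_url() {
  version=$1
  kind=${2:-bin}
  echo "http://archive.apache.org/dist/cassandra/${version}/apache-cassandra-${version}-${kind}.tar.gz"
}

cassandra_tarball_url 2.0.9 src   # needs compiling
cassandra_tarball_url 2.0.9 bin   # prebuilt, nothing to compile
```

To fetch a prebuilt tarball by hand, something like curl -O "$(cassandra_tarball_url 2.0.9 bin)" should work, although with ccm the binary: prefix on the version is enough.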

You can create a cluster with as many nodes as you want. All ccm does is create that many node directories and treat each one as a separate node.

I created a six-node cluster on my Ubuntu machine:

advaitd@desktop:~$ ccm status
Cluster: 'deo'
--------------
node1: UP
node3: UP
node2: UP
node5: UP
node4: UP
node6: UP
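When scripting against a CCM cluster, the status output can be checked automatically. A small sketch, assuming the "nodeN: UP" line format shown above (count_up_nodes is a hypothetical helper, not a ccm command):

```shell
# Sketch: count nodes reported as UP in `ccm status` output read from stdin.
# Assumes each node is printed on its own line as "nodeN: UP" or "nodeN: DOWN".
count_up_nodes() {
  grep -c ': UP'
}

# Demo using the status lines shown above; prints 6.
printf '%s\n' 'node1: UP' 'node3: UP' 'node2: UP' \
  'node5: UP' 'node4: UP' 'node6: UP' | count_up_nodes
```

On a live cluster the equivalent call is ccm status | count_up_nodes, and the result can be compared with the number of nodes you created.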

CCM Installation details:

CCM creates a hidden directory (.ccm) under your home directory. Inside it there is a directory for each cluster, and inside that a separate directory for each node, as shown below.

advaitd@desktop:~$ pwd
/home/local/advaitd
advaitd@desktop:~$ ls -rlt .ccm
total 12
drwxr-xr-x 3 advaitd domain^users 4096 Apr 23 12:15 repository
-rw-r--r-- 1 advaitd domain^users    4 Apr 23 12:15 CURRENT
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:15 deo
advaitd@desktop:~$ ls -rlt .ccm/deo/
total 28
-rw-r--r-- 1 advaitd domain^users  291 Apr 23 12:15 cluster.conf
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node2
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node1
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node5
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node3
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node6
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node4
advaitd@desktop:~$

Each node directory holds that node’s own configuration, data and logs (the Cassandra release itself is kept under .ccm/repository), and each directory is treated as a separate node of the cluster.
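The same layout can be walked from a script. This sketch assumes, based on the listing above, that the CURRENT file holds the name of the active cluster (4 bytes here: deo plus a newline) and that node directories are named node1, node2, and so on; list_current_nodes is a hypothetical helper.

```shell
# Sketch: list node directories of the active ccm cluster.
# Assumptions: ccm data lives in ~/.ccm unless a directory is passed as $1,
# and the CURRENT file in that directory holds the active cluster name.
list_current_nodes() {
  ccm_dir=${1:-$HOME/.ccm}
  cluster=$(cat "$ccm_dir/CURRENT") || return 1
  for dir in "$ccm_dir/$cluster"/node*; do
    # Skip the literal pattern if no node directories exist.
    if [ -d "$dir" ]; then
      basename "$dir"
    fi
  done
}
```

Called with no argument it reads ~/.ccm, so on the machine above list_current_nodes would print node1 through node6.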

Cassandra runs from each of the node directories, so we should see 6 Cassandra processes running on the host:

advaitd@desktop:~$ ps -aef | grep cassandra | grep -v grep | wc -l
6
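Matching on the word cassandra works here, but it would also count unrelated processes such as a tail -f on a Cassandra log file. A slightly stricter sketch matches Cassandra’s main class, org.apache.cassandra.service.CassandraDaemon, which bin/cassandra passes to the JVM; writing the first letter in brackets stops grep from counting its own command line, which replaces the grep -v grep step. The function name is hypothetical:

```shell
# Sketch: count Cassandra server JVMs in `ps` output read from stdin.
# The pattern is Cassandra's main class; writing the first letter as [o]
# means grep's own command line (which contains "[o]rg...") does not match.
count_cassandra_daemons() {
  grep -c '[o]rg\.apache\.cassandra\.service\.CassandraDaemon'
}
```

Used as ps -aef | count_cassandra_daemons, it should print 6 for the cluster above.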

I will keep learning and will be posting more articles on Cassandra.

References:

http://cassandra.apache.org/

http://www.datastax.com/

http://en.wikipedia.org/wiki/Apache_Cassandra

https://github.com/pcmanus/ccm/issues

http://www.datastax.com/blog

http://docs.datastax.com/en/index.html


Filed under: Cassandra, NoSQL Tagged: Apache Cassandra, cassandra, ccm, ccmlib.common.CCMError: Error compiling Cassandra, datastax
