Strapdata created Elassandra to simplify your data stack. Elassandra is an open-source, cloud-agnostic solution to store and analyze your data. It tightly integrates the powerful search engine Elasticsearch into the mission-critical database Apache Cassandra.
When moving from Apache Cassandra to Elassandra
Yes, Elassandra is the open-source Apache Cassandra, in its latest 3.11 release, with a tightly integrated Elasticsearch engine. Cassandra SSTable files, administration tasks and tools remain the same: it IS the Apache Cassandra code, with additional features that you can enable if needed.
Yes, you can run Apache Cassandra datacenters and Elassandra datacenters in the same cluster. You just need to add a dummy Java class to the Cassandra classpath to avoid a ClassNotFoundException when Cassandra creates an Elasticsearch secondary index referenced in the CQL schema.
There are several ways to migrate from Cassandra to Elassandra. You can replace the Cassandra binaries with the Elassandra ones (or even switch the Docker image, as explained in this blog from Pythian) and perform a rolling restart of the nodes. You can also add an Elassandra datacenter to an existing Cassandra cluster and stream the tables to it. Finally, you can create a new Elassandra cluster and restore SSTables from your existing Cassandra cluster.
Elassandra is developed and maintained by Strapdata from the open-source code of Cassandra and Elasticsearch. Strapdata provides various support contracts, training and consulting services to assist you in using Elassandra.
To avoid data duplication and wasted disk space, Elassandra stores data only in Cassandra tables. Elasticsearch only manages the Lucene indices, and the _source document is no longer stored in Elasticsearch but fetched from the underlying Cassandra table.
On the write path, Elassandra synchronously updates in-memory Lucene segments with the defined Elasticsearch fields. Of course, the write overhead depends on what you index (numbers, text, full text, etc.), but keep in mind that write throughput scales linearly with the number of nodes.
On the search path, there are two ways to query Elasticsearch. If the partition key is known, the search (or aggregation) request is directed to one node hosting the targeted data (like routing in Elasticsearch), so search throughput scales linearly with the number of nodes. Without the partition key, all nodes in the datacenter are queried, as with a Cassandra secondary index. In that case, if your Cassandra replication factor is 2, for example, you can use an optimized search strategy that queries only half of the nodes in the datacenter. You can therefore increase search throughput by increasing the replication factor.
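As a rough illustration of this arithmetic, the sketch below estimates how many nodes take part in one search. The text only states the case of a replication factor of 2; the generalization to "about one node out of every RF", along with the function name and its parameters, is an assumption made for illustration and is not Elassandra code.

```python
import math


def nodes_queried(num_nodes: int, replication_factor: int,
                  has_partition_key: bool, optimized: bool = True) -> int:
    """Estimate how many nodes in the datacenter take part in one search."""
    if num_nodes < 1 or replication_factor < 1:
        raise ValueError("num_nodes and replication_factor must be >= 1")
    if has_partition_key:
        # The request is routed to a single node hosting the data.
        return 1
    if not optimized:
        # Every node in the datacenter is queried.
        return num_nodes
    # Assumption: with RF replicas, roughly one node in RF is enough
    # to cover all token ranges.
    rf = min(replication_factor, num_nodes)
    return math.ceil(num_nodes / rf)
```

Under these assumptions, a 6-node datacenter with a replication factor of 2 would involve 3 nodes per search without a partition key, and a single node when the partition key is known.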
Finally, comparing the performance of Elassandra with that of Cassandra or Elasticsearch alone is not really meaningful. You should rather compare the TCO of an equivalent architecture delivering the same services, throughput and resiliency. By synchronously writing to Elasticsearch indices without duplicating the data, Elassandra drastically reduces the total volume of disk and network I/O compared to more sophisticated architectures.
All Elassandra nodes are Elasticsearch data, primary and master nodes. Elasticsearch mapping updates are managed through a Paxos transaction to avoid concurrent mapping updates. Consequently, Elassandra has no single point of failure and no single point of write: it is a multi-master search engine, which makes it easier to operate in the cloud.
Yes, by keeping the Elasticsearch REST API unchanged, Elassandra works like Elasticsearch with Kibana, Logstash, Beats, Fluentd, Fluent Bit and many other tools. However, you cannot use the Elasticsearch X-Pack features because they are proprietary code.
Yes, you can run Elasticsearch queries through your favorite CQL driver. It supports search and aggregation queries as described in the Elassandra documentation. Search results are returned as Cassandra rows, so you can use the same Data Access Objects for both Cassandra and Elasticsearch queries.
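To make this more concrete, here is a minimal sketch that builds such a CQL statement as a plain string, without connecting to a cluster. It assumes the Elasticsearch query is passed as JSON in a column named es_query, which is how we recall the Elassandra documentation describing it; check the documentation for the exact column names and required setup. The keyspace, table and helper function are hypothetical.

```python
import json


def es_query_to_cql(keyspace: str, table: str, es_query: dict) -> str:
    """Build a CQL SELECT that carries an Elasticsearch query as a JSON string."""
    payload = json.dumps(es_query, separators=(",", ":"))
    # CQL string literals escape a single quote by doubling it.
    payload = payload.replace("'", "''")
    # Assumption: the query travels in a column named es_query.
    return f"SELECT * FROM {keyspace}.{table} WHERE es_query='{payload}';"


# Hypothetical keyspace and table, for illustration only.
statement = es_query_to_cql(
    "twitter", "tweet",
    {"query": {"query_string": {"query": "message:hello"}}},
)
print(statement)
```

The resulting string can then be executed through whichever CQL driver the application already uses, and the rows handled like any other Cassandra result set.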
Elasticsearch dynamic mapping is a great feature that updates the mapping when a new field is detected in an ingested document. Elassandra automatically translates the Elasticsearch mapping to update the underlying CQL schema. Elassandra batches the Cassandra DDL statements to reduce the number of broadcast schema mutations, and validates all changes before applying them. Thus, Elassandra supports log ingestion just as Elasticsearch does.
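The idea of validating every new field first and then batching the schema change can be sketched as follows. This is not Elassandra's implementation: the type table is a simplified assumption, the function name is hypothetical, and the real translation rules may differ.

```python
# Illustrative type table only; Elassandra's real translation rules may differ.
ES_TO_CQL = {
    "keyword": "text",
    "text": "text",
    "long": "bigint",
    "integer": "int",
    "double": "double",
    "boolean": "boolean",
    "date": "timestamp",
}


def ddl_for_new_fields(keyspace, table, existing_columns, mapping_properties):
    """Validate all new fields first, then return one ALTER TABLE for the batch."""
    new_columns = []
    for field, spec in mapping_properties.items():
        if field in existing_columns:
            continue  # column already exists, nothing to add
        es_type = spec.get("type")
        if es_type not in ES_TO_CQL:
            # Reject the whole change before any statement is produced.
            raise ValueError(f"unsupported type for {field!r}: {es_type!r}")
        new_columns.append(f'"{field}" {ES_TO_CQL[es_type]}')
    if not new_columns:
        return []
    column_list = ", ".join(new_columns)
    # A single statement means a single schema mutation to broadcast.
    return [f"ALTER TABLE {keyspace}.{table} ADD ({column_list});"]
```

Grouping the new columns into one statement mirrors the batching described above: the cluster propagates one schema change rather than one per field.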
Yes, as with Cassandra SSTables, you can snapshot Elasticsearch indices on disk (Lucene files) when snapshotting a Cassandra table. As with Cassandra snapshots, you can then restore these files on a node that has the same Cassandra token ranges. Alternatively, you can snapshot only the Cassandra SSTables, and the Elasticsearch indices will automatically be rebuilt when restoring the SSTables (the Elasticsearch mapping is stored in the Cassandra snapshots).
Yes, through its REST API, Elassandra supports Elasticsearch ingest processors, which let you transform the original document before it is written to the underlying Cassandra table.
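As an illustration, the sketch below prepares the two REST requests involved in the standard Elasticsearch ingest API: one to define a pipeline and one to index a document through it. It only builds the requests and does not send them. The node address, index, type, pipeline name and field values are assumptions chosen for the example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed address of an Elassandra node


def put_pipeline_request(pipeline_id, processors, description=""):
    """Prepare PUT /_ingest/pipeline/<id> with the given processors."""
    body = json.dumps({"description": description, "processors": processors})
    return urllib.request.Request(
        f"{BASE_URL}/_ingest/pipeline/{pipeline_id}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def index_request(index, doc_type, doc_id, document, pipeline_id):
    """Prepare an index request that runs the document through a pipeline."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}?pipeline={pipeline_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical pipeline that adds an "env" field to every document.
pipeline = put_pipeline_request(
    "add-env", [{"set": {"field": "env", "value": "prod"}}],
    description="tag documents with their environment",
)
doc = index_request("logs", "event", "1", {"message": "hello"}, "add-env")
```

Each prepared request could then be sent with urllib.request.urlopen against a running node; the transformed document is what ends up in the underlying Cassandra table.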
Strapdata© 2017. Elasticsearch, Kibana, Logstash and Beat are trademarks of Elasticsearch BV, registered in the U.S. and in other countries.
Apache Cassandra, Apache, Tomcat, Lucene, Hadoop, HDFS, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.