Elassandra in a Multi-Cloud Kubernetes world

With the Kubernetes Elassandra Operator, let’s see how to deploy an Elassandra cluster running in many Kubernetes Clusters, with AKS and GKE.

Kubernetes adoption among IT departments keeps growing and is definitely a game changer, and running databases under Kubernetes is the next challenge for running your microservices on any cloud provider. Elassandra provides a nice solution here, combining a distributed database, Apache Cassandra, with an embedded Elasticsearch. With the Kubernetes Elassandra Operator, let’s see how to deploy an Elassandra cluster spanning several Kubernetes clusters, here Azure Kubernetes Service and Google Kubernetes Engine.

Summary:

  • Overview
  • Create an AKS cluster
  • Create a GKE cluster
  • Deploy and configure additional services
  • Deploy Elassandra DC1
  • Deploy Elassandra DC2

Overview

The Elassandra Operator creates one Kubernetes statefulset per availability zone mapped to a Cassandra rack. Thus, in case of a zone failure, data is properly distributed across replicas and remains available.

The Elassandra Operator watches Kubernetes nodes and identifies availability zones through the node label failure-domain.beta.kubernetes.io/zone. Each statefulset is named with its zone index rather than a zone name, so the naming stays consistent whatever the zone names are. Here is a 9-node Elassandra datacenter running in 3 availability zones of a Kubernetes cluster.

By default, for each Elassandra/Cassandra cluster, you can have only one Elassandra node per Kubernetes node (enforced by an anti-affinity rule).
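
Once a datacenter is deployed (see the deployment sections below), a quick way to check this spreading is to list the Elassandra pods with their node assignment; each pod should land on a distinct Kubernetes node:

# each Elassandra pod should be scheduled on a different Kubernetes node
kubectl get pods -l app=elassandra -o wide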

To ensure data consistency, Persistent Volume Claims are allocated in the same zone as the associated statefulset (or Cassandra rack).
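
Likewise, once the persistent volumes have been provisioned, you can check the zone each volume is bound to, assuming the cloud provisioner sets the standard zone label on the PersistentVolumes:

kubectl get pv -L failure-domain.beta.kubernetes.io/zone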

For the purpose of the demonstration, we are deploying an Elassandra cluster using public IP addresses to connect the datacenter DC1 running on AKS to another Elassandra datacenter DC2 running on GKE. Of course, for security reasons, it would be better to run such a cluster on a private network interconnected through VPC peering or a VPN, but the same approach applies there.

Create an AKS cluster

We are going to use Azure VM Scale Sets with public IP addresses on the Kubernetes nodes, which requires the Azure CLI aks-preview extension:

az extension add --name aks-preview
az extension update --name aks-preview
az feature register --name NodePublicIPPreview --namespace Microsoft.ContainerService

Create an Azure resource group and an AKS regional cluster running on 3 zones with public IP addresses on Kubernetes nodes and the Azure network plugin:

AZURE_REGION=westeurope
RESOURCE_GROUP_NAME=aks1
K8S_CLUSTER_NAME=kube1

az group create -l $AZURE_REGION -n $RESOURCE_GROUP_NAME
az aks create --name "${K8S_CLUSTER_NAME}" \
              --resource-group ${RESOURCE_GROUP_NAME} \
              --network-plugin azure \
              --node-count 3 \
              --node-vm-size Standard_D2_v3 \
              --vm-set-type VirtualMachineScaleSets \
              --output table \
              --zone 1 2 3 \
              --enable-node-public-ip
az aks get-credentials --name "${K8S_CLUSTER_NAME}" --resource-group $RESOURCE_GROUP_NAME --output table

Label the k8s nodes

Unfortunately, AKS does not map the VM’s public IP address to the Kubernetes node external IP address, so the trick is to add these public IP addresses as a custom Kubernetes label, elassandra.strapdata.com/public-ip, on each node.

add_vmss_public_ip() {
   AKS_RG_NAME=$(az resource show --namespace Microsoft.ContainerService --resource-type managedClusters -g $RESOURCE_GROUP_NAME -n $K8S_CLUSTER_NAME | jq -r .properties.nodeResourceGroup)
   AKS_VMSS_INSTANCE=$(kubectl get nodes -o json | jq -r ".items[${1:-0}].metadata.name")
   PUBLIC_IP=$(az vmss list-instance-public-ips -g $AKS_RG_NAME -n ${AKS_VMSS_INSTANCE::-6} | jq -r ".[${1:-0}].ipAddress")
   kubectl label nodes --overwrite $AKS_VMSS_INSTANCE elassandra.strapdata.com/public-ip=$PUBLIC_IP
}

NODE_COUNT=$(kubectl get nodes --no-headers | wc -l)
for i in $(seq 0 $((NODE_COUNT-1))); do
  add_vmss_public_ip $i
done

And you should get something like this:

kubectl get nodes -L failure-domain.beta.kubernetes.io/zone,elassandra.strapdata.com/public-ip
NAME                                STATUS   ROLES   AGE     VERSION    ZONE           PUBLIC-IP
aks-nodepool1-74300635-vmss000000   Ready    agent   8m18s   v1.15.11   westeurope-1   51.138.75.131
aks-nodepool1-74300635-vmss000001   Ready    agent   8m12s   v1.15.11   westeurope-2   40.113.160.148
aks-nodepool1-74300635-vmss000002   Ready    agent   8m22s   v1.15.11   westeurope-3   51.124.121.185

Install HELM 2

Install HELM 2 and add the strapdata repository:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --wait --service-account tiller
helm repo add strapdata https://charts.strapdata.com
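
As a quick sanity check, you can verify that Tiller is running and that the strapdata repository is registered:

# the Tiller pod should be Running in kube-system
kubectl get pods -n kube-system -l app=helm,name=tiller
helm version --short
helm repo list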

AKS StorageClass

Azure persistent volumes are bound to an availability zone, so we need to define one storageClass per zone in our AKS cluster; each Elassandra rack (or statefulset) is then bound to the corresponding storageClass. This is done here using the HELM chart strapdata/storageclass.

for z in 1 2 3; do
    helm install --name ssd-$AZURE_REGION-$z --namespace kube-system \
        --set parameters.kind="Managed" \
        --set parameters.cachingmode="ReadOnly" \
        --set parameters.storageaccounttype="StandardSSD_LRS" \
        --set provisioner="kubernetes.io/azure-disk" \
        --set zone="$AZURE_REGION-${z}" \
        --set nameOverride="ssd-$AZURE_REGION-$z" \
        strapdata/storageclass
done
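
You should now see one storage class per zone, named ssd-$AZURE_REGION-1 to ssd-$AZURE_REGION-3, next to the default AKS storage classes:

kubectl get storageclass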

AKS Firewall rules

Finally, you may need to authorise inbound Elassandra connections on the following TCP ports:

  • Cassandra storage port (usually 7000 or 7001) for internode connections.
  • Cassandra native CQL port (usually 9042) for client to node connections.
  • Elasticsearch HTTP port (usually 9200) for the Elasticsearch REST API.

Assuming you deploy an Elassandra datacenter using ports 39000, 39001 and 39002 respectively, exposed to the internet with no source IP address restrictions:

AKS_RG_NAME=$(az resource show --namespace Microsoft.ContainerService --resource-type managedClusters -g $RESOURCE_GROUP_NAME -n "${K8S_CLUSTER_NAME}" | jq -r .properties.nodeResourceGroup)
NSG_NAME=$(az network nsg list -g $AKS_RG_NAME | jq -r .[0].name)
az network nsg rule create \
    --resource-group $AKS_RG_NAME \
    --nsg-name $NSG_NAME \
    --name elassandra_inbound \
    --description "Elassandra inbound rule" \
    --priority 2000 \
    --access Allow \
    --source-address-prefixes Internet \
    --protocol Tcp \
    --direction Inbound \
    --destination-address-prefixes '*' \
    --destination-port-ranges 39000-39002
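
If needed, you can double-check the resulting rule:

az network nsg rule show --resource-group $AKS_RG_NAME --nsg-name $NSG_NAME --name elassandra_inbound --output table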

Create a GKE cluster

Create a Regional Kubernetes cluster on GCP, with RBAC enabled:

GCLOUD_PROJECT=strapkube1
K8S_CLUSTER_NAME=kube2
GCLOUD_REGION=europe-west1

gcloud container clusters create $K8S_CLUSTER_NAME \
  --region $GCLOUD_REGION \
  --project $GCLOUD_PROJECT \
  --machine-type "n1-standard-2" \
  --cluster-version=1.15 \
  --tags=$K8S_CLUSTER_NAME \
  --num-nodes "1"
gcloud container clusters get-credentials $K8S_CLUSTER_NAME --region $GCLOUD_REGION --project $GCLOUD_PROJECT
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)

Install HELM 2 (like on AKS).

GKE StorageClass

Google Cloud persistent volumes are bound to an availability zone, so we need to define one storageClass per zone in our Kubernetes cluster; each Elassandra rack (or statefulset) is then bound to the corresponding storageClass. This is done here using the HELM chart strapdata/storageclass.

for z in europe-west1-b europe-west1-c europe-west1-d; do
    helm install --name ssd-$z --namespace kube-system \
        --set parameters.type="pd-ssd" \
        --set provisioner="kubernetes.io/gce-pd" \
        --set zone=$z,nameOverride=ssd-$z \
        strapdata/storageclass
done
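
As on AKS, you can verify that the three zone-bound storage classes exist, together with the zone each one targets:

kubectl get storageclass -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,ZONE:.parameters.zone'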

GKE Firewall rules

Assuming you deploy an Elassandra datacenter using ports 39000, 39001 and 39002 respectively, exposed to the internet with no source IP address restrictions, and that the Kubernetes nodes are properly tagged with the Kubernetes cluster name, you can create an inbound firewall rule like this:

VPC_NETWORK=$(gcloud container clusters describe $K8S_CLUSTER_NAME --region $GCLOUD_REGION --format='value(network)')
NODE_POOLS_TARGET_TAGS=$(gcloud container clusters describe $K8S_CLUSTER_NAME --region $GCLOUD_REGION --format='value[terminator=","](nodePools.config.tags)' --flatten='nodePools[].config.tags[]' | sed 's/,\{2,\}//g')
gcloud compute firewall-rules create "allow-elassandra-inbound" \
  --allow tcp:39000-39002 \
  --network="$VPC_NETWORK" \
  --target-tags="$NODE_POOLS_TARGET_TAGS" \
  --description="Allow elassandra inbound" \
  --direction INGRESS
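
You can then inspect the rule:

gcloud compute firewall-rules describe allow-elassandra-inbound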

GKE CoreDNS installation

GKE ships with KubeDNS by default, which does not allow configuring host aliases to resolve public IP addresses to internal Kubernetes node IP addresses (required by the Cassandra AddressTranslator to connect to Elassandra nodes using their internal IP address). So we need to install CoreDNS configured to import custom configuration (see the CoreDNS import plugin), and configure KubeDNS with a stub domain forwarding to CoreDNS.

helm install --name coredns --namespace=kube-system -f coredns-values.yaml stable/coredns

The coredns-values.yaml file used in this example is available here

Once CoreDNS is installed, add a stub domain to forward requests for the domain internal.strapdata.com to the CoreDNS service, and restart the KubeDNS pods. The internal.strapdata.com domain is just a dummy DNS domain used to resolve public IP addresses to the Kubernetes nodes’ internal IP addresses.

COREDNS_SERVICE_IP=$(kubectl get  service -l k8s-app=coredns  -n kube-system -o jsonpath='{.items[0].spec.clusterIP}')
KUBEDNS_STUB_DOMAINS="{\\\"internal.strapdata.com\\\": [\\\"$COREDNS_SERVICE_IP\\\"]}"
kubectl patch configmap/kube-dns -n kube-system -p "{\"data\": {\"stubDomains\": \"$KUBEDNS_STUB_DOMAINS\"}}"
kubectl delete pod -l k8s-app=coredns -n kube-system
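
You can check that the stub domain has been taken into account by KubeDNS:

kubectl get configmap kube-dns -n kube-system -o jsonpath='{.data.stubDomains}'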

Prepare your Kubernetes clusters

Once your AKS and GKE clusters are running, we need to deploy and configure additional services in these two clusters.

Elassandra Operator

Install the Elassandra operator in the default namespace:

helm install --namespace default --name elassop --wait strapdata/elassandra-operator
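
As a quick sanity check, you can look at the release status and the operator pod (the pod name is assumed here to contain elassandra-operator; adapt the filter to the chart’s actual labels if needed):

helm status elassop
kubectl get pods -n default | grep elassandra-operator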

Configure CoreDNS

The Kubernetes CoreDNS is used for two reasons:

  • Resolve DNS names of your DNS zone from inside the Kubernetes cluster, using DNS forwarders to your DNS zone name servers.
  • Resolve the broadcast Elassandra public IP addresses back to the Kubernetes nodes’ private IP addresses.

You can deploy the CoreDNS custom configuration with the strapdata coredns-forwarder HELM chart, which basically installs (or replaces) the coredns-custom configmap; the CoreDNS pods are then restarted to pick it up. First, build the host aliases from the node internal and external IP addresses:

HOST_ALIASES=$(kubectl get nodes -o custom-columns='INTERNAL-IP:.status.addresses[?(@.type=="InternalIP")].address,EXTERNAL-IP:.status.addresses[?(@.type=="ExternalIP")].address' --no-headers |\
awk '{ gsub(/\./,"-",$2); printf("nodes.hosts[%d].name=%s,nodes.hosts[%d].value=%s,",NR-1, $2, NR-1, $1); }')

If your Kubernetes nodes do not have an ExternalIP set (as on AKS), the public node IP address should be available through the custom label elassandra.strapdata.com/public-ip:

HOST_ALIASES=$(kubectl get nodes -o custom-columns='INTERNAL-IP:.status.addresses[?(@.type=="InternalIP")].address,PUBLIC-IP:.metadata.labels.elassandra\.strapdata\.com/public-ip' --no-headers |\
awk '{ gsub(/\./,"-",$2); printf("nodes.hosts[%d].name=%s,nodes.hosts[%d].value=%s,",NR-1, $2, NR-1, $1); }')

Then configure the CoreDNS custom configmap with your DNS name servers and host aliases. The following example uses the Azure DNS name servers:

kubectl delete configmap --namespace kube-system coredns-custom
helm install --name coredns-forwarder --namespace kube-system \
    --set forwarders.domain="${DNS_DOMAIN}" \
    --set forwarders.hosts[0]="40.90.4.8" \
    --set forwarders.hosts[1]="64.4.48.8" \
    --set forwarders.hosts[2]="13.107.24.8" \
    --set forwarders.hosts[3]="13.107.160.8" \
    --set nodes.domain=internal.strapdata.com \
    --set $HOST_ALIASES \
    strapdata/coredns-forwarder
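
In this example, DNS_DOMAIN is assumed to be the delegated DNS zone used throughout this article, for example:

DNS_DOMAIN=test.strapkube.com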

Then restart the CoreDNS pods to reload the configuration; the label to select them depends on the CoreDNS deployment!

On AKS:

kubectl delete pod --namespace kube-system -l k8s-app=kube-dns

On GKE:

kubectl delete pod --namespace kube-system -l k8s-app=coredns

Check the CoreDNS custom configuration:

kubectl get configmap -n kube-system coredns-custom -o yaml
apiVersion: v1
data:
  dns.server: |
    test.strapkube.com:53 {
        errors
        cache 30
        forward $DNS_DOMAIN 40.90.4.8 64.4.48.8 13.107.24.8 13.107.160.8
    }
  hosts.override: |
    hosts nodes.hosts internal.strapdata.com {
        10.132.0.57 146-148-117-125.internal.strapdata.com 146-148-117-125
        10.132.0.58 35-240-56-87.internal.strapdata.com 35-240-56-87
        10.132.0.56 34-76-40-251.internal.strapdata.com 34-76-40-251
        fallthrough
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-06-26T16:45:52Z"
  name: coredns-custom
  namespace: kube-system
  resourceVersion: "6632"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns-custom
  uid: dca59c7d-6503-48c1-864f-28ae46319725

Deploy a dnsutils pod:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF

Test resolution of public IP names to internal Kubernetes node IP address:

kubectl exec -ti dnsutils -- nslookup 146-148-117-125.internal.strapdata.com
Server:             10.19.240.10
Address:    10.19.240.10#53
Name:       146-148-117-125.internal.strapdata.com
Address: 10.132.0.57

ExternalDNS

ExternalDNS is used to automatically update your DNS zone and create A records for the Cassandra broadcast IP addresses. You can use it with a public or a private DNS zone, and with any DNS provider supported by ExternalDNS. In the following setup, we use a DNS zone hosted on Azure.

helm install --name my-externaldns --namespace default \
    --set logLevel="debug" \
    --set rbac.create=true \
    --set policy="sync",txtPrefix=$(kubectl config current-context)\
    --set sources[0]="service",sources[1]="ingress",sources[2]="crd" \
    --set crd.create=true,crd.apiversion="externaldns.k8s.io/v1alpha1",crd.kind="DNSEndpoint" \
    --set provider="azure" \
    --set azure.secretName="$AZURE_DNS_SECRET_NAME",azure.resourceGroup="$AZURE_DNS_RESOURCE_GROUP" \
    --set azure.tenantId="$AZURE_DNS_TENANT_ID",azure.subscriptionId="$AZURE_SUBSCRIPTION_ID" \
    --set azure.aadClientId="$AZURE_DNS_CLIENT_ID",azure.aadClientSecret="$AZURE_DNS_CLIENT_SECRET" \
    stable/external-dns
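
The chart references an Azure credentials secret through azure.secretName. As a minimal sketch, assuming you have an azure.json file holding the service principal credentials for the DNS zone (the file name follows the stable/external-dns chart conventions), the secret can be created with:

# wrap the service principal credentials (azure.json) into the secret referenced by azure.secretName
kubectl create secret generic "$AZURE_DNS_SECRET_NAME" --namespace default --from-file=azure.json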

Deploy Elassandra DC1

Deploy the first datacenter dc1 of the Elassandra cluster cl1 in the Kubernetes cluster kube1, with Kibana and Cassandra Reaper available through the Traefik ingress controller.

helm install --namespace default --name "default-cl1-dc1" \
    --set dataVolumeClaim.storageClassName="ssd-{zone}" \
    --set cassandra.sslStoragePort="39000" \
    --set cassandra.nativePort="39001" \
    --set elasticsearch.httpPort="39002" \
    --set elasticsearch.transportPort="39003" \
    --set jvm.jmxPort="39004" \
    --set jvm.jdb="39005" \
    --set prometheus.port="39006" \
    --set replicas="3" \
    --set networking.hostNetworkEnabled=true \
    --set networking.externalDns.enabled=true \
    --set networking.externalDns.domain=${DNS_DOMAIN} \
    --set networking.externalDns.root=cl1-dc1 \
    --set kibana.enabled="true",kibana.spaces[0].ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",kibana.spaces[0].ingressSuffix=kibana.${TRAEFIK_FQDN} \
    --set reaper.enabled="true",reaper.ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",reaper.ingressHost=reaper.${TRAEFIK_FQDN} \
    --wait strapdata/elassandra-datacenter

Once the Elassandra datacenter is deployed, you get 3 Elassandra pods from 3 StatefulSets:

kubectl get all -l app.kubernetes.io/managed-by=elassandra-operator
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/elassandra-cl1-dc1-0-0                              1/1     Running   0          21m
pod/elassandra-cl1-dc1-1-0                              1/1     Running   0          19m
pod/elassandra-cl1-dc1-2-0                              1/1     Running   0          16m
pod/elassandra-cl1-dc1-kibana-kibana-5b94445c6b-6fxww   1/1     Running   0          13m
pod/elassandra-cl1-dc1-reaper-67d78d797d-2wr2z          1/1     Running   0          13m

NAME                                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                             AGE
service/elassandra-cl1-dc1                 ClusterIP   None           none          39000/TCP,39001/TCP,39004/TCP,39002/TCP,39006/TCP   21m
service/elassandra-cl1-dc1-admin           ClusterIP   10.0.234.29    none          39004/TCP                                           21m
service/elassandra-cl1-dc1-elasticsearch   ClusterIP   10.0.21.82     none          39002/TCP                                           21m
service/elassandra-cl1-dc1-external        ClusterIP   10.0.128.132   none          39001/TCP,39002/TCP                                 21m
service/elassandra-cl1-dc1-kibana-kibana   ClusterIP   10.0.145.131   none          5601/TCP                                            13m
service/elassandra-cl1-dc1-reaper          ClusterIP   10.0.234.176   none          8080/TCP,8081/TCP                                   13m

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/elassandra-cl1-dc1-kibana-kibana   1/1     1            1           13m
deployment.apps/elassandra-cl1-dc1-reaper          1/1     1            1           13m

NAME                                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/elassandra-cl1-dc1-kibana-kibana-5b94445c6b   1         1         1       13m
replicaset.apps/elassandra-cl1-dc1-reaper-67d78d797d          1         1         1       13m

NAME                                    READY   AGE
statefulset.apps/elassandra-cl1-dc1-0   1/1     21m
statefulset.apps/elassandra-cl1-dc1-1   1/1     19m
statefulset.apps/elassandra-cl1-dc1-2   1/1     16m

Once the datacenter is ready, check the cluster status:

kubectl exec elassandra-cl1-dc1-0-0 -- nodetool -u cassandra -pwf /etc/cassandra/jmxremote.password --jmxmp --ssl status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns    Host ID                               Rack
UN  40.113.160.148  365.18 KiB  8            ?       cf2cda52-05fe-4add-0001-000000000000  westeurope-2
UN  51.124.121.185  317.22 KiB  8            ?       141dcdbc-8e35-4535-0002-000000000000  westeurope-3
UN  51.138.75.131   302.2 KiB  8            ?       edf45f07-3128-46f1-0000-000000000000  westeurope-1

Then get the generated TLS certificates and the Cassandra admin password (because using the default cassandra user is not recommended, the Elassandra operator automatically creates an admin superuser role):

# note: base64 -D is the macOS flag, use base64 -d on Linux
kubectl get secret elassandra-cl1-ca-pub --context kube1 -n default -o jsonpath='{.data.cacert\.pem}' | base64 -D > cl1-cacert.pem
CASSANDRA_ADMIN_PASSWORD=$(kubectl get secret elassandra-cl1 --context kube1 -o jsonpath='{.data.cassandra\.admin_password}' | base64 -D)

Connect to the Elassandra/Cassandra node from the internet:

SSL_CERTFILE=cl1-cacert.pem bin/cqlsh --ssl -u admin -p $CASSANDRA_ADMIN_PASSWORD cassandra-cl1-dc1-0-0.$DNS_DOMAIN 39001
Connected to cl1 at cassandra-cl1-dc1-0-0.test.strapkube.com:39001.
[cqlsh 5.0.1 | Cassandra 3.11.6.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
admin@cqlsh>

Finally, you can check the Elassandra datacenter status (The CRD managed by the Elassandra Operator):

kubectl get edc elassandra-cl1-dc1 -o yaml
...(spec removed)
status:
  bootstrapped: true
  cqlStatus: ESTABLISHED
  cqlStatusMessage: Connected to cluster=[cl1] with role=[elassandra_operator] secret=[elassandra-cl1/cassandra.elassandra_operator_password]
  health: GREEN
  keyspaceManagerStatus:
    keyspaces:
    - reaper_db
    - system_traces
    - elastic_admin
    - system_distributed
    - _kibana_1
    - system_auth
    replicas: 3
  kibanaSpaceNames:
  - kibana
  needCleanup: true
  needCleanupKeyspaces: []
  observedGeneration: 1
  operationHistory:
  - actions:
    - Updating kibana space=[kibana]
    - Cassandra reaper registred
    durationInMs: 2099
    lastTransitionTime: "2020-07-02T21:30:09.411Z"
    pendingInMs: 2
    triggeredBy: Status update deployment=elassandra-cl1-dc1-reaper
  - actions:
    - Update keyspace RF for [reaper_db]
    - Update keyspace RF for [system_traces]
    - Update keyspace RF for [elastic_admin]
    - Update keyspace RF for [system_distributed]
    - Update keyspace RF for [_kibana_1]
    - Update keyspace RF for [system_auth]
    - Create or update role=[kibana]
    - Create or update role=[reaper]
    - Cassandra reaper deployed
    - Deploying kibana space=[kibana]
    durationInMs: 58213
    lastTransitionTime: "2020-07-02T21:28:17.621Z"
    pendingInMs: 1
    triggeredBy: Status update statefulset=elassandra-cl1-dc1-2 replicas=1/1
  - actions:
    - scale-up rack index=2 name=westeurope-3
    durationInMs: 152
    lastTransitionTime: "2020-07-02T21:26:15.883Z"
    pendingInMs: 2
    triggeredBy: Status update statefulset=elassandra-cl1-dc1-1 replicas=1/1
  - actions:
    - scale-up rack index=1 name=westeurope-2
    durationInMs: 430
    lastTransitionTime: "2020-07-02T21:23:31.076Z"
    pendingInMs: 5
    triggeredBy: Status update statefulset=elassandra-cl1-dc1-0 replicas=1/1
  - actions:
    - Datacenter resources deployed
    durationInMs: 4306
    lastTransitionTime: "2020-07-02T21:21:49.409Z"
    pendingInMs: 209
    triggeredBy: Datacenter added
  phase: RUNNING
  rackStatuses:
    "0":
      desiredReplicas: 1
      fingerprint: 230ec90-3feeca9
      health: GREEN
      index: 0
      name: westeurope-1
      progressState: RUNNING
      readyReplicas: 1
    "1":
      desiredReplicas: 1
      fingerprint: 230ec90-3feeca9
      health: GREEN
      index: 1
      name: westeurope-2
      progressState: RUNNING
      readyReplicas: 1
    "2":
      desiredReplicas: 1
      fingerprint: 230ec90-3feeca9
      health: GREEN
      index: 2
      name: westeurope-3
      progressState: RUNNING
      readyReplicas: 1
  readyReplicas: 3
  reaperRegistred: true
  zones:
  - westeurope-1
  - westeurope-2
  - westeurope-3

Deploy Elassandra DC2

First, we need to copy the cluster secrets from the Elassandra datacenter dc1 into the Kubernetes cluster kube2 running on GKE.

for s in elassandra-cl1 elassandra-cl1-ca-pub elassandra-cl1-ca-key elassandra-cl1-kibana; do
  kubectl get secret $s --context kube1 --export -n default -o yaml | kubectl apply --context kube2 -n default -f -
done
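
You can verify that the secrets are now present in kube2:

kubectl get secrets --context kube2 -n default | grep elassandra-cl1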

Then deploy the Elassandra datacenter dc2 into the GKE cluster kube2, using the same ports.

  • The TRAEFIK_FQDN should be something like traefik-cluster2.$DNS_DOMAIN.
  • The cassandra.remoteSeeds must include the DNS names of the dc1 seed nodes, i.e. the first node of each rack StatefulSet (pod index 0).

helm install --namespace default --name "default-cl1-dc2" \
        --set dataVolumeClaim.storageClassName="ssd-{zone}" \
        --set cassandra.sslStoragePort="39000" \
        --set cassandra.nativePort="39001" \
        --set elasticsearch.httpPort="39002" \
        --set elasticsearch.transportPort="39003" \
        --set jvm.jmxPort="39004" \
        --set jvm.jdb="39005" \
        --set prometheus.port="39006" \
        --set replicas="3" \
        --set cassandra.remoteSeeds[0]=cassandra-cl1-dc1-0-0.${DNS_DOMAIN} \
        --set networking.hostNetworkEnabled=true \
        --set networking.externalDns.enabled=true \
        --set networking.externalDns.domain=${DNS_DOMAIN} \
        --set networking.externalDns.root=cl1-dc2 \
        --set kibana.enabled="true",kibana.spaces[0].ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",kibana.spaces[0].ingressSuffix=kibana.${TRAEFIK_FQDN} \
        --set reaper.enabled="true",reaper.ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",reaper.ingressHost=reaper.${TRAEFIK_FQDN} \
        --wait strapdata/elassandra-datacenter

Once dc2 Elassandra pods are started, you get a running Elassandra cluster in AKS and GKE.

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns    Host ID                               Rack
UN  40.113.160.148   2.05 MiB   8            ?       cf2cda52-05fe-4add-0001-000000000000  westeurope-2
UN  51.124.121.185   2.44 MiB   8            ?       141dcdbc-8e35-4535-0002-000000000000  westeurope-3
UN  51.138.75.131    1.8 MiB    8            ?       edf45f07-3128-46f1-0000-000000000000  westeurope-1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns    Host ID                               Rack
UN  104.155.105.232  179.86 KiB  8            ?       fefe51cc-6ae3-48e8-0000-000000000000  europe-west1-b
UN  35.241.189.119   95.66 KiB  8            ?       b380d522-4801-46dd-0001-000000000000  europe-west1-c
UN  35.187.1.156     95.35 KiB  8            ?       ab010464-bbdd-4653-0002-000000000000  europe-west1-d

The datacenter dc2 started without streaming data, so we now set up keyspace replication before rebuilding the datacenter from dc1 using an Elassandra task CRD. This task automatically includes the Cassandra system keyspaces (system_auth, system_distributed, system_traces, and elastic_admin if Elasticsearch is enabled).

cat <<EOF | kubectl apply --context kube1 -f -
apiVersion: elassandra.strapdata.com/v1beta1
kind: ElassandraTask
metadata:
  name: replication-add-$$
  namespace: default
spec:
  cluster: "cl1"
  datacenter: "dc1"
  replication:
    action: ADD
    dcName: "dc2"
    dcSize: 3
    replicationMap:
      reaper_db: 3
EOF
edctl watch-task --context kube1 -n replication-add-573 -ns default --phase SUCCEED
19:54:04.505 [main] INFO  i.m.context.env.DefaultEnvironment.init:210 Established active environments: [cli]
Watching elassandra task context=kube1 name=replication-add-573 namespace=default phase=SUCCEED timeout=600s
19:54:06 ADDED: replication-add-573 phase=WAITING
19:55:02 MODIFIED: replication-add-573 phase=SUCCEED
done 56772ms

The edctl utility allows waiting for conditions on Elassandra datacenters or tasks. We now rebuild dc2 from dc1 by streaming the data:

cat <<EOF | kubectl apply --context gke_strapkube1_europe-west1_kube2 -f -
apiVersion: elassandra.strapdata.com/v1beta1
kind: ElassandraTask
metadata:
  name: rebuild-dc2-$$
  namespace: default
spec:
  cluster: "cl1"
  datacenter: "dc2"
  rebuild:
    srcDcName: "dc1"
EOF
edctl watch-task --context gke_strapkube1_europe-west1_kube2 -n rebuild-dc2-573 -ns default --phase SUCCEED
19:59:29.458 [main] INFO  i.m.context.env.DefaultEnvironment.init:210 Established active environments: [cli]
Watching elassandra task context=gke_strapkube1_europe-west1_kube2 name=rebuild-dc2-573 namespace=default phase=SUCCEED timeout=600s
"19:59:30 ADDED: rebuild-dc2-573 phase=SUCCEED
done 49ms

If Elasticsearch is enabled in dc2, you need to restart the Elassandra pods to update the Elasticsearch cluster state, since the data has been populated by streaming from dc1.

kubectl delete pod --namespace default -l app=elassandra,elassandra.strapdata.com/datacenter=dc2

Finally, check that you can connect to dc2:

SSL_CERTFILE=cl1-cacert.pem bin/cqlsh --ssl -u admin -p $CASSANDRA_ADMIN_PASSWORD cassandra-cl1-dc2-0-0.$DNS_DOMAIN 39001
Connected to cl1 at cassandra-cl1-dc2-0-0.test.strapkube.com:39001.
[cqlsh 5.0.1 | Cassandra 3.11.6.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
admin@cqlsh>

Check the Elasticsearch cluster state on dc2. The kibana index was automatically created by the Kibana pod deployed in the Kubernetes cluster kube2:

curl -k --user admin:c96ead5f-ad7c-4c0a-baeb-4ff037bfb185 https://cassandra-cl1-dc2-0-0.test.strapkube.com:39002/_cluster/state
{
  "cluster_name": "cl1",
  "cluster_uuid": "141dcdbc-8e35-4535-0002-000000000000",
  "version": 319,
  "state_uuid": "jIPjLqTfTUGgG04cSsYGuw",
  "master_node": "fefe51cc-6ae3-48e8-0000-000000000000",
  "blocks": {},
  "nodes": {
    "b380d522-4801-46dd-0001-000000000000": {
      "name": "35.241.189.119",
      "status": "ALIVE",
      "ephemeral_id": "b380d522-4801-46dd-0001-000000000000",
      "transport_address": "10.132.0.60:9300",
      "attributes": {
        "rack": "europe-west1-c",
        "dc": "dc2"
      }
    },
    "fefe51cc-6ae3-48e8-0000-000000000000": {
      "name": "104.155.105.232",
      "status": "ALIVE",
      "ephemeral_id": "fefe51cc-6ae3-48e8-0000-000000000000",
      "transport_address": "10.132.0.59:9300",
      "attributes": {
        "rack": "europe-west1-b",
        "dc": "dc2"
      }
    },
    "ab010464-bbdd-4653-0002-000000000000": {
      "name": "35.187.1.156",
      "status": "ALIVE",
      "ephemeral_id": "ab010464-bbdd-4653-0002-000000000000",
      "transport_address": "10.132.0.61:9300",
      "attributes": {
        "rack": "europe-west1-d",
        "dc": "dc2"
      }
    }
  },
  "metadata": {
    "version": 2,
    "cluster_uuid": "141dcdbc-8e35-4535-0002-000000000000",
    "templates": {},
    "indices": {
      ".kibana_1": {
        "state": "open",
        "settings": {
          "index": {
            "keyspace": "_kibana_1",
            "number_of_shards": "3",
            "auto_expand_replicas": "0-1",
            "provided_name": ".kibana_1",
            "creation_date": "1593725363439",
            "number_of_replicas": "2",
            "uuid": "_ATeExQwRlq_iDUnyMQkqw",
            "version": {
              "created": "6080499"
            }
          }
        },
        "mappings": {
          "doc": {
            "dynamic": "strict",
            "properties": {
              "server": {
                "properties": {
                  "uuid": {
                    "type": "keyword"
                  }
                }
              },
              "visualization": {
                "properties": {
                  "savedSearchId": {
                    "type": "keyword"
                  },
                  "description": {
                    "type": "text"
                  },
                  "uiStateJSON": {
                    "type": "text"
                  },
                  "title": {
                    "type": "text"
                  },
                  "version": {
                    "type": "integer"
                  },
                  "kibanaSavedObjectMeta": {
                    "properties": {
                      "searchSourceJSON": {
                        "type": "text"
                      }
                    }
                  },
                  "visState": {
                    "type": "text"
                  }
                }
              },
              "kql-telemetry": {
                "properties": {
                  "optInCount": {
                    "type": "long"
                  },
                  "optOutCount": {
                    "type": "long"
                  }
                }
              },
              "type": {
                "type": "keyword"
              },
              "url": {
                "properties": {
                  "accessCount": {
                    "type": "long"
                  },
                  "accessDate": {
                    "type": "date"
                  },
                  "url": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "ignore_above": 2048,
                        "type": "keyword"
                      }
                    }
                  },
                  "createDate": {
                    "type": "date"
                  }
                }
              },
              "migrationVersion": {
                "dynamic": "true",
                "type": "object"
              },
              "index-pattern": {
                "properties": {
                  "notExpandable": {
                    "type": "boolean"
                  },
                  "fieldFormatMap": {
                    "type": "text"
                  },
                  "sourceFilters": {
                    "type": "text"
                  },
                  "typeMeta": {
                    "type": "keyword"
                  },
                  "timeFieldName": {
                    "type": "keyword"
                  },
                  "intervalName": {
                    "type": "keyword"
                  },
                  "fields": {
                    "type": "text"
                  },
                  "title": {
                    "type": "text"
                  },
                  "type": {
                    "type": "keyword"
                  }
                }
              },
              "search": {
                "properties": {
                  "hits": {
                    "type": "integer"
                  },
                  "columns": {
                    "type": "keyword"
                  },
                  "description": {
                    "type": "text"
                  },
                  "sort": {
                    "type": "keyword"
                  },
                  "title": {
                    "type": "text"
                  },
                  "version": {
                    "type": "integer"
                  },
                  "kibanaSavedObjectMeta": {
                    "properties": {
                      "searchSourceJSON": {
                        "type": "text"
                      }
                    }
                  }
                }
              },
              "updated_at": {
                "type": "date"
              },
              "namespace": {
                "type": "keyword"
              },
              "timelion-sheet": {
                "properties": {
                  "hits": {
                    "type": "integer"
                  },
                  "timelion_sheet": {
                    "type": "text"
                  },
                  "timelion_interval": {
                    "type": "keyword"
                  },
                  "timelion_columns": {
                    "type": "integer"
                  },
                  "timelion_other_interval": {
                    "type": "keyword"
                  },
                  "timelion_rows": {
                    "type": "integer"
                  },
                  "description": {
                    "type": "text"
                  },
                  "title": {
                    "type": "text"
                  },
                  "version": {
                    "type": "integer"
                  },
                  "kibanaSavedObjectMeta": {
                    "properties": {
                      "searchSourceJSON": {
                        "type": "text"
                      }
                    }
                  },
                  "timelion_chart_height": {
                    "type": "integer"
                  }
                }
              },
              "config": {
                "dynamic": "true",
                "properties": {
                  "buildNum": {
                    "type": "keyword"
                  }
                }
              },
              "dashboard": {
                "properties": {
                  "hits": {
                    "type": "integer"
                  },
                  "timeFrom": {
                    "type": "keyword"
                  },
                  "timeTo": {
                    "type": "keyword"
                  },
                  "refreshInterval": {
                    "properties": {
                      "display": {
                        "type": "keyword"
                      },
                      "section": {
                        "type": "integer"
                      },
                      "value": {
                        "type": "integer"
                      },
                      "pause": {
                        "type": "boolean"
                      }
                    }
                  },
                  "description": {
                    "type": "text"
                  },
                  "uiStateJSON": {
                    "type": "text"
                  },
                  "timeRestore": {
                    "type": "boolean"
                  },
                  "title": {
                    "type": "text"
                  },
                  "version": {
                    "type": "integer"
                  },
                  "kibanaSavedObjectMeta": {
                    "properties": {
                      "searchSourceJSON": {
                        "type": "text"
                      }
                    }
                  },
                  "optionsJSON": {
                    "type": "text"
                  },
                  "panelsJSON": {
                    "type": "text"
                  }
                }
              }
            }
          }
        },
        "aliases": [
          ".kibana"
        ],
        "primary_terms": {
          "0": 0,
          "1": 0,
          "2": 0
        },
        "in_sync_allocations": {
          "1": [],
          "2": [],
          "0": []
        }
      }
    },
    "index-graveyard": {
      "tombstones": []
    }
  },
  "routing_table": {
    "indices": {
      ".kibana_1": {
        "shards": {
          "1": [
            {
              "state": "STARTED",
              "primary": true,
              "node": "b380d522-4801-46dd-0001-000000000000",
              "relocating_node": null,
              "shard": 1,
              "index": ".kibana_1",
              "token_ranges": [
                "(-4905935092955018317,-4634350246539782229]",
                "(-3857615872770993871,-2876405104543864077]",
                "(-2092581070725691530,-1391742915081687192]",
                "(-1169679320448494956,260474944050277681]",
                "(679763209506391150,1066882304983515845]",
                "(3796001557251052615,5355091691330181301]"
              ],
              "allocation_id": {
                "id": "dummy_alloc_id"
              }
            }
          ],
          "2": [
            {
              "state": "STARTED",
              "primary": true,
              "node": "ab010464-bbdd-4653-0002-000000000000",
              "relocating_node": null,
              "shard": 2,
              "index": ".kibana_1",
              "token_ranges": [
                "(-9061546716106502397,-6865113398631406170]",
                "(-5994674258152922714,-4905935092955018317]",
                "(-4634350246539782229,-3857615872770993871]",
                "(260474944050277681,679763209506391150]",
                "(1928397905685598346,2279437765047107097]",
                "(3762179910787726390,3796001557251052615]"
              ],
              "allocation_id": {
                "id": "dummy_alloc_id"
              }
            }
          ],
          "0": [
            {
              "state": "STARTED",
              "primary": true,
              "node": "fefe51cc-6ae3-48e8-0000-000000000000",
              "relocating_node": null,
              "shard": 0,
              "index": ".kibana_1",
              "token_ranges": [
                "(-9223372036854775808,-9061546716106502397]",
                "(-6865113398631406170,-5994674258152922714]",
                "(-2876405104543864077,-2092581070725691530]",
                "(-1391742915081687192,-1169679320448494956]",
                "(1066882304983515845,1928397905685598346]",
                "(2279437765047107097,3762179910787726390]",
                "(5355091691330181301,9223372036854775807]"
              ],
              "allocation_id": {
                "id": "dummy_alloc_id"
              }
            }
          ]
        }
      }
    }
  },
  "routing_nodes": {
    "unassigned": [],
    "nodes": {
      "b380d522-4801-46dd-0001-000000000000": [
        {
          "state": "STARTED",
          "primary": true,
          "node": "b380d522-4801-46dd-0001-000000000000",
          "relocating_node": null,
          "shard": 1,
          "index": ".kibana_1",
          "token_ranges": [
            "(-4905935092955018317,-4634350246539782229]",
            "(-3857615872770993871,-2876405104543864077]",
            "(-2092581070725691530,-1391742915081687192]",
            "(-1169679320448494956,260474944050277681]",
            "(679763209506391150,1066882304983515845]",
            "(3796001557251052615,5355091691330181301]"
          ],
          "allocation_id": {
            "id": "dummy_alloc_id"
          }
        }
      ],
      "fefe51cc-6ae3-48e8-0000-000000000000": [
        {
          "state": "STARTED",
          "primary": true,
          "node": "fefe51cc-6ae3-48e8-0000-000000000000",
          "relocating_node": null,
          "shard": 0,
          "index": ".kibana_1",
          "token_ranges": [
            "(-9223372036854775808,-9061546716106502397]",
            "(-6865113398631406170,-5994674258152922714]",
            "(-2876405104543864077,-2092581070725691530]",
            "(-1391742915081687192,-1169679320448494956]",
            "(1066882304983515845,1928397905685598346]",
            "(2279437765047107097,3762179910787726390]",
            "(5355091691330181301,9223372036854775807]"
          ],
          "allocation_id": {
            "id": "dummy_alloc_id"
          }
        }
      ],
      "ab010464-bbdd-4653-0002-000000000000": [
        {
          "state": "STARTED",
          "primary": true,
          "node": "ab010464-bbdd-4653-0002-000000000000",
          "relocating_node": null,
          "shard": 2,
          "index": ".kibana_1",
          "token_ranges": [
            "(-9061546716106502397,-6865113398631406170]",
            "(-5994674258152922714,-4905935092955018317]",
            "(-4634350246539782229,-3857615872770993871]",
            "(260474944050277681,679763209506391150]",
            "(1928397905685598346,2279437765047107097]",
            "(3762179910787726390,3796001557251052615]"
          ],
          "allocation_id": {
            "id": "dummy_alloc_id"
          }
        }
      ]
    }
  },
  "snapshot_deletions": {
    "snapshot_deletions": []
  },
  "snapshots": {
    "snapshots": []
  },
  "restore": {
    "snapshots": []
  }
}

Conclusion

Here you get a multi-cloud Elassandra cluster running across multiple Kubernetes clusters. The Elassandra Operator gives you the flexibility to deploy in the cloud or on premises, in a public or private network. You can scale up or down and park/unpark your datacenters; you can lose a Kubernetes node, a persistent volume or even a whole zone, and the Elassandra datacenter remains up and running, with no synchronisation issue to manage between your database and your Elasticsearch cluster.

In the next articles, we’ll see how the Elassandra Operator deploys Kibana for data visualisation and Cassandra Reaper to manage continuous Cassandra repairs. We’ll also see how to set up the Prometheus Operator with Grafana dashboards to monitor the Elassandra Operator, the Elassandra nodes and the Kubernetes resources.

Have fun with the Elassandra Operator, and thanks in advance for your feedback!
