Kubernetes adoption among IT departments keeps growing, and that's definitely a game changer: running databases on Kubernetes is the next challenge for running your microservices on any cloud provider. Elassandra provides a nice solution here, combining a distributed database, Apache Cassandra, with embedded Elasticsearch. With the Kubernetes Elassandra Operator, let's see how to deploy an Elassandra cluster spanning several Kubernetes clusters, on Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE).
Summary:
The Elassandra Operator creates one Kubernetes statefulset per availability zone, mapped to a Cassandra rack. Thus, in case of a zone failure, data remains available because replicas are properly distributed across zones.
The Elassandra Operator watches Kubernetes nodes and identifies availability zones through the node label failure-domain.beta.kubernetes.io/zone. Each statefulset is named with its zone index rather than a zone name, to keep the naming standard regardless of the actual zone names. For example, a 9-node Elassandra datacenter running across 3 availability zones in a Kubernetes cluster maps to 3 statefulsets of 3 Elassandra nodes each.
By default, for each Elassandra/Cassandra cluster, you can have only one Elassandra node per Kubernetes node (enforced by an anti-affinity rule).
To ensure data consistency, Persistent Volume Claims are allocated in the same zone as the associated statefulset (or Cassandra rack), as sketched below.
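To make these two points concrete, here is a minimal sketch of the relevant parts of an operator-generated statefulset for rack index 0. The label keys, claim name and volume size are illustrative assumptions, not the operator's exact manifest:

# Sketch only: anti-affinity keeps one Elassandra pod per Kubernetes node, and the
# volume claim template pins data volumes to the rack's zone via a zone-specific storage class.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: elassandra                        # label used later in this post
                elassandra.strapdata.com/cluster: cl1  # hypothetical cluster label
            topologyKey: kubernetes.io/hostname        # at most one Elassandra pod per node
  volumeClaimTemplates:
  - metadata:
      name: data-volume                                # illustrative claim name
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: ssd-westeurope-1               # per-zone storage class for rack 0 (see below)
      resources:
        requests:
          storage: 128Gi                               # illustrative size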
For the purpose of this demonstration, we are deploying an Elassandra cluster using public IP addresses to connect the datacenter DC1 running on AKS to another Elassandra datacenter DC2 running on GKE. Of course, for security reasons, it would be better to run such a cluster in a private network interconnected through VPC peering or a VPN, but if it works over the public internet, it will work just as well in a private network.
We are going to use Azure VM scale sets with public IP addresses on the Kubernetes nodes, which requires the Azure CLI aks-preview extension:
az extension add --name aks-preview
az extension update --name aks-preview
az feature register --name NodePublicIPPreview --namespace Microsoft.ContainerService
Create an Azure resource group and an AKS regional cluster running on 3 zones with public IP addresses on Kubernetes nodes and the Azure network plugin:
AZURE_REGION=westeurope
RESOURCE_GROUP_NAME=aks1
K8S_CLUSTER_NAME=kube1

az group create -l $AZURE_REGION -n $RESOURCE_GROUP_NAME
az aks create --name "${K8S_CLUSTER_NAME}" \
    --resource-group ${RESOURCE_GROUP_NAME} \
    --network-plugin azure \
    --node-count 3 \
    --node-vm-size Standard_D2_v3 \
    --vm-set-type VirtualMachineScaleSets \
    --output table \
    --zone 1 2 3 \
    --enable-node-public-ip
az aks get-credentials --name "${K8S_CLUSTER_NAME}" --resource-group $RESOURCE_GROUP_NAME --output table
Unfortunately, AKS does not map the VM's public IP address to the Kubernetes node's external IP address, so the trick is to add these public IP addresses as a custom Kubernetes label, elassandra.strapdata.com/public-ip, on each node.
add_vmss_public_ip() {
    AKS_RG_NAME=$(az resource show --namespace Microsoft.ContainerService --resource-type managedClusters -g $RESOURCE_GROUP_NAME -n $K8S_CLUSTER_NAME | jq -r .properties.nodeResourceGroup)
    AKS_VMSS_INSTANCE=$(kubectl get nodes -o json | jq -r ".items[${1:-0}].metadata.name")
    PUBLIC_IP=$(az vmss list-instance-public-ips -g $AKS_RG_NAME -n ${AKS_VMSS_INSTANCE::-6} | jq -r ".[${1:-0}].ipAddress")
    kubectl label nodes --overwrite $AKS_VMSS_INSTANCE elassandra.strapdata.com/public-ip=$PUBLIC_IP
}

NODE_COUNT=$(kubectl get nodes --no-headers | wc -l)
for i in $(seq 0 $((NODE_COUNT-1))); do
    add_vmss_public_ip $i
done
And you should get something like this:
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone,elassandra.strapdata.com/public-ip
NAME                                STATUS   ROLES   AGE     VERSION    ZONE           PUBLIC-IP
aks-nodepool1-74300635-vmss000000   Ready    agent   8m18s   v1.15.11   westeurope-1   51.138.75.131
aks-nodepool1-74300635-vmss000001   Ready    agent   8m12s   v1.15.11   westeurope-2   40.113.160.148
aks-nodepool1-74300635-vmss000002   Ready    agent   8m22s   v1.15.11   westeurope-3   51.124.121.185
Install HELM 2 and add the strapdata repository:
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --wait --service-account tiller
helm repo add strapdata https://charts.strapdata.com
Azure persistent volumes are bound to an availability zone, so we need to define one storageClass per zone in our AKS cluster, and each Elassandra rack (or statefulset) will be bound to the corresponding storageClass. This is done here using the HELM chart strapdata/storageclass.
for z in 1 2 3; do
    helm install --name ssd-$AZURE_REGION-$z --namespace kube-system \
        --set parameters.kind="Managed" \
        --set parameters.cachingmode="ReadOnly" \
        --set parameters.storageaccounttype="StandardSSD_LRS" \
        --set provisioner="kubernetes.io/azure-disk" \
        --set zone="$AZURE_REGION-${z}" \
        --set nameOverride="ssd-$AZURE_REGION-$z" \
        $HELM_REPO/storageclass
done
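For reference, the chart should render one zone-bound StorageClass per zone roughly like the sketch below. This is illustrative only: the exact template depends on the chart version, and the zone pinning may be implemented through provisioner parameters rather than allowedTopologies.

# Sketch of a rendered zone-bound StorageClass (not the exact chart output)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-westeurope-1
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  cachingmode: ReadOnly
  storageaccounttype: StandardSSD_LRS
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - westeurope-1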
Finally, you may need to authorise inbound Elassandra connections on the exposed TCP ports: the Cassandra SSL storage port, the CQL native port, and the Elasticsearch HTTP port.
Assuming you deploy an Elassandra datacenter respectively using ports 39000, 39001, and 39002 exposed to the internet, with no source IP address restrictions:
AKS_RG_NAME=$(az resource show --namespace Microsoft.ContainerService --resource-type managedClusters -g $RESOURCE_GROUP_NAME -n "${K8S_CLUSTER_NAME}" | jq -r .properties.nodeResourceGroup)
NSG_NAME=$(az network nsg list -g $AKS_RG_NAME | jq -r .[0].name)
az network nsg rule create \
    --resource-group $AKS_RG_NAME \
    --nsg-name $NSG_NAME \
    --name elassandra_inbound \
    --description "Elassandra inbound rule" \
    --priority 2000 \
    --access Allow \
    --source-address-prefixes Internet \
    --protocol Tcp \
    --direction Inbound \
    --destination-address-prefixes '*' \
    --destination-port-ranges 39000-39002
Create a Regional Kubernetes cluster on GCP, with RBAC enabled:
GCLOUD_PROJECT=strapkube1
K8S_CLUSTER_NAME=kube2
GCLOUD_REGION=europe-west1

gcloud container clusters create $K8S_CLUSTER_NAME \
    --region $GCLOUD_REGION \
    --project $GCLOUD_PROJECT \
    --machine-type "n1-standard-2" \
    --cluster-version=1.15 \
    --tags=$K8S_CLUSTER_NAME \
    --num-nodes "1"
gcloud container clusters get-credentials $K8S_CLUSTER_NAME --region $GCLOUD_REGION --project $GCLOUD_PROJECT
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
Install HELM 2 (like on AKS).
Google Cloud persistent volumes are also bound to an availability zone, so, as on AKS, we need to define one storageClass per zone in our Kubernetes cluster and bind each Elassandra rack (or statefulset) to the corresponding storageClass, again using the HELM chart strapdata/storageclass.
for z in europe-west1-b europe-west1-c europe-west1-d; do
    helm install --name ssd-$z --namespace kube-system \
        --set parameters.type="pd-ssd" \
        --set provisioner="kubernetes.io/gce-pd" \
        --set zone=$z,nameOverride=ssd-$z \
        strapdata/storageclass
done
Assuming you deploy an Elassandra datacenter using ports 39000, 39001, and 39002 exposed to the internet, with no source IP address restrictions, and that the Kubernetes nodes are properly tagged with the k8s cluster name, you can create an inbound firewall rule like this:
VPC_NETWORK=$(gcloud container clusters describe $K8S_CLUSTER_NAME --region $GCLOUD_REGION --format='value(network)')
NODE_POOLS_TARGET_TAGS=$(gcloud container clusters describe $K8S_CLUSTER_NAME --region $GCLOUD_REGION --format='value[terminator=","](nodePools.config.tags)' --flatten='nodePools[].config.tags[]' | sed 's/,\{2,\}//g')
gcloud compute firewall-rules create "allow-elassandra-inbound" \
    --allow tcp:39000-39002 \
    --network="$VPC_NETWORK" \
    --target-tags="$NODE_POOLS_TARGET_TAGS" \
    --description="Allow elassandra inbound" \
    --direction INGRESS
GKE comes with KubeDNS by default, which does not allow configuring host aliases to resolve public IP addresses to internal Kubernetes node IP addresses (required by the Cassandra AddressTranslator to connect to Elassandra nodes using their internal IP addresses). So we need to install CoreDNS configured to import custom configuration (see the CoreDNS import plugin), and configure KubeDNS with a stub domain forwarding to CoreDNS.
helm install --name coredns --namespace=kube-system -f coredns-values.yaml stable/coredns
The coredns-values.yaml file used here is available here.
Once CoreDNS is installed, add a stub domain to forward requests for the domain internal.strapdata.com to the CoreDNS service, and restart the KubeDNS pods. internal.strapdata.com is just a dummy DNS domain used to resolve public IP addresses to the Kubernetes nodes' internal IP addresses.
COREDNS_SERVICE_IP=$(kubectl get service -l k8s-app=coredns -n kube-system -o jsonpath='{.items[0].spec.clusterIP}')
KUBEDNS_STUB_DOMAINS="{\\\"internal.strapdata.com\\\": [\\\"$COREDNS_SERVICE_IP\\\"]}"
kubectl patch configmap/kube-dns -n kube-system -p "{\"data\": {\"stubDomains\": \"$KUBEDNS_STUB_DOMAINS\"}}"
kubectl delete pod -l k8s-app=coredns -n kube-system
Once your AKS and GKE clusters are running, we need to deploy and configure additional services in these two clusters.
Install the Elassandra operator in the default namespace:
helm install --namespace default --name elassop --wait strapdata/elassandra-operator
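You can quickly check that the release is deployed; the pod label below is an assumption about the operator chart's defaults, so adjust it if your release uses different labels:

helm status elassop
kubectl get pods -n default -l app=elassandra-operator   # assumed label; plain "kubectl get pods -n default" also works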
The Kubernetes CoreDNS is used for two reasons: to resolve the public node IP addresses to the Kubernetes nodes' internal IP addresses through host aliases in the dummy internal.strapdata.com zone, and to forward DNS requests for your DNS domain to your DNS provider's name servers so that remote Elassandra nodes can be resolved.
You can deploy the CoreDNS custom configuration with the strapdata coredns-forwarder HELM chart, which basically installs (or replaces) the coredns-custom configmap, and then restart the CoreDNS pods. First, build the host aliases from the Kubernetes node addresses:
HOST_ALIASES=$(kubectl get nodes -o custom-columns='INTERNAL-IP:.status.addresses[?(@.type=="InternalIP")].address,EXTERNAL-IP:.status.addresses[?(@.type=="ExternalIP")].address' --no-headers |\
    awk '{ gsub(/\./,"-",$2); printf("nodes.hosts[%d].name=%s,nodes.hosts[%d].value=%s,",NR-1, $2, NR-1, $1); }')
If your Kubernetes nodes do not have an ExternalIP set (as on AKS), the public node IP address should be available through the custom label elassandra.strapdata.com/public-ip, so build HOST_ALIASES from that label instead:
HOST_ALIASES=$(kubectl get nodes -o custom-columns='INTERNAL-IP:.status.addresses[?(@.type=="InternalIP")].address,PUBLIC-IP:.metadata.labels.elassandra\.strapdata\.com/public-ip' --no-headers |\
    awk '{ gsub(/\./,"-",$2); printf("nodes.hosts[%d].name=%s,nodes.hosts[%d].value=%s,",NR-1, $2, NR-1, $1); }')
Then configure the CoreDNS custom configmap with your DNS name servers and host aliases. The following example uses the Azure DNS name servers:
kubectl delete configmap --namespace kube-system coredns-custom
helm install --name coredns-forwarder --namespace kube-system \
    --set forwarders.domain="${DNS_DOMAIN}" \
    --set forwarders.hosts[0]="40.90.4.8" \
    --set forwarders.hosts[1]="64.4.48.8" \
    --set forwarders.hosts[2]="13.107.24.8" \
    --set forwarders.hosts[3]="13.107.160.8" \
    --set nodes.domain=internal.strapdata.com \
    --set $HOST_ALIASES \
    strapdata/coredns-forwarder
Then restart the CoreDNS pods to reload the configuration; the label selector depends on how CoreDNS is deployed.
On AKS:
kubectl delete pod --namespace kube-system -l k8s-app=kube-dns
On GKE:
kubectl delete pod --namespace kube-system -l k8s-app=coredns
Check the CoreDNS custom configuration:
kubectl get configmap -n kube-system coredns-custom -o yaml
apiVersion: v1
data:
  dns.server: |
    test.strapkube.com:53 {
        errors
        cache 30
        forward $DNS_DOMAIN 40.90.4.8 64.4.48.8 13.107.24.8 13.107.160.8
    }
  hosts.override: |
    hosts nodes.hosts internal.strapdata.com {
        10.132.0.57 146-148-117-125.internal.strapdata.com 146-148-117-125
        10.132.0.58 35-240-56-87.internal.strapdata.com 35-240-56-87
        10.132.0.56 34-76-40-251.internal.strapdata.com 34-76-40-251
        fallthrough
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-06-26T16:45:52Z"
  name: coredns-custom
  namespace: kube-system
  resourceVersion: "6632"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns-custom
  uid: dca59c7d-6503-48c1-864f-28ae46319725
Deploy a dnsutils pod:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
Test the resolution of public IP names to internal Kubernetes node IP addresses:
kubectl exec -ti dnsutils -- nslookup 146-148-117-125.internal.strapdata.com
Server:    10.19.240.10
Address:   10.19.240.10#53

Name:      146-148-117-125.internal.strapdata.com
Address:   10.132.0.57
ExternalDNS is used to automatically update your DNS zone and create an A record for each Cassandra broadcast IP address. You can use it with a public or a private DNS zone, and with any DNS provider supported by ExternalDNS. In the following setup, we use a DNS zone hosted on Azure.
helm install --name my-externaldns --namespace default \
    --set logLevel="debug" \
    --set rbac.create=true \
    --set policy="sync",txtPrefix=$(kubectl config current-context) \
    --set sources[0]="service",sources[1]="ingress",sources[2]="crd" \
    --set crd.create=true,crd.apiversion="externaldns.k8s.io/v1alpha1",crd.kind="DNSEndpoint" \
    --set provider="azure" \
    --set azure.secretName="$AZURE_DNS_SECRET_NAME",azure.resourceGroup="$AZURE_DNS_RESOURCE_GROUP" \
    --set azure.tenantId="$AZURE_DNS_TENANT_ID",azure.subscriptionId="$AZURE_SUBSCRIPTION_ID" \
    --set azure.aadClientId="$AZURE_DNS_CLIENT_ID",azure.aadClientSecret="$AZURE_DNS_CLIENT_SECRET" \
    stable/external-dns
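With the CRD source enabled, ExternalDNS watches DNSEndpoint resources (created here by the Elassandra operator for the node broadcast addresses, since the datacenter chart below enables externalDns). A record of that kind looks roughly like this hypothetical sketch, reusing the test.strapkube.com domain and a node public IP from this post:

apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: cassandra-cl1-dc1-0-0          # illustrative name
  namespace: default
spec:
  endpoints:
  - dnsName: cassandra-cl1-dc1-0-0.test.strapkube.com
    recordType: A
    recordTTL: 300
    targets:
    - 51.138.75.131                    # public IP of the first AKS node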
Deploy the first datacenter dc1 of the Elassandra cluster cl1 in the Kubernetes cluster kube1, with Kibana and Cassandra Reaper available through the Traefik ingress controller.
helm install --namespace default --name "default-cl1-dc1" \ --set dataVolumeClaim.storageClassName="ssd-{zone}" \ --set cassandra.sslStoragePort="39000" \ --set cassandra.nativePort="39001" \ --set elasticsearch.httpPort="39002" \ --set elasticsearch.transportPort="39003" \ --set jvm.jmxPort="39004" \ --set jvm.jdb="39005" \ --set prometheus.port="39006" \ --set replicas="3" \ --set networking.hostNetworkEnabled=true \ --set networking.externalDns.enabled=true \ --set networking.externalDns.domain=${DNS_DOMAIN} \ --set networking.externalDns.root=cl1-dc1 \ --set kibana.enabled="true",kibana.spaces[0].ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",kibana.spaces[0].ingressSuffix=kibana.${TRAEFIK_FQDN} \ --set reaper.enabled="true",reaper.ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",reaper.ingressHost=reaper.${TRAEFIK_FQDN} \ --wait $HELM_REPO/elassandra-datacenter
Once the Elassandra datacenter is deployed, you get 3 Elassandra pods from 3 StatefulSets:
kubectl get all -l app.kubernetes.io/managed-by=elassandra-operator
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/elassandra-cl1-dc1-0-0                              1/1     Running   0          21m
pod/elassandra-cl1-dc1-1-0                              1/1     Running   0          19m
pod/elassandra-cl1-dc1-2-0                              1/1     Running   0          16m
pod/elassandra-cl1-dc1-kibana-kibana-5b94445c6b-6fxww   1/1     Running   0          13m
pod/elassandra-cl1-dc1-reaper-67d78d797d-2wr2z          1/1     Running   0          13m

NAME                                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                             AGE
service/elassandra-cl1-dc1                 ClusterIP   None           <none>        39000/TCP,39001/TCP,39004/TCP,39002/TCP,39006/TCP   21m
service/elassandra-cl1-dc1-admin           ClusterIP   10.0.234.29    <none>        39004/TCP                                           21m
service/elassandra-cl1-dc1-elasticsearch   ClusterIP   10.0.21.82     <none>        39002/TCP                                           21m
service/elassandra-cl1-dc1-external        ClusterIP   10.0.128.132   <none>        39001/TCP,39002/TCP                                 21m
service/elassandra-cl1-dc1-kibana-kibana   ClusterIP   10.0.145.131   <none>        5601/TCP                                            13m
service/elassandra-cl1-dc1-reaper          ClusterIP   10.0.234.176   <none>        8080/TCP,8081/TCP                                   13m

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/elassandra-cl1-dc1-kibana-kibana   1/1     1            1           13m
deployment.apps/elassandra-cl1-dc1-reaper          1/1     1            1           13m

NAME                                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/elassandra-cl1-dc1-kibana-kibana-5b94445c6b   1         1         1       13m
replicaset.apps/elassandra-cl1-dc1-reaper-67d78d797d          1         1         1       13m

NAME                                    READY   AGE
statefulset.apps/elassandra-cl1-dc1-0   1/1     21m
statefulset.apps/elassandra-cl1-dc1-1   1/1     19m
statefulset.apps/elassandra-cl1-dc1-2   1/1     16m
Once the datacenter is ready, check the cluster status:
kubectl exec elassandra-cl1-dc1-0-0 -- nodetool -u cassandra -pwf /etc/cassandra/jmxremote.password --jmxmp --ssl status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns  Host ID                               Rack
UN  40.113.160.148  365.18 KiB  8       ?     cf2cda52-05fe-4add-0001-000000000000  westeurope-2
UN  51.124.121.185  317.22 KiB  8       ?     141dcdbc-8e35-4535-0002-000000000000  westeurope-3
UN  51.138.75.131   302.2 KiB   8       ?     edf45f07-3128-46f1-0000-000000000000  westeurope-1
Then get the generated TLS certificates and the Cassandra admin password (Because using the default cassandra user is not recommended, the Elassandra operator automatically creates an admin superuser role):
kubectl get secret elassandra-cl1-ca-pub --context kube1 -n default -o jsonpath='{.data.cacert\.pem}' | base64 -D > cl1-cacert.pem
CASSANDRA_ADMIN_PASSWORD=$(kubectl get secret elassandra-cl1 --context kube1 -o jsonpath='{.data.cassandra\.admin_password}' | base64 -D)
Connect to the Elassandra/Cassandra node from the internet:
SSL_CERTFILE=cl1-cacert.pem bin/cqlsh --ssl -u admin -p $CASSANDRA_ADMIN_PASSWORD cassandra-cl1-dc1-0-0.$DNS_DOMAIN 39001
Connected to cl1 at cassandra-cl1-dc1-0-0.test.strapkube.com:39001.
[cqlsh 5.0.1 | Cassandra 3.11.6.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
admin@cqlsh>
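From the cqlsh prompt, a quick sanity check (a standard Cassandra query, nothing specific to the operator) shows the datacenter and rack of the coordinator you are connected to:

admin@cqlsh> SELECT data_center, rack, broadcast_address FROM system.local;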
Finally, you can check the Elassandra datacenter status (The CRD managed by the Elassandra Operator):
kubectl get edc elassandra-cl1-dc1 -o yaml
...(spec removed)
status:
  bootstrapped: true
  cqlStatus: ESTABLISHED
  cqlStatusMessage: Connected to cluster=[cl1] with role=[elassandra_operator] secret=[elassandra-cl1/cassandra.elassandra_operator_password]
  health: GREEN
  keyspaceManagerStatus:
    keyspaces:
    - reaper_db
    - system_traces
    - elastic_admin
    - system_distributed
    - _kibana_1
    - system_auth
    replicas: 3
  kibanaSpaceNames:
  - kibana
  needCleanup: true
  needCleanupKeyspaces: []
  observedGeneration: 1
  operationHistory:
  - actions:
    - Updating kibana space=[kibana]
    - Cassandra reaper registred
    durationInMs: 2099
    lastTransitionTime: "2020-07-02T21:30:09.411Z"
    pendingInMs: 2
    triggeredBy: Status update deployment=elassandra-cl1-dc1-reaper
  - actions:
    - Update keyspace RF for [reaper_db]
    - Update keyspace RF for [system_traces]
    - Update keyspace RF for [elastic_admin]
    - Update keyspace RF for [system_distributed]
    - Update keyspace RF for [_kibana_1]
    - Update keyspace RF for [system_auth]
    - Create or update role=[kibana]
    - Create or update role=[reaper]
    - Cassandra reaper deployed
    - Deploying kibana space=[kibana]
    durationInMs: 58213
    lastTransitionTime: "2020-07-02T21:28:17.621Z"
    pendingInMs: 1
    triggeredBy: Status update statefulset=elassandra-cl1-dc1-2 replicas=1/1
  - actions:
    - scale-up rack index=2 name=westeurope-3
    durationInMs: 152
    lastTransitionTime: "2020-07-02T21:26:15.883Z"
    pendingInMs: 2
    triggeredBy: Status update statefulset=elassandra-cl1-dc1-1 replicas=1/1
  - actions:
    - scale-up rack index=1 name=westeurope-2
    durationInMs: 430
    lastTransitionTime: "2020-07-02T21:23:31.076Z"
    pendingInMs: 5
    triggeredBy: Status update statefulset=elassandra-cl1-dc1-0 replicas=1/1
  - actions:
    - Datacenter resources deployed
    durationInMs: 4306
    lastTransitionTime: "2020-07-02T21:21:49.409Z"
    pendingInMs: 209
    triggeredBy: Datacenter added
  phase: RUNNING
  rackStatuses:
    "0":
      desiredReplicas: 1
      fingerprint: 230ec90-3feeca9
      health: GREEN
      index: 0
      name: westeurope-1
      progressState: RUNNING
      readyReplicas: 1
    "1":
      desiredReplicas: 1
      fingerprint: 230ec90-3feeca9
      health: GREEN
      index: 1
      name: westeurope-2
      progressState: RUNNING
      readyReplicas: 1
    "2":
      desiredReplicas: 1
      fingerprint: 230ec90-3feeca9
      health: GREEN
      index: 2
      name: westeurope-3
      progressState: RUNNING
      readyReplicas: 1
  readyReplicas: 3
  reaperRegistred: true
  zones:
  - westeurope-1
  - westeurope-2
  - westeurope-3
First, we need to copy the cluster secrets from the Elassandra datacenter dc1 into the Kubernetes cluster kube2 running on GKE.
for s in elassandra-cl1 elassandra-cl1-ca-pub elassandra-cl1-ca-key elassandra-cl1-kibana; do
    kubectl get secret $s --context kube1 --export -n default -o yaml | kubectl apply --context kube2 -n default -f -
done
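You can quickly verify that the secrets are now present in kube2:

kubectl get secrets --context kube2 -n default | grep elassandra-cl1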
Then deploy the Elassandra datacenter dc2 into the GKE cluster kube2, using the same ports.
helm install --namespace default --name "default-cl1-dc2" \ --set dataVolumeClaim.storageClassName="ssd-{zone}" \ --set cassandra.sslStoragePort="39000" \ --set cassandra.nativePort="39001" \ --set elasticsearch.httpPort="39002" \ --set elasticsearch.transportPort="39003" \ --set jvm.jmxPort="39004" \ --set jvm.jdb="39005" \ --set prometheus.port="39006" \ --set replicas="3" \ --set cassandra.remoteSeeds[0]=cassandra-cl1-dc1-0-0.${DNS_DOMAIN} \ --set networking.hostNetworkEnabled=true \ --set networking.externalDns.enabled=true \ --set networking.externalDns.domain=${DNS_DOMAIN} \ --set networking.externalDns.root=cl1-dc2 \ --set kibana.enabled="true",kibana.spaces[0].ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",kibana.spaces[0].ingressSuffix=kibana.${TRAEFIK_FQDN} \ --set reaper.enabled="true",reaper.ingressAnnotations."kubernetes\.io/ingress\.class"="traefik",reaper.ingressHost=reaper.${TRAEFIK_FQDN} \ --wait strapdata/elassandra-datacenter
Once the dc2 Elassandra pods are started, you get an Elassandra cluster running across AKS and GKE, and nodetool status now shows both datacenters:
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load        Tokens  Owns  Host ID                               Rack
UN  40.113.160.148   2.05 MiB    8       ?     cf2cda52-05fe-4add-0001-000000000000  westeurope-2
UN  51.124.121.185   2.44 MiB    8       ?     141dcdbc-8e35-4535-0002-000000000000  westeurope-3
UN  51.138.75.131    1.8 MiB     8       ?     edf45f07-3128-46f1-0000-000000000000  westeurope-1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load        Tokens  Owns  Host ID                               Rack
UN  104.155.105.232  179.86 KiB  8       ?     fefe51cc-6ae3-48e8-0000-000000000000  europe-west1-b
UN  35.241.189.119   95.66 KiB   8       ?     b380d522-4801-46dd-0001-000000000000  europe-west1-c
UN  35.187.1.156     95.35 KiB   8       ?     ab010464-bbdd-4653-0002-000000000000  europe-west1-d
The datacenter dc2 started without streaming data, so we now set up keyspace replication before rebuilding dc2 from dc1 using an Elassandra task CRD. This task automatically includes the Cassandra system keyspaces (system_auth, system_distributed, system_traces, and elastic_admin if Elasticsearch is enabled).
cat <<EOF | kubectl apply --context kube1 -f -
apiVersion: elassandra.strapdata.com/v1beta1
kind: ElassandraTask
metadata:
  name: replication-add-$$
  namespace: default
spec:
  cluster: "cl1"
  datacenter: "dc1"
  replication:
    action: ADD
    dcName: "dc2"
    dcSize: 3
    replicationMap:
      reaper_db: 3
EOF

edctl watch-task --context kube1 -n replication-add-573 -ns default --phase SUCCEED
19:54:04.505 [main] INFO  i.m.context.env.DefaultEnvironment.init:210 Established active environments: [cli]
Watching elassandra task context=kube1 name=replication-add-573 namespace=default phase=SUCCEED timeout=600s
19:54:06 ADDED: replication-add-573 phase=WAITING
19:55:02 MODIFIED: replication-add-573 phase=SUCCEED
done 56772ms
The edctl utility allows waiting for conditions on Elassandra datacenters or tasks. We now rebuild dc2 from dc1 by streaming the data:
cat <<EOF | kubectl apply --context gke_strapkube1_europe-west1_kube2 -f -
apiVersion: elassandra.strapdata.com/v1beta1
kind: ElassandraTask
metadata:
  name: rebuild-dc2-$$
  namespace: default
spec:
  cluster: "cl1"
  datacenter: "dc2"
  rebuild:
    srcDcName: "dc1"
EOF

edctl watch-task --context gke_strapkube1_europe-west1_kube2 -n rebuild-dc2-573 -ns default --phase SUCCEED
19:59:29.458 [main] INFO  i.m.context.env.DefaultEnvironment.init:210 Established active environments: [cli]
Watching elassandra task context=gke_strapkube1_europe-west1_kube2 name=rebuild-dc2-573 namespace=default phase=SUCCEED timeout=600s
19:59:30 ADDED: rebuild-dc2-573 phase=SUCCEED
done 49ms
If Elasticsearch is enabled in dc2, you need to restart the Elassandra pods to update the Elasticsearch cluster state, since the data has been populated by streaming from dc1.
kubectl delete pod --namespace default -l app=elassandra,elassandra.strapdata.com/datacenter=dc2
Finally, check that you can connect to dc2:
SSL_CERTFILE=cl1-cacert.pem bin/cqlsh --ssl -u admin -p $CASSANDRA_ADMIN_PASSWORD cassandra-cl1-dc2-0-0.$DNS_DOMAIN 39001
Connected to cl1 at cassandra-cl1-dc2-0-0.test.strapkube.com:39001.
[cqlsh 5.0.1 | Cassandra 3.11.6.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
admin@cqlsh>
Check the Elasticsearch cluster status on dc2. The kibana index was automatically created by the deployed kibana pod running in the Kubernetes cluster kube2:
curl -k --user admin:c96ead5f-ad7c-4c0a-baeb-4ff037bfb185 https://cassandra-cl1-dc2-0-0.test.strapkube.com:39002/_cluster/state
{
  "cluster_name": "cl1",
  "cluster_uuid": "141dcdbc-8e35-4535-0002-000000000000",
  "version": 319,
  "state_uuid": "jIPjLqTfTUGgG04cSsYGuw",
  "master_node": "fefe51cc-6ae3-48e8-0000-000000000000",
  "blocks": {},
  "nodes": {
    "b380d522-4801-46dd-0001-000000000000": {
      "name": "35.241.189.119",
      "status": "ALIVE",
      "ephemeral_id": "b380d522-4801-46dd-0001-000000000000",
      "transport_address": "10.132.0.60:9300",
      "attributes": { "rack": "europe-west1-c", "dc": "dc2" }
    },
    "fefe51cc-6ae3-48e8-0000-000000000000": {
      "name": "104.155.105.232",
      "status": "ALIVE",
      "ephemeral_id": "fefe51cc-6ae3-48e8-0000-000000000000",
      "transport_address": "10.132.0.59:9300",
      "attributes": { "rack": "europe-west1-b", "dc": "dc2" }
    },
    "ab010464-bbdd-4653-0002-000000000000": {
      "name": "35.187.1.156",
      "status": "ALIVE",
      "ephemeral_id": "ab010464-bbdd-4653-0002-000000000000",
      "transport_address": "10.132.0.61:9300",
      "attributes": { "rack": "europe-west1-d", "dc": "dc2" }
    }
  },
  "metadata": {
    "version": 2,
    "cluster_uuid": "141dcdbc-8e35-4535-0002-000000000000",
    "templates": {},
    "indices": {
      ".kibana_1": {
        "state": "open",
        "settings": {
          "index": {
            "keyspace": "_kibana_1",
            "number_of_shards": "3",
            "auto_expand_replicas": "0-1",
            "provided_name": ".kibana_1",
            "creation_date": "1593725363439",
            "number_of_replicas": "2",
            "uuid": "_ATeExQwRlq_iDUnyMQkqw",
            "version": { "created": "6080499" }
          }
        },
        "mappings": {
          "doc": {
            "dynamic": "strict",
            "properties": {
              "server": { "properties": { "uuid": { "type": "keyword" } } },
              "visualization": { "properties": { "savedSearchId": { "type": "keyword" }, "description": { "type": "text" }, "uiStateJSON": { "type": "text" }, "title": { "type": "text" }, "version": { "type": "integer" }, "kibanaSavedObjectMeta": { "properties": { "searchSourceJSON": { "type": "text" } } }, "visState": { "type": "text" } } },
              "kql-telemetry": { "properties": { "optInCount": { "type": "long" }, "optOutCount": { "type": "long" } } },
              "type": { "type": "keyword" },
              "url": { "properties": { "accessCount": { "type": "long" }, "accessDate": { "type": "date" }, "url": { "type": "text", "fields": { "keyword": { "ignore_above": 2048, "type": "keyword" } } }, "createDate": { "type": "date" } } },
              "migrationVersion": { "dynamic": "true", "type": "object" },
              "index-pattern": { "properties": { "notExpandable": { "type": "boolean" }, "fieldFormatMap": { "type": "text" }, "sourceFilters": { "type": "text" }, "typeMeta": { "type": "keyword" }, "timeFieldName": { "type": "keyword" }, "intervalName": { "type": "keyword" }, "fields": { "type": "text" }, "title": { "type": "text" }, "type": { "type": "keyword" } } },
              "search": { "properties": { "hits": { "type": "integer" }, "columns": { "type": "keyword" }, "description": { "type": "text" }, "sort": { "type": "keyword" }, "title": { "type": "text" }, "version": { "type": "integer" }, "kibanaSavedObjectMeta": { "properties": { "searchSourceJSON": { "type": "text" } } } } },
              "updated_at": { "type": "date" },
              "namespace": { "type": "keyword" },
              "timelion-sheet": { "properties": { "hits": { "type": "integer" }, "timelion_sheet": { "type": "text" }, "timelion_interval": { "type": "keyword" }, "timelion_columns": { "type": "integer" }, "timelion_other_interval": { "type": "keyword" }, "timelion_rows": { "type": "integer" }, "description": { "type": "text" }, "title": { "type": "text" }, "version": { "type": "integer" }, "kibanaSavedObjectMeta": { "properties": { "searchSourceJSON": { "type": "text" } } }, "timelion_chart_height": { "type": "integer" } } },
              "config": { "dynamic": "true", "properties": { "buildNum": { "type": "keyword" } } },
              "dashboard": { "properties": { "hits": { "type": "integer" }, "timeFrom": { "type": "keyword" }, "timeTo": { "type": "keyword" }, "refreshInterval": { "properties": { "display": { "type": "keyword" }, "section": { "type": "integer" }, "value": { "type": "integer" }, "pause": { "type": "boolean" } } }, "description": { "type": "text" }, "uiStateJSON": { "type": "text" }, "timeRestore": { "type": "boolean" }, "title": { "type": "text" }, "version": { "type": "integer" }, "kibanaSavedObjectMeta": { "properties": { "searchSourceJSON": { "type": "text" } } }, "optionsJSON": { "type": "text" }, "panelsJSON": { "type": "text" } } }
            }
          }
        },
        "aliases": [ ".kibana" ],
        "primary_terms": { "0": 0, "1": 0, "2": 0 },
        "in_sync_allocations": { "1": [], "2": [], "0": [] }
      }
    },
    "index-graveyard": { "tombstones": [] }
  },
  "routing_table": {
    "indices": {
      ".kibana_1": {
        "shards": {
          "1": [ { "state": "STARTED", "primary": true, "node": "b380d522-4801-46dd-0001-000000000000", "relocating_node": null, "shard": 1, "index": ".kibana_1", "token_ranges": [ "(-4905935092955018317,-4634350246539782229]", "(-3857615872770993871,-2876405104543864077]", "(-2092581070725691530,-1391742915081687192]", "(-1169679320448494956,260474944050277681]", "(679763209506391150,1066882304983515845]", "(3796001557251052615,5355091691330181301]" ], "allocation_id": { "id": "dummy_alloc_id" } } ],
          "2": [ { "state": "STARTED", "primary": true, "node": "ab010464-bbdd-4653-0002-000000000000", "relocating_node": null, "shard": 2, "index": ".kibana_1", "token_ranges": [ "(-9061546716106502397,-6865113398631406170]", "(-5994674258152922714,-4905935092955018317]", "(-4634350246539782229,-3857615872770993871]", "(260474944050277681,679763209506391150]", "(1928397905685598346,2279437765047107097]", "(3762179910787726390,3796001557251052615]" ], "allocation_id": { "id": "dummy_alloc_id" } } ],
          "0": [ { "state": "STARTED", "primary": true, "node": "fefe51cc-6ae3-48e8-0000-000000000000", "relocating_node": null, "shard": 0, "index": ".kibana_1", "token_ranges": [ "(-9223372036854775808,-9061546716106502397]", "(-6865113398631406170,-5994674258152922714]", "(-2876405104543864077,-2092581070725691530]", "(-1391742915081687192,-1169679320448494956]", "(1066882304983515845,1928397905685598346]", "(2279437765047107097,3762179910787726390]", "(5355091691330181301,9223372036854775807]" ], "allocation_id": { "id": "dummy_alloc_id" } } ]
        }
      }
    }
  },
  "routing_nodes": {
    "unassigned": [],
    "nodes": {
      "b380d522-4801-46dd-0001-000000000000": [ { "state": "STARTED", "primary": true, "node": "b380d522-4801-46dd-0001-000000000000", "relocating_node": null, "shard": 1, "index": ".kibana_1", "token_ranges": [ "(-4905935092955018317,-4634350246539782229]", "(-3857615872770993871,-2876405104543864077]", "(-2092581070725691530,-1391742915081687192]", "(-1169679320448494956,260474944050277681]", "(679763209506391150,1066882304983515845]", "(3796001557251052615,5355091691330181301]" ], "allocation_id": { "id": "dummy_alloc_id" } } ],
      "fefe51cc-6ae3-48e8-0000-000000000000": [ { "state": "STARTED", "primary": true, "node": "fefe51cc-6ae3-48e8-0000-000000000000", "relocating_node": null, "shard": 0, "index": ".kibana_1", "token_ranges": [ "(-9223372036854775808,-9061546716106502397]", "(-6865113398631406170,-5994674258152922714]", "(-2876405104543864077,-2092581070725691530]", "(-1391742915081687192,-1169679320448494956]", "(1066882304983515845,1928397905685598346]", "(2279437765047107097,3762179910787726390]", "(5355091691330181301,9223372036854775807]" ], "allocation_id": { "id": "dummy_alloc_id" } } ],
      "ab010464-bbdd-4653-0002-000000000000": [ { "state": "STARTED", "primary": true, "node": "ab010464-bbdd-4653-0002-000000000000", "relocating_node": null, "shard": 2, "index": ".kibana_1", "token_ranges": [ "(-9061546716106502397,-6865113398631406170]", "(-5994674258152922714,-4905935092955018317]", "(-4634350246539782229,-3857615872770993871]", "(260474944050277681,679763209506391150]", "(1928397905685598346,2279437765047107097]", "(3762179910787726390,3796001557251052615]" ], "allocation_id": { "id": "dummy_alloc_id" } } ]
    }
  },
  "snapshot_deletions": { "snapshot_deletions": [] },
  "snapshots": { "snapshots": [] },
  "restore": { "snapshots": [] }
}
Here you have a multi-cloud Elassandra cluster running in multiple Kubernetes clusters. The Elassandra Operator gives you the flexibility to deploy in the cloud or on premises, in a public or private network. You can scale up or down and park/unpark your datacenters; you can lose a Kubernetes node, a persistent volume or even a zone, and the Elassandra datacenter remains up and running; and you don't have to manage any synchronization issue between your database and your Elasticsearch cluster.
In the next articles, we'll see how the Elassandra Operator deploys Kibana for data visualisation and Cassandra Reaper to manage continuous Cassandra repairs. We'll also see how to set up the Prometheus Operator with Grafana dashboards to monitor the Elassandra Operator, the Elassandra nodes and the Kubernetes resources.
Have fun with the Elassandra Operator, and thanks in advance for your feedback!