As I observe the current job market (yes, even with COVID-19 lurking in my area), I have noticed that several tools keep showing up in hiring companies' requirements. The job post below from Glassdoor is a fantastic example of the most in-demand skills today.
• DevOps Development Frameworks: Experience working with multiple development frameworks and methodologies with the ability to identify opportunities for innovation in development teams and address identified problems. Experience working in SCRUM and AGILE environments highly desired. Experience leading development teams highly desired.
Note: Check boxes show what we cover in this tutorial.
DevOps Engineering Technical Skills:
- Source Code Version Control (e.g. Bitbucket, Git, Github, Subversion)✔
- Source Code Review/Quality (e.g. Coverity, Fortify, Crucible, Fisheye)✖
- Software Build Tools (e.g. Gradle, Apache Ant, Maven)✔
- Automation server (e.g. Bamboo, Jenkins)✔
- Automation testing (e.g. Selenium, Unit)✔
- Software Containerization (e.g. Docker)✔
- Container Orchestration (e.g. Kubernetes)✔
- Configuration Management and Deployment (e.g. Ansible, Puppet, Chef, SaltStack)✖
- Binary Artifact Repository (e.g. Artifactory, Nexus)✖
- Issue and Project Tracking (e.g. Jira, Bugzilla)✔
- System Monitoring (e.g. Splunk, Raygun, Nagios, ELK, Zabbix)✔
We are going to cover as many of these as we can while staying within free tiers.
Let's make a plan!
We will briefly plan out our project with JIRA and keep track of the tasks. Then we are going to create a local Kubernetes cluster with Minikube that will host a web deployment and a Prometheus deployment (with Alertmanager) for monitoring. We will also set up Jenkins locally (not in Kubernetes, for reasons I will explain) and run Selenium tests from there.
PREREQUISITES
- A hypervisor (e.g. VirtualBox)
- Minikube & Kubectl
- python3
- jenkins
- docker
- An Atlassian JIRA Account
- A github account
- At least 8 GB of RAM (unless you want to move very slowly)
- Slack (optional if you want to see your alerts in action)
I decided to publish the Kops tutorial as a separate blog post, so check that out if you want to move away from the local cluster!
Part One: JIRA Project Tracking
This section should be quick, because the main goal here is exposure.
Once you are all set with your Atlassian account, start a new project in your JIRA Software and call it whatever you'd like. Drill down into that project and you will see a tab called "Kanban". This is the board that many agile companies use to track the components of a project like bugs, features, and forward thinking plans.
Let's create a few tasks for our project by clicking the "Create" button.
We will need to track our minikube cluster, web app, monitoring setup with Prometheus, and our Jenkins setup (albeit local).
See the example below:
I am not going to go through each ticket, as time is valuable. But by looking at your board, you can see the flow:
backlog > selected for development > in progress > done
This is a common process for modern software development companies.
Part 2: Minikube Cluster
Let's start our minikube cluster on our local. Open a terminal and fire away:
minikube start
Once it fires up, let's check the node with kubectl:
kubectl get nodes
You should also now have a hidden folder available in your home dir called .kube. This will hold your kube config and auth information to your cluster.
Awesome, our cluster is ready. Let's get to work.
Note: I personally am organizing my repo in this way:
devops_02/
  apps/
    hello/
  testing/
  kubernetes/
    alertmanager/
    hello/
    prometheus/
We will be adding files to those directories throughout the project.
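If you like, that layout can be scaffolded in one go. A minimal sketch (the exact nesting is my reading of the tree above, with testing/ at the repo root to match the ./testing/ paths used in the Jenkins build step later):

```python
import os
import tempfile

# Directory layout from the note above.
LAYOUT = [
    "apps/hello",
    "testing",
    "kubernetes/alertmanager",
    "kubernetes/hello",
    "kubernetes/prometheus",
]

def scaffold(root):
    """Create every directory in LAYOUT under root."""
    for path in LAYOUT:
        os.makedirs(os.path.join(root, path), exist_ok=True)

# Demo against a throwaway root; point this at ~/devops_02 for real use.
root = os.path.join(tempfile.mkdtemp(), "devops_02")
scaffold(root)
print(sorted(os.listdir(root)))  # ['apps', 'kubernetes', 'testing']
```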
Part 3: Hello-App (Prepped for Monitoring)
Navigate to devops_02/apps/hello:
cd devops_02/apps/hello
Ok, so the first thing we want to stand up is our Hello-App. If you missed "Devops_01", we made a Falcon app there. We are going to reuse that app but make an important change: we need to add a Python library for Prometheus exports, add its middleware, and create a route for it.
~/devops_02/apps/hello/hello.py
import falcon
from falcon_prometheus import PrometheusMiddleware


class HelloResource(object):
    def on_get(self, req, resp):
        resp.status = falcon.HTTP_200
        resp.body = "Hello, World!"


class Page2Resource(object):
    def on_get(self, req, resp):
        resp.status = falcon.HTTP_200
        resp.body = "This is the second page!"


prometheus = PrometheusMiddleware()
app = falcon.API(middleware=prometheus)

hello = HelloResource()
page2 = Page2Resource()

app.add_route('/', hello)
app.add_route('/page2', page2)
app.add_route('/metrics', prometheus)
Now save that file and open a Dockerfile and a requirements.txt file.
Our requirements file needs the following:
falcon==2.0.0
gunicorn==20.0.4
falcon-prometheus==0.1.0
Our Dockerfile should look like this:
FROM python:3
RUN pip install --upgrade pip
WORKDIR /hello_world
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY . .
CMD ["gunicorn", "-b", "0.0.0.0:8000", "hello:app"]
Now, if you want, you can build this image and push it to your repo, or feel free to pull and use the image scottyfullstack/hello:latest.
That's all we need for the app, so let's get Kube prepped for each deployment.
Part 4: Kubernetes
If you haven't already, create and cd into a kubernetes folder in your working directory:
mkdir kubernetes && cd kubernetes
Note: We are going to create a lot of these deployments and services using the Kubernetes CLI. However, we will store the YAML files in our project to show employers that we understand the purpose of each.
Start out creating a deployment for the hello app that we made in the last section, as well as exposing it as a service. The last command below gets the pods of the deployment.
kubectl create deployment hello --image=scottyfullstack/hello:latest
kubectl expose deployment hello --type=NodePort --port=8000
kubectl get pods
If you want to view this in your browser, open it with
minikube service hello
Just to make sure our exporter is working for the hello app, navigate to the /metrics route and check that the Prometheus exporter is responding. The output should look similar to the below:
request_latency_seconds_bucket{le="0.005",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.01",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.025",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.05",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.075",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.1",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.25",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.5",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="0.75",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="1.0",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="2.5",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="5.0",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="7.5",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="10.0",method="GET",path="/",status="200 OK"} 1.0
request_latency_seconds_bucket{le="+Inf",method="GET",path="/",status="200 OK"} 1.0
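Those lines are Prometheus's plain-text exposition format: a metric name, a set of labels in braces, and a value. Just to illustrate how a scraper breaks one down, here is a quick sketch of parsing a single line (the helper name is mine, not part of any library):

```python
import re

def parse_metric(line):
    """Split one exposition-format line into (name, labels_dict, value)."""
    m = re.match(r'(\w+)\{(.*)\}\s+(\S+)$', line)
    name, label_str, value = m.group(1), m.group(2), float(m.group(3))
    labels = dict(re.findall(r'(\w+)="([^"]*)"', label_str))
    return name, labels, value

sample = 'request_latency_seconds_bucket{le="0.005",method="GET",path="/",status="200 OK"} 1.0'
print(parse_metric(sample))
# ('request_latency_seconds_bucket', {'le': '0.005', 'method': 'GET', 'path': '/', 'status': '200 OK'}, 1.0)
```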
If you want to store this deployment yaml, perform the below steps within the kubernetes directory.
mkdir hello
cd hello
kubectl get deployment hello -o yaml > hello-deployment.yml
That wraps up the hello deployment and now we can move on to Prometheus for monitoring our hello application.
Part 5: Monitoring with Prometheus
This deployment has several extra steps, as we need to give Prometheus permission to monitor other pods in our cluster.
Checking the Prometheus documentation, we know we need a config file that Prometheus references to know what to monitor. We will store it in a Kubernetes ConfigMap resource for our deployment to use.
Inside ~/devops_02/kubernetes/prometheus, create a file called prometheus.yml and place the following inside. On the last line, replace the placeholder with the output (minus 'https://') of:
minikube service hello --url
# my global config
global:
  scrape_interval: 1s      # Scrape every 1 second. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - rules.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'Hello-app'
    static_configs:
      - targets: ['SERVICE ADDRESS AND PORT']
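An easy mistake here is pasting the full minikube URL, scheme and all, into the target. Prometheus static_config targets must be a bare host:port. A tiny sanity check (my own helper, not a Prometheus tool):

```python
import re

def valid_target(target):
    """Prometheus static_config targets must be host:port, no scheme or path."""
    return bool(re.match(r'^[A-Za-z0-9.\-]+:\d+$', target))

print(valid_target('192.168.99.107:30538'))          # True  - correct form
print(valid_target('https://192.168.99.107:30538'))  # False - strip the scheme first
```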
Create and check the configmap with:
kubectl create configmap prometheus --from-file prometheus.yml
kubectl get configmaps
We will be adding alertmanager to this in the next section.
Now create the deployment, expose it, and then pull the yaml into a file called deployment.yml:
kubectl create deployment prometheus --image=prom/prometheus
kubectl expose deployment prometheus --type=NodePort --port=9090
kubectl get deployment prometheus -o yaml > deployment.yml
We will be updating this deployment file, but we need a few things first. We must create the following to ensure that prometheus can communicate with the entire cluster.
- service account
- cluster role
- cluster role binding
I am not going to explain these in depth, so check out the docs for more. Just copy the code below into a new file called permissions.yml:
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus-kube
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-kube
subjects:
- kind: ServiceAccount
  name: prometheus-sa
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-sa
  namespace: default
Those are essentially three separate YAML configs in one file, so let's apply them:
kubectl apply -f permissions.yml
Now open up your deployment.yml file and add the service account information and a volume for our configmap. Mine looks like the following (notice the serviceAccountName right above "containers" in the pod spec, and a volume added and referenced by the container):
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
    creationTimestamp: "2020-03-12T23:25:18Z"
    generation: 1
    labels:
      app: prometheus
    name: prometheus
    namespace: default
    resourceVersion: "61653"
    selfLink: /apis/apps/v1/namespaces/default/deployments/prometheus
    uid: 63d11ba2-4caf-45c5-953f-d404812aec9a
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: prometheus
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: prometheus
      spec:
        serviceAccountName: prometheus-sa
        containers:
        - image: scottyfullstack/prometheus:latest
          imagePullPolicy: IfNotPresent
          name: scottyfullstack
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - name: prometheus
            mountPath: /etc/prometheus/
        volumes:
        - name: prometheus
          configMap:
            defaultMode: 420
            name: prometheus
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    availableReplicas: 1
    conditions:
    - lastTransitionTime: "2020-03-12T23:25:20Z"
      lastUpdateTime: "2020-03-12T23:25:20Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: "2020-03-12T23:25:18Z"
      lastUpdateTime: "2020-03-12T23:25:20Z"
      message: ReplicaSet "prometheus-79d4cb85d5" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
    observedGeneration: 1
    readyReplicas: 1
    replicas: 1
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
At this point you may get errors from a plain apply -f, so I am going to use the --force flag. This is not recommended in production; always use caution when forcing. Since this is a learning demo, we want to override the deployment in place rather than delete and recreate it by hand.
kubectl apply -f deployment.yml --force
Now open Prometheus in the browser and check the status of Hello:
minikube service prometheus
Navigate to "Targets" and check that your service is being monitored and says UP.
To test it, jump over to your console and run:
kubectl scale deployment hello --replicas=0
The status should now say DOWN. Then you can scale it back up:
kubectl scale deployment hello --replicas=1
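You can also check target health without the UI: Prometheus's HTTP API (GET /api/v1/query?query=up) reports 1 for reachable targets and 0 for DOWN ones. A sketch of reading that response (the sample payload below is hand-written to match the API's documented shape, with an instance address from my cluster):

```python
import json

# Sample of what GET <prometheus-url>/api/v1/query?query=up returns
# after scaling the hello deployment to 0 replicas.
sample_response = json.loads('''{
  "status": "success",
  "data": {"resultType": "vector", "result": [
    {"metric": {"job": "Hello-app", "instance": "192.168.99.107:30538"},
     "value": [1584900000, "0"]}
  ]}
}''')

def down_targets(response):
    """Return the jobs whose 'up' value is 0 (i.e. DOWN on the Targets page)."""
    return [r["metric"]["job"] for r in response["data"]["result"]
            if float(r["value"][1]) == 0]

print(down_targets(sample_response))  # ['Hello-app']
```

In a real script you would fetch that JSON from the URL printed by minikube service prometheus --url.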
Part 6: AlertManager
What good is monitoring alone if nothing alerts us when a failure happens? That's where Alertmanager comes in. In our case, we want Alertmanager to alert us when our hello application goes down.
Looking at the docs, we need rules set up for Prometheus, as well as the alertmanager image running in tandem with our prometheus image (sharing the deployment).
Let's add the rules to our configmap:
kubectl get configmap prometheus -o yaml > configmap.yml
Open configmap.yml and add the rules.yml |- section within the data segment (see below).
---
apiVersion: v1
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval: 1s      # Scrape every 1 second. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - localhost:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - rules.yml

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'Hello-app'
        static_configs:
          - targets: ['192.168.99.107:30538']
  rules.yml: |-
    groups:
    - name: Instances
      rules:
      - alert: InstanceDown
        # Condition for alerting
        expr: up == 0
        for: 1m
        # Annotations - additional informational labels to store more information
        annotations:
          title: 'Instance {{ $labels.instance }} down'
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
        # Labels - additional labels to be attached to the alert
        labels:
          severity: 'critical'
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-21T21:29:28Z"
  name: prometheus
  namespace: default
  resourceVersion: "36722"
  selfLink: /api/v1/namespaces/default/configmaps/prometheus
  uid: 9e942b69-48dc-4ceb-9015-40672f35b45b
Then delete the old configmap and apply the new one:
kubectl delete cm prometheus
kubectl apply -f configmap.yml
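A note on the for: 1m clause in rules.yml: the alert sits in "pending" while the condition holds, and only flips to "firing" (and reaches Alertmanager) once it has held for a full minute. A toy timeline of that behavior, purely for illustration (this is my own simulation, not Prometheus code):

```python
def alert_state(samples, hold=60, interval=15):
    """Walk 'up' samples taken every `interval` seconds and report the
    alert state at each step, mimicking pending -> firing behaviour."""
    down_for = 0
    states = []
    for up in samples:
        down_for = down_for + interval if up == 0 else 0
        if down_for == 0:
            states.append('inactive')
        elif down_for >= hold:
            states.append('firing')
        else:
            states.append('pending')
    return states

# 1 = instance up, 0 = down; evaluated every 15s with for: 1m
print(alert_state([1, 0, 0, 0, 0, 0]))
# ['inactive', 'pending', 'pending', 'pending', 'firing', 'firing']
```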
Finally, let's create the alertmanager configmap and add the alertmanager image (and volumes) to our deployment.yml:
Open a new file called alertmanager.yml from within our alertmanager directory.
For this step you will need a Slack account with admin access. Install the 'Incoming Webhook' app and then generate a webhook URL in the Slack portal. Replace 'Webhook URL' below with your Slack webhook URL. I have also created a channel called #alerts for Alertmanager to send messages to.
# global config
global:
  resolve_timeout: 30s
  slack_api_url: 'Webhook URL'

route:
  receiver: 'slack-notifications'

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    send_resolved: true
Save it and create the configmap:
kubectl create cm alertmanager --from-file=alertmanager.yml
Open ~/devops_02/kubernetes/prometheus/deployment.yml and make sure your pod spec looks like this (notice the alertmanager container and its volume mounts):
spec:
  serviceAccountName: prometheus-sa
  containers:
  - image: prom/prometheus
    imagePullPolicy: IfNotPresent
    name: scottyfullstack
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - name: prometheus
      mountPath: /etc/prometheus/
  - image: prom/alertmanager
    imagePullPolicy: IfNotPresent
    name: alertmanager
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - name: alertmanager
      mountPath: /etc/alertmanager/
  volumes:
  - name: prometheus
    configMap:
      defaultMode: 420
      name: prometheus
  - name: alertmanager
    configMap:
      defaultMode: 420
      name: alertmanager
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
Save it and apply it. If done correctly, your prometheus deployment should be standing with both containers in the pod:
kubectl get po
NAME READY STATUS RESTARTS AGE
hello-d445b4cc9-4lq9q 1/1 Running 2 25h
prometheus-cb6576576-7mdmj 2/2 Running 0 8m3s
Go ahead and test your alerts by scaling the hello deployment down:
kubectl scale deployment hello --replicas=0
And we are done with the monitoring portion! Be sure to bring the app back up for the next section:
kubectl scale deployment hello --replicas=1
Part 7: Jenkins and Selenium Testing
For this part I installed Jenkins to my local machine and will be using python selenium to run my test cases with chromedriver.
Go ahead and set up your Jenkins and log in (check out devops_01 for initial setup information if needed).
Now let's jump back to the root of our repo and, if you haven't already:
mkdir testing
Open a new file called hello.py.
Replace the placeholder URL in driver.get() with your URL from:
minikube service hello
~/devops_02/testing/hello.py
#!/usr/bin/python3
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

# Run Chrome headless so this works from a Jenkins job with no display
chrome_options = Options()
chrome_options.add_argument("--headless")

driver = webdriver.Chrome('./testing/chromedriver', options=chrome_options)
driver.get("Hello Svc URL")

# Raises NoSuchElementException (nonzero exit) if the greeting is missing
hello_text = driver.find_element(By.XPATH, "//*[text()='Hello, World!']")
print('Success: {}'.format(hello_text))
driver.close()
We also need a requirements.txt file that includes
selenium==3.141.0
Then download chromedriver and place it in the testing directory.
Add and commit your repo and then push it. Then head over to your jenkins and create a new freestyle project.
For source code management select 'Git' and add your git credentials if you haven't already.
Then under 'Build' select 'Execute shell' and paste the following:
#!/bin/bash
echo '#### Install requirements ####'
pip3 install -r ./testing/requirements.txt
echo '#### Run tests ####'
python3 ./testing/hello.py
Save that and run it! You should see a success. If you had brought down the hello deployment, it would fail.
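Why does a missing page fail the build? Jenkins marks an 'Execute shell' step failed whenever the command exits nonzero, and an uncaught Selenium exception does exactly that. If you prefer the pass/fail logic to be explicit, here is a sketch (check_page is my own helper; in the real test, page_text would come from driver.page_source):

```python
import sys

def check_page(page_text, expected="Hello, World!"):
    """Return 0 (success) if the expected text is present, 1 otherwise.
    Jenkins treats any nonzero exit code as a failed build."""
    if expected in page_text:
        print('Success: found {!r}'.format(expected))
        return 0
    print('Failure: {!r} not in page'.format(expected), file=sys.stderr)
    return 1

# In the real test: exit_code = check_page(driver.page_source)
exit_code = check_page("Hello, World!")
```

Ending the script with sys.exit(exit_code) propagates the result to Jenkins.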
I know this was a long post, but I think it will be valuable to you going forward in your learning process. If you want some quick maven experience, check out Maven in 5 Minutes.
Thanks, and good luck!