A centralized model for Google Managed Prometheus metrics collection

SADA
The SADA Engineering Blog
6 min read · Jul 19, 2023


Author: Charmy Thakkar, Solutions Engineer, SADA

Introduction

Prometheus is an open source monitoring and alerting tool that continuously monitors your workloads by scraping metrics from HTTP endpoints exposed by its targets. Collected metrics are stored as time series data, and PromQL queries are used to fetch data from the time series database.
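
For example, this PromQL query returns the per-second rate of a counter over the last five minutes (the metric name here is purely illustrative):

rate(http_requests_total[5m])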

Google Cloud Managed Service for Prometheus is a fully managed, multi-cloud, cross-project solution for Prometheus metrics. It lets you globally monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale. Managed Service for Prometheus is built on top of Monarch, the same globally scalable data store used for Google’s own monitoring.

Data collection

You can use Managed Service for Prometheus in one of four modes: managed data collection, self-deployed data collection, the OpenTelemetry Collector, or the Ops Agent.

With self-deployed data collection, you manage your Prometheus installation as you always have. The only difference from upstream Prometheus is that you run Google's drop-in replacement for the Prometheus binary instead of the upstream binary, which we discuss further below.

In this guide, we will walk through the steps to set up Google Managed Prometheus to scrape external targets in a Shared VPC environment using self-deployed data collection. We will deploy a Flask application on a Compute Engine instance, which will generate sample metrics. These metrics will then be scraped into GMP (Google Managed Prometheus), where we can visualize and analyze them.

Setting up self-deployed collection

Prerequisites: Start a GKE cluster

Access your GKE cluster using Cloud Shell, then follow the steps below.

Step 1: Clone the official kube-prometheus repo:

git clone https://github.com/prometheus-operator/kube-prometheus.git && \
cd kube-prometheus

Step 2: Create the monitoring stack using the config in the manifests directory. Running the following commands creates the monitoring namespace and the custom resource definitions (CRDs).

kubectl apply --server-side -f manifests/setup
kubectl wait \
  --for condition=Established \
  --all CustomResourceDefinition \
  --namespace=monitoring

To use Google’s managed Prometheus service, we need to use the binary provided by Google instead of the upstream Prometheus binary. We do that by editing the prometheus-prometheus.yaml file as shown below.

nano manifests/prometheus-prometheus.yaml

Replace the image specification and set replicas to 1:

  externalLabels: {}
  # image: quay.io/prometheus/prometheus:v2.43.1
  image: gke.gcr.io/prometheus-engine/prometheus:v2.35.0-gmp.2-gke.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.43.1
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 1

Next, deploy the Prometheus stack by running the following command:

kubectl apply -f manifests/

Step 3: Run the following command to verify Prometheus is deployed correctly:

charmy_thakkar@cloudshell:~/GMP (tcharmy-sandbox)$ kubectl get pods -n monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2     Running   0          5d1h
alertmanager-main-1                   2/2     Running   0          5d1h
alertmanager-main-2                   2/2     Running   0          5d1h
blackbox-exporter-69f4d86566-nbw9m    3/3     Running   0          5d1h
grafana-79cd8d4b69-p7gx5              1/1     Running   0          5d1h
kube-state-metrics-56f8746666-bvt7q   3/3     Running   0          5d1h
prometheus-adapter-77f56b865b-kqvrp   1/1     Running   0          5d1h
prometheus-k8s-0                      2/2     Running   0          4d23h
prometheus-operator-6bc5dc864-h9kb7   2/2     Running   0          4d23h

Step 4: To decouple the responsibilities of different teams and simulate how this would look in production, I provisioned this environment using a Shared VPC. As shown below, we create a sample Flask application on a Compute Engine instance attached to the shared network, so that the GKE cluster in the service project and the instance share the same VPC.

Provision Shared VPC:
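
A minimal sketch of provisioning with gcloud, assuming hypothetical project IDs host-project and service-project (adjust to your environment):

gcloud compute shared-vpc enable host-project
gcloud compute shared-vpc associated-projects add service-project \
  --host-project=host-project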

Shared instance:
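
And a sketch of creating the shared instance, with hypothetical names for the instance, zone, and shared subnet:

gcloud compute instances create pstest2-instance \
  --zone=us-west1-a \
  --subnet=projects/host-project/regions/us-west1/subnetworks/shared-subnet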

Sample app.py file:

from flask import Flask
from prometheus_client import generate_latest, Gauge
import random

app = Flask(__name__)

# Gauges that simulate CPU and memory usage metrics
cpu_usage = Gauge('cpu_usage', 'Current CPU Usage')
mem_usage = Gauge('memory_usage', 'Current memory usage')

@app.route('/')
def main():
    return "Hello world!"

@app.route('/metrics')
def metrics():
    # Set random values so every scrape returns fresh sample data
    cpu_usage.set(random.uniform(0.0, 100.0))
    mem_usage.set(random.uniform(0.0, 100.0))
    return generate_latest()

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

To run this application on your instance:

$ python3 app.py
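
The app depends on Flask and the Prometheus Python client; if they are not already installed on the instance, something like this should work:

pip3 install flask prometheus-client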

Now, we can collect the sample metrics by making an HTTP request to the shared instance IP as shown below:
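
For example, from a VM in the same VPC (10.138.0.3 is the instance IP used later in this guide; the values are illustrative, and the Python client also exposes default runtime metrics):

curl http://10.138.0.3:8080/metrics

# HELP cpu_usage Current CPU Usage
# TYPE cpu_usage gauge
cpu_usage 42.7
# HELP memory_usage Current memory usage
# TYPE memory_usage gauge
memory_usage 63.1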

The application is not accessible directly from the shell because it is deployed inside a VPC. However, the GKE nodes can reach the application over HTTP because both the cluster nodes and the application instance are in the same VPC. This matters because Prometheus runs in GKE and must be able to reach the target endpoint in order to scrape metrics.
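
If scrapes fail, the VPC firewall may be blocking traffic to the metrics port. A sketch of a rule that would allow it, with an illustrative network name and source range:

gcloud compute firewall-rules create allow-metrics-scrape \
  --network=shared-vpc-network \
  --allow=tcp:8080 \
  --source-ranges=10.0.0.0/8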

Step 5: Next, to point the self-deployed collector in the GMP cluster at the external target, we will use a ServiceMonitor custom resource. The key is to create service.yaml and endpoint.yaml files that point to the external URL/endpoint.

To monitor objects outside the Kubernetes cluster, we define service.yaml as below:

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: pstest2-application
  namespace: monitoring
  labels:
    app: pstest2-application
spec:
  ports:
    - name: metrics
      port: 8080
      protocol: TCP
      targetPort: 8080

Typically, Kubernetes creates the Endpoints object automatically when a Service's selector matches pods inside the cluster. However, since the target in this case is external to the cluster, we need to define and create the Endpoints object manually. The following manifest shows how to do this.

endpoint.yaml (destination IP and port number)

apiVersion: v1
kind: Endpoints
metadata:
  name: pstest2-application
  namespace: monitoring
subsets:
  - addresses:
      - ip: 10.138.0.3 # Target IP address
    ports:
      - name: metrics
        port: 8080
        protocol: TCP

NOTE: The Endpoints object and the Service must have exactly the same name and live in the same namespace.

To describe the service:

charmy_thakkar@cloudshell:~$ kubectl describe service -n monitoring pstest2-application
Name:               pstest2-application
Namespace:          monitoring
Labels:             app=pstest2-application
Annotations:        cloud.google.com/neg: {"ingress":true}
Selector:           <none>
Type:               ClusterIP
IP Family Policy:   SingleStack
IP Families:        IPv4
IP:                 10.8.9.230        # Internal Service IP (local to the k8s cluster)
IPs:                10.8.9.230
Port:               metrics 8080/TCP
TargetPort:         8080/TCP
Endpoints:          10.138.0.3:8080   # External IP of the endpoint
Session Affinity:   None
Events:             <none>

Finally, to enable metrics scraping, we will create a ServiceMonitor. This will instruct Prometheus to start monitoring the service pstest2-application.

servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pstest2-application
  namespace: default
  labels:
    team: frontend # This is the default label used by kube-prometheus.
spec:
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics # URI path where metrics are available to fetch
  selector:
    matchLabels:
      app: pstest2-application # Label identifying the service to be monitored
  jobLabel: pstest-application
  namespaceSelector:
    matchNames:
      - monitoring
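
After creating these three manifests, apply them to the cluster (file names as used in this guide):

kubectl apply -f service.yaml -f endpoint.yaml -f servicemonitor.yaml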

Step 6: To check whether Prometheus is able to fetch metrics from the external target, go to the Monitoring console, select Metrics Explorer in the left-hand menu, and then run PromQL queries in the code editor.
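
For example, the gauges exposed by the sample app can be queried directly (the metric names come from app.py above):

cpu_usage
avg_over_time(memory_usage[5m])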

Important note:

According to GitHub issues [1] and [2], you cannot scrape a service outside of the Kubernetes cluster by its FQDN. However, a feature request for this has since been implemented: three weeks before this writing, the prometheus-operator project introduced a new CRD (custom resource definition) called ScrapeConfig [3], which lets you add static scrape configurations to Prometheus as ScrapeConfig objects. The latest Google Prometheus image (at the time of writing) does not yet support this feature, as it is still in the alpha phase. Once it is available in GMP, you will no longer need to define a Service/Endpoints pair to scrape external endpoints; simply adding a ScrapeConfig object will do the trick.
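
For reference, a minimal sketch of what such an object might look like, based on the alpha CRD at the time of writing (field names may change while the API is in alpha; the target IP is the shared instance from this guide):

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: external-target
  namespace: monitoring
spec:
  metricsPath: /metrics
  staticConfigs:
    - targets:
        - 10.138.0.3:8080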

References:

  1. https://github.com/prometheus-operator/prometheus-operator/issues/3204
  2. https://github.com/prometheus-operator/prometheus-operator/issues/2787
  3. https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/setup/0scrapeconfigCustomResourceDefinition.yaml
  4. https://github.com/prometheus-operator/kube-prometheus#quickstart
  5. https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-unmanaged#gmp-unmanaged-addl-topics
