Grafana Cloud

Grafana Cloud is a hosted version of Grafana, Prometheus, Loki, and Tempo.

If you prefer this observability stack, you no longer have to host and maintain it yourself. Self-hosting comes at a cost: providing a service with good uptime, retention, and durability means solving a number of operational problems.

tip

If you decide to self-host, this chapter is still worth reading. Although it is about setting up Grafana Cloud, the pricing model of Grafana Cloud forces you to be diligent with your metrics: store only a subset of all available metrics, with an emphasis on low cardinality. You will eventually have to do the same exercise with a self-hosted stack to provide a reliable service.

Shipping metrics

Setting up Grafana Cloud is mostly self-explanatory. You install Prometheus, Loki, and Tempo on your cluster in a shipping configuration. Prometheus runs in your cluster and scrapes all metrics, and it also has a remote_write configuration: after scraping, a subset of the metrics is forwarded to Grafana Cloud, where you benefit from its storage, dashboards, and retention. See the setup guide: https://grafana.com/docs/grafana-cloud/metrics-prometheus/

The same is true for Loki.

Alternatively, you can use the Grafana Agent. It is based on the open-source Prometheus and Loki projects and bundles only their metric and log shipping parts into a small package.

If you chose Gimlet Stack as the installation method, it comes with a preconfigured Grafana Cloud integration that ships a pruned set of metrics and logs.

Day-two operations

Billing alerts

When you use Grafana Cloud, you should always set billing alerts.

The built-in Grafana Cloud Billing dashboard allows you to track your usage. Make a copy of this dashboard, and set alerts for the total billable logs and metrics series.

On the included quota

Depending on your package, Grafana Cloud includes:

  • 100GB logs per month
  • 15000 metrics

The logging quota is fairly straightforward, but the metrics quota is less obvious.

  • Most off-the-shelf exporters push you over the 15K limit
  • To only ship metrics you use in your dashboards, put them on an allow list. See how.
  • Cloud billing is a dark art; learn how Grafana bills.
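
The allow list from the bullets above can be sketched as a relabel rule with a keep action on the __name__ label, placed under write_relabel_configs of the remote_write entry. The Python below only builds that rule and prints it as JSON (which is also valid YAML). The metric names are hypothetical placeholders, and the remote_write URL and credentials are left to the setup guide.

```python
import json


def allow_list_rule(metric_names):
    """Build a Prometheus relabel rule that keeps only the named metrics.

    The result is meant for the write_relabel_configs list of a
    remote_write entry. Metric names consist of letters, digits,
    underscores, and colons, so they need no regex escaping.
    """
    pattern = "|".join(sorted(set(metric_names)))
    return {
        "source_labels": ["__name__"],
        "regex": pattern,   # Prometheus anchors this regex on both ends
        "action": "keep",   # drop every series whose name does not match
    }


# Hypothetical names; replace them with the metrics your dashboards query.
used_in_dashboards = ["up", "http_requests_total", "image_process_time_bucket"]

rule = allow_list_rule(used_in_dashboards)
print(json.dumps({"write_relabel_configs": [rule]}, indent=2))
```

Because the rule sits in write_relabel_configs, it only affects what is forwarded; the in-cluster Prometheus still scrapes and stores everything.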

On the cost of Histogram metrics

Histogram metrics weigh heavier than other metrics. Each distinct label variation counts as a metric series.

If you have 3 labels with 10 different values each, that is 10x10x10 = 1000 metrics. So be careful with the number of different values you have per label.

This is especially true for histograms, as every bucket is a separate series. With around 10 buckets by default (the exact number depends on the client library), a histogram coming from a single server/pod/thread counts as 10 metrics.

If you have 10 buckets and 10 workers, that is 100 metrics coming from a single metric definition in code.
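
The arithmetic above can be written down as a small helper. This is a minimal sketch; the function name max_series is made up for illustration. It returns an upper bound: the real series count can be lower, because not every label combination has to occur in practice.

```python
from math import prod


def max_series(label_cardinalities, buckets=1):
    """Upper bound on the series count of one metric.

    label_cardinalities: number of distinct values per label.
    buckets: number of histogram buckets (1 for non-histogram metrics).
    """
    return prod(label_cardinalities) * buckets


# Three labels with 10 values each.
print(max_series([10, 10, 10]))       # 1000

# A histogram with 10 buckets reported by 10 workers.
print(max_series([10], buckets=10))   # 100
```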

To identify your largest metrics, you can run:

topk(10, count by (__name__)({__name__=~".+"}))

The top metric for me had 672 metric series.
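
If you would rather run the query from a script than from the Explore view, the Prometheus HTTP API exposes it under /api/v1/query. The sketch below assumes a Prometheus-compatible endpoint that is reachable without authentication (for example a local Prometheus); querying Grafana Cloud directly requires credentials, which are not shown. The sample response is hand-written in the documented response format, and the series count for "up" in it is made up.

```python
import json
import urllib.parse
import urllib.request

QUERY = 'topk(10, count by (__name__)({__name__=~".+"}))'


def parse_instant_vector(payload):
    """Turn a /api/v1/query response into (metric name, series count) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = [
        # Each sample value is [timestamp, "value as string"].
        (item["metric"].get("__name__", ""), int(float(item["value"][1])))
        for item in payload["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def largest_metrics(base_url):
    """Run QUERY against a Prometheus-compatible HTTP API (needs network)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_instant_vector(json.load(response))


# Offline example with a hand-written response.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up"}, "value": [1700000000, "12"]},
            {"metric": {"__name__": "image_process_time_bucket"},
             "value": [1700000000, "672"]},
        ],
    },
}
for name, count in parse_instant_vector(sample):
    print(f"{count:>6}  {name}")
```

Against a live server, largest_metrics("http://localhost:9090") would return the same kind of list; the URL is an assumption about where your Prometheus listens.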

Querying the metric, I could see that it has only three labels (cluster, job, and le), but each with rather high cardinality:

  • count(count by (le) (image_process_time_bucket)) shows 21 buckets
  • count(count by (job) (image_process_time_bucket)) shows 31 distinct jobs
  • count(count by (cluster) (image_process_time_bucket)) shows 2 clusters

Since I pay $16 for 1000 series, this single metric (a single line in code) costs $5 a month. High-cardinality histograms are rather expensive.

tip

See how to analyze metrics cardinality

Grafana Cloud also includes a dashboard for cardinality analysis.