
Why We Use ClickHouse for Analytics

PostgreSQL is great until you need to count a billion clicks. Here's why we added ClickHouse to our stack and what we learned about OLAP databases.

QCK Engineering Team
January 6, 2025
7 min read

PostgreSQL can do almost anything. It's our primary database for users, links, and settings. We love it.

But when we started tracking click analytics—every redirect, with timestamp, referrer, country, device, browser—PostgreSQL started struggling. Not because it's bad. Because we were using it wrong.

The Problem: Analytics at Scale

A URL shortener generates a lot of events. Every click is a row:

INSERT INTO clicks (
  link_id, clicked_at, referrer, country,
  city, device, browser, os
) VALUES (...);

At 1,000 clicks per day, PostgreSQL handles this effortlessly. At 100,000 clicks per day, still fine. At 1,000,000 clicks per day? Things get interesting.

The writes weren't the problem. PostgreSQL can ingest data fast enough.

The reads were the problem.

When SELECT Becomes Slow

Our analytics dashboard needs to answer questions like:

  • "How many clicks did this link get in the last 30 days?"
  • "What are the top 10 referrers this week?"
  • "Show me clicks by country for the last 90 days"
  • "What's the hourly click pattern for this campaign?"

In SQL, these are aggregation queries:

SELECT DATE(clicked_at), COUNT(*)
FROM clicks
WHERE link_id = ? AND clicked_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(clicked_at);

With proper indexes, PostgreSQL runs this in milliseconds for small datasets.

But analytics queries scan lots of rows. A popular link might have millions of clicks. A campaign report might aggregate across hundreds of links. The "last 90 days" query touches millions of rows.

PostgreSQL is row-oriented. To count clicks, it reads entire rows—even though we only need two columns. With wide rows (we store 15+ fields per click), that's a lot of wasted I/O.

We added indexes. We added materialized views. We partitioned by date. It helped, but the fundamental problem remained: PostgreSQL wasn't designed for this workload.
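
For the record, the mitigations looked something like this (a sketch with illustrative names, not our exact schema):

-- Composite index matching the dashboard query's filter
CREATE INDEX idx_clicks_link_date ON clicks (link_id, clicked_at);

-- Precomputed daily counts
CREATE MATERIALIZED VIEW daily_clicks AS
SELECT link_id, DATE(clicked_at) AS day, COUNT(*) AS clicks
FROM clicks
GROUP BY link_id, DATE(clicked_at);

-- The catch: it's stale until you refresh it
REFRESH MATERIALIZED VIEW daily_clicks;

Each of these bought time, but none changed how much data a 90-day scan has to touch.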

OLTP vs OLAP

There are two broad categories of database workloads:

OLTP (Online Transaction Processing): Many small reads and writes. "Get user by ID." "Insert new link." "Update click count." PostgreSQL excels here.

OLAP (Online Analytical Processing): Few large reads across many rows. "Count all clicks by country." "Calculate average session duration." "Find top performers." This is where PostgreSQL struggles.
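
The difference is easiest to see side by side. A minimal sketch using this post's tables (the users table is hypothetical):

-- OLTP: one row, all columns, answered from an index in microseconds
SELECT * FROM users WHERE id = 42;

-- OLAP: millions of rows, two columns, forces a large scan
SELECT country, COUNT(*) AS clicks
FROM clicks
GROUP BY country
ORDER BY clicks DESC;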

We were running OLAP queries on an OLTP database. It's like using a sports car to haul freight—technically possible, but there are better tools.

Why ClickHouse?

We evaluated several options:

TimescaleDB: PostgreSQL extension for time-series. Good, but still row-oriented under the hood. Didn't solve our core problem.

Elasticsearch: Powerful, but overkill. We needed aggregations, not full-text search. The operational complexity wasn't worth it.

BigQuery/Redshift: Cloud-only, with per-query or cluster-based pricing. Great for occasional analytics, expensive for always-on real-time dashboards.

ClickHouse: Open source, self-hosted, designed specifically for analytics. Column-oriented storage. Stupid fast aggregations.

We tried ClickHouse and the difference was immediate.

Column-Oriented Storage

The key insight is how data is stored on disk.

Row-oriented (PostgreSQL):

Row 1: [link_id, clicked_at, referrer, country, device, ...]
Row 2: [link_id, clicked_at, referrer, country, device, ...]
Row 3: [link_id, clicked_at, referrer, country, device, ...]

To count clicks by date, PostgreSQL reads all columns for every row, then discards most of them.

Column-oriented (ClickHouse):

link_id column:    [1, 1, 2, 1, 3, ...]
clicked_at column: [2025-01-01, 2025-01-01, 2025-01-02, ...]
referrer column:   [google.com, twitter.com, direct, ...]

To count clicks by date, ClickHouse reads only the clicked_at column. The other columns stay on disk.

For analytics queries that touch few columns but many rows, this is dramatically more efficient.

Compression Matters

Column-oriented storage also compresses better.

A column of countries contains repetitive data: "US", "US", "UK", "US", "DE", "US"... ClickHouse compresses this aggressively. Our click data compresses to roughly 10x smaller than the equivalent PostgreSQL table.

Less data on disk means less data to read. Less data to read means faster queries.
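
If you want to check the ratio on your own data, ClickHouse exposes per-column sizes in its system tables. A quick sketch (system.columns and its byte counters are standard in current releases):

SELECT
  name,
  formatReadableSize(data_compressed_bytes) AS compressed,
  formatReadableSize(data_uncompressed_bytes) AS uncompressed,
  round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'clicks'
ORDER BY data_compressed_bytes DESC;

Repetitive columns like country typically show the highest ratios.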

What We Actually Use

Our setup is simple:

  • PostgreSQL: Users, links, settings, billing—anything transactional
  • ClickHouse: Click events, analytics aggregations—anything analytical
  • Redis: Caching, rate limiting, sessions—anything ephemeral

When a redirect happens:

  1. Redis checks the cache for the target URL
  2. If miss, PostgreSQL looks up the link
  3. ClickHouse receives an async insert for the click event

The redirect completes before the ClickHouse insert. Analytics are eventually consistent, but redirects are fast.
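
One way to get this behavior is ClickHouse's built-in asynchronous inserts, where the server buffers small inserts and flushes them in batches. A sketch (async_insert and wait_for_async_insert are real settings; the values here are illustrative):

INSERT INTO clicks (link_id, clicked_at, referrer, country, city, device, browser, os)
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (12345, now(), 'google.com', 'US', 'Austin', 'mobile', 'chrome', 'android');

With wait_for_async_insert = 0, the statement returns as soon as the server buffers the row, which is exactly the fire-and-forget behavior the redirect path wants.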

The Schema

Our ClickHouse table is straightforward:

CREATE TABLE clicks (
  link_id UInt64,
  clicked_at DateTime,
  referrer String,
  country LowCardinality(String),
  city String,
  device LowCardinality(String),
  browser LowCardinality(String),
  os LowCardinality(String)
) ENGINE = MergeTree()
ORDER BY (link_id, clicked_at);

A few things to note:

LowCardinality: For columns with few unique values (country, device, browser), this stores an encoded dictionary instead of raw strings. Faster and smaller.

MergeTree: ClickHouse's default table engine. Handles merging, compression, and indexing automatically.

ORDER BY: The primary sorting key. Queries filtering by link_id and clicked_at are extremely fast because the data is physically sorted this way.
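
You can verify the sorting key is doing its job with EXPLAIN. A sketch (the indexes = 1 option exists in modern ClickHouse versions):

EXPLAIN indexes = 1
SELECT count()
FROM clicks
WHERE link_id = 12345
  AND clicked_at > now() - INTERVAL 30 DAY;

The output includes an Indexes section showing how many parts and granules the primary key skipped before reading anything.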

Queries That Just Work

The queries that struggled in PostgreSQL are trivial in ClickHouse:

-- Clicks per day for a link (last 30 days)
SELECT toDate(clicked_at) as date, count() as clicks
FROM clicks
WHERE link_id = 12345
  AND clicked_at > now() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;

-- Top referrers across all links
SELECT referrer, count() as clicks
FROM clicks
WHERE clicked_at > now() - INTERVAL 7 DAY
GROUP BY referrer
ORDER BY clicks DESC
LIMIT 10;

-- Geographic distribution
SELECT country, count() as clicks
FROM clicks
WHERE link_id = 12345
GROUP BY country
ORDER BY clicks DESC;

These run in milliseconds, even on millions of rows.

The Trade-offs

ClickHouse isn't a PostgreSQL replacement. It's a specialized tool with real limitations:

No transactions: There's no transactional UPDATE. Mutations (ALTER TABLE ... UPDATE/DELETE) exist, but they rewrite data asynchronously in the background and are meant for occasional fixes, not routine writes (sketch below). The normal pattern is append-only: you add new data and query across versions. Fine for event logs, wrong for user accounts.

No foreign keys: ClickHouse doesn't enforce referential integrity. Your application handles that.

Eventually consistent: With async inserts, writes are buffered server-side and flushed in batches, so you might not see an event immediately after it's sent.

Different SQL: It's SQL, but not exactly PostgreSQL SQL. Some functions are different, some features are missing.

Operational complexity: Another database to run, backup, and monitor. We use the official Docker image and it's been stable, but it's still another thing.
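
As for the no-UPDATE rule above, the escape hatch is a mutation. A sketch (real syntax; the link ID is hypothetical):

-- Rewrites affected data parts asynchronously in the background
ALTER TABLE clicks DELETE WHERE link_id = 12345;

-- Mutations aren't instant; you can watch their progress
SELECT command, is_done
FROM system.mutations
WHERE table = 'clicks';

We treat these as rare administrative operations (purging a deleted link's history, say), not part of the normal write path.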

When to Use ClickHouse

Consider ClickHouse when:

  • You're aggregating millions of rows regularly
  • Your queries touch few columns but many rows
  • Data is append-only (events, logs, metrics)
  • You need real-time analytics, not batch jobs
  • You're okay with eventual consistency for analytics

Stick with PostgreSQL when:

  • You need transactions and referential integrity
  • Queries touch many columns but few rows
  • You need to UPDATE and DELETE frequently
  • Your data is relational, not event-based

The Honest Assessment

Adding ClickHouse was the right call for us. Analytics queries that took seconds now take milliseconds. The dashboard feels instant.

But it added complexity. We now maintain two databases with different backup strategies, different query patterns, and different failure modes. For a smaller project, the complexity might not be worth it.

If you're under a million events per day and can tolerate slower dashboards, PostgreSQL with proper indexes and maybe some materialized views is probably fine. Don't add complexity you don't need.

If you're at scale and analytics performance matters, ClickHouse is worth the investment. Just understand what you're getting into.


Our analytics dashboard shows real-time click data across all your links. Try QCK and see the speed for yourself.