Building a Self-Hosted Analytics System for My Personal Website

I wanted to know when someone visits my website. Not in the aggregate, pageviews-per-month way that Cloudflare Analytics shows you, but in a “someone just opened my resume from LinkedIn” way.

The problem with Cloudflare’s built-in analytics is that most of the traffic is bots and scrapers. I couldn’t tell the difference between a real person and a crawler. I wanted something that would:

Only track real human visitors (filter out bots)
Tell me where they’re from (city, country)
Notify me instantly on Telegram
Show me what pages they visited, from where, on what device
Not slow down the site at all
Cost nothing

So I built it myself.

The architecture

Browser → Cloudflare Pages Function → Cloudflare Tunnel → Go service → PostgreSQL
                                                              ↓
                                                          Telegram
                                                          Grafana

The site is a static Hugo site hosted on Cloudflare Pages. The analytics pipeline works like this:

A tiny inline script (~500 bytes) fires a navigator.sendBeacon() POST on the first user interaction (scroll, click, touch)
A Cloudflare Pages Function receives it, attaches the visitor’s real IP and country from CF headers
The function forwards the event through a Cloudflare Tunnel to a Go service running on my spare Ubuntu laptop at home
The Go service parses the user agent, does a geo-IP lookup for city-level location, stores everything in PostgreSQL, and sends me a Telegram notification if it’s a new visitor

No cookies. No fingerprinting. No third-party scripts. The beacon is same-origin, so ad-blockers mostly ignore it.

Bot filtering: keep it simple

The initial bot filtering strategy was dead simple: if JavaScript executed and sent the beacon, it’s probably a human. Bots and scrapers that don’t run JS never fire the beacon.

This is the inverse of most analytics tools. Instead of trying to detect and block bots from a firehose of traffic, I only see events from browsers that ran my script. It worked well as a first pass, but headless Chrome bots eventually slipped through — more on that below.

The beacon

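The client-side code is minimal. This is the first version, which fired on page load; the interaction-based version that replaced it is in the update further down. It lives in a Hugo partial that only loads in production: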

{{- if hugo.IsProduction | or (eq site.Params.env "production") }}
<script>
(function(){
  if(!navigator.sendBeacon)return;
  navigator.sendBeacon("/api/ping",JSON.stringify({
    p:location.pathname,
    r:document.referrer,
    sw:screen.width,
    sh:screen.height,
    ua:navigator.userAgent,
    t:Date.now()
  }));
})();
</script>
{{- end }}

sendBeacon is non-blocking by design: it queues the request and returns immediately, so it does not hold up page load or interaction. The payload is a couple of hundred bytes of JSON, most of it the user-agent string.

The Cloudflare Pages Function

The Pages Function at /api/ping does the server-side enrichment. The browser can’t tell you a visitor’s IP address, but Cloudflare can via the CF-Connecting-IP header. The function merges the client payload with server-side metadata and forwards it:

export async function onRequestPost(context) {
  const { request, env } = context;
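  // Reject malformed payloads instead of letting the function throw a 500
  let body;
  try {
    body = await request.json();
  } catch {
    return new Response(null, { status: 400 });
  }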

  const event = {
    path: String(body.p || "/"),
    referrer: String(body.r || ""),
    screen_width: parseInt(body.sw, 10) || 0,
    screen_height: parseInt(body.sh, 10) || 0,
    user_agent: String(body.ua || ""),
    ip: request.headers.get("CF-Connecting-IP") || "",
    country: request.headers.get("CF-IPCountry") || "",
    timestamp: new Date().toISOString(),
  };

  // Forward to analytics server, buffer in KV if it's down
  try {
    const resp = await fetch(`${env.ANALYTICS_SERVER_URL}/api/event`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(event),
    });
    if (resp.ok) return new Response(null, { status: 204 });
  } catch {}

  // Server down — buffer in CF KV
  if (env.ANALYTICS_BUFFER) {
    const key = `evt:${Date.now()}:${Math.random().toString(36).slice(2, 8)}`;
    await env.ANALYTICS_BUFFER.put(key, JSON.stringify(event), {
      expirationTtl: 604800,
    });
  }
  return new Response(null, { status: 202 });
}

If my laptop is offline, events get buffered in Cloudflare KV with a 7-day TTL. A cron worker drains them when the server comes back.
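
The drain step is not shown in this post. A minimal sketch of what it might look like, assuming the same ANALYTICS_BUFFER binding and ANALYTICS_SERVER_URL variable as the function above. The name drainBuffer and the one-page-per-run behaviour are hypothetical, and the real cron worker may well differ:

```javascript
// Hypothetical sketch: replay buffered events to the analytics server.
// Assumes the KV binding and URL variable used by the Pages Function above.
async function drainBuffer(env, fetchImpl = fetch) {
  let drained = 0;
  // KV list() returns at most 1000 keys per call; one page per run is plenty here.
  const page = await env.ANALYTICS_BUFFER.list({ prefix: "evt:" });
  for (const { name } of page.keys) {
    const payload = await env.ANALYTICS_BUFFER.get(name);
    if (payload === null) continue; // expired or already drained
    let resp;
    try {
      resp = await fetchImpl(`${env.ANALYTICS_SERVER_URL}/api/event`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: payload, // stored JSON keeps the original event timestamp
      });
    } catch {
      break; // server unreachable: stop and retry on the next run
    }
    if (!resp.ok) break; // server still unhealthy: stop
    await env.ANALYTICS_BUFFER.delete(name); // delete only after a confirmed write
    drained++;
  }
  return drained;
}
```

In a Worker with a cron trigger, this would be called from the scheduled handler, for example via ctx.waitUntil(drainBuffer(env)). Deleting a key only after the server accepts the event means a crash mid-run can replay an event, so the server side would ideally tolerate duplicates.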

The server: no open ports

The Go analytics service runs on an Ubuntu laptop at home inside a Podman container. The key constraint: I didn’t want to open any ports on my home network.

Cloudflare Tunnel solves this. The cloudflared daemon runs in a container, creates an outbound-only encrypted tunnel to Cloudflare’s edge, and routes analytics-internal.yashk.net to the Go service. My laptop’s IP is never exposed.

The whole stack runs as a Podman Compose setup:

Go analytics service — receives events, parses user agents, does MaxMind geo-IP lookups, sends Telegram notifications
PostgreSQL — stores all events
Grafana — dashboards, accessible only via SSH tunnel
cloudflared — the tunnel, no ports needed

Geo-IP: city-level location for free

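MaxMind’s GeoLite2-City database is free (you just need to sign up for a license key). The Go service loads the ~70MB database at startup and does in-memory lookups. For a given IP, I get city, region, country, and latitude/longitude. ISP and network-owner data is not part of the City database; MaxMind ships that separately (the free GeoLite2-ASN database covers the network owner).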

This means my Telegram notifications look like:

New visitor on yashk.net

Page: /resume
Location: Mumbai, Maharashtra, India
Referrer: linkedin.com
Device: Chrome 120 / macOS (Desktop)
Time: 14:32 IST

If someone from a company opens my resume from LinkedIn, I know about it within seconds.
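
The notification itself is sent by the Go service, whose code is not shown here. As a rough illustration in JavaScript (to match the other code in this post), building and sending a message like the one above might look like the sketch below. The field names on visit and the function names formatVisit and notify are assumptions; the sendMessage method with chat_id and text is the standard Telegram Bot API call:

```javascript
// Hypothetical sketch: build the notification text and post it to Telegram.
// Field names on `visit` are assumptions, not the author's schema.
function formatVisit(visit) {
  const location = [visit.city, visit.region, visit.country]
    .filter(Boolean) // geo-IP may not resolve every level
    .join(", ");
  return [
    "New visitor on yashk.net",
    "",
    `Page: ${visit.path}`,
    `Location: ${location || "Unknown"}`,
    `Referrer: ${visit.referrer || "direct"}`,
    `Device: ${visit.device}`,
    `Time: ${visit.time}`,
  ].join("\n");
}

// Sends via the Bot API sendMessage method; token and chatId come from config.
async function notify(token, chatId, visit, fetchImpl = fetch) {
  const resp = await fetchImpl(`https://api.telegram.org/bot${token}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text: formatVisit(visit) }),
  });
  return resp.ok;
}
```

Keeping the formatting separate from the network call makes the message layout easy to check without talking to Telegram.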

Grafana dashboards

Grafana connects directly to PostgreSQL and gives me:

Visitors over time
Top pages and referrers
Geographic distribution
Browser and OS breakdown
A table of recent visitors with full details

I access it via SSH tunnel — it’s never exposed to the internet.

Lessons learned

A few things I hit during implementation that might save you time:

Cloudflare reserves /cdn-cgi/. I originally used /cdn-cgi/beacon as the beacon path since it looks like a native CF endpoint. Turns out CF blocks Pages Functions under that path entirely. Switched to /api/ping.

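Pages Functions != Workers. When I set this up, the wrangler.toml [vars] and [[kv_namespaces]] sections only applied to Workers, not Pages Functions. For Pages, I set environment variables and KV bindings in the CF dashboard under the Pages project settings. Cloudflare has since added Wrangler configuration file support for Pages, so check the current docs before assuming the dashboard is the only option.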

Podman networking is different from Docker. Container name resolution works via network aliases, not container names. In a compose file, the service name (analytics) is the alias, not the container_name (analytics-service). The CF tunnel needed to route to http://analytics:8080, not http://analytics-service:8080.

Podman needs fully-qualified image names. postgres:16-alpine doesn’t resolve — you need docker.io/library/postgres:16-alpine. Unlike Docker, Podman doesn’t default to Docker Hub without explicit registry configuration.

Grafana provisioning can’t interpolate secrets. You can use ${ENV_VAR} in Grafana provisioning YAML for some fields, but secureJsonData (like database passwords) doesn’t support it. I ended up configuring the PostgreSQL datasource manually through the Grafana UI.

Restart all containers together. With Podman Compose, if you restart just one container (like the tunnel), it can get a new IP on a different network segment than the other containers. Always run podman compose down && podman compose up -d to keep everything on the same network.

Update: the bots got smarter

After running the system for about 10 days, I pulled the raw data from PostgreSQL to see how things were going. 26 events total. I expected most of them to be real visitors — after all, the whole point of the JS-execution filter was to keep bots out.

Turns out, 7 of the 26 events (27%) were bots.

What slipped through

IP	Signal	What it was
104.197.69.115	Google Cloud IP, Chrome 125 (outdated), 800×600	Datacenter bot
205.169.39.14	Chrome 117, 800×600, referrer “bing.com”	Crawler
205.169.39.23	Same /24 subnet, identical fingerprint	Crawler
205.169.39.85	Same /24 subnet, Chrome 79, 1024×1024	Crawler
40.77.177.109	UA literally says `bingbot/2.0`	Bingbot
185.241.208.176	`HeadlessChrome` in UA, Tor exit node	Headless scraper
146.112.163.52	Reston, VA — Cisco Umbrella/OpenDNS IP	URL safety scanner
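
Clusters like the three 205.169.39.x addresses are easy to miss by eye once the table grows. A small sketch of how they could be surfaced, assuming IPv4 only; the helper name groupBySlash24 is hypothetical and not part of the author's service:

```javascript
// Hypothetical helper: group IPv4 addresses by their /24 prefix (first three octets).
function groupBySlash24(ips) {
  const groups = new Map();
  for (const ip of ips) {
    const octets = ip.split(".");
    if (octets.length !== 4) continue; // skip anything that is not dotted IPv4
    const prefix = `${octets.slice(0, 3).join(".")}.0/24`;
    if (!groups.has(prefix)) groups.set(prefix, []);
    groups.get(prefix).push(ip);
  }
  return groups;
}

// The seven IPs from the table above.
const clusters = groupBySlash24([
  "104.197.69.115", "205.169.39.14", "205.169.39.23", "205.169.39.85",
  "40.77.177.109", "185.241.208.176", "146.112.163.52",
]);
// Keep only prefixes seen more than once.
const repeated = [...clusters].filter(([, members]) => members.length > 1);
```

For the table above, repeated holds a single entry: 205.169.39.0/24 with its three addresses.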

The pattern was clear. These bots all execute JavaScript (that’s how they got past the original filter), but they share telltale signs:

Default headless viewport sizes: 800×600, 1024×1024 — no real person uses these
Outdated Chrome versions: 79, 117, 125 — real browsers auto-update, bots freeze on old versions
Datacenter IPs: Google Cloud, Microsoft, Cisco — not residential ISPs
No city resolved: geo-IP returned empty or generic datacenter locations

The JS-execution filter was a good first pass, but “can run JavaScript” is too low a bar now that headless Chrome is everywhere.
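
For auditing events that are already stored, the signals listed above can be checked mechanically. The sketch below is an illustration only, not the fix I shipped: the function name botSignals, the city field, and the caller-supplied Chrome cutoff are assumptions, while user_agent, screen_width, and screen_height follow the event shape built by the Pages Function:

```javascript
// Hypothetical audit helper: list which of the signals above an event shows.
// The screen sizes come from the table; the Chrome cutoff is supplied by the caller.
const HEADLESS_SCREENS = new Set(["800x600", "1024x1024"]);

function botSignals(event, minChromeMajor) {
  const signals = [];
  const ua = event.user_agent || "";
  // Self-identified crawlers and headless builds
  if (/bot|crawler|spider|HeadlessChrome/i.test(ua)) signals.push("bot-ua");
  // Default headless screen sizes
  if (HEADLESS_SCREENS.has(`${event.screen_width}x${event.screen_height}`)) {
    signals.push("headless-screen");
  }
  // Chrome major version older than the cutoff
  const chrome = ua.match(/Chrome\/(\d+)/);
  if (chrome && Number(chrome[1]) < minChromeMajor) signals.push("old-chrome");
  // Geo-IP lookup returned no city
  if (!event.city) signals.push("no-city");
  return signals;
}
```

No single signal is conclusive (plenty of real visitors have no resolvable city), so a list of matched signals is more useful for review than a yes/no verdict.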

The fix: interaction-based filtering

The insight is simple: these bots load pages and execute scripts, but they don’t scroll, move a mouse, or tap. Real humans do.

I changed the beacon from firing on page load to firing on the first user interaction:

<script>
(function(){
  if(!navigator.sendBeacon)return;
  var sent=false;
  function ping(){
    if(sent)return;
    sent=true;
    navigator.sendBeacon("/api/ping",JSON.stringify({
      p:location.pathname,
      r:document.referrer,
      sw:screen.width,
      sh:screen.height,
      ua:navigator.userAgent,
      t:Date.now()
    }));
  }
  var ev=["scroll","mousemove","touchstart","click","keydown"];
  ev.forEach(function(e){
    document.addEventListener(e,ping,{once:true,passive:true});
  });
})();
</script>

The beacon now waits for scroll, mousemove, touchstart, click, or keydown before firing. The {once: true} option auto-removes each listener after the first trigger, and the sent flag ensures only one beacon fires per pageview.

This covers all real usage patterns:

Desktop: mouse move or scroll
Mobile browser: touch to scroll or tap
In-app webviews (Instagram, Twitter, LinkedIn): touch events fire the same way
Keyboard navigation: keydown catches it

The tradeoff is that visitors who land and leave without any interaction won’t be counted. On a personal site, this is negligible — almost everyone at least scrolls.

As far as I can tell, every bot in the data above would have been filtered out by this change, since none of them appear to simulate real user interactions.

What’s next

Next, I might add reverse DNS lookups to identify visits that come from company networks. Knowing that a visit came from a corporate network is more useful than just seeing a city name.

The stack

Component	Technology	Cost
Static site	Hugo + PaperMod	Free (CF Pages)
Beacon	Inline JS, ~500 bytes	Free
Edge function	CF Pages Function	Free tier
Event buffer	CF KV	Free tier
Tunnel	Cloudflare Tunnel	Free
Analytics service	Go	Self-hosted
Database	PostgreSQL 16	Self-hosted
Geo-IP	MaxMind GeoLite2	Free
Dashboards	Grafana OSS	Self-hosted
Notifications	Telegram Bot API	Free
Container runtime	Podman Compose	Free

Total cost: $0/month. Total infrastructure: one spare laptop.
