TL;DR
A WireGuard mesh is a network where any peer can establish a direct, encrypted tunnel to any other peer. Building one by hand with static configuration files works for a handful of peers and becomes unmanageable somewhere past ten, because the total number of peer entries grows as O(N²) and NAT traversal requires real-time signalling. A coordination server solves both problems: it distributes peer information and orchestrates NAT traversal. Most mesh VPN products (Tailscale, NetBird, QuickZTNA, Headscale + Tailscale clients) differ in the coordination server, not in the underlying WireGuard data plane. This post explains what breaks at scale, how coordination servers solve it, and how to pick between DIY and a product.
Who this is for
Engineers running WireGuard deployments who are hitting the config-file scaling wall. Homelab builders wondering whether to roll their own or buy a product. Platform teams evaluating the WireGuard-based mesh VPN category.
1. What a WireGuard mesh actually is
Vanilla WireGuard is a point-to-point protocol. Two peers have public keys, exchange them out-of-band, configure each other as peers, and a tunnel comes up.
A mesh extends this: every peer can potentially tunnel to every other peer. “Potentially” because in practice, most traffic is between a small subset of peers; tunnels are typically established on demand and torn down when idle.
The mesh property matters because it eliminates the concentrator bottleneck of hub-and-spoke VPNs. Two engineers on laptops in the same building, on the same mesh, communicate directly through their local network — not through a concentrator halfway across the world.
2. The O(N²) config problem
Bare WireGuard has no peer discovery. Each peer must be told about every other peer it might want to reach.
A peer configuration looks like this:
[Interface]
PrivateKey = (this peer's private key)
Address = 10.0.0.1/32
ListenPort = 51820
[Peer]
PublicKey = (peer 2's public key)
AllowedIPs = 10.0.0.2/32
Endpoint = peer2.example.com:51820
[Peer]
PublicKey = (peer 3's public key)
AllowedIPs = 10.0.0.3/32
Endpoint = peer3.example.com:51820
...
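For N peers, each peer needs N-1 Peer blocks, so the mesh as a whole carries N(N-1) blocks and total config size scales as N². At N=10 that is 90 blocks, which is manageable. At N=50 it is 2,450, which is painful. At N=100 it is 9,900, which is infeasible to maintain by hand.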
Worse: every time you add a peer, you must update N-1 existing configs. Every time a peer’s endpoint changes (new IP, different network), you must update N-1 configs. Every time a peer’s public key rotates, same.
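Static configuration is brittle. WireGuard's built-in roaming lets a peer follow an endpoint change only while the other side stays reachable at a known address, so two peers that both move lose each other. Coordination servers solve this by distributing current peer information on demand.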
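The growth is easy to see by generating the configs. The sketch below is a hypothetical generator, not a WireGuard tool: the `Peer` type, the `render_config` and `full_mesh` helpers, and the placeholder keys and hostnames are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    public_key: str   # placeholder string, not a real key
    address: str      # tunnel IP, e.g. "10.0.0.1"
    endpoint: str     # "host:port" the peer listens on


def render_config(me, others):
    """Render a wg-quick style config for `me`, one [Peer] block per other peer."""
    lines = [
        "[Interface]",
        f"PrivateKey = ({me.name}'s private key)",
        f"Address = {me.address}/32",
        "ListenPort = 51820",
    ]
    for p in others:
        lines += [
            "[Peer]",
            f"PublicKey = {p.public_key}",
            f"AllowedIPs = {p.address}/32",
            f"Endpoint = {p.endpoint}",
        ]
    return "\n".join(lines) + "\n"


def full_mesh(peers):
    """One config per peer, each listing every other peer."""
    return {me.name: render_config(me, [p for p in peers if p != me]) for me in peers}


def total_peer_blocks(n):
    """Each of n peers carries n - 1 [Peer] blocks."""
    return n * (n - 1)


if __name__ == "__main__":
    peers = [
        Peer(f"peer{i}", f"(peer {i}'s public key)", f"10.0.0.{i}", f"peer{i}.example.com:51820")
        for i in range(1, 5)
    ]
    print(full_mesh(peers)["peer1"])
    for n in (10, 50, 100):
        print(n, total_peer_blocks(n))
```

Adding one peer to the list changes every value returned by `full_mesh`, which is the churn described above: each existing config has to be regenerated and redeployed.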
3. The NAT traversal problem
Most devices are behind NAT. A laptop on home Wi-Fi has a private IP. A phone on mobile data sits behind carrier-grade NAT and cannot be reached from outside unless a hole is punched first.
Classical VPN hub-and-spoke avoids the problem: peers only need to reach the hub, which has a public IP, and the hub forwards packets. Mesh needs peers to reach each other directly.
NAT traversal relies on three techniques:
- STUN. RFC 5389 (since obsoleted by RFC 8489) defines a protocol for discovering your public endpoint as seen by an external server. Both peers query STUN, learn their public endpoints, then exchange those through some other channel (the coordination server).
- UDP hole punching. Both peers simultaneously send packets to the other’s discovered public endpoint. The NAT mapping opens when the outbound packet goes out; the inbound packet arrives at the open hole. Works for most NATs; typically fails against “symmetric NAT”, which allocates a different mapping per destination, so the endpoint learned via STUN is not the one the other peer’s packets actually come from.
- Relay fallback. When hole punching fails, an intermediate server forwards packets. Slower than direct, but it works whenever both peers can reach the relay.
A coordination server is where STUN endpoints and NAT hints are exchanged. Without one, peers have no way to coordinate a hole punch.
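Why symmetric NAT defeats hole punching can be shown with a toy model. The sketch below is a simulation only, with no real sockets: the `Nat` class, the `hole_punch` function, and every address are invented for illustration. It assumes the strictest common filtering (inbound packets must come from the exact endpoint the NAT has already sent to) and ignores tricks such as port prediction.

```python
STUN = ("198.51.100.1", 3478)  # documentation address standing in for a STUN server


class Nat:
    """Toy NAT: maps an internal socket to a public port and filters inbound packets."""

    def __init__(self, public_ip, symmetric, first_port=40000):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._next_port = first_port
        self._mappings = {}   # mapping key -> public port
        self._allowed = {}    # public port -> remote endpoints we have sent to

    def send(self, internal, remote):
        """Record an outbound packet; return the public endpoint the remote side sees."""
        # A symmetric NAT allocates a new public port per destination;
        # an endpoint-independent NAT reuses one port for every destination.
        key = (internal, remote) if self.symmetric else internal
        if key not in self._mappings:
            self._mappings[key] = self._next_port
            self._next_port += 1
        port = self._mappings[key]
        self._allowed.setdefault(port, set()).add(remote)
        return (self.public_ip, port)

    def accepts(self, public_port, remote):
        """Inbound passes only through a hole previously opened towards `remote`."""
        return remote in self._allowed.get(public_port, set())


def hole_punch(nat_a, nat_b):
    a_int = ("192.168.1.10", 51820)
    b_int = ("10.1.1.20", 51820)
    # Step 1: each peer learns its public endpoint from the STUN server.
    a_pub = nat_a.send(a_int, STUN)
    b_pub = nat_b.send(b_int, STUN)
    # Step 2: the coordination server swaps a_pub and b_pub between the peers.
    # Step 3: both peers send to the endpoint the other one advertised.
    a_src = nat_a.send(a_int, b_pub)   # endpoint A's packets really come from
    b_src = nat_b.send(b_int, a_pub)
    # Step 4: a packet lands only if the receiver's advertised port has a hole
    # open towards the sender's real source endpoint.
    return nat_b.accepts(b_pub[1], a_src) and nat_a.accepts(a_pub[1], b_src)
```

With two endpoint-independent NATs the advertised and real endpoints match and `hole_punch` returns `True`. Make either side symmetric and the packets leave from a port nobody advertised, so the function returns `False` and the pair would need a relay.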
4. What a coordination server does
A coordination server is a control-plane component that does not itself carry user traffic. Its jobs:
4.1 Peer registration and key distribution
Every peer registers with the coordination server, providing its public key, identity, and current endpoint. The server distributes a current peer list to every other peer on request.
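As a rough sketch of the registration flow, the in-memory registry below stores one record per public key and hands each peer the list of everyone else. `PeerRecord`, `Registry`, and their methods are hypothetical names; a real coordination server would also authenticate the caller, filter the list by policy, and persist state.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PeerRecord:
    public_key: str
    identity: str
    endpoint: str
    last_seen: float = field(default_factory=time.time)


class Registry:
    def __init__(self):
        self._peers = {}   # public key -> PeerRecord

    def register(self, public_key, identity, endpoint):
        """Create or refresh a record; re-registering updates the endpoint."""
        record = PeerRecord(public_key, identity, endpoint)
        self._peers[public_key] = record
        return record

    def peer_list_for(self, public_key):
        """Everything the requesting peer needs to build its [Peer] entries."""
        if public_key not in self._peers:
            raise KeyError("unregistered peer")
        return [r for key, r in self._peers.items() if key != public_key]


if __name__ == "__main__":
    reg = Registry()
    reg.register("alice-pub", "alice@company.com", "203.0.113.10:51820")
    reg.register("bob-pub", "bob@company.com", "203.0.113.20:51820")
    # Alice changes network; one call replaces N-1 manual config edits.
    reg.register("alice-pub", "alice@company.com", "198.51.100.7:41641")
    print(reg.peer_list_for("bob-pub"))
```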
4.2 NAT discovery and hint exchange
Peers report their observed public endpoint (via STUN or direct observation). The server shares these hints between peers that want to connect.
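For the STUN half of this step, the wire format is small enough to sketch. The code below builds a Binding request and decodes the XOR-MAPPED-ADDRESS attribute of a success response, following the message layout in the STUN RFC. It handles IPv4 only and skips authentication and fingerprint attributes. The function names are illustrative, and `encode_binding_response` exists only so the parser can be exercised without a network.

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id=None):
    """20-byte header: type, length 0, magic cookie, 12-byte transaction ID."""
    txn_id = txn_id or os.urandom(12)
    if len(txn_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id


def parse_binding_response(data, txn_id):
    """Return (ip, port) from XOR-MAPPED-ADDRESS, or None if the attribute is absent."""
    if len(data) < 20:
        raise ValueError("short STUN message")
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or data[8:20] != txn_id:
        raise ValueError("not a matching Binding success response")
    pos, end = 20, min(20 + length, len(data))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)   # port is XORed with the cookie's top 16 bits
            addr = xaddr ^ MAGIC_COOKIE           # IPv4 address is XORed with the whole cookie
            return str(ipaddress.IPv4Address(addr)), port
        pos += 4 + attr_len + (-attr_len % 4)     # attribute values are padded to 4 bytes
    return None


def encode_binding_response(txn_id, ip, port):
    """Build the response a server would send; here only for offline testing."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + txn_id + attr
```

A peer would send the request over the same UDP socket WireGuard uses, so that the mapping it learns is the one its tunnel traffic will use, and then report the decoded endpoint to the coordination server.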
4.3 Policy enforcement
Which peers can reach which other peers. The policy is expressed as ACLs, tags, groups, or attribute-based rules. The server pushes relevant policy to each peer so the peer knows whose tunnels to accept.
4.4 Identity binding
Which human or machine identity owns a peer. Integrated with the organisation’s identity provider for SSO-based peer registration.
4.5 Relay coordination
When direct peer-to-peer fails, coordinate relay through a DERP-style server. See section 5.
4.6 Audit and observability
Log every peer registration, tunnel establishment, authorisation decision, and policy match. Feed to SIEM.
Mesh VPN products (Tailscale, NetBird, QuickZTNA, Headscale) differ principally in how they implement these six functions.
5. DERP-style relay fallback
DERP (Designated Encrypted Relay for Packets) is a Tailscale-coined term for the relay servers that forward packets when direct peer-to-peer connectivity fails. DERP servers carry encrypted WireGuard packets between peers without decrypting them — they act as packet-forwarding relays.
A mesh deployment typically runs DERP servers in multiple geographic regions. Peers discover the nearest DERP and use it as fallback. Direct peer-to-peer is always preferred when available; DERP is the safety net.
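The preference order can be sketched in a few lines. This is a simplified model, not how any particular product selects relays: `choose_path`, the region labels, and the latency figures are hypothetical, and real clients probe continuously and can upgrade from relay to direct mid-session.

```python
def choose_path(direct_ok, relay_rtt_ms):
    """Prefer a direct tunnel; otherwise use the lowest-latency reachable relay.

    relay_rtt_ms maps a region label to a measured round-trip time in
    milliseconds, or None when the relay did not answer the probe.
    """
    if direct_ok:
        return ("direct", None)
    reachable = {region: rtt for region, rtt in relay_rtt_ms.items() if rtt is not None}
    if not reachable:
        return ("unreachable", None)
    return ("relay", min(reachable, key=reachable.get))


if __name__ == "__main__":
    probes = {"eu-west": 20.0, "us-east": 85.0, "ap-south": None}
    print(choose_path(True, probes))    # hole punch succeeded
    print(choose_path(False, probes))   # fall back to the closest relay
```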
QuickZTNA runs DERP relays in four regions (BLR, NYC, LON, SFO). Tailscale runs a larger fabric. Headscale allows configuration of your own DERP or use of Tailscale’s public DERP. NetBird provides a similar relay layer.
6. Policy distribution
Policy in a mesh VPN is typically expressed centrally and distributed to enforcement points. Two patterns:
6.1 Centralised enforcement
The coordination server evaluates policy and tells each peer what tunnels to accept. When a peer receives an inbound handshake, it asks the coordination server “should I accept?”.
- Pros: central visibility, easy to update policy.
- Cons: latency on handshake; single point of failure.
6.2 Distributed enforcement
The coordination server pushes policy to each peer periodically. Each peer evaluates policy locally when a tunnel is attempted.
- Pros: no control-plane dependency per handshake; offline-capable.
- Cons: policy updates can lag; local enforcement must be correct.
Most mature products use a hybrid — push policy, revalidate periodically, escalate ambiguous cases to the server.
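One way to picture the hybrid is a peer that decides locally while its policy snapshot is fresh and escalates once it goes stale. The sketch below is an assumption-laden illustration: `PolicySnapshot`, `LocalEnforcer`, the five-minute freshness window, and the fail-closed behaviour when the server is unreachable are all choices made for the example, not a description of any product.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySnapshot:
    version: int
    fetched_at: float    # seconds since the epoch
    allowed: frozenset   # {(source_identity, destination_identity), ...}


class LocalEnforcer:
    def __init__(self, snapshot, max_age_s=300.0, ask_server=None):
        self.snapshot = snapshot
        self.max_age_s = max_age_s
        self.ask_server = ask_server   # callable(src, dst) -> bool; None models "offline"

    def decide(self, src, dst, now=None):
        now = time.time() if now is None else now
        if now - self.snapshot.fetched_at <= self.max_age_s:
            # Fresh snapshot: decide locally, no round trip on the handshake path.
            return (src, dst) in self.snapshot.allowed
        if self.ask_server is not None:
            # Stale snapshot: escalate to the control plane.
            return self.ask_server(src, dst)
        # Stale and unreachable: fail closed. Some deployments may instead
        # prefer to keep honouring the last known policy.
        return False
```

The trade-off from the two lists above shows up in `max_age_s`: a short window narrows the lag after a policy change but sends more handshakes to the server.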
7. Identity binding
Each peer is owned by an identity — a user, a service account, or a specific machine role. The coordination server maps peers to identities via:
- SSO at peer registration. The user signs in via OIDC/SAML to prove identity; the peer’s public key is registered with that identity bound.
- Pre-authorisation keys. A short-lived auth key, scoped to a specific user or role, allows a peer to register non-interactively. Used for headless machines.
- Machine identity. Where humans are not involved (servers, containers), identity can be bound to machine-level credentials (cloud IAM, service account tokens).
Identity matters because policy is usually expressed in terms of identities (“engineers can reach prod-db”) not peer keys.
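The pre-authorisation key flow in the list above can be sketched as an issue-and-redeem pair. `AuthKeyStore` and its methods are hypothetical, and the store keeps raw tokens in memory for brevity; a production system would store only a hash of each token and record who issued it.

```python
import secrets
import time


class AuthKeyStore:
    def __init__(self):
        self._keys = {}      # token -> (identity, expires_at, reusable)
        self.bindings = {}   # peer public key -> identity

    def issue(self, identity, ttl_s=3600, reusable=False, now=None):
        """Mint a short-lived key scoped to one identity."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._keys[token] = (identity, now + ttl_s, reusable)
        return token

    def redeem(self, token, peer_public_key, now=None):
        """Bind a peer's public key to the identity the key was issued for."""
        now = time.time() if now is None else now
        entry = self._keys.get(token)
        if entry is None:
            raise PermissionError("unknown or already used key")
        identity, expires_at, reusable = entry
        if now > expires_at:
            del self._keys[token]
            raise PermissionError("expired key")
        if not reusable:
            del self._keys[token]   # single-use keys are consumed on first redemption
        self.bindings[peer_public_key] = identity
        return identity
```

A headless server would receive the token through its provisioning system, redeem it once at first boot, and from then on be known to policy by its identity rather than by its key.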
8. Subnet routes and exit nodes
Two related mesh VPN features that extend peer reachability.
8.1 Subnet routes
A peer advertises that it can route to a CIDR range behind it. Other peers configured to use that subnet send traffic through the advertising peer, which forwards it to the destination. Used for reaching machines that are not running a WireGuard agent themselves — legacy servers, printers, cameras.
8.2 Exit nodes
A peer designated as an exit node accepts “route all traffic” from other peers and forwards that traffic out through its own network connection, typically to the internet. Used for privacy or for egress compliance, since traffic appears to originate from the exit node’s location.
Both features require the coordination server to distribute the advertisements and the policy of who is allowed to use them.
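On the client, both features reduce to a routing decision: given the advertisements this peer is allowed to use, which peer carries a packet for a given destination? The sketch below uses longest-prefix match, with an exit node modelled as a `0.0.0.0/0` advertisement. The peer names and CIDR ranges are made up for the example.

```python
import ipaddress


def build_routes(advertisements):
    """advertisements: iterable of (peer_name, cidr) this peer is allowed to use."""
    return [(ipaddress.ip_network(cidr), peer) for peer, cidr in advertisements]


def next_hop(routes, destination):
    """Longest-prefix match: the most specific advertised route wins."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, peer) for net, peer in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    routes = build_routes([
        ("staging", "10.0.0.100/32"),       # an ordinary mesh peer
        ("office-gw", "192.168.10.0/24"),   # subnet route to agentless devices
        ("exit-node", "0.0.0.0/0"),         # exit node: default route
    ])
    for dst in ("10.0.0.100", "192.168.10.25", "93.184.216.34"):
        print(dst, "->", next_hop(routes, dst))
```

In WireGuard terms, each advertised range ends up in the AllowedIPs of the advertising peer, which is why the coordination server has to control who receives which advertisement.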
9. The four-peer example, fully worked
Rather than stopping at “it gets complicated”, let us work a four-peer example.
Peers: Alice’s laptop, Bob’s laptop, staging-server, prod-server. Goal: engineers can reach servers; servers cannot reach engineers’ laptops.
9.1 Bare WireGuard
Each peer needs three other peer configurations. On Alice’s laptop:
[Interface]
PrivateKey = ...alice-priv...
Address = 10.0.0.10/32
[Peer]
# Bob
PublicKey = ...bob-pub...
AllowedIPs = 10.0.0.11/32
[Peer]
# staging
PublicKey = ...staging-pub...
AllowedIPs = 10.0.0.100/32
Endpoint = staging.example.com:51820
[Peer]
# prod
PublicKey = ...prod-pub...
AllowedIPs = 10.0.0.101/32
Endpoint = prod.example.com:51820
Bob’s laptop has a similar config with Alice, staging, and prod. Staging and prod have peer blocks for both laptops and each other.
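Policy here rests on AllowedIPs, which WireGuard applies in both directions: it selects the peer for outbound packets by destination address, and it drops inbound packets whose source address is not listed for the sending peer. Alice can therefore route to 10.0.0.11 (Bob), 10.0.0.100 (staging), and 10.0.0.101 (prod). That symmetry means AllowedIPs alone cannot express a one-way rule: removing the laptops’ addresses from the servers’ configs would also drop the laptops’ own packets when they arrive at the servers. Blocking server-to-laptop traffic needs a host firewall on each laptop that rejects new connections arriving from the server addresses.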
9.2 With coordination server
Alice registers her public key with the coordination server and authenticates as user “alice@company.com” with tag “engineer”. Bob registers similarly. Servers register with tag “server” and a specific role.
Policy at the coordination server:
allow tag:engineer to tag:server
deny tag:server to tag:engineer
The server pushes Alice’s peer config dynamically: she sees Bob, staging, and prod as peers with the allowed routes. Staging accepts inbound connections from the engineers but is not permitted to initiate connections to their laptops.
If Eve joins the team, the admin assigns her the “engineer” tag. The server distributes updated peer lists to everyone. No manual config editing.
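A tag-based evaluation of this example might look like the sketch below. It is illustrative only: the `Node` type, the rule tuples, first-match-wins ordering, and default deny for pairs no rule mentions are assumptions, and real products each have their own policy language. Under default deny, laptops do not see each other unless an engineer-to-engineer rule is added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    owner: str
    tags: frozenset


# (action, source tag, destination tag); first match wins, default deny.
RULES = [
    ("deny", "server", "engineer"),
    ("allow", "engineer", "server"),
]


def may_connect(src, dst, rules=RULES):
    """True if `src` is allowed to initiate a connection to `dst`."""
    for action, src_tag, dst_tag in rules:
        if src_tag in src.tags and dst_tag in dst.tags:
            return action == "allow"
    return False


def peers_for(node, nodes, rules=RULES):
    """Peers that must appear in `node`'s config: anyone it may call or be called by."""
    return sorted(
        n.name for n in nodes
        if n != node and (may_connect(node, n, rules) or may_connect(n, node, rules))
    )


if __name__ == "__main__":
    engineer, server = frozenset({"engineer"}), frozenset({"server"})
    nodes = [
        Node("alice-laptop", "alice@company.com", engineer),
        Node("bob-laptop", "bob@company.com", engineer),
        Node("staging", "svc-staging", server),
        Node("prod", "svc-prod", server),
    ]
    print(peers_for(nodes[0], nodes))
    # Eve joins: one new node with an existing tag, no rule changes.
    nodes.append(Node("eve-laptop", "eve@company.com", engineer))
    print(peers_for(nodes[2], nodes))
```

The rules never mention a person or a key, so onboarding Eve changes the node list and nothing else; the server recomputes `peers_for` for every affected node and pushes the result.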
9.3 The scaling advantage
At four peers, the two approaches are comparable in complexity. At forty, the coordination-server approach is one policy rule per access pattern; the bare WireGuard approach is 40 × 39 = 1,560 peer entries to maintain.
10. When to graduate to a product
Five signals that suggest moving beyond bare WireGuard:
- Team size crosses 10 active peers. Config churn starts hurting.
- Endpoint changes become frequent. Laptops moving between networks, cloud instances redeploying — static configs break.
- Policy needs attributes. You want “engineers can reach prod between 9-6” or “only from posture-compliant devices” — ACLs with AllowedIPs cannot express this.
- Compliance enters scope. SOC 2, HIPAA, NIS2. Audit logs become a requirement. Bare WireGuard has essentially no audit trail.
- Identity integration is needed. You want peers bound to SSO identities, not static keys.
Any two of the above usually tip the decision toward a product. See our WireGuard-based mesh comparison for the serious options in 2026.
Further reading
- WireGuard protocol.
- WireGuard whitepaper (Donenfeld, 2017).
- RFC 5389 — STUN.
- Noise Protocol Framework.
- Tailscale’s “How NAT traversal works” post.
Related reading on this blog
- WireGuard vs OpenVPN vs IPsec: 2026 Engineering Comparison
- NetBird vs Tailscale vs QuickZTNA
- Self-Hosting Headscale vs a Managed Coordination Server
- What Is ZTNA?
Try QuickZTNA
QuickZTNA handles the core coordination-server jobs (peer registration, NAT traversal, policy, identity) on top of standard WireGuard, with hybrid post-quantum PSK on every tunnel. Start on Free, with 100 devices, 3 users, and no config files to maintain.