Private DNS in Azure Is Deceptively Hard — Here's the Architecture That Actually Holds Up
- peterrivera813
- Apr 19
- 9 min read

Every Azure architect I know has a Private DNS war story. The resolution worked fine in dev. It worked in the single-hub staging environment. Then it silently broke the moment the topology got complicated — a second region, an on-prem conditional forwarder, a partner tenant, or a new spoke that someone wired in slightly differently than the others.
Private DNS in Azure looks simple on the surface. A zone, a link, a record. But the failure modes are non-obvious, the defaults will mislead you, and the documentation is optimistic about how clean the real world is going to be. This post is the one I wish I'd had years ago.
Why It Seems Simple (And Why That's the Problem)
The basic model really is simple. You create a Private DNS Zone (say, privatelink.blob.core.windows.net), link it to a VNet, and DNS queries from that VNet resolve privately. The Azure-provided DNS at 168.63.129.16 handles everything.
That model works perfectly — until:
You have more than one hub (multi-region or active-active topologies)
You have on-premises clients that need to resolve Azure private endpoints
You have spoke VNets peered to multiple hubs
You have cross-tenant private endpoints (partner integrations, MSP scenarios)
You have custom DNS servers (AD-integrated DNS, BIND, Infoblox) anywhere in the chain
The moment any of those conditions appear, you're no longer dealing with the documented happy path. You're dealing with how DNS resolution actually propagates through Azure's network fabric — and that's a different conversation.
How Azure DNS Resolution Actually Works
Before fixing anything, it's worth being precise about what's actually happening.
Every Azure VM is configured by DHCP to use 168.63.129.16 as its DNS server. This is a non-routable virtual IP — it's Azure's internal fabric DNS, and it's reachable from every Azure VNet regardless of topology. It is not a VM, not a scale set, and not something you can inspect or directly influence.
When a VM queries 168.63.129.16 for a name, the fabric resolver:
Checks if any Private DNS Zone linked to that VNet (or a peered VNet) contains a matching record.
Falls back to public DNS resolution if no private match is found.
The critical word in step 1 is linked. A Private DNS Zone is only consulted during resolution if it has a VNet link to the VNet where the query originates. Peering alone does not propagate DNS zone visibility — you need an explicit link.
This is the first thing people get wrong at scale.
Failure Mode #1: The VNet Link Gap
In a hub-spoke topology, the natural instinct is to link Private DNS Zones to the hub VNet. Spokes are peered to the hub, so queries should flow through the hub and resolve correctly — right?
Wrong.
DNS resolution happens at the VNet where the query originates. A spoke VM querying 168.63.129.16 resolves using zones linked to the spoke's VNet, not the hub's. Unless the zone is explicitly linked to the spoke, or the spoke is configured to use a custom DNS server in the hub that performs the lookup on its behalf, the query will fall through to public DNS — and for private endpoints, that means it resolves to the public IP, bypassing your private networking entirely.
The fix depends on your scale:
Small environments (< 20 spokes): Link your Private DNS Zones directly to each spoke VNet. Use Azure Policy to enforce this automatically at subscription vending time.
Large environments (many spokes): Deploy a centralized DNS forwarder in the hub (more on this below) and configure all spokes to use it via a custom DNS server setting on the VNet. This keeps zone management centralized and doesn't require touching every spoke when a new zone is created.
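For the direct-link approach, the per-spoke links can be stamped out in a loop rather than clicked in by hand. A minimal Bicep sketch, assuming the zone already exists in the connectivity subscription and spokeVnetIds is a parameter you supply (both names are illustrative, not a prescribed convention):

```bicep
// Sketch: link one centrally owned zone to every spoke VNet.
// spokeVnetIds is a hypothetical parameter: the resource IDs of your spoke VNets.
param spokeVnetIds array

resource blobZone 'Microsoft.Network/privateDnsZones@2020-06-01' existing = {
  name: 'privatelink.blob.core.windows.net'
}

resource spokeLinks 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2020-06-01' = [for vnetId in spokeVnetIds: {
  parent: blobZone
  name: 'link-${uniqueString(vnetId)}' // link names must be unique within the zone
  location: 'global'                   // zone links are global resources
  properties: {
    registrationEnabled: false         // resolution only; no auto-registration from spokes
    virtualNetwork: {
      id: vnetId
    }
  }
}]
```

The same loop belongs in whatever subscription-vending pipeline creates the spokes, so a new spoke can never exist without its links.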
Failure Mode #2: On-Premises Clients Can't Resolve Azure Private Endpoints
This one bites almost every hybrid shop. On-premises DNS servers (Active Directory DNS, BIND, Infoblox) handle internal name resolution and forward unknown queries to forwarders — typically public DNS or your ISP's resolvers. They do not have visibility into Azure Private DNS Zones.
When an on-prem client tries to resolve storageaccount.blob.core.windows.net after you've created a private endpoint for it, the query hits public DNS, resolves to the public IP, and traffic goes over the internet — bypassing your private endpoint entirely. Silent, and frustrating.
Understanding the CNAME chain is what makes the fix click. When you create a private endpoint, Azure sets up an automatic CNAME redirect:
storageaccount.blob.core.windows.net
→ CNAME → storageaccount.privatelink.blob.core.windows.net
→ A record → 10.x.x.x (private endpoint IP)

Azure's fabric DNS follows that chain and returns the private IP — but only if the resolver handling the query has visibility into your Private DNS Zones.
The correct architecture requires:
A conditional forwarder on your on-prem DNS servers targeting the public service zone — for example, blob.core.windows.net, database.windows.net, vaultcore.azure.net — pointing to the Azure DNS Private Resolver inbound endpoint IP.
That inbound endpoint must be reachable from on-prem over ExpressRoute or VPN.
Azure DNS follows the CNAME chain from the public zone through to the privatelink.* record and returns the private IP.
This is Microsoft's explicitly recommended approach, and it's meaningfully simpler than the alternative. Forwarding the public parent zone (blob.core.windows.net) means you need far fewer conditional forwarder entries — one per Azure service namespace rather than one per privatelink.* zone. It also means you don't need to update your on-prem forwarder configuration every time a new privatelink.* namespace is introduced. For resources that don't have private endpoints, Azure DNS resolves the query to the public IP as normal — no disruption to existing traffic.
The Azure DNS Private Resolver inbound endpoint is the key piece. It gives you a real routable IP (unlike 168.63.129.16, which is unreachable from on-prem) while remaining zone-aware because it lives inside a VNet linked to your Private DNS Zones.
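On the on-prem side, the conditional forwarder configuration is small. A sketch for BIND — the forwarder IPs are hypothetical DNS Private Resolver inbound endpoint addresses, and AD DNS and Infoblox have equivalent conditional forwarder settings:

```
// named.conf — forward the public service zones, NOT privatelink.*
// 10.10.0.4 / 10.10.0.5: hypothetical inbound endpoint IPs, reachable over ExpressRoute/VPN
zone "blob.core.windows.net" {
    type forward;
    forward only;
    forwarders { 10.10.0.4; 10.10.0.5; };
};

zone "database.windows.net" {
    type forward;
    forward only;
    forwarders { 10.10.0.4; 10.10.0.5; };
};
```

One stanza per Azure service namespace you consume — that's the entire on-prem footprint of this design.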
Failure Mode #3: Custom DNS Servers Break the Chain
If you configure a VNet to use custom DNS servers (common in AD-joined environments), Azure's fabric DNS at 168.63.129.16 is no longer the first hop. Your custom server is. And your custom server may have no idea how to resolve Azure private endpoint names.
The pattern that works:
On-prem client / Azure VM with custom DNS
↓
Custom DNS Server (AD DNS / Infoblox / BIND)
↓ (conditional forwarder for public service zones
e.g. blob.core.windows.net, database.windows.net)
Azure DNS Private Resolver — Inbound Endpoint
↓
168.63.129.16 (Azure fabric DNS)
↓ (follows CNAME → privatelink.* → A record)
Private DNS Zone record → private endpoint IP

The Azure DNS Private Resolver is the key piece here for the same reason as in the hybrid case: it gives you a real IP you can forward to (unlike 168.63.129.16), it's zone-aware because it lives inside a VNet with Private DNS Zone links, and it's fully managed — no VMs to patch, no availability sets to configure.
If you're still running forwarder VMs (Windows Server DNS roles in the hub) to bridge this gap, the DNS Private Resolver is worth migrating to. The operational burden of forwarder VMs in an availability-critical path is a latent resiliency risk most teams don't account for.
Failure Mode #4: Multi-Hub Topologies and Zone Ownership Conflicts
In a multi-region or dual-hub architecture, zone management becomes a coordination problem.
The most common mistake: creating the same Private DNS Zone (e.g., privatelink.blob.core.windows.net) independently in two different resource groups or regions, then linking each to its local hub and spokes. This works until you realize that a spoke peered to Hub A cannot see the zone linked to Hub B — and if you have resources in both regions that need to be resolved from both, you've created an inconsistency that's hard to debug.
The architecture that holds up:
One Private DNS Zone per namespace, centrally owned — typically in a dedicated connectivity subscription (aligned to CAF landing zone structure).
Linked to all hubs and, if you're using direct zone links instead of centralized resolvers, to all spokes.
Managed by the platform team. Workload teams request records via automation (or via Azure Policy + Private Endpoint DNS Group, which handles A-record creation automatically).
Private Endpoint DNS Groups (privateDnsZoneGroup in Terraform/Bicep) are underused here. When you associate a private endpoint with a DNS zone group, Azure automatically creates and manages the A record in the correct zone. This removes the manual step of creating DNS records and eliminates the entire class of "private endpoint exists but DNS record is missing" bugs.
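A minimal Bicep sketch of a zone group attached to an existing private endpoint (the endpoint name and zone ID parameter are hypothetical examples):

```bicep
// Sketch: let Azure manage the A record for a private endpoint.
param blobZoneId string // resource ID of the central privatelink.blob.core.windows.net zone

resource pe 'Microsoft.Network/privateEndpoints@2022-07-01' existing = {
  name: 'pe-storageaccount' // hypothetical private endpoint name
}

resource zoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2022-07-01' = {
  parent: pe
  name: 'default'
  properties: {
    privateDnsZoneConfigs: [
      {
        name: 'blob'
        properties: {
          privateDnsZoneId: blobZoneId // A record is created/updated/deleted with the endpoint
        }
      }
    ]
  }
}
```

Because the record's lifecycle is tied to the endpoint's, redeploying the endpoint updates the record automatically — no drift to chase.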
Failure Mode #5: Cross-Tenant Resolution
This is the scenario that usually catches MSPs and enterprises with partner integrations off-guard. You have a private endpoint in Tenant A that a resource in Tenant B needs to reach — or vice versa. The Private DNS Zone in Tenant A cannot be linked to a VNet in Tenant B.
Options:
Manual DNS record management: Create a private DNS zone in Tenant B with the same namespace, and manually add the A record pointing to the private endpoint IP in Tenant A. Functional, but operationally fragile. Any IP change (endpoint redeployment, migration) requires manual record updates.
Centralized DNS forwarder with cross-tenant VNet peering or Private Link: Route DNS queries from Tenant B to a resolver in Tenant A (reachable via cross-tenant VNet peering or Azure Private Link itself). More complex to set up, but eliminates the manual record management problem.
Custom DNS resolver in a shared services VNet with visibility into both tenants: Requires careful RBAC and zone link design, but scales well for MSP scenarios with many tenants.
There's no clean out-of-the-box answer here. The right choice depends on whether the cross-tenant relationship is persistent (justify the infrastructure) or occasional (tolerate the manual approach with change management controls).
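For the manual option, the Tenant B side looks like this in Bicep, assuming Tenant B has created its own copy of the zone. The record name and IP are hypothetical, and this is exactly the fragility described above: the record must be updated by hand whenever Tenant A's endpoint IP changes.

```bicep
// Sketch of option 1: a manually managed A record in Tenant B's copy of the zone,
// pointing at the private endpoint IP that lives in Tenant A.
resource zone 'Microsoft.Network/privateDnsZones@2020-06-01' existing = {
  name: 'privatelink.blob.core.windows.net'
}

resource partnerRecord 'Microsoft.Network/privateDnsZones/A@2020-06-01' = {
  parent: zone
  name: 'partnerstorageaccount' // hypothetical record name
  properties: {
    ttl: 300 // keep TTLs modest: this record goes stale if Tenant A redeploys the endpoint
    aRecords: [
      {
        ipv4Address: '10.50.0.4' // hypothetical private endpoint IP in Tenant A
      }
    ]
  }
}
```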
The Architecture That Actually Holds Up
For a mature multi-region Azure environment, here's the pattern I've seen work at scale:
┌─────────────────────────────────────────────┐
│ Connectivity Subscription │
│ │
│ Private DNS Zones (all privatelink.* zones) │
│ └── Linked to: Hub-VNET-EastUS │
│ └── Linked to: Hub-VNET-WestUS │
│ │
│ Azure DNS Private Resolver (each region) │
│ ├── Inbound Endpoint (for on-prem fwding) │
│ └── Outbound Endpoint (for custom rules) │
│ │
│ Hub VNETs contain no local DNS forwarder VMs│
└─────────────────────────────────────────────┘
Spoke VNets → Custom DNS = DNS Private Resolver Inbound IP
On-Prem DNS → Conditional Forwarder → DNS Private Resolver Inbound IP

Key principles baked into this design:
Single zone ownership in the connectivity subscription. No zone duplication across regions or subscriptions.
DNS Private Resolver replaces forwarder VMs — two inbound endpoints per region for redundancy, both IPs configured as custom DNS on spoke VNets.
Azure Policy enforces that any VNet deployed in a landing zone subscription uses the correct custom DNS server IPs. Non-compliant VNets are flagged in Defender for Cloud.
Private Endpoint DNS Groups handle automatic A-record lifecycle. Workload teams don't touch DNS manually.
On-prem conditional forwarders target the inbound endpoint IPs (reachable over ExpressRoute) using the public service zones (blob.core.windows.net, database.windows.net, vaultcore.azure.net, etc.) — not the privatelink.* zones. Azure follows the CNAME chain and returns the private IP. Fewer forwarder entries, no maintenance burden when new Private Link namespaces appear.
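The policy enforcement principle can be sketched as a custom Azure Policy definition. This is an audit-only sketch; requiredDnsServers is a hypothetical parameter name, and note that as written it does not flag VNets left on default Azure DNS (an empty dnsServers array), so treat it as a starting point rather than a complete guardrail:

```json
{
  "mode": "All",
  "parameters": {
    "requiredDnsServers": {
      "type": "Array",
      "metadata": {
        "description": "DNS Private Resolver inbound endpoint IPs"
      }
    }
  },
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Network/virtualNetworks"
        },
        {
          "not": {
            "field": "Microsoft.Network/virtualNetworks/dhcpOptions.dnsServers[*]",
            "in": "[parameters('requiredDnsServers')]"
          }
        }
      ]
    },
    "then": {
      "effect": "audit"
    }
  }
}
```

Assign it at the management group level so every landing zone subscription inherits it; switch the effect to deny or modify once you've confirmed it matches your vending process.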
Operational Considerations You'll Want Later
Zone link limits: A Private DNS Zone can be linked to up to 1,000 VNets. If you're doing direct spoke links instead of centralized resolvers, monitor this — it becomes a constraint in large enterprises.
TTL hygiene: Auto-registered records in a Private DNS Zone get a 10-second TTL. For A records you manage manually, set TTLs that match your failover/migration tolerance. Overly high TTLs will bite you when you move private endpoints.
Split-horizon DNS: If a service has both public and private endpoints and clients in different networks need different resolution outcomes, you need split-horizon DNS. Azure handles this natively for private endpoints by design — but verify it works as expected with conditional forwarders in the chain, as some forwarder configurations inadvertently cache the public resolution for on-prem clients.
Monitoring: Azure Monitor now surfaces DNS query metrics from the DNS Private Resolver. Set up alerts on failed query rates. A spike in failed resolutions is often the first signal of a VNet link misconfiguration or a missing zone record before your application teams even notice.
Bottom Line
Private DNS in Azure is absolutely manageable, but it will humble you if you treat it as an afterthought. I've seen teams spend weeks chasing resolution failures that traced back to a single missing VNet link or a conditional forwarder pointing at the wrong zone. The frustrating part isn't the fix; it's that nothing tells you something is wrong. No error, no alert, no red banner in the portal. You just get the public IP when you were expecting the private one, and your traffic quietly bypasses the private endpoint you spent time setting up. That's what makes this infrastructure worth getting right up front.
A few principles that have held up for me across environments of different shapes and sizes: build your DNS architecture in the connectivity subscription and treat it like first-class platform infrastructure, not a workload-team concern. Own the zones centrally, link them deliberately, and enforce custom DNS settings via Azure Policy so new spokes can't accidentally go rogue. If you're still running forwarder VMs in your hub to bridge the on-prem gap, take a hard look at migrating to the Azure DNS Private Resolver: the operational surface area of VMs in an availability-critical path is a risk that doesn't show up until 2am. Also, lean on Private Endpoint DNS Groups to handle A-record lifecycle automatically; removing the manual step removes an entire category of "endpoint exists, DNS record doesn't" bugs.
On the on-prem side: forward the public service zones (blob.core.windows.net, database.windows.net, vaultcore.azure.net) to your DNS Private Resolver inbound endpoint, not the privatelink.* zones. Azure follows the CNAME chain and returns the private IP. It's simpler, it's what Microsoft actually recommends, and it doesn't require you to update your forwarder config every time a new Private Link namespace appears.
Most importantly, test your on-prem resolution path before you need it to work under pressure. Spin up a test VM or leverage an existing VM on-prem, resolve a private endpoint FQDN, and confirm you're getting the private IP. Then do it again after every major topology change. It takes five minutes and has saved me more than a few uncomfortable conversations.
The teams that invest in this early reach a point where DNS stops being a topic at all. It just works, spoke after spoke, region after region, service after service. The teams that don't will recognize themselves in every subtle name resolution bug that surfaces over the next two years, usually at the worst possible time.


