Error 198: DNS_ERROR

Tip

This error occurs when ClickHouse cannot resolve a hostname to an IP address through DNS lookup. It indicates that DNS resolution failed for a hostname used in cluster configuration, distributed queries, or external connections.

Most common causes

Hostname does not exist
- Hostname is misspelled in configuration
- Pod or service not yet created in Kubernetes
- Server has been decommissioned or renamed
- DNS record not created or has been deleted

DNS server issues
- DNS server is unreachable or down
- Network connectivity problems to DNS server
- DNS server timeout or slow response
- Incorrect DNS server configuration

Kubernetes service discovery problems
- Pods not ready when DNS lookup occurs
- Service endpoints are not yet available
- Headless service DNS not propagated
- CoreDNS or kube-dns issues in cluster

Cluster configuration errors
- Wrong hostname in cluster configuration
- Hostname referencing nodes that don't exist
- Typo in remote_servers configuration
- Stale configuration with old hostnames

DNS cache issues
- Cached DNS entries for deleted hosts
- DNS TTL expiration causing lookups for removed hosts
- ClickHouse DNS cache not updated after infrastructure changes

Network or firewall issues
- Firewall blocking DNS queries (port 53)
- Network segmentation preventing DNS access
- DNS resolution timeout too short

Common solutions

1. Verify hostname resolution manually

# Test DNS resolution from ClickHouse server
nslookup hostname.domain.com

# Or using dig
dig hostname.domain.com

# Check from ClickHouse pod (Kubernetes)
kubectl exec -it clickhouse-pod -- nslookup service-name.namespace.svc.cluster.local
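
When several hostnames need checking, the manual lookups above can be wrapped in a small loop. The following is a minimal sketch, not an official tool: the function name check_hosts and the RESOLVE_CMD override are made up for illustration. It defaults to getent hosts, which goes through the system resolver configuration (including /etc/hosts), whereas nslookup and dig query DNS servers directly.

```shell
# Resolve each hostname and report the result.
# RESOLVE_CMD is a hypothetical override hook; by default `getent hosts`
# is used, which consults /etc/nsswitch.conf and /etc/hosts as well as DNS.
check_hosts() {
    resolver="${RESOLVE_CMD:-getent hosts}"
    failed=0
    for host in "$@"; do
        if $resolver "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Non-zero exit status if any hostname failed to resolve
    return "$failed"
}

# Usage (hostnames are placeholders):
# check_hosts ch-node-1.domain.com ch-node-2.domain.com
```

The non-zero exit status makes the function usable in deployment scripts that should stop before ClickHouse is started with unresolvable hosts.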

2. Check cluster configuration

<!-- Verify remote_servers configuration -->
<remote_servers>
    <cluster_name>
        <shard>
            <replica>
                <!-- Ensure hostname is correct -->
                <host>correct-hostname.domain.com</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster_name>
</remote_servers>
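
To review every hostname in a configuration file at once, the host elements can be pulled out with standard text tools. This is a rough sketch under stated assumptions: config_hosts is a hypothetical helper, the file path in the usage comment is only the common default, and the matching is plain text rather than real XML parsing.

```shell
# Print the unique <host> values found in XML read from stdin.
# Note: this is text matching, not XML parsing, so hosts inside
# <!-- comments --> are listed too.
config_hosts() {
    grep -o '<host>[^<]*</host>' \
        | sed -e 's/<host>//' -e 's/<\/host>//' \
        | sort -u
}

# Usage (the path is an assumption; adjust to your installation):
# config_hosts < /etc/clickhouse-server/config.xml | while read -r h; do
#     getent hosts "$h" >/dev/null || echo "cannot resolve: $h"
# done
```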

3. Check ClickHouse DNS resolver logs

-- View DNS resolution errors in logs
SELECT 
    event_time,
    logger_name,
    message
FROM system.text_log
WHERE logger_name = 'DNSResolver'
  AND level IN ('Error', 'Warning')
  AND event_date >= today() - 1
ORDER BY event_time DESC
LIMIT 100;

4. Clear ClickHouse DNS cache

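ClickHouse caches DNS lookups. If hostnames or their IP addresses have changed:

-- Drop the internal DNS cache so hosts are resolved again
SYSTEM DROP DNS CACHE;

-- Reload the configuration if remote_servers was edited
SYSTEM RELOAD CONFIG;

-- Or restart ClickHouse server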

5. Fix Kubernetes service issues

# Check if pods are ready
kubectl get pods -n your-namespace

# Check service endpoints
kubectl get endpoints service-name -n your-namespace

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Restart CoreDNS if needed
kubectl rollout restart deployment/coredns -n kube-system

6. Verify DNS server configuration

# Check /etc/resolv.conf
cat /etc/resolv.conf

# Test DNS server accessibility
ping dns-server-ip
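
Since ICMP is often filtered, a DNS query sent to each configured server can be more telling than ping. The sketch below assumes a standard resolv.conf layout; list_nameservers is a hypothetical helper name, and failing-hostname is a placeholder.

```shell
# Print the IP address of every "nameserver" line in a resolv.conf-style
# file (default: /etc/resolv.conf). Commented-out lines are skipped
# because their first field is "#" or "#nameserver", not "nameserver".
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage: query each configured server directly for one hostname.
# for ns in $(list_nameservers); do
#     dig +short +time=2 +tries=1 "@$ns" failing-hostname \
#         || echo "no answer from $ns"
# done
```

A server that answers with "not found" is reachable but lacks the record. A server that does not answer at all points to a network or firewall problem.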

7. Update cluster configuration

Remove non-existent hosts from configuration:

<remote_servers>
    <cluster_name>
        <shard>
            <!-- Remove or comment out hosts that don't exist -->
            <!--
            <replica>
                <host>old-hostname-that-does-not-exist</host>
                <port>9000</port>
            </replica>
            -->
        </shard>
    </cluster_name>
</remote_servers>

Common scenarios

Scenario 1: Kubernetes pod not ready

Error: Cannot resolve host (pod-name.headless-service.namespace.svc.cluster.local), error 0: Host not found

Cause: Pod not yet started or service endpoints not available.

Solution:

- Wait for pods to become ready
- Check pod status: kubectl get pods
- Verify headless service has endpoints: kubectl get endpoints
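
Waiting for a pod's DNS name to appear can be scripted as a bounded retry loop, for example in a startup script. This is a minimal sketch: wait_for_dns and the RESOLVE_CMD override are illustrative names, and the default attempt count and delay are arbitrary choices rather than recommended values.

```shell
# Retry resolution until it succeeds or the attempts run out.
# Arguments: hostname, max attempts (default 30), delay in seconds (default 2).
# RESOLVE_CMD is a hypothetical override hook for the lookup command.
wait_for_dns() {
    host="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    resolver="${RESOLVE_CMD:-getent hosts}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if $resolver "$host" >/dev/null 2>&1; then
            echo "resolved $host after $i attempt(s)"
            return 0
        fi
        # Sleep only between attempts, not after the last one
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "gave up on $host after $attempts attempt(s)" >&2
    return 1
}

# Usage (placeholder name):
# wait_for_dns pod-name.headless-service.namespace.svc.cluster.local 30 2
```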

Scenario 2: Stale cluster configuration

DNSResolver: Cannot resolve host (old-server-name), error 0: Host not found
DNSResolver: Cached hosts dropped: old-server-name
DNSCacheUpdater: IPs of some hosts have been changed. Will reload cluster config

Cause: Configuration references servers that have been removed.

Solution:

- Update cluster configuration to remove old hosts
- Reload configuration: SYSTEM RELOAD CONFIG
- Or restart ClickHouse server

Scenario 3: DNS server unreachable

Error: Cannot resolve host, error: Temporary failure in name resolution

Cause: DNS server is down or unreachable.

Solution:

- Check DNS server status
- Verify network connectivity
- Test DNS resolution manually: nslookup hostname
- Check /etc/resolv.conf for correct DNS servers

Scenario 4: Embedded Keeper quorum issues

DNSResolver: Cannot resolve host (node-3.cluster.local), error 0: Host not found

Cause: Keeper nodes not yet available or wrong hostname.

Solution:

- Ensure all Keeper nodes are started
- Verify Keeper configuration has correct hostnames
- Check Keeper logs for connectivity issues

Prevention tips

- Use valid hostnames: Verify hostnames exist before adding to configuration
- Test DNS resolution: Use nslookup or dig to test hostnames before configuring
- Monitor DNS health: Set up monitoring for DNS server availability
- Use DNS caching wisely: Consider DNS TTL settings for dynamic environments
- Keep configuration current: Remove decommissioned servers from cluster config
- Kubernetes readiness: Ensure pods are ready before ClickHouse tries to connect
- Use StatefulSets: In Kubernetes, use StatefulSets for predictable DNS names

Debugging steps

Identify failing hostname:

SELECT message
FROM system.text_log
WHERE message LIKE '%Cannot resolve host%'
  AND event_date >= today()
ORDER BY event_time DESC
LIMIT 10;
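
If system.text_log is not enabled, the same information can usually be recovered from the server log file. The sketch below assumes the message format shown in the scenarios above ("Cannot resolve host (<name>)"); failing_hosts is a hypothetical helper, and the log path in the usage comment is the usual default rather than a guarantee.

```shell
# Read log text on stdin and print each distinct hostname that appears in a
# "Cannot resolve host (<name>)" message.
failing_hosts() {
    sed -n 's/.*Cannot resolve host (\([^)]*\)).*/\1/p' | sort -u
}

# Usage (path is an assumption; adjust if your logs live elsewhere):
# failing_hosts < /var/log/clickhouse-server/clickhouse-server.log
```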

Test DNS resolution:

# From ClickHouse server
nslookup failing-hostname

# Check if DNS server responds
dig @dns-server-ip failing-hostname

Check cluster configuration:

-- View cluster configuration
SELECT *
FROM system.clusters
WHERE cluster = 'your_cluster';

Monitor DNS cache updates:

SELECT 
    event_time,
    message
FROM system.text_log
WHERE logger_name = 'DNSCacheUpdater'
  AND event_date >= today()
ORDER BY event_time DESC
LIMIT 20;

Check network connectivity:

# Ping DNS server
ping dns-server-ip

# Check DNS port accessibility
nc -zv dns-server-ip 53

# Test from specific pod (Kubernetes)
kubectl exec -it pod-name -- ping dns-server-ip

Review Kubernetes events (if applicable):

kubectl get events -n your-namespace --sort-by='.lastTimestamp'

Special considerations

For Kubernetes deployments:

- Headless services create DNS entries for each pod
- StatefulSet pods have predictable DNS names: pod-name-0.service-name.namespace.svc.cluster.local
- DNS may not be immediately available when pods are starting
- CoreDNS issues can affect entire cluster
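
Because StatefulSet pod names follow the pattern name-ordinal, the expected DNS names can be generated and compared against what is written in remote_servers. This is an illustrative sketch: statefulset_hosts is a made-up helper, and cluster.local is assumed as the cluster domain unless another one is passed.

```shell
# Print the per-pod DNS names of a StatefulSet behind a headless service.
# Arguments: StatefulSet name, headless service name, namespace,
# replica count, cluster domain (default: cluster.local).
statefulset_hosts() {
    name="$1"; service="$2"; namespace="$3"; replicas="$4"
    domain="${5:-cluster.local}"
    i=0
    while [ "$i" -lt "$replicas" ]; do
        echo "$name-$i.$service.$namespace.svc.$domain"
        i=$((i + 1))
    done
}

# Usage (all names are placeholders):
# statefulset_hosts clickhouse clickhouse-headless analytics 3
```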

For distributed clusters:

- All nodes must be able to resolve each other's hostnames
- DNS failures on one node can affect distributed queries
- Consider using IP addresses for critical internal connections (though less flexible)

For ClickHouse Keeper:

- All Keeper nodes must be resolvable by name
- Keeper quorum formation requires DNS resolution
- Wrong hostname in Keeper config prevents cluster formation

DNS cache behavior:

- ClickHouse caches DNS lookups to reduce DNS queries
- Cache is updated periodically (default: every 15 seconds)
- Failed lookups are also cached temporarily
- SYSTEM DROP DNS CACHE clears the cache immediately; SYSTEM RELOAD CONFIG re-reads the configuration files

Configuration settings

DNS-related settings in ClickHouse configuration:

<clickhouse>
    <!-- DNS cache update period in seconds -->
    <dns_cache_update_period>15</dns_cache_update_period>
    
    <!-- Set to 1 to disable the internal DNS cache (not recommended for production) -->
    <disable_internal_dns_cache>0</disable_internal_dns_cache>
</clickhouse>

When DNS errors persist

If DNS errors continue after basic troubleshooting:

Use IP addresses temporarily:

<remote_servers>
    <cluster>
        <shard>
            <replica>
                <!-- Use IP instead of hostname as temporary workaround -->
                <host>192.168.1.10</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster>
</remote_servers>

Add entries to /etc/hosts:

# Add static host entries (requires root)
echo "192.168.1.10 server-name.domain.com" >> /etc/hosts
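
Running the echo command repeatedly appends duplicate lines. A slightly more careful variant checks for the hostname first. This is a minimal sketch with a hypothetical helper name (add_hosts_entry). It does not update an existing entry whose IP has changed, and it still needs root to write to /etc/hosts.

```shell
# Append "ip name" to a hosts file unless the name is already listed.
# Arguments: IP address, hostname, hosts file (default: /etc/hosts).
add_hosts_entry() {
    ip="$1"; name="$2"; file="${3:-/etc/hosts}"
    # Look for the hostname as a whole field on any non-comment line.
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        echo "$name already present in $file"
    else
        echo "$ip $name" >> "$file"
    fi
}

# Usage:
# add_hosts_entry 192.168.1.10 server-name.domain.com
```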

Configure alternative DNS servers:

# Edit /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4

Increase DNS timeout:
- Check system DNS resolver timeout settings (for example, the timeout and attempts options in /etc/resolv.conf)
- Consider increasing if network latency is high

If you're experiencing this error:

1. Identify which hostname is failing from error logs
2. Test DNS resolution manually with nslookup or dig
3. Verify the hostname exists and is spelled correctly
4. Check DNS server availability and accessibility
5. For Kubernetes: ensure pods are ready and service endpoints exist
6. Update cluster configuration to remove non-existent hosts
7. Reload ClickHouse configuration or restart server
8. Monitor DNS cache updates in ClickHouse logs
