We define \( t_{auth} \) as the average response time of a remote authoritative server, and \( t_{cached} \) as how fast our resolver implementation responds in the cached case.
If we set define \( P_{cached} \) as the chance of a cache hit, the average response time \( t_{resolve} \) becomes: $$ t_{resolve} = P_{cached} \cdot t_{cached} + (1-P_{cached}) \cdot t_{auth} $$ This in itself is not yet overly insightful, but this becomes more so if we model \( P_{cached} \). If we assume that an entry stays in the cache until the TTL has expired, and we take into account the average query rate, we can guess the amount of cache hits we will get after the initial miss. This assumes that the query rate is high enough that we will get a significant number of queries before the TTL expires.
A special case are domains for which the reciprocal query rate is longer than the TTL, but such cases are either rare (because of a low query rate), or pathological (because of an unwisely low choice of TTL). In this case, \(P_{cached} = 0\), leading to \(t_{resolve} = t_{auth}\).
For the normal case, if the Query rate is \( R \) and the average TTL is \( T \), it follows that per time period \( T \): $$ \begin {aligned} N_{hits} & = R \cdot T - 1 \\ N_{misses} & = 1 \end {aligned} $$ This leads to: $$ \begin{aligned} P_{cached} & = \frac{N_{hits}}{N_{hits} + N_{misses}} = \frac{N_{hits}}{R \cdot T} \\ P_{cached} & = \frac{R \cdot T - 1}{R \cdot T} \end{aligned} $$ Which is probably better understood as \(P_{miss}\): $$ P_{miss} = \frac{N_{misses}}{N_{hits}+N_{misses}} = \frac{1}{R \cdot T} = 1 - P_{cached} $$ $$ P_{cached} = 1 - \frac{1}{R \cdot T} $$ From this it follows that the average resolving response time is: $$ \begin {aligned} t_{resolve} & = P_{cached} \cdot t_{cached} + (1-P_{cached}) \cdot t_{auth} \\ & = (1 - \frac{1}{R \cdot T}) \cdot t_{cached} + ( \frac{1}{R \cdot T} ) \cdot t_{auth} \end {aligned} $$ Usually \( t_{cached} \ll t_{auth} \), allowing for: $$ t_{resolve} \approx \frac{t_{auth}}{R \cdot T} $$ From this, it can be seen that to improve response times without artificially refetching, we can work on lowering \(t_{auth}\), by picking remote servers that respond quickly, or by raising the effective query rate \(R\).
One avenue for doing that is 'query concentration', making sure that similar queries end up on the same node within a load balanced setup.
Another way of achieving higher effective \( R \) is by adding 'cache peeking' to other nodes within a resolver constellation.
Another cheap way is to artificially increase minimum \( T \) above a certain, non-pathological, value.