Hey all,
Needing the collective genius of this forum to put me out of my misery.
The Config/Context
3 x cPanel 110.0.15 Servers (CloudLinux 7.9) in a DNS Cluster (PowerDNS).
The customer's account in question was transferred from another cPanel Server a couple of years ago. It's a larger account with various sub-domains.
The Issue
Over the last ~9 months, there has been a periodic issue where the site stops responding. Pinging the domain or any of its sub-domains fails because the name does not resolve.
The issue ONLY affects this account and no others on the same Server or other Servers in the DNS Cluster.
It took me a minute to figure out that if I make any arbitrary change in the DNS Zone, the issue is immediately resolved, the domain name resolves correctly, and the site is back online.
The issue then reoccurs anywhere from an hour to several hours later.
Troubleshooting/Changes thus far
The actual Hosting Provider/Server Admins have always been great and have made some recommendations around correcting the NS records in the Zone, among other things.
In the previous 2 instances of this issue, their recommended changes magically 'fixed' it and I didn't have a repeat for another few months. This 3rd instance, however, is stumping us. Server Admins are suggesting more top-level Glue/NS changes but again, there are no other affected accounts, just this one customer.
If it's relevant: I'm noting the Zone Serial# before and after each arbitrary change I make. Between making the change and the issue reoccurring, I'm seeing the serial# increment 1-3 times. I'm unsure whether this is normal or whether it suggests that something else in the cluster is editing the zone?
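In case it helps anyone reproduce the serial check, here is a rough Python sketch of how the SOA serial could be compared across the cluster nodes. It assumes `dig` is installed and that each node answers authoritatively for the zone. The function names, the zone name and the nameserver hostnames in the comments are placeholders for illustration, not the real ones.

```python
import subprocess


def parse_soa_serial(dig_output: str) -> int:
    """Pull the serial out of `dig +short SOA` output.

    The short SOA answer has seven fields:
    mname rname serial refresh retry expire minimum
    """
    fields = dig_output.split()
    if len(fields) < 7:
        raise ValueError(f"unexpected SOA answer: {dig_output!r}")
    return int(fields[2])


def query_soa_serial(zone: str, nameserver: str) -> int:
    """Ask one specific nameserver for the zone's SOA serial."""
    result = subprocess.run(
        ["dig", "+short", "SOA", zone, f"@{nameserver}"],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return parse_soa_serial(result.stdout)


def serials_in_sync(serials: dict) -> bool:
    """True when every nameserver reported the same serial."""
    return len(set(serials.values())) <= 1


# Example usage (hypothetical zone and hostnames):
#   nodes = ["ns1.example.com", "ns2.example.com", "ns3.example.com"]
#   serials = {ns: query_soa_serial("example.com", ns) for ns in nodes}
#   print(serials, "in sync" if serials_in_sync(serials) else "OUT OF SYNC")
```

Running something like this at intervals, and again while the outage is happening, should show whether one node is serving a different serial (or no answer at all, which raises `ValueError` here) compared with the others.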
If anyone can help me resolve this...love you long time!