To scale Read-Copy Update (RCU) callback offloading from no-callbacks (No-CBs) CPUs, a set of RCU callback offload kernel threads (rcuo kthreads) may be spawned, with each rcuo kthread assigned to one of the No-CBs CPUs. Each rcuo kthread may invoke the RCU callbacks generated by workloads running on its No-CBs CPU, and may do so on a CPU that is not a No-CBs CPU. The rcuo kthreads may be organized into groups, with each group having one leader rcuo kthread and one or more follower rcuo kthreads. The leader rcuo kthreads may be awakened periodically, without waking the follower rcuo kthreads, when an RCU grace period ends and an RCU callback needs to be invoked, or when a new RCU callback arrives and a new RCU grace period needs to be started. Each leader rcuo kthread may in turn periodically awaken its associated follower rcuo kthreads, which it has sole responsibility for waking.
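The two-level wakeup described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the `struct rcuo_kthread` layout, the function names, the fixed group size `stride`, and the choice to wake only followers that have pending callbacks are all hypothetical and do not come from the text. In this sketch the last group may consist of a leader alone when the CPU count is not a multiple of `stride`.

```c
#include <assert.h>

#define MAX_CPUS 64

/* Hypothetical per-kthread record; field names are illustrative only. */
struct rcuo_kthread {
    int cpu;         /* No-CBs CPU whose callbacks this kthread invokes */
    int leader_cpu;  /* CPU of the group leader (equals cpu for a leader) */
    int pending_cbs; /* callbacks queued and waiting to be invoked */
    int wakeups;     /* how many times this kthread has been awakened */
};

/*
 * Partition n rcuo kthreads into groups of `stride` consecutive CPUs.
 * The first kthread of each group is its leader (assumed policy).
 * Returns the number of groups formed.
 */
int rcuo_form_groups(struct rcuo_kthread *kt, int n, int stride)
{
    int groups = 0;

    assert(n > 0 && n <= MAX_CPUS && stride >= 2);
    for (int i = 0; i < n; i++) {
        kt[i].cpu = i;
        kt[i].leader_cpu = (i / stride) * stride;
        kt[i].pending_cbs = 0;
        kt[i].wakeups = 0;
        if (kt[i].leader_cpu == i)
            groups++;
    }
    return groups;
}

/*
 * Step 1: a grace period has ended (or a new callback arrived).
 * Only the leaders are awakened; followers stay asleep.
 * Returns the number of kthreads awakened.
 */
int rcuo_wake_leaders(struct rcuo_kthread *kt, int n)
{
    int woken = 0;

    for (int i = 0; i < n; i++) {
        if (kt[i].leader_cpu == kt[i].cpu) {
            kt[i].wakeups++;
            woken++;
        }
    }
    return woken;
}

/*
 * Step 2: one leader wakes its own followers. Waking only followers
 * with pending callbacks is an assumption made for this sketch.
 * Returns the number of followers awakened.
 */
int rcuo_leader_wake_followers(struct rcuo_kthread *kt, int n, int leader_cpu)
{
    int woken = 0;

    for (int i = 0; i < n; i++) {
        if (kt[i].leader_cpu != leader_cpu || kt[i].cpu == leader_cpu)
            continue; /* not in this group, or is the leader itself */
        if (kt[i].pending_cbs > 0) {
            kt[i].wakeups++;
            woken++;
        }
    }
    return woken;
}
```

With 16 No-CBs CPUs and a group size of 4, `rcuo_form_groups` yields 4 groups, so a grace-period end wakes 4 leaders directly instead of all 16 kthreads. Each leader then calls `rcuo_leader_wake_followers` for its own group only, which models the leader's sole responsibility for waking its followers.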