Happy numbers and the time it takes to find them


Ever heard of happy numbers? They are numbers that eventually reach 1 when you repeatedly replace them with the sum of the squares of their digits. For instance,

\( 23 \rightarrow 2^2 + 3^2 = 13 \)

\( 13 \rightarrow 1^2 + 3^2 = 10 \)

\( 10 \rightarrow 1^2 + 0^2 = 1 \)

On the other hand, an unhappy number never reaches 1: its sequence eventually falls into a repeating cycle that does not contain 1.
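For example, starting from 4, the sequence never reaches 1 and just loops forever:

\( 4 \rightarrow 16 \rightarrow 37 \rightarrow 58 \rightarrow 89 \rightarrow 145 \rightarrow 42 \rightarrow 20 \rightarrow 4 \rightarrow \dots \)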

In base 10, among the first 10 million or so numbers, the proportion of happy numbers is about 0.15. Being the mathematical curiosity that they are, software engineers naturally decided to make interview questions out of them. But that's not what this blog post is about. This blog post is about the math behind the time complexity of the solution to that question. So let's get started.

Fair warning: this might get math-y.

The question

Before the time complexity comes the solution. And before the solution comes the question. And the question is: "Given a base 10 number, return true if it's a happy number, false otherwise."

The solution

The most straightforward and widely seen solution is to use a set to track numbers we have already seen and stop iterating when the current number is either 1 or a number seen before, which means the sequence has come full circle. With that approach the space complexity grows along with the time complexity, so we will reduce our cognitive overhead a bit by using a slightly more involved algorithm (the classic slow/fast-pointer trick, a.k.a. Floyd's cycle detection) that has a space complexity of \( O(1) \), so we can fully concentrate on the time complexity.
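For reference, here is a minimal sketch of what that set-based version could look like (isHappyWithSet is just a name for illustration; it leans on the same sq helper defined in the next snippet):

public boolean isHappyWithSet(int num) {
  // Needs: import java.util.HashSet; import java.util.Set;
  Set<Integer> seen = new HashSet<>();
  // Keep going until we hit 1 (happy) or revisit a number (stuck in a cycle).
  while (num != 1 && !seen.contains(num)) {
    seen.add(num);
    num = sq(num); // same digit-square-sum helper as below
  }
  return num == 1;
}

And here is the constant-space, slow/fast-pointer version that the rest of the post analyzes: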

public boolean isHappy(int num) {
  // Floyd's cycle detection: "slow" applies the operation once per step,
  // "fast" applies it twice, so the two are guaranteed to meet, either at 1
  // (which maps to itself) or somewhere inside the unhappy cycle.
  int slow = num, fast = num;
  do {
    slow = sq(slow);
    fast = sq(sq(fast));
  } while (slow != fast);
  return slow == 1;
}

// Returns the sum of the squares of the digits of num.
private int sq(int num) {
  int sq = 0;
  while (num > 0) {
    int digit = num % 10;
    sq += digit * digit;
    num = num / 10;
  }
  return sq;
}
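For completeness, a tiny usage sketch (the wrapping class name HappyNumbers is hypothetical; the post doesn't name one):

public static void main(String[] args) {
  HappyNumbers checker = new HappyNumbers();
  System.out.println(checker.isHappy(23)); // true: 23 -> 13 -> 10 -> 1
  System.out.println(checker.isHappy(4));  // false: 4 sits on the unhappy cycle
}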

The time complexity

Finally, on to the interesting part: what would be the time complexity of this algorithm?

In its most basic form, the sq helper iterates over every digit of a number. So if a number has \( d \) digits, one call does \( d \) units of work. Great, but how many digits does a number have, asymptotically?

In base 10, each digit ranges over \( 0-9 \), but the left-most digit of a number cannot be 0, so for the left-most digit the range reduces to \( 1-9 \). That means the smallest \( d \)-digit number is \( 10^{d-1} \) and the largest is \( 10^d - 1 \), which gives us the inequality \( 10^{d-1} \le N < 10^d \), where \( N \) is the number.

Taking \( \log_{10} \) throughout gives us \( d-1 \le \log(N) < d \); since log is a monotonically increasing function, this doesn't change the direction of the inequalities. Let's call this result-1; we will need it later.

But that alone doesn't pin \( d \) down, so we add \( 1 \) to both sides of the left inequality, \( d-1 \le \log(N) \), to get \( d \le \log(N)+1 \). Alright, now we are getting somewhere. Let's call this result-2.

We still haven't fully bounded \( d \) though, so we combine the right half of result-1 with result-2: \( \log(N) < d \le \log(N)+1 \).

Now that \( d \) is bounded from both sides, we can say that, asymptotically, a number \( N \) has \( \Theta(\log(N)) \) digits. So that is how much work a single application of the operation takes.
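As a quick sanity check with a concrete number, take \( N = 1234 \), which has \( d = 4 \) digits:

\( 10^{3} \le 1234 < 10^{4} \)

\( \log(1234) \approx 3.09 \), and indeed \( 3.09 < 4 \le 4.09 \)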

Cool, that's the first part of the puzzle solved. But how many times do we apply the operation before we either reach 1 or detect a cycle? Since we only care about the asymptotic time complexity, it turns out we don't need an exact answer to that question.

From the previous sections, we make two observations:

  1. There’s a logarithmic relationship between \( N \) and \( d \).
  2. Applying the operation to a \( d \)-digit number produces at most \( 9^2 \times d = 81d \), since every digit contributes at most \( 81 \). Small numbers may grow a little, but they stay bounded; for any large number, \( 81d \) is far smaller than the number itself, so at every step the number of digits gets cut down drastically until the values settle into a small range.

Thus, from before, we know that the first operation takes \( O(\log(N)) \) time and produces a number of size \( O(\log(N)) \). The next operation therefore takes \( O(\log\log(N)) \) time, the one after that \( O(\log\log\log(N)) \), and so on, and we keep going until the sequence reaches 1 or enters its cycle. The total is dominated by the very first term: \( O(\log(N) + \log\log(N) + \log\log\log(N) + \dots) = O(\log(N)) \). And that is the final time complexity.
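To make observation 2 concrete, consider the worst 7-digit number:

\( sq(9999999) = 7 \times 9^2 = 567 \)

One step collapses a 7-digit number to at most 3 digits; a second step brings it below \( 3 \times 81 = 243 \), and from there the values stay small until the sequence reaches 1 or its cycle.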

Conclusion

I came across this question somewhere, and while the solution is interesting, the complexity analysis seemed far more interesting. Naively, it's really hard to guess that it could be \( O(\log(N)) \), and most explanations I came across on the Internet seemed to skim over some of the math, so this is my attempt to bring forth a more verbose analysis of the time this algorithm takes.

