Finding prime numbers with Eratosthenes
A prime is a natural number greater than 1 that has no positive divisors other than 1 and itself. For example:
- 6 is not prime because it has many divisors: 1, 2, 3, and 6.
- 7 is prime because it has only 2 divisors: 1 and itself.
Here's an interesting problem: write a program that finds all prime numbers between 0 and N. For N = 20, the solution would be: 2, 3, 5, 7, 11, 13, 17, 19.
In this article we will cover 2 solutions. One naive, and one very elegant called the "sieve of Eratosthenes". Both will be written in Python.
The naive solution
The best place to start when confronted with a problem is with a naive solution. It means we will code a straightforward program with no performance considerations.
First, we write a function to check whether or not the parameter x
is prime.
def is_prime(x):
# If x is less than 2, it is not prime by definition
if x < 2:
return False
# Go though all the numbers between 2 and x (x excluded)
for i in range(2, x):
# If the remainder of x divided by i is zero
if x % i == 0:
# Then i is a divisor of x, so x is not prime
return False
# If we found no divisor, then x is prime
return True
Second, we go though all numbers between 1 and N one by one, and call is_prime
to test if each number is prime or not.
def all_primes(n):
# Variable that will contain all the prime numbers
primes = []
# Go though all the numbers between 0 and n (n included)
for i in range(0, n+1):
# If the number i is prime
if is_prime(i):
# Then add it to the list
primes.append(i)
# Return all prime numbers
return primes
And we are done! Let's run our program a few times while measuring how long it takes.
all_primes(10**3) # 0.2ms
all_primes(10**4) # 9ms
all_primes(10**5) # 870ms
all_primes(10**6) # 115,000ms
It works, but it's pretty slow for large values of N. That's because we have 2 nested for
loops that need to go though many iterations.
The sieve of Eratosthenes algorithm
Eratosthenes is a Greek mathematician that was born in 276 BC. He discovered a clever algorithm named after him that will solve our problem in an elegant and efficient way.
Below is a description of how the algorithm works for N = 29.
First we list all numbers between 2 and 29. Each number can have 2 states: green or red. By default they are all green.
We start from the beginning of the list, and pick the first green number. In our case that's 2.
Then we go trough the remaining of the list to find all numbers that are multiples of 2, and we set them in red. They are: 4, 6, 8, ..., 24, 26, and 28.
Now we start again: find the first green number (3), and mark in red all its multiples (6, 9, 12, 15, 18, 21, 24, 27). Note that some of them where already red, so they won't change color.
And we repeat the same thing, this time with 5 (10, 15, 20, 25).
We continue this process for all remaining numbers, and we end up with this.
As you may have guessed, all numbers in green are primes. Pretty cool!
Here's a short explanation of why this works:
- Red numbers are multiples of the green ones. It means that they have at least one divisor and cannot be primes.
- Green numbers were never set in red. It means they have no divisors, so they must be primes.
The sieve of Eratosthenes code
Let's code the sieve of Eratosthenes. But before that, there are 2 things to keep in mind:
- To store the colors, we will use a boolean:
True
for green andFalse
for red. - To store both the numbers and the booleans, we will use a simple list. For example
[ False, False, True ]
would mean:False
in 0th position → 0 is not prime.False
in 1st position → 1 is not prime.True
in 2nd position → 2 is prime.
Here's the full code of the algorithm, with many comments to explain how it works.
def all_primes(n):
# Variable that will contain all the prime numbers
primes = []
# Initialize our table with n times the boolean True
table = [ True ] * n
# The numbers 0 and 1 are not primes by definition
# So we set them as False in the table
table[0] = table[1] = False
# Go though all the elements of the table
for i, boolean in enumerate(table):
# If the boolean is set to True
if boolean:
# It means i is a prime number, so we add it to our list
primes.append(i)
# Go though all multiples of i
# That's i+i, i+i+i, and so on up to n
for j in range(i+i, n, i):
# And set their flags to False
table[j] = False
# Return all primes
return primes
If we try our new program with our previous tests, we get vastly improved performances.
all_primes(10**3) # 0.001ms (previously 0.2ms)
all_primes(10**4) # 0.4ms (previously 9ms)
all_primes(10**5) # 3.5ms (previously 870ms)
all_primes(10**6) # 36.5ms (previously 115,000ms)
Conclusion
Prime numbers are an interesting topic, and the sieve of Erastosthenes is a very efficient and elegant solution to find them. That's why it's one of my favorite algorithms.
For your information, all the code shown in this article is available on GitHub.
If you have any questions or feedback: thomas@lesscake.com :-)