Given a set of N items, we need to calculate all subsets of K items. In each calculated subset, the selected items must appear only one time (no repetitions). In addition, the order of selected items should follow the criteria chosen for the entire set of N items.
In the following sections, it is always assumed that the elements in the original set are all unique and sorted according to some criteria.
An example
For easy of understanding, letβs consider a set of 5 characters: { βaβ, βbβ, βcβ, βdβ, βeβ }.
The goal of this post is to provide a basic algorithm to calculate all combinations of size k <= 5.
For example, if k = 4, then the combinations will be:
- { βaβ, βbβ, βcβ, βdβ }
- { βaβ, βbβ, βcβ, βeβ }
- { βaβ, βbβ, βdβ, βeβ }
- { βaβ, βcβ, βdβ, βeβ }
- { βbβ, βcβ, βdβ, βeβ }
Each of the selected items occurs only once in the calculated set (combinations with no repetitions).
How to calculate the expected number of combinations?
The number of combinations of size k from a set of n elements is calculated as:
nCk = n! / ( k! * (n-k)! )
where ! denotes the factorial operation.
A first approach: the brute force (iterative method)
The following code provides all possible combinations of size K from a vector of N items.
This code assumes that the input values are sorted according to some criteria and are unique inside the input vector.
The proposed code is basically a brute force algorithm that checks all possible combinations of size k and stores them inside an output vector.
// Author : Pietro L. Carotenuto
// Website: www.pietrolc.com
#include <iostream>
#include <vector>
#include <deque>
using Tin = std::vector<int>;
using Tout = std::vector<std::deque<int>>;
void calculateCombinationsRec(const Tin& vec, const int k, Tout& sol, std::deque<int>& tmp, const int nextIndex )
{
tmp.emplace_back(vec[nextIndex]);
// Save and return
if (k == tmp.size())
{
// Add to solutions bucket
sol.emplace_back(tmp);
// Remove last item
tmp.pop_back();
// Check remaining combinations
return;
}
// Otherwise, continue checking remaining combinations
for (size_t i = nextIndex + 1; i < vec.size(); i++)
{
calculateCombinationsRec(vec, k, sol, tmp, i);
}
// Remove last item before checking new combinations
tmp.pop_back();
}
// This returns a vector of all calculated combinations
// Assumption: The input vector does not contain any duplicate values.
// Ideally, the input values are sorted according some criteria.
void calculateCombinations(const Tin& vec, const int k, Tout& sol)
{
// Iterate on all possible root indexes
for (size_t rootIdx = 0; rootIdx < vec.size(); rootIdx++)
{
// For each root, this temporary queue stores all combinations having
// the same starting point
std::deque<int> tmp;
calculateCombinationsRec(vec, k, sol, tmp, rootIdx);
}
}
int main()
{
Tin vec;
Tout sol;
const int k = 5;
for (int i = 0; i < 10; i++)
{
vec.emplace_back(i);
}
// Main function
calculateCombinations(vec, k, sol);
std::cout << "Combinations: " << std::endl;
for (const auto & s1 : sol)
{
for (const auto& it2 : s1)
{
std::cout << it2 << " ";
}
std::cout << std::endl;
}
std::cout << "N. combinations: " << sol.size() << std::endl;
}
This code produces the following output:
Combinations:
0 1 2 3 4
0 1 2 3 5
0 1 2 3 6
0 1 2 3 7
0 1 2 3 8
0 1 2 3 9
0 1 2 4 5
0 1 2 4 6
0 1 2 4 7
0 1 2 4 8
0 1 2 4 9
...
3 5 6 8 9
3 5 7 8 9
3 6 7 8 9
4 5 6 7 8
4 5 6 7 9
4 5 6 8 9
4 5 7 8 9
4 6 7 8 9
5 6 7 8 9
N. combinations: 252
Each element inside the input vector can be the starting element of a set of combinations of k elements.
The most expert in this field might realize that the elements having indexes larger than n β (k-1) cannot be the starting element of a new set of combinations. In order to improve the performance, we could ignore those items.
Given a specific starting element, then the function calculateCombinationsRec() is recursively called to check the remaining elements. The function calculateCombinationsRec() only stores the combinations having size k.
The brute force (recursive optimized method)
The code above works pretty fine but it has a big disadvantage: it cannot handle a large data set of items. Why? Simply because the proposed function is recursive.
For a large set of items or, similarly, for large values of K, this recursive function might lead to a stack overflow.
The solution is to convert the proposed code to its non-recursive version.
Converting a code from a recursive to its iterative version is easy.
Please, have a look into the following code, where an iterative function is being used to calculate all combinations of a given set of characters:
Unlock now all the premium content in this page.
Sign-inΒ or register to buy access to this content using yourΒ virtual coins.
Which is the expected console output for the input vector of characters {βaβ, βbβ, βcβ, βdβ, βeβ} for K = 3?
Please, check the console output here below:
Combinations:
a b c
a b d
a b e
a c d
a c e
a d e
b c d
b c e
b d e
c d e
N. combinations: 10
Second approach: a C++ specific method
The method proposed in this section has been adapted from Rosetta Code website.
I like this approach since it is not recursive and it takes advantage of the function std::prev_permutation() already available in the STL library βalgorithmβ.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using T = char;
using Tcomb = std::vector<T>;
using Tsol = std::vector<Tcomb>;
void comb(const int K, const Tcomb& v, Tsol& sol)
{
const int N = v.size();
if (K > N)
{
// log an error message here!
return;
}
std::string bitmask(K, 1); // K leading 1's
bitmask.resize(N, 0); // N-K trailing 0's
// store values and permute bitmask
do {
Tcomb tmp{};
for (int i = 0; i < N; ++i) // [0..N-1] integers
{
if (bitmask[i] != 0)
{
tmp.emplace_back(v[i]);
}
}
if (!tmp.empty())
{
sol.emplace_back(tmp);
}
} while (std::prev_permutation(bitmask.begin(), bitmask.end()));
}
int main()
{
Tsol sol;
Tcomb values{ 'a','b','c','d','e' };
comb(3, values, sol);
for ( auto& t : sol)
{
for (auto& e: t)
{
std::cout << e << " ";
}
std::cout << std::endl;
}
std::cout << "N. combinations: " << sol.size() << std::endl;
}
The proposed iterative method can be used with any pair of K and N (assuming K <= N).
Conclusions
In this article, two different approaches have been proposed to calculate all combinations of size k from a set of size N.
The main assumptions for the proposed methods are that all items in the set are unique and β eventually β sorted according to some criteria. The calculated combinations are sorted according to the same criteria followed in the set of items used as input.
Each proposed method has been properly discussed to point out pros and cons.