Saturday, April 02, 2022

Optimal Grouping Algorithm for COVID-19 Detection

 In response to the outbreak, countries around the world need to test potentially infected people. Due to the relative shortage of detection reagents, how to use as few reagents as possible for detection has become an interesting problem. It is assumed here that the sampling volume is sufficient, and the detection time requirement is not considered.

At present, many countries use the group testing mechanism, that is, the groups to be tested are divided into groups, and the samples in the same group are mixed for testing. If a group is positive, the group is tested further.

A natural question: how to group to save reagents?

Consider first the two extreme cases.

A small number of infected people

Usually, infected people within a group are a minority.

Suppose there are 64 people in the group, and there is only 1 infected person, using the 2, 4, 8, 64 equal division method.

  • 2-point method requires 6 rounds of testing, a total of 2*6=12 times;
  • 4-point method requires 3 rounds to be tested, a total of 4*3=12 times;
  • 8-point method requires 2 rounds to be tested, a total of 8*2=16 times;
  • 64-point method requires 1 round of tests, 64 times in total.

In fact, when there are a large number of people, such as N people, the x-division method is used, and the total number of tests is about y=x*log(x, N). It is easy to know that the function takes a minimum value when x=e.

In this case, an x ​​value near e should be taken, such as 2 or 3, to save reagents. At the same time, the number of people N is best chosen to be a value that divides x.

A large number of infected people

In the second scenario, all 64 people are infected, and the 2, 4, 8, and 64 division method is also used.

  • 2-points method requires 6 rounds to be tested, a total of 2+4+8+16+32+64 = 126 times;
  • 4-point method requires 3 rounds of testing, a total of 4+16+32+64=116 times;
  • 8-point method requires 2 rounds of testing, a total of 8+64=72 times.
  • 64-point method requires 1 round of tests, 64 times in total.

It is easy to find that when the 64-point method is used at this time, the total number of detections is the least.

N people, using the x-point method, the total number of tests is about S=x*\frac{1-x^log(x,N)}{1-x}, which is obviously a decreasing function.

In this case, a larger number of groups should be taken.

Promotion as a general case

If N people are known and the infection rate is p, how much should x be selected at this time?

This problem is a bit more complicated than the above two cases, we adopt the idea of ​​backward push.

First draw the detection route, which is a typical tree structure, as shown in the following figure.

Among them, there are p*N people who are infected (in the worst case they do not have a common parent node), then in the last round, it needs to be checked p*N*x times. Pushing back to the next level, it still needs to detect pNx times. Until the last sample to be detected has a common parent node, starting from this layer, each layer needs to be detected x times.

That is, p*N*x + p*N*x + … x + x + ….

In the worst case, infected samples are separated as far as possible from each other until the second layer, where all samples to be detected have a common parent node. At this time, the total number of detections is S=p*N*x*(log(x, N)-1)+x. At this time, the optimal x value is around 3.


Usually, this formula is accurate when the infection rate p is not high; when p is high, the total number of detections predicted by the formula is higher than the actual value because multiple infected people at the bottom have a common parent node. Therefore, the consensus obtained is the upper limit of the required number of reagents.

Applied to real life

Assuming that the resident population of a city is N=20000000 (20 million), and the infection rate is p=0.0001 (one ten thousandth), then insert the formula to get

S=2000*x*(log(x,20000000)-1)+x, it is easy to find that when x=2.91 (which can be rounded to 3), the minimum value of S is 85782, and the number of detection rounds is 16 rounds.

Therefore, if the optimal detection algorithm is adopted, the number of reagents used in the city accounts for only 0.43% of the overall population to be tested, which can save a lot of reagents. The cost is the need for multiple rounds of testing.

In practice, only the first few rounds can be grouped to achieve a balance between the optimal number of reagents and the detection time. For example, the detection must be completed within a given time, and the number of detection packets at each layer can be adjusted at this time.

ps, interestingly, no matter what value p takes, the final optimal x will never exceed 4.

No comments:

Post a Comment