There's a long boring history of some old stuff I did for the original Ballinger/Keller Sierpinski project here.
After a turbulent change-over, about which I've already made my views clear, the project is now being attacked by a massive distributed computing project called Seventeen or Bust.
A sieve which, until I spilt the beans about my algorithms, benchmarked faster on my machines than Paul Jobling's SoBSieve! Paul's taken the lead again though, so if you are running Windoze, you'll probably want his SoBSieve 1.22.
Choose your architecture from:
Note - the *BSD ones were built using gcc 2.95 and so are probably less optimised than the Linux and Windows versions.
The screen output of the above is merely informational; the actual factors found are written to a file whose contents (apart from CRLF issues) should be identical to SoBSieve's, and should be pastable straight into the SoB sieve submission form.
The Sierpinski project is _different_ from normal prime finding.
For normal prime finding you sieve until you are rejecting composites no faster than a PRP test would. The Sierpinski project is different for three reasons:
(1) can be immediately resolved by considering the speed of the fastest PRP test, i.e. the smallest candidate remaining. (3) simply means that you have to be prepared to take an average. (2) is the trickier one to evaluate.
If the Sierpinski primes were denser, then you would predict that you'll only need to test a fraction of the range, and so you would scale down the sieving effectiveness by the fraction of the range you expect to test. Sierpinski candidates are sparse, however, so for most of the k values you do expect to test the whole range. That said, I believe that about half of the k values are expected to crack before the n=20M range. That means that half of them, "on average" (which means nothing), are expected to crack at 10M, and therefore for half of the k values sieving is expected to be only half as useful.
With the above in mind, on the whole I'd say don't sieve much beyond the point where the sieve's removal rate drops to 50% of the rate at which PRP testing removes candidates. Any machine that can't PRP test effectively, such as one with a slow cache/memory architecture, should certainly be sieving.
Predicting when that rate cutoff will be reached will require some data gathering. However, assuming the remaining numbers now have a flat distribution of prime factors (fair once they've been sieved beyond a certain point), the number of candidates remaining scales inversely with the log of the prime sieved to; extending the sieve from p1 to p2 leaves about a fraction ln(p1)/ln(p2) of them.
So if there are 1000 candidates after sieving to 4G (2^32, e^22), then you'd expect:
p | ln(p) | candidates (c) | rate (c removed per G) |
---|---|---|---|
4G | 22.2 | 1000 | - |
10G | 23.0 | 963 | 6.12 |
20G | 23.7 | 935 | 2.82 |
50G | 24.6 | 900 | 1.16 |
100G | 25.3 | 875 | 0.49 |
200G | 26.0 | 852 | 0.233 |
500G | 26.9 | 823 | 0.096 |
1T | 27.6 | 802 | 0.041 |
2T | 28.3 | 783 | 0.0196 |
5T | 29.2 | 759 | 0.0082 |
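If you want to reproduce or extend these figures, here's a minimal Python sketch of the model described above (candidates remaining scaling as 1/ln(p)). The function names are my own, and the output only approximately matches the table because of rounding and the exact choice of interval endpoints.

```python
# A minimal sketch of the extrapolation model above, assuming the number of
# candidates remaining scales as 1/ln(p).  The "rate" is candidates removed
# per further G (1e9) of sieving depth.  Names are mine, not any tool's.
from math import log

def remaining(c0, p0, p):
    """Expected candidates left after extending the sieve from depth p0 to p."""
    return c0 * log(p0) / log(p)

def removal_rate(c0, p0, p1, p2):
    """Candidates removed per G of sieving over the interval [p1, p2]."""
    return (remaining(c0, p0, p1) - remaining(c0, p0, p2)) / ((p2 - p1) / 1e9)

if __name__ == "__main__":
    c0, p0 = 1000, 2.0**32      # initial conditions of the first table
    depths = [p0, 10e9, 20e9, 50e9, 100e9, 200e9, 500e9, 1e12, 2e12, 5e12]
    for p1, p2 in zip(depths, depths[1:]):
        print(f"p={p2:8.0e}  ln(p)={log(p2):4.1f}  "
              f"c={remaining(c0, p0, p2):6.0f}  "
              f"rate={removal_rate(c0, p0, p1, p2):8.3f}/G")
```

Changing `c0` and `p0` to 825000 and 5.7e9 roughly reproduces the second table below.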
Hmmm, more of the same, but with the initial conditions matching "smh"'s SoB information:
p | ln(p) | candidates (c) | rate (c removed per G) |
---|---|---|---|
5.7G | 22.5 | 825000 | - |
10G | 23.0 | 804860 | 4684 |
20G | 23.7 | 781340 | 2352 |
50G | 24.6 | 752280 | 969 |
100G | 25.3 | 731690 | 412 |
200G | 26.0 | 712200 | 195 |
500G | 26.9 | 687970 | 81 |
1T | 27.6 | 670720 | 34 |
2T | 28.3 | 654300 | 16 |
5T | 29.2 | 633800 | 7 |
10T | 29.9 | 619000 | 2.9 |
20T | 30.6 | 605000 | 1.4 |
50T | 31.5 | 588000 | 0.6 |
100T | 32.2 | 575000 | 0.25 |
200T | 32.9 | 563000 | 0.12 |
Someone said that it takes 1 day to do a PRP test, and he can sieve 6G per day. Therefore the 'assume all sieving is useful' cutoff would be rate=1/6. Assuming only 3/4 of sieving is useful (half of half will be unneeded), that becomes rate=2/9. i.e. the target for P is initially 100T given the above data. I'd like to see a real datapoint above 5.7G to help keep the momentous extrapolation on track.
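To make that break-even arithmetic explicit, here's a rough sketch using the figures quoted above (1 day per PRP test, 6G sieved per day); the function and parameter names are my own illustration.

```python
# Break-even sieving rate: sieving pays off while
#   useful_fraction * (candidates removed per day) > (PRP tests done per day).
# With 1-day PRP tests and 6G sieved per day, this gives the 1/6 and 2/9
# cutoffs quoted above.
def cutoff_rate(prp_days=1.0, sieve_g_per_day=6.0, useful_fraction=1.0):
    """Removal rate (candidates per G) below which sieving loses to PRP testing."""
    return 1.0 / (prp_days * sieve_g_per_day * useful_fraction)

print(cutoff_rate())                      # all sieving useful -> 0.1667 = 1/6
print(cutoff_rate(useful_fraction=0.75))  # 3/4 useful         -> 0.2222 = 2/9
```

Reading 2/9 ≈ 0.22 against the rate column of the second table is what puts that first target at about 100T.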
Now as the candidates being tested grow, the time taken to test them grows with the square of the exponent. Therefore by the time you're at n=6M, you're looking at nearly 1/10th of the speed, and the sieving depth should by then be about 500T. (A factor of 10 change in sieving depth gives more than a factor of 10 decrease in sieving efficiency, and only 13/16 of the sieving effort is relevant any more, because 3-6M have been tested.)
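The same arithmetic with this paragraph's adjustments is sketched below, taking the "nearly 1/10th of the speed" remark as roughly 10-day PRP tests and 13/16 of the removals still being relevant (my reading of the figures above, not anything more precise).

```python
# Same break-even calculation once PRP tests are ~10x slower (test time grows
# with the square of the exponent) and only 13/16 of the sieved range remains
# untested.  A sketch under the paragraph's stated assumptions.
def cutoff_rate(prp_days, sieve_g_per_day=6.0, useful_fraction=1.0):
    return 1.0 / (prp_days * sieve_g_per_day * useful_fraction)

print(cutoff_rate(prp_days=10.0, useful_fraction=13.0 / 16.0))  # ~0.021 per G
```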
Note well - this means that while PRP-ing is taking place, sieving should also take place, in parallel. It will become less useful as time goes on, but at that stage, the project ought to be considering getting 20-100M sieved on whatever k candidates remain.
I've not worked out how much of an issue the different densities of the different k values make. I'm sure it's not huge, but it could skew things noticeably. You see, the k values with the fewest candidates are the ones most likely to be tested all the way to 20M, as their density is so sparse. Similarly, the ones with the most candidates are the ones most likely not to need testing the whole way, as some should be expected to crack at 5M or 10M, or wherever. That means that the majority of the candidates we're removing are more likely to be ones that we wouldn't have tested anyway. That decreases sieving effectiveness, maybe by a factor of up to 2 or even more (I've not compared the real densities, but I'm guessing there's a spread over several binary orders of magnitude).
Wild stab in the dark:
p=50T by the time you're testing n=3000000
p=300T by the time you're testing n=6000000
These figures aren't worth trusting with any certainty, as they're wild extrapolations. By the time we've sieved to 1T or 10T, it should be _much_ clearer where the real target is, and I'll re-do the predictions then. I'll also see if I can get Jack Brennen and/or Yves Gallot to refine the predictions; their insight into this is undoubtedly greater than anyone else's.
Any problems, mail me:
[email address given as obfuscated ASCII art]
Another hastily constructed page by Phil Carmody