If you came here from Memepool, you'll want the previous story. You can always return to this page for Chapter Two.

An Executable Prime Number?

I trust you've read about the first alleged illegal prime number already? And that includes disclaimers about inability to spell and sloppy typing!

Oh No! Not Another Dodgy Prime Number!

Professor Caldwell was recently due to give a talk about illegal prime numbers. The talk I believe is taking place as I write this, and it probably includes information on my previous allegedly illegal primes. The first one was in some ways ground-breaking, but in other ways unsatisfactory. Something was missing. However, he got in contact with me on Friday (probably Saturday my time - we're in very different time zones) with the following questions (paraphrased slightly):

  How hard would it be to create an executable prime? 
  1) The shortest possible executable prime
  2) Could _arbitrary_ programs be converted into primes?
Maybe I was reading something between the lines that wasn't there, but if arbitrary programs could be expressed as primes, the immediate conclusion is that all programs, including ones some people wished didn't exist, can too. I.e. the so called 'circumvention devices' of which my previous prime exploit was an example.

Professor Caldwell didn't actually ask me to create these - merely to evaluate the difficulty of the tasks. I.e Whether they're easy, feasable but hard, impractically hard (requires the entire computing power of the NSA), or just downright impossible (theoretically). (However, I like to rise to a technical challenge.)

The Smallest Executable Prime

A prime executable - that's novel. What OS? What processor? The answer to the question "what is the shortest possible executable prime?" must be accompanied with a definition of OS and processor - but it appeared that I had free rein. Then again, another question which arose was ...

What Defines "Executable"

To a mathematical mind, one that studied Computability at university, there was a simple answer to this. It need do nothing but terminate. It doesn't need to actually _do_ anything. For example, in the appropriate environment (one which does not require a return value from main(), such as some embedded platforms)

void main() { }
would, when compiled, be a valid executable.

Which OS? Which Processor?

The executable format with the least overhead has got to be the old DOS .COM format. Basically there is no overhead! All you have to do is return! Alas RET is non-prime, so the smallest executable .COM file must be at least 2 bytes. I needed a 'neutral' instruction which would yield a prime. Several of the 8-bit numbers which yield a prime are not single-byte instructions on the x86, so they're out, and some (7, 14, 22) mess with the stack, and cause the following RET to jump into outer space. Not good. Not deterministic. Not executable by my definition. Some numbers made meaningless, but not necessarily invalid, instructions. There was one which certainly has no issues. Here's the note I sent to Professor Caldwell on Saturday:

 7*256+195 // POP ES; RET  probably should crash, but didn't for me!
14*256+195 // PUSH CS; RET  should crash - and does :-)
22*256+195 // PUSH SS; RET  should crash - and does :-)
38*256+195 // ES:RET  is bizarre but should work and does for me
46*256+195 // CS:RET  is bizarre but should work and does for me
47*256+195 // DAS; RET  is fine. The smallest sensible program
So the jury's out - is it 38*256+195 = 9923, or 47*256+195 = 12227? Unless I hear otherwise I think that it's 9923.

What About Other Processors/OSes

Well, almost every other processor nowadays has instruction sets with at least 16-bit, and more commonly 32-bit, instructions. So one would have to rewind the clock a fair way to find a single-byte executable. From my memory of the Z80, the return code (0xC9) is non-prime, so there could be a battle for least-valued 2-byte prime... So the topic is not totally closed. But that's enough about it for now, at least.

An Executable Prime from an Arbitrary Program

But which program? Well, it would have to be small in order to be provable. I seem to remember that Charles Hannum's efdtt.c was pretty tiny. Perhaps that would be a good starting point!

This definitely required a game plan. This would not be easy. I knew size would be key. Again, I had to chose a platform - an OS and a processor. Because I needed 32-bit instructions .COM was out. The only platform in which I was capable of programming the task quickly was Linux, as an ELF executable.

Side Point - The Law

Can a program that just twiddles bits really be illegal? Oh dear. If I didn't feel obliged to cry at that very concept I'd love to laugh at it. However, it's trivial to create an 'illegal circumvention device', as defined by the DMCA. It appears from the Sklyarov case that it's remarkably easy for code to be classified as such a device. I wonder if anyone running rot13 could be considered to be using such a circumvention device - you have been warned.

It's All About Size

However, as before, to satisfy the infinitely pedantic mathematicians (pedantry is not just good, it's necessary), any prime would need to be provable. The current largest proven prime with no special form is 5020 digits, 16677 bits, or 2085 bytes. That's huge for a prime of no special form, but _tiny_ for a payload. The smallest executable on my system (x86 Linux) is

-rwxr-xr-x    1 root     root         2688 Sep  7  2000 arch
Way too big to prove. This would primarily (no pun intended) be an exercise in cheating. First I needed to find ...

The Smallest Implementation

No point in reinventing the wheel - Charles Hannum has the C code which pretty much sews up this challenge. OK, he optimised for source size, but if I know C compilers, less source will often imply less executable. So I snarfed his code and built it.

-rwxr-xr-x    1 phil     phil         3612 Sep 11 01:20 decss
Ouch!

Oh - if you want to skip the gory details of ELF files - skip to the conclusion of the optimisation.

Don't Use C. (Oh alright[*], if you have to...)

[* If the lexeme is good enough for Eric Clapton, it's good enough for me]

Assembly gurus will be saying at this point "don't use C - use assembly". However, I didn't have the inclination to rewrite the whole thing in assembly. I needed a compromise. There is only one reference I needed: Brian Raiter's Tiny Programs ("a whirlwind tutorial on creating really teensy ELF executables for Linux"). The hints there were invaluable:

Easy, isn't it?

Amusingly the third of those was easier than Brian Raiter suggests. I didn't need to create the whole file from binary data without the unnecessary sections - I simply had to use the strip command with the appropriate -R parameters.

The fourth tweak was a tad tricky. I had to write my own ELF file parser, which would detect where the unnecessary 'section header' was, and chop it off. Fortunately nothing follows it in the file so nothing needed to be relocated.

But it's Even!

The file ends in a zero. NOT a good start to finding a prime number. Disaster?

By this stage I was well out of my depth when it came to the ELF format. I didn't know what the section before the end was, and what it contained; worst of all I didn't have the time to read up enough to find out! I wanted to have a prime for Professor Caldwell in time for his presentation. Welcome to the land of ...

Brute Force and Ignorance

I simply pulled the executable file into hexl-mode in emacs. The last 64 bytes of my file were:

000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000300: 002e 7379 6d74 6162 002e 7374 7274 6162  ..symtab..strtab
00000310: 002e 7368 7374 7274 6162 002e 7465 7874  ..shstrtab..text
00000320: 002e 726f 6461 7461 002e 6273 7300 0000  ..rodata..bss...
That's not my data, and that's not my code; i.e. I don't need it, I don't know if the system does or not, there's only one way to find out. So I just chopped that bit off!

And it worked :-)

A 752-Byte Executable

That wasn't too bad for an afternoon's work. 752 bytes would correspond to a 752*ln(256)/ln(10) = 1811 digit prime. This is no longer considered particuarly large, despite the fact that 6 months ago, it would have been. The prime world - and the PC world - do not stand still...

However, I had to find bytes that I could vary without affecting the workingness of my code. I could also need the final byte to be ... oh, no, not again - you guessed it ...

It's still even!

This time, however, I recognised the final bytes of the file:

000002d0: 5f5d 83c4 1cc3 3757 6f7e 2747 5f8e 0063  _]....7Wo~'G_..c
000002e0: 7233 7366 7736 763b 2a6b 2b3e 2f6e 2e00  r3sfw6v;*k+>/n..
Those are Charles Hannum's magic strings! And the final byte of the file is a zero only because they are C strings which end with a zero character. The reason they end with a zero is so that the C libraries can know how long they are without being told their length explicitly. What C functions? I scrapped using the C library hours ago! The zeroes are toast! All I needed to do was to replace them with numbers which would yield a prime value for the whole expression.

Finding the Probable Prime

This story's old. I had 128 choices for the last byte, and 256 for the one 17 bytes from the end. Each combination had something like a 1:2085 chance of being prime (at that size 1 in ln(256752)=4170 are prime, but I'm not going to bother looking at the even values, obviously, so what's left over is twice as dense. Basically I should expect about 15 primes in the range, and I knew that when I started.

As before, I used PrimeForm, or should I say the 'PFGW' incarnation of it. I wrote a simple script to look at the 32768 possible combinations, and just let it run. Alas I was using an old 133MHz pentium, so it took a while, maybe half an hour, but eventually a probable prime appeared.

The expression which was prime was effectively:

very_big_number + (10*(2^8)^17) + 49
i.e. a '10' 17 characters in from the end of the file, and a '49' at the end. (By coincidence these are both quite sensible ASCII characters to have within a string, so I didn't feel I was violating Charles' strings by inserting these characters. '10' is a line feed, and '49' is the digit 1.)

Proving it Prime

Marcel Martin, who wrote the program Titanix used to prove the last prime has given it a bit of a face-lift, and released a new version called Primo. It's faster, and more capable than Titanix was. In fact it's the current ECPP world record holder.

1811 digits is a breeze in Primo. It's a day's work. And given that it was by now Sunday afternoon, I reckoned I could get a proof by Monday. Primo is really 'point and shoot'; all I had to do was feed it the number and press 'start'!

At this point, I informed Professor Caldwell that with >99% certainty, there would be a prime by today (Monday). I also inquired of Dr. Touretzky whether he'd like another exhibit for his gallery. I was blessed again with two lovely positive replies.

It was simply a matter of waiting for ...

The World's First Non-Trivial Executable Titanic Prime

(Allegedly illegal)

Here's the head and the tail of the certificate file (a summary which can be used to verify the proof, which is vastly quicker than the huge ammount of searching required by an unaided proof). If anyone wants the full certificate, just drop me a note.

PRIMO 1.0.0 - Primality Certificate

Candidate 1/1
-------------
N = 493108359702850190027577767239076495728490777215020863208075\
    018409792627885097658864557802013660073286795447341128317353\
    678312015575359819785450548115719393458773300380099326195058\
    764525023820408110189885042615176579941704250889037029119015\
    870030479432826073821469541570330227987557681895601624030064\
    111516900872879838194258271674564774816684347928464580929131\
    531860070010043353189363193439129486044503709919800477094629\
    215581807111691530318762884778783541575932891093295447350881\
    882465495060005019006274705305381164278294267474853496525745\
    368151170655028190555265622135314631042100866286797114446706\
    366921982586158111251555650481342076867323407655054859108269\
    562666930662367997021048123965625180068183236539593483956753\
    575575324619023481064700987753027956186892925380693305204238\
    149969945456945774138335689906005870832181270486113368202651\
    590516635187402901819769393767785292872210955041292579257381\
    866058450150552502749947718831293104576980909153046133594190\
    302588132059322774443852550466779024518697062627788891979580\
    423065750615669834695617797879659201644051939960716981112615\
    195610276283233982579142332172696144374438105648552934887634\
    921030988702878745323313253212267863328370279250997499694887\
    759369159176445880327183847402359330203748885067557065879194\
    611341932307814854436454375113207098606390746417564121635042\
    388002967808558670370387509410769821183765499205204368255854\
    642288502429963322685369124648550007559166402472924071645072\
    531967449995294484347419021077296068205581309236268379879519\
    661997982855258871610961365617807456615924886608898164568541\
    721362920846656279131478466791550965154310113538586208196875\
    836883595577893914545393568199609880854047659073589728989834\
    250471289184162658789682185380879562790399786294493976054675\
    348212567501215170827371076462707124675321024836781594000875\
    05452543537

Decimal size = 1811
Binary size = 6015

...

Started 09/09/2001 09:13:59 PM
Running time 1154028h 37mn 4s

Candidate certified prime
Ignore the running time - I was running under WINE (Wine Is Not an Emulator), under Linux, and it appears as if there are a few problems with the time function... :-)

Well, that's it.

It's prime. It represents an executable. Some have alleged that the executable it represents is illegal. That's what was asked for, I do believe :-)

Preaching

Illegality

As someone who considers himself to be quite creative, I'm a firm believer in authors' and artists' rights, the rights that are protected under copyright. Ripping off DVDs with no intention to buy the originals is illegal in almost all countries in the world, and correctly so (IMHO).

However, I do not believe that the current implementation of US law is a sensible one. I believe it's logically inconsistent, and is biased towards the interests of multinational publishers, and against consumers. I am thankful that there are people like Dan Bernstein, Dave Touretzky, and their ilk, who are proud to stand up and speak common sense. (Which alas ain't so common, it appears.)

Phil Carmody,
Kivenlahti, nr. Helsinki 2001/09/10

Links/Greets/Gibber