RANDOM TOOLS Random tools is a set of programs to collect random data from various sources (e.g. the audio device) and feed it into the kernel random pool, used to implement /dev/[u]random. Under normal conditions, the kernel manages to collect enough randomness by timing internal events such as interrupts, hard disk requests, etc... which are related to unpredictable physical events (the user typing or moving the mouse, disk heads seek time). Such randomness is measured in bits, sometimes referred to as "entropy count". The kernel maintains both a random pool (a collection of random data) and an entropy count for this pool. This count is decremented by N when N bits are read from /dev/random (see comments in drivers/char/random.c). This is only because we leak N bits of information about the internal pool state, in theory. In pratice, since a SHA hash is computed internally by the driver, an attacker needs to crack SHA to gain that knowledge. Anyhow, just to play safe, the kernel assumes SHA is crackable, and thus decreases the entropy count by the maximum value, just as if the attacked were able to gain all the possible information from those N bits. On the other hand, when we're *incrementing* the count, we must assume that we can't be 100% sure about the true randomness of the data we're adding to the pool, thus we increment the entropy count by M every N bits we add, where usually M << N. The smaller M the better. We absolutely don't want to overestimate M (or the M/N ratio). In few words, while lowering the entropy count is easy (dd if=/dev/random bs=1 count=64 will extract 256 random bits, thus decreasing the entropy count by 256), increasing it without incurring in entropy overestimation may be difficult. It is common for rack-mounted ("headless") servers, which lack keyboard, mouse, and video events, to be unable to collect enough randomness to fulfill all requests for random bits generated by SSL/TLS enabled servers or by other security facilities that need random bits. One way to address this problem is to get random data from some random generating HW. A few systems already include such devices, but most don't. One possible source of random data is environmental (audio) noise, obtained via a sound card (a very common device nowadays). The first tool in my collection, sound2rnd, listens to the audio device collecting external noise. In stereo mode, it computes the difference between the left and the right channel (idea inspired by the audio-entropd by Damien Miller ), in order to obtain an output stream which is less predictable. Still, the randomness in audio data (very likely) isn't "dense" enough for our application. Compression is one way to "concentrate" randomness, since it works by removing some of the predictability of the input. Another way is to compute the digest of the audio data with a function like MD5 or SHA1, which can be seen as extremely lossy compression functions. Their output has the property of being highly dependent on even small variations of the input, and therefore is highly unpredictable even if the input contains little randomness (it should be noted that the output can be much shorter than the input: SHA1 always produces a 160 bit digest. It can be computed out of many kbytes of audio data, in a way compressing them to 20 bytes only. The randomness will be almost the same, but concentrated into 160 "random" bits). These functions also have the side effect of uniforming the data (mean tends to be 127.5), a property that the audio data alone lacks. The second tool in this collection is digest, which can be used to produce SHA1 digests of blocks of input data. Both the output of a HW RNG or the digests of samples of audio noise need to be tested for being "really" random. fips is a tool that can be used to perform FIPS 140-2 (http://csrc.nist.gov/publications/fips/) tests on the input. It will test a sequence of 20000 bits, and drop it if it doesn't pass the test. fips works like a UNIX filter program, i.e. it reads its standard input and writes (tested) data to standard output. This is different from the suggested behaviour, in that *all* data blocks are tested. Finally, the random data needs to be injected into the kernel pool. The entropy counter must be updated, too, with an esteem of the randomness of the injected data. feed_random updates the random pool, crediting to the entropy count M "entropy bits" every N random bits, M/N being a user configurable parameter. For audio data, the tools are designed to work in pipe mode, like this: sound2rnd | digest | fips | feed_random > /dev/random Read comments in the example daemon.sh to learn more about how to build a working example (the one above is unsatisfactory for some reasons). With an HW random generator, it can be as simple as: hwrnd-collector | fips | feed_random > /dev/random (user land driver, digest unnecessary) or just: cat /dev/[hw-rng] | digest | fips | feed_random > /dev/random (kernel driver only, undigested data) or even just: hwrnd-daemon | feed_random > /dev/random (collects data and performs its own tests) Writers of HW RNG device drivers are encouraged in contributing their own examples and suggestions. "RANDOM" NOTES The perfect random data is completely unpredictable, and each of its possibile values has the same chance to happen. A perfect random string of 20000 bit has a "entropy count" of 20000. On the other hand, a string made the same bit (either 0 or 1) has an entropy count of 1, since its completely predictable, once you know the first bit. You need to know only one bit to say you know the whole string. Even the data produced by a good pseudo-random number generator has very little entropy, since you only need to know the initial seed to be able to predict all the output (thus, even 1GB of output contains only few entropy bits). Many tests exist to detect non random data (well, they exist to detect random data or to raise a flag when the random part is dominant, e.g. in the results of an experiment - but their logic can be reversed). It is important to note that even if a very good PRNG passes these tests, the true randomness of its output is still limited by the size of the space of its initial states (the seed), in theory, no matter if we're able to detect that in practice. That means that the true entropy of a random source is unknown and there's no definitive way to measure it. It can be only estimated, and we can only hope the way we use to collect the random data is unbiased. Testing the random data we collect is useful only to detect macroscopic failures in the process (that's the purpose of the FIPS tests) and not to provide a complete protection against small subtle biases in a input that only looks like truly random. So the general rule is: be extremely conservative in estimating the entropy of your random source. HISTORY I wrote this as part of the process of enabling some of our Internet servers to use the encryption capabilities of a set of TLS/SSL enabled daemons, like sendmail, Cyrus IMAPs, Apache HTTPd. I discovered that many client applications behave in a suboptimal way (to be fair), opening and closing a connection to the IMAP server everytime they check for new mail. Since every time a new connection is made some random session key needs to be generated, this places some load on the kernel random pool used for /dev/random implementation. It turned out that some systems (Sun Ultra 1 hw, Linux 2.2.x kernel) were unable to retrieve enough randomness to cope with even a small number of users (on such a system, the kernel manages to collect about a couple of bits per second of randomness). Since the hw were "headless" servers (no video card, no keyboard, no mouse), one way to collect more random data was to process audio input (environmental noise). I've learned that it's that not easy to collect data that can be safely considered "random" (unpredicatable). Thus I decided to write some tools to allow me feed the random pool in a controlled way. -- Marco Colombo ESI Staff $Id: README,v 1.2 2002/03/16 18:32:12 marco Exp $