[docs][entropy] Document entropy quality tests

Change-Id: I9ac2cf8567a739e03918f3d0a27f574069b2f70d
2017-09-11 11:26:29 -07:00
commit e45bdf6546
@@ -0,0 +1,201 @@
 # Entropy quality tests
 This document describes how we test the quality of the entropy sources used to
 seed the Magenta CPRNG.
 [TOC]
 ## Theoretical concerns
 Approximately speaking, it's sometimes easy to tell that a stream of numbers is
 not random by recognizing a pattern in it. It's impossible to be sure that the
 numbers are truly random. The state of the art seems to be running several
 statistical tests on the data, and hoping to detect any exploitable weaknesses.
 The problem of testing for randomness gets more difficult when the random
 numbers aren't perfectly random (when their distributions aren't uniform, or
 when there are some limited correlations between numbers in the sequence). A
 stream of non-perfect random numbers still contains some randomness, but it's
 hard to determine how random it is.
 For our purposes, a good measure of how much randomness is contained in a stream
 of non-perfectly random numbers is the min-entropy. This is related to the
 Shannon entropy used in information theory, but it never takes a larger value.
 The min-entropy controls how much randomness we can reliably extract from the
 entropy source; see, for example,
 <https://en.wikipedia.org/wiki/Randomness_extractor#Formal_definition_of_extractors>.
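To make the difference between the two measures concrete, here is a minimal sketch that computes naive "plug-in" estimates of both from observed sample frequencies. The function names and the toy data are made up for illustration, and these are not the estimators used by the NIST suite; they only show that a biased source can have noticeably less min-entropy than Shannon entropy.

```python
import math
from collections import Counter


def shannon_entropy(samples):
    """Plug-in Shannon entropy in bits per sample: sum of p * log2(1/p)."""
    total = len(samples)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(samples).values())


def min_entropy(samples):
    """Plug-in min-entropy in bits per sample: log2(1 / p_max)."""
    total = len(samples)
    return math.log2(total / max(Counter(samples).values()))


# Uniform over four values: both measures agree (2 bits per sample).
uniform = [0, 1, 2, 3]
# Biased: value 0 appears half the time, the rest is spread over four values.
# Shannon entropy is still 2 bits, but min-entropy drops to 1 bit.
biased = [0, 0, 0, 0, 1, 2, 3, 4]

for name, data in (("uniform", uniform), ("biased", biased)):
    print(name, shannon_entropy(data), min_entropy(data))
```

The min-entropy only looks at the single most likely value, which is why it is the conservative choice when deciding how much to trust an entropy source.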
 From a practical standpoint, we can use the test suite described in US NIST
 SP800-90B to analyze samples of random data from an entropy source. A prototype
 implementation for the tests is available from
 <https://github.com/usnistgov/SP800-90B_EntropyAssessment>. The suite takes a
 sample data file (say, 1MB of random bytes) as input. The nice thing about this
 test suite is that it can handle non-perfect RNGs, and it reports an estimate
 for how much min-entropy is contained in each byte of the random data stream.
 ### The importance of testing unprocessed data
 After drawing entropy from our entropy sources, we will mix it into the CPRNG in
 a "safe" way that basically gets rid of detectable correlations and
 distributional imperfections in the raw random byte stream from the entropy
 source. This is a very important thing to do when actually generating random
 numbers to use, but we must avoid this mixing and processing phase when testing
 the entropy source itself.
 For a stark example of why it's important to test unprocessed data if we want to
 test our actual entropy sources, here's an experiment. It should run on any
 modern Linux system with OpenSSL installed.
    head -c 1000000 /dev/zero >zero.bin
    openssl enc -aes-256-ctr -in zero.bin -out random.bin -nosalt -k "password"
 This takes one million bytes from /dev/zero and encrypts them with AES-256 in
 CTR mode, using a weak password and no salt (a terrible crypto scheme, of
 course!). The output looks like good random data, which is a sign that AES is
 working as intended. It also demonstrates the risk of estimating entropy content
 from processed data: together, /dev/zero and "password" provide ~0 bits of
 entropy, but the tests are far more optimistic about the resulting data!
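The same effect can be reproduced without OpenSSL. The sketch below is only an analogy: it swaps AES-256-CTR for a SHA-256 counter-mode keystream (so it needs nothing beyond the Python standard library), and it uses a naive most-common-value estimate rather than the NIST suite. The helper names `whiten` and `naive_min_entropy_per_byte` are hypothetical.

```python
import hashlib
import math
from collections import Counter


def naive_min_entropy_per_byte(data):
    """Most-common-value estimate: log2(1 / p_max), in bits per byte."""
    return math.log2(len(data) / max(Counter(data).values()))


def whiten(data, key=b"password"):
    """XOR data with a SHA-256 counter-mode keystream (stand-in for AES-CTR)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


zeros = bytes(100_000)      # like reading /dev/zero: no entropy at all
processed = whiten(zeros)   # fully determined by the fixed key

print(naive_min_entropy_per_byte(zeros))      # 0.0
print(naive_min_entropy_per_byte(processed))  # well above 7 bits per byte
```

The processed stream is completely predictable to anyone who knows the key, yet a frequency-based estimate rates it as nearly ideal. That is exactly why the estimate has to be made on the unprocessed input.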
 For a more concrete Magenta-related example, consider jitterentropy (the RNG
 discussed here: <http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html>).
 Jitterentropy draws entropy from variations in CPU timing. The unprocessed data
 are how long it took to run a certain block of CPU- and memory-intensive code
 (in nanoseconds). Naturally, these time data are not perfectly random: there's
 an average value that they center around, with some fluctuations. Each
 individual data sample might be several bits (e.g. a 64-bit integer) but only
 contribute 1 bit or less of min-entropy.
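As a rough illustration of "many bits of storage, little min-entropy", the sketch below uses a handful of invented timing values (not real jitterentropy measurements) and applies a naive most-common-value estimate to the whole sample and to each byte position of its 64-bit encoding.

```python
import math
import struct
from collections import Counter


def min_entropy(values):
    """Naive estimate: log2(1 / p_max), in bits per value."""
    return math.log2(len(values) / max(Counter(values).values()))


# Hypothetical timing samples in nanoseconds (made up for illustration):
# an average around 52000 with small fluctuations.
samples = [52_000 + jitter for jitter in (0, 1, 0, 2, 0, 1, 0, 3)]

# Each sample occupies 64 bits when stored as a little-endian integer.
packed = [struct.pack("<Q", s) for s in samples]

per_sample = min_entropy(samples)
per_byte_position = [min_entropy([p[i] for p in packed]) for i in range(8)]

print(per_sample)         # 1.0
print(per_byte_position)  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In this toy data set all of the variation sits in the lowest byte; the other seven bytes of each sample are constant and contribute nothing.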
 The full jitterentropy RNG code takes several raw time data samples and
 processes them into a single random output (by shifting through an LFSR, among
 other things). If we test the processed output, we're seeing apparent randomness
 both from the actual timing variations and from the LFSR. We want to focus on
 just the timing variation, so we should test the raw time samples. Note that
 jitterentropy's built-in processing can be turned on and off via the
 `kernel.jitterentropy.raw` cmdline.
 ## Quality test implementation
 As mentioned above, the NIST test suite takes a file full of random bytes as
 input. We collect those bytes on a Magenta system (possibly with a thin Fuchsia
 layer on top), then usually export them to a more capable workstation to run the
 test suite.
 ## Boot-time tests
 Some of our entropy sources are read during boot, before userspace is started.
 To test these entropy sources in a realistic environment, we run the tests
 during boot. The relevant code is in
 `kernel/lib/crypto/entropy/quality_test.cpp`, but the basic idea is that the
 kernel allocates a large static buffer to hold test data during early boot
 (before the VMM is up, so before it's possible to allocate a VMO). Later on, the
 data is copied into a VMO, and the VMO is passed to userboot and devmgr, where
 it's presented as a pseudo-file at `/boot/kernel/debug/entropy.bin`. Userspace
 apps can read this file and export the data (by copying to persistent storage or
 using the network, for example).
 ### Boot-time tests: building
 Since the boot-time entropy test requires that a large block of memory be
 permanently reserved (for the temporary, pre-VMM buffer), we don't usually build
 the entropy test mode into the kernel. The tests are enabled by passing the
 `ENABLE_ENTROPY_COLLECTOR_TEST` flag at build time, e.g. by adding
 ```
 EXTERNAL_DEFINES += ENABLE_ENTROPY_COLLECTOR_TEST=1
 ```
 to `local.mk`. Currently, there's also a build-time constant,
 `ENTROPY_COLLECTOR_TEST_MAXLEN`, which (if provided) is the size of the
 statically allocated buffer. The default value if unspecified is 128kB.
 ### Boot-time tests: configuring
 The boot-time tests are controlled via kernel cmdlines. The relevant cmdlines
 are `kernel.entropy-test.*`, documented in
 [kernel\_cmdlines.md](kernel_cmdlines.md).
 Some entropy sources, notably jitterentropy, have parameter values that can be
 tweaked via kernel cmdline. Again, see [kernel\_cmdlines.md](kernel_cmdlines.md)
 for further details.
 ### Boot-time tests: running
 The boot-time tests will run automatically during boot, as long as the correct
 kernel cmdlines are passed (if there are problems with the cmdlines, error
 messages will be printed instead). The tests run just before the first stage of
 RNG seeding, which happens at LK\_INIT\_LEVEL\_PLATFORM\_EARLY, shortly before
 the heap and the VMM are brought up. If running a large test, boot will often slow
 down noticeably. For example, collecting 128kB of data from jitterentropy on
 rpi3 can take around a minute, depending on the parameter values.
 ## Run-time tests
 *TODO(SEC-29): discuss actual user-mode test process*
 *Current rough ideas: only the kernel can trigger hwrng reads. To test,
 userspace issues a kernel command (e.g. `k hwrng test`), with some arguments to
 specify the test source and length. The kernel collects random bytes into the
 existing VMO-backed pseudo-file at `/boot/kernel/debug/entropy.bin`, assuming
 that this is safely writeable. Currently unimplemented; blocked by lack of a
 userspace HWRNG driver. Can test the VMO-rewriting mechanism first.*
 ## Test data export
 Test data is saved in `/boot/kernel/debug/entropy.bin` in the Magenta system
 under test. So far I've usually exported the data file manually via `netcp`.
 Other options include `scp` if you build with the correct Fuchsia packages, or
 saving to persistent storage (probably using the Fuchsia `thinfs` FAT
 filesystem, so you can read the files on a non-Magenta computer).
 ## Running the NIST test suite
 *Note: the NIST tests aren't actually mirrored in Fuchsia yet. Today, you need
 to clone the tests from the repo at
 <https://github.com/usnistgov/SP800-90B_EntropyAssessment>.*
 The NIST test suite has three entry points (as of the version committed on Oct.
 25, 2016): `iid_main.py`, `noniid_main.py`, and `restart.py`. The two "main"
 scripts perform the bulk of the work. The `iid_main.py` script is meant for
 entropy sources that produce independent, identically distributed data samples.
 Most of the testing is to validate the iid condition. Many entropy sources will
 not be iid, so the `noniid_main.py` test implements several entropy estimators
 that don't require iid data.
 Note that the test binaries from the NIST repo are Python scripts without a
 shebang line, so you probably need to explicitly call `python` on the command
 line when invoking them.
 The first two scripts take two arguments, both mandatory: the data file to read,
 and the number of significant bits per sample (if less than 8, only the low `N`
 bits will be used from each byte). They optionally accept a `-v` flag to produce
 verbose output or `-h` for help.
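The effect of the bits-per-sample argument can be pictured as a mask applied to every input byte. The helper below is a hypothetical illustration of that description, not code taken from the NIST scripts.

```python
def keep_low_bits(data, bits_per_sample):
    """Mask each byte down to its low `bits_per_sample` bits."""
    if not 1 <= bits_per_sample <= 8:
        raise ValueError("bits_per_sample must be between 1 and 8")
    mask = (1 << bits_per_sample) - 1
    return bytes(b & mask for b in data)


raw = bytes([0xFF, 0x12, 0x80, 0x07])
print(keep_low_bits(raw, 4).hex())  # 0f020007
print(keep_low_bits(raw, 8).hex())  # ff128007 (unchanged)
```

This matters when packing test data: if the interesting variation is not in the low bits of each byte, passing a value below 8 will discard it.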
 The `noniid_main.py` script also optionally accepts a `-u <int>` flag that can reduce
 the number of bits below the `N` value passed in the second mandatory argument.
 I'm not entirely sure why this flag is provided; it seems functionally
 redundant, but passing it does change the verbose output slightly. My best guess
 is that this is provided because the noniid Markov test only works on samples of
 at most 6 bits, so 7- or 8-bit datasets will be reduced to their low 6 bits for
 this test. In contrast, all the iid tests can run on 8-bit samples.
 A sample invocation of the `iid_main.py` script:
 ```
 python2 -- $FUCHSIA_DIR/third_party/sp800-90b-entropy-assessment/iid_main.py -v /path/to/datafile.bin 8
 ```
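If the invocation is going to be scripted, one option is to assemble the argument list in a small helper. The sketch below only builds the command shown above and does not run it; the function name `build_iid_command` and the directory argument are assumptions made for illustration.

```python
import os


def build_iid_command(suite_dir, datafile, bits_per_sample=8, verbose=True):
    """Return the argv list for one iid_main.py run (nothing is executed)."""
    argv = ["python2", "--", os.path.join(suite_dir, "iid_main.py")]
    if verbose:
        argv.append("-v")
    argv += [datafile, str(bits_per_sample)]
    return argv


cmd = build_iid_command("/path/to/sp800-90b-entropy-assessment",
                        "/path/to/datafile.bin")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where `python2` and a checkout of the test suite are available.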
 The `restart.py` script takes the same two arguments, plus a third argument: the
 min-entropy estimate returned by a previous run of `iid_main.py` or
 `noniid_main.py`. This document doesn't describe restart tests. For now, see
 NIST SP800-90B for more details.
 ## Future directions
 ### Automation
 It would be nice to automate the process of building, configuring, and running a
 quality test. As a first step, it should be easy to write a shell script to
 perform these steps. Even better would be to use the testing infrastructure to
 run entropy collector quality tests automatically, mostly to reduce bit-rot
 in the test code. Failing automation, we have to rely on humans to periodically
 run the tests (or to fix the tests when they break).