RNG-188: Add Philox4x32 and Philox4x64 random number generators#191
RNG-188: Add Philox4x32 and Philox4x64 random number generators#191aherbert merged 5 commits intoapache:masterfrom
Conversation
aherbert
left a comment
There was a problem hiding this comment.
Thanks for the contribution.
Did you code this from the referenced paper? I think there are some edge cases around jumping when the counter is negative or approaches overflow that need to be tested against a reference implementation. This is particularly due to the use of signed integers in Java which may not be accounted for in a reference paper using e.g. c.
The int provider does not reset the internal state after one of the jumps. The long provider does though.
All of the comments for the int provider seem to apply to the long provider as the code is very similar just with longs. Please review that class too when updating.
I think the addition of a method for the open interval should be in a separate PR. It would be best to discuss its integration into the code on the mailing list.
I will have a look at running this RNG through the benchmarking code (TestU01 BigCrush and PractRand). Is the implementation expected to pass these test suites?
| int old = counter0; | ||
| counter0 += nlo; | ||
| if (Integer.compareUnsigned(counter0, old) < 0) { | ||
| nhi++; |
There was a problem hiding this comment.
If nhi is -1 (all bits set) then this increment will not be carried to counter2.
There was a problem hiding this comment.
code will be removed as per you suggestion to implement this later with Interface RandomGenerator.ArbitrarilyJumpableGenerator
| counter3 = state[5]; | ||
| bufferPosition = state[6]; | ||
| super.setStateInternal(c[1]); | ||
| rand10(); |
There was a problem hiding this comment.
Calling rand10 would change the state and prevent restoring to exactly the same state saved by getStateInternal.
There was a problem hiding this comment.
The call to rand10() is actually there to restore the buffer. rand10() only updates the buffer, based on the current state (key and counter). I added a comment in the code.
| final Philox4x32 copy = copy(); | ||
| incrementCounter(1L << 32); | ||
| rand10(); | ||
| resetCachedState(); |
There was a problem hiding this comment.
Why call resetCachedState here but not in the longJump?
There was a problem hiding this comment.
it should indeed be called in both places.
| * @param seed key0,key1,counter0,counter1,counter2,counter3. | ||
| */ | ||
| @SuppressWarnings("PMD.AvoidLiteralsInIfCondition") | ||
| public Philox4x32(int... seed) { |
There was a problem hiding this comment.
All these constructors are not required. Just a single constructor accepting int[] would be fine.
This can be simplified to:
input = seed.length < 6 ? Arrays.copyOf(seed, 6) : seed;
key0 = input[0]
...
counter3 = input[5];
bufferPosition = PHILOX_BUFFER_SIZE;There was a problem hiding this comment.
much nicer indeed!
| * @param seed key for Philox | ||
| * @param subsequence counter third and fourth ints. | ||
| */ | ||
| public void resetState(long seed, long subsequence) { |
There was a problem hiding this comment.
The resetState, setOffset and getOffset method are unique to this class and do not fit within the current API of the RNGs in the library. I am not sure if this support is required other than to demonstrate the arbitrary jumping ability of the RNG.
In the case of the jump(long n) method it may require some thought for a new addition to the JumpableUniformRandomProvider interface for configurable jump size.
There was a problem hiding this comment.
Yes. The idea was to provide a more user friendly interface to the "raw" key and counter. But I get your point. It may create ambiguities.
The jump(long) is indeed an attractive feature of those counter based prngs. But a clever user may implement the same directly on the int[] constructor (with the caveat that jumps need to be a multiple of 4) then. It also look like the Interface RandomGenerator.ArbitrarilyJumpableGenerator interface would be more appropriate to implement, as you suggested.
|
|
||
| import org.apache.commons.rng.UniformRandomProvider; | ||
| import org.apache.commons.rng.RestorableUniformRandomProvider; | ||
| import org.apache.commons.rng.UniformRandomProvider; |
There was a problem hiding this comment.
The imports here have been sorted. I am trying to understand when the order in master currently represents. It seems to be partly in historical order of when providers were added. I don't think this is needed but it would be best to remove the reordering here to make the PR addition more clear and I can fix the order in master separately.
| */ | ||
| PHILOX_4X32(ProviderBuilder.RandomSourceInternal.PHILOX_4X32), | ||
| /** | ||
| * Source of randomness is {@link org.apache.commons.rng.core.source32.Philox4x32}. |
| */ | ||
| @Override | ||
| public int next() { | ||
| if (bufferPosition < PHILOX_BUFFER_SIZE) { |
There was a problem hiding this comment.
There are some strange performance issues on JVMs with the use of post increments. This is something to be avoided when working with arrays if the same thing can be achieved with a pre-increment counter (offset by 1 for the range checks) or use of a temp counter.
I wonder if this would be more performant:
final int p = bufferPosition;
if (p < PHILOX_BUFFER_SIZE) {
bufferPosition = p + 1;
return buffer[p];
}| k1 += PHILOX_W1; | ||
| singleRound(buffer, k0, k1); | ||
| } | ||
|
|
| int origValue = rngOrig.nextInt(); | ||
| assertEquals(origValue, jumpValue); | ||
| } | ||
|
|
|
I've had a bit more time to review this in comparison to the other jumpable RNGs. The smallest jump supported by others is 2^64. The jumps are all documents in the same way. So you can get an idea of the jump size using: I think the Philox 32-bit generator can be updated to jump 42^64 and 42^96. This would make the jumps similar to those provided by the AbstractXoShiRo128 generators: XoShiRo128Plus, XoShiRo128PlusPlus, XoShiRo128StarStar. This size of jump is much easier to implement. The small jump increments counter2 and overflows to counter3; the long jump increments counter3. A similar change in the 64-bit generator would have jumps of 42^128 and 42^196. The purpose of long jumping is to move far enough ahead to be able to create a stream of jumpable generators, where each jumpable generator can stream effectively unlimited RNGs which will not overlap within reason. A jump size of 42^32 creates a generator that could overlap with the next one in a long running use of RNG output. The next size up of 42^64 for each output would require significantly more time to overlap its sequence. If you can copy the javadoc wording from the other generators to state the jump size and expected output length of each child generator that would be helpful. As for configurable jumping I believe we require some interface similar to the JDK's: |
|
Thanks for your detailed comments. I have attempted to address most of them in the latest commit. |
aherbert
left a comment
There was a problem hiding this comment.
Thanks for the update. I noticed the error with resetting the state is still present after long jump of the 32-bit RNG.
The library typically does not support many constructors for each RNG. It is expected that a RNG has only a single constructor to accept the full length seed. Typical use of the generators would be through the commons-rng-simple package where you can create a generator with all types of seeds using the seed conversion routines.
I would remove all but the array constructor. Then put the full seed for the expected output sequence into the unit test. See for example the other tests in the source. One way to test a few seeds is to stream the seed and expected output to a parameterized test. See for example the L64X256MixTest. Your case would require the seeds 1234 and 67280421310721L.
| final Philox4x32 copy = copy(); | ||
| counter3++; | ||
| rand10(); | ||
| return copy; |
There was a problem hiding this comment.
Still missing a call to resetCachedState()
| @Test | ||
| void testLongJumpCounter() { | ||
| Philox4x64 rng = new Philox4x64(new long[]{1234L, 0, 0xffffffffffffffffL, 0, 0xffffffffffffffffL, 0}); | ||
| UniformRandomProvider rngOrig = rng.jump(); |
| @Test | ||
| void testLongJumpCounter() { | ||
| Philox4x32 rng = new Philox4x32(new int[]{1234, 0, 0xffffffff, 0xffffffff, 0xffffffff, 0}); | ||
| UniformRandomProvider rngOrig = rng.longJump(); |
| @Test | ||
| void testJumpCounter() { | ||
| Philox4x32 rng = new Philox4x32(new int[]{1234, 0, 0xffffffff, 0xffffffff, 0xffffffff, 0}); | ||
| UniformRandomProvider rngOrig = rng.jump(); |
| } | ||
|
|
||
| @Test | ||
| void testDouble() { |
There was a problem hiding this comment.
This test is not required.
Note that this test is doomed by using a xor of the first 21-bits for the upper (v) and the full 32-bits of the lower (w) when comparing to the nextDouble method which composes the most significant 26 from v with 27 from w.
Anyway remove this redundant test.
|
|
||
| @Test | ||
| void testInternalCounter() { | ||
| //test of incrementCounter |
There was a problem hiding this comment.
These tests could use: TestUtils.assertNextIntEquals. Matching a single output is unlikely but possible with different seeds. Matching a sequence of output is increasingly unlikely with length.
|
|
||
| @Test | ||
| void testInternalCounter() { | ||
| //test of incrementCounter |
There was a problem hiding this comment.
These tests could use: TestUtils.assertNextLongEquals. Matching a single output is unlikely but possible with different seeds. Matching a sequence of output is increasingly unlikely with length.
| case SPLIT_MIX_64: | ||
| case TWO_CMRES: | ||
| case TWO_CMRES_SELECT: | ||
| case PHILOX_4X32: |
There was a problem hiding this comment.
This generator has a native seed size of 6.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #191 +/- ##
============================================
+ Coverage 99.69% 99.71% +0.01%
+ Complexity 1495 810 -685
============================================
Files 152 154 +2
Lines 5996 6267 +271
Branches 564 576 +12
============================================
+ Hits 5978 6249 +271
Misses 12 12
Partials 6 6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I am fine to merge this when it passes the CI build. Currently it is failing on a malformed javadoc tag. I do not know why so it may just need reformatting to a single line. If you can run the default maven goal locally (using |
…rray initialization.
|
Sorry, the master branch run an invalid integration test. I've corrected the dependencies for that. If you rebase on master then hopefully this will pass CI on all tested JDKs. |
Dear commons-rng team,
This is a pull request to provide two new random number generators: philox4x32 and philox4x64 from https://www.thesalmons.org/john/random123/
Those are quite standard nowadays (part of CUDA, Numpy, default for Pytorch on GPU, default for TensorFlow) and there is no official Java implementation. The proposed implementation matches Numpy and Python's randomgen numbers. It passes all commons-rng unit tests.
Counter-based PRNGs are great for parallelization, as skipping-ahead is nearly as fast as a single random number generation, and the counter makes it very easy to create subsequences. One has complete control then on how to split.
In addition I also have added two new methods to convert a long to a double or two ints two a double in the (0,1) open interval. This is useful for many applications such as Monte-Carlo simulations, where uniform random numbers are mapped to some distribution. One common technique is to rely on the inverse cumulative distribution function, which is only defined on the open interval.
Best wishes,
Jherek
P.S.: below are some performance numbers:
As you can see they are not the fastest generators, but are for most purposes fast enough and the ease and flexibility of parallelization makes up for the generation speed. The 64-bit version could be faster with Math.unsignedMultipliedHigh of Java 19. I however found it problematic to use an MR-Jar (which would be ideal for this) with Jacoco.