Stop using numpy.random.seed() | Integrated


Using np.random.seed(number) has long been a best practice for making NumPy work reproducible. Setting the random seed means that others who use your code can reproduce your results. But if you now look at the docs for np.random.seed, the description reads as follows:

This is a convenience, legacy function.

The best practice is to not reseed a BitGenerator, rather to recreate a new one. This method is here for legacy reasons.

So what has changed? I will explain the old method and the problems with it. Then I will demonstrate the new best practice and its benefits.

Stop Using NumPy’s Global Random Seed – Here’s Why

Using np.random.seed(number) defines what NumPy calls the global random seed, which affects all uses of the np.random.* module. Some imported packages or other scripts may reset the global random seed to another random seed with np.random.seed(another_number), which may cause unwanted changes to your output and make your results unreproducible.

Legacy best practice

If you look for tutorials that use np.random, you will see many of them call np.random.seed to lay the foundation for reproducible work. Here is how it works:

>>> import numpy as np
>>> np.random.rand(4)
array([0.96176779, 0.7088082 , 0.06416725, 0.82679036])

>>> np.random.rand(4)
array([0.15051909, 0.77788803, 0.67073372, 0.32134285])

As you can see, two calls to the function return two completely different arrays. If you want someone to be able to reproduce your results, you can set the seed with the following code snippet:

>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])


>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])

You can see that the results are the same. If you need to prove it to yourself, you can enter the code above into your Python interpreter.

Setting the seed does more than make the next random call repeat; it defines the entire random number sequence, so any code that produces or uses random numbers (with NumPy) will now produce the same sequence of numbers. For example, look at the following:

>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
>>> np.random.rand(4)
array([0.99724328, 0.12816238, 0.17899311, 0.75292543])
>>> np.random.rand(4)
array([0.66216051, 0.78431013, 0.0968944 , 0.05857129])
>>> np.random.rand(4)
array([0.96239599, 0.61655744, 0.08662996, 0.56127236])
>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
>>> np.random.rand(4)
array([0.99724328, 0.12816238, 0.17899311, 0.75292543])
>>> np.random.rand(4)
array([0.66216051, 0.78431013, 0.0968944 , 0.05857129])
>>> np.random.rand(4)
array([0.96239599, 0.61655744, 0.08662996, 0.56127236])

The problem with NumPy’s global random seed

You might be looking at the example above and thinking, “so what’s the problem?” You can create repeatable calls, which means that all random numbers generated after setting the seed will be the same on any machine. For the most part, that’s true; and for many projects, you may not need to worry about it.

The problem arises in larger projects, or in projects with imports that might also set the seed. Using np.random.seed(number) sets what NumPy calls the global random seed, which affects all uses of the np.random.* module. Some imported packages or other scripts may reset the global random seed to another value with np.random.seed(another_number), which can cause unwanted changes to your output and make your results non-reproducible. In most cases, you only need the same random numbers in specific parts of your code (such as tests or functions).
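The failure mode described above can be reproduced in a few lines. This is a minimal sketch: third_party_helper is a hypothetical stand-in for any imported code that calls np.random.seed, and the seed value 0 is an arbitrary assumption.

```python
import numpy as np

def third_party_helper():
    # Hypothetical stand-in for imported code that reseeds the global state.
    np.random.seed(0)

np.random.seed(2021)
expected = np.random.rand(4)   # the sequence you intend to reproduce

np.random.seed(2021)
third_party_helper()           # silently replaces your seed
clobbered = np.random.rand(4)  # now drawn from the seed-0 sequence instead

print(np.array_equal(expected, clobbered))  # False
```

Nothing in your own code changed between the two runs, yet the numbers differ, because the helper and your script share one global generator.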

The solution and the new method

This is one of the reasons why NumPy decided to advise users to create a random number generator for specific tasks (or even pass it around when you need parts to be repeatable).

“The preferred best practice for getting repeatable pseudo-random numbers is to instantiate a generator object with a seed and pass it around.” -Robert Kern, NEP19

Using this new best practice looks like this:

>>> import numpy as np
>>> rng = np.random.default_rng(2021)
>>> rng.random(4)
array([0.75694783, 0.94138187, 0.59246304, 0.31884171])

As you can see, these numbers are different from the previous example because default_rng uses a different pseudo-random number generator (PCG64 rather than the legacy Mersenne Twister). However, you can replicate the old results using RandomState, which is the generator behind the legacy methods:

>>> rng = np.random.RandomState(2021)
>>> rng.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
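To check the equivalence programmatically rather than by comparing printed digits, a short sketch like the following can help. The variable names are illustrative only.

```python
import numpy as np

# Global, legacy-style seeding.
np.random.seed(2021)
from_global = np.random.rand(4)

# A private RandomState instance seeded with the same value.
legacy_rng = np.random.RandomState(2021)
from_instance = legacy_rng.rand(4)

print(np.array_equal(from_global, from_instance))  # True
```

This makes RandomState a convenient bridge when old results must be preserved: the numbers stay the same, but the state lives in an object you control instead of in the module.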

Benefits

You can pass random number generators between functions and classes, which means each class or function can have its own random state without resetting the global seed. Additionally, each script can pass a random number generator to the functions that need to be repeatable. The advantage is that you know exactly which random number generator is used in each part of your project.

import numpy as np

def f(x, rng):
    return rng.random(1)

# Initialise a random number generator
rng = np.random.default_rng(2021)
x = 1.0  # placeholder input; f ignores it in this toy example
# Pass the rng to the functions that should use it
random_number = f(x, rng)
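A quick way to convince yourself that a generator object is isolated from the global state is the sketch below. The names rng_a and rng_b are illustrative, and the global calls in the middle stand in for whatever other code might be doing.

```python
import numpy as np

rng_a = np.random.default_rng(2021)
rng_b = np.random.default_rng(2021)

first_a = rng_a.random(4)

# Reseeding or drawing from the global state does not touch rng_b,
# and neither does drawing from rng_a.
np.random.seed(0)
np.random.rand(100)

first_b = rng_b.random(4)

print(np.array_equal(first_a, first_b))  # True
```

Each generator carries its own state, so the only way to change what rng_b produces is to draw from rng_b itself.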

Other advantages arise from parallel processing, as Albert Thomas shows us.
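As one illustration of the parallel case, NumPy provides SeedSequence.spawn for deriving independent child seeds from a single root seed. The sketch below is not necessarily the approach the post referred to; the worker function, the number of workers, and the reuse of 2021 as the root seed are all assumptions made for the example, and the workers run sequentially here to keep it short.

```python
import numpy as np

def make_worker_rngs(root_seed, n_workers):
    # Derive one independent child seed per worker from a single root seed.
    children = np.random.SeedSequence(root_seed).spawn(n_workers)
    return [np.random.default_rng(child) for child in children]

def worker(rng):
    # Hypothetical task: each worker draws from its own stream.
    return rng.random(3)

results = [worker(rng) for rng in make_worker_rngs(2021, 4)]
rerun = [worker(rng) for rng in make_worker_rngs(2021, 4)]

# Same root seed, same per-worker streams.
print(all(np.array_equal(a, b) for a, b in zip(results, rerun)))  # True
```

In a real parallel job, each generator (or child seed) would be handed to a separate process, so the results do not depend on the order in which workers happen to run.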

Using independent random number generators can help improve the reproducibility of your results, because you no longer rely on the global random state (which can be reset or used without your knowledge). Passing around a random number generator means you can track when and how it was used and make sure your results are the same.
