Random search

From HandWiki

Random search (RS) is a family of numerical optimization methods that do not require the gradient of the function being optimized, so RS can be used on functions that are not continuous or differentiable. Such optimization methods are also known as direct-search, derivative-free, or black-box methods. In 1953, Anderson reviewed progress in methods for finding the maximum or minimum of a problem using a series of guesses distributed with a certain order or pattern in the parameter search space, e.g. a confounded design with exponentially distributed spacings/steps.[1] The search proceeds sequentially on each parameter and iteratively refines the best guesses from the previous sequence. The pattern can be a grid (factorial) search of all parameters, a sequential search on each parameter, or a combination of both. The method was developed to screen experimental conditions in chemical reactions by a number of scientists listed in Anderson's paper. MATLAB code reproducing the sequential procedure for general non-linear regression of an example mathematical model is available at JCFit @ GitHub.[2]

The name "random search" is attributed to Rastrigin,[3] who gave an early presentation of RS along with a basic mathematical analysis. RS works by iteratively moving to better positions in the search space, which are sampled from a hypersphere surrounding the current position.

The algorithm described herein is a type of local random search, where every iteration is dependent on the prior iteration's candidate solution. There are alternative random search methods that sample from the entirety of the search space (for example pure random search or uniform global random search), but these are not described in this article.

Random search has been used for hyper-parameter optimization of artificial neural networks.[4]

If the good parts of the search space occupy 5% of its volume, the chance that a single randomly sampled configuration is good is 5%. The probability of finding at least one good configuration exceeds 95% after trying 60 independent configurations (1 − 0.95⁶⁰ ≈ 0.954 > 0.95, using the complementary probability).
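The calculation above can be reproduced with a short Python sketch. It assumes that configurations are drawn independently and uniformly from the search space; the function names `prob_at_least_one_hit` and `trials_needed` are illustrative and do not come from the cited paper.

```python
import math


def prob_at_least_one_hit(p_good: float, n_trials: int) -> float:
    """Probability that at least one of n independent samples is 'good'."""
    # Complement of the event "every one of the n samples misses".
    return 1.0 - (1.0 - p_good) ** n_trials


def trials_needed(p_good: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - p_good)^n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


if __name__ == "__main__":
    print(prob_at_least_one_hit(0.05, 60))  # probability after 60 trials
    print(trials_needed(0.05, 0.95))        # minimum number of trials for 95%
```

Under these assumptions 59 trials are already enough to pass the 95% threshold, so the figure of 60 quoted above is a convenient round number and not the exact minimum.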

Algorithm

Let f: ℝⁿ → ℝ be the fitness or cost function which must be minimized. Let x ∈ ℝⁿ designate a position or candidate solution in the search-space. The basic RS algorithm can then be described as:

  1. Initialize x with a random position in the search-space.
  2. Until a termination criterion is met (e.g. number of iterations performed, or adequate fitness reached), repeat the following:
    1. Sample a new position y from the hypersphere of a given radius surrounding the current position x (see e.g. Marsaglia's technique for sampling a hypersphere).
    2. If f(y) < f(x) then move to the new position by setting x = y
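The steps above can be sketched in Python as follows. This is a minimal illustration and not a reference implementation: the names `sample_on_hypersphere` and `random_search`, the default radius, and the fixed iteration budget are all assumptions. The direction is drawn by normalizing a vector of independent Gaussian values, a common way to obtain a uniformly distributed direction, which may or may not match the technique referred to in step 2.1. The starting point is passed in by the caller, who is expected to choose it at random as in step 1.

```python
import math
import random


def sample_on_hypersphere(center, radius, rng):
    """Return a point at distance `radius` from `center` in a uniform random direction."""
    while True:
        direction = [rng.gauss(0.0, 1.0) for _ in center]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            break
    return [c + radius * d / norm for c, d in zip(center, direction)]


def random_search(f, x0, radius=0.1, max_iters=1000, seed=0):
    """Fixed-radius local random search that minimizes f, starting from x0."""
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    for _ in range(max_iters):          # termination: iteration budget (assumed)
        y = sample_on_hypersphere(x, radius, rng)
        fy = f(y)
        if fy < fx:                     # move only if the new position is better
            x, fx = y, fy
    return x, fx


if __name__ == "__main__":
    def sphere(x):
        return sum(v * v for v in x)

    start_rng = random.Random(42)
    start = [start_rng.uniform(-5.0, 5.0) for _ in range(2)]  # step 1: random start
    best_x, best_f = random_search(sphere, start, radius=0.1, max_iters=2000)
    print(best_x, best_f)
```

Because the radius is fixed, the search cannot improve once the current position is closer to the optimum than about half the radius, which is one motivation for the step-size variants discussed in this article.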

Variants

Scheme of random search using a non-linear regression problem as an example. The goal is to minimize the value of the penalty function. The bottom right shows a few example methods: 1. non-structured random search, 2. structured random search, 3. Gauss-Newton algorithm, and 4. Levenberg-Marquardt algorithm. Methods 1 and 2 do not need the gradient, whereas methods 3 and 4 must calculate the gradient and usually minimize over both the A and k parameters at the same time (the scheme shows only the k dimension).

Truly random search relies purely on luck, and its cost varies from very high to very low, whereas structured random search is strategic. A number of RS variants with structured sampling of the search space have been introduced in the literature:

  • Friedman-Savage procedure: Sequentially search each parameter with a set of guesses that follow a spacing pattern between the initial guess and the boundaries.[5] An example with exponentially distributed steps can be found in MATLAB code at JCFit @ GitHub.[2] This example code converges 1-2 orders of magnitude slower than the Levenberg–Marquardt algorithm, an example of which is also provided in the GitHub repository.
  • Fixed Step Size Random Search (FSSRS) is Rastrigin's [3] basic algorithm which samples from a hypersphere of fixed radius.
  • Optimum Step Size Random Search (OSSRS) by Schumer and Steiglitz [6] is primarily a theoretical study on how to optimally adjust the radius of the hypersphere so as to allow for speedy convergence to the optimum. The actual implementation of the OSSRS needs to approximate this optimal radius by repeated sampling and is therefore expensive to execute.
  • Adaptive Step Size Random Search (ASSRS) by Schumer and Steiglitz [6] attempts to heuristically adapt the hypersphere's radius: two new candidate solutions are generated, one with the current nominal step size and one with a larger step-size. The larger step size becomes the new nominal step size if and only if it leads to a larger improvement. If for several iterations neither of the steps leads to an improvement, the nominal step size is reduced.
  • Optimized Relative Step Size Random Search (ORSSRS) by Schrack and Choit [7] approximates the optimal step size by a simple exponential decrease. However, the formula for computing the decrease factor is somewhat complicated.
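The adaptive step-size idea described for ASSRS can be illustrated with the following Python sketch. It follows only the verbal description given in the list above and is not the procedure from Schumer and Steiglitz's paper: the growth factor `grow`, the reduction factor `shrink`, the failure limit `patience`, and the stopping threshold `min_step` are hypothetical parameters chosen for illustration.

```python
import math
import random


def _step(center, radius, rng):
    """Point at distance `radius` from `center` in a random direction."""
    d = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(v * v for v in d)) or 1.0  # avoid division by zero
    return [c + radius * v / norm for c, v in zip(center, d)]


def adaptive_random_search(f, x0, step=1.0, grow=1.5, shrink=0.5,
                           patience=5, max_iters=2000, min_step=1e-12, seed=0):
    """Random search with a heuristically adapted step size (illustrative only)."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    failures = 0
    for _ in range(max_iters):
        # Two candidates: one with the nominal step, one with a larger step.
        y_nom = _step(x, step, rng)
        y_big = _step(x, step * grow, rng)
        f_nom, f_big = f(y_nom), f(y_big)
        if f_big < fx and f_big < f_nom:
            # The larger step gave the larger improvement: adopt it as nominal.
            x, fx = y_big, f_big
            step *= grow
            failures = 0
        elif f_nom < fx:
            x, fx = y_nom, f_nom
            failures = 0
        else:
            failures += 1
            if failures >= patience:
                # Neither step helped for several iterations: reduce the step.
                step *= shrink
                failures = 0
        if step < min_step:
            break
    return x, fx, step


if __name__ == "__main__":
    def sphere(x):
        return sum(v * v for v in x)

    print(adaptive_random_search(sphere, [5.0, -3.0]))
```

In this sketch the step size shrinks as the search approaches the optimum, so the final position is typically much closer to the minimum than a fixed-radius search with the same initial step would reach.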

References

  1. Anderson, R.L. (1953). "Recent Advances in Finding Best Operating Conditions". Journal of the American Statistical Association 48 (264): 789–798. doi:10.2307/2281072.
  2. "GitHub - Jixin Chen/jcfit: A Random Search Algorithm for general mathematical model(s) fittings". https://github.com/nkchenjx/jcfit.
  3. Rastrigin, L.A. (1963). "The convergence of the random search method in the extremal control of a many parameter system". Automation and Remote Control 24 (11): 1337–1342. https://archive.org/details/sim_automation-and-remote-control_1963-11_24_11/page/n1/mode/2up?view=theater. Retrieved 30 November 2021. "1964 translation of Russian Avtomat. i Telemekh pages 1467–1473".
  4. Bergstra, J.; Bengio, Y. (2012). "Random search for hyper-parameter optimization". Journal of Machine Learning Research 13: 281–305. https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf.
  5. Friedman, M.; Savage, L.J. (1947). Planning experiments seeking maxima, chapter 13 of Techniques of Statistical Analysis, edited by Eisenhart, Hastay, and Wallis. McGraw-Hill Book Co., New York. pp. 363–372. https://miltonfriedman.hoover.org/internal/media/dispatcher/214332/full. Retrieved 30 November 2021.
  6. Schumer, M.A.; Steiglitz, K. (1968). "Adaptive step size random search". IEEE Transactions on Automatic Control 13 (3): 270–276. doi:10.1109/tac.1968.1098903.
  7. Schrack, G.; Choit, M. (1976). "Optimized relative step size random searches". Mathematical Programming 10 (1): 230–244. doi:10.1007/bf01580669.