
autotorch.scheduler

FIFOScheduler

class autotorch.scheduler.FIFOScheduler(train_fn, args=None, resource=None, searcher=None, search_options=None, checkpoint='./exp/checkpoint.ag', resume=False, num_trials=None, time_out=None, max_reward=1.0, time_attr='epoch', reward_attr='accuracy', visualizer='none', dist_ip_addrs=None)[source]

Simple scheduler that just runs trials in submission order.

Parameters
  • train_fn (callable) – A task launch function for training. Note: please add the @autotorch_method decorator to the original function.

  • args (object (optional)) – Default arguments for launching train_fn.

  • resource (dict) – Computation resources. For example, {'num_cpus': 2, 'num_gpus': 1}

  • searcher (str or object) – Autotorch searcher. For example, autotorch.searcher.RandomSampling

  • time_attr (str) – A training result attr to use for comparing time. Note that you can pass in something non-temporal such as training_epoch as a measure of progress, the only requirement is that the attribute should increase monotonically.

  • reward_attr (str) – The training result objective value attribute. As with time_attr, this may refer to any objective value. Stopping procedures will use this attribute.

  • dist_ip_addrs (list of str) – IP addresses of remote machines.

Examples

>>> import numpy as np
>>> import autotorch as at
>>> @at.args(
...     lr=at.Real(1e-3, 1e-2, log=True),
...     wd=at.Real(1e-3, 1e-2))
... def train_fn(args, reporter):
...     print('lr: {}, wd: {}'.format(args.lr, args.wd))
...     for e in range(10):
...         dummy_accuracy = 1 - np.power(1.8, -np.random.uniform(e, 2*e))
...         reporter(epoch=e, accuracy=dummy_accuracy, lr=args.lr, wd=args.wd)
>>> scheduler = at.scheduler.FIFOScheduler(train_fn,
...                                        resource={'num_cpus': 2, 'num_gpus': 0},
...                                        num_trials=20,
...                                        reward_attr='accuracy',
...                                        time_attr='epoch')
>>> scheduler.run()
>>> scheduler.join_jobs()
>>> scheduler.get_training_curves(plot=True)
add_job(task, **kwargs)[source]

Add a training task to the scheduler.

Parameters

task (autotorch.scheduler.Task) – a new training task

Relevant entries in kwargs:
  • bracket: HB bracket to be used. Has been sampled in _promote_config

  • new_config: If True, task starts new config eval, otherwise it promotes a config (only if type == 'promotion')

Only if new_config == False:
  • config_key: Internal key for config

  • resume_from: config promoted from this milestone

  • milestone: config promoted to this milestone (next from resume_from)

get_best_config()[source]

Get the best configuration from the finished jobs.

get_best_reward()[source]

Get the best reward from the finished jobs.
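
For example, after the trials above have completed (a usage sketch; both calls are shown without their return values):

>>> best_config = scheduler.get_best_config()
>>> best_reward = scheduler.get_best_reward()
>>> print('best config: {}, best reward: {}'.format(best_config, best_reward))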

get_training_curves(filename=None, plot=False, use_legend=True)[source]

Get Training Curves

Parameters
  • filename (str) – If specified, save the figure to this file path.

  • plot (bool) – If True, display the plot.

  • use_legend (bool) – If True, add a legend to the plot.

Examples

>>> scheduler.run()
>>> scheduler.join_jobs()
>>> scheduler.get_training_curves(plot=True)
Example training-curves output: https://github.com/zhanghang1989/AutoGluonWebdata/blob/master/doc/api/autogluon.1.png?raw=true
load_state_dict(state_dict)[source]

Load from the saved state dict.

Examples

>>> scheduler.load_state_dict(at.load('checkpoint.ag'))
run(**kwargs)[source]

Run multiple trials

run_with_config(config)[source]

Run with config for final fit. It launches a single training trial under any fixed values of the hyperparameters. For example, after HPO has identified the best hyperparameter values based on a hold-out dataset, one can use this function to retrain a model with the same hyperparameters on all the available labeled data (including the hold-out set). It can also return other objects or states.
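
For instance, one might retrain with the best configuration found during the search (a sketch using only the calls documented on this page; what run_with_config returns is not specified here):

>>> best_config = scheduler.get_best_config()
>>> scheduler.run_with_config(best_config)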

save(checkpoint=None)[source]

Save Checkpoint
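
For example (a sketch; if checkpoint is None, the path given at construction is presumably used):

>>> scheduler.save(checkpoint='checkpoint.ag')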

schedule_next()[source]

Schedule the next task suggested by the searcher

state_dict(destination=None)[source]

Returns a dictionary containing the whole state of the Scheduler

Examples

>>> at.save(scheduler.state_dict(), 'checkpoint.ag')

HyperbandScheduler

class autotorch.scheduler.HyperbandScheduler(train_fn, args=None, resource=None, searcher=None, search_options=None, checkpoint='./exp/checkpoint.ag', resume=False, num_trials=None, time_out=None, max_reward=1.0, time_attr='epoch', reward_attr='accuracy', max_t=100, grace_period=10, reduction_factor=4, brackets=1, visualizer='none', type='stopping', dist_ip_addrs=None, keep_size_ratios=False, maxt_pending=False)[source]

Implements different variants of asynchronous Hyperband

See 'type' for the different variants. One implementation detail: when multiple brackets are used, tasks are assigned to brackets randomly, based on a softmax probability.

Parameters
  • train_fn (callable) – A task launch function for training.

  • args (object, optional) – Default arguments for launching train_fn.

  • resource (dict) – Computation resources. For example, {'num_cpus': 2, 'num_gpus': 1}

  • searcher (object, optional) – Autotorch searcher. For example, autotorch.searcher.RandomSearcher

  • time_attr (str) – A training result attr to use for comparing time. Note that you can pass in something non-temporal such as training_epoch as a measure of progress, the only requirement is that the attribute should increase monotonically.

  • reward_attr (str) – The training result objective value attribute. As with time_attr, this may refer to any objective value. Stopping procedures will use this attribute.

  • max_t (float) – Maximum time units per task. Trials will be stopped after max_t time units (determined by time_attr) have passed.

  • grace_period (float) – Only stop tasks that are at least this old (also known as min_t). The units are the same as the attribute named by time_attr.

  • reduction_factor (float) – Used to set halving rate and amount. This is simply a unit-less scalar.

  • brackets (int) – Number of brackets. Each bracket has a different grace period, all share max_t and reduction_factor. If brackets == 1, we just run successive halving, for brackets > 1, we run Hyperband.

  • type (str) – Type of Hyperband scheduler:

    stopping:

    See HyperbandStopping_Manager. Tasks and config evals are tightly coupled. A task is stopped at a milestone if worse than most others, otherwise it continues. As implemented in Ray/Tune: https://ray.readthedocs.io/en/latest/tune-schedulers.html#asynchronous-hyperband

    promotion:

    See HyperbandPromotion_Manager. A config eval may be associated with multiple tasks over its lifetime. It is never terminated, but may be paused. Whenever a task becomes available, it may promote a config to the next milestone, if better than most others. If no config can be promoted, a new one is chosen. This variant may benefit from pause&resume, which is not directly supported here. As proposed in this paper (termed ASHA): https://arxiv.org/abs/1810.05934

  • keep_size_ratios (bool) – Implemented for type 'promotion' only. If True, promotions are done only if the (current estimate of the) size ratio between a rung and the next rung is 1 / reduction_factor or better. This prevents higher rungs from becoming more populated than they would be in synchronous Hyperband. A drawback is that promotions to higher rungs take longer.

  • maxt_pending (bool) – Relevant only if a model-based searcher is used. If True, register pending config at level max_t whenever a new evaluation is started. This has a direct effect on the acquisition function (for model-based variant), which operates at level max_t. On the other hand, it decreases the variance of the latent process there. NOTE: This could also be removed…

  • dist_ip_addrs (list of str) – IP addresses of remote machines.

Examples

>>> import numpy as np
>>> import autotorch as at
>>>
>>> @at.args(
...     lr=at.Real(1e-3, 1e-2, log=True),
...     wd=at.Real(1e-3, 1e-2))
... def train_fn(args, reporter):
...     print('lr: {}, wd: {}'.format(args.lr, args.wd))
...     for e in range(10):
...         dummy_accuracy = 1 - np.power(1.8, -np.random.uniform(e, 2*e))
...         reporter(epoch=e, accuracy=dummy_accuracy, lr=args.lr, wd=args.wd)
>>> scheduler = at.scheduler.HyperbandScheduler(train_fn,
...                                             resource={'num_cpus': 2, 'num_gpus': 0},
...                                             num_trials=20,
...                                             reward_attr='accuracy',
...                                             time_attr='epoch',
...                                             grace_period=1)
>>> scheduler.run()
>>> scheduler.join_jobs()
>>> scheduler.get_training_curves(plot=True)
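
The example above uses the default type='stopping'. A sketch of the ASHA-style 'promotion' variant, built from the documented constructor arguments only:

>>> scheduler = at.scheduler.HyperbandScheduler(train_fn,
...                                             resource={'num_cpus': 2, 'num_gpus': 0},
...                                             num_trials=20,
...                                             reward_attr='accuracy',
...                                             time_attr='epoch',
...                                             grace_period=1,
...                                             reduction_factor=4,
...                                             type='promotion',
...                                             keep_size_ratios=True)
>>> scheduler.run()
>>> scheduler.join_jobs()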
add_job(task, **kwargs)[source]

Add a training task to the scheduler.

Parameters

task (autotorch.scheduler.Task) – a new training task

Relevant entries in kwargs:

  • bracket: HB bracket to be used. Has been sampled in _promote_config

  • new_config: If True, task starts new config eval, otherwise it promotes a config (only if type == 'promotion')

Only if new_config == False:

  • config_key: Internal key for config

  • resume_from: config promoted from this milestone

  • milestone: config promoted to this milestone (next from resume_from)

load_state_dict(state_dict)[source]

Load from the saved state dict.

Examples

>>> scheduler.load_state_dict(at.load('checkpoint.ag'))
state_dict(destination=None)[source]

Returns a dictionary containing the whole state of the Scheduler

Examples

>>> at.save(scheduler.state_dict(), 'checkpoint.ag')