torch.optim

Optimization algorithms.

All optimizers take an iterable of parameters (typically model.parameters()) plus optimizer-specific options such as a learning rate; calling step() updates the weights using the gradients computed by a backward pass.

Basic Usage

python
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.01)

# Inside the training loop:
optimizer.zero_grad()   # clear gradients from the previous step
loss.backward()         # compute gradients for this step
optimizer.step()        # update the weights
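To make the loop above concrete, here is a minimal self-contained sketch that fits a toy linear model to random data; the model, loss, and data are illustrative placeholders, not part of the API being documented.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy setup: a linear model fit to random targets with mean-squared error.
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(16, 4)
targets = torch.randn(16, 1)

for epoch in range(5):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(model(inputs), targets)   # forward pass
    loss.backward()                            # populate .grad on each parameter
    optimizer.step()                           # apply the update
```

The same three-call pattern (zero_grad, backward, step) applies to every optimizer in torch.optim.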

Optimizers

torch.optim.SGD

python
optim.SGD(params, lr=0.001, momentum=0, dampening=0, weight_decay=0, nesterov=False, maximize=False)

Stochastic Gradient Descent.

Parameters

params (list): List of parameters to optimize (typically from model.parameters()).
lr (float, default 0.001): Learning rate.
momentum (float, default 0): Momentum factor.
dampening (float, default 0): Dampening for momentum.
weight_decay (float, default 0): Weight decay (L2 penalty).
nesterov (bool, default False): Enables Nesterov momentum.
maximize (bool, default False): Maximize the params based on the objective, instead of minimizing.

Example

python
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
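A quick way to see the update rule in action: with no momentum or weight decay, one step applies w_new = w - lr * grad. The sketch below checks this on a single scalar parameter.

```python
import torch
import torch.optim as optim

# Plain SGD (no momentum): one step should compute w - lr * grad exactly.
w = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()   # d(loss)/dw = 2w = 2.0
loss.backward()
optimizer.step()        # w becomes 1.0 - 0.1 * 2.0 = 0.8
```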

torch.optim.Adam

python
optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0, amsgrad=False, maximize=False)

Implements the Adam (adaptive moment estimation) algorithm.

Parameters

params (list): List of parameters to optimize.
lr (float, default 0.001): Learning rate.
betas (tuple, default (0.9, 0.999)): Coefficients for computing running averages of the gradient and its square.
eps (float, default 1e-8): Numerical stability term.
weight_decay (float, default 0.0): Weight decay (L2 penalty).
amsgrad (bool, default False): Whether to use the AMSGrad variant.
maximize (bool, default False): Maximize the params based on the objective, instead of minimizing.

Example

python
optimizer = optim.Adam(model.parameters(), lr=0.001)
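Besides a flat parameter list, torch.optim optimizers also accept a list of parameter groups, each with its own options. A sketch with a toy two-layer model (the model and learning rates here are illustrative):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Per-parameter-group options: different learning rates per layer.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 1))
optimizer = optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-3},  # first layer
    {"params": model[1].parameters(), "lr": 1e-4},  # second layer, smaller lr
])
```

Options not set in a group fall back to the defaults passed to the optimizer constructor.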

Methods

All optimizers share these methods:

.step()

Updates all parameters using their stored gradients.

.zero_grad()

Resets the gradients of all tracked parameters. In recent PyTorch releases this sets each .grad to None by default (set_to_none=True); pass set_to_none=False to zero the gradient tensors in place instead. Call it before each backward pass so gradients do not accumulate across steps.
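The reason this call matters: backward() accumulates into .grad rather than overwriting it. A small demonstration on a single tensor (using the tensor-level .grad.zero_() to show what the reset does):

```python
import torch

# backward() adds to .grad; without a reset, gradients pile up across calls.
w = torch.tensor([1.0], requires_grad=True)

(w * 3).sum().backward()   # grad = 3
(w * 3).sum().backward()   # grad accumulates to 6
accumulated = w.grad.clone()

w.grad.zero_()             # the per-parameter effect of a gradient reset
```

Skipping the reset is only correct when you deliberately want gradient accumulation, e.g. to simulate a larger batch size over several backward passes.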