Value-based decision making often involves integration of reward outcomes over time, but this becomes considerably more challenging if reward assignments on alternative options are probabilistic and non-stationary. Despite the existence of various models for optimally integrating reward under uncertainty, the underlying neural mechanisms are still unknown. Here we propose that reward-dependent metaplasticity (RDMP) can provide a plausible mechanism for both integration of reward under uncertainty and estimation of uncertainty itself. We show that a model based on RDMP can robustly perform the probabilistic reversal learning task via dynamic adjustment of learning based on reward feedback, while changes in its activity signal unexpected uncertainty. The model predicts time-dependent and choice-specific learning rates that strongly depend on reward history. Key predictions from this model were confirmed with behavioral data from non-human primates. Overall, our results suggest that metaplasticity can provide a neural substrate for adaptive learning and choice under uncertainty.