* Add MuZero configuration for MetaDrive
* Add the training curve
* Address comments
* Use LSTM for reward prediction