* Fix the bug in reward weight setting when a checkpoint is loaded
* Create reward weights as buffers
* Revert deletion
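
For context, here is a minimal sketch of what registering reward weights as buffers could look like, assuming the project uses PyTorch. The class name `RewardCombiner` and the buffer name `reward_weights` are hypothetical and not taken from this change. The sketch only illustrates that a registered buffer is part of `state_dict()` and is therefore restored by `load_state_dict()`, while not being treated as a trainable parameter.

```python
import torch
from torch import nn


class RewardCombiner(nn.Module):
    """Hypothetical module that combines several reward terms with fixed weights."""

    def __init__(self, reward_weights):
        super().__init__()
        # A buffer is stored in state_dict() and moved by .to(device),
        # but it is not returned by parameters(), so optimizers ignore it.
        self.register_buffer(
            "reward_weights",
            torch.tensor(reward_weights, dtype=torch.float32),
        )

    def forward(self, rewards):
        # rewards has shape (..., num_rewards); returns the weighted sum.
        return (rewards * self.reward_weights).sum(dim=-1)


# Weights chosen at training time (illustrative values).
trained = RewardCombiner([1.0, 0.5])

# A freshly constructed module with different defaults picks up the
# saved weights when the state dict is loaded.
restored = RewardCombiner([0.0, 0.0])
restored.load_state_dict(trained.state_dict())
```

With a plain tensor attribute instead of a buffer, the weights would not appear in the state dict, so the module would keep whatever values it was constructed with after loading a checkpoint.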