- Aug 03, 2021
-
-
Break Yang authored
This allows running alf train, play and other programs from anywhere (no longer restricted to `alf/examples/`)
-
Break Yang authored
-
Neale Ratzlaff authored
This commit adds GPVI to alf. The main changes, to alf/algorithms/generator.py and the tests, are:
* Add _create_mvp_network method, which instantiates an encoding network to compute a matrix-vector product, used for computing the inverse Jacobian vector product in GPVI.
* Add InverseMVPAlgorithm to specify a training step of the above InverseMVP network.
* Add rkhs_func_grad function to generator.py. This is the main update routine for GPVI, and is related to the InverseMVPAlgorithm. I have not added a function value version of GPVI, nor have I added a minmax version. ``rkhs_func_grad`` is called whenever the generator argument ``functional_gradient`` is set to ``rkhs``. Right now, ``functional_gradient`` only supports a ReluMLP generator, as it relies on the fast ``compute_vjp`` function rather than autograd.
* Add GPVI tests to generator_test.py.
* Add GPVI tests to hypernetwork_test.py.
--Small Changes--
* ReluMLP method ``compute_vjp`` now returns the output of the forward evaluation, which is needed for the GPVI update.
* Added arguments for GPVI to generator.py, such as force_fullrank and fullrank_diag_weight.
* Added arguments for pinverse_network to hypernetwork_algorithm.py.
* Added arguments for GPVI to hypernetwork_algorithm.py.
* Addressing comments in PR discussion.
* Renamed pinverse network to InverseMVPNetwork (MVP stands for matrix-vector product).
* Added a test for InverseMVPNetwork that explicitly computes the inverse Jacobian vector product of an MLP with respect to some random input. This is evaluated against the solution found by training an InverseMVPNetwork.
* Implemented fixes and suggestions to generator and hypernetwork files.
* Changed naming of Pinverse network to the new InverseMVP network.
* Addressing PR comments.
* Removed InverseMVPNetwork file. Opted for EncodingNetwork as suggested by Wei.
* Refactored InverseMVP test accordingly, to show that the idea still works.
* Added helper function to generator.py to create this network.
* Added/changed docstrings in generator.py as suggested.
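The test mentioned in this commit compares a trained network against an explicitly computed inverse Jacobian vector product. As a rough illustration of what that explicit quantity is, the sketch below builds a tiny bias-free ReLU MLP in plain Python, forms its Jacobian at one input, and solves `J z = v` directly. The weights, the 2x2 size, and all helper names are assumptions made for this example; none of this is ALF code.

```python
# Illustrative only: a hypothetical 2-2-2 ReLU MLP, f(x) = W2 @ relu(W1 @ x).
W1 = [[1.0, 2.0], [-1.0, 1.0]]
W2 = [[2.0, 1.0], [0.0, 3.0]]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def jacobian(x):
    """Jacobian of the MLP at x: W2 @ diag(W1 @ x > 0) @ W1."""
    pre = matvec(W1, x)
    # Zero the rows of W1 whose ReLU unit is inactive at this input.
    masked_w1 = [[w if p > 0 else 0.0 for w in row] for row, p in zip(W1, pre)]
    return matmul(W2, masked_w1)


def solve_2x2(j, v):
    """Solve j @ z = v for a 2x2 matrix j using Cramer's rule."""
    (a, b), (c, d) = j
    det = a * d - b * c
    if det == 0:
        raise ValueError("Jacobian is singular at this input")
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


x = [1.0, 2.0]   # input at which the Jacobian is evaluated
v = [1.0, -1.0]  # vector to multiply by the inverse Jacobian
J = jacobian(x)
z = solve_2x2(J, v)  # z = J^{-1} v, the inverse Jacobian vector product
```

A learned approximation would be judged by how close its output is to `z`, or equivalently by how small the residual `J z - v` is.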
-
Break Yang authored
-
- Aug 02, 2021
-
-
Haonan Yu authored
* support multi-dim reward for AC and PPO
* address comments
* more updates
-
- Jul 30, 2021
-
-
Break Yang authored
* Add UnrollPerformer as the module being wrapped by DistributedDataParallel
* Enable DDP for on policy RLTrainer
-
Break Yang authored
-
- Jul 29, 2021
-
-
Break Yang authored
* Disable optimizer's mnemonic naming in algorithm.state_dict()
* Remove unnecessary commented out class
-
Break Yang authored
-
- Jul 28, 2021
-
-
Break Yang authored
-
- Jul 27, 2021
-
-
Haonan Yu authored
* chapter 2 of tutorial
* proper cross referencing
* fix syntax error
* address comments
-
Break Yang authored
[REFACTOR] train.py to consolidate common logic for both single GPU and multi GPU training (#913) (#944)
* [REFACTOR] train.py to consolidate common logic for both single GPU and multi GPU training
* Address Wei's comments
* Address Haonan's comments
* Specify authoritative url and port as well
* Remove unused Optional typing
-
- Jul 25, 2021
-
-
Break Yang authored
-
- Jul 23, 2021
-
-
Haonan Yu authored
-
- Jul 22, 2021
-
-
Qinxun Bai authored
* update oac algorithm with latest alf updates
* update oac test hyperparameters so the test passes more stably
-
Break Yang authored
This is part of the effort to unblock #913. There are two reasons for this change:
1. `worker` does not rely on `ProcessEnvironment` at all, so it is cleaner to make it independent of `ProcessEnvironment`.
2. If it stays as a member method of `ProcessEnvironment`, `multiprocessing.Process` gets stuck on `start()` when the parent process is also a `multiprocessing.Process`. The reason is unknown (I tried to investigate but have not figured it out).
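The structural change described here, moving the worker loop out of the class and into a free function, can be sketched as follows. This is a minimal illustration and not ALF's actual code: `CounterEnv`, `ProcessEnv`, the command names, and the message format are all hypothetical. It does not attempt to reproduce the hang, whose cause the commit message leaves open.

```python
import multiprocessing


def worker(conn, env_constructor):
    """Module-level worker loop: builds the environment and serves requests.

    Being a free function, it holds no reference to the wrapper object.
    """
    env = env_constructor()
    while True:
        command, payload = conn.recv()
        if command == "step":
            conn.send(env.step(payload))
        elif command == "close":
            conn.send("closed")  # acknowledge so the parent can join safely
            return


class CounterEnv:
    """Toy stand-in for an environment (hypothetical, not an ALF class)."""

    def __init__(self):
        self.total = 0

    def step(self, action):
        self.total += action
        return self.total


class ProcessEnv:
    """Hypothetical wrapper that runs an environment in a child process."""

    def __init__(self, env_constructor):
        self._env_constructor = env_constructor
        self._conn = None
        self._process = None

    def start(self):
        self._conn, child_conn = multiprocessing.Pipe()
        # The target is the free function above, not a bound method of self.
        self._process = multiprocessing.Process(
            target=worker, args=(child_conn, self._env_constructor))
        self._process.start()

    def step(self, action):
        self._conn.send(("step", action))
        return self._conn.recv()

    def close(self):
        self._conn.send(("close", None))
        self._conn.recv()
        self._process.join()


if __name__ == "__main__":
    env = ProcessEnv(CounterEnv)
    env.start()
    print(env.step(2), env.step(3))
    env.close()
```

Because `worker` takes everything it needs as arguments, it can also be exercised in-process with a plain `multiprocessing.Pipe()`, which makes the loop easy to unit test without starting a child process.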
-
Haonan Yu authored
* first section of ALF tutorial
* address comments
-
Break Yang authored
This is part of the effort to address #913. A sub-task requires extracting the worker logic out of the class (for some reason, keeping it in the class prevents `multiprocessing` from working correctly). Without this change, `multiprocessing.Process` just gets stuck on `start()`.
-
- Jul 20, 2021
- Jul 18, 2021
-
-
pd-perry authored
* added suite load for bsuite environment
* fixed observation shape bug in suite_bsuite and added alf_bsuite_wrapper
* Revert "Correct relative path import in py configurations (#928)" This reverts commit 33c0ba21.
* removed alf_bsuite_wrapper and added method to suite_bsuite instead
* removed alf_bsuite_wrapper import from suite_bsuite file
* change loadbsuite to bsuitewrapper and edit docstrings
* edited description
* fixed PR review changes
* added check for max steps and change copyright year to this year
-
- Jul 13, 2021
-
-
Break Yang authored
-
hnyu authored
* write pre configs to conf file when grid searching
* correct message
-
Break Yang authored
* Print stdout and stderr when run_cmd fails during train_play_test
* Make cmd on its own line
-
- Jul 12, 2021
-
-
Haichao Zhang authored
-
- Jul 10, 2021
-
-
hnyu authored
-
Break Yang authored
* Convert ac_breakout to using python configurable
* Remove unnecessary print
-
- Jul 08, 2021
-
-
Break Yang authored
-
- Jun 28, 2021
-
-
Haichao Zhang authored
-
- Jun 19, 2021
-
-
Haichao Zhang authored
* Remove use_parallel_network flag from alg
* Move flag to network side
-
- Jun 18, 2021
-
-
Qinxun Bai authored
* add OacAlgorithm and replicate paper results on HalfCheetah
* add oac_humannoid_conf and oac_algorithm_test
* minor updates in oac_algorithm_test
* address code reviews
* add oac halfcheetah and humanoid result figures
* update oac_algorithm test
* address further code reviews
* address further code reviews
* remove the unroll_with_grad
* update oac_halfcheetah_conf for better performance
* minor update of oac_halfcheetah_conf
-
hnyu authored
-
emailweixu authored
* Minor improvement to config_util
  1. Environment variable for not using gin. gin's wrapper is very complicated, which can make debugging unfriendly and slow down execution.
  2. Error report for misuse of alf.config()
* comment
* Fix message
-
- Jun 17, 2021
-
-
Haichao Zhang authored
* Compositional FC
* Use bmm for weighted combination
* Better support to layer chaining
-
emailweixu authored
Wrap a function to save memory for backward. The returned function performs the same computation as ``func`` but saves memory by discarding intermediate results. It calculates the gradient by recomputing ``func`` with the same input during backward.
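The recompute-on-backward idea can be illustrated without any autograd framework. The sketch below is a toy written for this note, not the wrapper added by the commit: it differentiates a hypothetical scalar function `exp(sin(x)**2)` built from three stages, keeps only the input during the forward pass, and reruns the forward pass during backward to recover the stage inputs the chain rule needs. In PyTorch the same technique is available as `torch.utils.checkpoint`.

```python
import math

# Each stage is (function, derivative); chained, they compute exp(sin(x)**2).
STAGES = [
    (math.sin, math.cos),
    (lambda u: u * u, lambda u: 2.0 * u),
    (math.exp, math.exp),
]


def run_forward(x, keep_intermediates):
    """Run all stages; optionally record the input fed to each stage."""
    stage_inputs = []
    value = x
    for func, _ in STAGES:
        if keep_intermediates:
            stage_inputs.append(value)
        value = func(value)
    return value, stage_inputs


def checkpointed_value_and_grad(x):
    """Return (f(x), f'(x)) while storing only x between forward and backward."""
    # Forward: intermediates are discarded, so only x has to be kept around.
    y, _ = run_forward(x, keep_intermediates=False)

    # Backward: recompute the forward pass from x to recover each stage input,
    # then apply the chain rule from the last stage back to the first.
    _, stage_inputs = run_forward(x, keep_intermediates=True)
    grad = 1.0
    for (_, deriv), stage_input in zip(reversed(STAGES), reversed(stage_inputs)):
        grad *= deriv(stage_input)
    return y, grad


value, gradient = checkpointed_value_and_grad(0.5)
```

The trade-off is one extra forward computation per backward pass in exchange for not holding intermediate results in memory between the two.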
-
- Jun 16, 2021
- Jun 15, 2021
-
-
hnyu authored
* replace all gin.configurable with alf.configurable
* remove import gin
* fix failed test case
-
- Jun 12, 2021
-
-
Haichao Zhang authored
-
- Jun 10, 2021
-
-
Haichao Zhang authored
* Fix issue that breaks sac test
* use convert device
-