Releases: AVSLab/bsk_rl
v1.2.0
Release Notes
- Add an example script, the Cloud Environment with Re-imaging example, in which the reward is based on the probability of successfully observing targets covered by clouds.
- Add a conjunction checking dynamics model in ConjunctionDynModel.
- Add utilities for relative motion state setup, cd2hill, hill2cd, and relative_to_chief.
- Add a dtype argument to the environment (or individual satellites) and set the default dtype to np.float64.
- Add support for continuous action spaces (e.g. for control problems) with ContinuousAction.
- Add models and action for impulsive thrust and drift with a continuous action space (ImpulsiveThrust).
- Change inconsistent uses of datastore to data_store.
- Add a data_store_kwargs property to GlobalReward that is unpacked in the DataStore constructor.
- Implement ResourceReward, which rewards a satellite based on the level of one of its properties multiplied by a coefficient.
- Allow rewarders to mark a satellite as truncated or terminated with the is_truncated and is_terminated methods.
- Add an example script for curriculum learning with RLlib in the Curriculum Learning example.
- Update the list of publications.
- Add the option to compute value with sMDP rewards at the start of the step in the RLlib configuration.
- Add the ability to observe remaining time in Time.
- Allow for the time_limit to be randomized.
- Add an observation for arbitrary relative states between two satellites in RelativeProperties.
- Allow for the transmitterPacketSize to be specified. The default sets it to the instrument’s baud rate.
- Add a maximum range checking dynamics model in MaxRangeDynModel. Useful for keeping an agent in the vicinity of a target early in training.
- Add properties in spacecraft dynamics for orbital element observations.
- Fix an issue with failure penalties in the PettingZoo environment when the rewarder does not return a reward for a satellite.
- Allow for per-episode randomization of ResourceReward weights and observation of those weights with ResourceRewardWeight.
- Add ImpulsiveThrustHill for impulsive thrust in the Hill frame.
- Separate random_circular_orbit and random_orbit to avoid a misleading altitude argument.
- Add fault modeling example script using four reaction wheels in the Fault Environment example.
- Introduce a new RSO inspection environment, primarily consisting of RSOInspectionReward, RSOPoints, RSOInspectorFSWModel, and RSODynModel. An example environment setup is described in the RSO Inspection example.
- Add a maximum duration option to Image.
- Fix a bug where a satellite’s initial data was never added to the rewarder.
- Fix a bug where using multiple instances of the same rewarder would cause some settings to be overwritten.
- Add the ability to define metaagents that concatenate satellite action and observation spaces in the environment.
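Several of the items above (the relative motion utilities, RelativeProperties, and ImpulsiveThrustHill) work with states expressed in the Hill frame of a chief spacecraft. As a rough illustration of what such a conversion involves, the sketch below maps inertial chief and deputy states to a Hill-frame relative state using only NumPy. The function name `inertial_to_hill` and its signature are hypothetical and chosen for this example; this is not the implementation or interface of cd2hill in bsk_rl.

```python
import numpy as np


def inertial_to_hill(r_chief, v_chief, r_deputy, v_deputy):
    """Return the deputy's position and velocity relative to the chief,
    expressed in the chief's Hill (radial, along-track, cross-track) frame.

    Hypothetical helper for illustration only; not part of bsk_rl.
    """
    r_chief = np.asarray(r_chief, dtype=float)
    v_chief = np.asarray(v_chief, dtype=float)
    r_deputy = np.asarray(r_deputy, dtype=float)
    v_deputy = np.asarray(v_deputy, dtype=float)

    r_mag = np.linalg.norm(r_chief)
    h = np.cross(r_chief, v_chief)  # chief specific angular momentum
    h_mag = np.linalg.norm(h)

    # Hill-frame unit vectors, expressed in the inertial frame.
    o_r = r_chief / r_mag          # radial
    o_h = h / h_mag                # orbit normal (cross-track)
    o_theta = np.cross(o_h, o_r)   # along-track

    # Rows of HN are the Hill axes, so HN maps inertial vectors to Hill.
    HN = np.vstack((o_r, o_theta, o_h))

    # Angular velocity of the Hill frame, expressed in the Hill frame.
    omega = np.array([0.0, 0.0, h_mag / r_mag**2])

    rho = HN @ (r_deputy - r_chief)
    # Transport theorem: subtract the frame rotation from the inertial rate.
    rho_dot = HN @ (v_deputy - v_chief) - np.cross(omega, rho)
    return rho, rho_dot


# Deputy 1 km above the chief with the same inertial velocity (km, km/s).
rho, rho_dot = inertial_to_hill(
    [7000.0, 0.0, 0.0], [0.0, 7.5, 0.0],
    [7001.0, 0.0, 0.0], [0.0, 7.5, 0.0],
)
```

In this example the deputy sits purely in the radial direction, and its Hill-frame velocity is nonzero only because the frame itself rotates with the chief's orbit.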
What's Changed
- [MINOR] Fix CI wheel name by @Mark2000 in #241
- Feature/239 example cloud reimaging by @LorenzzoQM in #240
- [#242] Fix sat_arg overrides for matrices by @Mark2000 in #243
- [#144] Conjunction checking dynamics model by @Mark2000 in #245
- [#144] Relative motion state setup utilities by @Mark2000 in #246
- [#254] Observation dtype specification by @Mark2000 in #255
- [#200] Build ipynb on commit by @Mark2000 in #253
- [MINOR] Fix pages build settings by @Mark2000 in #257
- Continuous action spaces by @Mark2000 in #247
- Feature/249 generic rewarder by @Mark2000 in #250
- Truncate and terminate a satellite from within the rewarder by @Mark2000 in #251
- Feature/261 example curriculum by @LorenzzoQM in #262
- Feature/263 update publications by @LorenzzoQM in #264
- [#144] sMDP discounting at step start option by @Mark2000 in #259
- Feature/time remaining by @Mark2000 in #258
- [#144] Relative observations by @Mark2000 in #260
- Feature/270 specify transmitter packet size by @LorenzzoQM in #271
- [#265] Maximum range checking model by @Mark2000 in #267
- [#244] Refactor dynamics and fsw by @Mark2000 in #273
- [#266] Orbital element observations by @Mark2000 in #272
- [MINOR] Fix issue with failure penalty in multiagent env by @Mark2000 in #276
- [#144] Resource reward randomization by @Mark2000 in #278
- [#144] Add Hill-frame impulsive thrust by @Mark2000 in #275
- [#144] Correct noncircular orbit generation by @Mark2000 in #277
- [MINOR] Reenable notebooks by @Mark2000 in #280
- RSO inspection scenario by @Mark2000 in #274
- [#125] Add maximum duration to imaging action by @Mark2000 in #281
- [MINOR] Fix indent by @Mark2000 in #282
- [MINOR] Add retries to flaky integrated tests by @Mark2000 in #283
- [#144] Fix an issue with RSO reward calculation by @Mark2000 in #284
- [#289] Fix initial data bug by @Mark2000 in #290
- Bugfix/event issues by @Mark2000 in #292
- Feature/285 example script on how to simulate faults by @Yume27 in #286
- [#294] Fix multiple rewarders of same type by @Mark2000 in #295
- Add meta agent option and add continuous action space compatibility to async by @Mark2000 in #297
- Feature/two way range check by @Mark2000 in #299
Full Changelog: v1.1.0...v1.2.0
v1.1.0
- Add ability in SatProperties to define new observations with a custom function.
- Deep-copy mutable inputs to the environment so that an environment argument dictionary can be copied without being affected by things that happen in the environment. This fixes compatibility with RLlib 2.33.0+. Note that this means the satellite object passed to the environment is not the same object as the one used in the environment; the same is true of rewarders and communication objects.
- Add additional observation properties for satellites and opportunities.
- Add connectors for multiagent semi-MDPs, as demonstrated in a new single agent and multiagent example.
- Add a min_period option to CommunicationMethod.
- Cache agents in the ConstellationTasking environment to improve performance.
- Add the generate_obs_retasking_only option to prevent computing observations for satellites that are continuing their current action.
- Allow ImagingSatellite to default to a different type of opportunity than target. Also allow access filters to include an opportunity type.
- Improve performance of Eclipse observations by about 95%.
- Log a warning if the initial battery charge or buffer level is incompatible with its capacity.
- Optimize communication when all satellites are communicating with each other.
- Enable Vizard visualization of the environment by setting the vizard_dir and vizard_settings options in the environment.
- Allow for the specification of multiple rewarders in the environment.
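The deep-copy change noted above has a practical consequence: an object handed to the environment is no longer the object the environment mutates. The sketch below shows this with toy classes and the standard library's `copy.deepcopy`. `ToySatellite` and `ToyEnv` are hypothetical stand-ins written for this example and do not reflect bsk_rl's actual classes or constructor arguments.

```python
from copy import deepcopy


class ToySatellite:
    """Hypothetical stand-in for a satellite object; not a bsk_rl class."""

    def __init__(self, name):
        self.name = name
        self.log = []


class ToyEnv:
    """Hypothetical environment that deep-copies its mutable inputs."""

    def __init__(self, satellites):
        # Copy on construction so the caller's objects are never mutated.
        self.satellites = deepcopy(satellites)

    def step(self):
        for sat in self.satellites:
            sat.log.append("stepped")


# The same argument dictionary can be reused to build several environments.
env_args = {"satellites": [ToySatellite("sat-1")]}
env = ToyEnv(**env_args)
env.step()

original = env_args["satellites"][0]  # object the caller created
internal = env.satellites[0]          # copy the environment works on
second_env = ToyEnv(**env_args)       # starts from the untouched original
```

Under this pattern, anything that needs the live state should be read from the environment's own copy (here `env.satellites`) rather than from the object that was originally passed in.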