Custom Rewards (doc in progress)

Detailed usage

Classes:

N1ContingencyReward([l_ids, ...])

This class implements a reward that leverages the lightsim2grid.ContingencyAnalysis to compute the number of unsafe contingencies at any given time.

class lightsim2grid.rewards.N1ContingencyReward(l_ids=None, threshold_margin=1.0, dc=False, normalize=False, logger=None, tol=1e-08, nb_iter=10)

This class implements a reward that leverages the lightsim2grid.ContingencyAnalysis to compute the number of unsafe contingencies at any given time.
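
Conceptually, the reward simulates, at each step, the disconnection of every monitored powerline and checks how many of these "N-1" contingencies the grid can withstand. The sketch below illustrates this idea with plain grid2op primitives (obs.simulate). It is not the actual implementation, which relies on the much faster lightsim2grid.ContingencyAnalysis, and the exact sign and normalization of the real reward may differ:

from grid2op.Reward import BaseReward

class NaiveN1Reward(BaseReward):
    """Illustrative sketch only: counts how many of the monitored
    "N-1" contingencies (single line disconnections) leave the grid
    safe, using obs.simulate instead of ContingencyAnalysis."""

    def __init__(self, l_ids=None, logger=None):
        super().__init__(logger=logger)
        self._l_ids = l_ids

    def initialize(self, env):
        if self._l_ids is None:
            # by default, monitor every powerline of the grid
            self._l_ids = list(range(env.n_line))
        self.reward_min = 0.
        self.reward_max = float(len(self._l_ids))

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error:
            return self.reward_min
        obs = env.get_obs()
        nb_safe = 0
        for l_id in self._l_ids:
            # simulate the loss of powerline l_id
            contingency = env.action_space({"set_line_status": [(l_id, -1)]})
            sim_obs, _, sim_done, _ = obs.simulate(contingency)
            # the contingency is "safe" if the grid survives without overflow
            if not sim_done and float(sim_obs.rho.max()) <= 1.:
                nb_safe += 1
        return float(nb_safe)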

Examples

This can be used as:

import grid2op
from lightsim2grid.rewards import N1ContingencyReward

# ids of the powerlines whose disconnection ("N-1" contingency) is simulated
l_ids = [0, 1, 7]
env = grid2op.make("l2rpn_case14_sandbox",
                   reward_class=N1ContingencyReward(l_ids=l_ids)
                  )
obs = env.reset()
obs, reward, *_ = env.step(env.action_space())
print(f"reward: {reward:.3f}")
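
Since the reward runs its contingency analysis with lightsim2grid, the environment itself is typically created with the lightsim2grid backend as well. A variant of the example above, assuming lightsim2grid is installed together with its grid2op backend:

import grid2op
from lightsim2grid import LightSimBackend
from lightsim2grid.rewards import N1ContingencyReward

l_ids = [0, 1, 7]  # lines whose disconnection is simulated
env = grid2op.make("l2rpn_case14_sandbox",
                   backend=LightSimBackend(),  # fast powerflow backend
                   reward_class=N1ContingencyReward(l_ids=l_ids)
                  )
obs = env.reset()
obs, reward, *_ = env.step(env.action_space())
print(f"reward: {reward:.3f}")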

Methods:

close()

Override this for rewards that need a specific behaviour when the environment is closed.

initialize(env)

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

reset(env)

This method is called each time env is reset.

close()

Override this for rewards that need a specific behaviour when the environment is closed.
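
As an illustration, a hypothetical reward that keeps an open resource, here a log file, could release it in close():

from grid2op.Reward import BaseReward

class FileLoggingReward(BaseReward):
    """Hypothetical sketch: a reward that logs its values to a file
    and must close that file when the environment is closed."""

    def __init__(self, logger=None):
        super().__init__(logger=logger)
        self._log_file = None

    def initialize(self, env):
        self._log_file = open("reward_log.txt", "w", encoding="utf-8")

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        res = self.reward_min if has_error else self.reward_max
        self._log_file.write(f"{res}\n")
        return res

    def close(self):
        # release the file handle when the environment is closed
        if self._log_file is not None:
            self._log_file.close()
            self._log_file = None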

initialize(env: Environment)

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB: reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into "reward_range", which is part of the OpenAI Gym public interface. If you don't define them, some pieces of code might not work as expected.

Parameters:

env (grid2op.Environment.Environment) – An environment instance properly initialized.
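
For instance, here is a minimal sketch of a custom reward whose bounds can only be computed once an environment is available (here, hypothetically, bounds depending on the number of powerlines):

from grid2op.Reward import BaseReward

class ConnectedLinesReward(BaseReward):
    """Sketch: reward bounds that can only be known once the environment exists."""

    def initialize(self, env):
        # the upper bound depends on the grid, so it is set here
        # (and not in __init__, where no environment is available)
        self.reward_min = 0.
        self.reward_max = float(env.n_line)

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error:
            return self.reward_min
        # number of powerlines currently connected
        return float(env.get_obs().line_status.sum())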

reset(env: BaseEnv) → None

This method is called each time env is reset.

It can be useful, for example, if the reward depends on the length of the current chronics.

It does nothing by default.

Parameters:
  • env (grid2op.Environment.Environment) – The current environment

Danger

This function should not modify self.reward_min nor self.reward_max!

It might make it really hard for the agent to learn if you do so.
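
For example, a reward normalized by the episode length needs to be recomputed at each reset, since the chronics (and thus the episode duration) may change. A minimal sketch, assuming the grid2op Environment.max_episode_duration() helper:

from grid2op.Reward import BaseReward

class StepFractionReward(BaseReward):
    """Sketch: gives, at each step, the fraction of the episode survived."""

    def initialize(self, env):
        self.reward_min = 0.
        self.reward_max = 1.

    def reset(self, env):
        # the chronics (and thus the episode length) may change at each reset:
        # cache the new length, but do NOT touch reward_min / reward_max here
        self._ep_len = max(1, int(env.max_episode_duration()))

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error:
            return self.reward_min
        return 1.0 / self._ep_len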