
I am wondering whether you may not have used the long-term reward as in your paper? Thanks #33

Open
restart-again opened this issue Oct 6, 2024 · 1 comment

Comments

@restart-again

Hi, thanks for your wonderful sharing. However, from your code, in all of your learning-based algorithms the total reward is computed based on instance_done, which means the reward covers only that specific instance rather than all of the VNR requests.
The reward in learn_singly(...) is always 0. I believe the long-term reward should be accumulated there, since that is the true long-term reward, yet it is always kept at 0.
Please feel free to correct me if I am wrong, thanks.
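For concreteness, the accumulation being described could be sketched as follows. This is a minimal illustration only; `solve`, `run_episode`, and the request fields are hypothetical placeholders, not the library's actual API:

```python
def solve(instance):
    # Placeholder per-instance reward, e.g. the revenue of an accepted VNR.
    return float(instance["revenue"]) if instance["accepted"] else 0.0

def run_episode(instances):
    # Accumulate the reward across ALL arriving VNR requests,
    # instead of resetting it to 0 after each instance.
    long_term_reward = 0.0
    for instance in instances:
        long_term_reward += solve(instance)
    return long_term_reward

requests = [
    {"revenue": 10, "accepted": True},
    {"revenue": 5, "accepted": False},
    {"revenue": 7, "accepted": True},
]
print(run_episode(requests))  # prints 17.0
```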

@GeminiLight (Owner)

Thank you for your feedback. In our library, we support two types of learning paradigms:

  • Instance-level optimization, which focuses on finding the optimal solution for each individual instance as it arrives.
  • Online-level optimization, which aims to learn a globally optimal policy to maximize overall system performance metrics across all instances.

You can find the implementations of both paradigms, including the instance-level and online-level environments, in the rl_solver directory.

As you correctly noted, most of the current implementations in our library are based on the instance-level paradigm. This choice is based on our empirical analysis and insights: in network systems, the randomness of service requests makes it difficult to learn a robust online-level policy that meets expectations. That approach tends to require more training time and often does not deliver satisfactory performance. Conversely, the instance-level paradigm allows us to efficiently obtain a high-quality solving policy, leading to more reliable and efficient results in practice.
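The difference between the two paradigms comes down to where the episode boundary sits. A minimal sketch of the contrast, assuming a trace of `(reward, instance_done)` pairs (all names here are illustrative, not the library's actual code):

```python
def instance_level_returns(trace):
    # Instance-level: each VNR is its own episode, so the return
    # resets at every instance_done flag.
    returns, g = [], 0.0
    for reward, instance_done in trace:
        g += reward
        if instance_done:
            returns.append(g)  # one return per individual instance
            g = 0.0
    return returns

def online_level_return(trace):
    # Online-level: one long episode over the whole request sequence,
    # so rewards from all arriving VNRs accumulate into a single objective.
    return sum(reward for reward, _ in trace)

trace = [(1.0, False), (2.0, True), (3.0, True)]
print(instance_level_returns(trace))  # prints [3.0, 3.0]
print(online_level_return(trace))     # prints 6.0
```

The instance-level variant yields one optimization target per request; the online-level variant yields a single target over the whole sequence, which is harder to learn under random arrivals.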
