Model Initialized outside [w_min, w_max]: Pinpointing the bug in issue #604 #635

Open
Zhaoxian-Wu opened this issue Apr 1, 2024 · 7 comments
Assignees
Labels
bug Something isn't working status:reviewing

Comments

@Zhaoxian-Wu

Description

As discussed in #604, the model weights sometimes fall outside [w_min, w_max].

Bug Pinpoint

The bug is caused by the incorrect initialization of w_min_bound_ and w_max_bound_ (see the code). The following snippet seems designed to handle the case where shared weights are deployed and PulsedDevice.perfect_bias is turned on. When PulsedDevice.perfect_bias is on, the bounds of the last input column (j == x_size_ - 1) are scaled by a factor of 100, yielding incorrect active regions and weights outside [w_min, w_max].

// perfect bias
if ((par.perfect_bias) && (j == this->x_size_ - 1)) {
  w_scale_up_[i][j] = par.dw_min;
  w_scale_down_[i][j] = par.dw_min;
  w_min_bound_[i][j] = (T)100. * par.w_min; // essentially no bound
  w_max_bound_[i][j] = (T)100. * par.w_max; // essentially no bound
}
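
To make the effect of the snippet concrete, here is a minimal, self-contained sketch that reproduces only the bound assignment. DeviceParams, Bounds, init_bounds and count_widened are hypothetical names introduced for illustration, and the parameter values are arbitrary; the real initialization in the library does considerably more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the device parameters used in the snippet above.
// The numeric values are arbitrary illustration values.
struct DeviceParams {
  float w_min = -0.6f;
  float w_max = 0.6f;
  bool perfect_bias = false;
};

// Per-element lower and upper weight bounds, indexed as [i][j].
struct Bounds {
  std::vector<std::vector<float>> w_min_bound;
  std::vector<std::vector<float>> w_max_bound;
};

// Simplified bound initialization: every element gets [w_min, w_max], except
// that the last column is widened by a factor of 100 when perfect_bias is set.
Bounds init_bounds(const DeviceParams &par, std::size_t d_size, std::size_t x_size) {
  Bounds b;
  b.w_min_bound.assign(d_size, std::vector<float>(x_size, par.w_min));
  b.w_max_bound.assign(d_size, std::vector<float>(x_size, par.w_max));
  if (par.perfect_bias && x_size > 0) {
    for (std::size_t i = 0; i < d_size; ++i) {
      b.w_min_bound[i][x_size - 1] = 100.0f * par.w_min; // essentially no bound
      b.w_max_bound[i][x_size - 1] = 100.0f * par.w_max; // essentially no bound
    }
  }
  return b;
}

// Counts the elements whose bounds extend beyond the nominal [w_min, w_max].
int count_widened(const Bounds &b, const DeviceParams &par) {
  int n = 0;
  for (std::size_t i = 0; i < b.w_min_bound.size(); ++i) {
    for (std::size_t j = 0; j < b.w_min_bound[i].size(); ++j) {
      if (b.w_min_bound[i][j] < par.w_min || b.w_max_bound[i][j] > par.w_max) {
        ++n;
      }
    }
  }
  return n;
}
```

In this sketch, a 2 x 3 array with perfect_bias enabled ends up with two widened elements (one per row, both in the last column), while the same array with perfect_bias disabled has none.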

TODO

I tried to fix the bug directly, but I found that I couldn't control shared_weight through the AnalogLinear initialization. I guess we should add a flag here to give better control over the shared-weight behavior.

@Zhaoxian-Wu Zhaoxian-Wu added the bug Something isn't working label Apr 1, 2024
@maljoras
Collaborator

maljoras commented Apr 2, 2024

Thanks for raising this issue. perfect_bias is indeed an "old" parameter setting that should only be relevant for analog_bias. Since we now have digital_bias, it should actually be deleted. It has nothing to do with shared_weights, so shared_weights is not relevant here.

@maljoras maljoras self-assigned this Apr 2, 2024
@Zhaoxian-Wu
Author

I see. It seems that using digital_bias instead is a more natural solution. But what does shared_weights do? Does that mean multiple tiles share the same torch array?

@maljoras
Collaborator

maljoras commented Apr 2, 2024

Shared weights means that the tile's memory is managed by torch (and not from within C++). This means that the backward pass etc. is also handled by torch. Note that the RPUCuda library can also manage the memory of the tile arrays and data internally (it is an independent library that can also be used independently of PyTorch).
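
As a purely conceptual illustration of the distinction described here, the sketch below contrasts a tile that owns its weight storage with one that only views a caller-owned buffer. SketchTile and its methods are hypothetical and are not part of RPUCuda or aihwkit; a std::vector stands in for memory that would be owned by torch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile that either owns its weights or views external memory.
class SketchTile {
public:
  // Internal mode: the tile allocates and owns its weight storage.
  explicit SketchTile(std::size_t n)
      : owned_(n, 0.0f), data_(owned_.data()), size_(n), shared_(false) {}

  // Shared mode: the tile only keeps a non-owning pointer to caller memory.
  // The caller must keep the buffer alive for the lifetime of the tile.
  SketchTile(float *external, std::size_t n)
      : data_(external), size_(n), shared_(true) {}

  // Copying would leave data_ pointing into another object's storage.
  SketchTile(const SketchTile &) = delete;
  SketchTile &operator=(const SketchTile &) = delete;

  // Applies an in-place update to one weight.
  void update(std::size_t i, float delta) { data_[i] += delta; }

  float get(std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }
  bool is_shared() const { return shared_; }

private:
  std::vector<float> owned_; // empty in shared mode
  float *data_;              // points into owned_ or into the external buffer
  std::size_t size_;
  bool shared_;
};
```

With a shared tile, an update made through the tile is immediately visible in the external buffer, because both refer to the same memory; with an internal tile, the outside world only sees the weights through the tile's accessors.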

@Zhaoxian-Wu
Author

I see. Thanks for your kind explanation :D

@kaoutar55
Collaborator

@maljoras, do we need to remove perfect_bias from the code flow when we are using digital bias? What do you suggest here? It looks like we have a bug that we need to solve.

@Borjagodoy
Collaborator

I think this could be moved to a new issue, @kaoutar55. This issue was opened because of a problem that turned out to be a conceptual issue, so we can open a separate discussion about perfect_bias if you like, @maljoras, and close this one, since it was actually resolved. At least that was my impression; correct me if I'm wrong, @Zhaoxian-Wu.

@kaoutar55
Collaborator

@Zhaoxian-Wu please look at this and try it at your end with the suggested changes. Let us know if the issue is resolved.
