Currently the API is most developed for the creation, deletion, and low-level modification of linear structures, rather than for high-level operations on them (solves, matrix-vector multiplication, matrix-matrix multiplication).
These higher-level operations contain tight loops, but they are themselves unlikely to sit inside a tight loop at the core of an important computation. Even if they did, the encapsulated operation is computationally significant, so the cost of retrieving and invoking it would be small by comparison. We therefore don't need a zero-overhead mechanism to retrieve and invoke these operations. Instead we can develop a general mechanism that uses inheritance polymorphism to determine the algorithm and templating to determine which backend is active.
The approach will be similar to how we currently retrieve minimal factory objects to build Mats and Vecs, but will require the user to specify the desired high-level operation and, possibly, which specific implementation of the algorithm they want to use (we may, for example, implement gemm in several ways with CUDA that differ in memory movement and management).
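As a rough illustration of the mechanism described above, here is a minimal host-only sketch. Every name in it (`HostBackend`, `Mat`, `Gemm`, `NaiveHostGemm`, `make_gemm`, and the `"naive"` variant string) is hypothetical and not taken from the existing API. It only shows the shape of the idea: the backend is a template parameter, the algorithm variant is chosen at run time behind a virtual interface, and the single virtual call is amortized over the loops inside the operation.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical backend tag; a CUDA backend would get its own tag type.
struct HostBackend {};

// Hypothetical dense row-major matrix standing in for the layer's Mat type.
template <typename Backend>
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Abstract high-level operation, templated on the backend.
// Concrete algorithm variants are selected through inheritance.
template <typename Backend>
struct Gemm {
    virtual ~Gemm() = default;
    // Computes C = A * B.
    virtual void apply(const Mat<Backend>& A, const Mat<Backend>& B,
                       Mat<Backend>& C) const = 0;
};

// One possible variant: a plain triple loop on the host.
struct NaiveHostGemm : Gemm<HostBackend> {
    void apply(const Mat<HostBackend>& A, const Mat<HostBackend>& B,
               Mat<HostBackend>& C) const override {
        if (A.cols != B.rows || C.rows != A.rows || C.cols != B.cols) {
            throw std::invalid_argument("gemm: shape mismatch");
        }
        for (std::size_t i = 0; i < A.rows; ++i) {
            for (std::size_t j = 0; j < B.cols; ++j) {
                double sum = 0.0;
                for (std::size_t k = 0; k < A.cols; ++k) {
                    sum += A(i, k) * B(k, j);
                }
                C(i, j) = sum;
            }
        }
    }
};

// Retrieval: the user names the backend (template argument) and the
// variant (run-time string). Each backend provides its own specialization.
template <typename Backend>
std::unique_ptr<Gemm<Backend>> make_gemm(const std::string& variant);

template <>
inline std::unique_ptr<Gemm<HostBackend>> make_gemm<HostBackend>(
    const std::string& variant) {
    if (variant == "naive") {
        return std::make_unique<NaiveHostGemm>();
    }
    throw std::invalid_argument("unknown gemm variant: " + variant);
}
```

A caller would write something like `auto gemm = make_gemm<HostBackend>("naive"); gemm->apply(A, B, C);`. A string key is used here only for brevity; an enum or tag type would work equally well for naming variants.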
While this is being worked on, whatever is developed should likely also be used to refactor the existing mechanisms to retrieve Mat/Vec factories.
Practically speaking, we can probably limit ourselves to the BLAS routines as the set of high-level linear operations we eventually intend to support in the abstraction layer.
We should likely also move the DOT and NORM operations out of the zero-overhead API and into this more general API, since they are high-level operations rather than tight-loop operations.
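To make the suggestion concrete, here is a small self-contained sketch of what DOT and NORM could look like as operation objects in the general API. All names (`reduction_sketch`, `HostBackend`, `Vec`, `Dot`, `Norm2`, `HostDot`, `HostNorm2`) are hypothetical, and the norm is assumed to be the Euclidean 2-norm; the existing zero-overhead API may define these differently.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace reduction_sketch {

// Hypothetical backend tag and vector type, for illustration only.
struct HostBackend {};

template <typename Backend>
struct Vec {
    std::vector<double> data;
};

// DOT as a polymorphic operation, templated on the backend.
template <typename Backend>
struct Dot {
    virtual ~Dot() = default;
    virtual double apply(const Vec<Backend>& x, const Vec<Backend>& y) const = 0;
};

// NORM (assumed here to be the 2-norm) as a polymorphic operation.
template <typename Backend>
struct Norm2 {
    virtual ~Norm2() = default;
    virtual double apply(const Vec<Backend>& x) const = 0;
};

// Host implementation of DOT: the tight loop lives inside the operation.
struct HostDot : Dot<HostBackend> {
    double apply(const Vec<HostBackend>& x,
                 const Vec<HostBackend>& y) const override {
        if (x.data.size() != y.data.size()) {
            throw std::invalid_argument("dot: length mismatch");
        }
        double sum = 0.0;
        for (std::size_t i = 0; i < x.data.size(); ++i) {
            sum += x.data[i] * y.data[i];
        }
        return sum;
    }
};

// Host implementation of the 2-norm, expressed through the dot product.
struct HostNorm2 : Norm2<HostBackend> {
    double apply(const Vec<HostBackend>& x) const override {
        return std::sqrt(HostDot{}.apply(x, x));
    }
};

}  // namespace reduction_sketch
```

The caller pays one virtual call per reduction, which is small next to a loop over the whole vector.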