Functional Proposal - Tenant Workload Scheduling Loci Relative To Persistent Storage Node Controls - Akash Provider System #248
arnoldoree started this conversation in Ideas · Replies: 1 comment
-
(for the devs) I think the following node affinity can be leveraged in the
-
I am writing to put forward this proposed priority functional development for the Akash Provider System.
Overview
The proposed development is to add mechanisms to ensure that Tenant pods / workloads subscribed to persistent storage services are only ever scheduled on Kubernetes nodes that are in local proximity to, or have storage-speed network connectivity with, the bare-metal host on which Rook-Ceph persistent storage has been enabled.
Value Proposition
The value on offer from what should, in technical terms, be a limited effort hugely outweighs any architecture, design, and implementation costs.
As a non-exhaustive itemisation, I put forward the following benefits:
1. Reductions in:
1.a. hardware costs,
1.b. power consumption costs, and
1.c. requisite Provider datacentre network complexity.
2.a. Servers within the cluster dedicated to persistent storage can be designed, implemented, and engineered at a premium to meet the in all probability higher continuity requirements of Tenants who subscribe to persistent storage, thus providing persistent storage Tenants the service levels they require for the premium they are prepared to pay, whilst ensuring ephemeral storage Tenants can keep their costs reflective of the levels of continuity they actually require.
Implementation Strategy
The two identified implementation strategies for this functional proposal are:
The Kubernetes taints and tolerations mechanism
The taints and tolerations approach offers the opportunity to automate much of the logical operation, particularly where condition-based Kubernetes taints are applied. This approach would be very good for safeguarding the operation of tenant workloads, as Kubernetes could be employed to systematically ensure that Tenant persistent storage pods / workloads are only ever deployed to nodes where they have fast read and write access to the persistent storage backing to which they are subscribed.
The taints and tolerations mechanism does, however, come with considerably more technical overhead, as it would be necessary to define and measure the targeted and excluded conditions in order for Kubernetes to make its compliant scheduling decisions.
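For illustration only, a minimal sketch of this route follows. The taint key, values, and node names are hypothetical placeholders rather than an existing Akash or Kubernetes convention.

```yaml
# Hypothetical sketch of the taints and tolerations route (key, value, and node
# names are illustrative, not an existing Akash convention).
#
# Nodes without fast read/write access to the Rook-Ceph storage host are tainted,
# either by hand or by a controller watching the relevant node conditions:
#
#   kubectl taint nodes worker-07 storage.locality/remote=true:NoSchedule
#
# Persistent-storage-subscribed pods carry no matching toleration, so the scheduler
# keeps them off such nodes. Ephemeral workloads are given the toleration and can
# still consume the remote capacity:
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-tenant-workload
spec:
  tolerations:
    - key: "storage.locality/remote"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: app
      image: nginx:1.25
```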
The Kubernetes labels and affinity / anti-affinity mechanism
The labels and affinity / anti-affinity mechanism will be far simpler to implement, yet offers less scope for automation than taints and tolerations. Labelling nodes for persistent storage is already a manual engineer action in the Persistent Storage Enablement procedure, which is an annex to the general Akash Provider Build procedure.
The gain, aside from greater simplicity of implementation, is that more control and choice may be placed in the Provider-Engineer's hands.
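A minimal sketch of how this mechanism pairs up follows; the label key and value are placeholders, not an agreed convention.

```yaml
# Hypothetical sketch of the labels + affinity route (label key and values are
# placeholders only).
#
# The Provider-Engineer labels a suitable worker node by hand, much as is already
# done during persistent storage enablement:
#
#   kubectl label node worker-03 storage.locality=local
#
# The provider then adds a required node-affinity rule to pods of Tenants who
# have subscribed to persistent storage:
apiVersion: v1
kind: Pod
metadata:
  name: persistent-storage-tenant-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: storage.locality
                operator: In
                values: ["local"]
  containers:
    - name: app
      image: nginx:1.25
```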
Strategic Choice
In my mind it is necessary to recognise that both mechanisms will, in the medium to long term, need to be applied to realise and develop this proposed functionality, especially given that persistent storage is only the first domain where this functionality finds its relevance. GPU tenant subscription is another obvious domain where Tenant workload locality within the Provider cluster will be important, if not critical, with or without high-speed networking.
As the work required of this proposed functionality grows, it will be necessary to apply the full range of tools to realise its simple and reliable performance across multiple domains.
<ADDITION | Hypervisor high-availability contexts are another setting that will require intelligent application of both mechanisms, given that a hypervisor may of necessity change the bare-metal host running the virtual machine that hosts a given Tenant persistent-storage-subscribed pod / workload. In this context it will therefore be necessary to apply a material degree of dynamic condition recognition and accounting in order to realise scheduling that complies with Kubernetes Tenant workload locality policy. Such dynamic recognition and accounting is found in the condition-based taint mechanism.>
Further to the above, it will in my view be the right course to implement the simpler labels + affinity/anti-affinity mechanism in a first iteration.
Implementation Design
In my view the labels + affinity/anti-affinity implementation should be closely related to, but independent of, the present beta1/beta2/beta3 labelling for persistent-storage-enabled nodes. The reason is that there will be considerable variance in Provider-Engineer architectures and networks, and specific variables need to be provided to allow Provider-Engineers to tailor this proposed functionality to their specific provider context.
I would therefore propose a four-label system consisting of:
storage.node.enabled
~ Persistent storage is enabled on this worker node
storage.node.local
~ Persistent storage is not enabled on this node, but this node is based on a virtual machine on the same local host as the target storage node
storage.node.network
~ Persistent storage is accessible over a high speed storage network
storage.node.remote
~ Persistent storage is on the other side of a low speed management or other network
I hope the above label variables articulate clearly enough how the placement of Tenant persistent-storage-subscribed pods / workloads could be granularly tailored by the Provider-Engineer to suit their particular provider context. They could ensure that persistent-storage-subscribed Tenant workloads are guaranteed low-latency locality to their target persistent storage node, whilst at the same time ensuring that non-persistent-storage Tenant workloads are not ill-placed so as to occupy capacity in close proximity to persistent storage nodes that are irrelevant to them.
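A hedged sketch of such tailoring follows, using the four proposed label keys; everything else in the manifests (pod names, images, weights, the preference ordering) is illustrative and would be a Provider-Engineer design choice.

```yaml
# Sketch only: the label keys are those proposed above, all other details are illustrative.
# Persistent-storage-subscribed Tenant workload: must land on an enabled, local, or
# storage-network node (the nodeSelectorTerms are ORed), with a preference here for
# a node on the same bare-metal host as the storage node.
apiVersion: v1
kind: Pod
metadata:
  name: persistent-storage-tenant
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: storage.node.enabled
                operator: In
                values: ["true"]
          - matchExpressions:
              - key: storage.node.local
                operator: In
                values: ["true"]
          - matchExpressions:
              - key: storage.node.network
                operator: In
                values: ["true"]
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: storage.node.local
                operator: In
                values: ["true"]
  containers:
    - name: app
      image: nginx:1.25
---
# Ephemeral-storage Tenant workload: kept off storage-enabled and storage-local nodes
# so that capacity close to the storage node is not occupied by workloads that do not
# need it (NotIn also matches nodes that carry none of these labels).
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-tenant
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: storage.node.enabled
                operator: NotIn
                values: ["true"]
              - key: storage.node.local
                operator: NotIn
                values: ["true"]
  containers:
    - name: app
      image: nginx:1.25
```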
Arnold Opio Oree
arnoldoree@parallaxgo.com
Parallax Cloud Compute™
Quick note: it would be great if there were a "Feature & Function Decision(s)" discussion category, where features and functions could be put forward together with discussion of interested-party appetite and key implementation decision points. "Ideas" feels too vague for this intended focused, collaborative feature and function development discussion.