Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design proposal for multi-attach support in netebpfext #3764

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
126 changes: 126 additions & 0 deletions docs/NetEbpfExtMultiAttach.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
# Introduction

Multi-attach support allows multiple eBPF programs to be attached to the same attach point (aka "hook") with the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename implies this proposal is only for netebpfext and not other extensions.
The title of the document (line 1) and first paragraph are generic though.
Assuming this is really just about netebpfext and won't work in general, please update the doc title.

same attach parameters. For example, for `BPF_CGROUP_INET4_CONNECT` attach type, the attach parameter is the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
same attach parameters. For example, for `BPF_CGROUP_INET4_CONNECT` attach type, the attach parameter is the
same attach parameters. For example, for the `BPF_CGROUP_INET4_CONNECT` attach type, the attach parameter is the

compartment ID. With this feature, multiple eBPF programs can be attached to the same compartment ID.

This document describes the design changes required in `netebpfext` to support multiple attach of eBPF programs in
`netebpfext`. This is independent of the platform changes needed in BPF runtime to add support for new multi-attach
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`netebpfext`. This is independent of the platform changes needed in BPF runtime to add support for new multi-attach
`netebpfext`. This is independent of the platform changes needed in the BPF runtime to add support for new multi-attach

APIs and flags. These changes in netebpfext also help enable the "default" behavior of multi-attach feature where the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
APIs and flags. These changes in netebpfext also help enable the "default" behavior of multi-attach feature where the
APIs and flags. These changes in netebpfext also help enable the "default" behavior of the multi-attach feature where

multiple legacy apps using older libbpf APIs to attach programs can still attach to the same hook, with the programs
being "appended" in the invoke list in the order they are attached.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
being "appended" in the invoke list in the order they are attached.
being "appended" to the invoke list in the order they are attached.


# Current Design
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current design section can be deleted unless in the new design section you refer to some of these. In any ways, for better readability you can significantly shorten this section.


`net_ebpf_extension_hook_provider_t` is the provider context for the hook NPI provider. It maintains a list of client
contexts. A client context represents the eBPF program that is attached to this specific "attach point". It also holds
the attach params with which the program was attached. For each client context, a filter context is created. The filter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the attach params with which the program was attached. For each client context, a filter context is created. The filter
the attach parameters with which the program was attached. For each client context, a filter context is created. The filter

Many times this file has attach params and other times attach parameters. Pick one and be consistency throughout. I'd probably prefer parameters under the principle of avoiding unnecessary abbreviations.

Also sometimes "attach params" has quotes like here, and sometimes it doesn't, like line 56. Be consistent. I'd suggest maybe use quotes the very first occurrence in the file and not thereafter.

context is container for all the WFP filters that are configured for this "attach point". Since only one client is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
context is container for all the WFP filters that are configured for this "attach point". Since only one client is
context is a container for all the Windows Filtering Platform (WFP) filters that are configured for this "attach point". Since only one client is

allowed to attach for each attach parameter (i.e. "attach point"), there is a one-to-one mapping between client context
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allowed to attach for each attach parameter (i.e. "attach point"), there is a one-to-one mapping between client context
allowed to attach for each attach parameter (i.e., "attach point"), there is a one-to-one mapping between client context

and filter context.

Whenever a new eBPF program is attached, following is the existing flow:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Whenever a new eBPF program is attached, following is the existing flow:
Whenever a new eBPF program is attached, the existing flow is as follows:

1. Check the list of current client contexts in hook provider, to see if any other program with same attach params is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Check the list of current client contexts in hook provider, to see if any other program with same attach params is
1. Check the list of current client contexts in the hook provider to see if any other program with same attach parameters is

already attached. This check happens with hook provider lock held. (There is a bug today in this logic where this lock
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
already attached. This check happens with hook provider lock held. (There is a bug today in this logic where this lock
already attached. This check happens with the hook provider lock held. (There is a bug today in this logic where this lock

is released after checking the list, causing 2 threads to both find that no matching client exists, and both threads
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is released after checking the list, causing 2 threads to both find that no matching client exists, and both threads
is released after checking the list, potentially causing 2 threads to both find that no matching client exists, and both threads

can then proceed with attach for clients with same attach params).
2. If a program is already attached, reject the new attach request.
3. If no program is already attached with same attach params, create a new client context.
4. Create a new filter context for this client context and add WFP filters.
4. Add the new client context in the list of clients in the hook provider.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
4. Add the new client context in the list of clients in the hook provider.
5. Add the new client context in the list of clients in the hook provider.


## Synchronization

There are 2 flows that need synchronization -- program invocation by the hook, and client attach and detach callbacks.
In the current design, this synchronization is achieved by using rundown references.

**Program invocation flow**:
1. Acquire rundown reference for the client context. If this fails, bail.
2. Invoke program.
3. Release rundown reference for client context.

**Client attach callback flow**
1. Filter context that is created on program attach takes an initial rundown reference on the client context.
2. Each WFP filter that is added takes a normal reference on filter context.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. Each WFP filter that is added takes a normal reference on filter context.
2. Each WFP filter that is added takes a normal reference on the filter context.


**Client detach callback flow**
1. Delete all the WFP filters. These will eventually release references on the filter context.
2. Queue a work item that waits for rundown on client context. This blocks until all the current invocations of the
program are completed. This also fails any new invocations of the program.
3. When all the references to the filter context are released (as part of WFP filter delete notification), it releases
the initial rundown reference taken on the client context, and the work item is unblocked, which then completes the NMR
client detach callback.

# New Design

In the new design, multiple programs attaching to the same attach point (i.e. same attach params) is allowed. The
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In the new design, multiple programs attaching to the same attach point (i.e. same attach params) is allowed. The
In the new design, multiple programs attaching to the same attach point (i.e., same attach parameters) is allowed. The

required data structures changes are described below.

Even though multiple programs will attach with same "attach params", there will be a single instance of the
corresponding WFP filters added. This means there will be one-to-many mapping from filter context to client context.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the benefit of the reader please establish that the client to the hook is an eBPF program. So program and client are used interchangeably in this doc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
corresponding WFP filters added. This means there will be one-to-many mapping from filter context to client context.
corresponding WFP filters added. This means there will be a one-to-many mapping from filter context to client context.

The hook provider will now contain a list of filter contexts, which in turn will have list (or an array) of the clients
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The hook provider will now contain a list of filter contexts, which in turn will have list (or an array) of the clients
The hook provider will now contain a list of filter contexts, which in turn will have a list (or an array) of the clients

corresponding to those attach params.

Whenever a new eBPF program is attached, following will be the new flow:
1. Check the list of current filter contexts in hook provider, to see if a filter context with same attach params
already exists. If an existing filter context is found, the new client is added to the list of client contexts in the
existing filter context.
2. If an existing filter context does not exist, create a new filter context and a new client context.
3. Configure corresponding WFP filters.
4. Add the new client context to the list of client contexts in the filter context.
5. Add the filter context in the list of filter contexts in hook provider context.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. Add the filter context in the list of filter contexts in hook provider context.
5. Add the filter context in the list of filter contexts in the hook provider context.


**Multiple program invocation flow**

Whenever WFP callout is invoked based on a matching filter, the callout driver gets pointer to filter context. From
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Whenever WFP callout is invoked based on a matching filter, the callout driver gets pointer to filter context. From
Whenever a WFP callout is invoked based on a matching filter, the callout driver gets a pointer to the filter context. From

the filter context, the callout gets the list of all the attached clients. The callout loops over all the clients and
invokes them. The *out* context of one program invocation becomes the *in* context of the next program. In the chain of
invocation, if any program **drops** the packet, the chain breaks there, and the callout returns the verdict to WFP.


## Synchronization
In the new design, access to the list of clients in the filter context and the list of filter contexts in the provider
context needs to be synchronized. The 2 flows that need synchronization specifically are:
1. Program invocation
2. Client attach / detach callback.

To provide synchronization, the hook provider will maintain a *dispatch* level RW lock to synchronize access to these
lists. (A dispatch level lock is chosen here as the WFP callouts can be invoked at either PASSIVE_LEVEL or DISPATCH_LEVEL).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lists. (A dispatch level lock is chosen here as the WFP callouts can be invoked at either PASSIVE_LEVEL or DISPATCH_LEVEL).
lists. (A dispatch level lock is chosen here as the WFP callouts can be invoked at either PASSIVE_LEVEL or DISPATCH_LEVEL.)

1. Attach and detach callbacks flow will acquire this lock in exclusive mode.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Attach and detach callbacks flow will acquire this lock in exclusive mode.
1. Attach and detach callback flows will acquire this lock in exclusive mode.

2. Program invocation flow acquires this lock in shared mode.

This approach allows multiple program invocations at the same time, as well as protecting the list when a client is
being attached / detached. Client / attach operations are expected to be much less frequent than program invocations.
This approach eliminates any need for rundown reference for the client context. Since the list of client contexts for a
filter context is synchronized via a lock, it is always guaranteed that in the client detach flow, if the dispatch level
lock is acquired, there is no program invocation in progress for that program. Also, once the client context is removed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if the dispatch level write lock is acquired" ....

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 89-90 and 114 don't use the term "read" or "write" lock, they use "exclusive" or "shared".

from the list of clients in the filter context, no new classify callback will find this program in the list, hence no
new invocation can also happen.

Along with the above synchronization, there is also a need to **serialize multiple attach and detach operations callbacks**,
as the whole flow of creating filter context, adding filter context to the provider list, and configuring WFP filters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
as the whole flow of creating filter context, adding filter context to the provider list, and configuring WFP filters
as the whole flow of creating filter context, adding the filter context to the provider list, and configuring WFP filters

needs to happen atomically (as mentioned in the current design section, there is bug currently where a lock is not held
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
needs to happen atomically (as mentioned in the current design section, there is bug currently where a lock is not held
needs to happen atomically (as mentioned in the current design section, there is a bug currently where a lock is not held

Please add the issue # of the current bug.

for the whole duration of attach / detach callback).

As the above mentioned dispatch level RW already serializes attach / detach callbacks for a hook provider, it could have
solved this serialization problem also. But there is a problem with using dispatch level locks: WFP APIs for adding
filters require PASSIVE_LEVEL, hence same DISPATCH_LEVEL lock cannot be used to serialize the attach and detach
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filters require PASSIVE_LEVEL, hence same DISPATCH_LEVEL lock cannot be used to serialize the attach and detach
filters require PASSIVE_LEVEL, hence the same DISPATCH_LEVEL lock cannot be used to serialize the attach and detach

operations. To address this, a separate PASSIVE lock will be maintained in the provider context to serialize attach and
detach operations. As a result of this, in attach and detach operations, the flow acquires both the DISPATCH_LEVEL lock
and the PASSIVE_LEVEL lock. In the program invocation flow, only the DISPATCH_LEVEL lock is acquired.

**Expected flow for client attach callback**:
1. Acquire passive lock --> Serializes multiple attach / detach calls to the same hook provider.
2. Acquire dispatch lock (exclusive). --> Synchronizes access to the client and filter context lists.
3. Check if a matching filter context already exists. If it exists, insert the new client context, and return.
4. If no matching filter context exists, release dispatch lock.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the state changes after this lock is released? Won't this permit multiple callers to create the WFP context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The passive lock (acquired in step 1) still serializes attach / detach calls to this provider.

5. Create filter context and client context.
6. Configure WFP filters --> We are not holding any dispatch level lock. IRQL is PASSIVE_LEVEL.
7. Acquire dispatch lock (exclusive) --> Synchronizes access to the client and filter context lists.
8. Insert the new filter context and client context in the corresponding queues.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if WFP API fails?

9. Release dispatch level lock and passive lock.

**Expected flow for program invocation**
1. Acquire dispatch lock (shared). --> Synchronizes access to the client and filter context lists.
2. Invoke programs in a loop.
3. Release dispatch lock (shared).
Loading