Xnnpack backend support #159

Open
wants to merge 37 commits into base: main
Conversation

chenghuaWang
Contributor

!!!Do not merge until the xnnpack backend llama is runnable!!!

How to use the xnnpack backend in mllm

The xnnpack backend in MLLM offers a convenience wrapper, wrap2xnn, that converts a standard CPU-based MLLM module into one that runs on the xnnpack backend. It accepts inputs_nums, outputs_nums, and any other arguments required to construct the wrapped module (here, a LinearModule). For example:

class LinearModule : public Module {
    Layer linear;

public:
    LinearModule() {
        // in_features = 1024, out_features = 2048, with bias.
        linear = Linear(1024, 2048, true, "linear");
    }

    vector<Tensor> Forward(vector<Tensor> inputs, vector<std::any> args) override {
        auto x = inputs[0];
        auto out = linear(x);
        return {out};
    }
};

TEST(XpLinearTest, LinearModule) {
    mllm::xnnpack::Log::log_level = mllm::xnnpack::Log::ERROR;

    // Wrap the CPU-style module so it runs on the xnnpack backend
    // (1 input tensor, 1 output tensor).
    auto model = ::mllm::xnnpack::wrap2xnn<LinearModule>(1, 1);
    model.setNoLoadWeightsDtype(DataType::MLLM_TYPE_F32);

    EXPECT_NE(Backend::global_backends[MLLM_XNNPACK], nullptr);

    // Input of shape [1, 1, 256, 1024], allocated on the xnnpack backend.
    Tensor x(1, 1, 256, 1024, Backend::global_backends[MLLM_XNNPACK], true);
    x.setTtype(TensorType::INPUT_TENSOR);

    for (int i = 0; i < 256 * 1024; ++i) {
        *(x.hostPtr<float>() + i) = 1024.f;
    }

    auto out = model({x})[0];

    // No weights are loaded, so every output element should be (near) zero.
    for (int i = 0; i < 256 * 2048; ++i) {
        EXPECT_LT(*(out.hostPtr<float>() + i), 1e-18);
    }

    out.printShape();
}

Unlike MLLM's dynamic graph mode, xnnpack operates on a static graph, so a mechanism is needed to convert the dynamic graph into a static one. MLLM's xnnpack backend wrapper adds several layers around the LinearModule to register external-input and external-output tensors. The final wrapped module is shown in the following pseudocode (and sketched in code right after it):

Layer: Direct(Direct::ExternalInput)
Module: LinearModule()
Layer: Direct(Direct::ExternalOutput)
Layer: Dispatch()
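
Conceptually, wrap2xnn assembles something like the module below around the user's code. This is only a minimal sketch of the structure above: XnnWrapperModule is a hypothetical name, and the Direct/Dispatch member shapes are assumptions for exposition, not quotes from the backend sources.

// Hypothetical illustration of the wrapper that wrap2xnn builds.
template <typename T>
class XnnWrapperModule : public Module {
    Layer input_direct;   // Direct(Direct::ExternalInput): tags inputs as external
    T wrapped;            // the user's CPU-style module, e.g. LinearModule
    Layer output_direct;  // Direct(Direct::ExternalOutput): tags outputs as external
    Layer dispatch;       // Dispatch(): compiles and runs the static subgraph

public:
    vector<Tensor> Forward(vector<Tensor> inputs, vector<std::any> args) override {
        auto x = input_direct(inputs[0]);        // register the external input tensor
        auto y = wrapped.Forward({x}, args)[0];  // trace the inner module into the subgraph
        auto out = output_direct(y);             // register the external output tensor
        return {dispatch(out)};                  // dispatch the completed static graph
    }
};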

You can find more use cases at https://github.com/chenghuaWang/mllm/blob/main/test/xnnpack/

How are the operators in MLLM's xnnpack backend implemented?

Take the XpAdd operation as an example:

XpAdd's reshape function is identical to CPUAdd's; the main differences lie in the setUp and execute functions. A sketch of the shared reshape logic follows.
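
As a minimal sketch, assuming mllm's usual (batch, head, sequence, dimension) tensor layout and CPUAdd's elementwise semantics (the real implementation lives in CPUAdd):

ErrorCode XpAdd::reshape(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    // Elementwise add: both inputs share a shape, and the output takes it.
    // Sketch only; this mirrors the CPUAdd pattern rather than quoting it verbatim.
    outputs[0]->reshape(inputs[0]->batch(), inputs[0]->head(),
                        inputs[0]->sequence(), inputs[0]->dimension());
    return Op::reshape(inputs, outputs);
}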

When execute is called, XpAdd inserts a static graph node into the xnnpack subgraph. During setUp, however, XpAdd performs no actions: at that stage the XpDirect op must first determine whether each tensor is an external input, an external output, or a regular tensor.

ErrorCode XpAdd::execute(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    auto xpb = (XnnpackBackend *)backend();

    // Ensure every input/output tensor is defined in the xnnpack subgraph,
    // honoring the external-input/external-output roles assigned by XpDirect.
    tryDefineAllXpTensors(xpb, inputs);
    tryDefineAllXpTensors(xpb, outputs);

    // Add a static add node to the xnnpack subgraph.
    auto status = xnn_define_binary(
        xpb->getXnnSubgraph(),
        xnn_binary_add,
        /*params=*/nullptr,
        inputs[0]->uuid(),
        inputs[1]->uuid(),
        outputs[0]->uuid(),
        /*flags=*/0);

    if (status != xnn_status_success) {
        Log::error("XpAdd::execute Error");
        exit(-1);
    }

    return MLLM_NO_ERROR;
}
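
By contrast, setUp is effectively empty, and a helper like tryDefineAllXpTensors registers each tensor with the subgraph according to the role XpDirect assigned. The sketch below is an assumption-laden illustration: the helper's signature and the accessors used here (xnnTensorType, shapeAsSizeT, setUuid) are hypothetical stand-ins for whatever the actual sources use, while xnn_define_tensor_value and its flags are the real XNNPACK C API.

ErrorCode XpAdd::setUp(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    // Intentionally a no-op: tensor roles (external input/output vs. regular)
    // are decided by XpDirect during this stage, before any node is defined.
    return MLLM_NO_ERROR;
}

// Hypothetical sketch of tryDefineAllXpTensors.
void tryDefineAllXpTensors(XnnpackBackend *xpb, vector<shared_ptr<Tensor>> &tensors) {
    for (auto &t : tensors) {
        if (t->uuid() != XNN_INVALID_VALUE_ID) continue;  // already in the subgraph

        uint32_t flags = 0;
        switch (t->xnnTensorType()) {                     // assumed accessor
        case XpTensorType::ExternalInput: flags = XNN_VALUE_FLAG_EXTERNAL_INPUT; break;
        case XpTensorType::ExternalOutput: flags = XNN_VALUE_FLAG_EXTERNAL_OUTPUT; break;
        default: break;                                   // regular internal value
        }

        std::vector<size_t> dims = t->shapeAsSizeT();     // assumed shape helper
        uint32_t uuid = XNN_INVALID_VALUE_ID;
        // In the real code, external values would also be given a reserved
        // external ID; that bookkeeping is elided in this sketch.
        xnn_define_tensor_value(
            xpb->getXnnSubgraph(), xnn_datatype_fp32,
            dims.size(), dims.data(),
            /*data=*/nullptr, /*external_id=*/XNN_INVALID_VALUE_ID,
            flags, &uuid);
        t->setUuid(uuid);                                 // assumed setter
    }
}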
