Xnnpack backend support #159

Open
wants to merge 37 commits into base: main
Conversation

chenghuaWang
Contributor

!!!Do not merge until the xnnpack backend llama is runnable!!!

How to use the xnnpack backend in mllm

The xnnpack backend in MLLM offers a convenience wrapper, wrap2xnn, that converts a standard CPU-based MLLM module into one that runs on the xnnpack backend. It accepts inputs_nums, outputs_nums, and any other arguments required to construct the wrapped module (here, a LinearModule). For example:

class LinearModule : public Module {
    Layer linear;

public:
    LinearModule() {
        // in_features = 1024, out_features = 2048, with bias.
        linear = Linear(1024, 2048, true, "linear");
    }

    vector<Tensor> Forward(vector<Tensor> inputs, vector<std::any> args) override {
        auto x = inputs[0];
        auto out = linear(x);
        return {out};
    }
};

TEST(XpLinearTest, LinearModule) {
    mllm::xnnpack::Log::log_level = mllm::xnnpack::Log::ERROR;

    // Wrap the CPU-style module so it runs on the xnnpack backend
    // (1 input tensor, 1 output tensor).
    auto model = ::mllm::xnnpack::wrap2xnn<LinearModule>(1, 1);
    model.setNoLoadWeightsDtype(DataType::MLLM_TYPE_F32);

    EXPECT_NE(Backend::global_backends[MLLM_XNNPACK], nullptr);

    // Input of shape [1, 1, 256, 1024], allocated on the xnnpack backend.
    Tensor x(1, 1, 256, 1024, Backend::global_backends[MLLM_XNNPACK], true);
    x.setTtype(TensorType::INPUT_TENSOR);

    for (int i = 0; i < 256 * 1024; ++i) {
        *(x.hostPtr<float>() + i) = 1024.f;
    }

    auto out = model({x})[0];

    // No weights are loaded, so every output element should be (near) zero.
    for (int i = 0; i < 256 * 2048; ++i) {
        EXPECT_LT(*(out.hostPtr<float>() + i), 1e-18);
    }

    out.printShape();
}

Unlike MLLM's dynamic graph mode, xnnpack operates on a static graph, so a mechanism is needed to convert the dynamic graph into a static one. MLLM's xnnpack backend wrapper adds several layers around the LinearModule to register external-input and external-output tensors. The final wrapped module is shown in the following pseudocode (and sketched in code right after it):

Layer: Direct(Direct::ExternalInput)
Module: LinearModule()
Layer: Direct(Direct::ExternalOutput)
Layer: Dispatch()
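
Conceptually, wrap2xnn assembles something like the module below around the user's code. This is only a minimal sketch of the structure above: XnnWrapperModule is a hypothetical name, and the Direct/Dispatch member shapes are assumptions for exposition, not quotes from the backend sources.

// Hypothetical illustration of the wrapper that wrap2xnn builds.
template <typename T>
class XnnWrapperModule : public Module {
    Layer input_direct;   // Direct(Direct::ExternalInput): tags inputs as external
    T wrapped;            // the user's CPU-style module, e.g. LinearModule
    Layer output_direct;  // Direct(Direct::ExternalOutput): tags outputs as external
    Layer dispatch;       // Dispatch(): compiles and runs the static subgraph

public:
    vector<Tensor> Forward(vector<Tensor> inputs, vector<std::any> args) override {
        auto x = input_direct(inputs[0]);        // register the external input tensor
        auto y = wrapped.Forward({x}, args)[0];  // trace the inner module into the subgraph
        auto out = output_direct(y);             // register the external output tensor
        return {dispatch(out)};                  // dispatch the completed static graph
    }
};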

You can find more use cases at https://github.com/chenghuaWang/mllm/blob/main/test/xnnpack/

How are the operators in MLLM's xnnpack backend implemented?

Take the XpAdd operation as an example:

XpAdd's reshape function is identical to CPUAdd's; the main differences lie in the setUp and execute functions. A sketch of the shared reshape logic follows.
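
As a minimal sketch, assuming mllm's usual (batch, head, sequence, dimension) tensor layout and CPUAdd's elementwise semantics (the real implementation lives in CPUAdd):

ErrorCode XpAdd::reshape(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    // Elementwise add: both inputs share a shape, and the output takes it.
    // Sketch only; this mirrors the CPUAdd pattern rather than quoting it verbatim.
    outputs[0]->reshape(inputs[0]->batch(), inputs[0]->head(),
                        inputs[0]->sequence(), inputs[0]->dimension());
    return Op::reshape(inputs, outputs);
}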

When execute is called, XpAdd inserts a static graph node into the xnnpack subgraph. During setUp, however, XpAdd performs no actions: at that stage the XpDirect op must first determine whether each tensor is an external input, an external output, or a regular tensor.

ErrorCode XpAdd::execute(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    auto xpb = (XnnpackBackend *)backend();

    // Ensure every input/output tensor is defined in the xnnpack subgraph,
    // honoring the external-input/external-output roles assigned by XpDirect.
    tryDefineAllXpTensors(xpb, inputs);
    tryDefineAllXpTensors(xpb, outputs);

    // Add a static add node to the xnnpack subgraph.
    auto status = xnn_define_binary(
        xpb->getXnnSubgraph(),
        xnn_binary_add,
        /*params=*/nullptr,
        inputs[0]->uuid(),
        inputs[1]->uuid(),
        outputs[0]->uuid(),
        /*flags=*/0);

    if (status != xnn_status_success) {
        Log::error("XpAdd::execute Error");
        exit(-1);
    }

    return MLLM_NO_ERROR;
}
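
By contrast, setUp is effectively empty, and a helper like tryDefineAllXpTensors registers each tensor with the subgraph according to the role XpDirect assigned. The sketch below is an assumption-laden illustration: the helper's signature and the accessors used here (xnnTensorType, shapeAsSizeT, setUuid) are hypothetical stand-ins for whatever the actual sources use, while xnn_define_tensor_value and its flags are the real XNNPACK C API.

ErrorCode XpAdd::setUp(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    // Intentionally a no-op: tensor roles (external input/output vs. regular)
    // are decided by XpDirect during this stage, before any node is defined.
    return MLLM_NO_ERROR;
}

// Hypothetical sketch of tryDefineAllXpTensors.
void tryDefineAllXpTensors(XnnpackBackend *xpb, vector<shared_ptr<Tensor>> &tensors) {
    for (auto &t : tensors) {
        if (t->uuid() != XNN_INVALID_VALUE_ID) continue;  // already in the subgraph

        uint32_t flags = 0;
        switch (t->xnnTensorType()) {                     // assumed accessor
        case XpTensorType::ExternalInput: flags = XNN_VALUE_FLAG_EXTERNAL_INPUT; break;
        case XpTensorType::ExternalOutput: flags = XNN_VALUE_FLAG_EXTERNAL_OUTPUT; break;
        default: break;                                   // regular internal value
        }

        std::vector<size_t> dims = t->shapeAsSizeT();     // assumed shape helper
        uint32_t uuid = XNN_INVALID_VALUE_ID;
        // In the real code, external values would also be given a reserved
        // external ID; that bookkeeping is elided in this sketch.
        xnn_define_tensor_value(
            xpb->getXnnSubgraph(), xnn_datatype_fp32,
            dims.size(), dims.data(),
            /*data=*/nullptr, /*external_id=*/XNN_INVALID_VALUE_ID,
            flags, &uuid);
        t->setUuid(uuid);                                 // assumed setter
    }
}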
