No border? #9

Closed
zvezdochiot opened this issue Dec 30, 2022 · 6 comments

@zvezdochiot

zvezdochiot commented Dec 30, 2022

Hi @shawlleyw.

Use the "no border" variant below, which clamps sample coordinates to the image so that the edge pixels are processed too:

#ifndef RESIZE_H_
#define RESIZE_H_

#include "utils.hpp"

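// Keys cubic convolution kernel for a distance x >= 0 (callers pass absolute values);
// a is the kernel's free parameter.
float BiCubicWeightCoeff(float x, float a)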
{
    if (x <= 1.0f)
    {
        return (1.0f - ((a + 3.0f) - (a + 2.0f) * x) * x * x);
    }
    else if (x < 2.0f)
    {
        return ((-4.0f + (8.0f - (5.0f - x) * x) * x) * a);
    }
    return 0.0f;
}

void BiCubicCoeff4x4(float y, float x, float *coeff)
{
    const float a = -0.5f;

    float u = y - (int)y;
    float v = x - (int)x;

    u += 1.0f;
    v += 1.0f;

    int k = 0;
    for (int i = 0; i < 4; i++)
    {
        float du = (u > i) ? (u - i) : (i - u);
        for (int j = 0; j < 4; j++)
        {
            float dv = (v > j) ? (v - j) : (j - v);
            coeff[k] =
                BiCubicWeightCoeff(du, a) * BiCubicWeightCoeff(dv, a);
            k++;
        }
    }
}

void BGRAfterBiCubic(unsigned char *pix, RGBImage src, float y_float, float x_float,
                              int channels)
{
    float coeff[16];

    int y0 = (int)y_float - 1;
    int x0 = (int)x_float - 1;
    BiCubicCoeff4x4(y_float, x_float, coeff);

    float sum[channels] = {0.0f};
    size_t k = 0, l;
    for (int i = 0; i < 4; i++)
    {
        int yf = ((y0 + i) < 0) ? 0 : ((y0 + i) < src.rows) ? (y0 + i) : (src.rows - 1);
        for (int j = 0; j < 4; j++)
        {
            int xf = ((x0 + j) < 0) ? 0 : ((x0 + j) < src.cols) ? (x0 + j) : (src.cols - 1);
            l = (yf * src.cols + xf) * channels;
            for (int d = 0; d < channels; d++)
            {
                sum[d] += coeff[k] * src.data[l + d];
            }
            k++;
        }
    }
    for (int d = 0; d < channels; d++)
    {
        pix[d] = (unsigned char)((sum[d] < 0.0f) ? 0 : (sum[d] < 255.0f) ? sum[d] : 255);
    }
}

RGBImage ResizeImage(RGBImage src, float ratio)
{
    const int channels = src.channels;
    Timer timer("resize image");
    int resize_rows = src.rows * ratio;
    int resize_cols = src.cols * ratio;
    unsigned char pix[channels];

    printf("resize to: %d x %d\n", resize_cols, resize_rows);

    unsigned char *res = new unsigned char[channels * resize_rows * resize_cols];

    size_t k = 0;
    for (int i = 0; i < resize_rows; i++)
    {
        float src_y = ((float)i + 0.5f) / ratio - 0.5f;
        for (int j = 0; j < resize_cols; j++)
        {
            float src_x = ((float)j + 0.5f) / ratio - 0.5f;
            BGRAfterBiCubic(pix, src, src_y, src_x, channels);
            for (int d = 0; d < channels; d++)
            {
                res[k] = pix[d];
                k++;
            }
            // k += channels;
        }
    }
    return RGBImage{resize_cols, resize_rows, channels, res};
}

#endif /* RESIZE_H_ */
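
A small sketch of the weighting step may help when reading the header above. It repeats the same kernel formula and adds a hypothetical helper, `CubicWeights4`, that is not part of the posted code; it returns the four tap weights for a fractional offset `t`. For this kernel the four weights should add up to 1, which is a convenient sanity check.

```cpp
#include <cmath>

// Same cubic convolution kernel as BiCubicWeightCoeff, but taking any sign of x.
float CubicKernel(float x, float a)
{
    x = std::fabs(x);
    if (x <= 1.0f)
    {
        return 1.0f - ((a + 3.0f) - (a + 2.0f) * x) * x * x;
    }
    if (x < 2.0f)
    {
        return (-4.0f + (8.0f - (5.0f - x) * x) * x) * a;
    }
    return 0.0f;
}

// Hypothetical helper (not in the posted header): weights of the four taps
// around a sample lying t (0 <= t < 1) past the second tap, so the tap
// distances are 1 + t, t, 1 - t and 2 - t.
void CubicWeights4(float t, float a, float w[4])
{
    w[0] = CubicKernel(1.0f + t, a);
    w[1] = CubicKernel(t, a);
    w[2] = CubicKernel(1.0f - t, a);
    w[3] = CubicKernel(2.0f - t, a);
}

// Sum of the four weights; expected to be close to 1 for any t in [0, 1).
float SumWeights4(float t, float a)
{
    float w[4];
    CubicWeights4(t, a, w);
    return w[0] + w[1] + w[2] + w[3];
}
```

Calling `SumWeights4(t, -0.5f)` for a few values of `t` is a quick way to confirm that a modified kernel still preserves overall brightness.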

PS: Run time is about 0.65× that of your variant.

Your variant:

time ./resize tree.jpg 
image height: 2434, width: 3314
resize to: 12170 x 16570
>>> resize image by 5x: 38610ms
save image tree_5x.jpg

real    0m44,045s
user    0m43,800s
sys     0m0,244s

My variant:

time ./resize tree.jpg tree_x5.jpg 5 
image 3314 x 2434
resize to: 16570 x 12170
>>> resize image: 23882ms
save image tree_x5.jpg

real    0m29,393s
user    0m29,070s
sys     0m0,244s
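
The `>>> resize image: ...ms` lines come from the `Timer` object in the posted code, whose definition lives in `utils.hpp` and is not shown in this thread. The following is only a guess at how such a scoped timer could look, using the hypothetical name `ScopedTimer`. A timer like this covers just the scope it lives in, which would explain why the `real` time reported by `time` (including image loading and saving) is larger than the in-program figure.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical stand-in for the Timer from utils.hpp (the real class is not
// shown in the thread). It prints the elapsed time when it goes out of scope.
class ScopedTimer
{
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)), start_(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        std::printf(">>> %s: %lldms\n", label_.c_str(), ElapsedMs());
    }

    // Milliseconds elapsed since construction.
    long long ElapsedMs() const
    {
        const auto now = std::chrono::steady_clock::now();
        return static_cast<long long>(
            std::chrono::duration_cast<std::chrono::milliseconds>(now - start_).count());
    }

private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

Declaring `ScopedTimer timer("resize image");` at the top of a function would then report the time spent in that function only.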

@shawlleyw
Contributor

Hi, this is one of our codebases for practicing code optimization skills. The code in this repo is intentionally unoptimized, so it is very slow.

I can see that you sped it up a bit. However, we have some highly optimized code in the pull requests that you might be interested in looking at.

For example, this is an optimized version that speeds up the original code by hundreds of times.

@zvezdochiot
Author

zvezdochiot commented Dec 31, 2022

@shawlleyw said:

> For example, this is an optimized version that speeds up the original code by hundreds of times.

Yes. But!

My version doesn't use any _mm256_ or _mm_ SIMD intrinsics and does process the edges of the image (no border is left out). Shouldn't the base version handle the edges and avoid platform-specific tools?

Good luck.
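
To make the edge handling concrete, here is a minimal 1-D sketch of clamp-to-edge cubic sampling in plain C++, with no intrinsics. The names `ClampIndex` and `SampleCubicClamped` are hypothetical and do not appear in the posted header. One deliberate difference from the posted code: the sketch uses `std::floor` instead of an `(int)` cast, so that positions slightly below 0 still select the window that starts two samples to the left. It assumes a non-empty input.

```cpp
#include <cmath>
#include <vector>

// Cubic convolution kernel, same formula as in the posted header.
static float Keys(float x, float a)
{
    x = std::fabs(x);
    if (x <= 1.0f)
    {
        return 1.0f - ((a + 3.0f) - (a + 2.0f) * x) * x * x;
    }
    if (x < 2.0f)
    {
        return (-4.0f + (8.0f - (5.0f - x) * x) * x) * a;
    }
    return 0.0f;
}

// Clamp an index into [0, n - 1], i.e. replicate the edge sample.
static int ClampIndex(int i, int n)
{
    return i < 0 ? 0 : (i < n ? i : n - 1);
}

// Hypothetical 1-D analogue of the 4x4 sampling step: interpolate src at the
// real-valued position pos, reading out-of-range taps from the nearest edge.
// Assumes src is not empty.
float SampleCubicClamped(const std::vector<float> &src, float pos, float a = -0.5f)
{
    const int n = static_cast<int>(src.size());
    const int base = static_cast<int>(std::floor(pos));
    const float t = pos - static_cast<float>(base);

    float sum = 0.0f;
    for (int k = -1; k <= 2; k++)
    {
        // Tap at index base + k lies at distance |t - k| from pos.
        const float w = Keys(t - static_cast<float>(k), a);
        sum += w * src[ClampIndex(base + k, n)];
    }
    return sum;
}
```

With this scheme a constant signal stays constant right up to the edges, because the clamped taps repeat the edge value and the four weights still sum to 1.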

@shawlleyw
Contributor

Yeah, I remember that I was not quite sure how to deal with the border, so I just dropped it. It didn't cause much trouble, though.

Still, it is better to process the border as well. You make a good point. Thanks!
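
For a sense of how many output pixels the border question affects, the sketch below counts, along one axis, the destination indices whose 4-tap window reaches outside the source. It assumes the half-pixel-centre mapping used in the posted `ResizeImage`; the function names are hypothetical, and it says nothing about how the original repository actually skipped the border.

```cpp
#include <cmath>

// Source coordinate for destination index i, using the same half-pixel-centre
// mapping as the posted ResizeImage.
float SourceCoord(int i, float ratio)
{
    return (static_cast<float>(i) + 0.5f) / ratio - 0.5f;
}

// True if the 4-tap window around SourceCoord(i, ratio) needs a sample
// outside [0, n - 1], where n is the source size along this axis.
bool WindowLeavesImage(int i, float ratio, int n)
{
    const int base = static_cast<int>(std::floor(SourceCoord(i, ratio)));
    return (base - 1 < 0) || (base + 2 > n - 1);
}

// Hypothetical helper: number of destination indices along one axis that
// need border handling when a source of size n is scaled by ratio.
int CountBorderOutputs(int n, float ratio)
{
    const int dst_n = static_cast<int>(n * ratio);
    int count = 0;
    for (int i = 0; i < dst_n; i++)
    {
        if (WindowLeavesImage(i, ratio, n))
        {
            count++;
        }
    }
    return count;
}
```

These are the output pixels that an implementation without border handling has to skip or treat specially, and that the clamped variant fills in from the nearest edge samples.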

@zvezdochiot
Author

zvezdochiot commented Dec 31, 2022

@shawlleyw said:

> Still, it is better to process the border as well.

Despite "this", your implementation of BiCubic in C is one of the best on GitHub. 👍

Good luck.

@shawlleyw
Contributor

> @shawlleyw said:
>
> > Still, it is better to process the border as well.
>
> Despite "this", your implementation of BiCubic in C is one of the best on GitHub. 👍

😂 Thanks. I will update it later to process the border, but it will still compute each channel individually, because I want this codebase to stay inefficient.

I'm going to close this issue now. You can also open a pull request to help us fix the border problem if you're interested 😉.

@zvezdochiot
Author

@shawlleyw said:

> You can also open a pull request to help us fix the border problem if you're interested

I think you can manage without me. :)
