Benchmark runs on Pi 4 4GB @1.5GHz, but won't on 7x Pi 4 4GB cluster @1.5GHz #41

Open
somera opened this issue Aug 28, 2024 · 3 comments
Comments


somera commented Aug 28, 2024

Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

Result for a single Pi 4 4GB:

TASK [Output the results.] ************************************************************************************************************************************
ok: [127.0.0.1] =>
  mpirun_output.stdout: |-
    ================================================================================
    HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
    Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
    Modified by Julien Langou, University of Colorado Denver
    ================================================================================

    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.

    The following parameter values will be used:

    N      :   14745
    NB     :     256
    PMAP   : Row-major process mapping
    P      :       1
    Q      :       4
    PFACT  :   Right
    NBMIN  :       4
    NDIV   :       2
    RFACT  :   Crout
    BCAST  :  1ringM
    DEPTH  :       1
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words

    --------------------------------------------------------------------------------

    - The matrix A is randomly generated for each test.
    - The following scaled residual check will be computed:
          ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
    - The relative machine precision (eps) is taken to be               1.110223e-16
    - Computational tests pass if scaled residuals are less than                16.0

    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR11C2R4       14745   256     1     4             783.04             2.7298e+00
    HPL_pdgesv() start time Wed Aug 28 00:14:16 2024

    HPL_pdgesv() end time   Wed Aug 28 00:27:19 2024

    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.85920786e-03 ...... PASSED
    ================================================================================

    Finished      1 tests with the following results:
                  1 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    --------------------------------------------------------------------------------

    End of Tests.
    ================================================================================

Result for my cluster (7x Pi 4 4GB):

It won't scale. It runs on only one Pi.


And I don't know why. I have other small Open MPI applications that run fine across my cluster, but HPLinpack 2.3 doesn't.

$ more /clusterfs_sd/clusterhosts_all_cores
pi-4-node-1:4
pi-4-node-2:4
pi-4-node-3:4
pi-4-node-4:4
pi-4-node-5:4
pi-4-node-6:4
pi-manager:4

If I change the order in /clusterfs_sd/clusterhosts_all_cores, I see that xhpl runs only on the first node. If I remove the :4 suffix, then xhpl runs on the first 4 nodes (one core on each node).
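(For reference, a rough sketch of the two hostfile syntaxes; this is my assumption about what each launcher expects, not something confirmed in this thread. MPICH's Hydra launcher reads host:N as N processes per host and takes the file via -f, while Open MPI's mpirun traditionally wants a slots= keyword:)

# MPICH / Hydra style hostfile (host:processes-per-host)
pi-4-node-1:4
pi-4-node-2:4

# Open MPI style hostfile (assumed equivalent)
pi-4-node-1 slots=4
pi-4-node-2 slots=4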


somera commented Aug 28, 2024

I tried

$ mpirun -np 28 --hostfile /clusterfs_sd/clusterhosts_all ./xhpl

Same result. It runs on only 4 cores.


And I have no idea why?!
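(Not from the original thread, but a quick sanity check that would distinguish the two MPI stacks: both Open MPI and MPICH ship an mpirun/mpiexec, and they identify themselves with --version.)

$ mpirun --version   # Open MPI reports "mpirun (Open MPI) x.y.z"; MPICH's Hydra prints its build details
$ which mpirun xhpl  # confirm the launcher and the xhpl binary come from the intended MPI installation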


somera commented Aug 28, 2024

I saw geerlingguy/turing-pi-2-cluster#1. But my /etc/hosts looks good.
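(For context, the kind of /etc/hosts layout that issue is about simply maps each node name to its cluster IP. The addresses below are illustrative only, not taken from this thread:)

192.168.0.10    pi-manager
192.168.0.11    pi-4-node-1
192.168.0.12    pi-4-node-2
192.168.0.13    pi-4-node-3
192.168.0.14    pi-4-node-4
192.168.0.15    pi-4-node-5
192.168.0.16    pi-4-node-6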


somera commented Aug 28, 2024

I found my problem.

Until now I worked with Open MPI, and I was using: mpiexec --mca btl tcp,self --mca btl_tcp_if_include eth0.

But now this HPL build needs MPICH, and MPICH doesn't support --mca btl tcp,self --mca btl_tcp_if_include eth0.
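(A rough MPICH/Hydra counterpart of those Open MPI options, as a sketch only; the flag names should be checked against the installed MPICH version, and the hostfile path is the one from above:)

# Open MPI flags (what was used before): restrict the TCP BTL to eth0
$ mpiexec --mca btl tcp,self --mca btl_tcp_if_include eth0 -np 28 --hostfile /clusterfs_sd/clusterhosts_all_cores ./xhpl

# Assumed MPICH/Hydra counterpart: -iface selects the network interface, -f passes the hostfile
$ mpiexec -iface eth0 -f /clusterfs_sd/clusterhosts_all_cores -n 28 ./xhpl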
