Sessions new #46

Open: wants to merge 38 commits into master


hppritcha
Member

No description provided.

hjelmn and others added 30 commits May 25, 2021 13:41
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
the way fboxes work has issues for the sessions implementation,
in particular the session finalize approach.

what happens without this temporary fix is that if there is no fully synchronizing call
prior to calling session_finalize, there are cases where a process may be probing its fast
mailboxes for processes that are tearing down these fboxes. That results in segfaults and
sigbus problems.

The fast box mechanism will need to be supplemented with some kind of shutdown mechanism
that tells the owner of the fboxes when it's okay to actually tear them down.

In the interest of making progress using the sessions prototype with applications, shut
off the fbox path for the prototype and come back to a real fix at a later
date.
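The commit leaves the real shutdown mechanism open. One way it could work, sketched here under assumptions, is a reader count plus a closing flag per fbox: a peer announces itself before probing and backs off if teardown has started, and the owner tears down only once no readers remain. The type and function names (`fbox_guard_t`, `fbox_reader_enter`, and so on) are hypothetical and not part of Open MPI; the sketch also assumes the guard lives in memory both processes can see and that lock-free C11 atomics behave correctly across processes on the target platform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-fbox shutdown guard (not Open MPI's real structure). */
typedef struct {
    atomic_int  active_readers; /* peers currently probing this fbox */
    atomic_bool closing;        /* set by the owner before teardown */
} fbox_guard_t;

void fbox_guard_init(fbox_guard_t *g) {
    atomic_init(&g->active_readers, 0);
    atomic_init(&g->closing, false);
}

/* Called by a peer before probing. The reader announces itself first and
 * only then checks `closing`; with sequentially consistent atomics the owner
 * cannot miss a reader that got in before the flag was set. */
bool fbox_reader_enter(fbox_guard_t *g) {
    atomic_fetch_add(&g->active_readers, 1);
    if (atomic_load(&g->closing)) {
        atomic_fetch_sub(&g->active_readers, 1);
        return false; /* owner is tearing down: do not touch the fbox */
    }
    return true;
}

/* Called by a peer when it is done probing. */
void fbox_reader_exit(fbox_guard_t *g) {
    int prev = atomic_fetch_sub(&g->active_readers, 1);
    assert(prev > 0); /* exit without a matching enter is a bug */
    (void)prev;
}

/* Owner side: stop new readers from entering. */
void fbox_owner_begin_close(fbox_guard_t *g) {
    atomic_store(&g->closing, true);
}

/* Owner side: true once no peer is inside the fbox any more. */
bool fbox_owner_can_teardown(fbox_guard_t *g) {
    return atomic_load(&g->active_readers) == 0;
}
```

In this sketch the owner would call `fbox_owner_begin_close` and then poll `fbox_owner_can_teardown` before unmapping the fbox, so a slow peer still probing delays teardown instead of faulting.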

relates to #3

drop use of MPI_Flag

what we're reading at the forum now for the Sessions proposal has ditched MPI_Flags.
Now using an info object passed to MPI_Session_init to specify the desired thread
support level.
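As a rough illustration of the info-based approach, the helper below maps a thread-level string such as "MPI_THREAD_MULTIPLE" (as might be supplied as an info value to MPI_Session_init) to a level. The enum and `parse_thread_level` are hypothetical stand-ins so the sketch compiles without mpi.h; the info key name is not stated in this commit and is not assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the MPI_THREAD_* constants from mpi.h. */
enum thread_level { THREAD_SINGLE, THREAD_FUNNELED, THREAD_SERIALIZED, THREAD_MULTIPLE };

/* Map an info value such as "MPI_THREAD_MULTIPLE" to a thread level.
 * Returns -1 for a missing or unrecognized value so the caller can
 * fall back to a default level. */
int parse_thread_level(const char *value) {
    static const struct {
        const char *name;
        enum thread_level level;
    } table[] = {
        { "MPI_THREAD_SINGLE",     THREAD_SINGLE },
        { "MPI_THREAD_FUNNELED",   THREAD_FUNNELED },
        { "MPI_THREAD_SERIALIZED", THREAD_SERIALIZED },
        { "MPI_THREAD_MULTIPLE",   THREAD_MULTIPLE },
    };
    if (value == NULL) {
        return -1;
    }
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(value, table[i].name) == 0) {
            return (int)table[i].level;
        }
    }
    return -1;
}
```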

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
and so much more

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
remove extraneous method
add patch to bypass excid exchange
some fixes to excid bypass path
sessions/fortran: add missing redefine for profiling interfaces

sessions: move ompi_mpi_instance_cleanup_pml ahead in cleanup
stack to avoid melt-downs in ompi_win_finalize, etc. under certain
cases.
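Assuming the instance cleanup stack runs callbacks last-in, first-out (an assumption; the commit does not describe the mechanism), moving the PML cleanup "ahead" means registering it earlier so that it runs later, after window finalization has finished with it. The sketch below uses hypothetical names (`cleanup_push`, `cleanup_run_all`, `cleanup_win`, `cleanup_pml`) to show that ordering.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLEANUPS 16

typedef void (*cleanup_fn_t)(void);

/* Hypothetical cleanup stack: callbacks run in reverse order of registration. */
static cleanup_fn_t cleanup_stack[MAX_CLEANUPS];
static size_t cleanup_depth = 0;

/* Register a cleanup callback; returns -1 if the stack is full. */
int cleanup_push(cleanup_fn_t fn) {
    if (cleanup_depth == MAX_CLEANUPS) {
        return -1;
    }
    cleanup_stack[cleanup_depth++] = fn;
    return 0;
}

/* Pop and run every registered callback, newest first. */
void cleanup_run_all(void) {
    while (cleanup_depth > 0) {
        cleanup_stack[--cleanup_depth]();
    }
}

/* Record the order in which the example callbacks run. */
static char trace[8];
static size_t trace_len = 0;

static void cleanup_win(void) { trace[trace_len++] = 'W'; } /* stands in for window finalize */
static void cleanup_pml(void) { trace[trace_len++] = 'P'; } /* stands in for PML cleanup */
```

Pushing `cleanup_pml` before `cleanup_win` makes the window teardown run first, while the PML it may depend on is still alive.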

fix attr subsys and ucx compile errors

swat uninitialized variables

fixes many test failures when not using --enable-debug and the
application calls MPI_Comm_create_errhandler, etc.

sessions: put MPI_Intercomm_merge back to orig

don't use MPI_Comm_create_from_group since it's not supported
by all PMLs.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
minor patchups

sessions: rebase over ULFM attributes
patch up to adjust for MPI_FT key

sessions: remove pmix glue functions
for PMIx group operations. Now calling PMIx directly

sessions: adjustment for ucx initialization

since it makes calls to hwloc at this point, which needs
PMIx and hence rte_init to have been called.

sessions: rebase over ULFM OB1 changes

sessions: rebase over ULFM portals adjustments

sessions: rebase over ULFM minor ucx fixup

sessions: rebase over ULFM changes to FT code

ucx: improve configury

to handle case of HAVE_UCP_ATTR_MEMORY_TYPES not defined

remove redundant check for gni provider

remove some commented out print statements

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Turns out that the predefined datatypes are initialized statically,
so the first time through calling ompi_datatype_finalize the magic
cookie gets cleared if code is compiled in debug mode. This causes
a second call to ompi_datatype_finalize to blow up with an assert,
since the cookie had been cleared in the first pass.

Add a new OBJ_DESTRUCT variant which does not check/clear cookie even
when the code is compiled in --enable-debug mode.
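A minimal sketch of the problem and the workaround, assuming a simplified object header: the checked destruct validates and then clears the cookie, so a statically initialized object can only pass through it once, while the variant that leaves the cookie alone can be destructed on every finalize. `obj_t`, `OBJ_MAGIC_ID`, and both destruct functions are hypothetical simplifications, not Open MPI's actual OBJ_DESTRUCT macros.

```c
#include <assert.h>
#include <stdint.h>

#define OBJ_MAGIC_ID 0xdeafbeedULL /* hypothetical cookie value */

/* Hypothetical object header carrying a debug-mode magic cookie. */
typedef struct {
    uint64_t magic_id;
    int destruct_calls;
} obj_t;

/* A statically initialized "predefined" object, like the predefined datatypes. */
static obj_t predefined = { OBJ_MAGIC_ID, 0 };

static void obj_run_destructor(obj_t *o) { o->destruct_calls++; }

/* Debug-style destruct: check the cookie, run the destructor, clear the cookie.
 * In a debug build, a second call on the same object fails the assert. */
void obj_destruct_checked(obj_t *o) {
    assert(o->magic_id == OBJ_MAGIC_ID);
    obj_run_destructor(o);
    o->magic_id = 0;
}

/* Variant that neither checks nor clears the cookie, so a static object
 * survives repeated finalize/init cycles. */
void obj_destruct_nocheck(obj_t *o) {
    obj_run_destructor(o);
}
```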

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
this patch fixes problems encountered trying to
handle session_init following a previous finalize
that shut down the bml.
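The commit does not show the fix itself. As a hedged illustration of the general pattern, a reference-counted subsystem can be torn down when its last user releases it and brought back up by the next retain, which is what a session_init following a full finalize needs. `subsys_t`, `subsys_retain`, and `subsys_release` are hypothetical names standing in for something like the BML.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted subsystem (a stand-in for e.g. the BML). */
typedef struct {
    int refcount;
    bool open;
    int open_count; /* how many times the subsystem was actually (re)opened */
} subsys_t;

/* Take a reference; the first user after a full shutdown reopens it. */
void subsys_retain(subsys_t *s) {
    if (s->refcount++ == 0) {
        s->open = true;
        s->open_count++;
    }
}

/* Drop a reference; the last user tears the subsystem down. */
void subsys_release(subsys_t *s) {
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        s->open = false;
    }
}
```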

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
ompi_mmpi_instance_unit_basic_count will drop to zero with the
next call to ompi_mpi_instance_release.

This patch may need a better solution.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
this leads to meltdowns if a new session is initialized, as
the predefined datatypes have been destructed.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
The fix to not destruct the predefined datatypes seems to have
addressed the problem.  This commit removes the hack patch.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
per the MPI 4.0 standard

related to

#37

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
this didn't make it into the MPI 4.0 standard

related to #52

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
the forum decided attrs had enough problems already and shouldn't be
extended to other MPI objects, so sessions attrs are not part
of the MPI 4.0 standard

related to #47

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
processing to match with current master.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
remove sessions (instance) attribute type as this is not part of the MPI 4.0 standard.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
check

related to #49

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
hppritcha and others added 8 commits June 14, 2021 14:40
this was not part of the MPI 4.0 standard

related to #54

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
not part of the MPI 4 standard so removing

related to #54

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
and some more windows from group stuff too.

None of these made it into the sessions API for MPI 4.0, and it's very
likely they will be added in an MPI 5.0 release. Other approaches to topology
constructs outside of the sessions proposal are being explored.

Related to #58
Related to #54

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
for ULFM communicators.

Related to #43

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
comm_create_from_group

and

intercomm_create_from_groups

related to #56

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: tomhers <tom.herschberg@gmail.com>
Signed-off-by: tomhers <tom.herschberg@gmail.com>
3 participants