
Potential Use Cases #45

Open
eleon opened this issue Aug 25, 2022 · 25 comments

Comments

@eleon
Member

eleon commented Aug 25, 2022

Courtesy of @adammoody.

As a concrete use case, we might have a situation like:

  • an application uses SCR and MPI
  • SCR spawns a background thread per application process for data copies from node-local storage to parallel file system
  • MPI spawns a background thread per application process for async progress

Ideally, those background threads would run on different cores than the main application thread to avoid contention. However, they could run on the same core as the main app thread if there are no spare cores available. The background threads could run on the same core together, since they are likely not CPU intensive.

Does the Quo Vadis interface provide a way to specify a situation like that?

We can use hints within qv_scope_create to accommodate this.

// worker_scope has been defined by the app

// Get the system scope
qv_scope_get(ctx, QV_SCOPE_SYSTEM, &sys_scope);

// Check whether there are available PUs
qv_scope_nobjs_avail(ctx, sys_scope, QV_HW_OBJ_PU, &pus);

if (pus > 0)
    qv_scope_create(ctx, sys_scope, QV_HW_OBJ_PU, 1,
                    QV_SCOPE_ATTR_INCLUSIVE,
                    &sub_scope);
else
    sub_scope = worker_scope;

// Launch the thread(s)
qv_pthread_create(thread, attr, start_routine, arg,
                  sub_scope, ctx);

What we need to implement:

  • QV_SCOPE_SYSTEM
  • qv_pthread_create
  • qv_scope_nobjs_avail
  • Hints: Implement (and define) the hints that qv_scope_create can take. The INCLUSIVE (or shared) hint means that other workers may be running on the same resource (the opposite of exclusive). By default we should place threads using a BFS strategy and then fill up the cores if multiple hardware threads are available.

I guess both SCR and MPI would need to make QV calls?

Yes. The more components that use QV, the better the placement and coordination will be.

@GuillaumeMercier
Collaborator

What we need to implement:
* qv_pthread_create

I'm working on it.

@GuillaumeMercier
Collaborator

* MPI spawns a background thread per application process for async progress

Interesting. It means then that we have to support the hybrid MPI + OpenMP + Pthreads model. I'm not sure that the current OpenMP-based implementation in QV supports this.

@samuelkgutierrez
Member

* MPI spawns a background thread per application process for async progress

Interesting. It means then that we have to support the hybrid MPI + OpenMP + Pthreads model. I'm not sure that the current OpenMP-based implementation in QV supports this.

Fine-grained MPI + OpenMP + Pthreads support would be fantastic. I'm aware of other use cases that would benefit from this capability, too.

@GuillaumeMercier
Collaborator

One thing I'm thinking about is the ability to share a context and/or a scope between OpenMP threads and Pthreads.
This is more or less the same issue as in #35, I guess.
This is more or less the same issue as in #35 I guess.

@GuillaumeMercier
Collaborator

I think we need functions along the lines of qv_init/qv_finalize to support this.

@samuelkgutierrez
Member

I think we need functions along the lines of qv_init/qv_finalize to support this.

Could you please elaborate on this?

@samuelkgutierrez
Member

Also, something to consider: would splitting up OpenMP and Pthread support help in any way?

@GuillaumeMercier
Collaborator

Could you please elaborate on this?

Yes, I can: as we already discussed, MPI and OpenMP feature some kind of runtime system that can be relied upon and queried. This is not the case with Pthreads, so I will need to introduce some shared memory space that can contain the information the Pthread implementation will need. In the case of multi-paradigm programs (e.g., MPI + OpenMP + Pthreads), this shared space should be accessible by all "paradigms". My thinking is that having three separate supports/implementations that are not aware of each other is not going to work.

@GuillaumeMercier
Collaborator

Thus, having a generic qv_init call would allow for setting up this shared space regardless of the programming model.
But maybe it's a call that would only be needed in hybrid cases?

@samuelkgutierrez
Member

Could you please elaborate on this?

Yes, I can: as we already discussed, MPI and OpenMP feature some kind of runtime system that can be relied upon and queried. This is not the case with Pthreads, so I will need to introduce some shared memory space that can contain the information the Pthread implementation will need. In the case of multi-paradigm programs (e.g., MPI + OpenMP + Pthreads), this shared space should be accessible by all "paradigms". My thinking is that having three separate supports/implementations that are not aware of each other is not going to work.

In that case, could one implement an internal abstraction that provides such mechanisms for Pthreads?

@GuillaumeMercier
Collaborator

Yes, that is what I'm planning to do. But this internal abstraction would have to be shared eventually, wouldn't it?

@samuelkgutierrez
Member

Thus, having a generic qv_init call would allow for setting up this shared space regardless of the programming model. But maybe it's a call that would only be needed in hybrid cases?

Could we accomplish the same goal by implementing the missing machinery internally to QV?

@GuillaumeMercier
Collaborator

Maybe. But how do you detect hybrid cases? And enable the support in these cases?

@samuelkgutierrez
Member

Yes, that is what I'm planning to do. But this internal abstraction would have to be shared eventually, wouldn't it?

Shared across tasks, yes; but I'm not convinced that we have to expose those details to the user.

@GuillaumeMercier
Collaborator

And MPI + OpenMP is different from OpenMP + Pthreads IMHO.

@GuillaumeMercier
Collaborator

Shared across tasks, yes; but I'm not convinced that we have to expose those details to the user.

I agree, though I'm not sure we can make this completely transparent. Still, I advocate for transparency in this matter, so I think we're in agreement here.

@GuillaumeMercier
Collaborator

Let me come up with an initial crappy design for Pthreads that works and then we'll iterate from it.

@samuelkgutierrez
Member

Maybe. But how do you detect hybrid cases? And enable the support in these cases?

Would some machinery we come up with regarding #35 do the trick? Recall that the RMI should (but currently doesn't) keep track of all the groups and their respective tasks for us. Maybe we can use the RMI as the ultimate keeper of such information. This would obviate the need for an explicit init and finalize.

@GuillaumeMercier
Collaborator

Would some machinery we come up regarding #35 do the trick?

My gut feeling is that it will (partially at least).

Recall that the RMI should (but currently doesn't) keep track of all the groups and their respective tasks for us. Maybe we can use the RMI as the ultimate keeper of such information. This would obviate the need for an explicit init and finalize.

OK, we have to discuss this a bit then, because it's something I hadn't completely caught previously. Which groups are you referring to? We know the word can be confusing. The same groups that are included in group tabs for each structure?
Also, what I'm thinking about (when talking about shared space or runtime info sharing) would only apply to single processes. Therefore I'm not sure we need a global, centralized instance for this.

@samuelkgutierrez
Member

Yes, let's schedule a call so we can talk this over. This is an important decision. I have some ideas about the single-process case: it should be pretty straightforward to implement (famous last words).

@GuillaumeMercier
Collaborator

Yes, let's schedule a call so we can talk this over. This is an important decision.

Agreed.

I have some ideas about the single-process case: it should be pretty straightforward to implement (famous last words).

Are you trying to impersonate me?

@samuelkgutierrez changed the title from "Utility threads use case: SCR" to "Potential Use Cases" on Apr 7, 2023
@samuelkgutierrez
Member

Here is another potential use case that's worth considering: internal use in mpibind. This might help demonstrate QV's generality in another piece of system software.

@eleon
Member Author

eleon commented Apr 7, 2023

Here is another potential use case that's worth considering: internal use in mpibind

Greetings, @samuelkgutierrez. Not sure I follow. Could you elaborate a bit more?

@samuelkgutierrez
Member

I was just thinking that maybe we can implement core pieces of mpibind's API using QV underneath the covers. This could serve as another demonstration of QV's generality in the system software space if we can successfully use it for common mpibind tasks.

@eleon
Member Author

eleon commented Apr 8, 2023

It makes sense, @samuelkgutierrez, thanks for clarifying!
Actually, we are already heading in that direction. For example, the split_at function with maximum spread is one of mpibind's mappings!
