diff --git a/miniVite b/miniVite deleted file mode 160000 index f4367f1..0000000 --- a/miniVite +++ /dev/null @@ -1 +0,0 @@ -Subproject commit f4367f1c5d64034a3ee09c4de5397767c25eafb2 diff --git a/miniVite/FAQS b/miniVite/FAQS new file mode 100644 index 0000000..11b66fe --- /dev/null +++ b/miniVite/FAQS @@ -0,0 +1,270 @@ +***************** +* miniVite FAQs * +***************** +---------------------------------------------------- +FYI, typical "How to run" queries are addressed Q5 +onward. + +Please send your suggestions for improving this FAQ +to zsayanz at gmail dot com OR hala at pnnl dot gov. +---------------------------------------------------- + +------------------------------------------------------------------------- +Q1. What is graph community detection? +------------------------------------------------------------------------- + +A1. In most real-world graphs/networks, the nodes/vertices tend to be +organized into tightly-knit modules known as communities or clusters, +such that nodes within a community are more likely to be "related" to +one another than they are to the rest of the network. The goodness of +partitioning into communities is typically measured using a metric +called modularity. Community detection is the method of identifying +these clusters or communities in graphs. + +[References] + +Fortunato, Santo. "Community detection in graphs." Physics reports +486.3-5 (2010): 75-174. https://arxiv.org/pdf/0906.0612.pdf + +-------------------------------------------------------------------------- +Q2. What is miniVite? +-------------------------------------------------------------------------- + +A2. miniVite is a distributed-memory code (or mini application) that +performs partial graph community detection using the Louvain method. +Louvain method is a multi-phase, iterative heuristic that performs +modularity optimization for graph community detection. miniVite only +performs the first phase of Louvain method. 
+ +[Code] + +https://github.com/Exa-Graph/miniVite +http://hpc.pnl.gov/people/hala/grappolo.html + +[References] + +Blondel, Vincent D., et al. "Fast unfolding of communities in large +networks." Journal of statistical mechanics: theory and experiment +2008.10 (2008): P10008. + +Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Gebremedhin AH. +miniVite: A Graph Analytics Benchmarking Tool for Massively Parallel +Systems. + +--------------------------------------------------------------------------- +Q3. What is the parent application of miniVite? How are they different? +--------------------------------------------------------------------------- + +A3. miniVite is derived from Vite, which implements the multi-phase +Louvain method. Apart from a parallel baseline version, Vite provides +a number of heuristics (such as early termination, threshold cycling and +incomplete coloring) that can improve the scalability and quality of +community detection. In contrast, miniVite just provides a parallel +baseline version, and, has option to select different MPI communication +methods (such as send/recv, collectives and RMA) for one of the most +communication intensive portions of the code. miniVite also includes an +in-memory random geometric graph generator, making it convenient for +users to run miniVite without any external files. Vite can also convert +graphs from different native formats (like matrix market, SNAP, edge +list, DIMACS, etc) to the binary format that both Vite and miniVite +requires. + +[Code] + +http://hpc.pnl.gov/people/hala/grappolo.html + +[References] + +Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Lu H, Chavarria-Miranda D, +Khan A, Gebremedhin A. Distributed louvain algorithm for graph community +detection. In 2018 IEEE International Parallel and Distributed Processing +Symposium (IPDPS) 2018 May 21 (pp. 885-895). IEEE. + +Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Gebremedhin AH. 
+Scalable Distributed Memory Community Detection Using Vite. +In 2018 IEEE High Performance extreme Computing Conference (HPEC) 2018 +Sep 25 (pp. 1-7). IEEE. + +----------------------------------------------------------------------------- +Q4. Is there a shared-memory equivalent of Vite/miniVite? +----------------------------------------------------------------------------- + +A4. Yes, Grappolo performs shared-memory community detection using Louvain +method. Apart from community detection, Grappolo has routines for matrix +reordering as well. + +[Code] + +http://hpc.pnl.gov/people/hala/grappolo.html + +[References] + +Lu H, Halappanavar M, Kalyanaraman A. Parallel heuristics for scalable +community detection. Parallel Computing. 2015 Aug 1;47:19-37. + +Halappanavar M, Lu H, Kalyanaraman A, Tumeo A. Scalable static and dynamic +community detection using grappolo. In High Performance Extreme Computing +Conference (HPEC), 2017 IEEE 2017 Sep 12 (pp. 1-6). IEEE. + +------------------------------------------------------------------------------ +Q5. How does one perform strong scaling analysis using miniVite? How to +determine 'good' candidates (input graphs) that can be used for strong +scaling runs? How much time is approximately spent in performing I/O? +------------------------------------------------------------------------------ + +A5. Use a large graph as an input, preferably with over a billion edges. Not all +large graphs have a good community structure. You should be able to identify +one that serves your purpose, hopefully after a few trials. Graphs can be +obtained from various websites serving as repositories, such as Sparse TAMU +collection[1], SNAP repository[2] and MIT Graph Challenge website[3], to name +a few of the prominent ones. You can convert graphs from their native format to +the binary format that miniVite requires, using the converters in Vite (please +see README). 
If your graph is in Webgraph[4] format, you can easily convert it +to an edge list first (example code snippet below), before passing it on to Vite +for subsequent binary conversion. + +#include "offline_edge_iterator.hpp" +... +using namespace webgraph::ascii_graph; + +// read in input/output file +std::ofstream ofile(argv[2]); +offline_edge_iterator itor(argv[1]), end; + +// read edges +while( itor != end ) { + ofile << itor->first << " " << itor->second << std::endl; + ++itor; +} +ofile.close(); +... + +Due to its simple vertex-based distribution, miniVite takes about 2-4s to read a 55GB +binary file if you use Burst buffer (Cray DataWarp) or Lustre striping (about 25 OSTs, +default 1M blocks). Hence, the overall I/O time that we have observed in most cases is +within 1/2% of the overall execution time. + +[1] https://sparse.tamu.edu/ +[2] http://snap.stanford.edu/data +[3] http://graphchallenge.mit.edu/data-sets +[4] http://webgraph.di.unimi.it/ + +----------------------------------------------------------------------------------- +Q6. How does one perform weak scaling analysis using miniVite? How does one scale +the graphs with processes? +----------------------------------------------------------------------------------- + +A6. miniVite has an in-memory random geometric graph generator (please see +README) that can be used for weak-scaling analysis. An n-D random geometric graph +(RGG), is generated by randomly placing N vertices in an n-D space and connecting +pairs of vertices whose Euclidean distance is less than or equal to d. We only +consider 2D RGGs contained within a unit square, [0,1]^2. We distribute the domain +such that each process receives N/p vertices (where p is the total +number of processes). 
+ +Each process owns (1 * 1/p) portion of the unit square and d is computed as (please +refer to Section 4 of miniVite paper for details): + +d = (dc + dt)/2; +where, dc = sqrt(ln(N) / (pi*N)); dt = sqrt(2.0736 / (pi*N)) + +Therefore, the number of vertices (N) passed during miniVite execution on p +processes must satisfy the condition -- 1/p > d. + +Please note, the default distribution of a graph generated from the in-built random +geometric graph generator causes a process to only communicate with its two +immediate neighbors. If you want to increase the communication intensity for +generated graphs, please use the "-p" option to specify an extra percentage of edges +that will be generated, linking random vertices. As a side-effect, this option +significantly increases the time required to generate the graph. + +------------------------------------------------------------------------------ +Q7. Does Vite (the parent application to miniVite) have an in-built graph +generator? +------------------------------------------------------------------------------ + +A7. At present, Vite does not have an in-built graph generator like the one in +miniVite, so we rely on users providing external graphs for Vite (strong/weak +scaling) analysis. However, Vite has bindings to NetworKit[5], and users can use +those bindings to generate graphs of their choice from Vite (refer to the +README). Generating large graphs in this manner can take a lot of time, since +there are intermediate copies and the graph generators themselves may be serial +or may use threads on a shared-memory system. We do not plan on supporting the +NetworKit bindings in the future. + +[5] https://networkit.github.io/ + +------------------------------------------------------------------------------ +Q8. Does providing a larger input graph translate to comparatively larger +execution times? Is it possible to control the execution time for a particular +graph? 
+------------------------------------------------------------------------------ + +A8. No. A relatively small graph can run for many iterations, as compared to +a larger graph that runs for a few iterations to convergence. Since miniVite is +iterative, the final number of iterations to convergence (and hence, execution +time) depends on the structure of the graph. It is however possible to exit +early by passing a larger threshold (using the "-t <...>" option, the default +threshold or tolerance is 1.0E-06, a larger threshold can be passed, for e.g, +"-t 1.0E-03"), that should reduce the overall execution time for all graphs in +general (at least w.r.t miniVite, which only executes the first phase of Louvain +method). + +------------------------------------------------------------------------------ +Q9. Is there an option to add some noise in the generated random geometric +graphs? +------------------------------------------------------------------------------ + +A9. Yes, the "-p " option allows extra edges to be added between +random vertices (see README). This increases the overall communication, but +affects the structure of communities in the generated graph (lowers the +modularity). Therefore, adding extra edges in the generated graph will +most probably reduce the global modularity, and the number of iterations to +convergence shall decrease. +The maximum number of edges that can be added is bounded by INT_MAX, at +present, we do not handle data ranges more than INT_MAX. + +------------------------------------------------------------------------------ +Q10. What are the steps required for using real-world graphs as an input to +miniVite? +------------------------------------------------------------------------------ + +A10. First, please download Vite (parent application of miniVite) from: +http://hpc.pnl.gov/people/hala/grappolo.html + +Graphs/Sparse matrices come in several native formats (matrix market, SNAP, +DIMACS, etc.) 
Vite has several options to convert graphs from native to the +binary format that miniVite requires (please take a look at Vite README). + +As an example, you can download the Friendster file from: +https://sparse.tamu.edu/SNAP/com-Friendster +The option to convert Friendster to binary using Vite's converter is as follows +(please note, this part is serial): + +$VITE_BIN_PATH/bin/./fileConvertDist -f $INPUT_PATH/com-Friendster.mtx + -m -o $OUTPUT_PATH/com-Friendster.bin + +After the conversion, you can run miniVite with the binary file obtained +from the previous step: + +mpiexec -n <...> $MINIVITE_PATH/./dspl -r + -f $FILE_PATH/com-Friendster.bin + +-------------------------------------------------------------------------------- +Q11. miniVite is scalable for a particular input graph, but not for another +similarly sized graph, why is that? +-------------------------------------------------------------------------------- + +A11. Presently, our distribution is vertex-based. That means a process owns N/p +vertices and all the edges connected to those N/p vertices (including ghost +vertices). Load imbalances are very probable in this type of distribution, +depending on the graph structure. + +As an example, let's say there is a large (real-world) graph, and its structure +is such that only a few processes end up owning a majority of edges, as per +miniVite graph data distribution. Also, let's assume that the graph has either a +very poor community structure (modularity closer to 0) or very stable community +structure (modularity close to 1 after a few iterations, meaning that not many +vertices are migrating to neighboring communities). In both these cases, +community detection in miniVite will run for relatively few +iterations, which may affect the overall scalability. 
diff --git a/miniVite/LICENSE b/miniVite/LICENSE new file mode 100644 index 0000000..4959d64 --- /dev/null +++ b/miniVite/LICENSE @@ -0,0 +1,29 @@ +BSD 3-Clause License + +Copyright (c) 2018, Battelle Memorial Institute +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
diff --git a/miniVite/Makefile b/miniVite/Makefile new file mode 100644 index 0000000..a5e51f4 --- /dev/null +++ b/miniVite/Makefile @@ -0,0 +1,33 @@ +CXX = mpicxx +# use -xmic-avx512 instead of -xHost for Intel Xeon Phi platforms +PLUGIN_FLAG = -Xclang -load -Xclang ~/git/unifiedmem/code/llvm-pass/build/uvm/libOMPPass.so +#OPTFLAGS = -O3 -xHost -qopenmp -DCHECK_NUM_EDGES #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD +OPTFLAGS = -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD +#OPTFLAGS = -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DCHECK_NUM_EDGES #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD +#OPTFLAGS = -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DCHECK_NUM_EDGES -DDEBUG_PRINTF #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD +#OPTFLAGS = -O3 -fopenmp -DOMP_GPU -DCHECK_NUM_EDGES -DDEBUG_PRINTF #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD +#OPTFLAGS = -O3 -fopenmp 
-DCHECK_NUM_EDGES -DDEBUG_PRINTF #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD +#-DUSE_MPI_SENDRECV +#-DUSE_MPI_COLLECTIVES +# use export ASAN_OPTIONS=verbosity=1 to check ASAN output +SNTFLAGS = -std=c++11 -fopenmp -fsanitize=address -O1 -fno-omit-frame-pointer +CXXFLAGS = -std=c++11 -g $(OPTFLAGS) + +OBJ = main.o +TARGET = miniVite + +all: $(TARGET) + +%.o: %.cpp + $(CXX) $(CXXFLAGS) $(PLUGIN_FLAG) -c -o $@ $^ + +%.ll: %.cpp + $(CXX) $(CXXFLAGS) $(PLUGIN_FLAG) -emit-llvm -S -c -o $@ $^ + +$(TARGET): $(OBJ) + $(CXX) $^ $(OPTFLAGS) -o $@ + +.PHONY: clean + +clean: + rm -rf *~ $(OBJ) $(TARGET) *.ll diff --git a/miniVite/README b/miniVite/README new file mode 100644 index 0000000..f320dec --- /dev/null +++ b/miniVite/README @@ -0,0 +1,138 @@ +************************ +miniVite (/mini/ˈviːte/) + +Version: 1.0 +************************ + +******* +------- + ABOUT +------- +******* +miniVite is a proxy app that implements a single phase of Louvain +method in distributed memory for graph community detection. Please +refer to the following paper for a detailed discussion on +distributed memory Louvain method implementation: +https://ieeexplore.ieee.org/abstract/document/8425242/ + +Apart from real world graphs, users can use specific options +to generate a Random Geometric Graph (RGG) in parallel. +RGGs have been known to have good community structure: +https://arxiv.org/pdf/1604.03993.pdf + +The way we have implemented a parallel RGG generator, vertices +owned by a process will only have cross edges with its logical +neighboring processes (each process owning 1x1/p chunk of the +1x1 unit square). If MPI process mapping is such that consecutive +processes (for e.g., p and p+1) are physically close to each other, +then there is not much communication stress in the application. 
+Therefore, we allow an option to add extra edges between randomly +chosen vertices, whose owners may be physically far apart. + +We require the total number of processes to be a power of 2 and +total number of vertices to be perfectly divisible by the number of +processes when parallel RGG generation options are used. +This constraint does not apply to real world graphs passed to miniVite. + +We also allow users to pass any real world graph as input. However, +we expect an input graph to be in a certain binary format, which +we have observed to be more efficient than reading ASCII format +files. The code for binary conversion (from a variety of common +graph formats) is packaged separately with Vite, which is our +full implementation of Louvain method in distributed memory. +Please follow instructions in Vite README for binary file +conversion. + +Vite can be downloaded from: +http://hpc.pnl.gov/people/hala/grappolo.html + +Unlike Vite, we do not implement any heuristics to improve the +performance of Louvain method. miniVite is a baseline parallel +version, implementing only the first phase of Louvain method. + +This code requires an MPI library (preferably MPI-3 compatible) +and C++11 compliant compiler for building. + +Please contact the following for any queries or support: + +Sayan Ghosh, WSU (zsayanz at gmail dot com) +Mahantesh Halappanavar, PNNL (hala at pnnl dot gov) + +************* +------------- + COMPILATION +------------- +************* +Please update the Makefile with compiler flags and use a C++11 compliant +compiler of your choice. Invoke `make clean; make` after setting paths +to MPI for generating the binary. Use `mpirun` or `mpiexec` or `srun` +to execute the code with specific runtime arguments mentioned in the +next section. + +Pass -DPRINT_DIST_STATS for printing distributed graph +characteristics. + +Pass -DDEBUG_PRINTF if detailed diagnostics are required during the +program run. 
This program requires OpenMP and C++11 support, +so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/ +-std=c++0x. + +Pass -DUSE_32_BIT_GRAPH if number of nodes in the graph are +within 32-bit range (2 x 10^9), else 64-bit range is assumed. + +Pass -DOMP_SCHEDULE_RUNTIME if you want to set OMP_SCHEDULE +for all parallel regions at runtime. If -DOMP_SCHEDULE_RUNTIME +is passed, and OMP_SCHEDULE is not set, then the default schedule will +be chosen (which is most probably "static" or "guided" for most of +the OpenMP regions). + +Communicating vertex-community information (per iteration) +is the most expensive step of our distributed Louvain +implementation. We use the one of the following MPI communication +primitives for communicating vertex-community during a Louvain +iteration, that could be enabled by passing predefined +macros at compile time: + +1. MPI Collectives: -DUSE_MPI_COLLECTIVES +2. MPI Send-Receive: -DUSE_MPI_SENDRECV +3. MPI RMA: -DUSE_MPI_RMA (using -DUSE_MPI_ACCUMULATE + additionally ensures atomic put) +4. Default: Uses MPI point-to-point nonblocking API. + +Apart from these, we use MPI (blocking) collectives, mostly +MPI_Alltoall. + +There are other predefined macros in the code as well for printing +intermediate results or checking correctness or using a particular +C++ data structure. + +*********************** +----------------------- + EXECUTING THE PROGRAM +----------------------- +*********************** + +E.g.: +mpiexec -n 2 bin/./minivite -f karate.bin +mpiexec -n 2 bin/./minivite -l -n 100 +mpiexec -n 2 bin/./minivite -n 100 +mpiexec -n 2 bin/./minivite -p 2 -n 100 + +Possible options (can be combined): + +1. -f : Specify input binary file after this argument. +2. -n : Pass total number of vertices of the generated graph. +3. -l : Use distributed LCG for randomly choosing edges. If this option + is not used, we will use C++ random number generator (using + std::default_random_engine). +4. 
-p : Specify percent of overall edges to be randomly generated between + processes. +5. -t : Specify threshold quantity (default: 1.0E-06) used to determine the + exit criteria in an iteration of Louvain method. +6. -w : Use Euclidean distance as edge weight. If this option is not used, + edge weights are considered as 1.0. Generate edge weight uniformly + between (0,1) if Euclidean distance is not available (applicable to + randomly generated edges). +7. -r : This is used to control the number of aggregators in MPI I/O and is + meaningful when an input binary graph file is passed with option "-f". + naggr := (nranks > 1) ? (nprocs/nranks) : nranks; diff --git a/miniVite/dspl.hpp b/miniVite/dspl.hpp new file mode 100644 index 0000000..f86ae90 --- /dev/null +++ b/miniVite/dspl.hpp @@ -0,0 +1,1392 @@ +// *********************************************************************** +// +// miniVite +// +// *********************************************************************** +// +// Copyright (2018) Battelle Memorial Institute +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// 1. Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// 2. Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// 3. Neither the name of the copyright holder nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +// POSSIBILITY OF SUCH DAMAGE. +// +// ************************************************************************ + +#pragma once +#ifndef DSPL_HPP +#define DSPL_HPP + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "graph.hpp" +#include "utils.hpp" + +struct Comm { + GraphElem size; + GraphWeight degree; + + Comm() : size(0), degree(0.0) {}; +}; + +struct CommInfo { + GraphElem community; + GraphElem size; + GraphWeight degree; +}; + +const int SizeTag = 1; +const int VertexTag = 2; +const int CommunityTag = 3; +const int CommunitySizeTag = 4; +const int CommunityDataTag = 5; + +static MPI_Datatype commType; + +void distSumVertexDegree(const Graph &g, std::vector &vDegree, std::vector &localCinfo) +{ + const GraphElem nv = g.get_lnv(); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem e0, e1; + GraphWeight tw = 0.0; + + g.edge_range(i, e0, e1); + + for (GraphElem k = e0; k < e1; k++) { + const 
Edge &edge = g.get_edge(k); + tw += edge.weight_; + } + + vDegree[i] = tw; + + localCinfo[i].degree = tw; + localCinfo[i].size = 1L; + } +} // distSumVertexDegree + +GraphWeight distCalcConstantForSecondTerm(const std::vector &vDegree, MPI_Comm gcomm) +{ + GraphWeight totalEdgeWeightTwice = 0.0; + GraphWeight localWeight = 0.0; + int me = -1; + + const size_t vsz = vDegree.size(); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(runtime) +#else +#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(static) +#endif + for (GraphElem i = 0; i < vsz; i++) + localWeight += vDegree[i]; // Local reduction + + // Global reduction + MPI_Allreduce(&localWeight, &totalEdgeWeightTwice, 1, + MPI_WEIGHT_TYPE, MPI_SUM, gcomm); + + return (1.0 / static_cast(totalEdgeWeightTwice)); +} // distCalcConstantForSecondTerm + +void distInitComm(std::vector &pastComm, std::vector &currComm, const GraphElem base) +{ + const size_t csz = currComm.size(); + +#ifdef DEBUG_PRINTF + assert(csz == pastComm.size()); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(static) +#endif + for (GraphElem i = 0L; i < csz; i++) { + pastComm[i] = i + base; + currComm[i] = i + base; + } +} // distInitComm + +void distInitLouvain(const Graph &dg, std::vector &pastComm, + std::vector &currComm, std::vector &vDegree, + std::vector &clusterWeight, std::vector &localCinfo, + std::vector &localCupdate, GraphWeight &constantForSecondTerm, + const int me) +{ + const GraphElem base = dg.get_base(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + vDegree.resize(nv); + pastComm.resize(nv); + currComm.resize(nv); + clusterWeight.resize(nv); + localCinfo.resize(nv); + localCupdate.resize(nv); + + distSumVertexDegree(dg, 
vDegree, localCinfo); + constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm); + + distInitComm(pastComm, currComm, base); +} // distInitLouvain + +GraphElem distGetMaxIndex(const std::unordered_map &clmap, const std::vector &counter, + const GraphWeight selfLoop, const std::vector &localCinfo, + const std::map &remoteCinfo, const GraphWeight vDegree, + const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm, + const GraphElem base, const GraphElem bound, const GraphWeight constant) +{ + std::unordered_map::const_iterator storedAlready; + GraphElem maxIndex = currComm; + GraphWeight curGain = 0.0, maxGain = 0.0; + GraphWeight eix = static_cast(counter[0]) - static_cast(selfLoop); + + GraphWeight ax = currDegree - vDegree; + GraphWeight eiy = 0.0, ay = 0.0; + + GraphElem maxSize = currSize; + GraphElem size = 0; + + storedAlready = clmap.begin(); +#ifdef DEBUG_PRINTF + assert(storedAlready != clmap.end()); +#endif + do { + if (currComm != storedAlready->first) { + + // is_local, direct access local info + if ((storedAlready->first >= base) && (storedAlready->first < bound)) { + ay = localCinfo[storedAlready->first-base].degree; + size = localCinfo[storedAlready->first - base].size; + } + else { + // is_remote, lookup map + std::map::const_iterator citer = remoteCinfo.find(storedAlready->first); + ay = citer->second.degree; + size = citer->second.size; + } + + eiy = counter[storedAlready->second]; + + curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant; + + if ((curGain > maxGain) || + ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) { + maxGain = curGain; + maxIndex = storedAlready->first; + maxSize = size; + } + } + storedAlready++; + } while (storedAlready != clmap.end()); + + if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm)) + maxIndex = currComm; + + return maxIndex; +} // distGetMaxIndex + +GraphWeight distBuildLocalMapCounter(const GraphElem e0, const 
GraphElem e1, std::unordered_map &clmap, + std::vector &counter, const Graph &g, + const std::vector &currComm, + const std::unordered_map &remoteComm, + const GraphElem vertex, const GraphElem base, const GraphElem bound) +{ + GraphElem numUniqueClusters = 1L; + GraphWeight selfLoop = 0; + std::unordered_map::const_iterator storedAlready; + + for (GraphElem j = e0; j < e1; j++) { + + const Edge &edge = g.get_edge(j); + const GraphElem &tail_ = edge.tail_; + const GraphWeight &weight = edge.weight_; + GraphElem tcomm; + + if (tail_ == vertex + base) + selfLoop += weight; + + // is_local, direct access local std::vector + if ((tail_ >= base) && (tail_ < bound)) + tcomm = currComm[tail_ - base]; + else { // is_remote, lookup map + std::unordered_map::const_iterator iter = remoteComm.find(tail_); + +#ifdef DEBUG_PRINTF + assert(iter != remoteComm.end()); +#endif + tcomm = iter->second; + } + + storedAlready = clmap.find(tcomm); + + if (storedAlready != clmap.end()) + counter[storedAlready->second] += weight; + else { + clmap.insert(std::unordered_map::value_type(tcomm, numUniqueClusters)); + counter.push_back(weight); + numUniqueClusters++; + } + } + + return selfLoop; +} // distBuildLocalMapCounter + +void distExecuteLouvainIteration(const GraphElem i, const Graph &dg, const std::vector &currComm, + std::vector &targetComm, const std::vector &vDegree, + std::vector &localCinfo, std::vector &localCupdate, + const std::unordered_map &remoteComm, + const std::map &remoteCinfo, + std::map &remoteCupdate, const GraphWeight constantForSecondTerm, + std::vector &clusterWeight, const int me) +{ + GraphElem localTarget = -1; + GraphElem e0, e1, selfLoop = 0; + std::unordered_map clmap; + std::vector counter; + + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem cc = currComm[i]; + GraphWeight ccDegree; + GraphElem ccSize; + bool currCommIsLocal = false; + bool targetCommIsLocal = false; + + // Current Community is local + if (cc >= base && cc 
< bound) { + ccDegree=localCinfo[cc-base].degree; + ccSize=localCinfo[cc-base].size; + currCommIsLocal=true; + } else { + // is remote + std::map::const_iterator citer = remoteCinfo.find(cc); + ccDegree = citer->second.degree; + ccSize = citer->second.size; + currCommIsLocal=false; + } + + dg.edge_range(i, e0, e1); + + if (e0 != e1) { + clmap.insert(std::unordered_map::value_type(cc, 0)); + counter.push_back(0.0); + + selfLoop = distBuildLocalMapCounter(e0, e1, clmap, counter, dg, + currComm, remoteComm, i, base, bound); + + clusterWeight[i] += counter[0]; + + localTarget = distGetMaxIndex(clmap, counter, selfLoop, localCinfo, remoteCinfo, + vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm); + } + else + localTarget = cc; + + // is the Target Local? + if (localTarget >= base && localTarget < bound) + targetCommIsLocal = true; + + // current and target comm are local - atomic updates to vectors + if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) { + +#ifdef DEBUG_PRINTF + assert( base < localTarget < bound); + assert( base < cc < bound); + assert( cc - base < localCupdate.size()); + assert( localTarget - base < localCupdate.size()); +#endif + #pragma omp atomic update + localCupdate[localTarget-base].degree += vDegree[i]; + #pragma omp atomic update + localCupdate[localTarget-base].size++; + #pragma omp atomic update + localCupdate[cc-base].degree -= vDegree[i]; + #pragma omp atomic update + localCupdate[cc-base].size--; + } + + // current is local, target is not - do atomic on local, accumulate in Maps for remote + if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && !targetCommIsLocal) { + #pragma omp atomic update + localCupdate[cc-base].degree -= vDegree[i]; + #pragma omp atomic update + localCupdate[cc-base].size--; + + // search target! 
+ std::map::iterator iter=remoteCupdate.find(localTarget); + + #pragma omp atomic update + iter->second.degree += vDegree[i]; + #pragma omp atomic update + iter->second.size++; + } + + // current is remote, target is local - accumulate for current, atomic on local + if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && targetCommIsLocal) { + #pragma omp atomic update + localCupdate[localTarget-base].degree += vDegree[i]; + #pragma omp atomic update + localCupdate[localTarget-base].size++; + + // search current + std::map::iterator iter=remoteCupdate.find(cc); + + #pragma omp atomic update + iter->second.degree -= vDegree[i]; + #pragma omp atomic update + iter->second.size--; + } + + // current and target are remote - accumulate for both + if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && !targetCommIsLocal) { + + // search current + std::map::iterator iter = remoteCupdate.find(cc); + + #pragma omp atomic update + iter->second.degree -= vDegree[i]; + #pragma omp atomic update + iter->second.size--; + + // search target + iter=remoteCupdate.find(localTarget); + + #pragma omp atomic update + iter->second.degree += vDegree[i]; + #pragma omp atomic update + iter->second.size++; + } + +#ifdef DEBUG_PRINTF + assert(localTarget != -1); +#endif + targetComm[i] = localTarget; +} // distExecuteLouvainIteration + +GraphWeight distComputeModularity(const Graph &g, std::vector &localCinfo, + const std::vector &clusterWeight, + const GraphWeight constantForSecondTerm, + const int me) +{ + const GraphElem nv = g.get_lnv(); + MPI_Comm gcomm = g.get_comm(); + + GraphWeight le_la_xx[2]; + GraphWeight e_a_xx[2] = {0.0, 0.0}; + GraphWeight le_xx = 0.0, la2_x = 0.0; + +#ifdef DEBUG_PRINTF + assert((clusterWeight.size() == nv)); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \ + reduction(+: le_xx), reduction(+: la2_x) schedule(runtime) +#else +#pragma omp parallel for default(none), 
shared(clusterWeight, localCinfo), \ + reduction(+: le_xx), reduction(+: la2_x) schedule(static) +#endif + for (GraphElem i = 0L; i < nv; i++) { + le_xx += clusterWeight[i]; + la2_x += static_cast(localCinfo[i].degree) * static_cast(localCinfo[i].degree); + } + le_la_xx[0] = le_xx; + le_la_xx[1] = la2_x; + +#ifdef DEBUG_PRINTF + const double t0 = MPI_Wtime(); +#endif + + MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm); + +#ifdef DEBUG_PRINTF + const double t1 = MPI_Wtime(); +#endif + + GraphWeight currMod = (e_a_xx[0] * constantForSecondTerm) - + (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm); +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl; + std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl; + std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl; +#endif + + return currMod; +} // distComputeModularity + +void distUpdateLocalCinfo(std::vector &localCinfo, const std::vector &localCupdate) +{ + size_t csz = localCinfo.size(); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(static) +#endif + for (GraphElem i = 0L; i < csz; i++) { + localCinfo[i].size += localCupdate[i].size; + localCinfo[i].degree += localCupdate[i].degree; + } +} + +void distCleanCWandCU(const GraphElem nv, std::vector &clusterWeight, + std::vector &localCupdate) +{ +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(static) +#endif + for (GraphElem i = 0L; i < nv; i++) { + clusterWeight[i] = 0; + localCupdate[i].degree = 0; + localCupdate[i].size = 0; + } +} // distCleanCWandCU + +#if defined(USE_MPI_RMA) +void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, + const size_t &ssz, const size_t &rsz, const std::vector &ssizes, + const std::vector &rsizes, const std::vector &svdata, + const std::vector &rvdata, 
const std::vector &currComm, + const std::vector &localCinfo, std::map &remoteCinfo, + std::unordered_map &remoteComm, std::map &remoteCupdate, + const MPI_Win &commwin, const std::vector &disp) +#else +void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, + const size_t &ssz, const size_t &rsz, const std::vector &ssizes, + const std::vector &rsizes, const std::vector &svdata, + const std::vector &rvdata, const std::vector &currComm, + const std::vector &localCinfo, std::map &remoteCinfo, + std::unordered_map &remoteComm, std::map &remoteCupdate) +#endif +{ +#if defined(USE_MPI_RMA) + std::vector scdata(ssz); +#else + std::vector rcdata(rsz), scdata(ssz); +#endif + GraphElem spos, rpos; +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + std::vector< std::vector< GraphElem > > rcinfo(nprocs); +#else + std::vector > rcinfo(nprocs); +#endif + +#if defined(USE_MPI_SENDRECV) +#else + std::vector rreqs(nprocs), sreqs(nprocs); +#endif + +#ifdef DEBUG_PRINTF + double t0, t1, ta = 0.0; +#endif + + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + // Collects Communities of local vertices for remote nodes +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime) +#else +#pragma omp parallel for shared(svdata, scdata, currComm) schedule(static) +#endif + for (GraphElem i = 0; i < ssz; i++) { + const GraphElem vertex = svdata[i]; +#ifdef DEBUG_PRINTF + assert((vertex >= base) && (vertex < bound)); +#endif + const GraphElem comm = currComm[vertex - base]; + scdata[i] = comm; + } + + std::vector rcsizes(nprocs), scsizes(nprocs); + std::vector sinfo, rinfo; + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + spos = 0; + rpos = 0; +#if defined(USE_MPI_COLLECTIVES) + std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); + for (int i = 0; i < nprocs; i++) { + scnts[i] = ssizes[i]; + rcnts[i] = rsizes[i]; + 
sdispls[i] = spos; + rdispls[i] = rpos; + spos += scnts[i]; + rpos += rcnts[i]; + } + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); +#elif defined(USE_MPI_RMA) + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_ACCUMULATE) + MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin); +#else + MPI_Put(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + disp[i], ssizes[i], MPI_GRAPH_TYPE, commwin); +#endif + } + spos += ssizes[i]; + rpos += rsizes[i]; + } +#elif defined(USE_MPI_SENDRECV) + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); + + spos += ssizes[i]; + rpos += rsizes[i]; + } +#else + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + rpos += rsizes[i]; + } + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Isend(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); + else + sreqs[i] = MPI_REQUEST_NULL; + + spos += ssizes[i]; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + + // reserve vectors +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + for (GraphElem i = 0; i < nprocs; i++) { + rcinfo[i].reserve(rpos); + } +#endif + + // fetch baseptr from MPI window +#if defined(USE_MPI_RMA) + MPI_Win_flush_all(commwin); + MPI_Barrier(gcomm); + + GraphElem *rcbuf = nullptr; + int flag = 0; + MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, 
&flag); +#endif + + remoteComm.clear(); + for (GraphElem i = 0; i < rpos; i++) { + +#if defined(USE_MPI_RMA) + const GraphElem comm = rcbuf[i]; +#else + const GraphElem comm = rcdata[i]; +#endif + + remoteComm.insert(std::unordered_map::value_type(rvdata[i], comm)); + const int tproc = dg.get_owner(comm); + + if (tproc != me) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + rcinfo[tproc].emplace_back(comm); +#else + rcinfo[tproc].insert(comm); +#endif + } + + for (GraphElem i = 0; i < nv; i++) { + const GraphElem comm = currComm[i]; + const int tproc = dg.get_owner(comm); + + if (tproc != me) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + rcinfo[tproc].emplace_back(comm); +#else + rcinfo[tproc].insert(comm); +#endif + } + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + GraphElem stcsz = 0, rtcsz = 0; + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(scsizes, rcinfo) \ + reduction(+:stcsz) schedule(runtime) +#else +#pragma omp parallel for shared(scsizes, rcinfo) \ + reduction(+:stcsz) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + scsizes[i] = rcinfo[i].size(); + stcsz += scsizes[i]; + } + + MPI_Alltoall(scsizes.data(), 1, MPI_GRAPH_TYPE, rcsizes.data(), + 1, MPI_GRAPH_TYPE, gcomm); + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rcsizes) \ + reduction(+:rtcsz) schedule(runtime) +#else +#pragma omp parallel for shared(rcsizes) \ + reduction(+:rtcsz) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + rtcsz += rcsizes[i]; + } + +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]Total communities to receive: " << rtcsz << std::endl; +#endif +#if defined(USE_MPI_COLLECTIVES) + std::vector rcomms(rtcsz), scomms(stcsz); +#else +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + std::vector rcomms(rtcsz); +#else + std::vector rcomms(rtcsz), scomms(stcsz); +#endif +#endif + sinfo.resize(rtcsz); + rinfo.resize(stcsz); + +#ifdef DEBUG_PRINTF + t0 = 
MPI_Wtime(); +#endif + spos = 0; + rpos = 0; +#if defined(USE_MPI_COLLECTIVES) + for (int i = 0; i < nprocs; i++) { + if (i != me) { + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + } + scnts[i] = scsizes[i]; + rcnts[i] = rcsizes[i]; + sdispls[i] = spos; + rdispls[i] = rpos; + spos += scnts[i]; + rpos += rcnts[i]; + } + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); + + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ + firstprivate(i), schedule(runtime) /*, if(rcsizes[i] >= 1000) */ +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ + firstprivate(i), schedule(guided) /*, if(rcsizes[i] >= 1000) */ +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rdispls[i] + j]; + sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + } + } + + MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), + commType, rinfo.data(), scnts.data(), sdispls.data(), + commType, gcomm); +#else +#if !defined(USE_MPI_SENDRECV) + std::vector rcreqs(nprocs); +#endif + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_SENDRECV) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); +#else + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); +#endif +#else + MPI_Irecv(rcomms.data() + rpos, rcsizes[i], 
MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &rreqs[i]); +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); +#else + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); +#endif +#endif + } + else { +#if !defined(USE_MPI_SENDRECV) + rreqs[i] = MPI_REQUEST_NULL; + sreqs[i] = MPI_REQUEST_NULL; +#endif + } + rpos += rcsizes[i]; + spos += scsizes[i]; + } + + spos = 0; + rpos = 0; + + // poke progress on last isend/irecvs +#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + int tf = 0, id = 0; + MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE); +#endif + +#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif + + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_SENDRECV) +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(guided) +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rpos + j]; + sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + + MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, + rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, + gcomm, MPI_STATUSES_IGNORE); +#else + MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, + gcomm, &rcreqs[i]); + + // poke progress on last isend/irecvs +#if 
defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + int flag = 0, done = 0; + while (!done) { + MPI_Test(&sreqs[i], &flag, MPI_STATUS_IGNORE); + MPI_Test(&rreqs[i], &flag, MPI_STATUS_IGNORE); + if (flag) + done = 1; + } +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(guided) +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rpos + j]; + sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + + MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, + CommunityDataTag, gcomm, &sreqs[i]); +#endif + } + else { +#if !defined(USE_MPI_SENDRECV) + rcreqs[i] = MPI_REQUEST_NULL; + sreqs[i] = MPI_REQUEST_NULL; +#endif + } + rpos += rcsizes[i]; + spos += scsizes[i]; + } + +#if !defined(USE_MPI_SENDRECV) + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE); +#endif + +#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + + remoteCinfo.clear(); + remoteCupdate.clear(); + + for (GraphElem i = 0; i < stcsz; i++) { + const GraphElem ccomm = rinfo[i].community; + + Comm comm; + + comm.size = rinfo[i].size; + comm.degree = rinfo[i].degree; + + remoteCinfo.insert(std::map::value_type(ccomm, comm)); + remoteCupdate.insert(std::map::value_type(ccomm, Comm())); + } +} // end fillRemoteCommunities + +void createCommunityMPIType() +{ + CommInfo cinfo; + + MPI_Aint begin, community, size, degree; + + MPI_Get_address(&cinfo, &begin); + MPI_Get_address(&cinfo.community, &community); + MPI_Get_address(&cinfo.size, &size); + MPI_Get_address(&cinfo.degree, &degree); + + int blens[] = { 1, 1, 1 }; + MPI_Aint displ[] = { community - begin, size - begin, degree - begin }; + MPI_Datatype types[] = 
{ MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE }; + + MPI_Type_create_struct(3, blens, displ, types, &commType); + MPI_Type_commit(&commType); +} // createCommunityMPIType + +void destroyCommunityMPIType() +{ + MPI_Type_free(&commType); +} // destroyCommunityMPIType + +void updateRemoteCommunities(const Graph &dg, std::vector &localCinfo, + const std::map &remoteCupdate, + const int me, const int nprocs) +{ + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + std::vector> remoteArray(nprocs); + MPI_Comm gcomm = dg.get_comm(); + + // FIXME TODO can we use TBB::concurrent_vector instead, + // to make this parallel; first we have to get rid of maps + for (std::map::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) { + const GraphElem i = iter->first; + const Comm &curr = iter->second; + + const int tproc = dg.get_owner(i); + +#ifdef DEBUG_PRINTF + assert(tproc != me); +#endif + CommInfo rcinfo; + + rcinfo.community = i; + rcinfo.size = curr.size; + rcinfo.degree = curr.degree; + + remoteArray[tproc].push_back(rcinfo); + } + + std::vector send_sz(nprocs), recv_sz(nprocs); + +#ifdef DEBUG_PRINTF + GraphWeight tc = 0.0; + const double t0 = MPI_Wtime(); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for schedule(runtime) +#else +#pragma omp parallel for schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + send_sz[i] = remoteArray[i].size(); + } + + MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), + 1, MPI_GRAPH_TYPE, gcomm); + +#ifdef DEBUG_PRINTF + const double t1 = MPI_Wtime(); + tc += (t1 - t0); +#endif + + GraphElem rcnt = 0, scnt = 0; +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(recv_sz, send_sz) \ + reduction(+:rcnt, scnt) schedule(runtime) +#else +#pragma omp parallel for shared(recv_sz, send_sz) \ + reduction(+:rcnt, scnt) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + rcnt += recv_sz[i]; + scnt += send_sz[i]; + } +#ifdef DEBUG_PRINTF + 
std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl; +#endif + + GraphElem currPos = 0; + std::vector rdata(rcnt); + +#ifdef DEBUG_PRINTF + const double t2 = MPI_Wtime(); +#endif +#if defined(USE_MPI_SENDRECV) + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, + rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, + gcomm, MPI_STATUSES_IGNORE); + + currPos += recv_sz[i]; + } +#else + std::vector sreqs(nprocs), rreqs(nprocs); + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, + CommunityDataTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + currPos += recv_sz[i]; + } + + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, + CommunityDataTag, gcomm, &sreqs[i]); + else + sreqs[i] = MPI_REQUEST_NULL; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif +#ifdef DEBUG_PRINTF + const double t3 = MPI_Wtime(); + std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl; +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rdata, localCinfo) schedule(runtime) +#else +#pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic) +#endif + for (GraphElem i = 0; i < rcnt; i++) { + const CommInfo &curr = rdata[i]; + +#ifdef DEBUG_PRINTF + assert(dg.get_owner(curr.community) == me); +#endif + localCinfo[curr.community-base].size += curr.size; + localCinfo[curr.community-base].degree += curr.degree; + } +} // updateRemoteCommunities + +// initial setup before Louvain iteration begins +#if defined(USE_MPI_RMA) +void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, + std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, + const int me, 
const int nprocs, MPI_Win &commwin) +#else +void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, + std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, + const int me, const int nprocs) +#endif +{ + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + +#ifdef USE_OPENMP_LOCK + std::vector locks(nprocs); + for (int i = 0; i < nprocs; i++) + omp_init_lock(&locks[i]); +#endif + std::vector> parray(nprocs); + +#ifdef USE_OPENMP_LOCK +#pragma omp parallel default(none), shared(dg, locks, parray) +#else +#pragma omp parallel default(none), shared(dg, parray) +#endif + { +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem e0, e1; + + dg.edge_range(i, e0, e1); + + for (GraphElem j = e0; j < e1; j++) { + const Edge &edge = dg.get_edge(j); + const int tproc = dg.get_owner(edge.tail_); + + if (tproc != me) { +#ifdef USE_OPENMP_LOCK + omp_set_lock(&locks[tproc]); +#else + lock(); +#endif + parray[tproc].insert(edge.tail_); +#ifdef USE_OPENMP_LOCK + omp_unset_lock(&locks[tproc]); +#else + unlock(); +#endif + } + } + } + } + +#ifdef USE_OPENMP_LOCK + for (int i = 0; i < nprocs; i++) { + omp_destroy_lock(&locks[i]); + } +#endif + + rsizes.resize(nprocs); + ssizes.resize(nprocs); + ssz = 0, rsz = 0; + + int pproc = 0; + // TODO FIXME parallelize this loop + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + ssz += iter->size(); + ssizes[pproc] = iter->size(); + pproc++; + } + + MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), + 1, MPI_GRAPH_TYPE, gcomm); + + GraphElem rsz_r = 0; +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rsizes) \ + reduction(+:rsz_r) schedule(runtime) +#else +#pragma omp parallel for shared(rsizes) \ + reduction(+:rsz_r) schedule(static) +#endif + for 
(int i = 0; i < nprocs; i++) + rsz_r += rsizes[i]; + rsz = rsz_r; + + svdata.resize(ssz); + rvdata.resize(rsz); + + GraphElem cpos = 0, rpos = 0; + pproc = 0; + +#if defined(USE_MPI_COLLECTIVES) + std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); + + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); + + scnts[pproc] = iter->size(); + rcnts[pproc] = rsizes[pproc]; + sdispls[pproc] = cpos; + rdispls[pproc] = rpos; + cpos += iter->size(); + rpos += rcnts[pproc]; + + pproc++; + } + + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); +#else + std::vector rreqs(nprocs), sreqs(nprocs); + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, + VertexTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + rpos += rsizes[i]; + } + + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); + + if (me != pproc) + MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, + VertexTag, gcomm, &sreqs[pproc]); + else + sreqs[pproc] = MPI_REQUEST_NULL; + + cpos += iter->size(); + pproc++; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif + + std::swap(svdata, rvdata); + std::swap(ssizes, rsizes); + std::swap(ssz, rsz); + + // create MPI window for communities +#if defined(USE_MPI_RMA) + GraphElem *ptr = nullptr; + MPI_Info info = MPI_INFO_NULL; +#if defined(USE_MPI_ACCUMULATE) + MPI_Info_create(&info); + MPI_Info_set(info, "accumulate_ordering", "none"); + MPI_Info_set(info, "accumulate_ops", "same_op"); +#endif + MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), + 
info, gcomm, &ptr, &commwin); + MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin); +#endif +} // exchangeVertexReqs + +#if defined(USE_MPI_RMA) +GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, + size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, const GraphWeight lower, + const GraphWeight thresh, int &iters, MPI_Win &commwin) +#else +GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, + size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, const GraphWeight lower, + const GraphWeight thresh, int &iters) +#endif +{ + std::vector pastComm, currComm, targetComm; + std::vector vDegree; + std::vector clusterWeight; + std::vector localCinfo, localCupdate; + + std::unordered_map remoteComm; + std::map remoteCinfo, remoteCupdate; + + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + GraphWeight constantForSecondTerm; + GraphWeight prevMod = lower; + GraphWeight currMod = -1.0; + int numIters = 0; + + distInitLouvain(dg, pastComm, currComm, vDegree, clusterWeight, localCinfo, + localCupdate, constantForSecondTerm, me); + targetComm.resize(nv); + +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl; + if (me == 0) + std::cout << "Threshold: " << thresh << std::endl; +#endif + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + +#ifdef DEBUG_PRINTF + double t0, t1; + t0 = MPI_Wtime(); +#endif + + // setup vertices and communities +#if defined(USE_MPI_RMA) + exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, + svdata, rvdata, me, nprocs, commwin); + + // store the remote displacements + std::vector disp(nprocs); + MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, + MPI_SUM, gcomm); +#else + exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, + svdata, rvdata, me, nprocs); +#endif + +#ifdef 
DEBUG_PRINTF + t1 = MPI_Wtime(); + std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl; +#endif + + // start Louvain iteration + while(true) { +#ifdef DEBUG_PRINTF + const double t2 = MPI_Wtime(); + if (me == 0) + std::cout << "Starting Louvain iteration: " << numIters << std::endl; +#endif + numIters++; + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + +#if defined(USE_MPI_RMA) + fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, + rsizes, svdata, rvdata, currComm, localCinfo, + remoteCinfo, remoteComm, remoteCupdate, + commwin, disp); +#else + fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, + rsizes, svdata, rvdata, currComm, localCinfo, + remoteCinfo, remoteComm, remoteCupdate); +#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl; + std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl; +#endif + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + +#pragma omp parallel default(none), shared(clusterWeight, localCupdate, currComm, targetComm, \ + vDegree, localCinfo, remoteCinfo, remoteComm, pastComm, dg, remoteCupdate), \ + firstprivate(constantForSecondTerm) + { + distCleanCWandCU(nv, clusterWeight, localCupdate); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + distExecuteLouvainIteration(i, dg, currComm, targetComm, vDegree, localCinfo, + localCupdate, remoteComm, remoteCinfo, remoteCupdate, + constantForSecondTerm, clusterWeight, me); + } + } + +#pragma omp parallel default(none), shared(localCinfo, localCupdate) + { + distUpdateLocalCinfo(localCinfo, localCupdate); + } + + // communicate remote communities + updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs); + + // compute modularity + currMod = distComputeModularity(dg, 
localCinfo, clusterWeight, constantForSecondTerm, me); + + // exit criteria + if (currMod - prevMod < thresh) + break; + + prevMod = currMod; + if (prevMod < lower) + prevMod = lower; + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none) \ + shared(pastComm, currComm, targetComm) \ + schedule(runtime) +#else +#pragma omp parallel for default(none) \ + shared(pastComm, currComm, targetComm) \ + schedule(static) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem tmp = pastComm[i]; + pastComm[i] = currComm[i]; + currComm[i] = targetComm[i]; + targetComm[i] = tmp; + } + } // end of Louvain iteration + +#if defined(USE_MPI_RMA) + MPI_Win_unlock_all(commwin); + MPI_Win_free(&commwin); +#endif + + iters = numIters; + + vDegree.clear(); + pastComm.clear(); + currComm.clear(); + targetComm.clear(); + clusterWeight.clear(); + localCinfo.clear(); + localCupdate.clear(); + + return prevMod; +} // distLouvainMethod plain + +#endif // __DSPL diff --git a/miniVite/dspl_gpu.hpp b/miniVite/dspl_gpu.hpp new file mode 100644 index 0000000..601a382 --- /dev/null +++ b/miniVite/dspl_gpu.hpp @@ -0,0 +1,1409 @@ +// *********************************************************************** +// +// miniVite +// +// *********************************************************************** +// +// Copyright (2018) Battelle Memorial Institute +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// 1. Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// 2. Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// 3. 
Neither the name of the copyright holder nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +// POSSIBILITY OF SUCH DAMAGE. 
+// +// ************************************************************************ + +#pragma once +#ifndef DSPL_HPP +#define DSPL_HPP + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "graph.hpp" +#include "utils.hpp" + +struct Comm { + GraphElem size; + GraphWeight degree; + + Comm() : size(0), degree(0.0) {}; +}; + +struct CommInfo { + GraphElem community; + GraphElem size; + GraphWeight degree; +}; + +const int SizeTag = 1; +const int VertexTag = 2; +const int CommunityTag = 3; +const int CommunitySizeTag = 4; +const int CommunityDataTag = 5; + +static MPI_Datatype commType; + +void distSumVertexDegree(const Graph &g, std::vector &vDegree, std::vector &localCinfo) +{ + const GraphElem nv = g.get_lnv(); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem e0, e1; + GraphWeight tw = 0.0; + + g.edge_range(i, e0, e1); + + for (GraphElem k = e0; k < e1; k++) { + const Edge &edge = g.get_edge(k); + tw += edge.weight_; + } + + vDegree[i] = tw; + + localCinfo[i].degree = tw; + localCinfo[i].size = 1L; + } +} // distSumVertexDegree + +GraphWeight distCalcConstantForSecondTerm(const std::vector &vDegree, MPI_Comm gcomm) +{ + GraphWeight totalEdgeWeightTwice = 0.0; + GraphWeight localWeight = 0.0; + int me = -1; + + const size_t vsz = vDegree.size(); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(runtime) +#else +#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(static) +#endif + for (GraphElem i = 0; i < vsz; i++) + localWeight += vDegree[i]; // Local reduction + + // Global reduction + MPI_Allreduce(&localWeight, &totalEdgeWeightTwice, 1, + 
MPI_WEIGHT_TYPE, MPI_SUM, gcomm); + + return (1.0 / static_cast(totalEdgeWeightTwice)); +} // distCalcConstantForSecondTerm + +void distInitComm(std::vector &pastComm, std::vector &currComm, const GraphElem base) +{ + const size_t csz = currComm.size(); + +#ifdef DEBUG_PRINTF + assert(csz == pastComm.size()); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(static) +#endif + for (GraphElem i = 0L; i < csz; i++) { + pastComm[i] = i + base; + currComm[i] = i + base; + } +} // distInitComm + +void distInitLouvain(const Graph &dg, std::vector &pastComm, + std::vector &currComm, std::vector &vDegree, + std::vector &clusterWeight, std::vector &localCinfo, + std::vector &localCupdate, GraphWeight &constantForSecondTerm, + const int me) +{ + const GraphElem base = dg.get_base(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + vDegree.resize(nv); + pastComm.resize(nv); + currComm.resize(nv); + clusterWeight.resize(nv); + localCinfo.resize(nv); + localCupdate.resize(nv); + + distSumVertexDegree(dg, vDegree, localCinfo); + constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm); + + distInitComm(pastComm, currComm, base); +} // distInitLouvain + +GraphElem distGetMaxIndex(const std::unordered_map &clmap, const std::vector &counter, + const GraphWeight selfLoop, const std::vector &localCinfo, + const std::map &remoteCinfo, const GraphWeight vDegree, + const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm, + const GraphElem base, const GraphElem bound, const GraphWeight constant) +{ + std::unordered_map::const_iterator storedAlready; + GraphElem maxIndex = currComm; + GraphWeight curGain = 0.0, maxGain = 0.0; + GraphWeight eix = static_cast(counter[0]) - static_cast(selfLoop); + + GraphWeight ax = currDegree - vDegree; + GraphWeight eiy = 0.0, ay = 
0.0; + + GraphElem maxSize = currSize; + GraphElem size = 0; + + storedAlready = clmap.begin(); +#ifdef DEBUG_PRINTF + assert(storedAlready != clmap.end()); +#endif + do { + if (currComm != storedAlready->first) { + + // is_local, direct access local info + if ((storedAlready->first >= base) && (storedAlready->first < bound)) { + ay = localCinfo[storedAlready->first-base].degree; + size = localCinfo[storedAlready->first - base].size; + } + else { + // is_remote, lookup map + std::map::const_iterator citer = remoteCinfo.find(storedAlready->first); + ay = citer->second.degree; + size = citer->second.size; + } + + eiy = counter[storedAlready->second]; + + curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant; + + if ((curGain > maxGain) || + ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) { + maxGain = curGain; + maxIndex = storedAlready->first; + maxSize = size; + } + } + storedAlready++; + } while (storedAlready != clmap.end()); + + if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm)) + maxIndex = currComm; + + return maxIndex; +} // distGetMaxIndex + +GraphWeight distBuildLocalMapCounter(const GraphElem e0, const GraphElem e1, std::unordered_map &clmap, + std::vector &counter, const Graph &g, + const std::vector &currComm, + const std::unordered_map &remoteComm, + const GraphElem vertex, const GraphElem base, const GraphElem bound) +{ + GraphElem numUniqueClusters = 1L; + GraphWeight selfLoop = 0; + std::unordered_map::const_iterator storedAlready; + + for (GraphElem j = e0; j < e1; j++) { + + const Edge &edge = g.get_edge(j); + const GraphElem &tail_ = edge.tail_; + const GraphWeight &weight = edge.weight_; + GraphElem tcomm; + + if (tail_ == vertex + base) + selfLoop += weight; + + // is_local, direct access local std::vector + if ((tail_ >= base) && (tail_ < bound)) + tcomm = currComm[tail_ - base]; + else { // is_remote, lookup map + std::unordered_map::const_iterator iter = remoteComm.find(tail_); + 
+#ifdef DEBUG_PRINTF + assert(iter != remoteComm.end()); +#endif + tcomm = iter->second; + } + + storedAlready = clmap.find(tcomm); + + if (storedAlready != clmap.end()) + counter[storedAlready->second] += weight; + else { + clmap.insert(std::unordered_map::value_type(tcomm, numUniqueClusters)); + counter.push_back(weight); + numUniqueClusters++; + } + } + + return selfLoop; +} // distBuildLocalMapCounter + +void distExecuteLouvainIteration(const GraphElem i, const Graph &dg, const std::vector &currComm, + std::vector &targetComm, const std::vector &vDegree, + std::vector &localCinfo, std::vector &localCupdate, + const std::unordered_map &remoteComm, + const std::map &remoteCinfo, + std::map &remoteCupdate, const GraphWeight constantForSecondTerm, + std::vector &clusterWeight, const int me) +{ + GraphElem localTarget = -1; + GraphElem e0, e1; GraphWeight selfLoop = 0.0; // selfLoop is an edge-weight sum, so keep it in GraphWeight to avoid truncation + std::unordered_map clmap; + std::vector counter; + + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem cc = currComm[i]; + GraphWeight ccDegree; + GraphElem ccSize; + bool currCommIsLocal = false; + bool targetCommIsLocal = false; + + // Current Community is local + if (cc >= base && cc < bound) { + ccDegree=localCinfo[cc-base].degree; + ccSize=localCinfo[cc-base].size; + currCommIsLocal=true; + } else { + // is remote + std::map::const_iterator citer = remoteCinfo.find(cc); + ccDegree = citer->second.degree; + ccSize = citer->second.size; + currCommIsLocal=false; + } + + dg.edge_range(i, e0, e1); + + if (e0 != e1) { + clmap.insert(std::unordered_map::value_type(cc, 0)); + counter.push_back(0.0); + + selfLoop = distBuildLocalMapCounter(e0, e1, clmap, counter, dg, + currComm, remoteComm, i, base, bound); + + clusterWeight[i] += counter[0]; + + localTarget = distGetMaxIndex(clmap, counter, selfLoop, localCinfo, remoteCinfo, + vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm); + } + else + localTarget = cc; + + // is the Target Local? 
+ if (localTarget >= base && localTarget < bound) + targetCommIsLocal = true; + + // current and target comm are local - atomic updates to vectors + if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) { + +#ifdef DEBUG_PRINTF + assert((localTarget >= base) && (localTarget < bound)); + assert((cc >= base) && (cc < bound)); + assert( cc - base < localCupdate.size()); + assert( localTarget - base < localCupdate.size()); +#endif + #pragma omp atomic update + localCupdate[localTarget-base].degree += vDegree[i]; + #pragma omp atomic update + localCupdate[localTarget-base].size++; + #pragma omp atomic update + localCupdate[cc-base].degree -= vDegree[i]; + #pragma omp atomic update + localCupdate[cc-base].size--; + } + + // current is local, target is not - do atomic on local, accumulate in Maps for remote + if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && !targetCommIsLocal) { + #pragma omp atomic update + localCupdate[cc-base].degree -= vDegree[i]; + #pragma omp atomic update + localCupdate[cc-base].size--; + + // search target! 
+ std::map::iterator iter=remoteCupdate.find(localTarget); + + #pragma omp atomic update + iter->second.degree += vDegree[i]; + #pragma omp atomic update + iter->second.size++; + } + + // current is remote, target is local - accumulate for current, atomic on local + if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && targetCommIsLocal) { + #pragma omp atomic update + localCupdate[localTarget-base].degree += vDegree[i]; + #pragma omp atomic update + localCupdate[localTarget-base].size++; + + // search current + std::map::iterator iter=remoteCupdate.find(cc); + + #pragma omp atomic update + iter->second.degree -= vDegree[i]; + #pragma omp atomic update + iter->second.size--; + } + + // current and target are remote - accumulate for both + if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && !targetCommIsLocal) { + + // search current + std::map::iterator iter = remoteCupdate.find(cc); + + #pragma omp atomic update + iter->second.degree -= vDegree[i]; + #pragma omp atomic update + iter->second.size--; + + // search target + iter=remoteCupdate.find(localTarget); + + #pragma omp atomic update + iter->second.degree += vDegree[i]; + #pragma omp atomic update + iter->second.size++; + } + +#ifdef DEBUG_PRINTF + assert(localTarget != -1); +#endif + targetComm[i] = localTarget; +} // distExecuteLouvainIteration + +GraphWeight distComputeModularity(const Graph &g, std::vector &localCinfo, + const std::vector &clusterWeight, + const GraphWeight constantForSecondTerm, + const int me) +{ + const GraphElem nv = g.get_lnv(); + MPI_Comm gcomm = g.get_comm(); + + GraphWeight le_la_xx[2]; + GraphWeight e_a_xx[2] = {0.0, 0.0}; + GraphWeight le_xx = 0.0, la2_x = 0.0; + +#ifdef DEBUG_PRINTF + assert((clusterWeight.size() == nv)); +#endif + +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for map(to: clusterWeight, localCinfo) reduction(+: le_xx), reduction(+: la2_x) +#elif defined(OMP_SCHEDULE_RUNTIME) +#pragma omp parallel for 
default(none), shared(clusterWeight, localCinfo), \ + reduction(+: le_xx), reduction(+: la2_x) schedule(runtime) +#else +#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \ + reduction(+: le_xx), reduction(+: la2_x) schedule(static) +#endif + for (GraphElem i = 0L; i < nv; i++) { + le_xx += clusterWeight[i]; + la2_x += static_cast(localCinfo[i].degree) * static_cast(localCinfo[i].degree); + } + le_la_xx[0] = le_xx; + le_la_xx[1] = la2_x; + +#ifdef DEBUG_PRINTF + const double t0 = MPI_Wtime(); +#endif + + MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm); + +#ifdef DEBUG_PRINTF + const double t1 = MPI_Wtime(); +#endif + + GraphWeight currMod = (e_a_xx[0] * constantForSecondTerm) - + (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm); +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl; + std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl; + std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl; +#endif + + return currMod; +} // distComputeModularity + +void distUpdateLocalCinfo(std::vector &localCinfo, const std::vector &localCupdate) +{ + size_t csz = localCinfo.size(); + +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for +#elif defined(OMP_SCHEDULE_RUNTIME) +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(static) +#endif + for (GraphElem i = 0L; i < csz; i++) { + localCinfo[i].size += localCupdate[i].size; + localCinfo[i].degree += localCupdate[i].degree; + } +} + +void distCleanCWandCU(const GraphElem nv, std::vector &clusterWeight, + std::vector &localCupdate) +{ +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for +#elif defined(OMP_SCHEDULE_RUNTIME) +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(static) +#endif + for (GraphElem i = 0L; i < nv; i++) { + clusterWeight[i] = 0; + 
localCupdate[i].degree = 0; + localCupdate[i].size = 0; + } +} // distCleanCWandCU + +#if defined(USE_MPI_RMA) +void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, + const size_t &ssz, const size_t &rsz, const std::vector &ssizes, + const std::vector &rsizes, const std::vector &svdata, + const std::vector &rvdata, const std::vector &currComm, + const std::vector &localCinfo, std::map &remoteCinfo, + std::unordered_map &remoteComm, std::map &remoteCupdate, + const MPI_Win &commwin, const std::vector &disp) +#else +void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, + const size_t &ssz, const size_t &rsz, const std::vector &ssizes, + const std::vector &rsizes, const std::vector &svdata, + const std::vector &rvdata, const std::vector &currComm, + const std::vector &localCinfo, std::map &remoteCinfo, + std::unordered_map &remoteComm, std::map &remoteCupdate) +#endif +{ +#if defined(USE_MPI_RMA) + std::vector scdata(ssz); +#else + std::vector rcdata(rsz), scdata(ssz); +#endif + GraphElem spos, rpos; +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + std::vector< std::vector< GraphElem > > rcinfo(nprocs); +#else + std::vector > rcinfo(nprocs); +#endif + +#if defined(USE_MPI_SENDRECV) +#else + std::vector rreqs(nprocs), sreqs(nprocs); +#endif + +#ifdef DEBUG_PRINTF + double t0, t1, ta = 0.0; +#endif + + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + // Collects Communities of local vertices for remote nodes +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime) +#else +#pragma omp parallel for shared(svdata, scdata, currComm) schedule(static) +#endif + for (GraphElem i = 0; i < ssz; i++) { + const GraphElem vertex = svdata[i]; +#ifdef DEBUG_PRINTF + assert((vertex >= base) && (vertex < bound)); +#endif + const GraphElem comm = currComm[vertex - base]; + scdata[i] = comm; + } + + 
std::vector rcsizes(nprocs), scsizes(nprocs); + std::vector sinfo, rinfo; + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + spos = 0; + rpos = 0; +#if defined(USE_MPI_COLLECTIVES) + std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); + for (int i = 0; i < nprocs; i++) { + scnts[i] = ssizes[i]; + rcnts[i] = rsizes[i]; + sdispls[i] = spos; + rdispls[i] = rpos; + spos += scnts[i]; + rpos += rcnts[i]; + } + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); +#elif defined(USE_MPI_RMA) + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_ACCUMULATE) + MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin); +#else + MPI_Put(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + disp[i], ssizes[i], MPI_GRAPH_TYPE, commwin); +#endif + } + spos += ssizes[i]; + rpos += rsizes[i]; + } +#elif defined(USE_MPI_SENDRECV) + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); + + spos += ssizes[i]; + rpos += rsizes[i]; + } +#else + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + rpos += rsizes[i]; + } + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Isend(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); + else + sreqs[i] = MPI_REQUEST_NULL; + + spos += ssizes[i]; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + + // reserve 
vectors +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + for (GraphElem i = 0; i < nprocs; i++) { + rcinfo[i].reserve(rpos); + } +#endif + + // fetch baseptr from MPI window +#if defined(USE_MPI_RMA) + MPI_Win_flush_all(commwin); + MPI_Barrier(gcomm); + + GraphElem *rcbuf = nullptr; + int flag = 0; + MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, &flag); +#endif + + remoteComm.clear(); + for (GraphElem i = 0; i < rpos; i++) { + +#if defined(USE_MPI_RMA) + const GraphElem comm = rcbuf[i]; +#else + const GraphElem comm = rcdata[i]; +#endif + + remoteComm.insert(std::unordered_map::value_type(rvdata[i], comm)); + const int tproc = dg.get_owner(comm); + + if (tproc != me) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + rcinfo[tproc].emplace_back(comm); +#else + rcinfo[tproc].insert(comm); +#endif + } + + for (GraphElem i = 0; i < nv; i++) { + const GraphElem comm = currComm[i]; + const int tproc = dg.get_owner(comm); + + if (tproc != me) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + rcinfo[tproc].emplace_back(comm); +#else + rcinfo[tproc].insert(comm); +#endif + } + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + GraphElem stcsz = 0, rtcsz = 0; + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(scsizes, rcinfo) \ + reduction(+:stcsz) schedule(runtime) +#else +#pragma omp parallel for shared(scsizes, rcinfo) \ + reduction(+:stcsz) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + scsizes[i] = rcinfo[i].size(); + stcsz += scsizes[i]; + } + + MPI_Alltoall(scsizes.data(), 1, MPI_GRAPH_TYPE, rcsizes.data(), + 1, MPI_GRAPH_TYPE, gcomm); + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rcsizes) \ + reduction(+:rtcsz) schedule(runtime) +#else +#pragma omp parallel for shared(rcsizes) \ + reduction(+:rtcsz) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + rtcsz += rcsizes[i]; + } + +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]Total communities to 
receive: " << rtcsz << std::endl; +#endif +#if defined(USE_MPI_COLLECTIVES) + std::vector rcomms(rtcsz), scomms(stcsz); +#else +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + std::vector rcomms(rtcsz); +#else + std::vector rcomms(rtcsz), scomms(stcsz); +#endif +#endif + sinfo.resize(rtcsz); + rinfo.resize(stcsz); + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + spos = 0; + rpos = 0; +#if defined(USE_MPI_COLLECTIVES) + for (int i = 0; i < nprocs; i++) { + if (i != me) { + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + } + scnts[i] = scsizes[i]; + rcnts[i] = rcsizes[i]; + sdispls[i] = spos; + rdispls[i] = rpos; + spos += scnts[i]; + rpos += rcnts[i]; + } + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); + + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ + firstprivate(i), schedule(runtime) /*, if(rcsizes[i] >= 1000) */ +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ + firstprivate(i), schedule(guided) /*, if(rcsizes[i] >= 1000) */ +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rdispls[i] + j]; + sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + } + } + + MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), + commType, rinfo.data(), scnts.data(), sdispls.data(), + commType, gcomm); +#else +#if !defined(USE_MPI_SENDRECV) + std::vector rcreqs(nprocs); +#endif + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_SENDRECV) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, 
MPI_STATUSES_IGNORE); +#else + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); +#endif +#else + MPI_Irecv(rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &rreqs[i]); +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); +#else + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); +#endif +#endif + } + else { +#if !defined(USE_MPI_SENDRECV) + rreqs[i] = MPI_REQUEST_NULL; + sreqs[i] = MPI_REQUEST_NULL; +#endif + } + rpos += rcsizes[i]; + spos += scsizes[i]; + } + + spos = 0; + rpos = 0; + + // poke progress on last isend/irecvs +#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + int tf = 0, id = 0; + MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE); +#endif + +#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif + + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_SENDRECV) +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(guided) +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rpos + j]; + sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; 
+ } + + MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, + rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, + gcomm, MPI_STATUSES_IGNORE); +#else + MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, + gcomm, &rcreqs[i]); + + // poke progress on last isend/irecvs +#if defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + int flag = 0, done = 0; + while (!done) { + MPI_Test(&sreqs[i], &flag, MPI_STATUS_IGNORE); + MPI_Test(&rreqs[i], &flag, MPI_STATUS_IGNORE); + if (flag) + done = 1; + } +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(guided) +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rpos + j]; + sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + + MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, + CommunityDataTag, gcomm, &sreqs[i]); +#endif + } + else { +#if !defined(USE_MPI_SENDRECV) + rcreqs[i] = MPI_REQUEST_NULL; + sreqs[i] = MPI_REQUEST_NULL; +#endif + } + rpos += rcsizes[i]; + spos += scsizes[i]; + } + +#if !defined(USE_MPI_SENDRECV) + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE); +#endif + +#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + + remoteCinfo.clear(); + remoteCupdate.clear(); + + for (GraphElem i = 0; i < stcsz; i++) { + const GraphElem ccomm = rinfo[i].community; + + Comm comm; + + comm.size = rinfo[i].size; + comm.degree = rinfo[i].degree; + + remoteCinfo.insert(std::map::value_type(ccomm, comm)); + remoteCupdate.insert(std::map::value_type(ccomm, Comm())); + } +} // end fillRemoteCommunities + +void createCommunityMPIType() +{ + CommInfo cinfo; + + 
MPI_Aint begin, community, size, degree; + + MPI_Get_address(&cinfo, &begin); + MPI_Get_address(&cinfo.community, &community); + MPI_Get_address(&cinfo.size, &size); + MPI_Get_address(&cinfo.degree, &degree); + + int blens[] = { 1, 1, 1 }; + MPI_Aint displ[] = { community - begin, size - begin, degree - begin }; + MPI_Datatype types[] = { MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE }; + + MPI_Type_create_struct(3, blens, displ, types, &commType); + MPI_Type_commit(&commType); +} // createCommunityMPIType + +void destroyCommunityMPIType() +{ + MPI_Type_free(&commType); +} // destroyCommunityMPIType + +void updateRemoteCommunities(const Graph &dg, std::vector &localCinfo, + const std::map &remoteCupdate, + const int me, const int nprocs) +{ + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + std::vector> remoteArray(nprocs); + MPI_Comm gcomm = dg.get_comm(); + + // FIXME TODO can we use TBB::concurrent_vector instead, + // to make this parallel; first we have to get rid of maps + for (std::map::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) { + const GraphElem i = iter->first; + const Comm &curr = iter->second; + + const int tproc = dg.get_owner(i); + +#ifdef DEBUG_PRINTF + assert(tproc != me); +#endif + CommInfo rcinfo; + + rcinfo.community = i; + rcinfo.size = curr.size; + rcinfo.degree = curr.degree; + + remoteArray[tproc].push_back(rcinfo); + } + + std::vector send_sz(nprocs), recv_sz(nprocs); + +#ifdef DEBUG_PRINTF + GraphWeight tc = 0.0; + const double t0 = MPI_Wtime(); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for schedule(runtime) +#else +#pragma omp parallel for schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + send_sz[i] = remoteArray[i].size(); + } + + MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), + 1, MPI_GRAPH_TYPE, gcomm); + +#ifdef DEBUG_PRINTF + const double t1 = MPI_Wtime(); + tc += (t1 - t0); +#endif + + GraphElem rcnt = 0, scnt = 0; +#ifdef 
OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(recv_sz, send_sz) \ + reduction(+:rcnt, scnt) schedule(runtime) +#else +#pragma omp parallel for shared(recv_sz, send_sz) \ + reduction(+:rcnt, scnt) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + rcnt += recv_sz[i]; + scnt += send_sz[i]; + } +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl; +#endif + + GraphElem currPos = 0; + std::vector rdata(rcnt); + +#ifdef DEBUG_PRINTF + const double t2 = MPI_Wtime(); +#endif +#if defined(USE_MPI_SENDRECV) + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, + rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, + gcomm, MPI_STATUSES_IGNORE); + + currPos += recv_sz[i]; + } +#else + std::vector sreqs(nprocs), rreqs(nprocs); + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, + CommunityDataTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + currPos += recv_sz[i]; + } + + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, + CommunityDataTag, gcomm, &sreqs[i]); + else + sreqs[i] = MPI_REQUEST_NULL; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif +#ifdef DEBUG_PRINTF + const double t3 = MPI_Wtime(); + std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl; +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rdata, localCinfo) schedule(runtime) +#else +#pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic) +#endif + for (GraphElem i = 0; i < rcnt; i++) { + const CommInfo &curr = rdata[i]; + +#ifdef DEBUG_PRINTF + assert(dg.get_owner(curr.community) == me); +#endif + localCinfo[curr.community-base].size += curr.size; + 
localCinfo[curr.community-base].degree += curr.degree; + } +} // updateRemoteCommunities + +// initial setup before Louvain iteration begins +#if defined(USE_MPI_RMA) +void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, + std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, + const int me, const int nprocs, MPI_Win &commwin) +#else +void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, + std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, + const int me, const int nprocs) +#endif +{ + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + +#ifdef USE_OPENMP_LOCK + std::vector locks(nprocs); + for (int i = 0; i < nprocs; i++) + omp_init_lock(&locks[i]); +#endif + std::vector> parray(nprocs); + +#ifdef USE_OPENMP_LOCK +#pragma omp parallel default(none), shared(dg, locks, parray) +#else +#pragma omp parallel default(none), shared(dg, parray) +#endif + { +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem e0, e1; + + dg.edge_range(i, e0, e1); + + for (GraphElem j = e0; j < e1; j++) { + const Edge &edge = dg.get_edge(j); + const int tproc = dg.get_owner(edge.tail_); + + if (tproc != me) { +#ifdef USE_OPENMP_LOCK + omp_set_lock(&locks[tproc]); +#else + lock(); +#endif + parray[tproc].insert(edge.tail_); +#ifdef USE_OPENMP_LOCK + omp_unset_lock(&locks[tproc]); +#else + unlock(); +#endif + } + } + } + } + +#ifdef USE_OPENMP_LOCK + for (int i = 0; i < nprocs; i++) { + omp_destroy_lock(&locks[i]); + } +#endif + + rsizes.resize(nprocs); + ssizes.resize(nprocs); + ssz = 0, rsz = 0; + + int pproc = 0; + // TODO FIXME parallelize this loop + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + ssz += iter->size(); + ssizes[pproc] = iter->size(); + 
pproc++; + } + + MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), + 1, MPI_GRAPH_TYPE, gcomm); + + GraphElem rsz_r = 0; +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rsizes) \ + reduction(+:rsz_r) schedule(runtime) +#else +#pragma omp parallel for shared(rsizes) \ + reduction(+:rsz_r) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) + rsz_r += rsizes[i]; + rsz = rsz_r; + + svdata.resize(ssz); + rvdata.resize(rsz); + + GraphElem cpos = 0, rpos = 0; + pproc = 0; + +#if defined(USE_MPI_COLLECTIVES) + std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); + + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); + + scnts[pproc] = iter->size(); + rcnts[pproc] = rsizes[pproc]; + sdispls[pproc] = cpos; + rdispls[pproc] = rpos; + cpos += iter->size(); + rpos += rcnts[pproc]; + + pproc++; + } + + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); +#else + std::vector rreqs(nprocs), sreqs(nprocs); + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, + VertexTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + rpos += rsizes[i]; + } + + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); + + if (me != pproc) + MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, + VertexTag, gcomm, &sreqs[pproc]); + else + sreqs[pproc] = MPI_REQUEST_NULL; + + cpos += iter->size(); + pproc++; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif + + std::swap(svdata, rvdata); + std::swap(ssizes, rsizes); + std::swap(ssz, rsz); + + // create 
MPI window for communities +#if defined(USE_MPI_RMA) + GraphElem *ptr = nullptr; + MPI_Info info = MPI_INFO_NULL; +#if defined(USE_MPI_ACCUMULATE) + MPI_Info_create(&info); + MPI_Info_set(info, "accumulate_ordering", "none"); + MPI_Info_set(info, "accumulate_ops", "same_op"); +#endif + MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), + info, gcomm, &ptr, &commwin); + MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin); +#endif +} // exchangeVertexReqs + +#if defined(USE_MPI_RMA) +GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, + size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, const GraphWeight lower, + const GraphWeight thresh, int &iters, MPI_Win &commwin) +#else +GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, + size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, const GraphWeight lower, + const GraphWeight thresh, int &iters) +#endif +{ + std::vector pastComm, currComm, targetComm; + std::vector vDegree; + std::vector clusterWeight; + std::vector localCinfo, localCupdate; + + std::unordered_map remoteComm; + std::map remoteCinfo, remoteCupdate; + + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + GraphWeight constantForSecondTerm; + GraphWeight prevMod = lower; + GraphWeight currMod = -1.0; + int numIters = 0; + + distInitLouvain(dg, pastComm, currComm, vDegree, clusterWeight, localCinfo, + localCupdate, constantForSecondTerm, me); + targetComm.resize(nv); + +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl; + if (me == 0) + std::cout << "Threshold: " << thresh << std::endl; +#endif + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + +#ifdef DEBUG_PRINTF + double t0, t1; + t0 = MPI_Wtime(); +#endif + + // setup vertices and communities +#if defined(USE_MPI_RMA) + 
exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, + svdata, rvdata, me, nprocs, commwin); + + // store the remote displacements + std::vector disp(nprocs); + MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, + MPI_SUM, gcomm); +#else + exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, + svdata, rvdata, me, nprocs); +#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl; +#endif + + // start Louvain iteration + while(true) { +#ifdef DEBUG_PRINTF + const double t2 = MPI_Wtime(); + if (me == 0) + std::cout << "Starting Louvain iteration: " << numIters << std::endl; +#endif + numIters++; + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + +#if defined(USE_MPI_RMA) + fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, + rsizes, svdata, rvdata, currComm, localCinfo, + remoteCinfo, remoteComm, remoteCupdate, + commwin, disp); +#else + fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, + rsizes, svdata, rvdata, currComm, localCinfo, + remoteCinfo, remoteComm, remoteCupdate); +#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl; + std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl; +#endif + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + +#if defined(OMP_GPU) +#pragma omp target data map(from: clusterWeight, localCupdate, targetComm) map(to: dg, currComm, vDegree) \ + map(to: localCinfo, remoteCinfo, remoteComm, remoteCupdate) +#else +#pragma omp parallel default(none), shared(clusterWeight, localCupdate, currComm, targetComm, \ + vDegree, localCinfo, remoteCinfo, remoteComm, pastComm, dg, remoteCupdate), \ + firstprivate(constantForSecondTerm) +#endif + { + distCleanCWandCU(nv, clusterWeight, localCupdate); + +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for +#elif 
defined(OMP_SCHEDULE_RUNTIME) +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + distExecuteLouvainIteration(i, dg, currComm, targetComm, vDegree, localCinfo, + localCupdate, remoteComm, remoteCinfo, remoteCupdate, + constantForSecondTerm, clusterWeight, me); + } + } + +#if defined(OMP_GPU) +#pragma omp target data map(to: localCinfo, localCupdate) +#else +#pragma omp parallel default(none), shared(localCinfo, localCupdate) +#endif + { + distUpdateLocalCinfo(localCinfo, localCupdate); + } + + // communicate remote communities + updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs); + + // compute modularity + currMod = distComputeModularity(dg, localCinfo, clusterWeight, constantForSecondTerm, me); + + // exit criteria + if (currMod - prevMod < thresh) + break; + + prevMod = currMod; + if (prevMod < lower) + prevMod = lower; + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none) \ + shared(pastComm, currComm, targetComm) \ + schedule(runtime) +#else +#pragma omp parallel for default(none) \ + shared(pastComm, currComm, targetComm) \ + schedule(static) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem tmp = pastComm[i]; + pastComm[i] = currComm[i]; + currComm[i] = targetComm[i]; + targetComm[i] = tmp; + } + } // end of Louvain iteration + +#if defined(USE_MPI_RMA) + MPI_Win_unlock_all(commwin); + MPI_Win_free(&commwin); +#endif + + iters = numIters; + + vDegree.clear(); + pastComm.clear(); + currComm.clear(); + targetComm.clear(); + clusterWeight.clear(); + localCinfo.clear(); + localCupdate.clear(); + + return prevMod; +} // distLouvainMethod plain + +#endif // __DSPL diff --git a/miniVite/dspl_gpu_kernel.hpp b/miniVite/dspl_gpu_kernel.hpp new file mode 100644 index 0000000..1cf9c70 --- /dev/null +++ b/miniVite/dspl_gpu_kernel.hpp @@ -0,0 +1,1447 @@ +// *********************************************************************** +// +// miniVite +// +// 
*********************************************************************** +// +// Copyright (2018) Battelle Memorial Institute +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// 1. Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// 2. Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// 3. Neither the name of the copyright holder nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +// POSSIBILITY OF SUCH DAMAGE. 
+// +// ************************************************************************ + +#pragma once +#ifndef DSPL_HPP +#define DSPL_HPP + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "graph.hpp" +#include "utils.hpp" + +struct Comm { + GraphElem size; + GraphWeight degree; + + Comm() : size(0), degree(0.0) {}; +}; + +struct CommInfo { + GraphElem community; + GraphElem size; + GraphWeight degree; +}; + +const int SizeTag = 1; +const int VertexTag = 2; +const int CommunityTag = 3; +const int CommunitySizeTag = 4; +const int CommunityDataTag = 5; + +static MPI_Datatype commType; + +void distSumVertexDegree(const Graph &g, std::vector &vDegree, std::vector &localCinfo) +{ + const GraphElem nv = g.get_lnv(); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), firstprivate(nv) schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem e0, e1; + GraphWeight tw = 0.0; + + g.edge_range(i, e0, e1); + + for (GraphElem k = e0; k < e1; k++) { + const Edge &edge = g.get_edge(k); + tw += edge.weight_; + } + + vDegree[i] = tw; + + localCinfo[i].degree = tw; + localCinfo[i].size = 1L; + } +} // distSumVertexDegree + +GraphWeight distCalcConstantForSecondTerm(const std::vector &vDegree, MPI_Comm gcomm) +{ + GraphWeight totalEdgeWeightTwice = 0.0; + GraphWeight localWeight = 0.0; + int me = -1; + + const size_t vsz = vDegree.size(); + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(runtime) +#else +#pragma omp parallel for default(none), shared(vDegree), firstprivate(vsz), reduction(+: localWeight) schedule(static) +#endif + for (GraphElem i = 0; i < vsz; i++) + localWeight += vDegree[i]; // Local reduction + + // Global reduction + MPI_Allreduce(&localWeight, 
&totalEdgeWeightTwice, 1, + MPI_WEIGHT_TYPE, MPI_SUM, gcomm); + + return (1.0 / static_cast(totalEdgeWeightTwice)); +} // distCalcConstantForSecondTerm + +void distInitComm(std::vector &pastComm, std::vector &currComm, const GraphElem base) +{ + const size_t csz = currComm.size(); + +#ifdef DEBUG_PRINTF + assert(csz == pastComm.size()); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(pastComm, currComm), firstprivate(csz, base), schedule(static) +#endif + for (GraphElem i = 0L; i < csz; i++) { + pastComm[i] = i + base; + currComm[i] = i + base; + } +} // distInitComm + +void distInitLouvain(const Graph &dg, std::vector &pastComm, + std::vector &currComm, std::vector &vDegree, + std::vector &clusterWeight, std::vector &localCinfo, + std::vector &localCupdate, GraphWeight &constantForSecondTerm, + const int me) +{ + const GraphElem base = dg.get_base(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + vDegree.resize(nv); + pastComm.resize(nv); + currComm.resize(nv); + clusterWeight.resize(nv); + localCinfo.resize(nv); + localCupdate.resize(nv); + + distSumVertexDegree(dg, vDegree, localCinfo); + constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm); + + distInitComm(pastComm, currComm, base); +} // distInitLouvain + +struct clmap_t { + GraphElem f; + GraphElem s; +}; +#define CLMAP_MAX_NUM 32 +#define COUNT_MAX_NUM 32 + +GraphElem distGetMaxIndex(clmap_t *clmap, int &clmap_size, GraphWeight *counter, int &counter_size, + const GraphWeight selfLoop, const Comm *localCinfo, + const GraphWeight vDegree, + const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm, + const GraphElem base, const GraphElem bound, const GraphWeight constant) +{ + //std::unordered_map::const_iterator storedAlready; + clmap_t *storedAlready; + GraphElem maxIndex = currComm; + GraphWeight curGain = 
0.0, maxGain = 0.0; + //GraphWeight eix = static_cast(counter[0]) - static_cast(selfLoop); + GraphWeight eix = counter[0] - selfLoop; + + GraphWeight ax = currDegree - vDegree; + GraphWeight eiy = 0.0, ay = 0.0; + + GraphElem maxSize = currSize; + GraphElem size = 0; + + //storedAlready = clmap.begin(); + storedAlready = clmap; +#ifdef DEBUG_PRINTF + //assert(storedAlready != clmap.end()); +#endif + do { + //if (currComm != storedAlready->first) { + if (currComm != storedAlready->f) { + + // is_local, direct access local info + //assert((storedAlready->first >= base) && (storedAlready->first < bound)); + //ay = localCinfo[storedAlready->first-base].degree; + //size = localCinfo[storedAlready->first - base].size; + //assert((storedAlready->f >= base) && (storedAlready->f < bound)); + ay = localCinfo[storedAlready->f-base].degree; + size = localCinfo[storedAlready->f - base].size; + + //eiy = counter[storedAlready->second]; + if (storedAlready->s < counter_size) + eiy = counter[storedAlready->s]; + + curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant; + + if ((curGain > maxGain) || + //((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) { + ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->f < maxIndex))) { + maxGain = curGain; + //maxIndex = storedAlready->first; + maxIndex = storedAlready->f; + maxSize = size; + } + } + storedAlready++; + //} while (storedAlready != clmap.end()); + } while (storedAlready != clmap + clmap_size); + + if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm)) + maxIndex = currComm; + + return maxIndex; +} // distGetMaxIndex + +GraphWeight distBuildLocalMapCounter(const GraphElem e0, const GraphElem e1, clmap_t *clmap, int &clmap_size, + GraphWeight *counter, int &counter_size, const Edge *edge_list, + const GraphElem *currComm, + const GraphElem vertex, const GraphElem base, const GraphElem bound) +{ + GraphElem numUniqueClusters = 1L; + GraphWeight selfLoop = 0; + 
//std::unordered_map<GraphElem, GraphElem>::const_iterator storedAlready; + clmap_t *storedAlready; + + for (GraphElem j = e0; j < e1; j++) { + + const Edge &edge = edge_list[j]; + const GraphElem &tail_ = edge.tail_; + const GraphWeight &weight = edge.weight_; + GraphElem tcomm; + + if (tail_ == vertex + base) + selfLoop += weight; + + // is_local, direct access local std::vector + tcomm = currComm[tail_ - base]; + + //storedAlready = clmap.find(tcomm); + storedAlready = clmap; + for (int i = 0; i < clmap_size; i++, storedAlready++) { + if (clmap[i].f == tcomm) + break; + } + + //if (storedAlready != clmap.end()) + // counter[storedAlready->second] += weight; + if (storedAlready != clmap + clmap_size && storedAlready->s < counter_size) + counter[storedAlready->s] += weight; + else { + //clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(tcomm, numUniqueClusters)); + if (clmap_size < CLMAP_MAX_NUM) { + clmap[clmap_size].f = tcomm; + clmap[clmap_size].s = numUniqueClusters; + clmap_size++; + } + //counter.push_back(weight); + if (counter_size < COUNT_MAX_NUM) { + counter[counter_size] = weight; + counter_size++; + } + numUniqueClusters++; + } + } + + return selfLoop; +} // distBuildLocalMapCounter + +void distExecuteLouvainIteration(const GraphElem i, const GraphElem *edge_indices, + const GraphElem *parts, const Edge *edge_list, + const GraphElem *currComm, + GraphElem *targetComm, const GraphWeight *vDegree, + Comm *localCinfo, Comm *localCupdate, + const GraphWeight constantForSecondTerm, + GraphWeight *clusterWeight, const int me) +{ + GraphElem localTarget = -1; + GraphElem e0, e1; GraphWeight selfLoop = 0.0; // weight-typed: distBuildLocalMapCounter returns GraphWeight + //std::unordered_map<GraphElem, GraphElem> clmap; + clmap_t clmap[CLMAP_MAX_NUM]; + int clmap_size = 0; + //std::vector<GraphWeight> counter; + GraphWeight counter[COUNT_MAX_NUM]; + int counter_size = 0; + + const GraphElem base = parts[me], bound = parts[me+1]; + const GraphElem cc = currComm[i]; + GraphWeight ccDegree; + GraphElem ccSize; + bool currCommIsLocal = false; + bool targetCommIsLocal = false; + + // Current Community
is local +#ifdef DEBUG_PRINTF + assert(cc >= base && cc < bound); +#endif + ccDegree=localCinfo[cc-base].degree; + ccSize=localCinfo[cc-base].size; + currCommIsLocal=true; + + e0 = edge_indices[i]; + e1 = edge_indices[i+1]; + + if (e0 != e1) { + //clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(cc, 0)); + clmap[0].f = cc; + clmap[0].s = 0; + clmap_size++; + //counter.push_back(0.0); + counter[0] = 0.0; + counter_size++; + + selfLoop = distBuildLocalMapCounter(e0, e1, clmap, clmap_size, counter, counter_size, edge_list, + currComm, i, base, bound); + + clusterWeight[i] += counter[0]; + + localTarget = distGetMaxIndex(clmap, clmap_size, counter, counter_size, selfLoop, localCinfo, + vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm); + } + else + localTarget = cc; + + // is the Target Local? + //assert(localTarget >= base && localTarget < bound); + targetCommIsLocal = true; + + // current and target comm are local - atomic updates to vectors + if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) { + +#ifdef DEBUG_PRINTF + // range checks written out explicitly: a chained "base < x < bound" compares a bool against bound + assert(localTarget >= base && localTarget < bound); + assert(cc >= base && cc < bound); + //assert( cc - base < localCupdate.size()); + //assert( localTarget - base < localCupdate.size()); +#endif + #pragma omp atomic update + localCupdate[localTarget-base].degree += vDegree[i]; + #pragma omp atomic update + localCupdate[localTarget-base].size++; + #pragma omp atomic update + localCupdate[cc-base].degree -= vDegree[i]; + #pragma omp atomic update + localCupdate[cc-base].size--; + } + +#ifdef DEBUG_PRINTF + assert(localTarget != -1); +#endif + targetComm[i] = localTarget; +} // distExecuteLouvainIteration + +GraphWeight distComputeModularity(const Graph &g, Comm *localCinfo, + const GraphWeight *clusterWeight, + const GraphWeight constantForSecondTerm, + const int me) +{ + const GraphElem nv = g.get_lnv(); + MPI_Comm gcomm = g.get_comm(); + + GraphWeight le_la_xx[2]; + GraphWeight e_a_xx[2] = {0.0, 0.0}; + GraphWeight
le_xx = 0.0, la2_x = 0.0; + +#ifdef DEBUG_PRINTF + //assert((clusterWeight.size() == nv)); +#endif + +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for map(to: clusterWeight[0:nv], localCinfo[0:nv]) reduction(+: le_xx), reduction(+: la2_x) +#elif defined(OMP_SCHEDULE_RUNTIME) +#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \ + reduction(+: le_xx), reduction(+: la2_x) schedule(runtime) +#else +#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \ + reduction(+: le_xx), reduction(+: la2_x) schedule(static) +#endif + for (GraphElem i = 0L; i < nv; i++) { + le_xx += clusterWeight[i]; + //la2_x += static_cast(localCinfo[i].degree) * static_cast(localCinfo[i].degree); + la2_x += localCinfo[i].degree * localCinfo[i].degree; + } + le_la_xx[0] = le_xx; + le_la_xx[1] = la2_x; + +#ifdef DEBUG_PRINTF + const double t0 = MPI_Wtime(); +#endif + + MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm); + +#ifdef DEBUG_PRINTF + const double t1 = MPI_Wtime(); +#endif + + GraphWeight currMod = (e_a_xx[0] * constantForSecondTerm) - + (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm); +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl; + std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl; + std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl; +#endif + + return currMod; +} // distComputeModularity + +void distUpdateLocalCinfo(const GraphElem nv, Comm *localCinfo, const Comm *localCupdate) +{ +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for map(to \ + : localCupdate [0:nv]) \ + map(tofrom \ + : localCinfo [0:nv]) +#elif defined(OMP_SCHEDULE_RUNTIME) +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(static) +#endif + for (GraphElem i = 0L; i < nv; i++) { + localCinfo[i].size += localCupdate[i].size; + 
localCinfo[i].degree += localCupdate[i].degree; + } +} + +void distCleanCWandCU(const GraphElem nv, GraphWeight *clusterWeight, + Comm *localCupdate) +{ +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for map(from \ + : clusterWeight [0:nv], \ + localCupdate [0:nv]) +#elif defined(OMP_SCHEDULE_RUNTIME) +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(static) +#endif + for (GraphElem i = 0L; i < nv; i++) { + clusterWeight[i] = 0; + localCupdate[i].degree = 0; + localCupdate[i].size = 0; + } +} // distCleanCWandCU + +#if defined(USE_MPI_RMA) +void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, + const size_t &ssz, const size_t &rsz, const std::vector &ssizes, + const std::vector &rsizes, const std::vector &svdata, + const std::vector &rvdata, const std::vector &currComm, + const std::vector &localCinfo, std::map &remoteCinfo, + std::unordered_map &remoteComm, std::map &remoteCupdate, + const MPI_Win &commwin, const std::vector &disp) +#else +void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, + const size_t &ssz, const size_t &rsz, const std::vector &ssizes, + const std::vector &rsizes, const std::vector &svdata, + const std::vector &rvdata, const std::vector &currComm, + const std::vector &localCinfo, std::map &remoteCinfo, + std::unordered_map &remoteComm, std::map &remoteCupdate) +#endif +{ +#if defined(USE_MPI_RMA) + std::vector scdata(ssz); +#else + std::vector rcdata(rsz), scdata(ssz); +#endif + GraphElem spos, rpos; +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + std::vector< std::vector< GraphElem > > rcinfo(nprocs); +#else + std::vector > rcinfo(nprocs); +#endif + +#if defined(USE_MPI_SENDRECV) +#else + std::vector rreqs(nprocs), sreqs(nprocs); +#endif + +#ifdef DEBUG_PRINTF + double t0, t1, ta = 0.0; +#endif + + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + // Collects Communities 
of local vertices for remote nodes +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime) +#else +#pragma omp parallel for shared(svdata, scdata, currComm) schedule(static) +#endif + for (GraphElem i = 0; i < ssz; i++) { + const GraphElem vertex = svdata[i]; +#ifdef DEBUG_PRINTF + assert((vertex >= base) && (vertex < bound)); +#endif + const GraphElem comm = currComm[vertex - base]; + scdata[i] = comm; + } + + std::vector rcsizes(nprocs), scsizes(nprocs); + std::vector sinfo, rinfo; + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + spos = 0; + rpos = 0; +#if defined(USE_MPI_COLLECTIVES) + std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); + for (int i = 0; i < nprocs; i++) { + scnts[i] = ssizes[i]; + rcnts[i] = rsizes[i]; + sdispls[i] = spos; + rdispls[i] = rpos; + spos += scnts[i]; + rpos += rcnts[i]; + } + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); +#elif defined(USE_MPI_RMA) + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_ACCUMULATE) + MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin); +#else + MPI_Put(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + disp[i], ssizes[i], MPI_GRAPH_TYPE, commwin); +#endif + } + spos += ssizes[i]; + rpos += rsizes[i]; + } +#elif defined(USE_MPI_SENDRECV) + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); + + spos += ssizes[i]; + rpos += rsizes[i]; + } +#else + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &rreqs[i]); + else + rreqs[i] = 
MPI_REQUEST_NULL; + + rpos += rsizes[i]; + } + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Isend(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); + else + sreqs[i] = MPI_REQUEST_NULL; + + spos += ssizes[i]; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + + // reserve vectors +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + for (GraphElem i = 0; i < nprocs; i++) { + rcinfo[i].reserve(rpos); + } +#endif + + // fetch baseptr from MPI window +#if defined(USE_MPI_RMA) + MPI_Win_flush_all(commwin); + MPI_Barrier(gcomm); + + GraphElem *rcbuf = nullptr; + int flag = 0; + MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, &flag); +#endif + + remoteComm.clear(); + for (GraphElem i = 0; i < rpos; i++) { + +#if defined(USE_MPI_RMA) + const GraphElem comm = rcbuf[i]; +#else + const GraphElem comm = rcdata[i]; +#endif + + remoteComm.insert(std::unordered_map::value_type(rvdata[i], comm)); + const int tproc = dg.get_owner(comm); + + if (tproc != me) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + rcinfo[tproc].emplace_back(comm); +#else + rcinfo[tproc].insert(comm); +#endif + } + + for (GraphElem i = 0; i < nv; i++) { + const GraphElem comm = currComm[i]; + const int tproc = dg.get_owner(comm); + + if (tproc != me) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + rcinfo[tproc].emplace_back(comm); +#else + rcinfo[tproc].insert(comm); +#endif + } + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + GraphElem stcsz = 0, rtcsz = 0; + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(scsizes, rcinfo) \ + reduction(+:stcsz) schedule(runtime) +#else +#pragma omp parallel for shared(scsizes, rcinfo) \ + reduction(+:stcsz) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + scsizes[i] = rcinfo[i].size(); + stcsz += scsizes[i]; + } + + MPI_Alltoall(scsizes.data(), 1, 
MPI_GRAPH_TYPE, rcsizes.data(), + 1, MPI_GRAPH_TYPE, gcomm); + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rcsizes) \ + reduction(+:rtcsz) schedule(runtime) +#else +#pragma omp parallel for shared(rcsizes) \ + reduction(+:rtcsz) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + rtcsz += rcsizes[i]; + } + +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]Total communities to receive: " << rtcsz << std::endl; +#endif +#if defined(USE_MPI_COLLECTIVES) + std::vector rcomms(rtcsz), scomms(stcsz); +#else +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + std::vector rcomms(rtcsz); +#else + std::vector rcomms(rtcsz), scomms(stcsz); +#endif +#endif + sinfo.resize(rtcsz); + rinfo.resize(stcsz); + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + spos = 0; + rpos = 0; +#if defined(USE_MPI_COLLECTIVES) + for (int i = 0; i < nprocs; i++) { + if (i != me) { + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + } + scnts[i] = scsizes[i]; + rcnts[i] = rcsizes[i]; + sdispls[i] = spos; + rdispls[i] = rpos; + spos += scnts[i]; + rpos += rcnts[i]; + } + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); + + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ + firstprivate(i), schedule(runtime) /*, if(rcsizes[i] >= 1000) */ +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ + firstprivate(i), schedule(guided) /*, if(rcsizes[i] >= 1000) */ +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rdispls[i] + j]; + sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + } + } + + 
MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), + commType, rinfo.data(), scnts.data(), sdispls.data(), + commType, gcomm); +#else +#if !defined(USE_MPI_SENDRECV) + std::vector rcreqs(nprocs); +#endif + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if defined(USE_MPI_SENDRECV) +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); +#else + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, + gcomm, MPI_STATUSES_IGNORE); +#endif +#else + MPI_Irecv(rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &rreqs[i]); +#if defined(REPLACE_STL_UOSET_WITH_VECTOR) + MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); +#else + std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); + MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, + CommunityTag, gcomm, &sreqs[i]); +#endif +#endif + } + else { +#if !defined(USE_MPI_SENDRECV) + rreqs[i] = MPI_REQUEST_NULL; + sreqs[i] = MPI_REQUEST_NULL; +#endif + } + rpos += rcsizes[i]; + spos += scsizes[i]; + } + + spos = 0; + rpos = 0; + + // poke progress on last isend/irecvs +#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + int tf = 0, id = 0; + MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE); +#endif + +#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif + + for (int i = 0; i < nprocs; i++) { + if (i != me) { +#if 
defined(USE_MPI_SENDRECV) +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(guided) +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rpos + j]; + sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + + MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, + rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, + gcomm, MPI_STATUSES_IGNORE); +#else + MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, + gcomm, &rcreqs[i]); + + // poke progress on last isend/irecvs +#if defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) + int flag = 0, done = 0; + while (!done) { + MPI_Test(&sreqs[i], &flag, MPI_STATUS_IGNORE); + MPI_Test(&rreqs[i], &flag, MPI_STATUS_IGNORE); + if (flag) + done = 1; + } +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos), schedule(runtime) +#else +#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ + firstprivate(i, rpos, base), schedule(guided) +#endif + for (GraphElem j = 0; j < rcsizes[i]; j++) { + const GraphElem comm = rcomms[rpos + j]; + sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; + } + + MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, + CommunityDataTag, gcomm, &sreqs[i]); +#endif + } + else { +#if !defined(USE_MPI_SENDRECV) + rcreqs[i] = MPI_REQUEST_NULL; + sreqs[i] = MPI_REQUEST_NULL; +#endif + } + rpos += rcsizes[i]; + spos += scsizes[i]; + } + +#if !defined(USE_MPI_SENDRECV) + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE); +#endif + 
+#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + ta += (t1 - t0); +#endif + + remoteCinfo.clear(); + remoteCupdate.clear(); + + for (GraphElem i = 0; i < stcsz; i++) { + const GraphElem ccomm = rinfo[i].community; + + Comm comm; + + comm.size = rinfo[i].size; + comm.degree = rinfo[i].degree; + + remoteCinfo.insert(std::map<GraphElem,Comm>::value_type(ccomm, comm)); + remoteCupdate.insert(std::map<GraphElem,Comm>::value_type(ccomm, Comm())); + } +} // end fillRemoteCommunities + +void createCommunityMPIType() +{ + CommInfo cinfo; + + MPI_Aint begin, community, size, degree; + + MPI_Get_address(&cinfo, &begin); + MPI_Get_address(&cinfo.community, &community); + MPI_Get_address(&cinfo.size, &size); + MPI_Get_address(&cinfo.degree, &degree); + + int blens[] = { 1, 1, 1 }; + MPI_Aint displ[] = { community - begin, size - begin, degree - begin }; + MPI_Datatype types[] = { MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE }; + + MPI_Type_create_struct(3, blens, displ, types, &commType); + MPI_Type_commit(&commType); +} // createCommunityMPIType + +void destroyCommunityMPIType() +{ + MPI_Type_free(&commType); +} // destroyCommunityMPIType + +void updateRemoteCommunities(const Graph &dg, std::vector<Comm> &localCinfo, + const std::map<GraphElem,Comm> &remoteCupdate, + const int me, const int nprocs) +{ + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + std::vector<std::vector<CommInfo>> remoteArray(nprocs); + MPI_Comm gcomm = dg.get_comm(); + + // FIXME TODO can we use TBB::concurrent_vector instead, + // to make this parallel; first we have to get rid of maps + for (std::map<GraphElem,Comm>::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) { + const GraphElem i = iter->first; + const Comm &curr = iter->second; + + const int tproc = dg.get_owner(i); + +#ifdef DEBUG_PRINTF + assert(tproc != me); +#endif + CommInfo rcinfo; + + rcinfo.community = i; + rcinfo.size = curr.size; + rcinfo.degree = curr.degree; + + remoteArray[tproc].push_back(rcinfo); + } + + std::vector<GraphElem> send_sz(nprocs), recv_sz(nprocs); + +#ifdef
DEBUG_PRINTF + GraphWeight tc = 0.0; + const double t0 = MPI_Wtime(); +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for schedule(runtime) +#else +#pragma omp parallel for schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + send_sz[i] = remoteArray[i].size(); + } + + MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), + 1, MPI_GRAPH_TYPE, gcomm); + +#ifdef DEBUG_PRINTF + const double t1 = MPI_Wtime(); + tc += (t1 - t0); +#endif + + GraphElem rcnt = 0, scnt = 0; +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(recv_sz, send_sz) \ + reduction(+:rcnt, scnt) schedule(runtime) +#else +#pragma omp parallel for shared(recv_sz, send_sz) \ + reduction(+:rcnt, scnt) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) { + rcnt += recv_sz[i]; + scnt += send_sz[i]; + } +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl; +#endif + + GraphElem currPos = 0; + std::vector<CommInfo> rdata(rcnt); + +#ifdef DEBUG_PRINTF + const double t2 = MPI_Wtime(); +#endif +#if defined(USE_MPI_SENDRECV) + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, + rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, + gcomm, MPI_STATUS_IGNORE); + + currPos += recv_sz[i]; + } +#else + std::vector<MPI_Request> sreqs(nprocs), rreqs(nprocs); + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, + CommunityDataTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + currPos += recv_sz[i]; + } + + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, + CommunityDataTag, gcomm, &sreqs[i]); + else + sreqs[i] = MPI_REQUEST_NULL; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif +#ifdef DEBUG_PRINTF + const double t3 
= MPI_Wtime(); + std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl; +#endif + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rdata, localCinfo) schedule(runtime) +#else +#pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic) +#endif + for (GraphElem i = 0; i < rcnt; i++) { + const CommInfo &curr = rdata[i]; + +#ifdef DEBUG_PRINTF + assert(dg.get_owner(curr.community) == me); +#endif + localCinfo[curr.community-base].size += curr.size; + localCinfo[curr.community-base].degree += curr.degree; + } +} // updateRemoteCommunities + +// initial setup before Louvain iteration begins +#if defined(USE_MPI_RMA) +void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, + std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, + const int me, const int nprocs, MPI_Win &commwin) +#else +void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, + std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, + const int me, const int nprocs) +#endif +{ + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + +#ifdef USE_OPENMP_LOCK + std::vector locks(nprocs); + for (int i = 0; i < nprocs; i++) + omp_init_lock(&locks[i]); +#endif + std::vector> parray(nprocs); + +#ifdef USE_OPENMP_LOCK +#pragma omp parallel default(none), shared(dg, locks, parray) +#else +#pragma omp parallel default(none), shared(dg, parray) firstprivate(nv, me) +#endif + { +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem e0, e1; + + dg.edge_range(i, e0, e1); + + for (GraphElem j = e0; j < e1; j++) { + const Edge &edge = dg.get_edge(j); + const int tproc = dg.get_owner(edge.tail_); + + if (tproc != me) { +#ifdef USE_OPENMP_LOCK + omp_set_lock(&locks[tproc]); +#else + lock(); 
+#endif + parray[tproc].insert(edge.tail_); +#ifdef USE_OPENMP_LOCK + omp_unset_lock(&locks[tproc]); +#else + unlock(); +#endif + } + } + } + } + +#ifdef USE_OPENMP_LOCK + for (int i = 0; i < nprocs; i++) { + omp_destroy_lock(&locks[i]); + } +#endif + + rsizes.resize(nprocs); + ssizes.resize(nprocs); + ssz = 0, rsz = 0; + + int pproc = 0; + // TODO FIXME parallelize this loop + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + ssz += iter->size(); + ssizes[pproc] = iter->size(); + pproc++; + } + + MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), + 1, MPI_GRAPH_TYPE, gcomm); + + GraphElem rsz_r = 0; +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for shared(rsizes) \ + reduction(+:rsz_r) schedule(runtime) +#else +#pragma omp parallel for shared(rsizes) \ + reduction(+:rsz_r) schedule(static) +#endif + for (int i = 0; i < nprocs; i++) + rsz_r += rsizes[i]; + rsz = rsz_r; + + svdata.resize(ssz); + rvdata.resize(rsz); + + GraphElem cpos = 0, rpos = 0; + pproc = 0; + +#if defined(USE_MPI_COLLECTIVES) + std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); + + for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { + std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); + + scnts[pproc] = iter->size(); + rcnts[pproc] = rsizes[pproc]; + sdispls[pproc] = cpos; + rdispls[pproc] = rpos; + cpos += iter->size(); + rpos += rcnts[pproc]; + + pproc++; + } + + scnts[me] = 0; + rcnts[me] = 0; + MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), + MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), + MPI_GRAPH_TYPE, gcomm); +#else + std::vector rreqs(nprocs), sreqs(nprocs); + for (int i = 0; i < nprocs; i++) { + if (i != me) + MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, + VertexTag, gcomm, &rreqs[i]); + else + rreqs[i] = MPI_REQUEST_NULL; + + rpos += rsizes[i]; + } + + for (std::vector>::const_iterator iter = parray.begin(); iter 
!= parray.end(); iter++) { + std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); + + if (me != pproc) + MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, + VertexTag, gcomm, &sreqs[pproc]); + else + sreqs[pproc] = MPI_REQUEST_NULL; + + cpos += iter->size(); + pproc++; + } + + MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); + MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); +#endif + + std::swap(svdata, rvdata); + std::swap(ssizes, rsizes); + std::swap(ssz, rsz); + + // create MPI window for communities +#if defined(USE_MPI_RMA) + GraphElem *ptr = nullptr; + MPI_Info info = MPI_INFO_NULL; +#if defined(USE_MPI_ACCUMULATE) + MPI_Info_create(&info); + MPI_Info_set(info, "accumulate_ordering", "none"); + MPI_Info_set(info, "accumulate_ops", "same_op"); +#endif + MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), + info, gcomm, &ptr, &commwin); + MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin); +#endif +} // exchangeVertexReqs + +#if defined(USE_MPI_RMA) +GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, + size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, const GraphWeight lower, + const GraphWeight thresh, int &iters, MPI_Win &commwin) +#else +GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, + size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, + std::vector &svdata, std::vector &rvdata, const GraphWeight lower, + const GraphWeight thresh, int &iters) +#endif +{ + std::vector pastComm, currComm, targetComm; + std::vector vDegree; + std::vector clusterWeight; + std::vector localCinfo, localCupdate; + + std::unordered_map remoteComm; + std::map remoteCinfo, remoteCupdate; + + const GraphElem nv = dg.get_lnv(); + MPI_Comm gcomm = dg.get_comm(); + + GraphWeight constantForSecondTerm; + GraphWeight prevMod = lower; + GraphWeight currMod = -1.0; + int numIters = 0; + + distInitLouvain(dg, 
pastComm, currComm, vDegree, clusterWeight, localCinfo, + localCupdate, constantForSecondTerm, me); + targetComm.resize(nv); + +#ifdef DEBUG_PRINTF + std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl; + if (me == 0) + std::cout << "Threshold: " << thresh << std::endl; +#endif + const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); + +#ifdef DEBUG_PRINTF + double t0, t1; + t0 = MPI_Wtime(); +#endif + + // setup vertices and communities +#if defined(USE_MPI_RMA) + exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, + svdata, rvdata, me, nprocs, commwin); + + // store the remote displacements + std::vector disp(nprocs); + MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, + MPI_SUM, gcomm); +#else + exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, + svdata, rvdata, me, nprocs); +#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl; +#endif + +#ifdef OMP_GPU_ALLOC + GraphElem *d_edge_indices = (GraphElem *)omp_target_alloc( + (unsigned long long)dg.edge_indices_.size() * sizeof(GraphElem), -100); + memcpy(d_edge_indices, &dg.edge_indices_[0], dg.edge_indices_.size() * sizeof(GraphElem)); + GraphElem *d_parts = (GraphElem *)omp_target_alloc( + (unsigned long long)dg.parts_.size() * sizeof(GraphElem), -100); + memcpy(d_parts, &dg.parts_[0], dg.parts_.size() * sizeof(GraphElem)); + Edge *d_edge_list = (Edge *)omp_target_alloc( + (unsigned long long)dg.edge_list_.size() * sizeof(Edge), -100); + memcpy(d_edge_list, &dg.edge_list_[0], dg.edge_list_.size() * sizeof(Edge)); + GraphElem *d_currComm = (GraphElem *)omp_target_alloc( + (unsigned long long)nv * sizeof(GraphElem), -100); + memcpy(d_currComm, &currComm[0], nv * sizeof(GraphElem)); + GraphWeight *d_vDegree = (GraphWeight *)omp_target_alloc( + (unsigned long long)nv * sizeof(GraphWeight), -100); + memcpy(d_vDegree, &vDegree[0], 
nv * sizeof(GraphWeight)); + GraphElem *d_targetComm = (GraphElem *)omp_target_alloc( + (unsigned long long)nv * sizeof(GraphElem), -100); + memcpy(d_targetComm, &targetComm[0], nv * sizeof(GraphElem)); + Comm *d_localCinfo = + (Comm *)omp_target_alloc((unsigned long long)nv * sizeof(Comm), -100); + memcpy(d_localCinfo, &localCinfo[0], nv * sizeof(Comm)); + Comm *d_localCupdate = + (Comm *)omp_target_alloc((unsigned long long)nv * sizeof(Comm), -100); + memcpy(d_localCupdate, &localCupdate[0], nv * sizeof(Comm)); + GraphWeight *d_clusterWeight = (GraphWeight *)omp_target_alloc( + (unsigned long long)nv * sizeof(GraphWeight), -100); + memcpy(d_clusterWeight, &clusterWeight[0], nv * sizeof(GraphWeight)); +#else + const GraphElem *d_edge_indices = &dg.edge_indices_[0]; + d_edge_indices = (GraphElem *)omp_target_alloc((unsigned long long)d_edge_indices, -200); + const GraphElem *d_parts = &dg.parts_[0]; + d_parts = (GraphElem *)omp_target_alloc((unsigned long long)d_parts, -200); + const Edge *d_edge_list = &dg.edge_list_[0]; + d_edge_list = (Edge *)omp_target_alloc((unsigned long long)d_edge_list, -200); + GraphElem *d_currComm = &currComm[0]; + d_currComm = (GraphElem *)omp_target_alloc((unsigned long long)d_currComm, -200); + const GraphWeight *d_vDegree = &vDegree[0]; + d_vDegree = (GraphWeight *)omp_target_alloc((unsigned long long)d_vDegree, -200); + GraphElem *d_targetComm = &targetComm[0]; + d_targetComm = (GraphElem *)omp_target_alloc((unsigned long long)d_targetComm, -200); + Comm *d_localCinfo = &localCinfo[0]; + d_localCinfo = (Comm *)omp_target_alloc((unsigned long long)d_localCinfo, -200); + Comm *d_localCupdate = &localCupdate[0]; + d_localCupdate = (Comm *)omp_target_alloc((unsigned long long)d_localCupdate, -200); + GraphWeight *d_clusterWeight = &clusterWeight[0]; + d_clusterWeight = (GraphWeight *)omp_target_alloc((unsigned long long)d_clusterWeight, -200); +#endif + + double size = sizeof(GraphElem) * + (dg.edge_indices_.size() + dg.parts_.size() + 
2 * nv) + + sizeof(Edge) * dg.edge_list_.size() + + sizeof(GraphWeight) * 2 * nv + sizeof(Comm) * 2 * nv; + double t_start = omp_get_wtime(); + + // start Louvain iteration + //while(true) { + while(numIters < 2) { +#ifdef DEBUG_PRINTF + double t2 = omp_get_wtime(); + if (me == 0) + std::cout << "Starting Louvain iteration: " << numIters << std::endl; +#endif + numIters++; + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + +#if defined(USE_MPI_RMA) + fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, + rsizes, svdata, rvdata, currComm, localCinfo, + remoteCinfo, remoteComm, remoteCupdate, + commwin, disp); +#else + fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, + rsizes, svdata, rvdata, currComm, localCinfo, + remoteCinfo, remoteComm, remoteCupdate); +#endif + +#ifdef DEBUG_PRINTF + t1 = MPI_Wtime(); + std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl; + std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl; +#endif + +#ifdef DEBUG_PRINTF + t0 = MPI_Wtime(); +#endif + +#if defined(OMP_GPU) +#else +#pragma omp parallel default(none), shared(clusterWeight, localCupdate, currComm, targetComm, \ + vDegree, localCinfo, remoteComm, pastComm, dg), \ + firstprivate(constantForSecondTerm) +#endif + { + distCleanCWandCU(nv, d_clusterWeight, d_localCupdate); + +#if defined(OMP_GPU) +#pragma omp target teams distribute parallel for map( \ + to \ + : d_edge_indices [0:dg.edge_indices_.size()], \ + d_parts [0:dg.parts_.size()], d_edge_list [0:dg.edge_list_.size()], \ + d_currComm [0:nv], d_vDegree [0:nv], d_localCinfo [0:nv]) \ + map(from \ + : d_targetComm [0:nv]) \ + map(tofrom \ + : d_localCupdate [0:nv], d_clusterWeight [0:nv]) +#elif defined(OMP_SCHEDULE_RUNTIME) +#pragma omp for schedule(runtime) +#else +#pragma omp for schedule(guided) +#endif + for (GraphElem i = 0; i < nv; i++) { + distExecuteLouvainIteration(i, d_edge_indices, d_parts, d_edge_list, d_currComm, d_targetComm, 
d_vDegree, d_localCinfo, + d_localCupdate, constantForSecondTerm, d_clusterWeight, me); + } + } + +#if defined(OMP_GPU) +#else +#pragma omp parallel default(none), shared(localCinfo, localCupdate) +#endif + { + distUpdateLocalCinfo(nv, d_localCinfo, d_localCupdate); + } + + // communicate remote communities + updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs); + + // compute modularity + currMod = distComputeModularity(dg, d_localCinfo, d_clusterWeight, constantForSecondTerm, me); + + // exit criteria + if (currMod - prevMod < thresh) + break; + + prevMod = currMod; + if (prevMod < lower) + prevMod = lower; + +#ifdef OMP_SCHEDULE_RUNTIME +#pragma omp parallel for default(none) \ + shared(pastComm, d_currComm, d_targetComm) \ + schedule(runtime) +#else +#pragma omp parallel for default(none) \ + shared(pastComm, d_currComm, d_targetComm) firstprivate(nv) \ + schedule(static) +#endif + for (GraphElem i = 0; i < nv; i++) { + GraphElem tmp = pastComm[i]; + pastComm[i] = d_currComm[i]; + d_currComm[i] = d_targetComm[i]; + d_targetComm[i] = tmp; + } + } // end of Louvain iteration + std::cout << "Total size: " << size / 1024 / 1024 / 1024 << std::endl; + std::cout << "Time: " << omp_get_wtime() - t_start << std::endl; + +#if defined(USE_MPI_RMA) + MPI_Win_unlock_all(commwin); + MPI_Win_free(&commwin); +#endif + + iters = numIters; + + vDegree.clear(); + pastComm.clear(); + currComm.clear(); + targetComm.clear(); + clusterWeight.clear(); + localCinfo.clear(); + localCupdate.clear(); + + return prevMod; +} // distLouvainMethod plain + +#endif // __DSPL diff --git a/miniVite/err b/miniVite/err new file mode 100644 index 0000000..16184dd --- /dev/null +++ b/miniVite/err @@ -0,0 +1,8 @@ +mpicxx main.o -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DCHECK_NUM_EDGES -DDEBUG_PRINTF -o miniVite +nvlink error : Undefined reference to '_ZdlPv' in '/tmp/main-f4854c.cubin' +nvlink error : Undefined reference to '_ZSt17__throw_bad_allocv' in 
'/tmp/main-f4854c.cubin' +nvlink error : Undefined reference to '_Znwm' in '/tmp/main-f4854c.cubin' +nvlink error : Undefined reference to '_ZNKSt8__detail20_Prime_rehash_policy14_M_need_rehashEmmm' in '/tmp/main-f4854c.cubin' +nvlink error : Undefined reference to '__assert_fail' in '/tmp/main-f4854c.cubin' +nvlink error : Undefined reference to '_ZNKSt8__detail20_Prime_rehash_policy11_M_next_bktEm' in '/tmp/main-f4854c.cubin' +clang-9: error: nvlink command failed with exit code 255 (use -v to see invocation) diff --git a/miniVite/graph.hpp b/miniVite/graph.hpp new file mode 100644 index 0000000..ad8631d --- /dev/null +++ b/miniVite/graph.hpp @@ -0,0 +1,1053 @@ +// *********************************************************************** +// +// miniVite +// +// *********************************************************************** +// +// Copyright (2018) Battelle Memorial Institute +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// 1. Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// 2. Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// 3. Neither the name of the copyright holder nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE +// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +// POSSIBILITY OF SUCH DAMAGE. +// +// ************************************************************************ + +#pragma once +#ifndef GRAPH_HPP +#define GRAPH_HPP + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "utils.hpp" + +unsigned seed; + +struct Edge +{ + GraphElem tail_; + GraphWeight weight_; + + Edge(): tail_(-1), weight_(0.0) {} +}; + +struct EdgeTuple +{ + GraphElem ij_[2]; + GraphWeight w_; + + EdgeTuple(GraphElem i, GraphElem j, GraphWeight w): + ij_{i, j}, w_(w) + {} + EdgeTuple(GraphElem i, GraphElem j): + ij_{i, j}, w_(1.0) + {} + EdgeTuple(): + ij_{-1, -1}, w_(0.0) + {} +}; + +// per process graph instance +class Graph +{ + public: + Graph(): + lnv_(-1), lne_(-1), nv_(-1), + ne_(-1), comm_(MPI_COMM_WORLD) + { + MPI_Comm_size(comm_, &size_); + MPI_Comm_rank(comm_, &rank_); + } + + Graph(GraphElem lnv, GraphElem lne, + GraphElem nv, GraphElem ne, + MPI_Comm comm=MPI_COMM_WORLD): + lnv_(lnv), lne_(lne), + nv_(nv), ne_(ne), + comm_(comm) + { + MPI_Comm_size(comm_, &size_); + MPI_Comm_rank(comm_, &rank_); + + edge_indices_.resize(lnv_+1, 0); + edge_list_.resize(lne_); // this is usually populated later + + parts_.resize(size_+1); + parts_[0] = 0; + + for (GraphElem i = 1; i < size_+1; i++) + parts_[i]=((nv_ * i) / size_); + } + + ~Graph() + { + edge_list_.clear(); + edge_indices_.clear(); + parts_.clear(); + } + + // TODO FIXME put asserts like the following + // everywhere 
function member of Graph class + void set_edge_index(GraphElem const vertex, GraphElem const e0) + { +#if defined(DEBUG_BUILD) + assert((vertex >= 0) && (vertex <= lnv_)); + assert((e0 >= 0) && (e0 <= lne_)); + edge_indices_.at(vertex) = e0; +#else + edge_indices_[vertex] = e0; +#endif + } + + void edge_range(GraphElem const vertex, GraphElem& e0, + GraphElem& e1) const + { + e0 = edge_indices_[vertex]; + e1 = edge_indices_[vertex+1]; + } + + // collective + void set_nedges(GraphElem lne) + { + lne_ = lne; + edge_list_.resize(lne_); + + // compute total number of edges + ne_ = 0; + MPI_Allreduce(&lne_, &ne_, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); + } + + GraphElem get_base(const int rank) const + { return parts_[rank]; } + + GraphElem get_bound(const int rank) const + { return parts_[rank+1]; } + + GraphElem get_range(const int rank) const + { return (parts_[rank+1] - parts_[rank] + 1); } + + int get_owner(const GraphElem vertex) const + { + const std::vector::const_iterator iter = + std::upper_bound(parts_.begin(), parts_.end(), vertex); + + return (iter - parts_.begin() - 1); + } + + GraphElem get_lnv() const { return lnv_; } + GraphElem get_lne() const { return lne_; } + GraphElem get_nv() const { return nv_; } + GraphElem get_ne() const { return ne_; } + MPI_Comm get_comm() const { return comm_; } + + // return edge and active info + // ---------------------------- + + Edge const& get_edge(GraphElem const index) const + { return edge_list_[index]; } + + Edge& set_edge(GraphElem const index) + { return edge_list_[index]; } + + // local <--> global index translation + // ----------------------------------- + GraphElem local_to_global(GraphElem idx) + { return (idx + get_base(rank_)); } + + GraphElem global_to_local(GraphElem idx) + { return (idx - get_base(rank_)); } + + // w.r.t passed rank + GraphElem local_to_global(GraphElem idx, int rank) + { return (idx + get_base(rank)); } + + GraphElem global_to_local(GraphElem idx, int rank) + { return (idx - 
get_base(rank)); } + + // print edge list (with weights) + void print(bool print_weight = true) const + { + if (lne_ < MAX_PRINT_NEDGE) + { + for (int p = 0; p < size_; p++) + { + MPI_Barrier(comm_); + if (p == rank_) + { + std::cout << "###############" << std::endl; + std::cout << "Process #" << p << ": " << std::endl; + std::cout << "###############" << std::endl; + GraphElem base = get_base(p); + for (GraphElem i = 0; i < lnv_; i++) + { + GraphElem e0, e1; + edge_range(i, e0, e1); + if (print_weight) { // print weights (default) + for (GraphElem e = e0; e < e1; e++) + { + Edge const& edge = get_edge(e); + std::cout << i+base << " " << edge.tail_ << " " << edge.weight_ << std::endl; + } + } + else { // don't print weights + for (GraphElem e = e0; e < e1; e++) + { + Edge const& edge = get_edge(e); + std::cout << i+base << " " << edge.tail_ << std::endl; + } + } + } + MPI_Barrier(comm_); + } + } + } + else + { + if (rank_ == 0) + std::cout << "Graph size per process is {" << lnv_ << ", " << lne_ << + "}, which will overwhelm STDOUT." 
<< std::endl; + } + } + + // print statistics about edge distribution + void print_dist_stats() + { + GraphElem sumdeg = 0, maxdeg = 0; + + MPI_Reduce(&lne_, &sumdeg, 1, MPI_GRAPH_TYPE, MPI_SUM, 0, comm_); + MPI_Reduce(&lne_, &maxdeg, 1, MPI_GRAPH_TYPE, MPI_MAX, 0, comm_); + + GraphElem my_sq = lne_*lne_; + GraphElem sum_sq = 0; + MPI_Reduce(&my_sq, &sum_sq, 1, MPI_GRAPH_TYPE, MPI_SUM, 0, comm_); + + GraphWeight average = (GraphWeight) sumdeg / size_; + GraphWeight avg_sq = (GraphWeight) sum_sq / size_; + GraphWeight var = avg_sq - (average*average); + GraphWeight stddev = sqrt(var); + + MPI_Barrier(comm_); + + if (rank_ == 0) + { + std::cout << std::endl; + std::cout << "-------------------------------------------------------" << std::endl; + std::cout << "Graph edge distribution characteristics" << std::endl; + std::cout << "-------------------------------------------------------" << std::endl; + std::cout << "Number of vertices: " << nv_ << std::endl; + std::cout << "Number of edges: " << ne_ << std::endl; + std::cout << "Maximum number of edges: " << maxdeg << std::endl; + std::cout << "Average number of edges: " << average << std::endl; + std::cout << "Expected value of X^2: " << avg_sq << std::endl; + std::cout << "Variance: " << var << std::endl; + std::cout << "Standard deviation: " << stddev << std::endl; + std::cout << "-------------------------------------------------------" << std::endl; + + } + } + + // public variables + std::vector edge_indices_; + std::vector edge_list_; + GraphElem lnv_, lne_, nv_, ne_; + std::vector parts_; + + MPI_Comm comm_; + int rank_, size_; + private: +}; + +// read in binary edge list files +// using MPI I/O +class BinaryEdgeList +{ + public: + BinaryEdgeList() : + M_(-1), N_(-1), + M_local_(-1), N_local_(-1), + comm_(MPI_COMM_WORLD) + {} + BinaryEdgeList(MPI_Comm comm) : + M_(-1), N_(-1), + M_local_(-1), N_local_(-1), + comm_(comm) + {} + + // the input binary file will be sorted by + // vertices + // read a file and 
return a graph + Graph* read(int me, int nprocs, int ranks_per_node, std::string file) + { + int file_open_error; + MPI_File fh; + MPI_Status status; + + // specify the number of aggregates + MPI_Info info; + MPI_Info_create(&info); + int naggr = (ranks_per_node > 1) ? (nprocs/ranks_per_node) : ranks_per_node; + if (naggr >= nprocs) + naggr = 1; + std::stringstream tmp_str; + tmp_str << naggr; + std::string str = tmp_str.str(); + MPI_Info_set(info, "cb_nodes", str.c_str()); + + file_open_error = MPI_File_open(comm_, file.c_str(), MPI_MODE_RDONLY, info, &fh); + MPI_Info_free(&info); + + if (file_open_error != MPI_SUCCESS) + { + std::cout << " Error opening file! " << std::endl; + MPI_Abort(comm_, -99); + } + + // read the dimensions + MPI_File_read_all(fh, &M_, sizeof(GraphElem), MPI_BYTE, &status); + MPI_File_read_all(fh, &N_, sizeof(GraphElem), MPI_BYTE, &status); + M_local_ = ((M_*(me + 1)) / nprocs) - ((M_*me) / nprocs); + + // create local graph + Graph *g = new Graph(M_local_, 0, M_, N_); + + // Let N = array length and P = number of processors. 
+ // From j = 0 to P-1, + // Starting point of array on processor j = floor(N * j / P) + // Length of array on processor j = floor(N * (j + 1) / P) - floor(N * j / P) + + uint64_t tot_bytes=(M_local_+1)*sizeof(GraphElem); + MPI_Offset offset = 2*sizeof(GraphElem) + ((M_*me) / nprocs)*sizeof(GraphElem); + + // read in INT_MAX increments if total byte size is > INT_MAX + + if (tot_bytes < INT_MAX) + MPI_File_read_at(fh, offset, &g->edge_indices_[0], tot_bytes, MPI_BYTE, &status); + else + { + int chunk_bytes=INT_MAX; + uint8_t *curr_pointer = (uint8_t*) &g->edge_indices_[0]; + uint64_t transf_bytes = 0; + + while (transf_bytes < tot_bytes) + { + MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); + transf_bytes += chunk_bytes; + offset += chunk_bytes; + curr_pointer += chunk_bytes; + + if ((tot_bytes - transf_bytes) < INT_MAX) + chunk_bytes = tot_bytes - transf_bytes; + } + } + + N_local_ = g->edge_indices_[M_local_] - g->edge_indices_[0]; + g->set_nedges(N_local_); + + tot_bytes = N_local_*(sizeof(Edge)); + offset = 2*sizeof(GraphElem) + (M_+1)*sizeof(GraphElem) + g->edge_indices_[0]*(sizeof(Edge)); + + if (tot_bytes < INT_MAX) + MPI_File_read_at(fh, offset, &g->edge_list_[0], tot_bytes, MPI_BYTE, &status); + else + { + int chunk_bytes=INT_MAX; + uint8_t *curr_pointer = (uint8_t*)&g->edge_list_[0]; + uint64_t transf_bytes = 0; + + while (transf_bytes < tot_bytes) + { + MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); + transf_bytes += chunk_bytes; + offset += chunk_bytes; + curr_pointer += chunk_bytes; + + if ((tot_bytes - transf_bytes) < INT_MAX) + chunk_bytes = (tot_bytes - transf_bytes); + } + } + + MPI_File_close(&fh); + + for(GraphElem i=1; i < M_local_+1; i++) + g->edge_indices_[i] -= g->edge_indices_[0]; + g->edge_indices_[0] = 0; + + return g; + } + private: + GraphElem M_; + GraphElem N_; + GraphElem M_local_; + GraphElem N_local_; + MPI_Comm comm_; +}; + +// RGG graph +// 1D vertex distribution +class 
GenerateRGG +{ + public: + GenerateRGG(GraphElem nv, MPI_Comm comm = MPI_COMM_WORLD) + { + nv_ = nv; + comm_ = comm; + + MPI_Comm_rank(comm_, &rank_); + MPI_Comm_size(comm_, &nprocs_); + + // neighbors + up_ = down_ = MPI_PROC_NULL; + if (nprocs_ > 1) { + if (rank_ > 0 && rank_ < (nprocs_ - 1)) { + up_ = rank_ - 1; + down_ = rank_ + 1; + } + if (rank_ == 0) + down_ = 1; + if (rank_ == (nprocs_ - 1)) + up_ = rank_ - 1; + } + + n_ = nv_ / nprocs_; + + // check if number of nodes is divisible by #processes + if ((nv_ % nprocs_) != 0) { + if (rank_ == 0) { + std::cout << "[ERROR] Number of vertices must be perfectly divisible by number of processes." << std::endl; + std::cout << "Exiting..." << std::endl; + } + MPI_Abort(comm_, -99); + } + + // check if processes are power of 2 + if (!is_pwr2(nprocs_)) { + if (rank_ == 0) { + std::cout << "[ERROR] Number of processes must be a power of 2." << std::endl; + std::cout << "Exiting..." << std::endl; + } + MPI_Abort(comm_, -99); + } + + // calculate r(n) + GraphWeight rc = sqrt((GraphWeight)log(nv)/(GraphWeight)(PI*nv)); + GraphWeight rt = sqrt((GraphWeight)2.0736/(GraphWeight)nv); + rn_ = (rc + rt)/(GraphWeight)2.0; + + assert(((GraphWeight)1.0/(GraphWeight)nprocs_) > rn_); + + MPI_Barrier(comm_); + } + + // create RGG and returns Graph + // TODO FIXME use OpenMP wherever possible + // use Euclidean distance as edge weight + // for random edges, choose from (0,1) + // otherwise, use unit weight throughout + Graph* generate(bool isLCG, bool unitEdgeWeight = true, int randomEdgePercent = 0) + { + // Generate random coordinate points + std::vector X, Y, X_up, Y_up, X_down, Y_down; + + if (isLCG) + X.resize(2*n_); + else + X.resize(n_); + + Y.resize(n_); + + if (up_ != MPI_PROC_NULL) { + X_up.resize(n_); + Y_up.resize(n_); + } + + if (down_ != MPI_PROC_NULL) { + X_down.resize(n_); + Y_down.resize(n_); + } + + // create local graph + Graph *g = new Graph(n_, 0, nv_, nv_); + + // generate random number within range + // X: 0, 1 + 
// Y: rank_*1/p, (rank_+1)*1/p, + GraphWeight rec_np = (GraphWeight)(1.0/(GraphWeight)nprocs_); + GraphWeight lo = rank_* rec_np; + GraphWeight hi = lo + rec_np; + assert(hi > lo); + + // measure the time to generate random numbers + MPI_Barrier(MPI_COMM_WORLD); + double st = MPI_Wtime(); + + if (!isLCG) { + // set seed (declared an extern in utils) + seed = (unsigned)reseeder(1); + +#if defined(PRINT_RANDOM_XY_COORD) + for (int k = 0; k < nprocs_; k++) { + if (k == rank_) { + std::cout << "Random number generated on Process#" << k << " :" << std::endl; + for (GraphElem i = 0; i < n_; i++) { + X[i] = genRandom(0.0, 1.0); + Y[i] = genRandom(lo, hi); + std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl; + } + } + MPI_Barrier(comm_); + } +#else + for (GraphElem i = 0; i < n_; i++) { + //X[i] = genRandom(0.0, 1.0); + //Y[i] = genRandom(lo, hi); + X[i] = 1.0 * i / n_; + Y[i] = lo + (hi - lo) * i / n_; + } +#endif + } + else { // LCG + // X | Y + // e.g seeds: 1741, 3821 + // create LCG object + // seed to generate x0 + LCG xr(/*seed*/1, X.data(), 2*n_, comm_); + + // generate random numbers between 0-1 + xr.generate(); + + // rescale xr further between lo-hi + // and put the numbers in Y taking + // from X[n] + xr.rescale(Y.data(), n_, lo); + +#if defined(PRINT_RANDOM_XY_COORD) + for (int k = 0; k < nprocs_; k++) { + if (k == rank_) { + std::cout << "Random number generated on Process#" << k << " :" << std::endl; + for (GraphElem i = 0; i < n_; i++) { + std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl; + } + } + MPI_Barrier(comm_); + } +#endif + } + + double et = MPI_Wtime(); + double tt = et - st; + double tot_tt = 0.0; + MPI_Reduce(&tt, &tot_tt, 1, MPI_DOUBLE, MPI_SUM, 0, comm_); + + if (rank_ == 0) { + double tot_avg = (tot_tt/nprocs_); + std::cout << "Average time to generate " << 2*n_ + << " random numbers using LCG (in s): " + << tot_avg << std::endl; + } + + // ghost(s) + + // cross edges, each processor + // communicates with up or/and down + 
// neighbor only + std::vector sendup_edges, senddn_edges; + std::vector recvup_edges, recvdn_edges; + std::vector edgeList; + + // counts, indexing: [2] = {up - 0, down - 1} + // TODO can't we use MPI_INT + std::array send_sizes = {0, 0}, recv_sizes = {0, 0}; +#if defined(CHECK_NUM_EDGES) + GraphElem numEdges = 0; +#endif + // local + for (GraphElem i = 0; i < n_; i++) { + //for (GraphElem j = i + 1; j < n_; j++) { + for (GraphElem j = i + 1; j < n_ && j < i + 10; j++) { + // euclidean distance: + // 2D: sqrt((px-qx)^2 + (py-qy)^2) + GraphWeight dx = X[i] - X[j]; + GraphWeight dy = Y[i] - Y[j]; + GraphWeight ed = sqrt(dx*dx + dy*dy); + // are the two vertices within the range? + if (ed <= rn_) { + // local to global index + const GraphElem g_i = g->local_to_global(i); + const GraphElem g_j = g->local_to_global(j); + + if (!unitEdgeWeight) { + edgeList.emplace_back(i, g_j, ed); + edgeList.emplace_back(j, g_i, ed); + } + else { + edgeList.emplace_back(i, g_j); + edgeList.emplace_back(j, g_i); + } +#if defined(CHECK_NUM_EDGES) + numEdges += 2; +#endif + + g->edge_indices_[i+1]++; + g->edge_indices_[j+1]++; + } + } + } + + MPI_Barrier(comm_); + + // communicate ghost coordinates with neighbors + + const int x_ndown = X_down.empty() ? 0 : n_; + const int y_ndown = Y_down.empty() ? 0 : n_; + const int x_nup = X_up.empty() ? 0 : n_; + const int y_nup = Y_up.empty() ? 
0 : n_; + + MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, up_, SR_X_UP_TAG, + X_down.data(), x_ndown, MPI_WEIGHT_TYPE, down_, SR_X_UP_TAG, + comm_, MPI_STATUS_IGNORE); + MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, down_, SR_X_DOWN_TAG, + X_up.data(), x_nup, MPI_WEIGHT_TYPE, up_, SR_X_DOWN_TAG, + comm_, MPI_STATUS_IGNORE); + MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, up_, SR_Y_UP_TAG, + Y_down.data(), y_ndown, MPI_WEIGHT_TYPE, down_, SR_Y_UP_TAG, + comm_, MPI_STATUS_IGNORE); + MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, down_, SR_Y_DOWN_TAG, + Y_up.data(), y_nup, MPI_WEIGHT_TYPE, up_, SR_Y_DOWN_TAG, + comm_, MPI_STATUS_IGNORE); + + // exchange ghost vertices / cross edges + if (nprocs_ > 1) { + if (up_ != MPI_PROC_NULL) { + + for (GraphElem i = 0; i < n_; i++) { + for (GraphElem j = i + 1; j < n_; j++) { + GraphWeight dx = X[i] - X_up[j]; + GraphWeight dy = Y[i] - Y_up[j]; + GraphWeight ed = sqrt(dx*dx + dy*dy); + + if (ed <= rn_) { + const GraphElem g_i = g->local_to_global(i); + const GraphElem g_j = j + up_*n_; + + if (!unitEdgeWeight) { + sendup_edges.emplace_back(j, g_i, ed); + edgeList.emplace_back(i, g_j, ed); + } + else { + sendup_edges.emplace_back(j, g_i); + edgeList.emplace_back(i, g_j); + } +#if defined(CHECK_NUM_EDGES) + numEdges++; +#endif + g->edge_indices_[i+1]++; + } + } + } + + // send up sizes + send_sizes[0] = sendup_edges.size(); + } + + if (down_ != MPI_PROC_NULL) { + + for (GraphElem i = 0; i < n_; i++) { + for (GraphElem j = i + 1; j < n_; j++) { + GraphWeight dx = X[i] - X_down[j]; + GraphWeight dy = Y[i] - Y_down[j]; + GraphWeight ed = sqrt(dx*dx + dy*dy); + + if (ed <= rn_) { + const GraphElem g_i = g->local_to_global(i); + const GraphElem g_j = j + down_*n_; + + if (!unitEdgeWeight) { + senddn_edges.emplace_back(j, g_i, ed); + edgeList.emplace_back(i, g_j, ed); + } + else { + senddn_edges.emplace_back(j, g_i); + edgeList.emplace_back(i, g_j); + } +#if defined(CHECK_NUM_EDGES) + numEdges++; +#endif + g->edge_indices_[i+1]++; + } + } + } + + 
// send down sizes + send_sizes[1] = senddn_edges.size(); + } + } + + MPI_Barrier(comm_); + + // communicate ghost vertices with neighbors + // send/recv buffer sizes + + MPI_Sendrecv(&send_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_UP_TAG, + &recv_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_UP_TAG, + comm_, MPI_STATUS_IGNORE); + MPI_Sendrecv(&send_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_DOWN_TAG, + &recv_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_DOWN_TAG, + comm_, MPI_STATUS_IGNORE); + + // resize recv buffers + + if (recv_sizes[0] > 0) + recvup_edges.resize(recv_sizes[0]); + if (recv_sizes[1] > 0) + recvdn_edges.resize(recv_sizes[1]); + + // send/recv both up and down + + MPI_Sendrecv(sendup_edges.data(), send_sizes[0]*sizeof(struct EdgeTuple), MPI_BYTE, + up_, SR_UP_TAG, recvdn_edges.data(), recv_sizes[1]*sizeof(struct EdgeTuple), + MPI_BYTE, down_, SR_UP_TAG, comm_, MPI_STATUS_IGNORE); + MPI_Sendrecv(senddn_edges.data(), send_sizes[1]*sizeof(struct EdgeTuple), MPI_BYTE, + down_, SR_DOWN_TAG, recvup_edges.data(), recv_sizes[0]*sizeof(struct EdgeTuple), + MPI_BYTE, up_, SR_DOWN_TAG, comm_, MPI_STATUS_IGNORE); + + // update local #edges + + // down + if (down_ != MPI_PROC_NULL) { + for (GraphElem i = 0; i < recv_sizes[1]; i++) { +#if defined(CHECK_NUM_EDGES) + numEdges++; +#endif + if (!unitEdgeWeight) + edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1], recvdn_edges[i].w_); + else + edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1]); + g->edge_indices_[recvdn_edges[i].ij_[0]+1]++; + } + } + + // up + if (up_ != MPI_PROC_NULL) { + for (GraphElem i = 0; i < recv_sizes[0]; i++) { +#if defined(CHECK_NUM_EDGES) + numEdges++; +#endif + if (!unitEdgeWeight) + edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1], recvup_edges[i].w_); + else + edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1]); + g->edge_indices_[recvup_edges[i].ij_[0]+1]++; + } + } + + // add random edges based on + // 
randomEdgePercent + if (randomEdgePercent > 0) { + const GraphElem pnedges = (edgeList.size()/2); + GraphElem tot_pnedges = 0; + + MPI_Allreduce(&pnedges, &tot_pnedges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); + + // extra #edges per process + const GraphElem nrande = (((GraphElem)randomEdgePercent * tot_pnedges)/100); + GraphElem pnrande = 0; // must start at zero: the first branch below only accumulates into it + + // TODO FIXME try to ensure a fair edge distribution + if (nrande < nprocs_) { + if (rank_ == (nprocs_ - 1)) + pnrande += nrande; + } + else { + pnrande = nrande / nprocs_; + const GraphElem pnrem = nrande % nprocs_; + if (pnrem != 0) { + if (rank_ == (nprocs_ - 1)) + pnrande += pnrem; + } + } + + // add pnrande edges + + // send/recv buffers + std::vector<std::vector<EdgeTuple>> rand_edges(nprocs_); + std::vector<EdgeTuple> sendrand_edges, recvrand_edges; + + // outgoing/incoming send/recv sizes + std::vector<int> sendrand_sizes(nprocs_), recvrand_sizes(nprocs_); + +#if defined(PRINT_EXTRA_NEDGES) + int extraEdges = 0; +#endif + +#if defined(DEBUG_PRINTF) + for (int i = 0; i < nprocs_; i++) { + if (i == rank_) { + std::cout << "[" << i << "]Target process for random edge insertion between " + << lo << " and " << hi << std::endl; + } + MPI_Barrier(comm_); + } +#endif + // make sure each process has a + // different seed this time since + // we want random edges + unsigned rande_seed = (unsigned)(time(0)^getpid()); + GraphWeight weight = 1.0; + std::hash<GraphElem> reh; + + // cannot use genRandom if it's already been seeded + std::default_random_engine re(rande_seed); + std::uniform_int_distribution<> IR, JR; + std::uniform_real_distribution<> IJW; + + for (GraphElem k = 0; k < pnrande; k++) { + + // randomly pick start/end vertex and target from my list + const GraphElem i = (GraphElem)IR(re, std::uniform_int_distribution<>::param_type{0, (int)(n_- 1)}); + const GraphElem g_j = (GraphElem)JR(re, std::uniform_int_distribution<>::param_type{0, (int)(nv_- 1)}); + const int target = g->get_owner(g_j); + const GraphElem j = g->global_to_local(g_j, target); // local + + if (i == j) + 
continue; + + const GraphElem g_i = g->local_to_global(i); + + // check for duplicates prior to edgeList insertion + auto found = std::find_if(edgeList.begin(), edgeList.end(), + [&](EdgeTuple const& et) + { return ((et.ij_[0] == i) && (et.ij_[1] == g_j)); }); + + // OK to insert, not in list + if (found == std::end(edgeList)) { + + // calculate weight + if (!unitEdgeWeight) { + if (target == rank_) { + GraphWeight dx = X[i] - X[j]; + GraphWeight dy = Y[i] - Y[j]; + weight = sqrt(dx*dx + dy*dy); + } + else if (target == up_) { + GraphWeight dx = X[i] - X_up[j]; + GraphWeight dy = Y[i] - Y_up[j]; + weight = sqrt(dx*dx + dy*dy); + } + else if (target == down_) { + GraphWeight dx = X[i] - X_down[j]; + GraphWeight dy = Y[i] - Y_down[j]; + weight = sqrt(dx*dx + dy*dy); + } + else { + unsigned randw_seed = reh((GraphElem)(g_i*nv_+g_j)); + std::default_random_engine rew(randw_seed); + weight = (GraphWeight)IJW(rew, std::uniform_real_distribution<>::param_type{0.0, 1.0}); + } + } + + rand_edges[target].emplace_back(j, g_i, weight); + sendrand_sizes[target]++; + +#if defined(PRINT_EXTRA_NEDGES) + extraEdges++; +#endif +#if defined(CHECK_NUM_EDGES) + numEdges++; +#endif + edgeList.emplace_back(i, g_j, weight); + g->edge_indices_[i+1]++; + } + } + +#if defined(PRINT_EXTRA_NEDGES) + int totExtraEdges = 0; + MPI_Reduce(&extraEdges, &totExtraEdges, 1, MPI_INT, MPI_SUM, 0, comm_); + if (rank_ == 0) + std::cout << "Adding extra " << totExtraEdges << " edges while trying to incorporate " + << randomEdgePercent << "%" << " extra edges globally." 
<< std::endl; +#endif + + MPI_Barrier(comm_); + + // communicate ghost edges + MPI_Request rande_sreq; + + MPI_Ialltoall(sendrand_sizes.data(), 1, MPI_INT, + recvrand_sizes.data(), 1, MPI_INT, comm_, + &rande_sreq); + + // pack the per-target edge lists into one contiguous send buffer + for (int p = 0; p < nprocs_; p++) { + sendrand_edges.insert(sendrand_edges.end(), + rand_edges[p].begin(), rand_edges[p].end()); + } + + MPI_Wait(&rande_sreq, MPI_STATUS_IGNORE); + + // total recvbuffer size + const int rcount = std::accumulate(recvrand_sizes.begin(), recvrand_sizes.end(), 0); + recvrand_edges.resize(rcount); + + // alltoallv for incoming data + // TODO FIXME make sure size of extra edges is + // within INT limits + + int rpos = 0, spos = 0; + std::vector<int> sdispls(nprocs_), rdispls(nprocs_); + + for (int p = 0; p < nprocs_; p++) { + + sendrand_sizes[p] *= sizeof(struct EdgeTuple); + recvrand_sizes[p] *= sizeof(struct EdgeTuple); + + sdispls[p] = spos; + rdispls[p] = rpos; + + spos += sendrand_sizes[p]; + rpos += recvrand_sizes[p]; + } + + MPI_Alltoallv(sendrand_edges.data(), sendrand_sizes.data(), sdispls.data(), + MPI_BYTE, recvrand_edges.data(), recvrand_sizes.data(), rdispls.data(), + MPI_BYTE, comm_); + + // update local edge list + for (int i = 0; i < rcount; i++) { +#if defined(CHECK_NUM_EDGES) + numEdges++; +#endif + edgeList.emplace_back(recvrand_edges[i].ij_[0], recvrand_edges[i].ij_[1], recvrand_edges[i].w_); + g->edge_indices_[recvrand_edges[i].ij_[0]+1]++; + } + + sendrand_edges.clear(); + recvrand_edges.clear(); + rand_edges.clear(); + } // end of (conditional) random edges addition + + MPI_Barrier(comm_); + + // set graph edge indices + + std::vector<GraphElem> ecTmp(n_+1); + std::partial_sum(g->edge_indices_.begin(), g->edge_indices_.end(), ecTmp.begin()); + g->edge_indices_ = ecTmp; + + for(GraphElem i = 1; i < n_+1; i++) + g->edge_indices_[i] -= g->edge_indices_[0]; + g->edge_indices_[0] = 0; + + g->set_edge_index(0, 0); + for (GraphElem i = 0; i < n_; i++) + g->set_edge_index(i+1, 
g->edge_indices_[i+1]); + + const GraphElem nedges = g->edge_indices_[n_] - g->edge_indices_[0]; + g->set_nedges(nedges); + + // set graph edge list + // sort edge list + auto ecmp = [] (EdgeTuple const& e0, EdgeTuple const& e1) + { return ((e0.ij_[0] < e1.ij_[0]) || ((e0.ij_[0] == e1.ij_[0]) && (e0.ij_[1] < e1.ij_[1]))); }; + + if (!std::is_sorted(edgeList.begin(), edgeList.end(), ecmp)) { +#if defined(DEBUG_PRINTF) + std::cout << "Edge list is not sorted." << std::endl; +#endif + std::sort(edgeList.begin(), edgeList.end(), ecmp); + } +#if defined(DEBUG_PRINTF) + else + std::cout << "Edge list is sorted!" << std::endl; +#endif + + GraphElem ePos = 0; + for (GraphElem i = 0; i < n_; i++) { + GraphElem e0, e1; + + g->edge_range(i, e0, e1); +#if defined(DEBUG_PRINTF) + if ((i % 100000) == 0) + std::cout << "Processing edges for vertex: " << i << ", range(" << e0 << ", " << e1 << + ")" << std::endl; +#endif + for (GraphElem j = e0; j < e1; j++) { + Edge &edge = g->set_edge(j); + + assert(ePos == j); + assert(i == edgeList[ePos].ij_[0]); + + edge.tail_ = edgeList[ePos].ij_[1]; + edge.weight_ = edgeList[ePos].w_; + + ePos++; + } + } + +#if defined(CHECK_NUM_EDGES) + GraphElem tot_numEdges = 0; + MPI_Allreduce(&numEdges, &tot_numEdges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); + const GraphElem tne = g->get_ne(); + assert(tne == tot_numEdges); +#endif + edgeList.clear(); + + X.clear(); + Y.clear(); + X_up.clear(); + Y_up.clear(); + X_down.clear(); + Y_down.clear(); + + sendup_edges.clear(); + senddn_edges.clear(); + recvup_edges.clear(); + recvdn_edges.clear(); + + return g; + } + + GraphWeight get_d() const { return rn_; } + GraphElem get_nv() const { return nv_; } + + private: + GraphElem nv_, n_; + GraphWeight rn_; + MPI_Comm comm_; + int nprocs_, rank_, up_, down_; +}; + +#endif diff --git a/miniVite/log b/miniVite/log new file mode 100644 index 0000000..19ee514 --- /dev/null +++ b/miniVite/log @@ -0,0 +1,518 @@ +My libomptarget --> Set mode to SDEV +==168619== NVPROF is 
profiling process 168619, command: ./miniVite -n 50000000 +Average time to generate 100000000 random numbers using LCG (in s): 0.182072 +********************************************************************** +Generated Random Geometric Graph with d: 0.000269794 +Number of vertices: 50000000 +Number of edges: 899999910 +Time to generate distributed graph of 50000000 vertices (in s): 88.1166 +Size: 16 : 8 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002004c0000000, size=400000008 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002004d7e00000, size=16 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002004e0000000, size=14399998560 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200840000000, size=400000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200860000000, size=400000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200880000000, size=400000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002008a0000000, size=800000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002008e0000000, size=800000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200920000000, size=400000000 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 50000000 device: 0 UM: 0) at 1 +My libomptarget --> Map 0x0000200920000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200920000000 +My libomptarget --> Apply opt 1 to 0x0000200920000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008e0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008e0000000 +My libomptarget --> Apply opt 1 to 0x00002008e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000 +My libomptarget --> Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000 +My libomptarget --> COMPUTE (0x0000000010016724) 
(#iter: 50000000 device: 0 UM: 0) at 2 +My libomptarget --> Map 0x00002004c0000000 to soft device, size=400000008 +My libomptarget --> Apply opt 4 to 0x00002004c0000000 +My libomptarget --> Apply opt 1 to 0x00002004c0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002004d7e00000 to soft device, size=16 +My libomptarget --> Apply opt 4 to 0x00002004d7e00000 +My libomptarget --> Apply opt 1 to 0x00002004d7e00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002004e0000000 to soft device, size=14399998560 +My libomptarget --> Apply opt 4 to 0x00002004e0000000 +My libomptarget --> Apply opt 1 to 0x00002004e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200840000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200840000000 +My libomptarget --> Apply opt 1 to 0x0000200840000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200880000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200880000000 +My libomptarget --> Apply opt 1 to 0x0000200880000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200860000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200860000000 +My libomptarget --> Apply opt 1 to 0x0000200860000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008a0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008a0000000 +My libomptarget --> Apply opt 1 to 0x00002008a0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008e0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008e0000000 +My libomptarget --> Apply opt 1 to 0x00002008e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200920000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200920000000 +My libomptarget 
--> Apply opt 1 to 0x0000200920000000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000 +My libomptarget --> Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000 +My libomptarget --> Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000 +My libomptarget --> Unmap 0x0000200860000000 from soft device (0x0000200860000000), size=400000000 +My libomptarget --> Unmap 0x0000200880000000 from soft device (0x0000200880000000), size=400000000 +My libomptarget --> Unmap 0x0000200840000000 from soft device (0x0000200840000000), size=400000000 +My libomptarget --> Unmap 0x00002004e0000000 from soft device (0x00002004e0000000), size=14399998560 +My libomptarget --> Unmap 0x00002004d7e00000 from soft device (0x00002004d7e00000), size=16 +My libomptarget --> Unmap 0x00002004c0000000 from soft device (0x00002004c0000000), size=400000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 50000000 device: 0 UM: 0) at 3 +My libomptarget --> Map 0x00002008a0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008a0000000 +My libomptarget --> Apply opt 1 to 0x00002008a0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008e0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008e0000000 +My libomptarget --> Apply opt 1 to 0x00002008e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000 +My libomptarget --> Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 50000000 device: 0 UM: 0) at 4 +My libomptarget --> Map 0x00007ffff20be9a8 to device (0x0000200997600000), size=8 +My libomptarget --> Submit 0x00007ffff20be9a8 to 0x0000200997600000, size=8 +My libomptarget --> Map 
0x0000200920000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200920000000 +My libomptarget --> Apply opt 1 to 0x0000200920000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00007ffff20be9a0 to device (0x0000200997600200), size=8 +My libomptarget --> Submit 0x00007ffff20be9a0 to 0x0000200997600200, size=8 +My libomptarget --> Map 0x00002008a0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008a0000000 +My libomptarget --> Apply opt 1 to 0x00002008a0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000 +My libomptarget --> Retrieve 0x00007ffff20be9a0 from 0x0000200997600200, size=8 +My libomptarget --> Unmap 0x00007ffff20be9a0 from device (0x0000200997600200), size=8 +My libomptarget --> Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000 +My libomptarget --> Retrieve 0x00007ffff20be9a8 from 0x0000200997600000, size=8 +My libomptarget --> Unmap 0x00007ffff20be9a8 from device (0x0000200997600000), size=8 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 50000000 device: 0 UM: 0) at 5 +My libomptarget --> Map 0x0000200920000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200920000000 +My libomptarget --> Apply opt 1 to 0x0000200920000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008e0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008e0000000 +My libomptarget --> Apply opt 1 to 0x00002008e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000 +My libomptarget --> Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 50000000 device: 0 UM: 0) at 6 +My libomptarget --> Map 
0x00002004c0000000 to soft device, size=400000008 +My libomptarget --> Apply opt 4 to 0x00002004c0000000 +My libomptarget --> Apply opt 1 to 0x00002004c0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002004d7e00000 to soft device, size=16 +My libomptarget --> Apply opt 4 to 0x00002004d7e00000 +My libomptarget --> Apply opt 1 to 0x00002004d7e00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002004e0000000 to soft device, size=14399998560 +My libomptarget --> Apply opt 4 to 0x00002004e0000000 +My libomptarget --> Apply opt 1 to 0x00002004e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200840000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200840000000 +My libomptarget --> Apply opt 1 to 0x0000200840000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200880000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200880000000 +My libomptarget --> Apply opt 1 to 0x0000200880000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200860000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200860000000 +My libomptarget --> Apply opt 1 to 0x0000200860000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008a0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008a0000000 +My libomptarget --> Apply opt 1 to 0x00002008a0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008e0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008e0000000 +My libomptarget --> Apply opt 1 to 0x00002008e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200920000000 to soft device, size=400000000 +My libomptarget --> Apply opt 4 to 0x0000200920000000 +My libomptarget --> Apply opt 1 to 0x0000200920000000 +My libomptarget --> Invalid 
optimization +My libomptarget --> Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000 +My libomptarget --> Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000 +My libomptarget --> Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000 +My libomptarget --> Unmap 0x0000200860000000 from soft device (0x0000200860000000), size=400000000 +My libomptarget --> Unmap 0x0000200880000000 from soft device (0x0000200880000000), size=400000000 +My libomptarget --> Unmap 0x0000200840000000 from soft device (0x0000200840000000), size=400000000 +My libomptarget --> Unmap 0x00002004e0000000 from soft device (0x00002004e0000000), size=14399998560 +My libomptarget --> Unmap 0x00002004d7e00000 from soft device (0x00002004d7e00000), size=16 +My libomptarget --> Unmap 0x00002004c0000000 from soft device (0x00002004c0000000), size=400000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 50000000 device: 0 UM: 0) at 7 +My libomptarget --> Map 0x00002008a0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008a0000000 +My libomptarget --> Apply opt 1 to 0x00002008a0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002008e0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008e0000000 +My libomptarget --> Apply opt 1 to 0x00002008e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000 +My libomptarget --> Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 50000000 device: 0 UM: 0) at 8 +My libomptarget --> Map 0x00007ffff20be9a8 to device (0x0000200997600000), size=8 +My libomptarget --> Submit 0x00007ffff20be9a8 to 0x0000200997600000, size=8 +My libomptarget --> Map 0x0000200920000000 to soft device, size=400000000 +My libomptarget --> 
Apply opt 4 to 0x0000200920000000 +My libomptarget --> Apply opt 1 to 0x0000200920000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00007ffff20be9a0 to device (0x0000200997600200), size=8 +My libomptarget --> Submit 0x00007ffff20be9a0 to 0x0000200997600200, size=8 +My libomptarget --> Map 0x00002008a0000000 to soft device, size=800000000 +My libomptarget --> Apply opt 4 to 0x00002008a0000000 +My libomptarget --> Apply opt 1 to 0x00002008a0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000 +My libomptarget --> Retrieve 0x00007ffff20be9a0 from 0x0000200997600200, size=8 +My libomptarget --> Unmap 0x00007ffff20be9a0 from device (0x0000200997600200), size=8 +My libomptarget --> Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000 +My libomptarget --> Retrieve 0x00007ffff20be9a8 from 0x0000200997600000, size=8 +My libomptarget --> Unmap 0x00007ffff20be9a8 from device (0x0000200997600000), size=8 +Total size: 16.7638 +Time: 80.839 +Modularity: -2e-08, Iterations: 2, Time (in s): 91.9998 +********************************************************************** +==168619== Profiling application: ./miniVite -n 50000000 +==168619== Profiling result: + Type Time(%) Time Calls Avg Min Max Name + GPU activities: 88.78% 70.8005s 2 35.4002s 13.8961s 56.9044s __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367 + 7.38% 5.88504s 2 2.94252s 2.90431s 2.98072s __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435 + 3.45% 2.75114s 2 1.37557s 1.31162s 1.43952s __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395 + 0.39% 309.96ms 2 154.98ms 2.5277ms 307.43ms __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454 + 0.00% 14.176us 8 1.7720us 1.6320us 1.8880us [CUDA memcpy DtoH] + 0.00% 7.2000us 5 1.4400us 1.2480us 1.7280us [CUDA memcpy HtoD] + API 
calls: 98.42% 79.7474s 8 9.96842s 2.6219ms 56.9045s cuCtxSynchronize + 0.80% 645.71ms 1 645.71ms 645.71ms 645.71ms cuCtxDestroy + 0.65% 523.16ms 1 523.16ms 523.16ms 523.16ms cuCtxCreate + 0.07% 55.862ms 4 13.965ms 26.356us 29.241ms cuMemAlloc + 0.03% 21.165ms 9 2.3517ms 69.686us 20.311ms cuMemAllocManaged + 0.02% 14.921ms 30 497.36us 3.8050us 10.727ms cuMemAdvise + 0.01% 11.501ms 1 11.501ms 11.501ms 11.501ms cuModuleLoadDataEx + 0.00% 3.5211ms 8 440.13us 33.118us 3.1674ms cuLaunchKernel + 0.00% 3.4017ms 1 3.4017ms 3.4017ms 3.4017ms cuModuleUnload + 0.00% 1.0287ms 4 257.18us 27.296us 488.97us cuMemFree + 0.00% 610.48us 8 76.310us 53.527us 105.83us cuMemcpyDtoH + 0.00% 233.54us 5 46.707us 27.817us 70.651us cuMemcpyHtoD + 0.00% 62.364us 34 1.8340us 555ns 4.7440us cuCtxSetCurrent + 0.00% 18.389us 8 2.2980us 1.7010us 2.9630us cuFuncGetAttribute + 0.00% 13.046us 21 621ns 338ns 1.0360us cuDeviceGetAttribute + 0.00% 12.614us 5 2.5220us 2.3040us 3.1820us cuModuleGetGlobal + 0.00% 11.860us 6 1.9760us 1.2010us 4.5830us cuDeviceGetPCIBusId + 0.00% 8.3770us 7 1.1960us 592ns 4.2380us cuDeviceGet + 0.00% 7.7370us 4 1.9340us 1.5450us 2.9960us cuModuleGetFunction + 0.00% 1.5730us 3 524ns 403ns 588ns cuDeviceGetCount + +==168619== Unified Memory profiling result: +Device "Tesla V100-SXM2-16GB (0)" + Count Avg Size Min Size Max Size Total Size Total Time Name + 210990 169.82KB 64.000KB 1.6250MB 34.16998GB 1.252414s Host To Device + 9314 1.9949MB 64.000KB 2.0000MB 18.14508GB 433.0975ms Device To Host + 31726 - - - - 79.700933s Gpu page fault groups + 319 1.9969MB 1.5000MB 2.0000MB 637.0000MB - Remote mapping to device +Total CPU Page faults: 51896 +Total remote mappings from CPU: 319 + +------------------------------------------------------------ +Sender: LSF System +Subject: Job 310708: in cluster Done + +Job was submitted from host by user in cluster at Wed Mar 27 16:42:25 2019 +Job was executed on host(s) <1*batch3>, in queue , as user in cluster at Wed Mar 27 16:42:35 2019 + 
<42*g33n07> + was used as the home directory. + was used as the working directory. +Started at Wed Mar 27 16:42:35 2019 +Terminated at Wed Mar 27 16:45:46 2019 +Results reported at Wed Mar 27 16:45:46 2019 + +The output (if any) is above this job summary. + +My libomptarget --> Set mode to SDEV +==114131== NVPROF is profiling process 114131, command: ./miniVite -n 5000000 +Average time to generate 10000000 random numbers using LCG (in s): 0.0181821 +********************************************************************** +Generated Random Geometric Graph with d: 0.000817469 +Number of vertices: 5000000 +Number of edges: 89999910 +Time to generate distributed graph of 5000000 vertices (in s): 8.43148 +Size: 16 : 8 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200120000000, size=1439998560 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200175e00000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200178600000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017ae00000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2a00000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017d600000, size=40000000 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 1 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 
0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 2 +My libomptarget --> Map 0x00002000e0000000 to soft device, size=40000008 +My libomptarget --> Apply opt 4 to 0x00002000e0000000 +My libomptarget --> Apply opt 1 to 0x00002000e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2800000 to soft device, size=16 +My libomptarget --> Apply opt 4 to 0x00002000e2800000 +My libomptarget --> Apply opt 1 to 0x00002000e2800000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200120000000 to soft device, size=1439998560 +My libomptarget --> Apply opt 4 to 0x0000200120000000 +My libomptarget --> Apply opt 1 to 0x0000200120000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200175e00000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200175e00000 +My libomptarget --> Apply opt 1 to 0x0000200175e00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017ae00000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017ae00000 +My libomptarget --> Apply opt 1 to 0x000020017ae00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200178600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200178600000 +My libomptarget --> Apply opt 1 to 0x0000200178600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 
+My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000 +My libomptarget --> Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000 +My libomptarget --> Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000 +My libomptarget --> Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 3 +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), 
size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 4 +My libomptarget --> Map 0x00007fffe1e06888 to device (0x00002001d7600000), size=8 +My libomptarget --> Submit 0x00007fffe1e06888 to 0x00002001d7600000, size=8 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00007fffe1e06880 to device (0x00002001d7600200), size=8 +My libomptarget --> Submit 0x00007fffe1e06880 to 0x00002001d7600200, size=8 +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Retrieve 0x00007fffe1e06880 from 0x00002001d7600200, size=8 +My libomptarget --> Unmap 0x00007fffe1e06880 from device (0x00002001d7600200), size=8 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Retrieve 0x00007fffe1e06888 from 0x00002001d7600000, size=8 +My libomptarget --> Unmap 0x00007fffe1e06888 from device (0x00002001d7600000), size=8 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 5 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from 
soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 6 +My libomptarget --> Map 0x00002000e0000000 to soft device, size=40000008 +My libomptarget --> Apply opt 4 to 0x00002000e0000000 +My libomptarget --> Apply opt 1 to 0x00002000e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2800000 to soft device, size=16 +My libomptarget --> Apply opt 4 to 0x00002000e2800000 +My libomptarget --> Apply opt 1 to 0x00002000e2800000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200120000000 to soft device, size=1439998560 +My libomptarget --> Apply opt 4 to 0x0000200120000000 +My libomptarget --> Apply opt 1 to 0x0000200120000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200175e00000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200175e00000 +My libomptarget --> Apply opt 1 to 0x0000200175e00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017ae00000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017ae00000 +My libomptarget --> Apply opt 1 to 0x000020017ae00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200178600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200178600000 +My libomptarget --> Apply opt 1 to 0x0000200178600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 
+My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000 +My libomptarget --> Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000 +My libomptarget --> Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000 +My libomptarget --> Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 7 +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 8 +My libomptarget 
--> Map 0x00007fffe1e06888 to device (0x00002001d7600000), size=8 +My libomptarget --> Submit 0x00007fffe1e06888 to 0x00002001d7600000, size=8 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00007fffe1e06880 to device (0x00002001d7600200), size=8 +My libomptarget --> Submit 0x00007fffe1e06880 to 0x00002001d7600200, size=8 +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Retrieve 0x00007fffe1e06880 from 0x00002001d7600200, size=8 +My libomptarget --> Unmap 0x00007fffe1e06880 from device (0x00002001d7600200), size=8 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Retrieve 0x00007fffe1e06888 from 0x00002001d7600000, size=8 +My libomptarget --> Unmap 0x00007fffe1e06888 from device (0x00002001d7600000), size=8 +Total size: 1.67638 +Time: 0.743411 +Modularity: 8.22212e-07, Iterations: 2, Time (in s): 2.44237 +********************************************************************** +==114131== Profiling application: ./miniVite -n 5000000 +==114131== Profiling result: + Type Time(%) Time Calls Avg Min Max Name + GPU activities: 93.44% 474.32ms 2 237.16ms 21.093ms 453.23ms __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367 + 6.30% 31.997ms 2 15.999ms 253.70us 31.743ms __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454 + 0.13% 650.77us 2 325.38us 324.62us 326.15us __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395 + 0.12% 
631.82us 2 315.91us 314.70us 317.13us __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435
+ 0.00% 14.401us 8 1.8000us 1.6320us 2.0160us [CUDA memcpy DtoH]
+ 0.00% 7.0080us 5 1.4010us 1.2480us 1.6000us [CUDA memcpy HtoD]
+ API calls: 40.00% 518.67ms 1 518.67ms 518.67ms 518.67ms cuCtxCreate
+ 39.20% 508.28ms 8 63.536ms 356.70us 453.31ms cuCtxSynchronize
+ 16.52% 214.18ms 1 214.18ms 214.18ms 214.18ms cuCtxDestroy
+ 1.62% 21.061ms 9 2.3401ms 80.485us 20.281ms cuMemAllocManaged
+ 0.89% 11.579ms 1 11.579ms 11.579ms 11.579ms cuModuleLoadDataEx
+ 0.56% 7.2959ms 1 7.2959ms 7.2959ms 7.2959ms cuModuleUnload
+ 0.51% 6.6057ms 8 825.71us 28.769us 6.3330ms cuLaunchKernel
+ 0.24% 3.0631ms 4 765.77us 23.171us 1.5982ms cuMemAlloc
+ 0.23% 3.0404ms 30 101.35us 3.7320us 2.0698ms cuMemAdvise
+ 0.12% 1.5644ms 4 391.11us 30.221us 753.34us cuMemFree
+ 0.07% 889.41us 8 111.18us 89.358us 125.11us cuMemcpyDtoH
+ 0.01% 193.93us 5 38.786us 26.534us 53.105us cuMemcpyHtoD
+ 0.00% 59.670us 34 1.7550us 676ns 4.4900us cuCtxSetCurrent
+ 0.00% 14.464us 8 1.8080us 1.2480us 2.4810us cuFuncGetAttribute
+ 0.00% 13.321us 5 2.6640us 2.1600us 3.6020us cuModuleGetGlobal
+ 0.00% 12.168us 6 2.0280us 1.1920us 5.1090us cuDeviceGetPCIBusId
+ 0.00% 11.903us 21 566ns 293ns 949ns cuDeviceGetAttribute
+ 0.00% 8.5710us 7 1.2240us 596ns 4.3550us cuDeviceGet
+ 0.00% 7.0220us 4 1.7550us 1.4670us 2.4140us cuModuleGetFunction
+ 0.00% 1.5720us 3 524ns 375ns 609ns cuDeviceGetCount
+
+==114131== Unified Memory profiling result:
+Device "Tesla V100-SXM2-16GB (0)"
+ Count Avg Size Min Size Max Size Total Size Total Time Name
+ 5614 164.73KB 64.000KB 1.2500MB 903.1250MB 33.28173ms Host To Device
+ 692 - - - - 226.4184ms Gpu page fault groups
+ 40 1.9094MB 192.00KB 2.0000MB 76.37500MB - Remote mapping to device
+Total CPU Page faults: 5212
+Total remote mappings from CPU: 40
+
+------------------------------------------------------------
+Sender: LSF System
+Subject: Job 310715: in cluster Done
+
+Job was
submitted from host by user in cluster at Wed Mar 27 16:46:44 2019
+Job was executed on host(s) <1*batch2>, in queue , as user in cluster at Wed Mar 27 16:46:54 2019
+ <42*g31n10>
+ was used as the home directory.
+ was used as the working directory.
+Started at Wed Mar 27 16:46:54 2019
+Terminated at Wed Mar 27 16:47:11 2019
+Results reported at Wed Mar 27 16:47:11 2019
+
+The output (if any) is above this job summary.
+
+
+
+------------------------------------------------------------
+Sender: LSF System
+Subject: Job 311049: in cluster Done
+
+Job was submitted from host by user in cluster at Wed Mar 27 22:00:55 2019
+Job was executed on host(s) <1*batch1>, in queue , as user in cluster at Wed Mar 27 22:01:04 2019
+ <42*a32n16>
+ was used as the home directory.
+ was used as the working directory.
+Started at Wed Mar 27 22:01:04 2019
+Terminated at Wed Mar 27 22:01:06 2019
+Results reported at Wed Mar 27 22:01:06 2019
+
+The output (if any) is above this job summary.
+
diff --git a/miniVite/log1 b/miniVite/log1
new file mode 100644
index 0000000..aa97cdb
--- /dev/null
+++ b/miniVite/log1
@@ -0,0 +1,598 @@
+My libomptarget --> Set mode to DEV
+==158920== NVPROF is profiling process 158920, command: ./miniVite -n 5000000
+Average time to generate 10000000 random numbers using LCG (in s): 0.0181787
+**********************************************************************
+Generated Random Geometric Graph with d: 0.000817469
+Number of vertices: 5000000
+Number of edges: 89999910
+Time to generate distributed graph of 5000000 vertices (in s): 8.47192
+Size: 16 : 8
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200100000000, size=1439998560
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200155e00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr
0x0000200158600000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015ae00000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2a00000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015d600000, size=40000000 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 1 +My libomptarget --> Map 0x000020015d600000 to device (0x00002000d9800000), size=40000000 +My libomptarget --> Map 0x00002000e7800000 to device (0x00002001a0000000), size=80000000 +My libomptarget --> Retrieve 0x00002000e7800000 from 0x00002001a0000000, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from device (0x00002001a0000000), size=80000000 +My libomptarget --> Retrieve 0x000020015d600000 from 0x00002000d9800000, size=40000000 +My libomptarget --> Unmap 0x000020015d600000 from device (0x00002000d9800000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 2 +My libomptarget --> Map 0x00002000e0000000 to device (0x00002000d9800000), size=40000008 +My libomptarget --> Submit 0x00002000e0000000 to 0x00002000d9800000, size=40000008 +My libomptarget --> Map 0x00002000e2800000 to device (0x000020019dc00000), size=16 +My libomptarget --> Submit 0x00002000e2800000 to 0x000020019dc00000, size=16 +My libomptarget --> Map 0x0000200100000000 to device (0x00002001a0000000), size=1439998560 +My libomptarget --> Submit 0x0000200100000000 to 0x00002001a0000000, size=1439998560 +My libomptarget --> Map 0x0000200155e00000 to device (0x00002001f5e00000), size=40000000 +My libomptarget --> Submit 0x0000200155e00000 to 0x00002001f5e00000, size=40000000 +My libomptarget --> Map 0x000020015ae00000 to device (0x00002001f8600000), size=40000000 +My libomptarget --> Map 0x0000200158600000 to device (0x00002001fae00000), size=40000000 +My libomptarget --> 
Submit 0x0000200158600000 to 0x00002001fae00000, size=40000000 +My libomptarget --> Map 0x00002000e2a00000 to device (0x0000200200000000), size=80000000 +My libomptarget --> Submit 0x00002000e2a00000 to 0x0000200200000000, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to device (0x0000200204e00000), size=80000000 +My libomptarget --> Submit 0x00002000e7800000 to 0x0000200204e00000, size=80000000 +My libomptarget --> Map 0x000020015d600000 to device (0x0000200209c00000), size=40000000 +My libomptarget --> Submit 0x000020015d600000 to 0x0000200209c00000, size=40000000 +My libomptarget --> Retrieve 0x000020015d600000 from 0x0000200209c00000, size=40000000 +My libomptarget --> Unmap 0x000020015d600000 from device (0x0000200209c00000), size=40000000 +My libomptarget --> Retrieve 0x00002000e7800000 from 0x0000200204e00000, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from device (0x0000200204e00000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from device (0x0000200200000000), size=80000000 +My libomptarget --> Unmap 0x0000200158600000 from device (0x00002001fae00000), size=40000000 +My libomptarget --> Retrieve 0x000020015ae00000 from 0x00002001f8600000, size=40000000 +My libomptarget --> Unmap 0x000020015ae00000 from device (0x00002001f8600000), size=40000000 +My libomptarget --> Unmap 0x0000200155e00000 from device (0x00002001f5e00000), size=40000000 +My libomptarget --> Unmap 0x0000200100000000 from device (0x00002001a0000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from device (0x000020019dc00000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from device (0x00002000d9800000), size=40000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 3 +My libomptarget --> Map 0x00002000e2a00000 to device (0x0000200237600000), size=80000000 +My libomptarget --> Submit 0x00002000e2a00000 to 0x0000200237600000, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to 
device (0x00002000d9800000), size=80000000 +My libomptarget --> Submit 0x00002000e7800000 to 0x00002000d9800000, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from device (0x00002000d9800000), size=80000000 +My libomptarget --> Retrieve 0x00002000e2a00000 from 0x0000200237600000, size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from device (0x0000200237600000), size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 4 +My libomptarget --> Map 0x00007fffedfd3f28 to device (0x0000200237600000), size=8 +My libomptarget --> Submit 0x00007fffedfd3f28 to 0x0000200237600000, size=8 +My libomptarget --> Map 0x000020015d600000 to device (0x0000200237800000), size=40000000 +My libomptarget --> Submit 0x000020015d600000 to 0x0000200237800000, size=40000000 +My libomptarget --> Map 0x00007fffedfd3f20 to device (0x0000200237600200), size=8 +My libomptarget --> Submit 0x00007fffedfd3f20 to 0x0000200237600200, size=8 +My libomptarget --> Map 0x00002000e2a00000 to device (0x000020023a000000), size=80000000 +My libomptarget --> Submit 0x00002000e2a00000 to 0x000020023a000000, size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from device (0x000020023a000000), size=80000000 +My libomptarget --> Retrieve 0x00007fffedfd3f20 from 0x0000200237600200, size=8 +My libomptarget --> Unmap 0x00007fffedfd3f20 from device (0x0000200237600200), size=8 +My libomptarget --> Unmap 0x000020015d600000 from device (0x0000200237800000), size=40000000 +My libomptarget --> Retrieve 0x00007fffedfd3f28 from 0x0000200237600000, size=8 +My libomptarget --> Unmap 0x00007fffedfd3f28 from device (0x0000200237600000), size=8 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 5 +My libomptarget --> Map 0x000020015d600000 to device (0x0000200237600000), size=40000000 +My libomptarget --> Map 0x00002000e7800000 to device (0x0000200239e00000), size=80000000 +My libomptarget --> Retrieve 0x00002000e7800000 
from 0x0000200239e00000, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from device (0x0000200239e00000), size=80000000 +My libomptarget --> Retrieve 0x000020015d600000 from 0x0000200237600000, size=40000000 +My libomptarget --> Unmap 0x000020015d600000 from device (0x0000200237600000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 6 +My libomptarget --> Map 0x00002000e0000000 to device (0x0000200237600000), size=40000008 +My libomptarget --> Submit 0x00002000e0000000 to 0x0000200237600000, size=40000008 +My libomptarget --> Map 0x00002000e2800000 to device (0x0000200239e00000), size=16 +My libomptarget --> Submit 0x00002000e2800000 to 0x0000200239e00000, size=16 +My libomptarget --> Map 0x0000200100000000 to device (0x00002001a0000000), size=1439998560 +My libomptarget --> Submit 0x0000200100000000 to 0x00002001a0000000, size=1439998560 +My libomptarget --> Map 0x0000200155e00000 to device (0x00002001f5e00000), size=40000000 +My libomptarget --> Submit 0x0000200155e00000 to 0x00002001f5e00000, size=40000000 +My libomptarget --> Map 0x000020015ae00000 to device (0x00002001f8600000), size=40000000 +My libomptarget --> Map 0x0000200158600000 to device (0x00002001fae00000), size=40000000 +My libomptarget --> Submit 0x0000200158600000 to 0x00002001fae00000, size=40000000 +My libomptarget --> Map 0x00002000e2a00000 to device (0x000020023a000000), size=80000000 +My libomptarget --> Submit 0x00002000e2a00000 to 0x000020023a000000, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to device (0x00002000d9800000), size=80000000 +My libomptarget --> Submit 0x00002000e7800000 to 0x00002000d9800000, size=80000000 +My libomptarget --> Map 0x000020015d600000 to device (0x00002001fd600000), size=40000000 +My libomptarget --> Submit 0x000020015d600000 to 0x00002001fd600000, size=40000000 +My libomptarget --> Retrieve 0x000020015d600000 from 0x00002001fd600000, size=40000000 +My libomptarget --> Unmap 
0x000020015d600000 from device (0x00002001fd600000), size=40000000 +My libomptarget --> Retrieve 0x00002000e7800000 from 0x00002000d9800000, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from device (0x00002000d9800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from device (0x000020023a000000), size=80000000 +My libomptarget --> Unmap 0x0000200158600000 from device (0x00002001fae00000), size=40000000 +My libomptarget --> Retrieve 0x000020015ae00000 from 0x00002001f8600000, size=40000000 +My libomptarget --> Unmap 0x000020015ae00000 from device (0x00002001f8600000), size=40000000 +My libomptarget --> Unmap 0x0000200155e00000 from device (0x00002001f5e00000), size=40000000 +My libomptarget --> Unmap 0x0000200100000000 from device (0x00002001a0000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from device (0x0000200239e00000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from device (0x0000200237600000), size=40000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 7 +My libomptarget --> Map 0x00002000e2a00000 to device (0x0000200237600000), size=80000000 +My libomptarget --> Submit 0x00002000e2a00000 to 0x0000200237600000, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to device (0x00002000d9800000), size=80000000 +My libomptarget --> Submit 0x00002000e7800000 to 0x00002000d9800000, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from device (0x00002000d9800000), size=80000000 +My libomptarget --> Retrieve 0x00002000e2a00000 from 0x0000200237600000, size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from device (0x0000200237600000), size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 8 +My libomptarget --> Map 0x00007fffedfd3f28 to device (0x0000200237600000), size=8 +My libomptarget --> Submit 0x00007fffedfd3f28 to 0x0000200237600000, size=8 +My libomptarget --> Map 0x000020015d600000 to 
device (0x0000200237800000), size=40000000 +My libomptarget --> Submit 0x000020015d600000 to 0x0000200237800000, size=40000000 +My libomptarget --> Map 0x00007fffedfd3f20 to device (0x0000200237600200), size=8 +My libomptarget --> Submit 0x00007fffedfd3f20 to 0x0000200237600200, size=8 +My libomptarget --> Map 0x00002000e2a00000 to device (0x000020023a000000), size=80000000 +My libomptarget --> Submit 0x00002000e2a00000 to 0x000020023a000000, size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from device (0x000020023a000000), size=80000000 +My libomptarget --> Retrieve 0x00007fffedfd3f20 from 0x0000200237600200, size=8 +My libomptarget --> Unmap 0x00007fffedfd3f20 from device (0x0000200237600200), size=8 +My libomptarget --> Unmap 0x000020015d600000 from device (0x0000200237800000), size=40000000 +My libomptarget --> Retrieve 0x00007fffedfd3f28 from 0x0000200237600000, size=8 +My libomptarget --> Unmap 0x00007fffedfd3f28 from device (0x0000200237600000), size=8 +Total size: 1.67638 +Time: 0.797371 +Modularity: 8.22212e-07, Iterations: 2, Time (in s): 2.47803 +********************************************************************** +==158920== Profiling application: ./miniVite -n 5000000 +==158920== Profiling result: + Type Time(%) Time Calls Avg Min Max Name + GPU activities: 91.38% 486.10ms 36 13.503ms 2.0480us 166.74ms [CUDA memcpy DtoD] + 8.26% 43.918ms 2 21.959ms 21.908ms 22.010ms __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367 + 0.14% 750.04us 2 375.02us 374.27us 375.77us __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395 + 0.12% 639.87us 2 319.93us 318.62us 321.25us __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435 + 0.09% 504.73us 2 252.37us 252.00us 252.73us __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454 + 0.00% 15.552us 8 1.9440us 1.7600us 2.1760us [CUDA memcpy DtoH] + 0.00% 7.9680us 5 1.5930us 1.3760us 1.7280us [CUDA memcpy HtoD] + 
API calls: 35.22% 524.24ms 1 524.24ms 524.24ms 524.24ms cuCtxCreate
+ 29.72% 442.40ms 29 15.255ms 28.595us 167.00ms cuMemcpyHtoD
+ 19.82% 295.04ms 1 295.04ms 295.04ms 295.04ms cuCtxDestroy
+ 3.67% 54.573ms 34 1.6051ms 42.012us 8.8931ms cuMemAlloc
+ 3.52% 52.401ms 20 2.6201ms 90.217us 8.2123ms cuMemcpyDtoH
+ 3.13% 46.539ms 8 5.8174ms 335.42us 22.099ms cuCtxSynchronize
+ 1.98% 29.432ms 34 865.65us 34.372us 3.1892ms cuMemFree
+ 1.41% 21.040ms 9 2.3377ms 63.714us 20.305ms cuMemAllocManaged
+ 0.78% 11.613ms 1 11.613ms 11.613ms 11.613ms cuModuleLoadDataEx
+ 0.48% 7.2150ms 1 7.2150ms 7.2150ms 7.2150ms cuModuleUnload
+ 0.24% 3.5659ms 8 445.74us 30.009us 3.2446ms cuLaunchKernel
+ 0.01% 186.90us 130 1.4370us 533ns 5.1930us cuCtxSetCurrent
+ 0.00% 15.519us 8 1.9390us 1.0490us 2.8850us cuFuncGetAttribute
+ 0.00% 13.594us 5 2.7180us 2.2340us 3.8610us cuModuleGetGlobal
+ 0.00% 12.258us 21 583ns 272ns 1.0910us cuDeviceGetAttribute
+ 0.00% 11.703us 6 1.9500us 1.2010us 4.3370us cuDeviceGetPCIBusId
+ 0.00% 8.8880us 7 1.2690us 623ns 4.5410us cuDeviceGet
+ 0.00% 7.9890us 4 1.9970us 1.5170us 3.3080us cuModuleGetFunction
+ 0.00% 1.5340us 3 511ns 361ns 606ns cuDeviceGetCount
+
+==158920== Unified Memory profiling result:
+Total CPU Page faults: 5172
+
+------------------------------------------------------------
+Sender: LSF System
+Subject: Job 310716: in cluster Done
+
+Job was submitted from host by user in cluster at Wed Mar 27 16:47:14 2019
+Job was executed on host(s) <1*batch5>, in queue , as user in cluster at Wed Mar 27 16:47:28 2019
+ <42*a20n12>
+ was used as the home directory.
+ was used as the working directory.
+Started at Wed Mar 27 16:47:28 2019
+Terminated at Wed Mar 27 16:47:49 2019
+Results reported at Wed Mar 27 16:47:49 2019
+
+The output (if any) is above this job summary.
+ +My libomptarget --> Set mode to UM +==13217== NVPROF is profiling process 13217, command: ./miniVite -n 5000000 +Average time to generate 10000000 random numbers using LCG (in s): 0.0182397 +********************************************************************** +Generated Random Geometric Graph with d: 0.000817469 +Number of vertices: 5000000 +Number of edges: 89999910 +Time to generate distributed graph of 5000000 vertices (in s): 8.38551 +Size: 16 : 8 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200100000000, size=1439998560 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200155e00000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200158600000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015ae00000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2a00000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015d600000, size=40000000 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 1 +My libomptarget --> Map 0x000020015d600000 to UM, size=40000000 +My libomptarget --> Map 0x00002000e7800000 to UM, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 2 +My libomptarget --> Map 0x00002000e0000000 to UM, size=40000008 +My libomptarget --> Map 0x00002000e2800000 to UM, size=16 +My libomptarget --> Map 0x0000200100000000 to UM, size=1439998560 +My libomptarget --> Map 0x0000200155e00000 to UM, size=40000000 +My 
libomptarget --> Map 0x000020015ae00000 to UM, size=40000000 +My libomptarget --> Map 0x0000200158600000 to UM, size=40000000 +My libomptarget --> Map 0x00002000e2a00000 to UM, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to UM, size=80000000 +My libomptarget --> Map 0x000020015d600000 to UM, size=40000000 +My libomptarget --> Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000 +My libomptarget --> Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000 +My libomptarget --> Unmap 0x0000200158600000 from UM (0x0000200158600000), size=40000000 +My libomptarget --> Unmap 0x000020015ae00000 from UM (0x000020015ae00000), size=40000000 +My libomptarget --> Unmap 0x0000200155e00000 from UM (0x0000200155e00000), size=40000000 +My libomptarget --> Unmap 0x0000200100000000 from UM (0x0000200100000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from UM (0x00002000e2800000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from UM (0x00002000e0000000), size=40000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 3 +My libomptarget --> Map 0x00002000e2a00000 to UM, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to UM, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 4 +My libomptarget --> Map 0x00007fffda6c2478 to device (0x00002001b7600000), size=8 +My libomptarget --> Submit 0x00007fffda6c2478 to 0x00002001b7600000, size=8 +My libomptarget --> Map 0x000020015d600000 to UM, size=40000000 +My libomptarget --> Map 0x00007fffda6c2470 to device (0x00002001b7600200), size=8 +My libomptarget --> Submit 0x00007fffda6c2470 to 0x00002001b7600200, size=8 
+My libomptarget --> Map 0x00002000e2a00000 to UM, size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000 +My libomptarget --> Retrieve 0x00007fffda6c2470 from 0x00002001b7600200, size=8 +My libomptarget --> Unmap 0x00007fffda6c2470 from device (0x00002001b7600200), size=8 +My libomptarget --> Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000 +My libomptarget --> Retrieve 0x00007fffda6c2478 from 0x00002001b7600000, size=8 +My libomptarget --> Unmap 0x00007fffda6c2478 from device (0x00002001b7600000), size=8 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 5 +My libomptarget --> Map 0x000020015d600000 to UM, size=40000000 +My libomptarget --> Map 0x00002000e7800000 to UM, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 6 +My libomptarget --> Map 0x00002000e0000000 to UM, size=40000008 +My libomptarget --> Map 0x00002000e2800000 to UM, size=16 +My libomptarget --> Map 0x0000200100000000 to UM, size=1439998560 +My libomptarget --> Map 0x0000200155e00000 to UM, size=40000000 +My libomptarget --> Map 0x000020015ae00000 to UM, size=40000000 +My libomptarget --> Map 0x0000200158600000 to UM, size=40000000 +My libomptarget --> Map 0x00002000e2a00000 to UM, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to UM, size=80000000 +My libomptarget --> Map 0x000020015d600000 to UM, size=40000000 +My libomptarget --> Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000 +My libomptarget --> Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000 +My libomptarget --> Unmap 0x0000200158600000 from UM (0x0000200158600000), 
size=40000000 +My libomptarget --> Unmap 0x000020015ae00000 from UM (0x000020015ae00000), size=40000000 +My libomptarget --> Unmap 0x0000200155e00000 from UM (0x0000200155e00000), size=40000000 +My libomptarget --> Unmap 0x0000200100000000 from UM (0x0000200100000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from UM (0x00002000e2800000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from UM (0x00002000e0000000), size=40000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 7 +My libomptarget --> Map 0x00002000e2a00000 to UM, size=80000000 +My libomptarget --> Map 0x00002000e7800000 to UM, size=80000000 +My libomptarget --> Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 8 +My libomptarget --> Map 0x00007fffda6c2478 to device (0x00002001b7600000), size=8 +My libomptarget --> Submit 0x00007fffda6c2478 to 0x00002001b7600000, size=8 +My libomptarget --> Map 0x000020015d600000 to UM, size=40000000 +My libomptarget --> Map 0x00007fffda6c2470 to device (0x00002001b7600200), size=8 +My libomptarget --> Submit 0x00007fffda6c2470 to 0x00002001b7600200, size=8 +My libomptarget --> Map 0x00002000e2a00000 to UM, size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000 +My libomptarget --> Retrieve 0x00007fffda6c2470 from 0x00002001b7600200, size=8 +My libomptarget --> Unmap 0x00007fffda6c2470 from device (0x00002001b7600200), size=8 +My libomptarget --> Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000 +My libomptarget --> Retrieve 0x00007fffda6c2478 from 0x00002001b7600000, size=8 +My libomptarget --> Unmap 0x00007fffda6c2478 from device (0x00002001b7600000), size=8 +Total size: 1.67638 +Time: 0.732451 +Modularity: 8.22212e-07, Iterations: 2, Time (in 
s): 2.4223 +********************************************************************** +==13217== Profiling application: ./miniVite -n 5000000 +==13217== Profiling result: + Type Time(%) Time Calls Avg Min Max Name + GPU activities: 93.79% 493.19ms 2 246.59ms 43.538ms 449.65ms __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367 + 5.96% 31.328ms 2 15.664ms 250.43us 31.078ms __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454 + 0.12% 646.84us 2 323.42us 322.85us 324.00us __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395 + 0.12% 640.60us 2 320.30us 317.53us 323.07us __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435 + 0.00% 14.336us 8 1.7920us 1.6000us 2.0160us [CUDA memcpy DtoH] + 0.00% 7.0400us 5 1.4080us 1.2480us 1.6000us [CUDA memcpy HtoD] + API calls: 38.34% 526.46ms 8 65.807ms 356.21us 449.74ms cuCtxSynchronize + 38.05% 522.45ms 1 522.45ms 522.45ms 522.45ms cuCtxCreate + 19.76% 271.31ms 1 271.31ms 271.31ms 271.31ms cuCtxDestroy + 1.54% 21.175ms 9 2.3528ms 62.599us 20.311ms cuMemAllocManaged + 0.86% 11.793ms 1 11.793ms 11.793ms 11.793ms cuModuleLoadDataEx + 0.54% 7.3927ms 1 7.3927ms 7.3927ms 7.3927ms cuModuleUnload + 0.49% 6.6668ms 8 833.35us 24.849us 6.3629ms cuLaunchKernel + 0.23% 3.1179ms 4 779.48us 20.709us 1.6574ms cuMemAlloc + 0.12% 1.6051ms 4 401.28us 34.284us 770.77us cuMemFree + 0.06% 822.10us 8 102.76us 88.720us 116.52us cuMemcpyDtoH + 0.02% 208.24us 5 41.647us 26.380us 66.923us cuMemcpyHtoD + 0.01% 69.504us 34 2.0440us 543ns 6.6380us cuCtxSetCurrent + 0.00% 17.224us 8 2.1530us 1.3830us 5.5330us cuFuncGetAttribute + 0.00% 13.122us 21 624ns 432ns 1.7660us cuDeviceGetAttribute + 0.00% 13.090us 5 2.6180us 2.0610us 3.8750us cuModuleGetGlobal + 0.00% 10.963us 6 1.8270us 1.1400us 4.0110us cuDeviceGetPCIBusId + 0.00% 8.8770us 7 1.2680us 602ns 4.2380us cuDeviceGet + 0.00% 8.5390us 4 2.1340us 1.4970us 3.8560us cuModuleGetFunction + 0.00% 1.5890us 3 529ns 416ns 
599ns cuDeviceGetCount + +==13217== Unified Memory profiling result: +Device "Tesla V100-SXM2-16GB (0)" + Count Avg Size Min Size Max Size Total Size Total Time Name + 9400 166.63KB 64.000KB 1.3750MB 1.493774GB 56.01582ms Host To Device + 812 192.95KB 64.000KB 960.00KB 153.0000MB 5.260943ms Device To Host + 1694 - - - - 505.7533ms Gpu page fault groups +Total CPU Page faults: 5664 + +------------------------------------------------------------ +Sender: LSF System +Subject: Job 310719: in cluster Done + +Job was submitted from host by user in cluster at Wed Mar 27 16:50:55 2019 +Job was executed on host(s) <1*batch5>, in queue , as user in cluster at Wed Mar 27 16:51:02 2019 + <42*a29n08> + was used as the home directory. + was used as the working directory. +Started at Wed Mar 27 16:51:02 2019 +Terminated at Wed Mar 27 16:51:20 2019 +Results reported at Wed Mar 27 16:51:20 2019 + +The output (if any) is above this job summary. + +My libomptarget --> Set mode to SDEV +==15656== NVPROF is profiling process 15656, command: ./miniVite -n 5000000 +Average time to generate 10000000 random numbers using LCG (in s): 0.0181794 +********************************************************************** +Generated Random Geometric Graph with d: 0.000817469 +Number of vertices: 5000000 +Number of edges: 89999910 +Time to generate distributed graph of 5000000 vertices (in s): 8.4145 +Size: 16 : 8 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200120000000, size=1439998560 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200175e00000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200178600000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017ae00000, size=40000000 +My libomptarget --> omp_target_alloc returns uvm ptr 
0x00002000e2a00000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000 +My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017d600000, size=40000000 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 1 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 2 +My libomptarget --> Map 0x00002000e0000000 to soft device, size=40000008 +My libomptarget --> Apply opt 4 to 0x00002000e0000000 +My libomptarget --> Apply opt 1 to 0x00002000e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2800000 to soft device, size=16 +My libomptarget --> Apply opt 4 to 0x00002000e2800000 +My libomptarget --> Apply opt 1 to 0x00002000e2800000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200120000000 to soft device, size=1439998560 +My libomptarget --> Apply opt 4 to 0x0000200120000000 +My libomptarget --> Apply opt 1 to 0x0000200120000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200175e00000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200175e00000 +My libomptarget --> Apply opt 1 to 0x0000200175e00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017ae00000 to soft device, 
size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017ae00000 +My libomptarget --> Apply opt 1 to 0x000020017ae00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200178600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200178600000 +My libomptarget --> Apply opt 1 to 0x0000200178600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000 +My libomptarget --> Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000 +My libomptarget --> Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000 +My libomptarget --> Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008 +My libomptarget --> COMPUTE 
(0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 3 +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 4 +My libomptarget --> Map 0x00007fffca5000d8 to device (0x00002001d7600000), size=8 +My libomptarget --> Submit 0x00007fffca5000d8 to 0x00002001d7600000, size=8 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00007fffca5000d0 to device (0x00002001d7600200), size=8 +My libomptarget --> Submit 0x00007fffca5000d0 to 0x00002001d7600200, size=8 +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Retrieve 0x00007fffca5000d0 from 0x00002001d7600200, size=8 +My libomptarget --> Unmap 0x00007fffca5000d0 from device (0x00002001d7600200), size=8 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Retrieve 0x00007fffca5000d8 from 
0x00002001d7600000, size=8 +My libomptarget --> Unmap 0x00007fffca5000d8 from device (0x00002001d7600000), size=8 +My libomptarget --> COMPUTE (0x00000000100166f8) (#iter: 5000000 device: 0 UM: 0) at 5 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> COMPUTE (0x0000000010016724) (#iter: 5000000 device: 0 UM: 0) at 6 +My libomptarget --> Map 0x00002000e0000000 to soft device, size=40000008 +My libomptarget --> Apply opt 4 to 0x00002000e0000000 +My libomptarget --> Apply opt 1 to 0x00002000e0000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2800000 to soft device, size=16 +My libomptarget --> Apply opt 4 to 0x00002000e2800000 +My libomptarget --> Apply opt 1 to 0x00002000e2800000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200120000000 to soft device, size=1439998560 +My libomptarget --> Apply opt 4 to 0x0000200120000000 +My libomptarget --> Apply opt 1 to 0x0000200120000000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200175e00000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200175e00000 +My libomptarget --> Apply opt 1 to 0x0000200175e00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017ae00000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017ae00000 +My libomptarget --> Apply opt 
1 to 0x000020017ae00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x0000200178600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x0000200178600000 +My libomptarget --> Apply opt 1 to 0x0000200178600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000 +My libomptarget --> Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000 +My libomptarget --> Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000 +My libomptarget --> Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560 +My libomptarget --> Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16 +My libomptarget --> Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008 +My libomptarget --> COMPUTE (0x00000000100166d9) (#iter: 5000000 device: 0 UM: 0) at 7 +My libomptarget --> Map 0x00002000e2a00000 to soft 
device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00002000e7800000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e7800000 +My libomptarget --> Apply opt 1 to 0x00002000e7800000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000 +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> COMPUTE (0x00000000100166d8) (#iter: 5000000 device: 0 UM: 0) at 8 +My libomptarget --> Map 0x00007fffca5000d8 to device (0x00002001d7600000), size=8 +My libomptarget --> Submit 0x00007fffca5000d8 to 0x00002001d7600000, size=8 +My libomptarget --> Map 0x000020017d600000 to soft device, size=40000000 +My libomptarget --> Apply opt 4 to 0x000020017d600000 +My libomptarget --> Apply opt 1 to 0x000020017d600000 +My libomptarget --> Invalid optimization +My libomptarget --> Map 0x00007fffca5000d0 to device (0x00002001d7600200), size=8 +My libomptarget --> Submit 0x00007fffca5000d0 to 0x00002001d7600200, size=8 +My libomptarget --> Map 0x00002000e2a00000 to soft device, size=80000000 +My libomptarget --> Apply opt 4 to 0x00002000e2a00000 +My libomptarget --> Apply opt 1 to 0x00002000e2a00000 +My libomptarget --> Invalid optimization +My libomptarget --> Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000 +My libomptarget --> Retrieve 0x00007fffca5000d0 from 0x00002001d7600200, size=8 +My libomptarget --> Unmap 0x00007fffca5000d0 from device (0x00002001d7600200), size=8 +My libomptarget --> Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000 +My libomptarget --> Retrieve 0x00007fffca5000d8 from 0x00002001d7600000, size=8 +My libomptarget --> Unmap 0x00007fffca5000d8 from device (0x00002001d7600000), size=8 +Total 
size: 1.67638 +Time: 0.712737 +Modularity: 8.22212e-07, Iterations: 2, Time (in s): 2.37011 +********************************************************************** +==15656== Profiling application: ./miniVite -n 5000000 +==15656== Profiling result: + Type Time(%) Time Calls Avg Min Max Name + GPU activities: 93.47% 443.05ms 2 221.53ms 20.894ms 422.16ms __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367 + 6.26% 29.668ms 2 14.834ms 252.83us 29.415ms __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454 + 0.14% 645.59us 2 322.80us 322.37us 323.23us __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395 + 0.13% 635.20us 2 317.60us 317.05us 318.14us __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435 + 0.00% 14.368us 8 1.7960us 1.6000us 2.0160us [CUDA memcpy DtoH] + 0.00% 7.0720us 5 1.4140us 1.2480us 1.6000us [CUDA memcpy HtoD] + API calls: 40.70% 520.60ms 1 520.60ms 520.60ms 520.60ms cuCtxCreate + 37.11% 474.66ms 8 59.333ms 348.41us 422.24ms cuCtxSynchronize + 17.93% 229.37ms 1 229.37ms 229.37ms 229.37ms cuCtxDestroy + 1.65% 21.044ms 9 2.3382ms 77.557us 20.303ms cuMemAllocManaged + 0.90% 11.558ms 1 11.558ms 11.558ms 11.558ms cuModuleLoadDataEx + 0.57% 7.2276ms 1 7.2276ms 7.2276ms 7.2276ms cuModuleUnload + 0.52% 6.5948ms 8 824.34us 24.212us 6.3617ms cuLaunchKernel + 0.22% 2.7807ms 30 92.689us 3.1470us 1.9706ms cuMemAdvise + 0.20% 2.5650ms 4 641.26us 20.647us 1.3774ms cuMemAlloc + 0.12% 1.5353ms 4 383.83us 30.558us 741.42us cuMemFree + 0.06% 830.25us 8 103.78us 88.896us 125.59us cuMemcpyDtoH + 0.01% 168.38us 5 33.675us 26.187us 48.213us cuMemcpyHtoD + 0.00% 53.465us 34 1.5720us 641ns 4.6320us cuCtxSetCurrent + 0.00% 13.006us 5 2.6010us 2.0720us 3.2640us cuModuleGetGlobal + 0.00% 12.896us 21 614ns 365ns 1.1060us cuDeviceGetAttribute + 0.00% 12.454us 8 1.5560us 890ns 2.4160us cuFuncGetAttribute + 0.00% 11.742us 6 1.9570us 1.1400us 4.6640us cuDeviceGetPCIBusId + 0.00% 
8.6470us 7 1.2350us 597ns 4.3750us cuDeviceGet + 0.00% 7.5310us 4 1.8820us 1.4760us 2.8370us cuModuleGetFunction + 0.00% 1.5450us 3 515ns 329ns 621ns cuDeviceGetCount + +==15656== Unified Memory profiling result: +Device "Tesla V100-SXM2-16GB (0)" + Count Avg Size Min Size Max Size Total Size Total Time Name + 9041 158.13KB 64.000KB 1.3125MB 1.363464GB 51.94749ms Host To Device + 1481 - - - - 448.4253ms Gpu page fault groups + 40 1.9094MB 192.00KB 2.0000MB 76.37500MB - Remote mapping to device +Total CPU Page faults: 5212 +Total remote mappings from CPU: 40 + +------------------------------------------------------------ +Sender: LSF System +Subject: Job 310728: in cluster Done + +Job was submitted from host by user in cluster at Wed Mar 27 17:01:23 2019 +Job was executed on host(s) <1*batch5>, in queue , as user in cluster at Wed Mar 27 17:01:36 2019 + <42*h36n13> + was used as the home directory. + was used as the working directory. +Started at Wed Mar 27 17:01:36 2019 +Terminated at Wed Mar 27 17:01:53 2019 +Results reported at Wed Mar 27 17:01:53 2019 + +The output (if any) is above this job summary. 
+ diff --git a/miniVite/logcmpl b/miniVite/logcmpl new file mode 100644 index 0000000..aa4afc3 --- /dev/null +++ b/miniVite/logcmpl @@ -0,0 +1,5666 @@ +mpicxx -std=c++11 -g -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES -Xclang -load -Xclang ~/git/unifiedmem/code/llvm-pass/build/uvm/libOMPPass.so -c -o main.o main.cpp +In file included from main.cpp:58: +In file included from ./dspl_gpu_kernel.hpp:58: +In file included from ./graph.hpp:56: +./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value] + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^ +./utils.hpp:263:56: note: use function 'std::abs' instead + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^~~~ + std::abs + ---- Function Argument Access Frequency CG Analysis ---- +On function _Z7is_pwr2i +Round 0 +Round end +On function _Z8reseederj +Round 0 +Round end +On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_ +Round 0 + alias entry %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369 + alias entry %19 = bitcast i32** %18 to i64*, !dbg !10369 + alias entry %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376 +Round 1 +Round end + load (6.274510e-01) from %"class.std::seed_seq"* %0 + load (6.274510e-01) from %"class.std::seed_seq"* %0 + Frequency of %"class.std::seed_seq"* %0 + load: 1.254902e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z4lockv +Round 0 +Round end +On function _Z6unlockv +Round 0 +Round end +On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE +Round 0 + alias entry %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + 
Frequency of %class.Graph* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __clang_call_terminate +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined. +Round 0 + alias entry %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %"class.std::vector.10"* %4 + load (6.350000e+00) from %"class.std::vector.15"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 1.270000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %5 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
_Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t +Round 0 + alias entry %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283 + alias entry %10 = bitcast double** %9 to i64*, !dbg !10283 + alias entry %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.10"* %0 + load (1.000000e+00) from %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.ompi_communicator_t* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..2 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %102 = bitcast double* %3 to i64*, !dbg !10325 +Round 1 +Round end + load (3.157895e-01) from %"class.std::vector.10"* %4 + load (2.105263e-01) from double* %3 + store (2.105263e-01) to double* %3 + load (2.105263e-01) from double* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 4.210526e-01 store: 2.105263e-01 (host) 
+ load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z12distInitCommRSt6vectorIlSaIlEES2_l +Round 0 + alias entry %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273 + alias entry %7 = bitcast i64** %6 to i64*, !dbg !10273 + alias entry %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %1 + load (1.000000e+00) from %"class.std::vector.0"* %1 + Frequency of %"class.std::vector.0"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..4 +Round 0 + alias entry %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi +Round 0 + alias entry %16 = 
getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485 + alias entry %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514 + alias entry %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10532 + alias entry %25 = bitcast double** %24 to i64*, !dbg !10532 + alias entry %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536 + alias entry %40 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572 + alias entry %41 = bitcast i64** %40 to i64*, !dbg !10572 + alias entry %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574 + alias entry %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600 + alias entry %57 = bitcast i64** %56 to i64*, !dbg !10600 + alias entry %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601 + alias entry %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622 + alias entry %73 = bitcast double** %72 to i64*, !dbg !10622 + alias entry %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623 + alias entry %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654 + alias entry %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654 + alias entry %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658 + alias entry %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685 + alias entry %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685 + alias entry %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686 +Round 1 +Round end + load 
(1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %"class.std::vector.10"* %3 + load (1.000000e+00) from %"class.std::vector.10"* %3 +Warning: wrong traversal order, or recursive call +On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld +Round 0 + alias entry %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21 + alias entry %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320 + alias entry %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !10330 + alias entry %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335 + alias entry %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340 + alias entry %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352 + alias entry %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !10330 + alias entry %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !10333 + alias entry %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !10340 + alias entry %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !10330 + alias entry %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !10333 + alias entry %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !10340 +Round 1 +Round end + load (1.000000e+00) from double* %2 + load (1.000000e+00) from i32* %3 + load (1.000000e+00) from i32* %1 + load (5.000000e-01) from %struct.clmap_t* %0 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.clmap_t* %0 + load (1.250000e-01) from double* %2 + load 
(9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + load (9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + Frequency of %struct.clmap_t* %0 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.109375e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 4.043750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll +Round 0 + alias entry %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 0, !dbg !10308 + alias entry %22 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 1, !dbg !10310 + alias entry %31 = getelementptr inbounds i64, i64* %7, i64 %30, !dbg !10326 + alias entry %39 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %37, i32 0, !dbg !10337 + alias entry %48 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, !dbg !10348 + alias entry %58 = getelementptr inbounds double, double* %4, i64 %52, !dbg !10358 + alias entry %64 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10364 + alias entry %65 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 1, !dbg !10367 + alias entry %71 = bitcast double* %22 to i64*, !dbg !10375 + alias entry %74 = getelementptr inbounds double, double* %4, i64 %73, !dbg !10377 + alias entry %75 = bitcast double* %74 to i64*, !dbg !10378 +Round 1 +Round end + load 
(1.593750e+01) from %struct.Edge* %6 + load (7.937500e+00) from %struct.Edge* %6 + load (1.593750e+01) from i64* %7 + load (1.593750e+01) from i32* %3 + load (1.625000e+02) from %struct.clmap_t* %2 + load (9.937500e+00) from i32* %5 + load (4.937500e+00) from %struct.Edge* %6 + load (4.937500e+00) from double* %4 + store (4.937500e+00) to double* %4 + store (5.437500e+00) to %struct.clmap_t* %2 + store (5.437500e+00) to %struct.clmap_t* %2 + store (5.437500e+00) to i32* %3 + load (1.093750e+01) from i32* %5 + load (5.437500e+00) from %struct.Edge* %6 + store (5.437500e+00) to double* %4 + store (5.437500e+00) to i32* %5 + Frequency of %struct.clmap_t* %2 + load: 1.625000e+02 store: 1.087500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.593750e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 4.937500e+00 store: 1.037500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %5 + load: 2.087500e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %6 + load: 3.425000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 1.593750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi +Round 0 + alias entry %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316 + alias entry %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322 + alias entry %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329 + alias entry %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332 + alias entry %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337 + alias entry %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341 + alias entry 
%47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401 + alias entry %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403 + alias entry %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414 + alias entry %95 = bitcast double* %48 to i64*, !dbg !10457 + alias entry %118 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470 + alias entry %122 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473 + alias entry %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 1, !dbg !10533 + alias entry %142 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 0, !dbg !10534 + alias entry %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 1, !dbg !10533 + alias entry %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 0, !dbg !10534 + alias entry %236 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 1, !dbg !10572 + alias entry %237 = bitcast double* %236 to i64*, !dbg !10573 + alias entry %248 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 0, !dbg !10575 + alias entry %250 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578 + alias entry %252 = bitcast double* %250 to i64*, !dbg !10581 + alias entry %263 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583 + alias entry %267 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587 + alias entry %270 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 1, !dbg !10533 + alias entry %272 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 0, !dbg !10534 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (1.000000e+00) from i64* %4 + load (1.000000e+00) from i64* %1 + load (1.000000e+00) from i64* %1 + load (5.000000e-01) from %struct.Comm* %7 + load (5.000000e-01) from %struct.Comm* %7 
+ load (7.992188e+00) from %struct.Edge* %3 + load (3.992188e+00) from %struct.Edge* %3 + load (7.992188e+00) from i64* %4 + load (2.492188e+00) from %struct.Edge* %3 + load (2.742188e+00) from %struct.Edge* %3 + load (5.000000e-01) from double* %10 + store (5.000000e-01) to double* %10 + load (5.000000e-01) from double* %6 + load (1.250000e-01) from %struct.Comm* %7 + load (1.250000e-01) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + load (2.500000e-01) from %struct.Comm* %8 + load (2.500000e-01) from double* %6 + load (2.500000e-01) from %struct.Comm* %8 + store (1.000000e+00) to i64* %5 + load (4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + Frequency of i64* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 1.721875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 8.992188e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 1.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 2.121875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 5.000000e-01 store: 5.000000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z21distComputeModularityRK5GraphP4CommPKddi +Round 0 + alias entry %14 = getelementptr inbounds %class.Graph, 
%class.Graph* %0, i64 0, i32 2, !dbg !10288 + alias entry %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317 +Round 1 + base alias entry %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 2 + base alias offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg 
!10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Round 4 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.7 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + 
Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..8 +Round 0 + alias entry %40 = getelementptr inbounds double, double* %6, i64 %39, !dbg !10318 + alias entry %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %39, i32 1, !dbg !10321 + alias entry %63 = bitcast double* %5 to i64*, !dbg !10329 + alias entry %75 = bitcast double* %7 to i64*, !dbg !10329 +Round 1 +Round end + load (1.010526e+01) from double* %6 + load (1.010526e+01) from %struct.Comm* %8 + load (2.105263e-01) from double* %5 + store (2.105263e-01) to double* %5 + load (2.105263e-01) from double* %7 + store (2.105263e-01) to double* %7 + load (2.105263e-01) from double* %5 + load (2.105263e-01) from double* %7 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %7 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.9 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr 
inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..10 +Round 0 + alias entry %67 = bitcast double* %3 to i64*, !dbg !10310 + alias entry %79 = bitcast double* %5 to i64*, !dbg !10310 +Round 1 +Round end + load (2.916667e-01) from double* %3 + store (2.916667e-01) to double* %3 + load (2.916667e-01) from double* %5 + store (2.916667e-01) to double* %5 + load (3.333333e-01) from double* %3 + load (3.333333e-01) from double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z20distUpdateLocalCinfolP4CommPKS_ +Round 0 + base alias entry %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269 + base alias entry %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269 + base alias entry %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 
+Round 1 + base alias entry %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..13 +Round 0 + alias entry %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304 + alias entry %36 = getelementptr 
%struct.Comm, %struct.Comm* %5, i64 %35, i32 1, !dbg !10304 + alias entry %37 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304 + alias entry %38 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %35, i32 1, !dbg !10304 + alias entry %39 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304 + alias entry %41 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %40, !dbg !10304 + alias entry %42 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304 + alias entry %43 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %40, !dbg !10304 + alias entry %44 = bitcast double* %38 to %struct.Comm*, !dbg !10304 + alias entry %46 = bitcast double* %36 to %struct.Comm*, !dbg !10304 + alias entry %49 = bitcast %struct.Comm* %43 to double*, !dbg !10304 + alias entry %51 = bitcast %struct.Comm* %41 to double*, !dbg !10304 + alias entry %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304 + alias entry %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304 + alias entry %69 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %63, i32 0, !dbg !10304 + alias entry %70 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %64, i32 0, !dbg !10304 + alias entry %71 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %65, i32 0, !dbg !10304 + alias entry %72 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %66, i32 0, !dbg !10304 + alias entry %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304 + alias entry %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304 + alias entry %75 = bitcast i64* %69 to <4 x i64>*, !dbg !10304 + alias entry %76 = bitcast i64* %70 to <4 x i64>*, !dbg !10304 + alias entry %77 = bitcast i64* %71 to <4 x i64>*, !dbg !10304 + alias entry %78 = bitcast i64* %72 to <4 x i64>*, !dbg !10304 + alias entry %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307 + alias 
entry %98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307 + alias entry %99 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 0, !dbg !10307 + alias entry %100 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 0, !dbg !10307 + alias entry %101 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 0, !dbg !10307 + alias entry %102 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 0, !dbg !10307 + alias entry %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307 + alias entry %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307 + alias entry %105 = bitcast i64* %99 to <4 x i64>*, !dbg !10307 + alias entry %106 = bitcast i64* %100 to <4 x i64>*, !dbg !10307 + alias entry %107 = bitcast i64* %101 to <4 x i64>*, !dbg !10307 + alias entry %108 = bitcast i64* %102 to <4 x i64>*, !dbg !10307 + alias entry %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309 + alias entry %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309 + alias entry %141 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 1, !dbg !10309 + alias entry %142 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 1, !dbg !10309 + alias entry %143 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 1, !dbg !10309 + alias entry %144 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 1, !dbg !10309 + alias entry %151 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309 + alias entry %152 = bitcast double* %151 to <4 x double>*, !dbg !10309 + alias entry %153 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309 + alias entry %154 = bitcast double* %153 to <4 x double>*, !dbg !10309 + alias entry %155 = getelementptr inbounds double, double* %141, i64 -1, !dbg !10309 + alias entry %156 = bitcast double* %155 to <4 x double>*, !dbg 
!10309 + alias entry %157 = getelementptr inbounds double, double* %142, i64 -1, !dbg !10309 + alias entry %158 = bitcast double* %157 to <4 x double>*, !dbg !10309 + alias entry %159 = getelementptr inbounds double, double* %143, i64 -1, !dbg !10309 + alias entry %160 = bitcast double* %159 to <4 x double>*, !dbg !10309 + alias entry %161 = getelementptr inbounds double, double* %144, i64 -1, !dbg !10309 + alias entry %162 = bitcast double* %161 to <4 x double>*, !dbg !10309 + alias entry %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 0, !dbg !10304 + alias entry %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %182, i32 0, !dbg !10307 + alias entry %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 1, !dbg !10318 + alias entry %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %182, i32 1, !dbg !10309 +Round 1 +Round end + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + Frequency of i32* %0 + load: 
0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 3.317647e+01 store: 3.317647e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 3.317647e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..14 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z16distCleanCWandCUlPdP4Comm +Round 0 + base alias entry %17 = bitcast i8** %16 to double**, !dbg !10269 + base alias entry %19 = bitcast i8** %18 to double**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 + base alias entry %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269 +Round 1 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = 
getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of double* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..18 +Round 0 + alias entry %30 = getelementptr inbounds double, double* %5, i64 %29, !dbg !10304 + alias entry %31 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %29, i32 0, !dbg !10309 + alias entry %34 = bitcast i64* %31 to i8*, !dbg !10299 +Round 1 +Round end + store (1.058333e+01) to double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 
0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..19 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_ +Round 0 + alias entry %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433 + alias entry %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449 + alias entry %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460 + alias entry %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %301 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792 + alias entry %302 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to %"struct.std::__detail::_Hash_node"**, !dbg !11793 + alias entry %312 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836 + alias entry %314 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842 + alias entry %317 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to i8*, !dbg !11846 + alias entry %320 = 
getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0 + alias entry %321 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0 + alias entry %322 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0 + alias entry %323 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %324 = bitcast %"class.std::vector.0"* %323 to i64* + alias entry %325 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %326 = bitcast i64** %325 to i64* + alias entry %330 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0 + alias entry %331 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %332 = bitcast %"class.std::vector.0"* %331 to i64* + alias entry %333 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %334 = bitcast i64** %333 to i64* + alias entry %818 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393 + alias entry %819 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405 + alias entry %820 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to %"struct.std::_Rb_tree_node"**, !dbg !13405 + alias entry %826 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419 + alias entry %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + base alias entry %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + alias entry %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + 
base alias entry %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + alias entry %829 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437 + alias entry %830 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442 + alias entry %831 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447 + alias entry %832 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to %"struct.std::_Rb_tree_node"**, !dbg !13447 + alias entry %838 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452 + alias entry %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + base alias entry %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + alias entry %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + base alias entry %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + alias entry %841 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464 + alias entry %846 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to i64* + alias entry %848 = bitcast %"struct.std::_Rb_tree_node_base"* %826 to %"struct.std::_Rb_tree_node"* + alias entry %850 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to i64* + alias entry %852 = bitcast %"struct.std::_Rb_tree_node_base"* %838 to %"struct.std::_Rb_tree_node"* + alias entry %967 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %827, align 8, !dbg !14017, !tbaa !14018 + alias entry %1023 = load %"struct.std::_Rb_tree_node_base"*, 
%"struct.std::_Rb_tree_node_base"** %839, align 8, !dbg !14306, !tbaa !14018 +Round 1 +Round end + load (1.000000e+00) from i64* %4 + load (9.999994e-01) from i64* %3 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999803e+00) from %"class.std::vector.0"* %6 + load (1.999960e+01) from %"class.std::vector.0"* %6 + load (6.249782e+00) from %"class.std::vector.0"* %5 + load (1.249956e+01) from %"class.std::vector.0"* %5 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (1.999809e+01) from %"class.std::vector.0"* %8 + load (1.999807e+01) from %"class.std::unordered_map"* %12 + load (1.999807e+01) from %"class.std::unordered_map"* %12 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..22 +Round 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %4 + load (3.200000e-01) from %"class.std::vector.0"* %6 + load (1.020000e+01) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 
3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 1.020000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.24 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..25 +Round 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.29"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) + Frequency of %"class.std::vector.29"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.27 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..28 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..30 +Round 0 + alias entry %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503 + alias entry %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, 
i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %2 + load (2.047500e+02) from %"class.std::vector.0"* %4 + load (2.047500e+02) from %"class.std::vector.15"* %7 + load (2.047500e+02) from %"class.std::vector.52"* %6 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %6 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %7 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z22createCommunityMPITypev +Round 0 +Round end +On function _Z23destroyCommunityMPITypev +Round 0 +Round end +On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii +Round 0 + alias entry %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869 + alias entry %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050 + alias entry %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068 + alias entry %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, 
!dbg !11068 + alias entry %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085 + alias entry %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %56 = bitcast %"class.std::vector.0"* %55 to i64* + alias entry %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %58 = bitcast i64** %57 to i64* +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (9.999994e-01) from %class.Graph* %0 + load (9.999994e-01) from %"class.std::map"* %2 + load (1.999985e+01) from %class.Graph* %0 + load (1.999985e+01) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 4.199970e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::map"* %2 + load: 9.999994e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..32 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.137255e-01) from %"class.std::vector.66"* %4 + load (3.137255e-01) from %"class.std::vector.0"* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.137255e-01 store: 0.000000e+00 (host) + 
load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.66"* %4 + load: 3.137255e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.34 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to i64**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..35 +Round 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %6 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 
2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..37 +Round 0 + alias entry %26 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (6.350000e+00) from %"class.std::vector.52"* %3 + load (6.350000e+00) from %"class.std::vector.15"* %4 + load (6.350000e+00) from i64* %5 + load (2.047500e+02) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %3 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.111000e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii +Round 0 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306 + alias entry %17 = 
getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319 + alias entry %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485 + alias entry %52 = bitcast i64** %51 to i64*, !dbg !10485 + alias entry %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489 + alias entry %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10517 + alias entry %72 = bitcast i64** %71 to i64*, !dbg !10517 + alias entry %74 = bitcast %"class.std::vector.0"* %3 to i64*, !dbg !10518 + alias entry %91 = bitcast %"class.std::vector.0"* %3 to i8** + alias entry %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %99 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10612 + alias entry %100 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10612 + alias entry %129 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673 + alias entry %130 = bitcast i64** %129 to i64*, !dbg !10673 + alias entry %132 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674 + alias entry %148 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696 + alias entry %149 = bitcast i64** %148 to i64*, !dbg !10696 + alias entry %151 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697 + alias entry %191 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %251 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %310 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244 + alias entry %311 = getelementptr inbounds 
%"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245 + alias entry %312 = bitcast i64** %310 to i64*, !dbg !11249 + alias entry %314 = bitcast i64** %311 to i64*, !dbg !11250 + alias entry %320 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279 + alias entry %321 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 2, !dbg !11280 + alias entry %322 = bitcast i64** %320 to i64*, !dbg !11284 + alias entry %324 = bitcast i64** %321 to i64*, !dbg !11285 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (9.999984e-01) from %"class.std::vector.0"* %4 + load (9.999984e-01) from %"class.std::vector.0"* %4 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..39 +Round 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6 + alias entry %29 = bitcast %"class.std::vector.0"* %28 to i64* + alias entry %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %31 = bitcast i64** %30 to i64* + alias entry %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.988141e+02) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (1.590478e+03) from %"class.std::vector.29"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 
(host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 9.741684e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.29"* %5 + load: 1.590478e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.41 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..42 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi +Round 0 + alias entry %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180 + alias entry %85 = 
getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380 + alias entry %86 = bitcast i64** %85 to i64*, !dbg !11380 + alias entry %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384 + alias entry %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392 + alias entry %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399 + alias entry %99 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402 + alias entry %100 = bitcast i64** %99 to i64*, !dbg !11402 + alias entry %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403 + alias entry %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410 + alias entry %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417 + alias entry %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424 + alias entry %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424 + alias entry %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428 + alias entry %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440 +Round 1 +Round end + load (9.999981e-01) from %class.Graph* %2 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..45 +Round 0 +Round end + call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5 + call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6 + call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7 + call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8 + call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9 + call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10 + call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using 
%struct.Comm* %11 + call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12 + call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.116667e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %6 + load: 1.058333e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %7 + load: 1.822318e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %8 + load: 9.516732e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %9 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 7.937500e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %11 + load: 2.245651e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %12 + load: 5.291667e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %14 + load: 5.291667e+00 store: 5.291667e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..46 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + 
Frequency of i64* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %5 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %8 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %9 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %10 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %12 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..49 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from i64** %4 + load (3.200000e-01) from i64** %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function main +Round 0 + base alias entry %14 = alloca i8**, 
align 8 + alias entry %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335 +Round 1 +Round end + Frequency of i8** %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGGC2ElP19ompi_communicator_t +Round 0 + alias entry %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266 + alias entry %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + base alias entry %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + alias entry %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279 + alias entry %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278 + alias entry %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282 + alias entry %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284 + alias entry %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287 + alias entry %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320 + alias entry %101 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278 + alias entry %172 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278 + alias entry %184 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579 + alias entry %191 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278 +Round 1 +Round end + store (1.000000e+00) to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) 
to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + store (3.437500e-01) to %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (6.250000e-01) from %class.GenerateRGG* %0 + load (6.250000e-01) from %class.GenerateRGG* %0 + load (6.250000e-01) from %class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 8.906250e+00 store: 6.843750e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.ompi_communicator_t* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGG8generateEbbi +Round 0 + alias entry %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306 + alias entry %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592 + alias entry %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709 + alias entry %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828 + alias entry %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832 + alias entry %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836 + alias entry %430 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915 + alias entry %819 = getelementptr inbounds 
%class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101 + alias entry %895 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1233 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1536 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 +Round 1 +Round end + load (1.000000e+00) from %class.GenerateRGG* %0 + load (6.249994e-01) from %class.GenerateRGG* %0 + load (9.999990e-01) from %class.GenerateRGG* %0 + load (4.999995e-01) from %class.GenerateRGG* %0 + load (3.124994e-01) from %class.GenerateRGG* %0 + load (9.999985e-01) from %class.GenerateRGG* %0 + load (4.999993e-01) from %class.GenerateRGG* %0 + load (3.124992e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (9.999923e-01) from %class.GenerateRGG* %0 + load (9.999914e-01) from %class.GenerateRGG* %0 + load (3.749968e-01) from %class.GenerateRGG* %0 + load (3.749964e-01) from %class.GenerateRGG* %0 + load (9.999890e-01) from %class.GenerateRGG* %0 + load (9.998746e-01) from %class.GenerateRGG* %0 + load (3.199362e+02) from %class.GenerateRGG* %0 + load (3.199361e+02) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + 
load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998698e-01) from %class.GenerateRGG* %0 + load (4.999349e-01) from %class.GenerateRGG* %0 + load (2.499674e-01) from %class.GenerateRGG* %0 + load (7.997451e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (7.997448e+01) from %class.GenerateRGG* %0 + load (4.999063e-01) from %class.GenerateRGG* %0 + load (2.499531e-01) from %class.GenerateRGG* %0 + load (7.996993e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (7.996991e+01) from %class.GenerateRGG* %0 + load (9.998126e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998072e-01) from %class.GenerateRGG* %0 + load (9.998015e-01) from %class.GenerateRGG* %0 + load (6.248724e-01) from %class.GenerateRGG* %0 + 
load (6.248718e-01) from %class.GenerateRGG* %0 + load (1.952724e-01) from %class.GenerateRGG* %0 + load (3.905445e-01) from %class.GenerateRGG* %0 + load (3.905442e-01) from %class.GenerateRGG* %0 + load (6.248393e-01) from %class.GenerateRGG* %0 + load (1.249644e+01) from %class.GenerateRGG* %0 + load (1.249643e+01) from %class.GenerateRGG* %0 + load (1.171538e+00) from %class.GenerateRGG* %0 + load (5.857690e-01) from %class.GenerateRGG* %0 + load (2.928845e-01) from %class.GenerateRGG* %0 + load (1.464422e-01) from %class.GenerateRGG* %0 + load (6.248387e-01) from %class.GenerateRGG* %0 + load (6.248381e-01) from %class.GenerateRGG* %0 + load (1.249638e+01) from %class.GenerateRGG* %0 + load (6.248253e-01) from %class.GenerateRGG* %0 + load (3.905154e-01) from %class.GenerateRGG* %0 + load (2.440719e-01) from %class.GenerateRGG* %0 + load (6.248247e-01) from %class.GenerateRGG* %0 + load (4.881438e+00) from %class.GenerateRGG* %0 + load (9.997431e-01) from %class.GenerateRGG* %0 + load (9.997421e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (6.248378e-01) from %class.GenerateRGG* %0 + load (1.999481e+01) from %class.GenerateRGG* %0 + load (9.997388e-01) from %class.GenerateRGG* %0 + load (9.997385e-01) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 1.248245e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN14BinaryEdgeList4readEiiiSs +Round 0 + alias entry %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380 + alias entry %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388 + alias entry %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514 + alias entry %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515 + alias entry 
%104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518 + alias entry %105 = bitcast i64* %104 to i8*, !dbg !10519 + alias entry %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532 + alias entry %183 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605 +Round 1 +Round end + load (9.999971e-01) from %class.BinaryEdgeList* %0 + load (9.999971e-01) from %"class.std::basic_string"* %4 + load (6.249948e-01) from %class.BinaryEdgeList* %0 + load (9.999905e-01) from %class.BinaryEdgeList* %0 + store (9.999905e-01) to %class.BinaryEdgeList* %0 + load (9.999895e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999729e-01) from %class.BinaryEdgeList* %0 + store (9.999729e-01) to %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999547e-01) from %class.BinaryEdgeList* %0 + load (1.999909e+01) from %class.BinaryEdgeList* %0 + Frequency of %class.BinaryEdgeList* %0 + load: 2.962391e+01 store: 1.999963e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::basic_string"* %4 + load: 9.999971e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E +Round 0 +Round end +Warning: wrong traversal order, or recursive call +On function _ZN5GraphC2EllllP19ompi_communicator_t +Round 0 + alias entry %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272 + alias entry %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272 + alias entry %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg 
!10309 + alias entry %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309 + alias entry %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320 + alias entry %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322 + alias entry %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272 + alias entry %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg !10332 + alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + base alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + alias entry %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336 + alias entry %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361 + alias entry %29 = bitcast i64** %28 to i64*, !dbg !10361 + alias entry %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365 + alias entry %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416 + alias entry %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416 + alias entry %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420 + alias entry %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455 + alias entry %65 = bitcast i64** %64 to i64*, !dbg !10455 + alias entry %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456 + alias entry %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0 + alias entry %111 
= getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511 + alias entry %117 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547 + alias entry %123 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576 +Round 1 +Round end + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + load (9.999990e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 +Warning: wrong traversal order, or recursive call +On function _ZN3LCGC2EjPdlP19ompi_communicator_t +Round 0 + alias entry %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268 + alias entry %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277 + alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + base alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + alias entry %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281 + alias entry %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300 + alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + base alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + alias entry %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306 + alias entry %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305 + alias entry %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309 + alias entry %20 = 
getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326 + alias entry %21 = bitcast i64** %20 to i64*, !dbg !10326 + alias entry %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330 + alias entry %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359 + alias entry %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374 + alias entry %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10399 + alias entry %53 = bitcast i64* %52 to i8*, !dbg !10400 + alias entry %54 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305 +Round 1 +Round end + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + load (9.999989e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 +Warning: wrong traversal order, or recursive call +On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE +Round 0 + alias entry %5 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 1, !dbg !10267 + alias entry %8 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 0, !dbg !10279 + alias entry %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %51 = getelementptr inbounds 
%"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376 +Round 1 +Round end + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1 + store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1 +Warning: wrong traversal order, or recursive call +On function _ZNSt6vectorIlSaIlEEaSERKS1_ +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10278 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10278 + alias entry %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285 + alias entry %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294 + alias entry %13 = bitcast i64** %12 to i64*, !dbg !10294 + alias entry %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296 + alias entry %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10459 + alias entry %41 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490 + alias entry %42 = bitcast i64** %41 to i64*, !dbg !10490 + alias entry %53 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573 + alias entry %73 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633 + alias entry %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %0 + 
load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (9.765625e-02) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %0 + store (6.250000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.695312e+00 store: 1.250000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 1.445312e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl +Round 0 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281 + alias entry %9 = bitcast i64** %8 to i64*, !dbg !10281 + alias entry %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288 + alias entry %12 = bitcast i64** %11 to i64*, !dbg !10288 + alias entry %632 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728 + alias entry %848 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 
+ load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.382812e+00 store: 1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 6.250000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %87 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375 + alias entry %108 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %115 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + Frequency of %"class.std::vector.5"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN3LCG18parallel_prefix_opEv +Round 0 + 
alias entry %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283 + alias entry %169 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10361 + alias entry %175 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269 + alias entry %179 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0 + alias entry %188 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10372 + alias entry %252 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 0, !dbg !10372 +Round 1 +Round end + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + Frequency of %class.LCG* %0 + load: 8.523529e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273 + alias entry %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %65 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369 + alias entry %86 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %93 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from 
%"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + Frequency of %"class.std::vector.84"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag +Round 0 +Round end +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag +Round 0 + alias entry %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344 + alias entry %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344 + alias entry %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351 + alias entry %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351 + alias entry %120 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799 + alias entry %141 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %146 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* 
%0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + Frequency of %"class.std::vector.84"* %0 + load: 2.675781e+00 store: 1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_ +Round 0 +Round end +On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_ +Round 0 +Round end +On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_ +Round 0 +Round end +On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_ +Round 0 +Round end +On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_ +Round 0 +Round end +On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast i64** %8 to i64*, !dbg !10380 + alias entry %21 = 
bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381 + alias entry %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast double** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast double** %8 to i64*, !dbg !10381 + alias entry %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382 + alias entry %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + store (3.125000e-01) to %"class.std::vector.10"* %0 + store 
(3.125000e-01) to %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460 + alias entry %6 = bitcast %struct.Comm** %5 to i64*, !dbg !10460 + alias entry %8 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10467 + alias entry %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551 + alias entry %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + Frequency of %"class.std::vector.15"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_ +Round 0 +Round end + Frequency of %"class.std::unordered_set"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
_ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, !dbg !10275 + alias entry %8 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530 + alias entry %77 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550 + alias entry %83 = bitcast %"struct.std::__detail::_Hash_node"* %3 
to i8*, !dbg !10618 + alias entry %87 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296 + alias entry %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627 + alias entry %95 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628 + base alias entry %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10630 + alias entry %99 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10639 + alias entry %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10640 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641 + alias entry %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641 + alias entry %104 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642 + alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645 + base alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645 + base alias entry %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10676 + base alias entry %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %117, i64 %86, !dbg !10678 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg 
!10509, !tbaa !10511 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525 + base alias offset entry (0) %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10629, !tbaa !10511 +Warning: the first offset is not constant +Warning: the first offset is not constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not constant +Warning: the first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (5.000000e-01) from %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + load (3.749996e+00) from %"class.std::_Hashtable"* %0 + store (3.749996e+00) to %"class.std::_Hashtable"* %0 + load (6.249994e+00) from %"class.std::_Hashtable"* %0 + store (6.249994e+00) to %"class.std::_Hashtable"* %0 + store (4.768372e-07) to %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3 + load (3.749998e-01) from %"class.std::_Hashtable"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3 + store (3.749998e-01) to %"class.std::_Hashtable"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (9.999995e-01) from %"class.std::_Hashtable"* %0 + store (9.999995e-01) to %"class.std::_Hashtable"* %0 + Frequency of %"class.std::_Hashtable"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of 
%"struct.std::__detail::_Hash_node"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275 + alias entry %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, 
i64 0, i32 0, !dbg !10490 + alias entry %77 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510 + alias entry %83 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578 + alias entry %87 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296 + alias entry %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10587 + alias entry %95 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588 + base alias entry %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10590 + alias entry %99 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599 + alias entry %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10600 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601 + alias entry %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601 + alias entry %104 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602 + alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605 + base alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605 + base alias entry %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10630 + base alias entry %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** 
%117, i64 %86, !dbg !10632 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485 + base alias offset entry (0) %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10589, !tbaa !10471 +Warning: the first offset is not constant +Warning: the first offset is not constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not constant +Warning: the first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (5.000000e-01) from %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749996e+00) from %"class.std::_Hashtable.34"* %0 + store (3.749996e+00) to %"class.std::_Hashtable.34"* %0 + load (6.249994e+00) from %"class.std::_Hashtable.34"* %0 + store (6.249994e+00) to %"class.std::_Hashtable.34"* %0 + store (4.768372e-07) to %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + load (3.749998e-01) from %"class.std::_Hashtable.34"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + store (3.749998e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load 
(9.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (9.999995e-01) to %"class.std::_Hashtable.34"* %0 + Frequency of %"class.std::_Hashtable.34"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"struct.std::__detail::_Hash_node.61"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %7 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %8 = bitcast %struct.CommInfo** %7 to i64*, !dbg !10273 + alias entry %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %59 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394 + alias entry %60 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395 + alias entry %81 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %89 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449 + alias entry %143 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + Frequency of %"class.std::vector.52"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
_GLOBAL__sub_I_main.cpp +Round 0 +Round end +On function .omp_offloading.descriptor_unreg +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda +Round 0 +Round end + ---- Identify Target Regions ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg !10317 + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 + target call: %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x 
i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584 + target call: %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0) + to label %326 unwind label %319, !dbg !11667 + ---- Target Distance Calculation ---- +_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges after 3 iterations +target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) +target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) +target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) + ---- OMP (main.cpp, powerpc64le-unknown-linux-gnu) ---- +new entry %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 +new entry %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 +new entry %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 +new entry %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 +new entry %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 +new entry %140 = invoke i8* 
@omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 +new entry %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 +new entry %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 +new entry %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 +Round 0 + base alias entry %130 = bitcast i64** %29 to i8**, !dbg !11450 + base alias entry %142 = bitcast i64** %30 to i8**, !dbg !11479 + alias entry %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487 + alias entry %158 = bitcast i8* %156 to double*, !dbg !11511 + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias entry %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias entry %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias entry %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias entry %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias entry %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias entry %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias entry %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias entry %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias entry %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias entry %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias entry %228 = 
getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias entry %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias entry %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias entry %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias entry %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 +Warning: reach to function declaration __kmpc_fork_teams + alias entry (func arg) %struct.Comm* %1 + alias entry (func arg) double* %2 +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1 +Round 1 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %29 = alloca i64*, align 8 + base alias entry %30 = alloca i64*, align 8 + base alias offset entry (1) %16 = alloca [3 x i8*], align 8 + base alias offset entry (1) %17 = alloca [3 x i8*], align 8 + base alias offset entry (2) %16 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias offset entry (2) %17 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 + base alias offset entry (1) %31 = alloca [12 x i8*], align 8 + base alias offset entry (1) %32 = alloca [12 x i8*], align 8 + base alias offset entry (2) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %208 = 
getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (-1) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (3) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-1) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-3) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-2) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-4) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-4) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, 
i64 5 + base alias offset entry (6) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-5) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-4) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-3) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (6) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-5) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-4) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (7) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-5) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-4) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (7) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-5) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (8) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-6) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-5) 
%230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-1) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (8) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-6) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-5) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-2) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-1) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-8) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-7) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-6) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-3) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-2) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-8) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-7) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-6) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-3) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-2) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base 
alias offset entry (-1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (10) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-8) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-7) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-4) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-3) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-2) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (10) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-8) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-7) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-4) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-3) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-2) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-10) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-9) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-8) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-5) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-4) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base 
alias offset entry (-3) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-1) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-10) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-9) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-8) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-5) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-4) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-3) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-1) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams + alias entry %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451 + alias entry %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451 + alias entry %274 = ptrtoint i64* %263 to i64, !dbg !11584 + alias entry %275 = ptrtoint i64* %264 to i64, !dbg !11584 + base alias entry %215 = bitcast i8** %214 to i64* + base alias entry %217 = bitcast i8** %216 to i64* + base alias entry %220 = bitcast i8** %219 to i64* + base alias entry %222 = bitcast i8** %221 to i64* +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2 +Warning: reach to function declaration __kmpc_fork_call +Round 2 + base alias entry %34 = getelementptr inbounds [5 
x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %126 = bitcast i64** %29 to i8*, !dbg !11447 + base alias entry %139 = bitcast i64** %30 to i8*, !dbg !11477 + base alias offset entry (1) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (2) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (1) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (2) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (1) %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias offset entry (1) %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias offset entry (1) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 
+ base alias offset entry (2) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (3) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (6) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (7) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (8) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (10) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (1) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (2) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (6) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (7) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (8) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (10) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (1) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (2) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (5) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (6) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (7) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (9) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (1) %206 = getelementptr inbounds 
[12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (5) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (6) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (7) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (9) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (1) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (4) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (5) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (6) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (8) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (4) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (5) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (6) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (8) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (4) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (5) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (7) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset 
entry (3) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (4) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (5) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (7) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (3) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (4) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (6) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias entry %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (4) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (6) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias entry %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias entry %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias 
offset entry (2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (5) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias entry %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (1) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (2) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (4) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (1) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (4) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (2) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, 
i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3 +Warning: reach to function declaration __kmpc_fork_call +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %31 = alloca [12 x i8*], align 8 + base alias offset entry (4) %32 = alloca [12 x i8*], align 8 + base 
alias offset entry (5) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-1) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-1) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-2) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-2) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-3) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-4) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-3) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-5) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-4) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-5) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-4) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry 
(-6) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-5) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-6) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-5) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-7) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-6) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-7) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-6) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4 +Warning: reach to function declaration __kmpc_fork_call +Round 4 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (5) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (4) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (5) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (4) 
%205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (3) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (4) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (3) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (1) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (1) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5 +Warning: reach to function declaration __kmpc_fork_call +Round 5 +Warning: reach to function declaration __kmpc_fork_teams +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 
+Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6 +Warning: reach to function declaration __kmpc_fork_call +Round 6 +Warning: reach to function declaration __kmpc_fork_teams +Round end + ---- Access Frequency Analysis ---- + target call (1.625206e+01, 0.000000e+00, 5.076920e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using %145 = invoke 
i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625202e+01, 1.015380e+01, 0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 +Frequency of %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Frequency of %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Frequency of %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Frequency of %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + load: 0.000000e+00 
store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Frequency of %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Frequency of %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Frequency of %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) +Frequency of %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) +Frequency of %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) + ---- Optimization Preparation ---- +Rank 9 for %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Rank 8 for %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Rank 7 for %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Rank 6 for %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label 
%317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) +Rank 5 for %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Rank 4 for %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) +Rank 3 for %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Rank 2 for %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Rank 1 for %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) + ---- Data Mapping Optimization ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg !10317 +@.offload_maptypes = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 33, i64 547, i64 33] + arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 1.600380e+02, 1.280304e+03 after 
adjustment; scaled local reuse is 0x500 + reuse distance is 0x01 + arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.600380e+02, 2.560608e+03 after adjustment; scaled local reuse is 0xa00 + reuse distance is 0x01 + map type changed: @.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465] + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33] + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34] + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 +@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to 
label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 after adjustment; scaled local reuse is 0x0a2 + reuse distance is 0x01 + map type changed: @.offload_maptypes.20.1 = private unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506] + target call: %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584 +@.offload_maptypes.47 = private unnamed_addr constant [12 x i64] [i64 800, i64 33, i64 33, i64 33, i64 33, i64 34, i64 33, i64 33, i64 35, i64 800, i64 35, i64 800] + arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + size is %90 = sub i64 %87, %89, !dbg !11386 + global reuse is 0x05 + local reuse is 1.015380e+01, 8.123040e+01 after adjustment; scaled local reuse is 0x051 + reuse distance is 0x09 + arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + size is %104 = sub i64 %101, %103, !dbg !11404 + global reuse is 0x08 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 3 
(0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + size is %118 = sub i64 %115, %117, !dbg !11430 + global reuse is 0x02 + local reuse is 8.757690e+01, 1.401230e+03 after adjustment; scaled local reuse is 0x579 + reuse distance is 0x09 + arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x03 + local reuse is 4.569230e+01, 3.655384e+02 after adjustment; scaled local reuse is 0x16d + reuse distance is 0x09 + arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x07 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x09 + local reuse is 3.807690e+00, 3.046152e+01 after adjustment; scaled local reuse is 0x01e + reuse distance is 0x09 + arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.078710e+02, 1.725936e+03 after adjustment; scaled local reuse is 0x6bd + reuse distance is 0x01 + arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 
= shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 2.538460e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + map type changed: @.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800] + target call: %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0) + to label %326 unwind label %319, !dbg !11667 +@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 2.030760e+01, 3.249216e+02 after adjustment; scaled local reuse is 0x144 + reuse distance is 0x07 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 after adjustment; scaled local reuse is 0x0a2 + reuse distance is 
0x07 + map type changed: @.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161] +1 warning generated. +In file included from main.cpp:58: +In file included from ./dspl_gpu_kernel.hpp:58: +In file included from ./graph.hpp:56: +./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value] + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^ +./utils.hpp:263:56: note: use function 'std::abs' instead + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^~~~ + std::abs +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale 
from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 + ---- Function Argument Access Frequency CG Analysis ---- +On function __omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396 +Round 0 + alias entry %71 = getelementptr inbounds double, double* %2, i64 %68, !dbg !45 + alias entry %74 = getelementptr inbounds %struct.Comm, %struct.Comm* %4, i64 %68, i32 1, !dbg !52 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (1.600385e+02) from double* %2 + load (1.600385e+02) from %struct.Comm* %4 + load (6.227106e-02) from double* %1 + store (6.227106e-02) to double* %1 + load (6.227106e-02) from double* %3 + store (6.227106e-02) to double* %3 + Frequency of double* %1 + load: 6.227106e-02 store: 6.227106e-02 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.600385e+02 store: 0.000000e+00 (host) + 
load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 6.227106e-02 store: 6.227106e-02 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 1.600385e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436 +Round 0 + alias entry %41 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !45 + alias entry %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 0, !dbg !53 + alias entry %46 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !55 + alias entry %48 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 1, !dbg !57 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (5.076923e+00) from %struct.Comm* %2 + load (5.076923e+00) from %struct.Comm* %1 + store (5.076923e+00) to %struct.Comm* %1 + load (5.076923e+00) from %struct.Comm* %2 + load (5.076923e+00) from %struct.Comm* %1 + store (5.076923e+00) to %struct.Comm* %1 + Frequency of %struct.Comm* %1 + load: 1.015385e+01 store: 1.015385e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 1.015385e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455 +Round 0 + alias entry %41 = getelementptr inbounds double, double* %1, i64 %40, !dbg !45 + alias entry %42 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !52 + alias entry %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !57 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + store (5.076923e+00) to double* %1 + store (5.076923e+00) to %struct.Comm* %2 + store (5.076923e+00) to %struct.Comm* %2 + Frequency of double* %1 + load: 0.000000e+00 store: 5.076923e+00 (host) + load: 
0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 1.015385e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368 +Round 0 +Round end +change loop scale from 32.0 to 1.0 +Warning: wrong traversal order, or recursive call +On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi +Round 0 + alias entry %91 = getelementptr inbounds i64, i64* %2, i64 %90, !dbg !35 + alias entry %93 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !38 + alias entry %96 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !40 + alias entry %99 = getelementptr inbounds i64, i64* %1, i64 %98, !dbg !42 + alias entry %103 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 0, !dbg !45 + alias entry %105 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 1, !dbg !49 + base alias entry %178 = select i1 %119, %struct.Edge** %13, %struct.Edge** %177 + alias entry %188 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 0, !dbg !69 + alias entry %189 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 1, !dbg !70 + alias entry %198 = getelementptr inbounds i64, i64* %4, i64 %197, !dbg !77 + alias entry %239 = bitcast double* %189 to i64*, !dbg !109 + alias entry %282 = getelementptr inbounds double, double* %10, i64 %0, !dbg !122 + alias entry %286 = getelementptr inbounds double, double* %6, i64 %0, !dbg !125 + alias entry %307 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 1, !dbg !136 + alias entry %309 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 0, !dbg !137 + alias entry %355 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 1, !dbg !136 + alias entry %357 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 0, !dbg !137 + alias entry 
%403 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 1, !dbg !167 + alias entry %404 = bitcast double* %403 to i64*, !dbg !168 + alias entry %415 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 0, !dbg !170 + alias entry %417 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 1, !dbg !172 + alias entry %419 = bitcast double* %417 to i64*, !dbg !174 + alias entry %430 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 0, !dbg !176 + alias entry %434 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !179 + alias entry %462 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 1, !dbg !136 + alias entry %464 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 0, !dbg !137 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (1.000000e+00) from i64* %2 + load (1.000000e+00) from i64* %4 + load (1.000000e+00) from i64* %1 + load (1.000000e+00) from i64* %1 + load (5.000000e-01) from %struct.Comm* %7 + load (5.000000e-01) from %struct.Comm* %7 + load (8.000000e+00) from %struct.Edge* %3 + load (4.000000e+00) from %struct.Edge* %3 + load (8.000000e+00) from i64* %4 + load (2.500000e+00) from %struct.Edge* %3 + load (2.750000e+00) from %struct.Edge* %3 + load (5.000000e-01) from double* %10 + store (5.000000e-01) to double* %10 + load (5.000000e-01) from double* %6 + load (1.236264e-01) from %struct.Comm* %7 + load (1.236264e-01) from %struct.Comm* %7 + load (5.000000e+00) from %struct.Comm* %7 + load (5.000000e+00) from %struct.Comm* %7 + load (2.500000e-01) from %struct.Comm* %8 + load (2.500000e-01) from double* %6 + load (2.500000e-01) from %struct.Comm* %8 + store (1.000000e+00) to i64* %5 + load (5.000000e+00) from %struct.Comm* %7 + load (5.000000e+00) from %struct.Comm* %7 + Frequency of i64* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 
store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 1.725000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 9.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 1.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 2.124725e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 5.000000e-01 store: 5.000000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll +Round 0 + base alias entry %83 = select i1 %16, %struct.Edge** %12, %struct.Edge** %82 + alias entry %93 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 0, !dbg !38 + alias entry %94 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 1, !dbg !39 + alias entry %103 = getelementptr inbounds i64, i64* %7, i64 %102, !dbg !48 + alias entry %111 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %110, i32 0, !dbg !53 + alias entry %121 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, !dbg !61 + alias entry %131 = getelementptr inbounds double, double* %4, i64 %125, !dbg !70 + alias entry %138 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 1, !dbg !75 + alias entry %139 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 0, !dbg !76 + alias entry %146 = getelementptr inbounds double, double* %4, i64 %145, !dbg !83 + alias 
entry %147 = bitcast double* %146 to i64*, !dbg !84 + alias entry %148 = bitcast double* %94 to i64*, !dbg !85 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (5.000000e-01) from %struct.Edge* %6 + load (2.472527e-01) from %struct.Edge* %6 + load (5.000000e-01) from i64* %7 + load (5.000000e-01) from i32* %3 + load (5.076923e+00) from %struct.clmap_t* %2 + load (3.076923e-01) from i32* %5 + load (1.538462e-01) from %struct.Edge* %6 + load (1.538462e-01) from double* %4 + store (1.538462e-01) to double* %4 + store (1.703297e-01) to %struct.clmap_t* %2 + store (1.703297e-01) to %struct.clmap_t* %2 + store (1.703297e-01) to i32* %3 + load (3.406593e-01) from i32* %5 + load (1.703297e-01) from %struct.Edge* %6 + store (1.703297e-01) to double* %4 + store (1.703297e-01) to i32* %5 + Frequency of %struct.clmap_t* %2 + load: 5.076923e+00 store: 3.406593e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 5.000000e-01 store: 1.703297e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 1.538462e-01 store: 3.241758e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %5 + load: 6.483516e-01 store: 1.703297e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %6 + load: 1.071429e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld +Round 0 + alias entry %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21 + alias entry %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !36 + alias entry %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !43 + alias entry %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 
%32, i32 0, !dbg !46 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !48 + alias entry %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !52 + alias entry %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !62 + alias entry %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !43 + alias entry %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !46 + alias entry %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !52 + alias entry %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !43 + alias entry %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !46 + alias entry %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !52 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (1.000000e+00) from double* %2 + load (1.000000e+00) from i32* %3 + load (1.000000e+00) from i32* %1 + load (5.000000e-01) from %struct.clmap_t* %0 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.clmap_t* %0 + load (1.250000e-01) from double* %2 + load (3.125000e-01) from %struct.Comm* %5 + load (3.125000e-01) from %struct.Comm* %5 + load (1.562500e-01) from double* %2 + load (3.125000e-01) from %struct.Comm* %5 + load (3.125000e-01) from %struct.Comm* %5 + load (1.562500e-01) from double* %2 + Frequency of %struct.clmap_t* %0 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.437500e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) + Frequency of %struct.Comm* %5 + load: 1.750000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368 +Round 0 +Round end +change loop scale from 32.0 to 1.0 + call (5.076923e+00, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %1 + call (5.076923e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %2 + call (5.076923e+00, 1.725000e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %3 + call (5.076923e+00, 9.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %4 + call (5.076923e+00, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5 + call (5.076923e+00, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %6 + call (5.076923e+00, 2.124725e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %7 + call (5.076923e+00, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %8 + call (5.076923e+00, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %10 + Frequency of i64* %1 + load: 1.015385e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 5.076923e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 8.757692e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 4.569231e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 5.076923e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 3.807692e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 1.078707e+02 store: 0.000000e+00 (host) + load: 
0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 2.538462e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 2.538462e+00 store: 2.538462e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + ---- Identify Target Regions ---- + ---- OMP (main.cpp, nvptx64-nvidia-cuda) ---- +Info: ignore malloc +Info: ignore malloc +Info: ignore malloc +Round 0 +Round end + ---- Access Frequency Analysis ---- + ---- Optimization Preparation ---- + ---- Data Mapping Optimization ---- +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +1 warning generated. 
+ ---- Function Argument Access Frequency CG Analysis ---- +On function _Z7is_pwr2i +Round 0 +Round end +On function _Z8reseederj +Round 0 +Round end +On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_ +Round 0 + alias entry %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369 + alias entry %19 = bitcast i32** %18 to i64*, !dbg !10369 + alias entry %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376 +Round 1 +Round end + load (6.274510e-01) from %"class.std::seed_seq"* %0 + load (6.274510e-01) from %"class.std::seed_seq"* %0 + Frequency of %"class.std::seed_seq"* %0 + load: 1.254902e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z4lockv +Round 0 +Round end +On function _Z6unlockv +Round 0 +Round end +On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE +Round 0 + alias entry %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __clang_call_terminate +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined. 
+Round 0 + alias entry %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %"class.std::vector.10"* %4 + load (6.350000e+00) from %"class.std::vector.15"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 1.270000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %5 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t +Round 0 + alias entry %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283 + alias entry %10 = bitcast double** %9 to i64*, !dbg !10283 + alias entry %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.10"* %0 + load (1.000000e+00) from %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + 
Frequency of %struct.ompi_communicator_t* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..2 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %98 = bitcast double* %3 to i64*, !dbg !10325 +Round 1 +Round end + load (3.157895e-01) from %"class.std::vector.10"* %4 + load (2.105263e-01) from double* %3 + store (2.105263e-01) to double* %3 + load (2.105263e-01) from double* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z12distInitCommRSt6vectorIlSaIlEES2_l +Round 0 + alias entry %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273 + alias entry %7 = bitcast i64** %6 to i64*, !dbg !10273 + alias entry %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %1 + load (1.000000e+00) from 
%"class.std::vector.0"* %1 + Frequency of %"class.std::vector.0"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..4 +Round 0 + alias entry %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi +Round 0 + alias entry %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485 + alias entry %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514 + alias entry %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10532 + alias entry %25 = bitcast double** %24 to i64*, !dbg !10532 + alias entry %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536 + alias entry %40 = getelementptr inbounds 
%"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572 + alias entry %41 = bitcast i64** %40 to i64*, !dbg !10572 + alias entry %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574 + alias entry %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600 + alias entry %57 = bitcast i64** %56 to i64*, !dbg !10600 + alias entry %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601 + alias entry %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622 + alias entry %73 = bitcast double** %72 to i64*, !dbg !10622 + alias entry %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623 + alias entry %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654 + alias entry %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654 + alias entry %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658 + alias entry %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685 + alias entry %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685 + alias entry %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %"class.std::vector.10"* %3 + load (1.000000e+00) from %"class.std::vector.10"* %3 +Warning: wrong traversal order, or recursive call +On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld +Round 0 + alias entry %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21 + alias entry %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320 + alias entry %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, 
i64 %32, i32 1, !dbg !10330 + alias entry %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335 + alias entry %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340 + alias entry %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352 + alias entry %80 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 1, !dbg !10330 + alias entry %82 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 0, !dbg !10333 + alias entry %88 = getelementptr inbounds double, double* %2, i64 %85, !dbg !10340 + alias entry %124 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 1, !dbg !10330 + alias entry %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 0, !dbg !10333 + alias entry %132 = getelementptr inbounds double, double* %2, i64 %129, !dbg !10340 +Round 1 +Round end + load (1.000000e+00) from double* %2 + load (1.000000e+00) from i32* %3 + load (1.000000e+00) from i32* %1 + load (5.000000e-01) from %struct.clmap_t* %0 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.clmap_t* %0 + load (1.250000e-01) from double* %2 + load (9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + load (9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + Frequency of %struct.clmap_t* %0 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.109375e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency 
of i32* %3 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 4.043750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll +Round 0 + alias entry %20 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 0, !dbg !10308 + alias entry %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 1, !dbg !10310 + alias entry %30 = getelementptr inbounds i64, i64* %7, i64 %29, !dbg !10326 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10337 + alias entry %45 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, !dbg !10348 + alias entry %55 = getelementptr inbounds double, double* %4, i64 %49, !dbg !10358 + alias entry %61 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 0, !dbg !10364 + alias entry %62 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 1, !dbg !10367 + alias entry %68 = bitcast double* %21 to i64*, !dbg !10375 + alias entry %71 = getelementptr inbounds double, double* %4, i64 %70, !dbg !10377 + alias entry %72 = bitcast double* %71 to i64*, !dbg !10378 +Round 1 +Round end + load (1.593750e+01) from %struct.Edge* %6 + load (7.937500e+00) from %struct.Edge* %6 + load (1.593750e+01) from i64* %7 + load (1.593750e+01) from i32* %3 + load (1.625000e+02) from %struct.clmap_t* %2 + load (9.937500e+00) from i32* %5 + load (4.937500e+00) from %struct.Edge* %6 + load (4.937500e+00) from double* %4 + store (4.937500e+00) to double* %4 + store (5.437500e+00) to %struct.clmap_t* %2 + store (5.437500e+00) to %struct.clmap_t* %2 + store (5.437500e+00) to i32* %3 + load (1.093750e+01) from i32* %5 + load (5.437500e+00) from %struct.Edge* %6 + store (5.437500e+00) to double* %4 + store (5.437500e+00) to i32* %5 + Frequency of 
%struct.clmap_t* %2 + load: 1.625000e+02 store: 1.087500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.593750e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 4.937500e+00 store: 1.037500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %5 + load: 2.087500e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %6 + load: 3.425000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 1.593750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi +Round 0 + alias entry %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316 + alias entry %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322 + alias entry %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329 + alias entry %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332 + alias entry %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337 + alias entry %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341 + alias entry %47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401 + alias entry %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403 + alias entry %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414 + alias entry %93 = bitcast double* %48 to i64*, !dbg !10457 + alias entry %116 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470 + alias entry %120 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473 + alias entry %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 1, !dbg !10533 + alias entry %139 = getelementptr 
inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 0, !dbg !10534 + alias entry %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 1, !dbg !10533 + alias entry %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 0, !dbg !10534 + alias entry %230 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 1, !dbg !10572 + alias entry %231 = bitcast double* %230 to i64*, !dbg !10573 + alias entry %242 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 0, !dbg !10575 + alias entry %244 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578 + alias entry %246 = bitcast double* %244 to i64*, !dbg !10581 + alias entry %257 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583 + alias entry %261 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587 + alias entry %264 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 1, !dbg !10533 + alias entry %266 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 0, !dbg !10534 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (1.000000e+00) from i64* %4 + load (1.000000e+00) from i64* %1 + load (1.000000e+00) from i64* %1 + load (5.000000e-01) from %struct.Comm* %7 + load (5.000000e-01) from %struct.Comm* %7 + load (7.992188e+00) from %struct.Edge* %3 + load (3.992188e+00) from %struct.Edge* %3 + load (7.992188e+00) from i64* %4 + load (2.492188e+00) from %struct.Edge* %3 + load (2.742188e+00) from %struct.Edge* %3 + load (5.000000e-01) from double* %10 + store (5.000000e-01) to double* %10 + load (5.000000e-01) from double* %6 + load (1.250000e-01) from %struct.Comm* %7 + load (1.250000e-01) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + load (2.500000e-01) from %struct.Comm* %8 + load (2.500000e-01) from double* %6 + load (2.500000e-01) from %struct.Comm* %8 + store 
(1.000000e+00) to i64* %5 + load (4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + Frequency of i64* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 1.721875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 8.992188e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 1.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 2.121875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 5.000000e-01 store: 5.000000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z21distComputeModularityRK5GraphP4CommPKddi +Round 0 + alias entry %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10288 + alias entry %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317 +Round 1 + base alias entry %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + 
base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 2 + base alias offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 
x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Round 4 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.7 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..8 +Round 0 + alias entry %39 = getelementptr inbounds double, double* %6, i64 %38, !dbg !10318 + alias entry %42 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %38, i32 1, !dbg !10321 + alias entry %62 = bitcast double* %5 to i64*, !dbg !10329 + alias entry %74 = bitcast double* %7 to i64*, !dbg !10329 +Round 1 +Round end + load (1.010526e+01) from double* %6 + 
load (1.010526e+01) from %struct.Comm* %8 + load (2.105263e-01) from double* %5 + store (2.105263e-01) to double* %5 + load (2.105263e-01) from double* %7 + store (2.105263e-01) to double* %7 + load (2.105263e-01) from double* %5 + load (2.105263e-01) from double* %7 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %7 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.9 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..10 +Round 0 + alias entry %65 = bitcast double* %3 to i64*, !dbg !10310 + alias entry %77 = bitcast double* %5 to i64*, !dbg 
!10310 +Round 1 +Round end + load (2.916667e-01) from double* %3 + store (2.916667e-01) to double* %3 + load (2.916667e-01) from double* %5 + store (2.916667e-01) to double* %5 + load (3.333333e-01) from double* %3 + load (3.333333e-01) from double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z20distUpdateLocalCinfolP4CommPKS_ +Round 0 + base alias entry %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269 + base alias entry %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269 + base alias entry %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 +Round 1 + base alias entry %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %19 = getelementptr 
inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..13 +Round 0 + alias entry %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304 + alias entry %34 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %25, i32 1, !dbg !10304 + alias entry %35 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304 + alias entry %36 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %25, i32 1, !dbg !10304 + alias entry %37 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304 + alias entry %38 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %29, !dbg !10304 + alias entry %39 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304 + alias entry %40 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %29, !dbg !10304 + alias entry %41 = bitcast double* %36 to 
%struct.Comm*, !dbg !10304 + alias entry %43 = bitcast double* %34 to %struct.Comm*, !dbg !10304 + alias entry %46 = bitcast %struct.Comm* %40 to double*, !dbg !10304 + alias entry %48 = bitcast %struct.Comm* %38 to double*, !dbg !10304 + alias entry %63 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %57, i32 0, !dbg !10304 + alias entry %64 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %58, i32 0, !dbg !10304 + alias entry %65 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %59, i32 0, !dbg !10304 + alias entry %66 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %60, i32 0, !dbg !10304 + alias entry %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304 + alias entry %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304 + alias entry %69 = bitcast i64* %63 to <4 x i64>*, !dbg !10304 + alias entry %70 = bitcast i64* %64 to <4 x i64>*, !dbg !10304 + alias entry %71 = bitcast i64* %65 to <4 x i64>*, !dbg !10304 + alias entry %72 = bitcast i64* %66 to <4 x i64>*, !dbg !10304 + alias entry %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304 + alias entry %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304 + alias entry %93 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 0, !dbg !10307 + alias entry %94 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 0, !dbg !10307 + alias entry %95 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 0, !dbg !10307 + alias entry %96 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 0, !dbg !10307 + alias entry %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307 + alias entry %98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307 + alias entry %99 = bitcast i64* %93 to <4 x i64>*, !dbg !10307 + alias entry %100 = bitcast i64* %94 to <4 x i64>*, !dbg !10307 + alias entry 
%101 = bitcast i64* %95 to <4 x i64>*, !dbg !10307 + alias entry %102 = bitcast i64* %96 to <4 x i64>*, !dbg !10307 + alias entry %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307 + alias entry %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307 + alias entry %135 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 1, !dbg !10309 + alias entry %136 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 1, !dbg !10309 + alias entry %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 1, !dbg !10309 + alias entry %138 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 1, !dbg !10309 + alias entry %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309 + alias entry %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309 + alias entry %147 = getelementptr inbounds double, double* %135, i64 -1, !dbg !10309 + alias entry %148 = bitcast double* %147 to <4 x double>*, !dbg !10309 + alias entry %149 = getelementptr inbounds double, double* %136, i64 -1, !dbg !10309 + alias entry %150 = bitcast double* %149 to <4 x double>*, !dbg !10309 + alias entry %151 = getelementptr inbounds double, double* %137, i64 -1, !dbg !10309 + alias entry %152 = bitcast double* %151 to <4 x double>*, !dbg !10309 + alias entry %153 = getelementptr inbounds double, double* %138, i64 -1, !dbg !10309 + alias entry %154 = bitcast double* %153 to <4 x double>*, !dbg !10309 + alias entry %155 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309 + alias entry %156 = bitcast double* %155 to <4 x double>*, !dbg !10309 + alias entry %157 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309 + alias entry %158 = bitcast double* %157 to <4 x double>*, !dbg !10309 + alias entry %178 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 0, !dbg !10304 + alias entry %180 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, 
i64 %177, i32 0, !dbg !10307 + alias entry %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 1, !dbg !10318 + alias entry %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %177, i32 1, !dbg !10309 +Round 1 +Round end + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 3.317647e+01 store: 3.317647e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 3.317647e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..14 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of 
i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z16distCleanCWandCUlPdP4Comm +Round 0 + base alias entry %17 = bitcast i8** %16 to double**, !dbg !10269 + base alias entry %19 = bitcast i8** %18 to double**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 + base alias entry %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269 +Round 1 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %13 = getelementptr inbounds [3 x i8*], [3 
x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of double* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..18 +Round 0 + alias entry %29 = getelementptr inbounds double, double* %5, i64 %28, !dbg !10304 + alias entry %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %28, i32 0, !dbg !10309 + alias entry %33 = bitcast i64* %30 to i8*, !dbg !10299 +Round 1 +Round end + store (1.058333e+01) to double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..19 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
_Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_ +Round 0 + alias entry %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433 + alias entry %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449 + alias entry %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460 + alias entry %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %299 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792 + alias entry %300 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to %"struct.std::__detail::_Hash_node"**, !dbg !11793 + alias entry %308 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836 + alias entry %310 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842 + alias entry %313 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to i8*, !dbg !11846 + alias entry %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0 + alias entry %317 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0 + alias entry %318 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0 + alias entry %319 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %320 = bitcast %"class.std::vector.0"* %319 to i64* + alias entry %321 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 
0, i32 1 + alias entry %322 = bitcast i64** %321 to i64* + alias entry %325 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0 + alias entry %326 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %327 = bitcast %"class.std::vector.0"* %326 to i64* + alias entry %328 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %329 = bitcast i64** %328 to i64* + alias entry %800 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393 + alias entry %801 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405 + alias entry %802 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to %"struct.std::_Rb_tree_node"**, !dbg !13405 + alias entry %808 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419 + alias entry %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + base alias entry %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + alias entry %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + base alias entry %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + alias entry %811 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437 + alias entry %812 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442 + alias entry %813 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447 + alias entry %814 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to 
%"struct.std::_Rb_tree_node"**, !dbg !13447 + alias entry %820 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452 + alias entry %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + base alias entry %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + alias entry %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + base alias entry %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + alias entry %823 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464 + alias entry %828 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to i64* + alias entry %830 = bitcast %"struct.std::_Rb_tree_node_base"* %808 to %"struct.std::_Rb_tree_node"* + alias entry %832 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to i64* + alias entry %834 = bitcast %"struct.std::_Rb_tree_node_base"* %820 to %"struct.std::_Rb_tree_node"* + alias entry %943 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %809, align 8, !dbg !14017, !tbaa !14018 + alias entry %998 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %821, align 8, !dbg !14306, !tbaa !14018 +Round 1 +Round end + load (1.000000e+00) from i64* %4 + load (9.999994e-01) from i64* %3 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999803e+00) from %"class.std::vector.0"* %6 + load (1.999960e+01) from %"class.std::vector.0"* %6 + load (6.249782e+00) from %"class.std::vector.0"* %5 + load (1.249956e+01) from %"class.std::vector.0"* %5 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load 
(9.999777e-01) from %"class.std::unordered_map"* %12 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (1.999809e+01) from %"class.std::vector.0"* %8 + load (1.999807e+01) from %"class.std::unordered_map"* %12 + load (1.999807e+01) from %"class.std::unordered_map"* %12 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..22 +Round 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %4 + load (3.200000e-01) from %"class.std::vector.0"* %6 + load (1.020000e+01) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 1.020000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.24 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, 
!dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..25 +Round 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.29"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.29"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.27 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 
(host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..28 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..30 +Round 0 + alias entry %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503 + alias entry %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %2 + load (2.047500e+02) from %"class.std::vector.0"* %4 + load (2.047500e+02) from %"class.std::vector.15"* %7 + load (2.047500e+02) from %"class.std::vector.52"* %6 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + 
load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %6 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %7 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z22createCommunityMPITypev +Round 0 +Round end +On function _Z23destroyCommunityMPITypev +Round 0 +Round end +On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii +Round 0 + alias entry %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869 + alias entry %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050 + alias entry %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068 + alias entry %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, !dbg !11068 + alias entry %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085 + alias entry %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %56 = bitcast %"class.std::vector.0"* %55 to i64* + alias entry %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %58 = bitcast i64** %57 to i64* +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (9.999994e-01) from %class.Graph* %0 + load (9.999994e-01) from 
%"class.std::map"* %2 + load (1.999985e+01) from %class.Graph* %0 + load (1.999985e+01) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 4.199970e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::map"* %2 + load: 9.999994e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..32 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.137255e-01) from %"class.std::vector.66"* %4 + load (3.137255e-01) from %"class.std::vector.0"* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.137255e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.66"* %4 + load: 3.137255e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.34 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to i64**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = 
bitcast i8* %10 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..35 +Round 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %6 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..37 +Round 0 + alias entry %26 = getelementptr inbounds 
%"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (6.350000e+00) from %"class.std::vector.52"* %3 + load (6.350000e+00) from %"class.std::vector.15"* %4 + load (6.350000e+00) from i64* %5 + load (2.047500e+02) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %3 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.111000e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii +Round 0 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306 + alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319 + alias entry %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485 + alias entry %52 = bitcast i64** %51 to i64*, !dbg !10485 + alias entry %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489 + alias entry %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10517 + alias entry %72 = bitcast i64** %71 to i64*, !dbg !10517 + alias entry %74 = bitcast 
%"class.std::vector.0"* %3 to i64*, !dbg !10518 + alias entry %91 = bitcast %"class.std::vector.0"* %3 to i8** + alias entry %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %98 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10598 + alias entry %99 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10598 + alias entry %128 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673 + alias entry %129 = bitcast i64** %128 to i64*, !dbg !10673 + alias entry %131 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674 + alias entry %147 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696 + alias entry %148 = bitcast i64** %147 to i64*, !dbg !10696 + alias entry %150 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697 + alias entry %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %249 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %306 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244 + alias entry %307 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245 + alias entry %308 = bitcast i64** %306 to i64*, !dbg !11249 + alias entry %310 = bitcast i64** %307 to i64*, !dbg !11250 + alias entry %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279 + alias entry %317 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 2, !dbg !11280 + alias entry %318 = bitcast i64** %316 to i64*, !dbg !11284 + alias entry %320 = bitcast i64** 
%317 to i64*, !dbg !11285 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (9.999984e-01) from %"class.std::vector.0"* %4 + load (9.999984e-01) from %"class.std::vector.0"* %4 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..39 +Round 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6 + alias entry %29 = bitcast %"class.std::vector.0"* %28 to i64* + alias entry %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %31 = bitcast i64** %30 to i64* + alias entry %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.988141e+02) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (1.590478e+03) from %"class.std::vector.29"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 9.741684e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.29"* %5 + load: 1.590478e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.41 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* 
%0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..42 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi +Round 0 + alias entry %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180 + alias entry %85 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380 + alias entry %86 = bitcast i64** %85 to i64*, !dbg !11380 + alias entry %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384 + alias entry %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392 + alias entry %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399 + alias entry %99 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402 + alias entry %100 = bitcast i64** %99 to i64*, !dbg !11402 + 
alias entry %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403 + alias entry %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410 + alias entry %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417 + alias entry %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424 + alias entry %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424 + alias entry %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428 + alias entry %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440 +Round 1 +Round end + load (9.999981e-01) from %class.Graph* %2 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..45 +Round 0 +Round end + call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5 + call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6 + call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7 + call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8 + call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9 + call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10 + call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %11 + call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12 + call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.116667e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) + Frequency of i64* %6 + load: 1.058333e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %7 + load: 1.822318e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %8 + load: 9.516732e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %9 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 7.937500e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %11 + load: 2.245651e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %12 + load: 5.291667e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %14 + load: 5.291667e+00 store: 5.291667e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..46 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %5 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %8 + load: 0.000000e+00 store: 0.000000e+00 (host) + 
load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %9 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %10 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %12 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..49 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from i64** %4 + load (3.200000e-01) from i64** %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function main +Round 0 + base alias entry %14 = alloca i8**, align 8 + alias entry %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335 +Round 1 +Round end + Frequency of i8** %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGGC2ElP19ompi_communicator_t +Round 0 + alias entry %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266 + alias entry %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + base alias entry %5 = getelementptr inbounds 
%class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + alias entry %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279 + alias entry %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278 + alias entry %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282 + alias entry %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284 + alias entry %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287 + alias entry %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320 + alias entry %100 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278 + alias entry %171 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278 + alias entry %183 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579 + alias entry %190 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278 +Round 1 +Round end + store (1.000000e+00) to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + store (3.437500e-01) to %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + load (5.000000e-01) from 
%class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 8.531250e+00 store: 6.843750e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.ompi_communicator_t* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGG8generateEbbi +Round 0 + alias entry %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306 + alias entry %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592 + alias entry %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709 + alias entry %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828 + alias entry %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832 + alias entry %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836 + alias entry %362 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915 + alias entry %696 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101 + alias entry %772 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1095 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1388 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 +Round 1 +Round end + load (1.000000e+00) from %class.GenerateRGG* %0 + load (6.249994e-01) from %class.GenerateRGG* %0 + load (9.999990e-01) from %class.GenerateRGG* %0 + load 
(4.999995e-01) from %class.GenerateRGG* %0 + load (3.124994e-01) from %class.GenerateRGG* %0 + load (9.999985e-01) from %class.GenerateRGG* %0 + load (4.999993e-01) from %class.GenerateRGG* %0 + load (3.124992e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (9.999923e-01) from %class.GenerateRGG* %0 + load (9.999914e-01) from %class.GenerateRGG* %0 + load (3.749968e-01) from %class.GenerateRGG* %0 + load (3.749964e-01) from %class.GenerateRGG* %0 + load (9.999890e-01) from %class.GenerateRGG* %0 + load (9.998746e-01) from %class.GenerateRGG* %0 + load (3.199362e+02) from %class.GenerateRGG* %0 + load (3.199361e+02) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998698e-01) from %class.GenerateRGG* %0 + load 
(4.999349e-01) from %class.GenerateRGG* %0 + load (2.499674e-01) from %class.GenerateRGG* %0 + load (7.997451e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (7.997448e+01) from %class.GenerateRGG* %0 + load (4.999063e-01) from %class.GenerateRGG* %0 + load (2.499531e-01) from %class.GenerateRGG* %0 + load (7.996993e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (7.996991e+01) from %class.GenerateRGG* %0 + load (9.998126e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998072e-01) from %class.GenerateRGG* %0 + load (9.998015e-01) from %class.GenerateRGG* %0 + load (6.248724e-01) from %class.GenerateRGG* %0 + load (6.248718e-01) from %class.GenerateRGG* %0 + load (1.952724e-01) from %class.GenerateRGG* %0 + load (3.905445e-01) from %class.GenerateRGG* %0 + load (3.905442e-01) from %class.GenerateRGG* %0 + load (6.248393e-01) from %class.GenerateRGG* %0 + load (1.249644e+01) from %class.GenerateRGG* %0 + load (1.249643e+01) from %class.GenerateRGG* %0 + load (1.171538e+00) from %class.GenerateRGG* %0 + load (5.857690e-01) from %class.GenerateRGG* %0 + load (2.928845e-01) from %class.GenerateRGG* %0 + load (1.464422e-01) from %class.GenerateRGG* %0 + load (6.248387e-01) from %class.GenerateRGG* %0 + load 
(6.248381e-01) from %class.GenerateRGG* %0 + load (1.249638e+01) from %class.GenerateRGG* %0 + load (6.248253e-01) from %class.GenerateRGG* %0 + load (3.905154e-01) from %class.GenerateRGG* %0 + load (2.440719e-01) from %class.GenerateRGG* %0 + load (6.248247e-01) from %class.GenerateRGG* %0 + load (4.881438e+00) from %class.GenerateRGG* %0 + load (9.997431e-01) from %class.GenerateRGG* %0 + load (9.997421e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (1.999481e+01) from %class.GenerateRGG* %0 + load (9.997388e-01) from %class.GenerateRGG* %0 + load (9.997385e-01) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 1.246995e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN14BinaryEdgeList4readEiiiSs +Round 0 + alias entry %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380 + alias entry %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388 + alias entry %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514 + alias entry %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515 + alias entry %104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518 + alias entry %105 = bitcast i64* %104 to i8*, !dbg !10519 + alias entry %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532 + alias entry %182 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605 +Round 1 +Round end + load (9.999971e-01) from %class.BinaryEdgeList* %0 + load (9.999971e-01) from %"class.std::basic_string"* %4 + load (6.249948e-01) from %class.BinaryEdgeList* %0 + load (9.999905e-01) from %class.BinaryEdgeList* %0 + store 
(9.999905e-01) to %class.BinaryEdgeList* %0 + load (9.999895e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999729e-01) from %class.BinaryEdgeList* %0 + store (9.999729e-01) to %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999547e-01) from %class.BinaryEdgeList* %0 + load (1.999909e+01) from %class.BinaryEdgeList* %0 + Frequency of %class.BinaryEdgeList* %0 + load: 2.962391e+01 store: 1.999963e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::basic_string"* %4 + load: 9.999971e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E +Round 0 +Round end +Warning: wrong traversal order, or recursive call +On function _ZN5GraphC2EllllP19ompi_communicator_t +Round 0 + alias entry %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272 + alias entry %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272 + alias entry %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10309 + alias entry %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309 + alias entry %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320 + alias entry %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322 + alias entry %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272 + alias entry %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg !10332 + alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + 
base alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + alias entry %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336 + alias entry %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361 + alias entry %29 = bitcast i64** %28 to i64*, !dbg !10361 + alias entry %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365 + alias entry %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416 + alias entry %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416 + alias entry %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420 + alias entry %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455 + alias entry %65 = bitcast i64** %64 to i64*, !dbg !10455 + alias entry %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456 + alias entry %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0 + alias entry %110 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511 + alias entry %116 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547 + alias entry %122 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576 +Round 1 +Round end + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + load (9.999990e-01) from %class.Graph* %0 
+ load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 +Warning: wrong traversal order, or recursive call +On function _ZN3LCGC2EjPdlP19ompi_communicator_t +Round 0 + alias entry %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268 + alias entry %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277 + alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + base alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + alias entry %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281 + alias entry %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300 + alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + base alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + alias entry %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306 + alias entry %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305 + alias entry %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309 + alias entry %20 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326 + alias entry %21 = bitcast i64** %20 to i64*, !dbg !10326 + alias entry %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330 + alias entry %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359 + alias entry %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374 + alias entry %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10399 + alias entry %53 = bitcast i64* %52 to i8*, !dbg !10400 + alias entry %54 = load 
%struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305 +Round 1 +Round end + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + load (9.999989e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 +Warning: wrong traversal order, or recursive call +On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE +Round 0 + alias entry %5 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 1, !dbg !10267 + alias entry %8 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 0, !dbg !10279 + alias entry %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %51 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376 +Round 1 +Round end + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1 + store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1 +Warning: wrong traversal order, or recursive call +On function _ZNSt6vectorIlSaIlEEaSERKS1_ +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, 
!dbg !10278 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10278 + alias entry %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285 + alias entry %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294 + alias entry %13 = bitcast i64** %12 to i64*, !dbg !10294 + alias entry %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296 + alias entry %.phi.trans.insert = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10460 + alias entry %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490 + alias entry %43 = bitcast i64** %42 to i64*, !dbg !10490 + alias entry %54 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573 + alias entry %74 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633 + alias entry %77 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (9.765625e-02) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (6.250000e-01) from 
%"class.std::vector.0"* %0 + store (6.250000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.578125e+00 store: 1.250000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 1.445312e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl +Round 0 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281 + alias entry %9 = bitcast i64** %8 to i64*, !dbg !10281 + alias entry %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288 + alias entry %12 = bitcast i64** %11 to i64*, !dbg !10288 + alias entry %543 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728 + alias entry %729 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.382812e+00 store: 1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency 
of i64* %3 + load: 6.250000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %83 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375 + alias entry %104 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %111 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + Frequency of %"class.std::vector.5"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN3LCG18parallel_prefix_opEv +Round 0 + alias entry %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283 + alias entry %168 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10362 + alias entry %174 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269 + alias entry %178 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0 + alias entry %186 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10373 + alias entry %250 = 
getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 0, !dbg !10373 +Round 1 +Round end + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + Frequency of %class.LCG* %0 + load: 8.523529e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273 + alias entry %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %60 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369 + alias entry %81 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %88 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + Frequency of %"class.std::vector.84"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 
0.000000e+00 store: 0.000000e+00 (target) +On function _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag +Round 0 +Round end +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag +Round 0 + alias entry %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344 + alias entry %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344 + alias entry %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351 + alias entry %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351 + alias entry %116 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799 + alias entry %137 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %142 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + Frequency of 
%"class.std::vector.84"* %0 + load: 2.675781e+00 store: 1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_ +Round 0 +Round end +On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_ +Round 0 +Round end +On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_ +Round 0 +Round end +On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_ +Round 0 +Round end +On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_ +Round 0 +Round end +On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast i64** %8 to i64*, !dbg !10380 + alias entry %21 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381 + alias entry %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) 
from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast double** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast double** %8 to i64*, !dbg !10381 + alias entry %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382 + alias entry %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + store (3.125000e-01) to %"class.std::vector.10"* %0 + store (3.125000e-01) to %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460 + alias entry %6 = bitcast %struct.Comm** %5 to i64*, !dbg !10460 + alias entry %8 = getelementptr inbounds 
%"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10467 + alias entry %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551 + alias entry %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + Frequency of %"class.std::vector.15"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_ +Round 0 +Round end + Frequency of %"class.std::unordered_set"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, !dbg !10275 + alias entry %8 = getelementptr inbounds %"class.std::_Hashtable", 
%"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530 + alias entry %76 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550 + alias entry %82 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i8*, !dbg !10618 + alias entry %86 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296 + alias entry %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627 + alias entry %94 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628 + base alias entry %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10630 + alias entry %98 = getelementptr inbounds 
%"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10639 + alias entry %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10640 + alias entry %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641 + alias entry %103 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642 + alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645 + base alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645 + base alias entry %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10676 + base alias entry %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10678 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10509, !tbaa !10511 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525 + base alias offset entry (0) %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg !10629, !tbaa !10511 +Warning: the first offset is not constant +Warning: the first offset is not constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not constant +Warning: the first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load 
(1.000000e+00) from %"class.std::_Hashtable"* %0 + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (5.000000e-01) from %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + load (3.749996e+00) from %"class.std::_Hashtable"* %0 + store (3.749996e+00) to %"class.std::_Hashtable"* %0 + load (6.249994e+00) from %"class.std::_Hashtable"* %0 + store (6.249994e+00) to %"class.std::_Hashtable"* %0 + store (4.768372e-07) to %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3 + load (3.749998e-01) from %"class.std::_Hashtable"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3 + store (3.749998e-01) to %"class.std::_Hashtable"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (9.999995e-01) from %"class.std::_Hashtable"* %0 + store (9.999995e-01) to %"class.std::_Hashtable"* %0 + Frequency of %"class.std::_Hashtable"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"struct.std::__detail::_Hash_node"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds 
%"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275 + alias entry %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10490 + alias entry %76 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510 + alias entry %82 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578 + alias entry %86 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296 + alias entry %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10587 + alias entry %94 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588 
+ base alias entry %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10590 + alias entry %98 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599 + alias entry %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10600 + alias entry %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601 + alias entry %103 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602 + alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605 + base alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605 + base alias entry %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10630 + base alias entry %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10632 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485 + base alias offset entry (0) %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg !10589, !tbaa !10471 +Warning: the first offset is not constant +Warning: the first offset is not 
constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not constant +Warning: the first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (5.000000e-01) from %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749996e+00) from %"class.std::_Hashtable.34"* %0 + store (3.749996e+00) to %"class.std::_Hashtable.34"* %0 + load (6.249994e+00) from %"class.std::_Hashtable.34"* %0 + store (6.249994e+00) to %"class.std::_Hashtable.34"* %0 + store (4.768372e-07) to %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + load (3.749998e-01) from %"class.std::_Hashtable.34"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + store (3.749998e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load (9.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (9.999995e-01) to %"class.std::_Hashtable.34"* %0 + Frequency of %"class.std::_Hashtable.34"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"struct.std::__detail::_Hash_node.61"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %7 = getelementptr inbounds %"class.std::vector.52", 
%"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %8 = bitcast %struct.CommInfo** %7 to i64*, !dbg !10273 + alias entry %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %54 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394 + alias entry %55 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395 + alias entry %76 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %84 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449 + alias entry %133 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + Frequency of %"class.std::vector.52"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _GLOBAL__sub_I_main.cpp +Round 0 +Round end +On function .omp_offloading.descriptor_unreg +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda +Round 0 +Round end + ---- Identify Target Regions ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull 
%28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317 + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 + target call: %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584 + target call: %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0) + to label %326 unwind label %319, !dbg !11667 + ---- Target Distance Calculation ---- +_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges 
after 3 iterations +target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) +target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) +target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) + ---- OMP (/tmp/main-7b3dc0.bc, powerpc64le-unknown-linux-gnu) ---- +new entry %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 +new entry %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 +new entry %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 +new entry %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 +new entry %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 +new entry %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 +new entry %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 +new entry %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 +new entry %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 +Round 0 + base alias entry %130 = bitcast i64** %29 to i8**, !dbg !11450 + base alias entry %142 = 
bitcast i64** %30 to i8**, !dbg !11479 + alias entry %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487 + alias entry %158 = bitcast i8* %156 to double*, !dbg !11511 + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias entry %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias entry %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias entry %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias entry %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias entry %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias entry %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias entry %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias entry %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias entry %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias entry %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias entry %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias entry %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias entry %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias entry %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias entry %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, 
i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 +Warning: reach to function declaration __kmpc_fork_teams + alias entry (func arg) %struct.Comm* %1 + alias entry (func arg) double* %2 +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1 +Round 1 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %29 = alloca i64*, align 8 + base alias entry %30 = alloca i64*, align 8 + base alias offset entry (1) %16 = alloca [3 x i8*], align 8 + base alias offset entry (1) %17 = alloca [3 x i8*], align 8 + base alias offset entry (2) %16 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias offset entry (2) %17 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 + base alias offset entry (1) %31 = alloca [12 x i8*], align 8 + base alias offset entry (1) %32 = alloca [12 x i8*], align 8 + base alias offset entry (2) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (-1) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (3) %32 = alloca [12 x i8*], align 8 + base alias 
offset entry (-2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-1) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-3) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-2) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-4) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-4) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (6) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-5) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-4) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-3) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (6) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-5) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-4) 
%225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (7) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-5) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-4) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (7) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-5) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (8) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-6) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-5) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-1) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (8) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-6) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry 
(-5) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-2) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-1) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-8) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-7) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-6) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-3) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-2) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-8) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-7) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-6) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-3) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-2) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (10) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-8) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-7) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-4) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 
+ base alias offset entry (-3) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-2) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (10) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-8) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-7) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-4) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-3) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-2) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-10) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-9) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-8) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-5) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-4) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-3) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-1) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-10) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-9) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-8) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-5) %243 = getelementptr inbounds 
[12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-4) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-3) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-1) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams + alias entry %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451 + alias entry %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451 + alias entry %274 = ptrtoint i64* %263 to i64, !dbg !11584 + alias entry %275 = ptrtoint i64* %264 to i64, !dbg !11584 + base alias entry %215 = bitcast i8** %214 to i64* + base alias entry %217 = bitcast i8** %216 to i64* + base alias entry %220 = bitcast i8** %219 to i64* + base alias entry %222 = bitcast i8** %221 to i64* +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2 +Warning: reach to function declaration __kmpc_fork_call +Round 2 + base alias entry %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = 
getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %126 = bitcast i64** %29 to i8*, !dbg !11447 + base alias entry %139 = bitcast i64** %30 to i8*, !dbg !11477 + base alias offset entry (1) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (2) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (1) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (2) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (1) %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias offset entry (1) %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias offset entry (1) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (2) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (3) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (6) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (7) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (8) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (10) %200 = getelementptr inbounds 
[12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (1) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (2) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (6) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (7) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (8) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (10) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (1) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (2) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (5) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (6) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (7) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (9) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (1) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (5) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (6) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (7) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (9) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias 
offset entry (1) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (4) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (5) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (6) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (8) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (4) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (5) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (6) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (8) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (4) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (5) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (7) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (3) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (4) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (5) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (7) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (3) %214 = getelementptr inbounds [12 x i8*], 
[12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (4) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (6) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias entry %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (4) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (6) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias entry %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias entry %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (5) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias entry %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (1) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (2) %224 = getelementptr inbounds [12 x i8*], [12 
x i8*]* %31, i64 0, i64 6 + base alias offset entry (4) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (1) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (4) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (2) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3 +Warning: reach to function declaration __kmpc_fork_call +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, 
!dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %31 = alloca [12 x i8*], align 8 + base alias offset entry (4) %32 = alloca [12 x i8*], align 8 + base alias offset entry (5) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-1) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry 
(-2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-1) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-2) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-2) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-3) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-4) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-3) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-5) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-4) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-5) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-4) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-6) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-5) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-6) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-5) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-7) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-6) %241 = getelementptr inbounds 
[12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-7) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-6) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4 +Warning: reach to function declaration __kmpc_fork_call +Round 4 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (5) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (4) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (5) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (4) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (3) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (4) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (3) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %209 = getelementptr inbounds [12 x i8*], [12 x 
i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (1) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (1) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5 +Warning: reach to function declaration __kmpc_fork_call +Round 5 +Warning: reach to function declaration __kmpc_fork_teams +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6 +Warning: reach to function declaration __kmpc_fork_call +Round 6 +Warning: reach to function declaration __kmpc_fork_teams +Round end + ---- Access Frequency Analysis ---- + target call (1.625206e+01, 
0.000000e+00, 5.076920e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 
signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625202e+01, 1.015380e+01, 0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 +Frequency of %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Frequency of %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Frequency of %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Frequency of %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Frequency of %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Frequency of %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Frequency of %145 = invoke i8* 
@omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) +Frequency of %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) +Frequency of %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) + ---- Optimization Preparation ---- +Rank 9 for %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Rank 8 for %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Rank 7 for %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Rank 6 for %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) +Rank 5 for %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Rank 4 for %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) 
+Rank 3 for %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Rank 2 for %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Rank 1 for %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) + ---- Data Mapping Optimization ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317 +@.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465] + arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 1.600380e+02, 1.280304e+03 after adjustment; scaled local reuse is 0x500 + reuse distance is 0x01 + arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.600380e+02, 2.560608e+03 after adjustment; scaled local reuse is 0xa00 + reuse distance is 0x01 + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull 
@.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33] + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34] + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 +@.offload_maptypes.20.1 = private unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 after adjustment; scaled local reuse is 0x0a2 + reuse distance is 0x01 + target call: %276 = call i32 
@__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584 +@.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800] + arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + size is %90 = sub i64 %87, %89, !dbg !11386 + global reuse is 0x05 + local reuse is 1.015380e+01, 8.123040e+01 after adjustment; scaled local reuse is 0x051 + reuse distance is 0x09 + arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + size is %104 = sub i64 %101, %103, !dbg !11404 + global reuse is 0x08 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 3 (0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + size is %118 = sub i64 %115, %117, !dbg !11430 + global reuse is 0x02 + local reuse is 8.757690e+01, 1.401230e+03 after adjustment; scaled local reuse is 0x579 + reuse distance is 0x09 + arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x03 + local reuse is 4.569230e+01, 3.655384e+02 after 
adjustment; scaled local reuse is 0x16d + reuse distance is 0x09 + arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x07 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x09 + local reuse is 3.807690e+00, 3.046152e+01 after adjustment; scaled local reuse is 0x01e + reuse distance is 0x09 + arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.078710e+02, 1.725936e+03 after adjustment; scaled local reuse is 0x6bd + reuse distance is 0x01 + arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 2.538460e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + target call: %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull 
@.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0) + to label %326 unwind label %319, !dbg !11667 +@.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 2.030760e+01, 3.249216e+02 after adjustment; scaled local reuse is 0x144 + reuse distance is 0x07 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 after adjustment; scaled local reuse is 0x0a2 + reuse distance is 0x07 +mpicxx main.o -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES -o miniVite diff --git a/miniVite/logcmplll b/miniVite/logcmplll new file mode 100644 index 0000000..2c49c24 --- /dev/null +++ b/miniVite/logcmplll @@ -0,0 +1,5651 @@ +mpicxx -std=c++11 -g -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES -Xclang -load -Xclang ~/git/unifiedmem/code/llvm-pass/build/uvm/libOMPPass.so -emit-llvm -S -c -o main.ll main.cpp +In file included from main.cpp:58: +In file included from ./dspl_gpu_kernel.hpp:58: +In file included from ./graph.hpp:56: +./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value] + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^ +./utils.hpp:263:56: 
note: use function 'std::abs' instead + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^~~~ + std::abs + ---- Function Argument Access Frequency CG Analysis ---- +On function _Z7is_pwr2i +Round 0 +Round end +On function _Z8reseederj +Round 0 +Round end +On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_ +Round 0 + alias entry %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369 + alias entry %19 = bitcast i32** %18 to i64*, !dbg !10369 + alias entry %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376 +Round 1 +Round end + load (6.274510e-01) from %"class.std::seed_seq"* %0 + load (6.274510e-01) from %"class.std::seed_seq"* %0 + Frequency of %"class.std::seed_seq"* %0 + load: 1.254902e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z4lockv +Round 0 +Round end +On function _Z6unlockv +Round 0 +Round end +On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE +Round 0 + alias entry %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __clang_call_terminate +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined. 
+Round 0 + alias entry %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %"class.std::vector.10"* %4 + load (6.350000e+00) from %"class.std::vector.15"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 1.270000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %5 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t +Round 0 + alias entry %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283 + alias entry %10 = bitcast double** %9 to i64*, !dbg !10283 + alias entry %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.10"* %0 + load (1.000000e+00) from %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + 
Frequency of %struct.ompi_communicator_t* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..2 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %102 = bitcast double* %3 to i64*, !dbg !10325 +Round 1 +Round end + load (3.157895e-01) from %"class.std::vector.10"* %4 + load (2.105263e-01) from double* %3 + store (2.105263e-01) to double* %3 + load (2.105263e-01) from double* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z12distInitCommRSt6vectorIlSaIlEES2_l +Round 0 + alias entry %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273 + alias entry %7 = bitcast i64** %6 to i64*, !dbg !10273 + alias entry %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %1 + load (1.000000e+00) from 
%"class.std::vector.0"* %1 + Frequency of %"class.std::vector.0"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..4 +Round 0 + alias entry %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi +Round 0 + alias entry %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485 + alias entry %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514 + alias entry %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10532 + alias entry %25 = bitcast double** %24 to i64*, !dbg !10532 + alias entry %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536 + alias entry %40 = getelementptr inbounds 
%"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572 + alias entry %41 = bitcast i64** %40 to i64*, !dbg !10572 + alias entry %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574 + alias entry %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600 + alias entry %57 = bitcast i64** %56 to i64*, !dbg !10600 + alias entry %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601 + alias entry %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622 + alias entry %73 = bitcast double** %72 to i64*, !dbg !10622 + alias entry %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623 + alias entry %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654 + alias entry %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654 + alias entry %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658 + alias entry %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685 + alias entry %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685 + alias entry %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %"class.std::vector.10"* %3 + load (1.000000e+00) from %"class.std::vector.10"* %3 +Warning: wrong traversal order, or recursive call +On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld +Round 0 + alias entry %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21 + alias entry %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320 + alias entry %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, 
i64 %32, i32 1, !dbg !10330 + alias entry %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335 + alias entry %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340 + alias entry %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352 + alias entry %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !10330 + alias entry %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !10333 + alias entry %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !10340 + alias entry %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !10330 + alias entry %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !10333 + alias entry %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !10340 +Round 1 +Round end + load (1.000000e+00) from double* %2 + load (1.000000e+00) from i32* %3 + load (1.000000e+00) from i32* %1 + load (5.000000e-01) from %struct.clmap_t* %0 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.clmap_t* %0 + load (1.250000e-01) from double* %2 + load (9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + load (9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + Frequency of %struct.clmap_t* %0 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.109375e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency 
of i32* %3 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 4.043750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll +Round 0 + alias entry %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 0, !dbg !10308 + alias entry %22 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 1, !dbg !10310 + alias entry %31 = getelementptr inbounds i64, i64* %7, i64 %30, !dbg !10326 + alias entry %39 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %37, i32 0, !dbg !10337 + alias entry %48 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, !dbg !10348 + alias entry %58 = getelementptr inbounds double, double* %4, i64 %52, !dbg !10358 + alias entry %64 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10364 + alias entry %65 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 1, !dbg !10367 + alias entry %71 = bitcast double* %22 to i64*, !dbg !10375 + alias entry %74 = getelementptr inbounds double, double* %4, i64 %73, !dbg !10377 + alias entry %75 = bitcast double* %74 to i64*, !dbg !10378 +Round 1 +Round end + load (1.593750e+01) from %struct.Edge* %6 + load (7.937500e+00) from %struct.Edge* %6 + load (1.593750e+01) from i64* %7 + load (1.593750e+01) from i32* %3 + load (1.625000e+02) from %struct.clmap_t* %2 + load (9.937500e+00) from i32* %5 + load (4.937500e+00) from %struct.Edge* %6 + load (4.937500e+00) from double* %4 + store (4.937500e+00) to double* %4 + store (5.437500e+00) to %struct.clmap_t* %2 + store (5.437500e+00) to %struct.clmap_t* %2 + store (5.437500e+00) to i32* %3 + load (1.093750e+01) from i32* %5 + load (5.437500e+00) from %struct.Edge* %6 + store (5.437500e+00) to double* %4 + store (5.437500e+00) to i32* %5 + Frequency of 
%struct.clmap_t* %2 + load: 1.625000e+02 store: 1.087500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.593750e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 4.937500e+00 store: 1.037500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %5 + load: 2.087500e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %6 + load: 3.425000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 1.593750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi +Round 0 + alias entry %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316 + alias entry %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322 + alias entry %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329 + alias entry %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332 + alias entry %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337 + alias entry %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341 + alias entry %47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401 + alias entry %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403 + alias entry %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414 + alias entry %95 = bitcast double* %48 to i64*, !dbg !10457 + alias entry %118 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470 + alias entry %122 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473 + alias entry %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 1, !dbg !10533 + alias entry %142 = getelementptr 
inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 0, !dbg !10534 + alias entry %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 1, !dbg !10533 + alias entry %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 0, !dbg !10534 + alias entry %236 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 1, !dbg !10572 + alias entry %237 = bitcast double* %236 to i64*, !dbg !10573 + alias entry %248 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 0, !dbg !10575 + alias entry %250 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578 + alias entry %252 = bitcast double* %250 to i64*, !dbg !10581 + alias entry %263 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583 + alias entry %267 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587 + alias entry %270 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 1, !dbg !10533 + alias entry %272 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 0, !dbg !10534 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (1.000000e+00) from i64* %4 + load (1.000000e+00) from i64* %1 + load (1.000000e+00) from i64* %1 + load (5.000000e-01) from %struct.Comm* %7 + load (5.000000e-01) from %struct.Comm* %7 + load (7.992188e+00) from %struct.Edge* %3 + load (3.992188e+00) from %struct.Edge* %3 + load (7.992188e+00) from i64* %4 + load (2.492188e+00) from %struct.Edge* %3 + load (2.742188e+00) from %struct.Edge* %3 + load (5.000000e-01) from double* %10 + store (5.000000e-01) to double* %10 + load (5.000000e-01) from double* %6 + load (1.250000e-01) from %struct.Comm* %7 + load (1.250000e-01) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + load (2.500000e-01) from %struct.Comm* %8 + load (2.500000e-01) from double* %6 + load (2.500000e-01) from %struct.Comm* %8 + store 
(1.000000e+00) to i64* %5 + load (4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + Frequency of i64* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 1.721875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 8.992188e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 1.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 2.121875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 5.000000e-01 store: 5.000000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z21distComputeModularityRK5GraphP4CommPKddi +Round 0 + alias entry %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10288 + alias entry %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317 +Round 1 + base alias entry %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + 
base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 2 + base alias offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 
x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Round 4 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.7 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..8 +Round 0 + alias entry %40 = getelementptr inbounds double, double* %6, i64 %39, !dbg !10318 + alias entry %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %39, i32 1, !dbg !10321 + alias entry %63 = bitcast double* %5 to i64*, !dbg !10329 + alias entry %75 = bitcast double* %7 to i64*, !dbg !10329 +Round 1 +Round end + load (1.010526e+01) from double* %6 + 
load (1.010526e+01) from %struct.Comm* %8 + load (2.105263e-01) from double* %5 + store (2.105263e-01) to double* %5 + load (2.105263e-01) from double* %7 + store (2.105263e-01) to double* %7 + load (2.105263e-01) from double* %5 + load (2.105263e-01) from double* %7 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %7 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.9 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..10 +Round 0 + alias entry %67 = bitcast double* %3 to i64*, !dbg !10310 + alias entry %79 = bitcast double* %5 to i64*, !dbg 
!10310 +Round 1 +Round end + load (2.916667e-01) from double* %3 + store (2.916667e-01) to double* %3 + load (2.916667e-01) from double* %5 + store (2.916667e-01) to double* %5 + load (3.333333e-01) from double* %3 + load (3.333333e-01) from double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z20distUpdateLocalCinfolP4CommPKS_ +Round 0 + base alias entry %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269 + base alias entry %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269 + base alias entry %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 +Round 1 + base alias entry %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %19 = getelementptr 
inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..13 +Round 0 + alias entry %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304 + alias entry %36 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %35, i32 1, !dbg !10304 + alias entry %37 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304 + alias entry %38 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %35, i32 1, !dbg !10304 + alias entry %39 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304 + alias entry %41 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %40, !dbg !10304 + alias entry %42 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304 + alias entry %43 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %40, !dbg !10304 + alias entry %44 = bitcast double* %38 to 
%struct.Comm*, !dbg !10304 + alias entry %46 = bitcast double* %36 to %struct.Comm*, !dbg !10304 + alias entry %49 = bitcast %struct.Comm* %43 to double*, !dbg !10304 + alias entry %51 = bitcast %struct.Comm* %41 to double*, !dbg !10304 + alias entry %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304 + alias entry %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304 + alias entry %69 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %63, i32 0, !dbg !10304 + alias entry %70 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %64, i32 0, !dbg !10304 + alias entry %71 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %65, i32 0, !dbg !10304 + alias entry %72 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %66, i32 0, !dbg !10304 + alias entry %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304 + alias entry %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304 + alias entry %75 = bitcast i64* %69 to <4 x i64>*, !dbg !10304 + alias entry %76 = bitcast i64* %70 to <4 x i64>*, !dbg !10304 + alias entry %77 = bitcast i64* %71 to <4 x i64>*, !dbg !10304 + alias entry %78 = bitcast i64* %72 to <4 x i64>*, !dbg !10304 + alias entry %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307 + alias entry %98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307 + alias entry %99 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 0, !dbg !10307 + alias entry %100 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 0, !dbg !10307 + alias entry %101 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 0, !dbg !10307 + alias entry %102 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 0, !dbg !10307 + alias entry %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307 + alias entry %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307 + alias 
entry %105 = bitcast i64* %99 to <4 x i64>*, !dbg !10307 + alias entry %106 = bitcast i64* %100 to <4 x i64>*, !dbg !10307 + alias entry %107 = bitcast i64* %101 to <4 x i64>*, !dbg !10307 + alias entry %108 = bitcast i64* %102 to <4 x i64>*, !dbg !10307 + alias entry %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309 + alias entry %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309 + alias entry %141 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 1, !dbg !10309 + alias entry %142 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 1, !dbg !10309 + alias entry %143 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 1, !dbg !10309 + alias entry %144 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 1, !dbg !10309 + alias entry %151 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309 + alias entry %152 = bitcast double* %151 to <4 x double>*, !dbg !10309 + alias entry %153 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309 + alias entry %154 = bitcast double* %153 to <4 x double>*, !dbg !10309 + alias entry %155 = getelementptr inbounds double, double* %141, i64 -1, !dbg !10309 + alias entry %156 = bitcast double* %155 to <4 x double>*, !dbg !10309 + alias entry %157 = getelementptr inbounds double, double* %142, i64 -1, !dbg !10309 + alias entry %158 = bitcast double* %157 to <4 x double>*, !dbg !10309 + alias entry %159 = getelementptr inbounds double, double* %143, i64 -1, !dbg !10309 + alias entry %160 = bitcast double* %159 to <4 x double>*, !dbg !10309 + alias entry %161 = getelementptr inbounds double, double* %144, i64 -1, !dbg !10309 + alias entry %162 = bitcast double* %161 to <4 x double>*, !dbg !10309 + alias entry %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 0, !dbg !10304 + alias entry %185 = getelementptr inbounds %struct.Comm, 
%struct.Comm* %5, i64 %182, i32 0, !dbg !10307 + alias entry %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 1, !dbg !10318 + alias entry %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %182, i32 1, !dbg !10309 +Round 1 +Round end + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 3.317647e+01 store: 3.317647e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 3.317647e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..14 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) 
+ Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z16distCleanCWandCUlPdP4Comm +Round 0 + base alias entry %17 = bitcast i8** %16 to double**, !dbg !10269 + base alias entry %19 = bitcast i8** %18 to double**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 + base alias entry %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269 +Round 1 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %13 = getelementptr inbounds 
[3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of double* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..18 +Round 0 + alias entry %30 = getelementptr inbounds double, double* %5, i64 %29, !dbg !10304 + alias entry %31 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %29, i32 0, !dbg !10309 + alias entry %34 = bitcast i64* %31 to i8*, !dbg !10299 +Round 1 +Round end + store (1.058333e+01) to double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..19 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
_Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_ +Round 0 + alias entry %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433 + alias entry %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449 + alias entry %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460 + alias entry %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %301 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792 + alias entry %302 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to %"struct.std::__detail::_Hash_node"**, !dbg !11793 + alias entry %312 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836 + alias entry %314 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842 + alias entry %317 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to i8*, !dbg !11846 + alias entry %320 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0 + alias entry %321 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0 + alias entry %322 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0 + alias entry %323 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %324 = bitcast %"class.std::vector.0"* %323 to i64* + alias entry %325 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 
0, i32 1 + alias entry %326 = bitcast i64** %325 to i64* + alias entry %330 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0 + alias entry %331 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %332 = bitcast %"class.std::vector.0"* %331 to i64* + alias entry %333 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %334 = bitcast i64** %333 to i64* + alias entry %818 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393 + alias entry %819 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405 + alias entry %820 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to %"struct.std::_Rb_tree_node"**, !dbg !13405 + alias entry %826 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419 + alias entry %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + base alias entry %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + alias entry %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + base alias entry %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + alias entry %829 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437 + alias entry %830 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442 + alias entry %831 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447 + alias entry %832 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to 
%"struct.std::_Rb_tree_node"**, !dbg !13447 + alias entry %838 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452 + alias entry %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + base alias entry %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + alias entry %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + base alias entry %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + alias entry %841 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464 + alias entry %846 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to i64* + alias entry %848 = bitcast %"struct.std::_Rb_tree_node_base"* %826 to %"struct.std::_Rb_tree_node"* + alias entry %850 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to i64* + alias entry %852 = bitcast %"struct.std::_Rb_tree_node_base"* %838 to %"struct.std::_Rb_tree_node"* + alias entry %967 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %827, align 8, !dbg !14017, !tbaa !14018 + alias entry %1023 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %839, align 8, !dbg !14306, !tbaa !14018 +Round 1 +Round end + load (1.000000e+00) from i64* %4 + load (9.999994e-01) from i64* %3 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999803e+00) from %"class.std::vector.0"* %6 + load (1.999960e+01) from %"class.std::vector.0"* %6 + load (6.249782e+00) from %"class.std::vector.0"* %5 + load (1.249956e+01) from %"class.std::vector.0"* %5 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load 
(9.999777e-01) from %"class.std::unordered_map"* %12 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (1.999809e+01) from %"class.std::vector.0"* %8 + load (1.999807e+01) from %"class.std::unordered_map"* %12 + load (1.999807e+01) from %"class.std::unordered_map"* %12 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..22 +Round 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %4 + load (3.200000e-01) from %"class.std::vector.0"* %6 + load (1.020000e+01) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 1.020000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.24 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, 
!dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..25 +Round 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.29"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.29"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.27 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 
(host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..28 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..30 +Round 0 + alias entry %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503 + alias entry %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %2 + load (2.047500e+02) from %"class.std::vector.0"* %4 + load (2.047500e+02) from %"class.std::vector.15"* %7 + load (2.047500e+02) from %"class.std::vector.52"* %6 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + 
load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %6 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %7 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z22createCommunityMPITypev +Round 0 +Round end +On function _Z23destroyCommunityMPITypev +Round 0 +Round end +On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii +Round 0 + alias entry %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869 + alias entry %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050 + alias entry %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068 + alias entry %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, !dbg !11068 + alias entry %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085 + alias entry %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %56 = bitcast %"class.std::vector.0"* %55 to i64* + alias entry %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %58 = bitcast i64** %57 to i64* +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (9.999994e-01) from %class.Graph* %0 + load (9.999994e-01) from 
%"class.std::map"* %2 + load (1.999985e+01) from %class.Graph* %0 + load (1.999985e+01) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 4.199970e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::map"* %2 + load: 9.999994e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..32 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.137255e-01) from %"class.std::vector.66"* %4 + load (3.137255e-01) from %"class.std::vector.0"* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.137255e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.66"* %4 + load: 3.137255e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.34 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to i64**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = 
bitcast i8* %10 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..35 +Round 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %6 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..37 +Round 0 + alias entry %26 = getelementptr inbounds 
%"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (6.350000e+00) from %"class.std::vector.52"* %3 + load (6.350000e+00) from %"class.std::vector.15"* %4 + load (6.350000e+00) from i64* %5 + load (2.047500e+02) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %3 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.111000e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii +Round 0 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306 + alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319 + alias entry %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485 + alias entry %52 = bitcast i64** %51 to i64*, !dbg !10485 + alias entry %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489 + alias entry %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10517 + alias entry %72 = bitcast i64** %71 to i64*, !dbg !10517 + alias entry %74 = bitcast 
%"class.std::vector.0"* %3 to i64*, !dbg !10518 + alias entry %91 = bitcast %"class.std::vector.0"* %3 to i8** + alias entry %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %99 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10612 + alias entry %100 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10612 + alias entry %129 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673 + alias entry %130 = bitcast i64** %129 to i64*, !dbg !10673 + alias entry %132 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674 + alias entry %148 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696 + alias entry %149 = bitcast i64** %148 to i64*, !dbg !10696 + alias entry %151 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697 + alias entry %191 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %251 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %310 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244 + alias entry %311 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245 + alias entry %312 = bitcast i64** %310 to i64*, !dbg !11249 + alias entry %314 = bitcast i64** %311 to i64*, !dbg !11250 + alias entry %320 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279 + alias entry %321 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 2, !dbg !11280 + alias entry %322 = bitcast i64** %320 to i64*, !dbg !11284 + alias entry %324 = bitcast 
i64** %321 to i64*, !dbg !11285 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (9.999984e-01) from %"class.std::vector.0"* %4 + load (9.999984e-01) from %"class.std::vector.0"* %4 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..39 +Round 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6 + alias entry %29 = bitcast %"class.std::vector.0"* %28 to i64* + alias entry %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %31 = bitcast i64** %30 to i64* + alias entry %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.988141e+02) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (1.590478e+03) from %"class.std::vector.29"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 9.741684e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.29"* %5 + load: 1.590478e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.41 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from 
i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..42 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi +Round 0 + alias entry %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180 + alias entry %85 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380 + alias entry %86 = bitcast i64** %85 to i64*, !dbg !11380 + alias entry %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384 + alias entry %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392 + alias entry %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399 + alias entry %99 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402 + alias entry %100 = bitcast i64** %99 to i64*, !dbg !11402 + 
alias entry %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403 + alias entry %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410 + alias entry %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417 + alias entry %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424 + alias entry %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424 + alias entry %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428 + alias entry %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440 +Round 1 +Round end + load (9.999981e-01) from %class.Graph* %2 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..45 +Round 0 +Round end + call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5 + call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6 + call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7 + call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8 + call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9 + call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10 + call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %11 + call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12 + call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.116667e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) + Frequency of i64* %6 + load: 1.058333e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %7 + load: 1.822318e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %8 + load: 9.516732e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %9 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 7.937500e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %11 + load: 2.245651e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %12 + load: 5.291667e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %14 + load: 5.291667e+00 store: 5.291667e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..46 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %5 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %8 + load: 0.000000e+00 store: 0.000000e+00 (host) + 
load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %9 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %10 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %12 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..49 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from i64** %4 + load (3.200000e-01) from i64** %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function main +Round 0 + base alias entry %14 = alloca i8**, align 8 + alias entry %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335 +Round 1 +Round end + Frequency of i8** %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGGC2ElP19ompi_communicator_t +Round 0 + alias entry %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266 + alias entry %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + base alias entry %5 = getelementptr inbounds 
%class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + alias entry %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279 + alias entry %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278 + alias entry %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282 + alias entry %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284 + alias entry %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287 + alias entry %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320 + alias entry %101 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278 + alias entry %172 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278 + alias entry %184 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579 + alias entry %191 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278 +Round 1 +Round end + store (1.000000e+00) to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + store (3.437500e-01) to %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (6.250000e-01) from %class.GenerateRGG* %0 + load (6.250000e-01) from %class.GenerateRGG* %0 + load (6.250000e-01) from 
%class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 8.906250e+00 store: 6.843750e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.ompi_communicator_t* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGG8generateEbbi +Round 0 + alias entry %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306 + alias entry %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592 + alias entry %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709 + alias entry %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828 + alias entry %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832 + alias entry %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836 + alias entry %430 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915 + alias entry %819 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101 + alias entry %895 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1233 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1536 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 +Round 1 +Round end + load (1.000000e+00) from %class.GenerateRGG* %0 + load (6.249994e-01) from %class.GenerateRGG* %0 + load (9.999990e-01) from %class.GenerateRGG* %0 + load 
(4.999995e-01) from %class.GenerateRGG* %0 + load (3.124994e-01) from %class.GenerateRGG* %0 + load (9.999985e-01) from %class.GenerateRGG* %0 + load (4.999993e-01) from %class.GenerateRGG* %0 + load (3.124992e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (9.999923e-01) from %class.GenerateRGG* %0 + load (9.999914e-01) from %class.GenerateRGG* %0 + load (3.749968e-01) from %class.GenerateRGG* %0 + load (3.749964e-01) from %class.GenerateRGG* %0 + load (9.999890e-01) from %class.GenerateRGG* %0 + load (9.998746e-01) from %class.GenerateRGG* %0 + load (3.199362e+02) from %class.GenerateRGG* %0 + load (3.199361e+02) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load 
(9.998698e-01) from %class.GenerateRGG* %0 + load (4.999349e-01) from %class.GenerateRGG* %0 + load (2.499674e-01) from %class.GenerateRGG* %0 + load (7.997451e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (7.997448e+01) from %class.GenerateRGG* %0 + load (4.999063e-01) from %class.GenerateRGG* %0 + load (2.499531e-01) from %class.GenerateRGG* %0 + load (7.996993e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (7.996991e+01) from %class.GenerateRGG* %0 + load (9.998126e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998072e-01) from %class.GenerateRGG* %0 + load (9.998015e-01) from %class.GenerateRGG* %0 + load (6.248724e-01) from %class.GenerateRGG* %0 + load (6.248718e-01) from %class.GenerateRGG* %0 + load (1.952724e-01) from %class.GenerateRGG* %0 + load (3.905445e-01) from %class.GenerateRGG* %0 + load (3.905442e-01) from %class.GenerateRGG* %0 + load (6.248393e-01) from %class.GenerateRGG* %0 + load (1.249644e+01) from %class.GenerateRGG* %0 + load (1.249643e+01) from %class.GenerateRGG* %0 + load (1.171538e+00) from %class.GenerateRGG* %0 + load (5.857690e-01) from %class.GenerateRGG* %0 + load (2.928845e-01) from %class.GenerateRGG* %0 + load (1.464422e-01) from %class.GenerateRGG* %0 + load 
(6.248387e-01) from %class.GenerateRGG* %0 + load (6.248381e-01) from %class.GenerateRGG* %0 + load (1.249638e+01) from %class.GenerateRGG* %0 + load (6.248253e-01) from %class.GenerateRGG* %0 + load (3.905154e-01) from %class.GenerateRGG* %0 + load (2.440719e-01) from %class.GenerateRGG* %0 + load (6.248247e-01) from %class.GenerateRGG* %0 + load (4.881438e+00) from %class.GenerateRGG* %0 + load (9.997431e-01) from %class.GenerateRGG* %0 + load (9.997421e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (6.248378e-01) from %class.GenerateRGG* %0 + load (1.999481e+01) from %class.GenerateRGG* %0 + load (9.997388e-01) from %class.GenerateRGG* %0 + load (9.997385e-01) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 1.248245e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN14BinaryEdgeList4readEiiiSs +Round 0 + alias entry %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380 + alias entry %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388 + alias entry %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514 + alias entry %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515 + alias entry %104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518 + alias entry %105 = bitcast i64* %104 to i8*, !dbg !10519 + alias entry %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532 + alias entry %183 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605 +Round 1 +Round end + load (9.999971e-01) from %class.BinaryEdgeList* %0 + load (9.999971e-01) from %"class.std::basic_string"* %4 + load 
(6.249948e-01) from %class.BinaryEdgeList* %0 + load (9.999905e-01) from %class.BinaryEdgeList* %0 + store (9.999905e-01) to %class.BinaryEdgeList* %0 + load (9.999895e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999729e-01) from %class.BinaryEdgeList* %0 + store (9.999729e-01) to %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999547e-01) from %class.BinaryEdgeList* %0 + load (1.999909e+01) from %class.BinaryEdgeList* %0 + Frequency of %class.BinaryEdgeList* %0 + load: 2.962391e+01 store: 1.999963e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::basic_string"* %4 + load: 9.999971e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E +Round 0 +Round end +Warning: wrong traversal order, or recursive call +On function _ZN5GraphC2EllllP19ompi_communicator_t +Round 0 + alias entry %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272 + alias entry %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272 + alias entry %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10309 + alias entry %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309 + alias entry %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320 + alias entry %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322 + alias entry %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272 + alias entry %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg 
!10332 + alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + base alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + alias entry %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336 + alias entry %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361 + alias entry %29 = bitcast i64** %28 to i64*, !dbg !10361 + alias entry %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365 + alias entry %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416 + alias entry %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416 + alias entry %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420 + alias entry %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455 + alias entry %65 = bitcast i64** %64 to i64*, !dbg !10455 + alias entry %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456 + alias entry %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0 + alias entry %111 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511 + alias entry %117 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547 + alias entry %123 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576 +Round 1 +Round end + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store 
(1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + load (9.999990e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 +Warning: wrong traversal order, or recursive call +On function _ZN3LCGC2EjPdlP19ompi_communicator_t +Round 0 + alias entry %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268 + alias entry %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277 + alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + base alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + alias entry %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281 + alias entry %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300 + alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + base alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + alias entry %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306 + alias entry %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305 + alias entry %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309 + alias entry %20 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326 + alias entry %21 = bitcast i64** %20 to i64*, !dbg !10326 + alias entry %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330 + alias entry %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359 + alias entry %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374 + alias entry %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, 
i64 0, i32 5, !dbg !10399 + alias entry %53 = bitcast i64* %52 to i8*, !dbg !10400 + alias entry %54 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305 +Round 1 +Round end + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + load (9.999989e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 +Warning: wrong traversal order, or recursive call +On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE +Round 0 + alias entry %5 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 1, !dbg !10267 + alias entry %8 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 0, !dbg !10279 + alias entry %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %51 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376 +Round 1 +Round end + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1 + store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1 +Warning: wrong traversal order, or recursive call +On function _ZNSt6vectorIlSaIlEEaSERKS1_ +Round 0 + alias entry 
%5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10278 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10278 + alias entry %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285 + alias entry %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294 + alias entry %13 = bitcast i64** %12 to i64*, !dbg !10294 + alias entry %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296 + alias entry %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10459 + alias entry %41 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490 + alias entry %42 = bitcast i64** %41 to i64*, !dbg !10490 + alias entry %53 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573 + alias entry %73 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633 + alias entry %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (9.765625e-02) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %0 + store (6.250000e-01) 
to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.695312e+00 store: 1.250000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 1.445312e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl +Round 0 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281 + alias entry %9 = bitcast i64** %8 to i64*, !dbg !10281 + alias entry %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288 + alias entry %12 = bitcast i64** %11 to i64*, !dbg !10288 + alias entry %632 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728 + alias entry %848 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.382812e+00 store: 1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 6.250000e-01 store: 0.000000e+00 
(host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %87 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375 + alias entry %108 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %115 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + Frequency of %"class.std::vector.5"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN3LCG18parallel_prefix_opEv +Round 0 + alias entry %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283 + alias entry %169 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10361 + alias entry %175 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269 + alias entry %179 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0 + alias entry %188 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10372 + alias entry %252 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, 
i32 0, i32 0, i32 0, !dbg !10372 +Round 1 +Round end + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + Frequency of %class.LCG* %0 + load: 8.523529e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273 + alias entry %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %65 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369 + alias entry %86 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %93 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + Frequency of %"class.std::vector.84"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
_ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag +Round 0 +Round end +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag +Round 0 + alias entry %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344 + alias entry %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344 + alias entry %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351 + alias entry %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351 + alias entry %120 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799 + alias entry %141 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %146 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + Frequency of %"class.std::vector.84"* %0 + load: 2.675781e+00 store: 
1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_ +Round 0 +Round end +On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_ +Round 0 +Round end +On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_ +Round 0 +Round end +On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_ +Round 0 +Round end +On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_ +Round 0 +Round end +On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast i64** %8 to i64*, !dbg !10380 + alias entry %21 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381 + alias entry %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) 
from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast double** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast double** %8 to i64*, !dbg !10381 + alias entry %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382 + alias entry %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + store (3.125000e-01) to %"class.std::vector.10"* %0 + store (3.125000e-01) to %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460 + alias entry %6 = bitcast %struct.Comm** %5 to i64*, !dbg !10460 + alias entry %8 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 
0, i32 1, !dbg !10467 + alias entry %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551 + alias entry %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + Frequency of %"class.std::vector.15"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_ +Round 0 +Round end + Frequency of %"class.std::unordered_set"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, !dbg !10275 + alias entry %8 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = 
getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530 + alias entry %77 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550 + alias entry %83 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i8*, !dbg !10618 + alias entry %87 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296 + alias entry %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627 + alias entry %95 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628 + base alias entry %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10630 + alias entry %99 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, 
!dbg !10639 + alias entry %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10640 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641 + alias entry %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641 + alias entry %104 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642 + alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645 + base alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645 + base alias entry %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10676 + base alias entry %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %117, i64 %86, !dbg !10678 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10509, !tbaa !10511 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525 + base alias offset entry (0) %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10629, !tbaa !10511 +Warning: the first offset is not constant +Warning: the first offset is not constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not constant +Warning: the first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (1.000000e+00) from 
%"class.std::_Hashtable"* %0 + load (5.000000e-01) from %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + load (3.749996e+00) from %"class.std::_Hashtable"* %0 + store (3.749996e+00) to %"class.std::_Hashtable"* %0 + load (6.249994e+00) from %"class.std::_Hashtable"* %0 + store (6.249994e+00) to %"class.std::_Hashtable"* %0 + store (4.768372e-07) to %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3 + load (3.749998e-01) from %"class.std::_Hashtable"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3 + store (3.749998e-01) to %"class.std::_Hashtable"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (9.999995e-01) from %"class.std::_Hashtable"* %0 + store (9.999995e-01) to %"class.std::_Hashtable"* %0 + Frequency of %"class.std::_Hashtable"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"struct.std::__detail::_Hash_node"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275 + 
alias entry %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10490 + alias entry %77 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510 + alias entry %83 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578 + alias entry %87 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296 + alias entry %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10587 + alias entry %95 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588 + base alias entry %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", 
%"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10590 + alias entry %99 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599 + alias entry %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10600 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601 + alias entry %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601 + alias entry %104 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602 + alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605 + base alias entry %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605 + base alias entry %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10630 + base alias entry %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %117, i64 %86, !dbg !10632 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485 + base alias offset entry (0) %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10589, !tbaa !10471 +Warning: the first offset is not constant +Warning: the first offset is not constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not 
constant +Warning: the first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (5.000000e-01) from %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749996e+00) from %"class.std::_Hashtable.34"* %0 + store (3.749996e+00) to %"class.std::_Hashtable.34"* %0 + load (6.249994e+00) from %"class.std::_Hashtable.34"* %0 + store (6.249994e+00) to %"class.std::_Hashtable.34"* %0 + store (4.768372e-07) to %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + load (3.749998e-01) from %"class.std::_Hashtable.34"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + store (3.749998e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load (9.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (9.999995e-01) to %"class.std::_Hashtable.34"* %0 + Frequency of %"class.std::_Hashtable.34"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"struct.std::__detail::_Hash_node.61"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %7 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %8 = bitcast %struct.CommInfo** 
%7 to i64*, !dbg !10273 + alias entry %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %59 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394 + alias entry %60 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395 + alias entry %81 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %89 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449 + alias entry %143 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + Frequency of %"class.std::vector.52"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _GLOBAL__sub_I_main.cpp +Round 0 +Round end +On function .omp_offloading.descriptor_unreg +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda +Round 0 +Round end + ---- Identify Target Regions ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg 
!10317 + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 + target call: %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584 + target call: %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0) + to label %326 unwind label %319, !dbg !11667 + ---- Target Distance Calculation ---- +_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges after 3 iterations +target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) 
(5: 1.000000e+00) +target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) +target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) +target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) + ---- OMP (main.cpp, powerpc64le-unknown-linux-gnu) ---- +new entry %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 +new entry %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 +new entry %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 +new entry %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 +new entry %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 +new entry %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 +new entry %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 +new entry %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 +new entry %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 +Round 0 + base alias entry %130 = bitcast i64** %29 to i8**, !dbg !11450 + base alias entry %142 = bitcast i64** %30 to i8**, !dbg !11479 + alias entry %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487 + alias entry %158 = 
bitcast i8* %156 to double*, !dbg !11511 + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias entry %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias entry %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias entry %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias entry %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias entry %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias entry %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias entry %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias entry %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias entry %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias entry %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias entry %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias entry %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias entry %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias entry %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias entry %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 +Warning: reach to function 
declaration __kmpc_fork_teams + alias entry (func arg) %struct.Comm* %1 + alias entry (func arg) double* %2 +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1 +Round 1 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %29 = alloca i64*, align 8 + base alias entry %30 = alloca i64*, align 8 + base alias offset entry (1) %16 = alloca [3 x i8*], align 8 + base alias offset entry (1) %17 = alloca [3 x i8*], align 8 + base alias offset entry (2) %16 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias offset entry (2) %17 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 + base alias offset entry (1) %31 = alloca [12 x i8*], align 8 + base alias offset entry (1) %32 = alloca [12 x i8*], align 8 + base alias offset entry (2) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (-1) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (3) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-1) 
%212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-3) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-2) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-4) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-4) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (6) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-5) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-4) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-3) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (6) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-5) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-4) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %225 = 
getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (7) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-5) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-4) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (7) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-5) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (8) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-6) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-5) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-1) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (8) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-6) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-5) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-2) 
%231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-1) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-8) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-7) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-6) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-3) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-2) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-8) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-7) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-6) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-3) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-2) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (10) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-8) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-7) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-4) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-3) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + 
base alias offset entry (-2) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (10) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-8) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-7) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-4) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-3) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-2) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-10) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-9) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-8) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-5) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-4) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-3) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-1) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-10) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-9) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-8) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-5) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-4) %243 = getelementptr inbounds 
[12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-3) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-1) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams + alias entry %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451 + alias entry %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451 + alias entry %274 = ptrtoint i64* %263 to i64, !dbg !11584 + alias entry %275 = ptrtoint i64* %264 to i64, !dbg !11584 + base alias entry %215 = bitcast i8** %214 to i64* + base alias entry %217 = bitcast i8** %216 to i64* + base alias entry %220 = bitcast i8** %219 to i64* + base alias entry %222 = bitcast i8** %221 to i64* +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2 +Warning: reach to function declaration __kmpc_fork_call +Round 2 + base alias entry %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) 
%41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %126 = bitcast i64** %29 to i8*, !dbg !11447 + base alias entry %139 = bitcast i64** %30 to i8*, !dbg !11477 + base alias offset entry (1) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (2) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (1) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (2) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (1) %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias offset entry (1) %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias offset entry (1) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (2) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (3) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (6) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (7) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (8) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (10) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (1) %202 = getelementptr inbounds [12 x 
i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (2) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (6) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (7) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (8) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (10) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (1) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (2) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (5) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (6) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (7) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (9) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (1) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (5) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (6) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (7) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (9) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (1) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset 
entry (4) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (5) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (6) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (8) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (4) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (5) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (6) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (8) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (4) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (5) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (7) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (3) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (4) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (5) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (7) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (3) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (4) %214 = getelementptr inbounds [12 x i8*], [12 x 
i8*]* %31, i64 0, i64 4 + base alias offset entry (6) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias entry %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (4) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (6) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias entry %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias entry %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (5) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias entry %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (1) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (2) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (4) %224 = getelementptr inbounds [12 x i8*], [12 x 
i8*]* %31, i64 0, i64 6 + base alias offset entry (1) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (4) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (2) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3 +Warning: reach to function declaration __kmpc_fork_call +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, 
i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %31 = alloca [12 x i8*], align 8 + base alias offset entry (4) %32 = alloca [12 x i8*], align 8 + base alias offset entry (5) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-1) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-1) 
%225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-2) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-2) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-3) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-4) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-3) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-5) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-4) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-5) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-4) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-6) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-5) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-6) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-5) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-7) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-6) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-7) %243 = getelementptr inbounds [12 x 
i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-6) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4 +Warning: reach to function declaration __kmpc_fork_call +Round 4 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (5) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (4) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (5) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (4) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (3) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (4) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (3) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* 
%32, i64 0, i64 2 + base alias offset entry (1) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (1) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5 +Warning: reach to function declaration __kmpc_fork_call +Round 5 +Warning: reach to function declaration __kmpc_fork_teams +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6 +Warning: reach to function declaration __kmpc_fork_call +Round 6 +Warning: reach to function declaration __kmpc_fork_teams +Round end + ---- Access Frequency Analysis ---- + target call (1.625206e+01, 0.000000e+00, 5.076920e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to 
label %146 unwind label %313, !dbg !11486 + target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625202e+01, 1.015380e+01, 
0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 +Frequency of %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Frequency of %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Frequency of %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Frequency of %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Frequency of %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Frequency of %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Frequency of %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 
0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) +Frequency of %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) +Frequency of %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) + ---- Optimization Preparation ---- +Rank 9 for %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Rank 8 for %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Rank 7 for %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Rank 6 for %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) +Rank 5 for %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Rank 4 for %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) +Rank 3 for %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind 
label %303, !dbg !11449 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Rank 2 for %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Rank 1 for %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) + ---- Data Mapping Optimization ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg !10317 +@.offload_maptypes = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 33, i64 547, i64 33] + arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 1.600380e+02, 1.280304e+03 after adjustment; scaled local reuse is 0x500 + reuse distance is 0x01 + arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.600380e+02, 2.560608e+03 after adjustment; scaled local reuse is 0xa00 + reuse distance is 0x01 + map type changed: @.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465] + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull 
@.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33] + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34] + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 +@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 after adjustment; scaled local reuse is 0x0a2 + reuse distance is 0x01 + map type changed: @.offload_maptypes.20.1 = private 
unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506] + target call: %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584 +@.offload_maptypes.47 = private unnamed_addr constant [12 x i64] [i64 800, i64 33, i64 33, i64 33, i64 33, i64 34, i64 33, i64 33, i64 35, i64 800, i64 35, i64 800] + arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + size is %90 = sub i64 %87, %89, !dbg !11386 + global reuse is 0x05 + local reuse is 1.015380e+01, 8.123040e+01 after adjustment; scaled local reuse is 0x051 + reuse distance is 0x09 + arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + size is %104 = sub i64 %101, %103, !dbg !11404 + global reuse is 0x08 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 3 (0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + size is %118 = sub i64 %115, %117, !dbg !11430 + global reuse is 0x02 + local reuse is 8.757690e+01, 1.401230e+03 after adjustment; scaled local reuse is 0x579 + reuse distance is 0x09 + arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x03 + local reuse is 4.569230e+01, 3.655384e+02 
after adjustment; scaled local reuse is 0x16d + reuse distance is 0x09 + arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x07 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x09 + local reuse is 3.807690e+00, 3.046152e+01 after adjustment; scaled local reuse is 0x01e + reuse distance is 0x09 + arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.078710e+02, 1.725936e+03 after adjustment; scaled local reuse is 0x6bd + reuse distance is 0x01 + arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 2.538460e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + map type changed: @.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, 
i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800] + target call: %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0) + to label %326 unwind label %319, !dbg !11667 +@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 2.030760e+01, 3.249216e+02 after adjustment; scaled local reuse is 0x144 + reuse distance is 0x07 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 after adjustment; scaled local reuse is 0x0a2 + reuse distance is 0x07 + map type changed: @.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161] +1 warning generated. 
+In file included from main.cpp:58: +In file included from ./dspl_gpu_kernel.hpp:58: +In file included from ./graph.hpp:56: +./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value] + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^ +./utils.hpp:263:56: note: use function 'std::abs' instead + drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1 + ^~~~ + std::abs +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 
+change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 +change loop scale from 32.0 to 1.0 + ---- Function Argument Access Frequency CG Analysis ---- +On function __omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396 +Round 0 + alias entry %71 = getelementptr inbounds double, double* %2, i64 %68, !dbg !45 + alias entry %74 = getelementptr inbounds %struct.Comm, %struct.Comm* %4, i64 %68, i32 1, !dbg !52 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (1.600385e+02) from double* %2 + load (1.600385e+02) from %struct.Comm* %4 + load (6.227106e-02) from double* %1 + store (6.227106e-02) to double* %1 + load (6.227106e-02) from double* %3 + store (6.227106e-02) to double* %3 + Frequency of double* %1 + load: 6.227106e-02 store: 6.227106e-02 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.600385e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 6.227106e-02 store: 6.227106e-02 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) + Frequency of %struct.Comm* %4 + load: 1.600385e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436 +Round 0 + alias entry %41 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !45 + alias entry %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 0, !dbg !53 + alias entry %46 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !55 + alias entry %48 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 1, !dbg !57 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (5.076923e+00) from %struct.Comm* %2 + load (5.076923e+00) from %struct.Comm* %1 + store (5.076923e+00) to %struct.Comm* %1 + load (5.076923e+00) from %struct.Comm* %2 + load (5.076923e+00) from %struct.Comm* %1 + store (5.076923e+00) to %struct.Comm* %1 + Frequency of %struct.Comm* %1 + load: 1.015385e+01 store: 1.015385e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 1.015385e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function __omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455 +Round 0 + alias entry %41 = getelementptr inbounds double, double* %1, i64 %40, !dbg !45 + alias entry %42 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !52 + alias entry %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !57 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + store (5.076923e+00) to double* %1 + store (5.076923e+00) to %struct.Comm* %2 + store (5.076923e+00) to %struct.Comm* %2 + Frequency of double* %1 + load: 0.000000e+00 store: 5.076923e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 1.015385e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) +On function __omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368 +Round 0 +Round end +change loop scale from 32.0 to 1.0 +Warning: wrong traversal order, or recursive call +On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi +Round 0 + alias entry %91 = getelementptr inbounds i64, i64* %2, i64 %90, !dbg !35 + alias entry %93 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !38 + alias entry %96 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !40 + alias entry %99 = getelementptr inbounds i64, i64* %1, i64 %98, !dbg !42 + alias entry %103 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 0, !dbg !45 + alias entry %105 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 1, !dbg !49 + base alias entry %178 = select i1 %119, %struct.Edge** %13, %struct.Edge** %177 + alias entry %188 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 0, !dbg !69 + alias entry %189 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 1, !dbg !70 + alias entry %198 = getelementptr inbounds i64, i64* %4, i64 %197, !dbg !77 + alias entry %239 = bitcast double* %189 to i64*, !dbg !109 + alias entry %282 = getelementptr inbounds double, double* %10, i64 %0, !dbg !122 + alias entry %286 = getelementptr inbounds double, double* %6, i64 %0, !dbg !125 + alias entry %307 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 1, !dbg !136 + alias entry %309 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 0, !dbg !137 + alias entry %355 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 1, !dbg !136 + alias entry %357 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 0, !dbg !137 + alias entry %403 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 1, !dbg !167 + alias entry %404 = bitcast double* %403 to i64*, !dbg !168 + alias entry 
%415 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 0, !dbg !170 + alias entry %417 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 1, !dbg !172 + alias entry %419 = bitcast double* %417 to i64*, !dbg !174 + alias entry %430 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 0, !dbg !176 + alias entry %434 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !179 + alias entry %462 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 1, !dbg !136 + alias entry %464 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 0, !dbg !137 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (1.000000e+00) from i64* %2 + load (1.000000e+00) from i64* %4 + load (1.000000e+00) from i64* %1 + load (1.000000e+00) from i64* %1 + load (5.000000e-01) from %struct.Comm* %7 + load (5.000000e-01) from %struct.Comm* %7 + load (8.000000e+00) from %struct.Edge* %3 + load (4.000000e+00) from %struct.Edge* %3 + load (8.000000e+00) from i64* %4 + load (2.500000e+00) from %struct.Edge* %3 + load (2.750000e+00) from %struct.Edge* %3 + load (5.000000e-01) from double* %10 + store (5.000000e-01) to double* %10 + load (5.000000e-01) from double* %6 + load (1.236264e-01) from %struct.Comm* %7 + load (1.236264e-01) from %struct.Comm* %7 + load (5.000000e+00) from %struct.Comm* %7 + load (5.000000e+00) from %struct.Comm* %7 + load (2.500000e-01) from %struct.Comm* %8 + load (2.500000e-01) from double* %6 + load (2.500000e-01) from %struct.Comm* %8 + store (1.000000e+00) to i64* %5 + load (5.000000e+00) from %struct.Comm* %7 + load (5.000000e+00) from %struct.Comm* %7 + Frequency of i64* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 1.725000e+01 store: 0.000000e+00 (host) + load: 
0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 9.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 1.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 2.124725e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 5.000000e-01 store: 5.000000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll +Round 0 + base alias entry %83 = select i1 %16, %struct.Edge** %12, %struct.Edge** %82 + alias entry %93 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 0, !dbg !38 + alias entry %94 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 1, !dbg !39 + alias entry %103 = getelementptr inbounds i64, i64* %7, i64 %102, !dbg !48 + alias entry %111 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %110, i32 0, !dbg !53 + alias entry %121 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, !dbg !61 + alias entry %131 = getelementptr inbounds double, double* %4, i64 %125, !dbg !70 + alias entry %138 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 1, !dbg !75 + alias entry %139 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 0, !dbg !76 + alias entry %146 = getelementptr inbounds double, double* %4, i64 %145, !dbg !83 + alias entry %147 = bitcast double* %146 to i64*, !dbg !84 + alias entry %148 = bitcast double* %94 to i64*, !dbg !85 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + 
load (5.000000e-01) from %struct.Edge* %6 + load (2.472527e-01) from %struct.Edge* %6 + load (5.000000e-01) from i64* %7 + load (5.000000e-01) from i32* %3 + load (5.076923e+00) from %struct.clmap_t* %2 + load (3.076923e-01) from i32* %5 + load (1.538462e-01) from %struct.Edge* %6 + load (1.538462e-01) from double* %4 + store (1.538462e-01) to double* %4 + store (1.703297e-01) to %struct.clmap_t* %2 + store (1.703297e-01) to %struct.clmap_t* %2 + store (1.703297e-01) to i32* %3 + load (3.406593e-01) from i32* %5 + load (1.703297e-01) from %struct.Edge* %6 + store (1.703297e-01) to double* %4 + store (1.703297e-01) to i32* %5 + Frequency of %struct.clmap_t* %2 + load: 5.076923e+00 store: 3.406593e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 5.000000e-01 store: 1.703297e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 1.538462e-01 store: 3.241758e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %5 + load: 6.483516e-01 store: 1.703297e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %6 + load: 1.071429e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld +Round 0 + alias entry %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21 + alias entry %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !36 + alias entry %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !43 + alias entry %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !46 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !48 + alias entry %41 = getelementptr inbounds 
double, double* %2, i64 %38, !dbg !52 + alias entry %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !62 + alias entry %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !43 + alias entry %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !46 + alias entry %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !52 + alias entry %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !43 + alias entry %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !46 + alias entry %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !52 +Round 1 +Round end +change loop scale from 32.0 to 1.0 + load (1.000000e+00) from double* %2 + load (1.000000e+00) from i32* %3 + load (1.000000e+00) from i32* %1 + load (5.000000e-01) from %struct.clmap_t* %0 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.clmap_t* %0 + load (1.250000e-01) from double* %2 + load (3.125000e-01) from %struct.Comm* %5 + load (3.125000e-01) from %struct.Comm* %5 + load (1.562500e-01) from double* %2 + load (3.125000e-01) from %struct.Comm* %5 + load (3.125000e-01) from %struct.Comm* %5 + load (1.562500e-01) from double* %2 + Frequency of %struct.clmap_t* %0 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.437500e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 1.750000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368 +Round 0 +Round end +change loop scale from 32.0 to 1.0 + call (5.076923e+00, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %1 + call (5.076923e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %2 + call (5.076923e+00, 1.725000e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %3 + call (5.076923e+00, 9.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %4 + call (5.076923e+00, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5 + call (5.076923e+00, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %6 + call (5.076923e+00, 2.124725e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %7 + call (5.076923e+00, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %8 + call (5.076923e+00, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %10 + Frequency of i64* %1 + load: 1.015385e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 5.076923e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 8.757692e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 4.569231e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 5.076923e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 3.807692e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 1.078707e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 2.538462e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 
0.000000e+00 (target) + Frequency of double* %10 + load: 2.538462e+00 store: 2.538462e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + ---- Identify Target Regions ---- + ---- OMP (main.cpp, nvptx64-nvidia-cuda) ---- +Info: ignore malloc +Info: ignore malloc +Info: ignore malloc +Round 0 +Round end + ---- Access Frequency Analysis ---- + ---- Optimization Preparation ---- + ---- Data Mapping Optimization ---- +1 warning generated. + ---- Function Argument Access Frequency CG Analysis ---- +On function _Z7is_pwr2i +Round 0 +Round end +On function _Z8reseederj +Round 0 +Round end +On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_ +Round 0 + alias entry %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369 + alias entry %19 = bitcast i32** %18 to i64*, !dbg !10369 + alias entry %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376 +Round 1 +Round end + load (6.274510e-01) from %"class.std::seed_seq"* %0 + load (6.274510e-01) from %"class.std::seed_seq"* %0 + Frequency of %"class.std::seed_seq"* %0 + load: 1.254902e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z4lockv +Round 0 +Round end +On function _Z6unlockv +Round 0 +Round end +On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE +Round 0 + alias entry %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) +On function __clang_call_terminate +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined. +Round 0 + alias entry %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %class.Graph* %3 + load (6.350000e+00) from %"class.std::vector.10"* %4 + load (6.350000e+00) from %"class.std::vector.15"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 1.270000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %5 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t +Round 0 + alias entry %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283 + alias entry %10 = bitcast double** %9 to i64*, !dbg !10283 + alias entry %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288 +Round 1 +Round end + load (1.000000e+00) from 
%"class.std::vector.10"* %0 + load (1.000000e+00) from %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.ompi_communicator_t* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..2 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %98 = bitcast double* %3 to i64*, !dbg !10325 +Round 1 +Round end + load (3.157895e-01) from %"class.std::vector.10"* %4 + load (2.105263e-01) from double* %3 + store (2.105263e-01) to double* %3 + load (2.105263e-01) from double* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.10"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z12distInitCommRSt6vectorIlSaIlEES2_l +Round 0 + alias entry %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273 + alias entry %7 = 
bitcast i64** %6 to i64*, !dbg !10273 + alias entry %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %1 + load (1.000000e+00) from %"class.std::vector.0"* %1 + Frequency of %"class.std::vector.0"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..4 +Round 0 + alias entry %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi +Round 0 + alias entry %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485 + alias entry %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514 + alias entry %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, 
i32 0, i32 1, !dbg !10532 + alias entry %25 = bitcast double** %24 to i64*, !dbg !10532 + alias entry %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536 + alias entry %40 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572 + alias entry %41 = bitcast i64** %40 to i64*, !dbg !10572 + alias entry %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574 + alias entry %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600 + alias entry %57 = bitcast i64** %56 to i64*, !dbg !10600 + alias entry %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601 + alias entry %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622 + alias entry %73 = bitcast double** %72 to i64*, !dbg !10622 + alias entry %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623 + alias entry %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654 + alias entry %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654 + alias entry %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658 + alias entry %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685 + alias entry %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685 + alias entry %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %"class.std::vector.10"* %3 + load (1.000000e+00) from %"class.std::vector.10"* %3 +Warning: wrong traversal order, or recursive call +On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld +Round 0 + alias entry %22 = getelementptr inbounds %struct.clmap_t, 
%struct.clmap_t* %0, i64 %21 + alias entry %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320 + alias entry %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !10330 + alias entry %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335 + alias entry %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340 + alias entry %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352 + alias entry %80 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 1, !dbg !10330 + alias entry %82 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 0, !dbg !10333 + alias entry %88 = getelementptr inbounds double, double* %2, i64 %85, !dbg !10340 + alias entry %124 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 1, !dbg !10330 + alias entry %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 0, !dbg !10333 + alias entry %132 = getelementptr inbounds double, double* %2, i64 %129, !dbg !10340 +Round 1 +Round end + load (1.000000e+00) from double* %2 + load (1.000000e+00) from i32* %3 + load (1.000000e+00) from i32* %1 + load (5.000000e-01) from %struct.clmap_t* %0 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.Comm* %5 + load (2.500000e-01) from %struct.clmap_t* %0 + load (1.250000e-01) from double* %2 + load (9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + load (9.984375e+00) from %struct.Comm* %5 + load (9.984375e+00) from %struct.Comm* %5 + load (4.984375e+00) from double* %2 + Frequency of %struct.clmap_t* %0 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 1.000000e+00 
store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 1.109375e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 4.043750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll +Round 0 + alias entry %20 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 0, !dbg !10308 + alias entry %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 1, !dbg !10310 + alias entry %30 = getelementptr inbounds i64, i64* %7, i64 %29, !dbg !10326 + alias entry %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10337 + alias entry %45 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, !dbg !10348 + alias entry %55 = getelementptr inbounds double, double* %4, i64 %49, !dbg !10358 + alias entry %61 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 0, !dbg !10364 + alias entry %62 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 1, !dbg !10367 + alias entry %68 = bitcast double* %21 to i64*, !dbg !10375 + alias entry %71 = getelementptr inbounds double, double* %4, i64 %70, !dbg !10377 + alias entry %72 = bitcast double* %71 to i64*, !dbg !10378 +Round 1 +Round end + load (1.593750e+01) from %struct.Edge* %6 + load (7.937500e+00) from %struct.Edge* %6 + load (1.593750e+01) from i64* %7 + load (1.593750e+01) from i32* %3 + load (1.625000e+02) from %struct.clmap_t* %2 + load (9.937500e+00) from i32* %5 + load (4.937500e+00) from %struct.Edge* %6 + load (4.937500e+00) from double* %4 + store (4.937500e+00) to double* %4 + store (5.437500e+00) to %struct.clmap_t* %2 + store (5.437500e+00) to 
%struct.clmap_t* %2 + store (5.437500e+00) to i32* %3 + load (1.093750e+01) from i32* %5 + load (5.437500e+00) from %struct.Edge* %6 + store (5.437500e+00) to double* %4 + store (5.437500e+00) to i32* %5 + Frequency of %struct.clmap_t* %2 + load: 1.625000e+02 store: 1.087500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %3 + load: 1.593750e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 4.937500e+00 store: 1.037500e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %5 + load: 2.087500e+01 store: 5.437500e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %6 + load: 3.425000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 1.593750e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi +Round 0 + alias entry %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316 + alias entry %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322 + alias entry %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329 + alias entry %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332 + alias entry %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337 + alias entry %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341 + alias entry %47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401 + alias entry %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403 + alias entry %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414 + alias entry %93 = bitcast double* %48 to i64*, !dbg !10457 + alias entry %116 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470 + alias 
entry %120 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473 + alias entry %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 1, !dbg !10533 + alias entry %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 0, !dbg !10534 + alias entry %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 1, !dbg !10533 + alias entry %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 0, !dbg !10534 + alias entry %230 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 1, !dbg !10572 + alias entry %231 = bitcast double* %230 to i64*, !dbg !10573 + alias entry %242 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 0, !dbg !10575 + alias entry %244 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578 + alias entry %246 = bitcast double* %244 to i64*, !dbg !10581 + alias entry %257 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583 + alias entry %261 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587 + alias entry %264 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 1, !dbg !10533 + alias entry %266 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 0, !dbg !10534 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (1.000000e+00) from i64* %4 + load (1.000000e+00) from i64* %1 + load (1.000000e+00) from i64* %1 + load (5.000000e-01) from %struct.Comm* %7 + load (5.000000e-01) from %struct.Comm* %7 + load (7.992188e+00) from %struct.Edge* %3 + load (3.992188e+00) from %struct.Edge* %3 + load (7.992188e+00) from i64* %4 + load (2.492188e+00) from %struct.Edge* %3 + load (2.742188e+00) from %struct.Edge* %3 + load (5.000000e-01) from double* %10 + store (5.000000e-01) to double* %10 + load (5.000000e-01) from double* %6 + load (1.250000e-01) from %struct.Comm* %7 + load (1.250000e-01) from %struct.Comm* %7 + load 
(4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + load (2.500000e-01) from %struct.Comm* %8 + load (2.500000e-01) from double* %6 + load (2.500000e-01) from %struct.Comm* %8 + store (1.000000e+00) to i64* %5 + load (4.992188e+00) from %struct.Comm* %7 + load (4.992188e+00) from %struct.Comm* %7 + Frequency of i64* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %3 + load: 1.721875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 8.992188e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 0.000000e+00 store: 1.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 7.500000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %7 + load: 2.121875e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 5.000000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 5.000000e-01 store: 5.000000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z21distComputeModularityRK5GraphP4CommPKddi +Round 0 + alias entry %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10288 + alias entry %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg 
!10317 +Round 1 + base alias entry %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 2 + base alias offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x 
i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Round 4 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.7 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..8 +Round 0 + alias entry %39 = getelementptr inbounds double, double* %6, i64 %38, !dbg !10318 + alias entry %42 = getelementptr inbounds %struct.Comm, 
%struct.Comm* %8, i64 %38, i32 1, !dbg !10321 + alias entry %62 = bitcast double* %5 to i64*, !dbg !10329 + alias entry %74 = bitcast double* %7 to i64*, !dbg !10329 +Round 1 +Round end + load (1.010526e+01) from double* %6 + load (1.010526e+01) from %struct.Comm* %8 + load (2.105263e-01) from double* %5 + store (2.105263e-01) to double* %5 + load (2.105263e-01) from double* %7 + store (2.105263e-01) to double* %7 + load (2.105263e-01) from double* %5 + load (2.105263e-01) from double* %7 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %6 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %7 + load: 4.210526e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %8 + load: 1.010526e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.9 +Round 0 + alias entry %3 = bitcast i8* %1 to double**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to double**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + alias entry %8 = bitcast i8* %7 to double**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to double**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 
store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..10 +Round 0 + alias entry %65 = bitcast double* %3 to i64*, !dbg !10310 + alias entry %77 = bitcast double* %5 to i64*, !dbg !10310 +Round 1 +Round end + load (2.916667e-01) from double* %3 + store (2.916667e-01) to double* %3 + load (2.916667e-01) from double* %5 + store (2.916667e-01) to double* %5 + load (3.333333e-01) from double* %3 + load (3.333333e-01) from double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 6.250000e-01 store: 2.916667e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z20distUpdateLocalCinfolP4CommPKS_ +Round 0 + base alias entry %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269 + base alias entry %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269 + base alias entry %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 +Round 1 + base alias entry %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + 
base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of %struct.Comm* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..13 +Round 0 + alias entry %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304 + alias entry %34 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %25, i32 1, !dbg !10304 + alias entry %35 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304 + alias entry %36 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %25, i32 1, !dbg !10304 + alias entry %37 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304 + alias entry %38 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %29, !dbg 
!10304 + alias entry %39 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304 + alias entry %40 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %29, !dbg !10304 + alias entry %41 = bitcast double* %36 to %struct.Comm*, !dbg !10304 + alias entry %43 = bitcast double* %34 to %struct.Comm*, !dbg !10304 + alias entry %46 = bitcast %struct.Comm* %40 to double*, !dbg !10304 + alias entry %48 = bitcast %struct.Comm* %38 to double*, !dbg !10304 + alias entry %63 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %57, i32 0, !dbg !10304 + alias entry %64 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %58, i32 0, !dbg !10304 + alias entry %65 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %59, i32 0, !dbg !10304 + alias entry %66 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %60, i32 0, !dbg !10304 + alias entry %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304 + alias entry %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304 + alias entry %69 = bitcast i64* %63 to <4 x i64>*, !dbg !10304 + alias entry %70 = bitcast i64* %64 to <4 x i64>*, !dbg !10304 + alias entry %71 = bitcast i64* %65 to <4 x i64>*, !dbg !10304 + alias entry %72 = bitcast i64* %66 to <4 x i64>*, !dbg !10304 + alias entry %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304 + alias entry %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304 + alias entry %93 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 0, !dbg !10307 + alias entry %94 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 0, !dbg !10307 + alias entry %95 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 0, !dbg !10307 + alias entry %96 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 0, !dbg !10307 + alias entry %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307 + alias entry 
%98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307 + alias entry %99 = bitcast i64* %93 to <4 x i64>*, !dbg !10307 + alias entry %100 = bitcast i64* %94 to <4 x i64>*, !dbg !10307 + alias entry %101 = bitcast i64* %95 to <4 x i64>*, !dbg !10307 + alias entry %102 = bitcast i64* %96 to <4 x i64>*, !dbg !10307 + alias entry %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307 + alias entry %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307 + alias entry %135 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 1, !dbg !10309 + alias entry %136 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 1, !dbg !10309 + alias entry %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 1, !dbg !10309 + alias entry %138 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 1, !dbg !10309 + alias entry %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309 + alias entry %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309 + alias entry %147 = getelementptr inbounds double, double* %135, i64 -1, !dbg !10309 + alias entry %148 = bitcast double* %147 to <4 x double>*, !dbg !10309 + alias entry %149 = getelementptr inbounds double, double* %136, i64 -1, !dbg !10309 + alias entry %150 = bitcast double* %149 to <4 x double>*, !dbg !10309 + alias entry %151 = getelementptr inbounds double, double* %137, i64 -1, !dbg !10309 + alias entry %152 = bitcast double* %151 to <4 x double>*, !dbg !10309 + alias entry %153 = getelementptr inbounds double, double* %138, i64 -1, !dbg !10309 + alias entry %154 = bitcast double* %153 to <4 x double>*, !dbg !10309 + alias entry %155 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309 + alias entry %156 = bitcast double* %155 to <4 x double>*, !dbg !10309 + alias entry %157 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309 + alias entry %158 = 
bitcast double* %157 to <4 x double>*, !dbg !10309 + alias entry %178 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 0, !dbg !10304 + alias entry %180 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %177, i32 0, !dbg !10307 + alias entry %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 1, !dbg !10318 + alias entry %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %177, i32 1, !dbg !10309 +Round 1 +Round end + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %6 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + load (2.500000e+00) from %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + store (2.500000e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + load (9.088235e+00) from %struct.Comm* %6 + load (9.088235e+00) from %struct.Comm* %5 + store (9.088235e+00) to %struct.Comm* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %5 + load: 3.317647e+01 store: 3.317647e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 3.317647e+01 store: 0.000000e+00 (host) + load: 
0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..14 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z16distCleanCWandCUlPdP4Comm +Round 0 + base alias entry %17 = bitcast i8** %16 to double**, !dbg !10269 + base alias entry %19 = bitcast i8** %18 to double**, !dbg !10269 + base alias entry %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269 + base alias entry %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269 +Round 1 + base alias entry %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias entry %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 + base alias entry %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias entry %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 2 + base alias offset entry (1) %5 = alloca [3 x i8*], align 8 + base alias offset entry (1) %6 = alloca [3 x i8*], align 8 + base alias offset entry (2) %5 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269 + base alias offset entry (2) %6 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269 +Round 3 + base alias offset entry (1) %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %11 = getelementptr inbounds [3 x 
i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (2) %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269 + base alias offset entry (1) %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269 + base alias offset entry (1) %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269 +Round 4 +Round end + Frequency of double* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..18 +Round 0 + alias entry %29 = getelementptr inbounds double, double* %5, i64 %28, !dbg !10304 + alias entry %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %28, i32 0, !dbg !10309 + alias entry %33 = bitcast i64* %30 to i8*, !dbg !10299 +Round 1 +Round end + store (1.058333e+01) to double* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %5 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..19 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 
0.000000e+00 (target) + Frequency of %struct.Comm* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_ +Round 0 + alias entry %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433 + alias entry %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449 + alias entry %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460 + alias entry %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %299 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792 + alias entry %300 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to %"struct.std::__detail::_Hash_node"**, !dbg !11793 + alias entry %308 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836 + alias entry %310 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842 + alias entry %313 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to i8*, !dbg !11846 + alias entry %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0 + alias entry %317 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0 + alias entry %318 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0 + alias entry %319 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + 
alias entry %320 = bitcast %"class.std::vector.0"* %319 to i64* + alias entry %321 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %322 = bitcast i64** %321 to i64* + alias entry %325 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0 + alias entry %326 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %327 = bitcast %"class.std::vector.0"* %326 to i64* + alias entry %328 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %329 = bitcast i64** %328 to i64* + alias entry %800 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393 + alias entry %801 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405 + alias entry %802 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to %"struct.std::_Rb_tree_node"**, !dbg !13405 + alias entry %808 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419 + alias entry %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + base alias entry %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425 + alias entry %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + base alias entry %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435 + alias entry %811 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437 + alias entry %812 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442 + alias entry %813 = getelementptr inbounds %"class.std::map", 
%"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447 + alias entry %814 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to %"struct.std::_Rb_tree_node"**, !dbg !13447 + alias entry %820 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452 + alias entry %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + base alias entry %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455 + alias entry %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + base alias entry %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462 + alias entry %823 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464 + alias entry %828 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to i64* + alias entry %830 = bitcast %"struct.std::_Rb_tree_node_base"* %808 to %"struct.std::_Rb_tree_node"* + alias entry %832 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to i64* + alias entry %834 = bitcast %"struct.std::_Rb_tree_node_base"* %820 to %"struct.std::_Rb_tree_node"* + alias entry %943 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %809, align 8, !dbg !14017, !tbaa !14018 + alias entry %998 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %821, align 8, !dbg !14306, !tbaa !14018 +Round 1 +Round end + load (1.000000e+00) from i64* %4 + load (9.999994e-01) from i64* %3 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999963e-01) from %class.Graph* %0 + load (9.999803e+00) from %"class.std::vector.0"* %6 + load (1.999960e+01) from %"class.std::vector.0"* %6 + load (6.249782e+00) from 
%"class.std::vector.0"* %5 + load (1.249956e+01) from %"class.std::vector.0"* %5 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (9.999777e-01) from %"class.std::unordered_map"* %12 + load (1.999809e+01) from %"class.std::vector.0"* %8 + load (1.999807e+01) from %"class.std::unordered_map"* %12 + load (1.999807e+01) from %"class.std::unordered_map"* %12 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..22 +Round 0 + alias entry %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from %"class.std::vector.0"* %4 + load (3.200000e-01) from %"class.std::vector.0"* %6 + load (1.020000e+01) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 1.020000e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.200000e-01 store: 0.000000e+00 (host) + 
load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.24 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..25 +Round 0 + alias entry %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.29"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.29"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.27 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 
+Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..28 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..30 +Round 0 + alias entry %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503 + alias entry %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from %"class.std::vector.0"* %2 + load (2.047500e+02) from %"class.std::vector.0"* %4 + load (2.047500e+02) from 
%"class.std::vector.15"* %7 + load (2.047500e+02) from %"class.std::vector.52"* %6 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %6 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %7 + load: 2.047500e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z22createCommunityMPITypev +Round 0 +Round end +On function _Z23destroyCommunityMPITypev +Round 0 +Round end +On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii +Round 0 + alias entry %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869 + alias entry %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050 + alias entry %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068 + alias entry %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, !dbg !11068 + alias entry %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085 + alias entry %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6 + alias entry %56 = bitcast %"class.std::vector.0"* %55 to i64* + alias entry %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %58 = bitcast i64** 
%57 to i64* +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (9.999994e-01) from %class.Graph* %0 + load (9.999994e-01) from %"class.std::map"* %2 + load (1.999985e+01) from %class.Graph* %0 + load (1.999985e+01) from %class.Graph* %0 + Frequency of %class.Graph* %0 + load: 4.199970e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::map"* %2 + load: 9.999994e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..32 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.137255e-01) from %"class.std::vector.66"* %4 + load (3.137255e-01) from %"class.std::vector.0"* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.137255e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.66"* %4 + load: 3.137255e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.34 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry %5 = bitcast i8* %0 to i64**, !dbg !10261 + alias entry %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261 + 
alias entry %8 = bitcast i8* %7 to i64**, !dbg !10261 + alias entry %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261 + alias entry %11 = bitcast i8* %10 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 2.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..35 +Round 0 + alias entry %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 + alias entry %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (3.157895e-01) from %"class.std::vector.0"* %6 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + load (2.105263e-01) from i64* %5 + store (2.105263e-01) to i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %6 + load: 3.157895e-01 store: 0.000000e+00 
(host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..37 +Round 0 + alias entry %26 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i64* %2 + load (6.350000e+00) from %"class.std::vector.52"* %3 + load (6.350000e+00) from %"class.std::vector.15"* %4 + load (6.350000e+00) from i64* %5 + load (2.047500e+02) from i64* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.52"* %3 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.15"* %4 + load: 6.350000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %5 + load: 2.111000e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii +Round 0 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306 + alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319 + alias entry %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485 + alias entry %52 = bitcast i64** %51 to i64*, !dbg !10485 + alias entry %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489 + alias entry %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, 
i64 0, i32 0, i32 0, i32 1, !dbg !10517 + alias entry %72 = bitcast i64** %71 to i64*, !dbg !10517 + alias entry %74 = bitcast %"class.std::vector.0"* %3 to i64*, !dbg !10518 + alias entry %91 = bitcast %"class.std::vector.0"* %3 to i8** + alias entry %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 + alias entry %98 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10598 + alias entry %99 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10598 + alias entry %128 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673 + alias entry %129 = bitcast i64** %128 to i64*, !dbg !10673 + alias entry %131 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674 + alias entry %147 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696 + alias entry %148 = bitcast i64** %147 to i64*, !dbg !10696 + alias entry %150 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697 + alias entry %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0 + alias entry %249 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0 + alias entry %306 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244 + alias entry %307 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245 + alias entry %308 = bitcast i64** %306 to i64*, !dbg !11249 + alias entry %310 = bitcast i64** %307 to i64*, !dbg !11250 + alias entry %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279 + alias entry %317 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, 
i32 0, i32 0, i32 2, !dbg !11280 + alias entry %318 = bitcast i64** %316 to i64*, !dbg !11284 + alias entry %320 = bitcast i64** %317 to i64*, !dbg !11285 +Round 1 +Round end + load (1.000000e+00) from %class.Graph* %0 + load (1.000000e+00) from %class.Graph* %0 + load (9.999984e-01) from %"class.std::vector.0"* %4 + load (9.999984e-01) from %"class.std::vector.0"* %4 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..39 +Round 0 + alias entry %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0 + alias entry %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6 + alias entry %29 = bitcast %"class.std::vector.0"* %28 to i64* + alias entry %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1 + alias entry %31 = bitcast i64** %30 to i64* + alias entry %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.988141e+02) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (3.180957e+03) from %class.Graph* %3 + load (1.590478e+03) from %"class.std::vector.29"* %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %class.Graph* %3 + load: 9.741684e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.29"* %5 + load: 1.590478e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp.reduction.reduction_func.41 +Round 0 + alias entry %3 = bitcast i8* %1 to i64**, !dbg !10261 + alias entry 
%5 = bitcast i8* %0 to i64**, !dbg !10261 +Round 1 +Round end + load (1.000000e+00) from i8* %1 + load (1.000000e+00) from i8* %0 + Frequency of i8* %0 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i8* %1 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..42 +Round 0 + alias entry %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (1.000000e+00) from i32* %2 + load (3.157895e-01) from %"class.std::vector.0"* %4 + load (2.105263e-01) from i64* %3 + store (2.105263e-01) to i64* %3 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %2 + load: 1.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 2.105263e-01 store: 2.105263e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %4 + load: 3.157895e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi +Round 0 + alias entry %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180 + alias entry %85 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380 + alias entry %86 = bitcast i64** %85 to i64*, !dbg !11380 + alias entry %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384 + alias entry %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392 + alias entry %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399 + alias entry %99 = getelementptr inbounds %class.Graph, 
%class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402 + alias entry %100 = bitcast i64** %99 to i64*, !dbg !11402 + alias entry %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403 + alias entry %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410 + alias entry %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417 + alias entry %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424 + alias entry %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424 + alias entry %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428 + alias entry %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440 +Round 1 +Round end + load (9.999981e-01) from %class.Graph* %2 +Warning: wrong traversal order, or recursive call +On function .omp_outlined..45 +Round 0 +Round end + call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5 + call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6 + call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7 + call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8 + call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9 + call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10 + call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %11 + call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12 + call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 
(target) + Frequency of i64* %5 + load: 2.116667e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %6 + load: 1.058333e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %7 + load: 1.822318e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %8 + load: 9.516732e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %9 + load: 0.000000e+00 store: 1.058333e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %10 + load: 7.937500e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %11 + load: 2.245651e+02 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %12 + load: 5.291667e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %14 + load: 5.291667e+00 store: 5.291667e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..46 +Round 0 +Round end + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %4 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Edge* %5 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %6 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %7 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 
0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %8 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %9 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.Comm* %10 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of double* %12 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_outlined..49 +Round 0 + alias entry %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0 +Round 1 +Round end + load (3.200000e-01) from %"class.std::vector.0"* %3 + load (3.200000e-01) from i64** %4 + load (3.200000e-01) from i64** %5 + Frequency of i32* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i32* %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %3 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %4 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64** %5 + load: 3.200000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function main +Round 0 + base alias entry %14 = alloca i8**, align 8 + alias entry %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335 +Round 1 +Round end + Frequency of i8** %1 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGGC2ElP19ompi_communicator_t +Round 0 + alias entry %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266 + alias entry %5 = getelementptr inbounds 
%class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + base alias entry %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276 + alias entry %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279 + alias entry %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278 + alias entry %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282 + alias entry %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284 + alias entry %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287 + alias entry %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320 + alias entry %100 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278 + alias entry %171 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278 + alias entry %183 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579 + alias entry %190 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278 +Round 1 +Round end + store (1.000000e+00) to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + store (3.437500e-01) to %class.GenerateRGG* %0 + store (2.500000e-01) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (5.000000e-01) 
from %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + load (5.000000e-01) from %class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + load (7.656250e-01) from %class.GenerateRGG* %0 + store (1.000000e+00) to %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + load (1.000000e+00) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 8.531250e+00 store: 6.843750e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %struct.ompi_communicator_t* %2 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN11GenerateRGG8generateEbbi +Round 0 + alias entry %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306 + alias entry %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592 + alias entry %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709 + alias entry %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828 + alias entry %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832 + alias entry %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836 + alias entry %362 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915 + alias entry %696 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101 + alias entry %772 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1095 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 + alias entry %1388 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2 +Round 1 +Round end + load (1.000000e+00) from %class.GenerateRGG* %0 + 
load (6.249994e-01) from %class.GenerateRGG* %0 + load (9.999990e-01) from %class.GenerateRGG* %0 + load (4.999995e-01) from %class.GenerateRGG* %0 + load (3.124994e-01) from %class.GenerateRGG* %0 + load (9.999985e-01) from %class.GenerateRGG* %0 + load (4.999993e-01) from %class.GenerateRGG* %0 + load (3.124992e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999971e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (9.999962e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999971e-01) from %class.GenerateRGG* %0 + load (4.999966e-01) from %class.GenerateRGG* %0 + load (9.999923e-01) from %class.GenerateRGG* %0 + load (9.999914e-01) from %class.GenerateRGG* %0 + load (3.749968e-01) from %class.GenerateRGG* %0 + load (3.749964e-01) from %class.GenerateRGG* %0 + load (9.999890e-01) from %class.GenerateRGG* %0 + load (9.998746e-01) from %class.GenerateRGG* %0 + load (3.199362e+02) from %class.GenerateRGG* %0 + load (3.199361e+02) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (6.249210e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998736e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998726e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998717e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998707e-01) from %class.GenerateRGG* %0 + 
load (9.998707e-01) from %class.GenerateRGG* %0 + load (9.998698e-01) from %class.GenerateRGG* %0 + load (4.999349e-01) from %class.GenerateRGG* %0 + load (2.499674e-01) from %class.GenerateRGG* %0 + load (7.997451e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (3.998725e+01) from %class.GenerateRGG* %0 + load (7.997448e+01) from %class.GenerateRGG* %0 + load (4.999063e-01) from %class.GenerateRGG* %0 + load (2.499531e-01) from %class.GenerateRGG* %0 + load (7.996993e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (3.998497e+01) from %class.GenerateRGG* %0 + load (7.996991e+01) from %class.GenerateRGG* %0 + load (9.998126e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998116e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998107e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998091e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998082e-01) from %class.GenerateRGG* %0 + load (9.998072e-01) from %class.GenerateRGG* %0 + load (9.998015e-01) from %class.GenerateRGG* %0 + load (6.248724e-01) from %class.GenerateRGG* %0 + load (6.248718e-01) from %class.GenerateRGG* %0 + load (1.952724e-01) from %class.GenerateRGG* %0 + load (3.905445e-01) from %class.GenerateRGG* %0 + load (3.905442e-01) from %class.GenerateRGG* %0 + load (6.248393e-01) from %class.GenerateRGG* %0 + load (1.249644e+01) from %class.GenerateRGG* %0 + load (1.249643e+01) from %class.GenerateRGG* %0 + load (1.171538e+00) from %class.GenerateRGG* %0 + load (5.857690e-01) from %class.GenerateRGG* %0 + load (2.928845e-01) from %class.GenerateRGG* %0 + 
load (1.464422e-01) from %class.GenerateRGG* %0 + load (6.248387e-01) from %class.GenerateRGG* %0 + load (6.248381e-01) from %class.GenerateRGG* %0 + load (1.249638e+01) from %class.GenerateRGG* %0 + load (6.248253e-01) from %class.GenerateRGG* %0 + load (3.905154e-01) from %class.GenerateRGG* %0 + load (2.440719e-01) from %class.GenerateRGG* %0 + load (6.248247e-01) from %class.GenerateRGG* %0 + load (4.881438e+00) from %class.GenerateRGG* %0 + load (9.997431e-01) from %class.GenerateRGG* %0 + load (9.997421e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (9.997406e-01) from %class.GenerateRGG* %0 + load (1.999481e+01) from %class.GenerateRGG* %0 + load (9.997388e-01) from %class.GenerateRGG* %0 + load (9.997385e-01) from %class.GenerateRGG* %0 + Frequency of %class.GenerateRGG* %0 + load: 1.246995e+03 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN14BinaryEdgeList4readEiiiSs +Round 0 + alias entry %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380 + alias entry %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388 + alias entry %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514 + alias entry %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515 + alias entry %104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518 + alias entry %105 = bitcast i64* %104 to i8*, !dbg !10519 + alias entry %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532 + alias entry %182 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605 +Round 1 +Round end + load (9.999971e-01) from %class.BinaryEdgeList* %0 + load (9.999971e-01) from %"class.std::basic_string"* %4 + load 
(6.249948e-01) from %class.BinaryEdgeList* %0 + load (9.999905e-01) from %class.BinaryEdgeList* %0 + store (9.999905e-01) to %class.BinaryEdgeList* %0 + load (9.999895e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999886e-01) from %class.BinaryEdgeList* %0 + load (9.999729e-01) from %class.BinaryEdgeList* %0 + store (9.999729e-01) to %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999714e-01) from %class.BinaryEdgeList* %0 + load (9.999547e-01) from %class.BinaryEdgeList* %0 + load (1.999909e+01) from %class.BinaryEdgeList* %0 + Frequency of %class.BinaryEdgeList* %0 + load: 2.962391e+01 store: 1.999963e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::basic_string"* %4 + load: 9.999971e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E +Round 0 +Round end +Warning: wrong traversal order, or recursive call +On function _ZN5GraphC2EllllP19ompi_communicator_t +Round 0 + alias entry %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272 + alias entry %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272 + alias entry %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10309 + alias entry %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309 + alias entry %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320 + alias entry %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322 + alias entry %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324 + alias entry %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272 + alias entry %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg 
!10332 + alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + base alias entry %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334 + alias entry %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336 + alias entry %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335 + alias entry %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339 + alias entry %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361 + alias entry %29 = bitcast i64** %28 to i64*, !dbg !10361 + alias entry %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365 + alias entry %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416 + alias entry %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416 + alias entry %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420 + alias entry %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455 + alias entry %65 = bitcast i64** %64 to i64*, !dbg !10455 + alias entry %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456 + alias entry %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0 + alias entry %110 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511 + alias entry %116 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547 + alias entry %122 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576 +Round 1 +Round end + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + store 
(1.000000e+00) to %class.Graph* %0 + store (1.000000e+00) to %class.Graph* %0 + load (9.999990e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 + load (9.999980e-01) from %class.Graph* %0 +Warning: wrong traversal order, or recursive call +On function _ZN3LCGC2EjPdlP19ompi_communicator_t +Round 0 + alias entry %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268 + alias entry %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277 + alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + base alias entry %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279 + alias entry %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281 + alias entry %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300 + alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + base alias entry %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302 + alias entry %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306 + alias entry %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305 + alias entry %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309 + alias entry %20 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326 + alias entry %21 = bitcast i64** %20 to i64*, !dbg !10326 + alias entry %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330 + alias entry %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359 + alias entry %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374 + alias entry %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, 
i64 0, i32 5, !dbg !10399 + alias entry %53 = bitcast i64* %52 to i8*, !dbg !10400 + alias entry %54 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305 +Round 1 +Round end + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + store (1.000000e+00) to %class.LCG* %0 + load (9.999989e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 + load (9.999982e-01) from %class.LCG* %0 +Warning: wrong traversal order, or recursive call +On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE +Round 0 + alias entry %5 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 1, !dbg !10267 + alias entry %8 = getelementptr inbounds %"struct.std::uniform_int_distribution::param_type", %"struct.std::uniform_int_distribution::param_type"* %2, i64 0, i32 0, !dbg !10279 + alias entry %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0 + alias entry %51 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376 +Round 1 +Round end + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (1.000000e+00) from %"struct.std::uniform_int_distribution::param_type"* %2 + load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1 + store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1 +Warning: wrong traversal order, or recursive call +On function _ZNSt6vectorIlSaIlEEaSERKS1_ +Round 0 + alias entry 
%5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10278 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10278 + alias entry %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285 + alias entry %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294 + alias entry %13 = bitcast i64** %12 to i64*, !dbg !10294 + alias entry %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296 + alias entry %.phi.trans.insert = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10460 + alias entry %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490 + alias entry %43 = bitcast i64** %42 to i64*, !dbg !10490 + alias entry %54 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573 + alias entry %74 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633 + alias entry %77 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (9.765625e-02) from %"class.std::vector.0"* %0 + load (9.765625e-02) from 
%"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %1 + load (6.250000e-01) from %"class.std::vector.0"* %0 + store (6.250000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.578125e+00 store: 1.250000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"class.std::vector.0"* %1 + load: 1.445312e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl +Round 0 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281 + alias entry %9 = bitcast i64** %8 to i64*, !dbg !10281 + alias entry %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288 + alias entry %12 = bitcast i64** %11 to i64*, !dbg !10288 + alias entry %543 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728 + alias entry %729 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + load (9.765625e-02) from %"class.std::vector.0"* %0 + store (1.562500e-01) to %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from i64* %3 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + 
load: 2.382812e+00 store: 1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of i64* %3 + load: 6.250000e-01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %83 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375 + alias entry %104 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %111 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (6.250000e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (1.953125e-01) from %"class.std::vector.5"* %0 + load (3.125000e-01) from %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + store (3.125000e-01) to %"class.std::vector.5"* %0 + Frequency of %"class.std::vector.5"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZN3LCG18parallel_prefix_opEv +Round 0 + alias entry %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283 + alias entry %168 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10362 + alias entry %174 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269 + alias entry %178 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0 + alias entry %186 = 
getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10373 + alias entry %250 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 0, !dbg !10373 +Round 1 +Round end + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (2.005882e+01) from %class.LCG* %0 + load (1.000000e+00) from %class.LCG* %0 + Frequency of %class.LCG* %0 + load: 8.523529e+01 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273 + alias entry %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %60 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369 + alias entry %81 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %88 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + 
Frequency of %"class.std::vector.84"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag +Round 0 +Round end +On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag +Round 0 + alias entry %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344 + alias entry %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344 + alias entry %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351 + alias entry %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351 + alias entry %116 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799 + alias entry %137 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %142 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (6.250000e-01) from %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (9.765625e-02) from %"class.std::vector.84"* %0 + store (1.562500e-01) to %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (1.953125e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + load (3.125000e-01) from %"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + store (3.125000e-01) to 
%"class.std::vector.84"* %0 + store (3.125000e-01) to %"class.std::vector.84"* %0 + Frequency of %"class.std::vector.84"* %0 + load: 2.675781e+00 store: 1.406250e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_ +Round 0 +Round end +On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_ +Round 0 +Round end +On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_ +Round 0 +Round end +On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_ +Round 0 +Round end +On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_ +Round 0 +Round end +On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast i64** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast i64** %8 to i64*, !dbg !10380 + alias entry %21 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381 + alias entry %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (6.250000e-01) from %"class.std::vector.0"* %0 + load (3.125000e-01) from 
%"class.std::vector.0"* %0 + load (3.125000e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + load (1.953125e-01) from %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + store (3.125000e-01) to %"class.std::vector.0"* %0 + Frequency of %"class.std::vector.0"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274 + alias entry %6 = bitcast double** %5 to i64*, !dbg !10274 + alias entry %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281 + alias entry %20 = bitcast double** %8 to i64*, !dbg !10381 + alias entry %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382 + alias entry %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (6.250000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (3.125000e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + load (1.953125e-01) from %"class.std::vector.10"* %0 + store (3.125000e-01) to %"class.std::vector.10"* %0 + store (3.125000e-01) to %"class.std::vector.10"* %0 + Frequency of %"class.std::vector.10"* %0 + load: 2.265625e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460 + alias entry %6 = 
bitcast %struct.Comm** %5 to i64*, !dbg !10460 + alias entry %8 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10467 + alias entry %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551 + alias entry %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (6.250000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (1.953125e-01) from %"class.std::vector.15"* %0 + load (3.125000e-01) from %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + store (3.125000e-01) to %"class.std::vector.15"* %0 + Frequency of %"class.std::vector.15"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_ +Round 0 +Round end + Frequency of %"class.std::unordered_set"* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, 
!dbg !10275 + alias entry %8 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530 + alias entry %76 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550 + alias entry %82 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i8*, !dbg !10618 + alias entry %86 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296 + alias entry %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627 + alias entry %94 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628 + base alias entry %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", 
%"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10630 + alias entry %98 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10639 + alias entry %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10640 + alias entry %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641 + alias entry %103 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642 + alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645 + base alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645 + base alias entry %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10676 + base alias entry %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10678 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10509, !tbaa !10511 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525 + base alias offset entry (0) %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg !10629, !tbaa !10511 +Warning: the first offset is not constant +Warning: the first offset is not constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not constant +Warning: the 
first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (1.000000e+00) from %"class.std::_Hashtable"* %0 + load (5.000000e-01) from %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + load (3.749996e+00) from %"class.std::_Hashtable"* %0 + store (3.749996e+00) to %"class.std::_Hashtable"* %0 + load (6.249994e+00) from %"class.std::_Hashtable"* %0 + store (6.249994e+00) to %"class.std::_Hashtable"* %0 + store (4.768372e-07) to %"class.std::_Hashtable"* %0 + load (4.999995e-01) from %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (4.999995e-01) to %"class.std::_Hashtable"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3 + load (3.749998e-01) from %"class.std::_Hashtable"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3 + store (3.749998e-01) to %"class.std::_Hashtable"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (2.343749e-01) from %"class.std::_Hashtable"* %0 + load (9.999995e-01) from %"class.std::_Hashtable"* %0 + store (9.999995e-01) to %"class.std::_Hashtable"* %0 + Frequency of %"class.std::_Hashtable"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"struct.std::__detail::_Hash_node"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE +Round 0 + alias entry %5 = getelementptr inbounds %"class.std::_Hashtable.34", 
%"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268 + alias entry %6 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275 + alias entry %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282 + alias entry %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288 + alias entry %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0 + alias entry %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428 + alias entry %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429 + alias entry %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432 + alias entry %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64* + base alias entry %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469 + alias entry %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471 + alias entry %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10490 + alias entry %76 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510 + alias entry %82 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578 + alias entry %86 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296 + alias entry %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, 
!dbg !10587 + alias entry %94 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588 + base alias entry %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10590 + alias entry %98 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599 + alias entry %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10600 + alias entry %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601 + alias entry %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601 + alias entry %103 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602 + alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605 + base alias entry %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605 + base alias entry %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10630 + base alias entry %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10632 +Round 1 +Warning: the first offset is not constant + alias entry %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471 + alias entry %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485 + base alias offset entry (0) %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg 
!10589, !tbaa !10471 +Warning: the first offset is not constant +Warning: the first offset is not constant +Round 2 +Warning: the first offset is not constant +Warning: the first offset is not constant +Warning: the first offset is not constant +Round end + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (1.000000e+00) from %"class.std::_Hashtable.34"* %0 + load (5.000000e-01) from %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749996e+00) from %"class.std::_Hashtable.34"* %0 + store (3.749996e+00) to %"class.std::_Hashtable.34"* %0 + load (6.249994e+00) from %"class.std::_Hashtable.34"* %0 + store (6.249994e+00) to %"class.std::_Hashtable.34"* %0 + store (4.768372e-07) to %"class.std::_Hashtable.34"* %0 + load (4.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (4.999995e-01) to %"class.std::_Hashtable.34"* %0 + store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + load (3.749998e-01) from %"class.std::_Hashtable.34"* %0 + store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3 + store (3.749998e-01) to %"class.std::_Hashtable.34"* %0 + load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load (2.343749e-01) from %"class.std::_Hashtable.34"* %0 + load (9.999995e-01) from %"class.std::_Hashtable.34"* %0 + store (9.999995e-01) to %"class.std::_Hashtable.34"* %0 + Frequency of %"class.std::_Hashtable.34"* %0 + load: 1.634374e+01 store: 1.287499e+01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) + Frequency of %"struct.std::__detail::_Hash_node.61"* %3 + load: 3.749998e-01 store: 9.999995e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function 
_ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm +Round 0 + alias entry %7 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273 + alias entry %8 = bitcast %struct.CommInfo** %7 to i64*, !dbg !10273 + alias entry %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280 + alias entry %54 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394 + alias entry %55 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395 + alias entry %76 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0 + alias entry %84 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449 + alias entry %133 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651 +Round 1 +Round end + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (6.250000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (1.953125e-01) from %"class.std::vector.52"* %0 + load (3.125000e-01) from %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + store (3.125000e-01) to %"class.std::vector.52"* %0 + Frequency of %"class.std::vector.52"* %0 + load: 2.578125e+00 store: 6.250000e-01 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function _GLOBAL__sub_I_main.cpp +Round 0 +Round end +On function .omp_offloading.descriptor_unreg +Round 0 +Round end + Frequency of i8* %0 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 0.000000e+00 (target) +On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda +Round 0 +Round end + ---- Identify Target Regions ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull 
@.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317 + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 + target call: %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584 + target call: %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0) + to label %326 
unwind label %319, !dbg !11667 + ---- Target Distance Calculation ---- +_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges after 3 iterations +target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) +target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) +target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) +target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) + ---- OMP (/tmp/main-cdf4fe.bc, powerpc64le-unknown-linux-gnu) ---- +new entry %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 +new entry %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 +new entry %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 +new entry %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 +new entry %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 +new entry %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 +new entry %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 +new entry %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 +new entry %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) 
+ to label %157 unwind label %317, !dbg !11510 +Round 0 + base alias entry %130 = bitcast i64** %29 to i8**, !dbg !11450 + base alias entry %142 = bitcast i64** %30 to i8**, !dbg !11479 + alias entry %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487 + alias entry %158 = bitcast i8* %156 to double*, !dbg !11511 + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias entry %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias entry %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias entry %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias entry %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias entry %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias entry %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias entry %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias entry %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias entry %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias entry %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias entry %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias entry %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias entry %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias entry %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias entry %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, 
i64 10 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias entry %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 +Warning: reach to function declaration __kmpc_fork_teams + alias entry (func arg) %struct.Comm* %1 + alias entry (func arg) double* %2 +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1 +Round 1 + base alias entry %35 = bitcast i8** %34 to double**, !dbg !10317 + base alias entry %37 = bitcast i8** %36 to double**, !dbg !10317 + base alias entry %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317 + base alias entry %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %29 = alloca i64*, align 8 + base alias entry %30 = alloca i64*, align 8 + base alias offset entry (1) %16 = alloca [3 x i8*], align 8 + base alias offset entry (1) %17 = alloca [3 x i8*], align 8 + base alias offset entry (2) %16 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2 + base alias offset entry (2) %17 = alloca [3 x i8*], align 8 + base alias offset entry (-1) %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2 + base alias offset entry (1) %31 = alloca [12 x i8*], align 8 + base alias offset entry (1) %32 = alloca [12 x i8*], align 8 + base alias offset entry (2) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (-1) 
%211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (3) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-1) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (-3) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-2) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (-3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (-4) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (-4) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (6) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-5) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-4) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-3) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (6) %32 = alloca [12 x 
i8*], align 8 + base alias offset entry (-5) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-4) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (7) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-5) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-4) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (7) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-6) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-5) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (8) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-6) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-5) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-1) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (8) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-7) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* 
%32, i64 0, i64 8 + base alias offset entry (-6) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-5) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-2) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-1) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-8) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-7) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-6) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-3) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-2) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-8) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-7) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-6) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-3) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-2) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (10) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-8) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-7) %238 = getelementptr inbounds 
[12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-4) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-3) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-2) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (10) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-9) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-8) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-7) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-4) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-3) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-2) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-10) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-9) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-8) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-5) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-4) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-3) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-1) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-10) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-9) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias 
offset entry (-8) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-5) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-4) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-3) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-1) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams + alias entry %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451 + alias entry %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451 + alias entry %274 = ptrtoint i64* %263 to i64, !dbg !11584 + alias entry %275 = ptrtoint i64* %264 to i64, !dbg !11584 + base alias entry %215 = bitcast i8** %214 to i64* + base alias entry %217 = bitcast i8** %216 to i64* + base alias entry %220 = bitcast i8** %219 to i64* + base alias entry %222 = bitcast i8** %221 to i64* +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2 +Warning: reach to function declaration __kmpc_fork_call +Round 2 + base alias entry %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias entry %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias entry %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias entry %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias 
offset entry (2) %11 = alloca [5 x i8*], align 8 + base alias offset entry (2) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (-1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 + base alias offset entry (4) %11 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317 + base alias offset entry (4) %12 = alloca [5 x i8*], align 8 + base alias offset entry (-2) %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams + base alias entry %126 = bitcast i64** %29 to i8*, !dbg !11447 + base alias entry %139 = bitcast i64** %30 to i8*, !dbg !11477 + base alias offset entry (1) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (2) %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0 + base alias offset entry (1) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (2) %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0 + base alias offset entry (1) %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1 + base alias offset entry (1) %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1 + base alias offset entry (1) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (2) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (3) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (6) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (7) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias 
offset entry (8) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (10) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (1) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (2) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (6) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (7) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (8) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (10) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (1) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (2) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (5) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (6) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (7) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (9) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (1) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (5) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (6) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (7) %206 = getelementptr inbounds [12 x i8*], 
[12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (9) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (1) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (4) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (5) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (6) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (8) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (1) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (4) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (5) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (6) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (8) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (4) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (5) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (7) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (3) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (4) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (5) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (7) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) 
%214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (3) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (4) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (6) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias entry %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (2) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (3) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (4) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (6) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias entry %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 + base alias offset entry (1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (2) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (3) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias entry %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (2) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (3) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (5) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias entry %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (1) 
%224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (2) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (4) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (1) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (4) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (1) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (1) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (2) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (2) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (1) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (1) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3 +Warning: reach to function declaration __kmpc_fork_call +Round 3 + base alias offset entry (4) %24 = getelementptr inbounds [5 x 
i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317 + base alias offset entry (4) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (2) %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317 + base alias offset entry (3) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317 + base alias offset entry (3) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (1) %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317 + base alias offset entry (2) %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317 + base alias offset entry (2) %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317 + base alias offset entry (1) %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317 + base alias offset entry (1) %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %31 = alloca [12 x i8*], align 8 + base alias offset entry (4) %32 = alloca [12 x i8*], align 8 + base alias offset entry (5) %31 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5 + base alias offset entry (5) %32 = alloca [12 x i8*], align 8 + base alias offset entry (-1) %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5 + base alias offset entry (-2) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* 
%31, i64 0, i64 6 + base alias offset entry (-1) %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6 + base alias offset entry (-2) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-1) %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6 + base alias offset entry (-3) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-2) %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7 + base alias offset entry (-3) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-2) %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7 + base alias offset entry (-4) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-3) %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8 + base alias offset entry (-4) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-3) %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8 + base alias offset entry (-5) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-4) %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9 + base alias offset entry (-5) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-4) %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9 + base alias offset entry (-6) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-5) %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10 + base alias offset entry (-6) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset entry (-5) %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10 + base alias offset 
entry (-7) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-6) %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11 + base alias offset entry (-7) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 + base alias offset entry (-6) %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4 +Warning: reach to function declaration __kmpc_fork_call +Round 4 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams + base alias offset entry (4) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (5) %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0 + base alias offset entry (4) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (5) %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0 + base alias offset entry (3) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (4) %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1 + base alias offset entry (3) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (4) %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1 + base alias offset entry (2) %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (3) %208 
= getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2 + base alias offset entry (2) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (3) %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2 + base alias offset entry (1) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (2) %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3 + base alias offset entry (1) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (2) %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3 + base alias offset entry (1) %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4 + base alias offset entry (1) %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5 +Warning: reach to function declaration __kmpc_fork_call +Round 5 +Warning: reach to function declaration __kmpc_fork_teams +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %189, align 8, !dbg !11559 +Warning: store a different alias pointer to a base pointer: store i8* %156, i8** %190, align 8, !dbg !11559 +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Warning: reach to function declaration __kmpc_fork_teams +Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6 +Warning: reach to function declaration __kmpc_fork_call 
+Round 6 +Warning: reach to function declaration __kmpc_fork_teams +Round end + ---- Access Frequency Analysis ---- + target call (1.625206e+01, 0.000000e+00, 5.076920e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 
unwind label %317, !dbg !11510 + target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + target call (1.625202e+01, 1.015380e+01, 0.000000e+00) using %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 +Frequency of %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Frequency of %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Frequency of %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Frequency of %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Frequency of %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Frequency of %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg 
!11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Frequency of %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) +Frequency of %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) +Frequency of %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) + ---- Optimization Preparation ---- +Rank 9 for %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 6.188273e+01 store: 0.000000e+00 (target) +Rank 8 for %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 8.251031e+01 store: 0.000000e+00 (target) +Rank 7 for %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 0.000000e+00 store: 8.251031e+01 (target) +Rank 6 for %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 4.125515e+01 store: 4.125515e+01 (target) +Rank 5 for %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.650200e+02 store: 0.000000e+00 (target) +Rank 4 for %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label 
%152 unwind label %315, !dbg !11502 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 2.062750e+02 store: 1.650201e+02 (target) +Rank 3 for %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 7.425931e+02 store: 0.000000e+00 (target) +Rank 2 for %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.423303e+03 store: 0.000000e+00 (target) +Rank 1 for %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + load: 0.000000e+00 store: 0.000000e+00 (host) + load: 1.918144e+03 store: 2.475302e+02 (target) + ---- Data Mapping Optimization ---- + target call: %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317 +@.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465] + arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 1.600380e+02, 1.280304e+03 after adjustment; scaled local reuse is 0x500 + reuse distance is 0x01 + arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.600380e+02, 2.560608e+03 after adjustment; scaled local reuse is 
0xa00 + reuse distance is 0x01 + target call: %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33] + target call: %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269 +@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34] + target call: %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0) + to label %259 unwind label %319, !dbg !11559 +@.offload_maptypes.20.1 = private unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 
after adjustment; scaled local reuse is 0x0a2 + reuse distance is 0x01 + target call: %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584 +@.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800] + arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100) + to label %92 unwind label %291, !dbg !11387 + size is %90 = sub i64 %87, %89, !dbg !11386 + global reuse is 0x05 + local reuse is 1.015380e+01, 8.123040e+01 after adjustment; scaled local reuse is 0x051 + reuse distance is 0x09 + arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100) + to label %106 unwind label %295, !dbg !11405 + size is %104 = sub i64 %101, %103, !dbg !11404 + global reuse is 0x08 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 3 (0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100) + to label %120 unwind label %299, !dbg !11431 + size is %118 = sub i64 %115, %117, !dbg !11430 + global reuse is 0x02 + local reuse is 8.757690e+01, 1.401230e+03 after adjustment; scaled local reuse is 0x579 + reuse distance is 0x09 + arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %129 unwind label %303, !dbg !11449 + size is %127 = 
shl i64 %69, 3, !dbg !11448 + global reuse is 0x03 + local reuse is 4.569230e+01, 3.655384e+02 after adjustment; scaled local reuse is 0x16d + reuse distance is 0x09 + arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %141 unwind label %311, !dbg !11478 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x07 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %135 unwind label %307, !dbg !11462 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x09 + local reuse is 3.807690e+00, 3.046152e+01 after adjustment; scaled local reuse is 0x01e + reuse distance is 0x09 + arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 1.078710e+02, 1.725936e+03 after adjustment; scaled local reuse is 0x6bd + reuse distance is 0x01 + arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 2.538460e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x01 + arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100) + to label %157 unwind label %317, !dbg !11510 + size is %127 = shl i64 %69, 3, !dbg !11448 + global reuse is 0x06 + local reuse is 5.076920e+00, 4.061536e+01 after adjustment; scaled local reuse is 0x028 + reuse distance is 0x09 + target call: %325 = 
invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0) + to label %326 unwind label %319, !dbg !11667 +@.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161] + arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %146 unwind label %313, !dbg !11486 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x01 + local reuse is 2.030760e+01, 3.249216e+02 after adjustment; scaled local reuse is 0x144 + reuse distance is 0x07 + arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100) + to label %152 unwind label %315, !dbg !11502 + size is %144 = shl i64 %69, 4, !dbg !11485 + global reuse is 0x04 + local reuse is 1.015380e+01, 1.624608e+02 after adjustment; scaled local reuse is 0x0a2 + reuse distance is 0x07 diff --git a/miniVite/main.cpp b/miniVite/main.cpp new file mode 100644 index 0000000..eb695f3 --- /dev/null +++ b/miniVite/main.cpp @@ -0,0 +1,252 @@ +// *********************************************************************** +// +// miniVite +// +// *********************************************************************** +// +// Copyright (2018) Battelle Memorial Institute +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// 1. Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// 2. 
Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// 3. Neither the name of the copyright holder nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +// POSSIBILITY OF SUCH DAMAGE. 
+// +// ************************************************************************ + + +#include +#include +#include + +#include +#include + +#include +#include +#include +#include + +#include +#include + +//#include "dspl.hpp" +//#include "dspl_gpu.hpp" +#include "dspl_gpu_kernel.hpp" + +static std::string inputFileName; +static int me, nprocs; +static int ranksPerNode = 1; +static GraphElem nvRGG = 0; +static bool generateGraph = false; +static int randomEdgePercent = 0; +static bool randomNumberLCG = false; +static bool isUnitEdgeWeight = true; +static double threshold = 1.0E-6; + +// parse command line parameters +static void parseCommandLine(const int argc, char * const argv[]); + +int main(int argc, char *argv[]) +{ + double t0, t1, t2, t3, ti = 0.0; + + MPI_Init(&argc, &argv); + + MPI_Comm_size(MPI_COMM_WORLD, &nprocs); + MPI_Comm_rank(MPI_COMM_WORLD, &me); + + parseCommandLine(argc, argv); + + createCommunityMPIType(); + double td0, td1, td, tdt; + + MPI_Barrier(MPI_COMM_WORLD); + td0 = MPI_Wtime(); + + Graph* g = nullptr; + + // generate graph only supports RGG as of now + if (generateGraph) { + GenerateRGG gr(nvRGG); + g = gr.generate(randomNumberLCG, isUnitEdgeWeight, randomEdgePercent); + //g->print(false); + + if (me == 0) { + std::cout << "**********************************************************************" << std::endl; + std::cout << "Generated Random Geometric Graph with d: " << gr.get_d() << std::endl; +#ifndef PRINT_DIST_STATS + const GraphElem nv = g->get_nv(); + const GraphElem ne = g->get_ne(); + std::cout << "Number of vertices: " << nv << std::endl; + std::cout << "Number of edges: " << ne << std::endl; +#endif + //std::cout << "Sparsity: "<< (double)((double)nv / (double)(nvRGG*nvRGG))*100.0 <<"%"<< std::endl; + //std::cout << "Average degree: " << (ne / nv) << std::endl; + } + + MPI_Barrier(MPI_COMM_WORLD); + } + else { // read input graph + BinaryEdgeList rm; + g = rm.read(me, nprocs, ranksPerNode, inputFileName); + //g->print(); + } + 
+#ifdef PRINT_DIST_STATS + g->print_dist_stats(); +#endif + assert(g != nullptr); + + MPI_Barrier(MPI_COMM_WORLD); +#ifdef DEBUG_PRINTF + assert(g); +#endif + td1 = MPI_Wtime(); + td = td1 - td0; + + MPI_Reduce(&td, &tdt, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); + + if (me == 0) { + if (!generateGraph) + std::cout << "Time to read input file and create distributed graph (in s): " + << (tdt/nprocs) << std::endl; + else + std::cout << "Time to generate distributed graph of " + << nvRGG << " vertices (in s): " << (tdt/nprocs) << std::endl; + } + + double currMod = -1.0; + double prevMod = -1.0; + double total = 0.0; + + std::vector ssizes, rsizes, svdata, rvdata; +#if defined(USE_MPI_RMA) + MPI_Win commwin; +#endif + size_t ssz = 0, rsz = 0; + int iters = 0; + + MPI_Barrier(MPI_COMM_WORLD); + + t1 = MPI_Wtime(); + + std::cout << "Size: " << sizeof(Edge) << " : " << sizeof(GraphElem) << std::endl; +#if defined(USE_MPI_RMA) + currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, + svdata, rvdata, currMod, threshold, iters, commwin); +#else + currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, + svdata, rvdata, currMod, threshold, iters); +#endif + MPI_Barrier(MPI_COMM_WORLD); + t0 = MPI_Wtime(); + + if(me == 0) { + std::cout << "Modularity: " << currMod << ", Iterations: " + << iters << ", Time (in s): "< 0) + generateGraph = true; + break; + case 'w': + isUnitEdgeWeight = false; + break; + case 'l': + randomNumberLCG = true; + break; + case 'p': + randomEdgePercent = atoi(optarg); + break; + default: + assert(0 && "Should not reach here!!"); + break; + } + } + + if (me == 0 && (argc == 1)) { + std::cerr << "Must specify some options." << std::endl; + MPI_Abort(MPI_COMM_WORLD, -99); + } + + if (me == 0 && !generateGraph && inputFileName.empty()) { + std::cerr << "Must specify a binary file name with -f or provide parameters for generating a graph." 
<< std::endl; + MPI_Abort(MPI_COMM_WORLD, -99); + } + + if (me == 0 && !generateGraph && randomNumberLCG) { + std::cerr << "Must specify -g for graph generation using LCG." << std::endl; + MPI_Abort(MPI_COMM_WORLD, -99); + } + + if (me == 0 && !generateGraph && randomEdgePercent) { + std::cerr << "Must specify -g for graph generation first to add random edges to it." << std::endl; + MPI_Abort(MPI_COMM_WORLD, -99); + } + + if (me == 0 && !generateGraph && !isUnitEdgeWeight) { + std::cerr << "Must specify -g for graph generation first before setting edge weights." << std::endl; + MPI_Abort(MPI_COMM_WORLD, -99); + } + + if (me == 0 && generateGraph && ((randomEdgePercent < 0) || (randomEdgePercent >= 100))) { + std::cerr << "Invalid random edge percentage for generated graph!" << std::endl; + MPI_Abort(MPI_COMM_WORLD, -99); + } +} // parseCommandLine diff --git a/miniVite/miniVite b/miniVite/miniVite new file mode 100755 index 0000000..3b3b798 Binary files /dev/null and b/miniVite/miniVite differ diff --git a/miniVite/miniVite_alloc b/miniVite/miniVite_alloc new file mode 100755 index 0000000..3b3b798 Binary files /dev/null and b/miniVite/miniVite_alloc differ diff --git a/miniVite/miniVite_noalloc b/miniVite/miniVite_noalloc new file mode 100755 index 0000000..a0f31ef Binary files /dev/null and b/miniVite/miniVite_noalloc differ diff --git a/miniVite/run b/miniVite/run new file mode 100644 index 0000000..7d738a1 --- /dev/null +++ b/miniVite/run @@ -0,0 +1,15 @@ +LIBOMPTARGET_DEBUG=1 LLD_GPU_MODE=SDEV bsub -nnodes 1 -P GEN010SOLLVE -J km -W 120 -q batch -o log jsrun -n 1 -g 6 nvprof ./miniVite -n 50000000 +LLD_GPU_MODE=UM mpirun -n 1 nvprof ./miniVite -n 50000000 + +grep Time: summit/alloc_032819_large.log | awk '{print $2}' | v2m 11 3 2 +grep "Host To Device" summit/alloc_032819_sm.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 4 2 3 +grep "Device To Host" summit/alloc_032819_sm.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 5 +grep "Gpu page fault 
groups" summit/alloc_032819_sm.log | awk '{print $6}' | awk -F "m|s" '{print $1}' +grep "cuMemPrefetchAsync" summit/alloc_032819_sm.log | awk '{print $2}' | awk -F "m|s" '{print $1}' +grep "cuMemPrefetchAsync" summit/alloc_032819_sm.log | awk '{print $4}' | awk -F "m|s" '{print $1}' +grep "cuMemcpyHtoD" summit/alloc_032819_sm.log | awk '{print $4}' | awk -F "m|s" '{print $1}' | v2m 4 4 3 + +grep "Host To Device" summit/alloc_032819_large.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 11 2 3 +grep "Device To Host" summit/alloc_032819_large.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 11 2 3 +grep "Gpu page fault groups" summit/alloc_032819_large.log | awk '{print $6}' | awk -F "m|s" '{print $1}'| v2m 11 3 3 +grep "Host To Device" summit/alloc_032819_large.log | awk '{print $5}' | awk -F "G" '{print $1}' | v2m 11 2 3 diff --git a/miniVite/run.sh b/miniVite/run.sh new file mode 100644 index 0000000..e4b907a --- /dev/null +++ b/miniVite/run.sh @@ -0,0 +1,32 @@ +#!/bin/bash + +log0="summit/alloc_051919_lru_sm.log" +log1="summit/alloc_051919_lru_la.log" + +cd /ccs/home/lld/apps/miniVite + +for(( j=0; j<3; j++ )) +do + bsub -o $log0 submit_sm.lsf + sleep 1 + job_num=`bjob | grep lld | grep mnV | wc -l` + while [ $job_num -ne 0 ] + do + sleep 20 + job_num=`bjob | grep lld | grep mnV | wc -l` + done + for(( i=50000000; i<=150000000; i+=10000000 )) + do + sed "s/input/$i/" < submit_one.lsf > temp.lsf + bsub -o $log1 temp.lsf + sleep 1 + job_num=`bjob | grep lld | grep mnV | wc -l` + while [ $job_num -ne 0 ] + do + sleep 20 + job_num=`bjob | grep lld | grep mnV | wc -l` + done + done +done + +cd - diff --git a/miniVite/run2.sh b/miniVite/run2.sh new file mode 100644 index 0000000..4bb64d5 --- /dev/null +++ b/miniVite/run2.sh @@ -0,0 +1,21 @@ +#!/bin/bash + +log1="summit/all_032619_1_2.log" +log2="summit/all_032619_2_2.log" + +cd /ccs/home/lld/apps/miniVite + +for(( j=0; j<3; j++ )) +do + bsub -o $log1 submit_mid2.lsf + bsub -o $log2 submit_hu.lsf + sleep 1 + 
job_num=`bjob | grep lld | wc -l` + while [ $job_num -ne 0 ] + do + sleep 30 + job_num=`bjob | grep lld | wc -l` + done +done + +cd - diff --git a/miniVite/stats b/miniVite/stats new file mode 100644 index 0000000..57aaf9c --- /dev/null +++ b/miniVite/stats @@ -0,0 +1,2 @@ +grep Time: summit/alloc_040419_sm.log | awk '{print $2}' | v2m 4 7 3 +grep Time: summit/alloc_040519_la.log | awk '{print $2}' | v2m 11 6 3 diff --git a/miniVite/submit.sh b/miniVite/submit.sh new file mode 100644 index 0000000..a039b51 --- /dev/null +++ b/miniVite/submit.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +log="summit/all_032619.log" +opt="-nnodes 1 -P GEN010SOLLVE -J km -W 120 -q batch -o $log" + +cd /ccs/home/lld/apps/miniVite + +for(( j=0; j<3; j++ )) +do + for(( i=5000000; i<=150000000; i+=5000000 )) + do + LLD_GPU_MODE=UM bsub $opt jsrun -n1 -g6 nvprof ./miniVite -n 10000000 + sleep 1 + job_num=`bjob | grep lld | wc -l` + while [ $job_num -ne 0 ] + do + sleep 30 + job_num=`bjob | grep lld | wc -l` + done + done +done + +cd - diff --git a/miniVite/utils.hpp b/miniVite/utils.hpp new file mode 100644 index 0000000..50337e4 --- /dev/null +++ b/miniVite/utils.hpp @@ -0,0 +1,328 @@ +// *********************************************************************** +// +// miniVite +// +// *********************************************************************** +// +// Copyright (2018) Battelle Memorial Institute +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// 1. Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// 2. Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// 3. 
Neither the name of the copyright holder nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +// POSSIBILITY OF SUCH DAMAGE. 
+// +// ************************************************************************ + +#pragma once +#ifndef UTILS_HPP +#define UTILS_HPP + +#define PI (3.14159) +#define MAX_PRINT_NEDGE (100000) + +// Read https://en.wikipedia.org/wiki/Linear_congruential_generator#Period_length +// about choice of LCG parameters +// From numerical recipes +// TODO FIXME investigate larger periods +#define MLCG (2147483647) // 2^31 - 1 +#define ALCG (16807) // 7^5 +#define BLCG (0) + +#define SR_UP_TAG 100 +#define SR_DOWN_TAG 101 +#define SR_SIZES_UP_TAG 102 +#define SR_SIZES_DOWN_TAG 103 +#define SR_X_UP_TAG 104 +#define SR_X_DOWN_TAG 105 +#define SR_Y_UP_TAG 106 +#define SR_Y_DOWN_TAG 107 +#define SR_LCG_TAG 108 + +#include +#include +#include + +#ifdef USE_32_BIT_GRAPH +using GraphElem = int32_t; +using GraphWeight = float; +const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT32_T; +const MPI_Datatype MPI_WEIGHT_TYPE = MPI_FLOAT; +#else +using GraphElem = int64_t; +using GraphWeight = double; +const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT64_T; +const MPI_Datatype MPI_WEIGHT_TYPE = MPI_DOUBLE; +#endif + +extern unsigned seed; + +// Is nprocs a power-of-2? 
+int is_pwr2(int nprocs)
+{ return ((nprocs != 0) && !(nprocs & (nprocs - 1))); }
+
+// return uint32_t seed
+GraphElem reseeder(unsigned initseed)
+{
+    std::seed_seq seq({initseed});
+    std::vector<std::uint32_t> seeds(1);
+    seq.generate(seeds.begin(), seeds.end());
+
+    return (GraphElem)seeds[0];
+}
+
+// Local random number generator
+template<typename T, typename G = std::default_random_engine>
+T genRandom(T lo, T hi)
+{
+    thread_local static G gen(seed);
+    using Dist = typename std::conditional
+        <
+        std::is_integral<T>::value
+        , std::uniform_int_distribution<T>
+        , std::uniform_real_distribution<T>
+        >::type;
+
+    thread_local static Dist utd {};
+    return utd(gen, typename Dist::param_type{lo, hi});
+}
+
+// Parallel Linear Congruential Generator
+// x[i] = (a*x[i-1] + b)%M
+class LCG
+{
+    public:
+        LCG(unsigned seed, GraphWeight* drand,
+            GraphElem n, MPI_Comm comm = MPI_COMM_WORLD):
+        seed_(seed), drand_(drand), n_(n)
+        {
+            comm_ = comm;
+            MPI_Comm_size(comm_, &nprocs_);
+            MPI_Comm_rank(comm_, &rank_);
+
+            // allocate long random numbers
+            rnums_.resize(n_);
+
+            // init x0
+            if (rank_ == 0)
+                x0_ = reseeder(seed_);
+
+            // step #1: bcast x0 from root
+            MPI_Bcast(&x0_, 1, MPI_GRAPH_TYPE, 0, comm_);
+
+            // step #2: parallel prefix to generate first random value per process
+            parallel_prefix_op();
+        }
+
+        ~LCG() { rnums_.clear(); }
+
+        // matrix-matrix multiplication for 2x2 matrices
+        void matmat_2x2(GraphElem c[], GraphElem a[], GraphElem b[])
+        {
+            for (int i = 0; i < 2; i++) {
+                for (int j = 0; j < 2; j++) {
+                    GraphElem sum = 0;
+                    for (int k = 0; k < 2; k++) {
+                        sum += a[i*2+k]*b[k*2+j];
+                    }
+                    c[i*2+j] = sum;
+                }
+            }
+        }
+
+        // x *= y
+        void matop_2x2(GraphElem x[], GraphElem y[])
+        {
+            GraphElem tmp[4];
+            matmat_2x2(tmp, x, y);
+            memcpy(x, tmp, sizeof(GraphElem[4]));
+        }
+
+        // find kth power of a 2x2 matrix
+        void mat_power(GraphElem mat[], GraphElem k)
+        {
+            GraphElem tmp[4];
+            memcpy(tmp, mat, sizeof(GraphElem[4]));
+
+            // mat-mat multiply k-1 times, so that mat ends up holding its kth power
+            for (GraphElem p = 0; p < k-1; p++)
+                matop_2x2(mat, tmp);
+        }
+
+        // parallel 
prefix for matrix-matrix operation
+        // `x0 is the very first random number in the series
+        // `ab is a 2-length array which stores a and b
+        // `n_ is (n/p)
+        // `rnums is n_ length array which stores the random nums for a process
+        void parallel_prefix_op()
+        {
+            GraphElem global_op[4];
+            global_op[0] = ALCG;
+            global_op[1] = 0;
+            global_op[2] = BLCG;
+            global_op[3] = 1;
+
+            mat_power(global_op, n_);           // M^(n/p)
+            GraphElem prefix_op[4] = {1,0,0,1}; // I in row-major
+
+            GraphElem global_op_recv[4];
+
+            int steps = (int)(log2((double)nprocs_));
+
+            for (int s = 0; s < steps; s++) {
+
+                int mate = rank_^(1 << s); // toggle the sth LSB to find my neighbor
+
+                // send/recv global to/from mate
+                MPI_Sendrecv(global_op, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG,
+                        global_op_recv, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG,
+                        comm_, MPI_STATUS_IGNORE);
+
+                matop_2x2(global_op, global_op_recv);
+
+                if (mate < rank_)
+                    matop_2x2(prefix_op, global_op_recv);
+
+                MPI_Barrier(comm_);
+            }
+
+            // populate the first random number entry for each process
+            // (x0*a + b)%M, with a and b taken from the prefix matrix
+            if (rank_ == 0)
+                rnums_[0] = x0_;
+            else
+                rnums_[0] = (x0_*prefix_op[0] + prefix_op[2])%MLCG;
+        }
+
+        // generate random number based on the first
+        // random number on a process
+        // TODO check the 'quick'n dirty generators to
+        // see if we can avoid the mod
+        void generate()
+        {
+#if defined(PRINT_LCG_LONG_RANDOM_NUMBERS)
+            for (int k = 0; k < nprocs_; k++) {
+                if (k == rank_) {
+                    std::cout << "------------" << std::endl;
+                    std::cout << "Process#" << rank_ << " :" << std::endl;
+                    std::cout << "------------" << std::endl;
+                    std::cout << rnums_[0] << std::endl;
+                    for (GraphElem i = 1; i < n_; i++) {
+                        rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
+                        std::cout << rnums_[i] << std::endl;
+                    }
+                }
+                MPI_Barrier(comm_);
+            }
+#else
+            for (GraphElem i = 1; i < n_; i++) {
+                rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
+            }
+#endif
+            GraphWeight mult = 1.0 / (GraphWeight)(1.0 + (GraphWeight)(MLCG-1));
+
+#if 
defined(PRINT_LCG_DOUBLE_RANDOM_NUMBERS)
+            for (int k = 0; k < nprocs_; k++) {
+                if (k == rank_) {
+                    std::cout << "------------" << std::endl;
+                    std::cout << "Process#" << rank_ << " :" << std::endl;
+                    std::cout << "------------" << std::endl;
+
+                    for (GraphElem i = 0; i < n_; i++) {
+                        drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult ); // 0-1
+                        std::cout << drand_[i] << std::endl;
+                    }
+                }
+                MPI_Barrier(comm_);
+            }
+#else
+            for (GraphElem i = 0; i < n_; i++)
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+#endif
+        }
+
+        // copy from drand_[idx_start] to new_drand,
+        // rescale the random numbers between lo and hi,
+        // where hi is implicitly lo + 1/nprocs_
+        void rescale(GraphWeight* new_drand, GraphElem idx_start, GraphWeight const& lo)
+        {
+            GraphWeight range = (1.0 / (GraphWeight)nprocs_);
+
+#if defined(PRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS)
+            for (int k = 0; k < nprocs_; k++) {
+                if (k == rank_) {
+                    std::cout << "------------" << std::endl;
+                    std::cout << "Process#" << rank_ << " :" << std::endl;
+                    std::cout << "------------" << std::endl;
+
+                    for (GraphElem i = idx_start, j = 0; i < n_; i++, j++) {
+                        new_drand[j] = lo + (GraphWeight)(range * drand_[i]);
+                        std::cout << new_drand[j] << std::endl;
+                    }
+                }
+                MPI_Barrier(comm_);
+            }
+#else
+            for (GraphElem i = idx_start, j = 0; i < n_; i++, j++)
+                new_drand[j] = lo + (GraphWeight)(range * drand_[i]); // lo-hi
+#endif
+        }
+
+    private:
+        MPI_Comm comm_;
+        int nprocs_, rank_;
+        unsigned seed_;
+        GraphElem n_, x0_;
+        GraphWeight* drand_;
+        std::vector<GraphElem> rnums_;
+};
+
+// locks
+#ifdef USE_OPENMP_LOCK
+#else
+#ifdef USE_SPINLOCK
+#include <atomic>
+std::atomic_flag lkd_ = ATOMIC_FLAG_INIT;
+#else
+#include <mutex>
+std::mutex mtx_;
+#endif
+void lock() {
+#ifdef USE_SPINLOCK
+    while (lkd_.test_and_set(std::memory_order_acquire)) { ; }
+#else
+    mtx_.lock();
+#endif
+}
+void unlock() {
+#ifdef USE_SPINLOCK
+    lkd_.clear(std::memory_order_release);
+#else
+    mtx_.unlock();
+#endif
+}
+#endif
+
+#endif // UTILS
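
The LCG class in utils.hpp gives each rank its first random number by raising the 2x2 matrix [a 0; b 1] to a power, which amounts to jumping ahead in the sequence x[i] = (a*x[i-1] + b)%M. The sketch below shows that skip-ahead idea on its own, without MPI. It is an illustration under stated assumptions, not the code miniVite ships. The names `Affine`, `compose`, `lcg_power`, `lcg_skip_ahead` and `lcg_step_by_step` are hypothetical, and the constants are copied from the MLCG/ALCG/BLCG macros. Unlike `mat_power`, which multiplies k-1 times without reducing, the sketch reduces every product modulo M and uses repeated squaring. Unreduced entries grow like ALCG^k, and 16807^5 already exceeds the range of a signed 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants, copied from the MLCG / ALCG / BLCG macros in utils.hpp.
constexpr std::uint64_t kM = 2147483647ULL; // 2^31 - 1
constexpr std::uint64_t kA = 16807ULL;      // 7^5
constexpr std::uint64_t kB = 0ULL;

// Hypothetical helper: the affine map x -> (mul * x + add) mod kM.
// (mul, add) plays the role of entries [0] and [2] of the 2x2 matrix
// used by LCG::parallel_prefix_op().
struct Affine {
    std::uint64_t mul;
    std::uint64_t add;
};

// Returns the map "apply f, then g". Both operands are already reduced
// mod kM (< 2^31), so each product stays below 2^62 and fits in 64 bits.
inline Affine compose(const Affine& f, const Affine& g) {
    return { (g.mul * f.mul) % kM, (g.mul * f.add + g.add) % kM };
}

// k-fold composition of a single LCG step, by repeated squaring: O(log k).
inline Affine lcg_power(std::uint64_t k) {
    Affine result{1, 0};            // identity map
    Affine base{kA % kM, kB % kM};  // one LCG step
    while (k > 0) {
        if (k & 1) result = compose(result, base);
        base = compose(base, base);
        k >>= 1;
    }
    return result;
}

// Value reached after k LCG steps starting from x0, without iterating.
inline std::uint64_t lcg_skip_ahead(std::uint64_t x0, std::uint64_t k) {
    const Affine f = lcg_power(k);
    return (f.mul * (x0 % kM) + f.add) % kM;
}

// Reference implementation: apply the recurrence k times.
inline std::uint64_t lcg_step_by_step(std::uint64_t x0, std::uint64_t k) {
    std::uint64_t x = x0 % kM;
    for (std::uint64_t i = 0; i < k; ++i) x = (kA * x + kB) % kM;
    return x;
}
```

In a distributed setting, rank r could call `lcg_skip_ahead(x0, r * n_local)` to obtain its first value and then fill the rest of its block with the plain recurrence, as `generate()` does. Here `n_local` is a hypothetical name for the per-rank count, and the exact offset convention is an assumption.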